| World Community Grid Forums
Thread Status: Active Total posts in this thread: 15
tombell12
Advanced Cruncher Australia Joined: Oct 8, 2009 Post Count: 87 Status: Offline Project Badges:
|
I'm wondering why my copy of this workunit errored out after running for almost 12 hours with nothing obviously going wrong. This is from the workunit status page of workunit "E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14"; my copy is the one ending in _2 (shown in bold on the original page).
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_3-- 700 Valid 9/12/15 15:11:00 10/12/15 04:36:16 13.21 399.4 / 445.6
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_2-- 700 Error 8/12/15 08:14:32 9/12/15 14:24:51 11.65 117.9 / 0.0
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_1-- 700 Valid 8/12/15 08:09:44 9/12/15 03:15:09 18.00 491.7 / 445.6
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_0-- 700 Error 8/12/15 08:05:54 8/12/15 08:08:33 0.00 383.2 / 0.0

And here is the actual log from the workunit itself:

<core_client_version>7.2.31</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[18:45:47] Number of jobs = 8
[18:45:47] Starting job 0,CPU time has been restored to 0.000000.
[07:38:44] Finished Job #0
[07:38:44] Starting job 1,CPU time has been restored to 21456.611541.
[17:47:14] Finished Job #1
[17:47:14] Starting job 2,CPU time has been restored to 22524.407186.
[18:09:30] Finished Job #2
[18:09:30] Starting job 3,CPU time has been restored to 23802.554580.
[18:32:52] Finished Job #3
[18:32:52] Starting job 4,CPU time has been restored to 25137.018334.
[18:48:58] Finished Job #4
[18:48:58] Starting job 5,CPU time has been restored to 26068.469104.
[19:03:44] Finished Job #5
[19:03:44] Starting job 6,CPU time has been restored to 26940.998298.
Application exited with RC = 0x1
[23:16:18] Finished Job #6
[23:16:18] Starting job 7,CPU time has been restored to 41927.780366.
[23:16:18] Skipping Job #7
23:16:23 (13736): called boinc_finish
</stderr_txt>
]]>

There are no obvious errors in the log, yet my copy errored out with no credit while others with similar results were considered valid (one timed out at the 18-hour limit and was still valid??). Can anyone explain where my copy went out of whack? |
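For anyone who wants to scan a long stderr log for the few lines that differ from a normal run, here is a minimal Python sketch. It assumes the log wording quoted in this post; the function name `find_anomalies` and the patterns are illustrative assumptions, not part of BOINC or the CEP2 application. It only surfaces lines such as `Application exited with RC = 0x1` and `Skipping Job #7`; it says nothing about why the exit code was non-zero.

```python
import re

# Patterns based only on the log lines quoted in this thread (assumption:
# other application versions may word these messages differently).
RC_LINE = re.compile(r"Application exited with RC = (0x[0-9a-fA-F]+)")
SKIP_LINE = re.compile(r"Skipping Job #(\d+)")
START_LINE = re.compile(r"Starting job (\d+),")


def find_anomalies(log_text):
    """Return a list of (job, message) tuples for lines worth a second look."""
    anomalies = []
    current_job = None
    for line in log_text.splitlines():
        start = START_LINE.search(line)
        if start:
            # Remember which job subsequent lines belong to.
            current_job = int(start.group(1))
            continue
        rc = RC_LINE.search(line)
        if rc and int(rc.group(1), 16) != 0:
            anomalies.append((current_job, "non-zero exit code " + rc.group(1)))
        skip = SKIP_LINE.search(line)
        if skip:
            anomalies.append((int(skip.group(1)), "job skipped"))
    return anomalies


sample = """\
[19:03:44] Starting job 6,CPU time has been restored to 26940.998298.
Application exited with RC = 0x1
[23:16:18] Finished Job #6
[23:16:18] Starting job 7,CPU time has been restored to 41927.780366.
[23:16:18] Skipping Job #7
"""

for job, message in find_anomalies(sample):
    print(f"job {job}: {message}")
```

Pasting a full stderr section into `sample` works the same way, since lines that match none of the patterns are simply ignored.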
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Me too!
The volunteer who received it after my failure is working on it, so maybe he and the others will fail too instead of succeeding, and mine isn't the same scenario as yours. Let's see.

Project Name: The Clean Energy Project - Phase 2
Created: 12/07/2015 10:40:06
Name: E235171_471_S.302.C36H24N2S3.XICVLFGYQUMYSL-UHFFFAOYSA-N.1_s1_14
Minimum Quorum: 1
Replication: 1
Result Name: E235171_471_S.302.C36H24N2S3.XICVLFGYQUMYSL-UHFFFAOYSA-N.1_s1_14_0--

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[13:57:28] Number of jobs = 8
[13:57:28] Starting job 0,CPU time has been restored to 0.000000.
[13:57:28] Starting new Job
[13:57:28] Qink name = fldman
[13:57:29] Qink name = gesman
[13:57:30] Qink name = scfman
[14:48:32] Qink name = anlman
[14:48:32] Qink name = drvman
[14:51:09] Qink name = optman
[14:51:10] Qink name = fldman
[14:51:10] Qink name = gesman
[14:51:11] Qink name = scfman
[15:08:26] Qink name = anlman
[15:08:26] Qink name = drvman
[15:10:58] Qink name = optman
[15:10:58] Qink name = fldman
[15:10:58] Qink name = gesman
[15:11:00] Qink name = scfman
[15:27:51] Qink name = anlman
[15:27:51] Qink name = drvman
[15:30:23] Qink name = optman
[15:30:23] Qink name = fldman
[15:30:23] Qink name = gesman
[15:30:25] Qink name = scfman
[15:47:24] Qink name = anlman
[15:47:24] Qink name = drvman
[15:49:57] Qink name = optman
[15:49:57] Qink name = fldman
[15:49:57] Qink name = gesman
[15:49:59] Qink name = scfman
[16:06:44] Qink name = anlman
[16:06:44] Qink name = drvman
[16:09:15] Qink name = optman
[16:09:15] Qink name = fldman
[16:09:15] Qink name = gesman
[16:09:17] Qink name = scfman
[16:25:13] Qink name = anlman
[16:25:13] Qink name = drvman
[16:27:42] Qink name = optman
[16:27:42] Qink name = fldman
[16:27:42] Qink name = gesman
[16:27:44] Qink name = scfman
[16:44:37] Qink name = anlman
[16:44:37] Qink name = drvman
[16:47:10] Qink name = optman
[16:47:10] Qink name = fldman
[16:47:10] Qink name = gesman
[16:47:12] Qink name = scfman
[17:04:22] Qink name = anlman
[17:04:22] Qink name = drvman
[17:07:11] Qink name = optman
[17:07:12] Qink name = fldman
[17:07:12] Qink name = gesman
[17:07:13] Qink name = scfman
[17:22:31] Qink name = anlman
[17:22:31] Qink name = drvman
[17:25:05] Qink name = optman
[17:25:05] Qink name = fldman
[17:25:05] Qink name = gesman
[17:25:07] Qink name = scfman
[17:38:43] Qink name = anlman
[17:38:43] Qink name = drvman
[17:41:15] Qink name = optman
[17:41:16] Qink name = fldman
[17:41:16] Qink name = gesman
[17:41:17] Qink name = scfman
[17:56:21] Qink name = anlman
[17:56:21] Qink name = drvman
[17:59:07] Qink name = optman
[17:59:07] Qink name = fldman
[17:59:07] Qink name = gesman
[17:59:09] Qink name = scfman
[18:13:25] Qink name = anlman
[18:13:25] Qink name = drvman
[18:16:23] Qink name = optman
[18:16:23] Qink name = fldman
[18:16:23] Qink name = gesman
[18:16:25] Qink name = scfman
[18:28:54] Qink name = anlman
[18:28:54] Qink name = drvman
[18:31:24] Qink name = optman
[18:31:24] Qink name = fldman
[18:31:24] Qink name = gesman
[18:31:26] Qink name = scfman
[18:43:01] Qink name = anlman
[18:43:01] Qink name = drvman
[18:45:33] Qink name = optman
[18:45:33] Qink name = fldman
[18:45:33] Qink name = gesman
[18:45:35] Qink name = scfman
[18:57:30] Qink name = anlman
[18:57:30] Qink name = drvman
[19:00:03] Qink name = optman
[19:00:03] Qink name = fldman
[19:00:03] Qink name = gesman
[19:00:04] Qink name = scfman
[19:11:13] Qink name = anlman
[19:11:13] Qink name = drvman
[19:13:45] Qink name = optman
[19:13:45] Qink name = anlman
[19:15:48] End of Job
[19:15:49] Finished Job #0
[19:15:49] Starting job 1,CPU time has been restored to 17577.720000.
[19:15:50] Starting new Job
[19:15:50] Qink name = fldman
[19:15:51] Qink name = gesman
[19:15:51] Qink name = scfman
[19:31:15] Qink name = anlman
[19:33:17] End of Job
[19:33:19] Finished Job #1
[19:33:19] Starting job 2,CPU time has been restored to 18572.128000.
[19:33:19] Starting new Job
[19:33:19] Qink name = fldman
[19:33:20] Qink name = gesman
[19:33:21] Qink name = scfman
[19:46:47] Qink name = anlman
[19:48:41] End of Job
[19:48:42] Finished Job #2
[19:48:42] Starting job 3,CPU time has been restored to 19443.780000.
[19:48:42] Starting new Job
[19:48:42] Qink name = fldman
[19:48:43] Qink name = gesman
[19:48:44] Qink name = scfman
[20:07:01] Qink name = anlman
[20:08:54] End of Job
[20:08:56] Finished Job #3
[20:08:56] Starting job 4,CPU time has been restored to 20625.900000.
[20:08:56] Starting new Job
[20:08:56] Qink name = fldman
[20:08:58] Qink name = gesman
[20:08:58] Qink name = scfman
[20:20:56] Qink name = anlman
[20:22:51] End of Job
[20:22:53] Finished Job #4
[20:22:53] Starting job 5,CPU time has been restored to 21435.456000.
[20:22:53] Starting new Job
[20:22:53] Qink name = fldman
[20:22:54] Qink name = gesman
[20:22:55] Qink name = scfman
[20:30:28] Qink name = anlman
[20:34:19] End of Job
[20:34:20] Finished Job #5
[20:34:20] Starting job 6,CPU time has been restored to 22107.544000.
[20:34:20] Starting new Job
[20:34:20] Qink name = fldman
[20:34:28] Qink name = gesman
[20:34:30] Qink name = scfman
Application exited with RC = 0x100
[23:45:27] Finished Job #6
[23:45:27] Starting job 7,CPU time has been restored to 33302.508000.
[23:45:27] Skipping Job #7
23:45:31 (3118): called boinc_finish
</stderr_txt>
]]> |
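The `CPU time has been restored to` values look cumulative, so the difference between consecutive values should give a rough CPU time per job. Here is a minimal Python sketch under that assumption; the helper name `cpu_seconds_per_job` is hypothetical, and the last job in a log gets no entry because there is no later value to subtract from.

```python
import re

# Matches e.g. "Starting job 6,CPU time has been restored to 22107.544000."
# (log wording taken from this thread; the cumulative reading is an assumption).
RESTORED = re.compile(r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)")


def cpu_seconds_per_job(log_text):
    """Map job number -> CPU seconds, from consecutive cumulative values."""
    marks = [(int(job), float(total)) for job, total in RESTORED.findall(log_text)]
    return {
        job: round(next_total - total, 3)
        for (job, total), (_, next_total) in zip(marks, marks[1:])
    }


sample = """\
[20:22:53] Starting job 5,CPU time has been restored to 21435.456000.
[20:34:20] Finished Job #5
[20:34:20] Starting job 6,CPU time has been restored to 22107.544000.
Application exited with RC = 0x100
[23:45:27] Finished Job #6
[23:45:27] Starting job 7,CPU time has been restored to 33302.508000.
"""

for job, seconds in cpu_seconds_per_job(sample).items():
    print(f"job {job}: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

Read this way, job 6 in the log above used roughly 11195 CPU seconds (about 3.1 hours) before the non-zero exit, far more than any of jobs 1 to 5.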
||
|
|
tombell12
Advanced Cruncher Australia Joined: Oct 8, 2009 Post Count: 87 Status: Offline Project Badges:
|
Me too!
It's interesting, hey? It kinda comes off as a "phantom" error of sorts. Like something goes wrong but there's no obvious indicator as to where or why.
The volunteer who received it after my failure is working on it so maybe he and the others will fail too instead of succeed and mine isn't the same scenario as yours, let's see.
[Edit 1 times, last edit by tombell12 at Dec 11, 2015 9:28:04 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yep! Let's see if someone (probably SekeRob) knows something about it.
|
||
|
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346 Status: Offline Project Badges:
|
tombell12, although you got the 'Error' status, I'm wondering what your Minimum Quorum and Replication are for said WU:
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_2-- 700 Error 8/12/15 08:14:32 9/12/15 14:24:51 11.65 117.9 / 0.0 |
||
|
|
tombell12
Advanced Cruncher Australia Joined: Oct 8, 2009 Post Count: 87 Status: Offline Project Badges:
|
tombell12, although you got the 'Error' status, I'm wondering what your Minimum Quorum and Replication are for said WU:
E235158_918_S.298.C38H26N4O2.QGTIVOPIQVYRAQ-UHFFFAOYSA-N.11_s1_14_2-- 700 Error 8/12/15 08:14:32 9/12/15 14:24:51 11.65 117.9 / 0.0
The workunit status data has disappeared now, but I do recall it saying 2 for both of those values.
[Edit 1 times, last edit by tombell12 at Dec 11, 2015 8:41:09 PM] |
||
|
|
tombell12
Advanced Cruncher Australia Joined: Oct 8, 2009 Post Count: 87 Status: Offline Project Badges:
|
Yep! Let's see if someone (probably SekeRob) knows something about it.
Your log is interesting though; it has all these "Qink name" values. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yep! Let's see if someone (probably SekeRob) knows something about it.
Your log is interesting though; it has all these "Qink name" values.
I don't know what it means, do you? It appears in the log files of all my WUs crunched under Linux; I don't recall ever having seen it in the ones crunched under Windows.
Meanwhile the WU has been sent to four other volunteers: three of them have already reported it and all got an error, but none of them shows an error in the result log. Every once in a while such a WU appears and someone on this forum calls it a toxic one. |
||
|
|
tombell12
Advanced Cruncher Australia Joined: Oct 8, 2009 Post Count: 87 Status: Offline Project Badges:
|
I don't know what it means, do you? It appears in the log files of all my WUs crunched under Linux; I don't recall ever having seen it in the ones crunched under Windows. Meanwhile the WU has been sent to four other volunteers: three of them have already reported it and all got an error, but none of them shows an error in the result log. Every once in a while such a WU appears and someone on this forum calls it a toxic one.
The only obvious thing is that it would pertain to your Linux configuration. I've been stuck on such a "toxic" WU, which just cuts out after 5 copies. I read that apparently those WUs get put on some "investigation" list. |
||
|
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
TMEs (Too Many Errors) give credit for time, but in a highly inconsistent fashion. Testing against a very big cruncher account, I see 9 listed at this time at the 18:00:01 hour mark, none with credit, 6 of which are from the previous stats period, i.e. they were returned before 00:06 last night. The program takes a snapshot of all results at the end of each period, which makes it possible to track back which ones got moved off [without credit for time]. Crunched they were, even if in futility, and they report that fact! Maybe they get re-crunched on a powerhouse cluster of the Harvard team, but that is likely only when statistics tell them the molecule is of greater interest.
----------------------------------------[Edit 1 times, last edit by SekeRob* at Dec 14, 2015 9:34:42 AM] |
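The snapshot idea described in this post can be sketched roughly as follows. This is not the actual program mentioned above, whose internals are not shown in this thread; the data layout (a dict mapping result name to granted credit) and the names `moved_off_without_credit`, `previous` and `current` are assumptions made purely for illustration.

```python
def moved_off_without_credit(previous, current):
    """Names present in the previous snapshot, absent from the current one,
    and recorded with zero granted credit when last seen."""
    return sorted(
        name
        for name, granted in previous.items()
        if name not in current and granted == 0.0
    )


# Hypothetical snapshots: result name -> granted credit at snapshot time.
previous = {"wu_a_0": 0.0, "wu_b_1": 445.6, "wu_c_2": 0.0}
current = {"wu_c_2": 0.0}

print(moved_off_without_credit(previous, current))
```

Comparing each period-end snapshot with the next one in this way would list results that dropped off the status pages without ever being granted credit.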
||
|
|
|