| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 28
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It's good news that the researchers will at least end up with valid results from those particular workunits. Thanks for the update.
|
||
|
|
littlepeaks
Veteran Cruncher USA Joined: Apr 28, 2007 Post Count: 748 Status: Offline Project Badges:
|
OK, I got a similar problem last night. I errored out on the first task (job #0) with a 0x1; everyone else errored out with a 0x1 or 0x100 on the first task as well.
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_7--  640  Pending Validation  9/26/13 17:19:20  9/26/13 17:30:43  0.15  3.6 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_6--  -  In Progress  9/26/13 17:16:40  9/29/13 17:16:40  0.00  0.0 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_4--  640  Error  9/26/13 15:50:59  9/26/13 17:03:14  0.14  5.6 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_5--  640  Error  9/26/13 15:50:42  9/26/13 16:50:27  0.23  4.2 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_3--  640  Error  9/26/13 14:04:32  9/26/13 14:23:34  0.24  7.2 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_2--  640  Error  9/26/13 13:57:03  9/26/13 15:43:54  0.18  4.2 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_1--  640  Error  9/26/13 08:19:12  9/26/13 12:26:47  0.18  4.3 / 0.0
E215793_616_I.26.C17H7N7O2.00265778.2.set1d06_0--  640  Error  9/26/13 08:14:01  9/26/13 13:42:18  0.17  4.7 / 0.0
I am copy number 0. The PV (Copy # _7) got a 0x100 on job 0. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yep, getting the same here. Looks like a bad batch.
Result Log
Result Name: E215751_782_I.23.C15F3H6N3O2.00214865.3.set1d06_3--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[01:39:50] Number of jobs = 16
[01:39:50] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[01:54:02] Finished Job #0
[01:54:02] Starting job 1,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #1
[01:54:02] Starting job 2,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #2
[01:54:02] Starting job 3,CPU time has been restored to 841.718996.
Snip...................................
[01:54:02] Skipping Job #13
[01:54:02] Starting job 14,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #14
[01:54:02] Starting job 15,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #15
01:54:02 (4528): called boinc_finish
</stderr_txt>
]]>
|
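For anyone who wants to check their own result logs for this pattern (job 0 exits with a non-zero RC and every later job is skipped), here is a minimal Python sketch. It assumes the stderr text looks like the log above, with entries of the form `Starting job N`, `Application exited with RC = 0x...` and `Skipping Job #N`. The function name `summarize_stderr` is made up for illustration and is not part of BOINC or the CEP2 application.

```python
import re

def summarize_stderr(text):
    """Hypothetical helper: return (failed, skipped).

    failed  maps job number -> non-zero exit code seen while that job ran
    skipped lists the job numbers reported as skipped
    """
    failed = {}
    skipped = []
    current_job = None
    # Tokens are matched in order of appearance, so this works whether the
    # log has one entry per line or has been flattened onto a single line.
    pattern = re.compile(
        r"Starting job (\d+)"
        r"|exited with RC = (0x[0-9a-fA-F]+)"
        r"|Skipping Job #(\d+)"
    )
    for match in pattern.finditer(text):
        start, rc, skip = match.groups()
        if start is not None:
            current_job = int(start)
        elif rc is not None:
            code = int(rc, 16)
            if code != 0 and current_job is not None:
                failed[current_job] = code
        else:
            skipped.append(int(skip))
    return failed, skipped

# Shortened copy of the log posted above
sample = """[01:39:50] Number of jobs = 16
[01:39:50] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[01:54:02] Finished Job #0
[01:54:02] Starting job 1,CPU time has been restored to 841.718996.
[01:54:02] Skipping Job #1
"""
failed, skipped = summarize_stderr(sample)
print(failed, skipped)  # {0: 1} [1]
```

A log where `failed` contains job 0 and `skipped` covers all remaining jobs matches what is being reported in this thread. An RC of 0x100 would show up as 256.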
||
|
|
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline Project Badges:
|
I see a lot of WUs failing recently. Many of them have multiple resends and most are failing.
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_8--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_7--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_5--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_6--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_3--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_4--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_2--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_1--  640  Error
E215758_214_I.23.C17H7N5S.00268758.4.set1d06_0--  640  Error |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Maybe the tasks are OK but very short, and the system treats them as errors. I got E215802_952_I.27.C17F3H6N5O2.00202347.0.set1d06 with a run time of 0.09 hr; it exited with RC = 0x1 and the other 15 jobs were skipped.
----------------------------------------
[Edit 1 times, last edit by Former Member at Sep 27, 2013 12:33:13 PM] |
||
|
|
kateiacy
Veteran Cruncher USA Joined: Jan 23, 2010 Post Count: 1027 Status: Offline Project Badges:
|
I've had a half dozen in the last couple of days in which all wingmen had errors, for example:
Workunit Status
Project Name: The Clean Energy Project - Phase 2
Created: 09/24/2013 12:45:08
Name: E215774_571_I.23.C17H11N5S.00224467.2.set1d06
Minimum Quorum: 2
Replication: 2
Result Name  App Version Number  Status  Sent Time  Time Due / Return Time  CPU Time / Elapsed Time (hours)  Claimed / Granted BOINC Credit
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_7--  640  Error  9/26/13 21:41:38  9/26/13 22:39:19  0.18  3.7 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_8--  640  Error  9/26/13 21:39:16  9/27/13 13:50:54  0.26  3.6 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_5--  640  Error  9/26/13 14:53:15  9/26/13 21:27:53  0.22  4.8 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_6--  640  Error  9/26/13 14:47:34  9/26/13 15:05:51  0.18  2.7 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_4--  640  Error  9/25/13 17:55:55  9/25/13 18:32:50  0.19  3.0 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_3--  640  Error  9/25/13 17:55:47  9/26/13 14:41:07  0.26  4.4 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_2--  640  Error  9/25/13 13:26:58  9/25/13 17:51:58  0.24  3.8 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_1--  640  Error  9/25/13 13:14:26  9/25/13 15:17:28  0.28  4.9 / 0.0
E215774_571_I.23.C17H11N5S.00224467.2.set1d06_0--  640  Error  9/25/13 12:48:31  9/25/13 13:06:18  0.25  3.9 / 0.0
As others have seen, some of the errors are RC = 0x1 and others are RC = 0x100. |
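To see at a glance how the copies of a workunit fared, the status column can be tallied. This is only a rough Python sketch: it assumes each result row has been copied from the Workunit Status page as one line of text, and it only knows the three statuses that appear in this thread. `STATUSES` and `tally_statuses` are illustrative names, not anything provided by the site.

```python
from collections import Counter

# Statuses seen in this thread; the site may well use others (assumption).
STATUSES = ("Pending Validation", "In Progress", "Error")

def tally_statuses(rows):
    """Count how many result rows carry each known status string."""
    counts = Counter()
    for row in rows:
        for status in STATUSES:
            if status in row:
                counts[status] += 1
                break  # one status per row
    return counts

# Two rows taken from the table above
rows = [
    "E215774_571_I.23.C17H11N5S.00224467.2.set1d06_7-- 640 Error "
    "9/26/13 21:41:38 9/26/13 22:39:19 0.18 3.7 / 0.0",
    "E215774_571_I.23.C17H11N5S.00224467.2.set1d06_8-- 640 Error "
    "9/26/13 21:39:16 9/27/13 13:50:54 0.26 3.6 / 0.0",
]
counts = tally_statuses(rows)
print(counts["Error"])  # 2
```

If every copy of a workunit lands in the "Error" bucket, as in the table above, that points at the workunit rather than at any one machine.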
||
|
|
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18667 Status: Offline Project Badges:
|
I had a couple of WUs error out today. Both showed "Application exited with RC = 0x1" about 20-30 minutes into job #0. I have to think this is a different problem from the one originally described in this thread. Those WUs ran for more normal times, and the problem was in validation, not in the crunching.
|
||
|
|
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline Project Badges:
|
I have tons of such WUs, but nobody seems to care about that...
|
||
|
|
seippel
Former World Community Grid Tech Joined: Apr 16, 2009 Post Count: 392 Status: Offline Project Badges:
|
We are aware of the increase in work units failing for CEP2 and are working with the Harvard team to resolve the issue. The problem is that some work units cause a fatal error in the Q-Chem code. Ideally these work units would be identified ahead of time, but if that proves impossible we will make sure this is handled on the validation side. Until a more permanent solution can be found, work units that experience this problem are manually being given credit.
Seippel |
||
|
|
Mumak
Senior Cruncher Joined: Dec 7, 2012 Post Count: 477 Status: Offline Project Badges:
|
"We are aware of the increase in work units failing for CEP2 and are working with the Harvard team to resolve the issue. The problem is that some work units cause a fatal error in the Q-Chem code. Ideally these work units would be identified ahead of time, but if that proves impossible we will make sure this is handled on the validation side. Until a more permanent solution can be found, work units that experience this problem are manually being given credit." Seippel
Thanks for the ACK. |
||
|
|
|