World Community Grid Forums
Thread Status: Active | Total posts in this thread: 11
Albatros010
Cruncher | Joined: Aug 8, 2007 | Post Count: 14 | Status: Offline
Dayle Diamond
Senior Cruncher | Joined: Jan 31, 2013 | Post Count: 452 | Status: Offline
I'm getting an extra-high error rate too. When I look at a WU's history, it has already errored for previous users. There is a CEP-2 Beta underway with work units that are erroring nearly 100% of the time; most likely this will be fixed soon.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
The current error rate borders on unacceptable. I tend to lose about 2 WUs out of 10 to errors, and they are never the zippy 1h WUs but 14h+ behemoths, whose crashing to an error... well, let's just say it agitates me "a bit" (lol!).
----------------------------------------
I think I'll crunch something else in the meantime. Hopefully any problems in the WU production pipeline can be identified and fixed.
[Edit 1 times, last edit by Former Member at Feb 26, 2016 11:01:46 PM]
etienne06
Advanced Cruncher | France | Joined: Jun 11, 2009 | Post Count: 56 | Status: Offline
Hello. I agree with Odestoteles: the error rate is unacceptable. Almost all of the WUs crash with an error, when obviously at least one WU should be valid. Since that is not the case, I will crunch for projects other than CEP2 while waiting for this problem to be fixed.
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
The links posted to WU distributions and logs, like https://secure.worldcommunitygrid.org/ms/devi...Log.do?resultId=736360541, are on personal result status pages. We cannot see them unless you also post your password (which of course you shouldn't). So, if you want to provide info, post a copy of the log text... then we can read what's going on. Pieces of the event log from when the task failed also help to identify client-side issues.
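A minimal sketch of how one might pull those event-log pieces out of a saved copy of the log (this is only an illustration, not a WCG tool: it assumes you saved the Event Log text from BOINC Manager to a local file named event_log.txt, and the E236 task-name hint and keywords are assumptions based on the result names in this thread):

# Sketch: filter a saved BOINC event log for lines about a failed CEP2 task.
# Assumptions: the log text was copied out of BOINC Manager's Event Log window
# and saved as "event_log.txt"; the task-name hint and keywords below are
# only illustrative, not anything WCG-specific.
from pathlib import Path

LOG_FILE = Path("event_log.txt")   # assumed local copy of the event log
TASK_HINT = "E236"                 # result names in this thread start with E236...
KEYWORDS = ("exit code", "Computation for task", "Output file", "error")

for line in LOG_FILE.read_text(errors="replace").splitlines():
    if TASK_HINT in line or any(k in line for k in KEYWORDS):
        print(line)

Pasting just the lines this prints, from around the time the task failed, is usually enough for the kind of diagnosis being asked for here.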
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Log
Result Name: E236323_ 354_ S.328.Br1C27H20N3O1S3Se1.VCFDOMJJGSXAPC-UHFFFAOYSA-N.13_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 293 (0x125)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[10:04:25] Number of jobs = 8
[10:04:25] Starting job 0,CPU time has been restored to 0.000000.
[14:45:31] Finished Job #0
[14:45:31] Starting job 1,CPU time has been restored to 16581.703125.
Error job name too large
14:55:09 (6844): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236322_ 529_ S.330.C36Ge1H26O1S3.IIQIDJWHGYVTHI-UHFFFAOYSA-N.3_ s1_ 14_ 1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:24:29] Number of jobs = 8
[08:24:29] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[08:24:31] Finished Job #0
08:24:37 (6360): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236322_ 633_ S.322.C27Ge1H17N3O1S3Se1.AAVXRFWCMMUYJZ-UHFFFAOYSA-N.6_ s1_ 14_ 4--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[14:05:32] Number of jobs = 8
[14:05:32] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:05:33] Finished Job #0
14:05:39 (5152): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236303_ 758_ S.304.C34H28O3S3.VOXRZONOOMWRSN-UHFFFAOYSA-N.14_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[21:23:19] Number of jobs = 8
[21:23:19] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[10:10:55] Number of jobs = 8
[10:10:55] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[07:53:05] Number of jobs = 8
[07:53:05] Starting job 0,CPU time has been restored to 0.000000.
[18:30:00] Finished Job #0
[18:30:00] Starting job 1,CPU time has been restored to 37680.187500.
[18:40:28] Finished Job #1
[18:40:28] Starting job 2,CPU time has been restored to 38305.453125.
[18:50:06] Finished Job #2
[18:50:06] Starting job 3,CPU time has been restored to 38874.125000.
[19:01:36] Finished Job #3
[19:01:36] Starting job 4,CPU time has been restored to 39560.265625.
[19:08:46] Finished Job #4
[19:08:46] Starting job 5,CPU time has been restored to 39986.046875.
[19:13:35] Finished Job #5
[19:13:35] Starting job 6,CPU time has been restored to 40270.734375.
Application exited with RC = 0x1
[20:41:20] Finished Job #6
[20:41:20] Starting job 7,CPU time has been restored to 45510.078125.
[20:41:20] Skipping Job #7
20:41:24 (6356): called boinc_finish
</stderr_txt>
]]>
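If several of these pasted result logs pile up, a small script can tally the exit codes and how far each task got before it died. A minimal sketch, assuming the logs above were saved verbatim into a local file named results.txt (the file name and the parsing are illustrative assumptions, nothing WCG-specific):

# Sketch: tally exit codes and job progress from pasted WCG result logs.
# Assumption: the logs were copied verbatim (as above) into "results.txt".
import re
from pathlib import Path

text = Path("results.txt").read_text(errors="replace")

# e.g. "exit code 293 (0x125)" or "exit code 195 (0xc3)"
exit_codes = re.findall(r"exit code (\d+) \(0x[0-9a-fA-F]+\)", text)
print("exit codes seen:", sorted(set(exit_codes)))

# Count how many jobs each attempt finished before boinc_finish was called.
for block in re.findall(r"<stderr_txt>(.*?)</stderr_txt>", text, re.S):
    jobs_done = len(re.findall(r"Finished Job #\d+", block))
    print(f"{jobs_done} of 8 jobs finished in this attempt")

Run against the logs above, it would report exit codes 195 and 293 (the last log has no <message> block) and show that the 195 failures died within seconds of starting job 0.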
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
OK, the 195 error was something that got highlighted during the beta, and uplinger replied to it along the lines of "we're testing this issue on our own cluster and *hope* to remove the issue from production" [find his post to see the exact wording].
----------------------------------------
The 293 is rarer... there's no minus sign in front of the error number, meaning it's something on [your] system. My hunting index lists it as "Exit Code 293: Finish file present, job name too long", which suggests the ol' race condition... the output file was written, then another attempt was made [don't know why], but the previous one was somehow not released. You'd have to find armstrdj's posts for how he described it. I suspect the race condition develops during a system-overload situation. The very big contributor account I draw large stats data from has 6 of these listed on the RS (result status) pages and 3 of them have credit [end of the replication cycle of 5], i.e. it's a known issue, but below the panic-room-button threshold.
[Edit 3 times, last edit by SekeRob* at Mar 9, 2016 4:59:41 PM]
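To make that finish-file race a little more concrete, here is a schematic sketch, not the actual BOINC or CEP2 code and with every name made up for illustration: a task writes a done-marker when it completes, and a restarted attempt that finds a stale marker from an earlier run bails out immediately instead of redoing the work, which the grid then counts as an error.

# Schematic illustration of a finish-file race (not actual BOINC/CEP2 code).
# A previous attempt wrote a "finished" marker; if the client restarts the
# task before that marker is released/cleaned up, the new attempt exits
# early, which shows up server-side as a failed result.
import os
import sys

FINISH_MARKER = "finished.marker"   # hypothetical file name, illustration only

def run_task():
    # ...the long 14h computation would happen here...
    with open(FINISH_MARKER, "w") as f:
        f.write("done\n")

def main():
    if os.path.exists(FINISH_MARKER):
        # Stale marker from an earlier attempt that was never cleaned up,
        # e.g. because the machine was overloaded when the task was restarted.
        print("finish marker already present; exiting", file=sys.stderr)
        sys.exit(1)
    run_task()

if __name__ == "__main__":
    main()

Whether this is exactly what happens inside the CEP2 wrapper is for the project techs to say; the sketch only shows how a leftover finish file plus a restart under load can turn into an immediate error.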