Total posts in this thread: 11
This topic has been viewed 3554 times and has 10 replies
Albatros010
Cruncher
Joined: Aug 8, 2007
Post Count: 14
Status: Offline
All WUs erroring out

Hi,

First, please excuse my poor English.

For the past couple of weeks, almost every WU my PC tries to crunch errors out, like this one or this one.

I don't know what the reason could be. Other projects are working well.

Regards, Uli
[Feb 22, 2016 8:34:16 PM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Status: Offline
Re: All WUs erroring out

I'm getting an extra-high error rate too. When I look at a WU's history, it has errored for previous users as well. There is a CEP-2 Beta underway, with work units that are erroring nearly 100% of the time. Most likely this will be fixed soon.
[Feb 23, 2016 1:28:35 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All WUs erroring out

The current error rate borders on unacceptable. I tend to lose about 2 WUs out of 10 to errors, and they are never zippy 1h WUs but 14h+ behemoths whose crashes... well, let's just say they agitate me "a bit" (lol!).

I think I'll crunch something else in the meantime. Hopefully any problems in the WU production pipeline can be identified and fixed.
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 26, 2016 11:01:46 PM]
[Feb 26, 2016 10:56:33 PM]
etienne06
Advanced Cruncher
France
Joined: Jun 11, 2009
Post Count: 56
Status: Offline
Re: All WUs erroring out

Hello. I agree with Odestoteles: the error rate is unacceptable. Almost all the WUs crash with an error. Obviously, 1 WU should be valid. If it is not, I will crunch for projects other than CEP2 while waiting for this problem to be fixed.
[Mar 9, 2016 8:33:43 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: All WUs erroring out

The links posted to WU distributions and logs, like https://secure.worldcommunitygrid.org/ms/devi...Log.do?resultId=736360541, are on personal result status pages. We cannot see them unless you also post your password (of course not). So, if you want to provide info, post a copy; then we can read what's going on. Also, pieces of the event log from when the task failed help to identify client-side issues.
[Mar 9, 2016 10:32:55 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All WUs erroring out

Result Log

Result Name: E236323_ 354_ S.328.Br1C27H20N3O1S3Se1.VCFDOMJJGSXAPC-UHFFFAOYSA-N.13_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 293 (0x125)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[10:04:25] Number of jobs = 8
[10:04:25] Starting job 0,CPU time has been restored to 0.000000.
[14:45:31] Finished Job #0
[14:45:31] Starting job 1,CPU time has been restored to 16581.703125.
Error job name too large
14:55:09 (6844): called boinc_finish

</stderr_txt>
]]>
[Mar 9, 2016 3:27:19 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All WUs erroring out

Result Name: E236322_ 529_ S.330.C36Ge1H26O1S3.IIQIDJWHGYVTHI-UHFFFAOYSA-N.3_ s1_ 14_ 1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:24:29] Number of jobs = 8
[08:24:29] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[08:24:31] Finished Job #0
08:24:37 (6360): called boinc_finish

</stderr_txt>
]]>
[Mar 9, 2016 3:28:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All WUs erroring out

Result Name: E236322_ 633_ S.322.C27Ge1H17N3O1S3Se1.AAVXRFWCMMUYJZ-UHFFFAOYSA-N.6_ s1_ 14_ 4--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[14:05:32] Number of jobs = 8
[14:05:32] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:05:33] Finished Job #0
14:05:39 (5152): called boinc_finish

</stderr_txt>
]]>
[Mar 9, 2016 3:29:26 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: All WUs erroring out

Result Name: E236303_ 758_ S.304.C34H28O3S3.VOXRZONOOMWRSN-UHFFFAOYSA-N.14_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[21:23:19] Number of jobs = 8
[21:23:19] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[10:10:55] Number of jobs = 8
[10:10:55] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[07:53:05] Number of jobs = 8
[07:53:05] Starting job 0,CPU time has been restored to 0.000000.
[18:30:00] Finished Job #0
[18:30:00] Starting job 1,CPU time has been restored to 37680.187500.
[18:40:28] Finished Job #1
[18:40:28] Starting job 2,CPU time has been restored to 38305.453125.
[18:50:06] Finished Job #2
[18:50:06] Starting job 3,CPU time has been restored to 38874.125000.
[19:01:36] Finished Job #3
[19:01:36] Starting job 4,CPU time has been restored to 39560.265625.
[19:08:46] Finished Job #4
[19:08:46] Starting job 5,CPU time has been restored to 39986.046875.
[19:13:35] Finished Job #5
[19:13:35] Starting job 6,CPU time has been restored to 40270.734375.
Application exited with RC = 0x1
[20:41:20] Finished Job #6
[20:41:20] Starting job 7,CPU time has been restored to 45510.078125.
[20:41:20] Skipping Job #7
20:41:24 (6356): called boinc_finish

</stderr_txt>
]]>
[Mar 9, 2016 3:31:48 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: All WUs erroring out

OK, the 195 error was something that got highlighted during the beta, and uplinger replied to this along the lines of "we're testing this issue on our own cluster and *hope* to remove the issue from production" [find his post to see the exact wording].

The 293 is rarer... there's no minus sign in front of the error number, meaning it's something on [your] system. My hunting index lists it as "Exit Code 293: Finish file present, job name too long", which suggests the old race condition... the output file was written, then another attempt was made [don't know why], but the previous one was somehow not released. You'd have to find armstrdj's posts for how he described it. I suspect the race condition develops during a system overload situation. The very big contributor account I draw large stats data from has 6 listed on the RS pages, and 3 of them have credit [end of the replication cycle of 5], i.e. it's a known issue, but below the panic room button threshold.
----------------------------------------
[Edit 3 times, last edit by SekeRob* at Mar 9, 2016 4:59:41 PM]
[Mar 9, 2016 4:55:59 PM]
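
For anyone comparing several failed results, here is a minimal Python sketch that pulls the exit code out of a pasted result log and pairs each problem line with the job that was running at the time. It only assumes the line shapes seen in the logs quoted in this thread ("exit code N (0x..)", "Starting job N,", "Application exited with RC = 0x..", "Error ..."). The function name summarize_result_log and the returned dictionary layout are made up for illustration and are not part of BOINC or the CEP2 application.

```python
import re

# Patterns are based only on the log lines quoted in this thread (assumption:
# other CEP2 logs use the same wording).
EXIT_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")
START_RE = re.compile(r"Starting job (\d+),")
PROBLEM_RE = re.compile(r"^(Application exited with RC = 0x[0-9a-fA-F]+|Error .*)$")


def summarize_result_log(text):
    """Return the exit code (or None) and each problem line with its job number."""
    exit_match = EXIT_RE.search(text)
    exit_code = int(exit_match.group(1)) if exit_match else None

    current_job = None  # job number from the most recent "Starting job" line
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        started = START_RE.search(line)
        if started:
            current_job = int(started.group(1))
            continue
        if PROBLEM_RE.match(line):
            problems.append((current_job, line))
    return {"exit_code": exit_code, "problems": problems}


# Sample copied from the exit code 293 log posted above.
sample = """\
<message>
(unknown error) - exit code 293 (0x125)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[10:04:25] Number of jobs = 8
[10:04:25] Starting job 0,CPU time has been restored to 0.000000.
[14:45:31] Finished Job #0
[14:45:31] Starting job 1,CPU time has been restored to 16581.703125.
Error job name too large
14:55:09 (6844): called boinc_finish
</stderr_txt>
"""

print(summarize_result_log(sample))
# {'exit_code': 293, 'problems': [(1, 'Error job name too large')]}
```

Run against the two exit code 195 logs, the same sketch would report job 0 with "Application exited with RC = 0x1", which makes it easier to see at a glance whether a task died right at the start or many hours in.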