World Community Grid Forums
Thread Status: Active | Total posts in this thread: 11
Albatros010
Cruncher | Joined: Aug 8, 2007 | Post Count: 14 | Status: Offline
Dayle Diamond
Senior Cruncher | Joined: Jan 31, 2013 | Post Count: 452 | Status: Offline
I'm getting an extra-high error rate too. When I look at a WU's history, it has already errored for previous users. There is a CEP-2 Beta underway with work units that are erroring nearly 100% of the time; most likely this will be fixed soon.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
The current error rate borders on unacceptable. I tend to lose about 2 WUs out of 10 to errors, and they are never the zippy 1h WUs but 14h+ behemoths, whose crashing to an error... well, let's just say it agitates me "a bit" (lol!).
----------------------------------------
I think I'll crunch something else in the meantime. Hopefully any problems in the WU production pipeline can be identified and fixed.
[Edit 1 times, last edit by Former Member at Feb 26, 2016 11:01:46 PM]
etienne06
Advanced Cruncher | France | Joined: Jun 11, 2009 | Post Count: 56 | Status: Offline
Hello. I agree with Odestoteles: the error rate is unacceptable. Almost all of the WUs crash with an error, when obviously at least one WU should be valid. Since that is not the case, I will crunch for projects other than CEP2 while waiting for this problem to be fixed.
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
The links posted to WU distributions and logs, like https://secure.worldcommunitygrid.org/ms/devi...Log.do?resultId=736360541, are on personal result status pages. We cannot see them unless you also post your password (which of course you shouldn't). So, if you want to provide info, post a copy of the log text... then we can read what's going on. Pieces of the event log from when the task failed also help to identify client-side issues.
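A minimal sketch of how one might pull those event-log pieces out of a saved copy of the log (this is only an illustration, not a WCG tool: it assumes you saved the Event Log text from BOINC Manager to a local file named event_log.txt, and the E236 task-name hint and keywords are assumptions based on the result names in this thread):

# Sketch: filter a saved BOINC event log for lines about a failed CEP2 task.
# Assumptions: the log text was copied out of BOINC Manager's Event Log window
# and saved as "event_log.txt"; the task-name hint and keywords below are
# only illustrative, not anything WCG-specific.
from pathlib import Path

LOG_FILE = Path("event_log.txt")   # assumed local copy of the event log
TASK_HINT = "E236"                 # result names in this thread start with E236...
KEYWORDS = ("exit code", "Computation for task", "Output file", "error")

for line in LOG_FILE.read_text(errors="replace").splitlines():
    if TASK_HINT in line or any(k in line for k in KEYWORDS):
        print(line)

Pasting just the lines this prints, from around the time the task failed, is usually enough for the kind of diagnosis being asked for here.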
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Log
Result Name: E236323_ 354_ S.328.Br1C27H20N3O1S3Se1.VCFDOMJJGSXAPC-UHFFFAOYSA-N.13_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 293 (0x125)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[10:04:25] Number of jobs = 8
[10:04:25] Starting job 0,CPU time has been restored to 0.000000.
[14:45:31] Finished Job #0
[14:45:31] Starting job 1,CPU time has been restored to 16581.703125.
Error job name too large
14:55:09 (6844): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236322_ 529_ S.330.C36Ge1H26O1S3.IIQIDJWHGYVTHI-UHFFFAOYSA-N.3_ s1_ 14_ 1--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[08:24:29] Number of jobs = 8
[08:24:29] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[08:24:31] Finished Job #0
08:24:37 (6360): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236322_ 633_ S.322.C27Ge1H17N3O1S3Se1.AAVXRFWCMMUYJZ-UHFFFAOYSA-N.6_ s1_ 14_ 4--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[14:05:32] Number of jobs = 8
[14:05:32] Starting job 0,CPU time has been restored to 0.000000.
Application exited with RC = 0x1
[14:05:33] Finished Job #0
14:05:39 (5152): called boinc_finish
</stderr_txt>
]]>
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Result Name: E236303_ 758_ S.304.C34H28O3S3.VOXRZONOOMWRSN-UHFFFAOYSA-N.14_ s1_ 14_ 0--
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[21:23:19] Number of jobs = 8
[21:23:19] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[10:10:55] Number of jobs = 8
[10:10:55] Starting job 0,CPU time has been restored to 0.000000.
Quit requested: Exiting
INFO: No state to restore. Start from the beginning.
[07:53:05] Number of jobs = 8
[07:53:05] Starting job 0,CPU time has been restored to 0.000000.
[18:30:00] Finished Job #0
[18:30:00] Starting job 1,CPU time has been restored to 37680.187500.
[18:40:28] Finished Job #1
[18:40:28] Starting job 2,CPU time has been restored to 38305.453125.
[18:50:06] Finished Job #2
[18:50:06] Starting job 3,CPU time has been restored to 38874.125000.
[19:01:36] Finished Job #3
[19:01:36] Starting job 4,CPU time has been restored to 39560.265625.
[19:08:46] Finished Job #4
[19:08:46] Starting job 5,CPU time has been restored to 39986.046875.
[19:13:35] Finished Job #5
[19:13:35] Starting job 6,CPU time has been restored to 40270.734375.
Application exited with RC = 0x1
[20:41:20] Finished Job #6
[20:41:20] Starting job 7,CPU time has been restored to 45510.078125.
[20:41:20] Skipping Job #7
20:41:24 (6356): called boinc_finish
</stderr_txt>
]]>
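If several of these pasted result logs pile up, a small script can tally the exit codes and how far each task got before it died. A minimal sketch, assuming the logs above were saved verbatim into a local file named results.txt (the file name and the parsing are illustrative assumptions, nothing WCG-specific):

# Sketch: tally exit codes and job progress from pasted WCG result logs.
# Assumption: the logs were copied verbatim (as above) into "results.txt".
import re
from pathlib import Path

text = Path("results.txt").read_text(errors="replace")

# e.g. "exit code 293 (0x125)" or "exit code 195 (0xc3)"
exit_codes = re.findall(r"exit code (\d+) \(0x[0-9a-fA-F]+\)", text)
print("exit codes seen:", sorted(set(exit_codes)))

# Count how many jobs each attempt finished before boinc_finish was called.
for block in re.findall(r"<stderr_txt>(.*?)</stderr_txt>", text, re.S):
    jobs_done = len(re.findall(r"Finished Job #\d+", block))
    print(f"{jobs_done} of 8 jobs finished in this attempt")

Run against the logs above, it would report exit codes 195 and 293 (the last log has no <message> block) and show that the 195 failures died within seconds of starting job 0.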
SekeRob
Master Cruncher | Joined: Jan 7, 2013 | Post Count: 2741 | Status: Offline
OK, the 195 error was something that got highlighted during the beta, and uplinger replied to it along the lines of "we're testing this issue on our own cluster and *hope* to remove the issue from production" [find his post to see the exact wording].
----------------------------------------
The 293 is rarer... there's no minus sign in front of the error number, meaning it's something on [your] system. My hunting index lists it as "Exit Code 293: Finish file present, job name too long", which suggests the ol' race condition... the output file was written, then another attempt was made [don't know why], but the previous one was somehow not released. You'd have to find armstrdj's posts for how he described it. I suspect the race condition develops during a system-overload situation. The very big contributor account I draw large stats data from has 6 of these listed on the RS (result status) pages and 3 of them have credit [end of the replication cycle of 5], i.e. it's a known issue, but below the panic-room-button threshold.
[Edit 3 times, last edit by SekeRob* at Mar 9, 2016 4:59:41 PM]
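To make that finish-file race a little more concrete, here is a schematic sketch, not the actual BOINC or CEP2 code and with every name made up for illustration: a task writes a done-marker when it completes, and a restarted attempt that finds a stale marker from an earlier run bails out immediately instead of redoing the work, which the grid then counts as an error.

# Schematic illustration of a finish-file race (not actual BOINC/CEP2 code).
# A previous attempt wrote a "finished" marker; if the client restarts the
# task before that marker is released/cleaned up, the new attempt exits
# early, which shows up server-side as a failed result.
import os
import sys

FINISH_MARKER = "finished.marker"   # hypothetical file name, illustration only

def run_task():
    # ...the long 14h computation would happen here...
    with open(FINISH_MARKER, "w") as f:
        f.write("done\n")

def main():
    if os.path.exists(FINISH_MARKER):
        # Stale marker from an earlier attempt that was never cleaned up,
        # e.g. because the machine was overloaded when the task was restarted.
        print("finish marker already present; exiting", file=sys.stderr)
        sys.exit(1)
    run_task()

if __name__ == "__main__":
    main()

Whether this is exactly what happens inside the CEP2 wrapper is for the project techs to say; the sketch only shows how a leftover finish file plus a restart under load can turn into an immediate error.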