World Community Grid - View Thread - This WU is Quite Strange

World Community Grid Forums

Category: Completed Research

Forum: Help Cure Muscular Dystrophy - Phase 2 Forum

Thread: This WU is Quite Strange


Thread Status: Active
Total posts in this thread: 13


Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


This WU is Quite Strange

I don't believe I have ever seen this before, but in this case a picture is worth a thousand words. I let this run just out of curiosity; here is further progress, and finally later progress. This is on an AMD/Linux system.
The WU finished at 100% and, when uploaded to WCG, resulted in an error. The error log indicates I processed this WU twice. How is this possible?
Result Log

Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 1--
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish

</stderr_txt>
]]>
I did reboot the system and, as I recall, the WU was around 98% to 99% complete at the time. The BOINC client is 6.2.15 for Linux. Really weird.
Edit: Stupid keyboard :)

----------------------------------------
[Edit 2 times, last edit by Former Member at Jul 18, 2009 4:48:23 AM]

[Jul 18, 2009 12:16:24 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

Hmm, I don't know about the error.
Going past 100% isn't a big deal, though; it has happened before. It's an issue related to the WU trying to reach the next point in the WU, or docking, or some such (sorry for the lame info, I've been busy lately). I don't think going past 100% caused the error; I haven't heard of it doing that.

[Jul 18, 2009 3:16:11 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

Hi ShadowJ,
Thanks for the reply. Yes, this one was weird in that my error log showed two "called boinc_finish" lines, whereas only one usually occurs. I suspect a timing glitch or something similar: the WU was ready to report at about the time I shut the system down, but when the system was restarted, the WU thought it was a new one and started over. Definitely strange compared with what I have seen in the past.

----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 18, 2009 4:49:23 AM]

[Jul 18, 2009 4:45:43 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: This WU is Quite Strange

STARBASEn,

Did you compare the log with a successful result [guess so]? If so, what differences did you find between the log entries?

Your 1st screenshot indicates all results are unique. Other than that, the picture is not telling me many words beyond the % going over 100, which happens at times since BOINC is not a very good progress timekeeper... how could it be, on non-deterministic calculations? I've seen 900% and up on jobs that reverted to 100% when wrapped up, and HCMD2 jobs run up to 4 hours and longer ;>)

As for processing twice: it was certainly not reported twice, or you'd see a different message in the BOINC log. Jobs do resume from a prior checkpoint at times, though, mostly on system/client restarts. In 99.99999% of cases a job never does that by itself.

Blaming the keyboard? Sure you can :D

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

----------------------------------------
[Edit 1 times, last edit by Sekerob at Jul 18, 2009 9:50:54 AM]

[Jul 18, 2009 7:39:54 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

STARBASEn is correct, a double boinc_finish is unusual. When a job resumes from a checkpoint, the INFO line is repeated. The only other HCMD2 message that is commonly seen is: Finishing early because max runtime has been exceeded.14430.680299
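
For anyone who wants to check their own result logs for the same pattern, here is a minimal sketch in Python. It only counts lines; the marker strings are copied from the logs posted in this thread, while the function name, the dictionary keys, and the "more than one finish call is unusual" threshold are my own assumptions, not anything defined by BOINC or WCG.

```python
# Sketch only: count the stderr lines discussed in this thread.
# Marker strings come from the posted logs; everything else is hypothetical.
FINISH_MARKER = "called boinc_finish"
FRESH_START_MARKER = "INFO: No state to restore"


def summarize_stderr(stderr_text):
    """Return simple counts for one result's <stderr_txt> contents."""
    lines = [line.strip() for line in stderr_text.splitlines()]
    finish_calls = sum(1 for line in lines if line == FINISH_MARKER)
    fresh_starts = sum(1 for line in lines if line.startswith(FRESH_START_MARKER))
    return {
        "finish_calls": finish_calls,
        "fresh_starts": fresh_starts,
        # Assumption: one finish call is the usual case, so more is flagged.
        "double_finish": finish_calls > 1,
    }


# The stderr body from the errored result posted at the top of the thread.
SAMPLE_STDERR = """\
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish
"""

if __name__ == "__main__":
    print(summarize_stderr(SAMPLE_STDERR))
```

This does not explain why the second call happened; it only makes the odd log easy to spot among many results.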

STARBASEn, how did your quorum peers fare with this work unit? If they failed, then there is a problem with the work unit. Otherwise, we can only assume your computer threw a cog somewhere.

[Jul 18, 2009 8:21:29 AM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: This WU is Quite Strange

Now I wonder. There have been a few cases of progress exceeding 100%, but I can't remember any of them being reported as ending in error. The ones I've had did not topple. Most odd, comparing with my own result logs.


[Jul 18, 2009 9:54:57 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

STARBASEn, how did your quorum peers fare with this work unit? If they failed, then there is a problem with the work unit. Otherwise, we can only assume your computer threw a cog somewhere.

Here are the quorum results:

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 2-- | 614 | Valid | 7/17/09 23:30:59 | 7/18/09 07:56:06 | 1.89 | 23.1 / 22.6
CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 1-- | 614 | Error | 7/17/09 02:18:25 | 7/17/09 23:27:32 | 3.49 | 50.6 / 0.0
CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 0-- | 614 | Valid | 7/17/09 02:12:27 | 7/17/09 06:23:09 | 2.17 | 22.1 / 22.6
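
The quorum reasoning suggested earlier in the thread (if the wingmen also failed, suspect the work unit; if they validated, suspect the local machine) can be written down as a small Python sketch. The statuses and CPU hours are taken from the results listed here; the class, the function, and the returned strings are hypothetical names made up for illustration, not part of any WCG or BOINC tool.

```python
# Sketch only: encode the "how did your quorum peers fare?" rule of thumb.
from dataclasses import dataclass


@dataclass
class QuorumResult:
    suffix: str       # trailing part of the result name, e.g. "1--"
    status: str       # "Valid" or "Error", as shown on the results page
    cpu_hours: float


def diagnose(results, my_suffix):
    """Guess where the problem lies, based only on the peers' statuses."""
    peers = [r for r in results if r.suffix != my_suffix]
    if not peers:
        return "inconclusive"
    if all(r.status == "Error" for r in peers):
        return "work unit suspect"
    if any(r.status == "Valid" for r in peers):
        return "local host suspect"
    return "inconclusive"


# Values from the three results in this quorum.
QUORUM = [
    QuorumResult("2--", "Valid", 1.89),
    QuorumResult("1--", "Error", 3.49),
    QuorumResult("0--", "Valid", 2.17),
]

if __name__ == "__main__":
    print(diagnose(QUORUM, "1--"))
```

With both wingmen valid, the sketch points at the local host, which matches the conclusion reached for this work unit.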

And here are the valid return logs from the two successful wingmen. These were not yet completed when I posted this last night:

Result Log

Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 2--
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing Platform.
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>

Result Log

Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 0--
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
]]>

It appears that the WU is fine; mine just hiccuped somewhere.
Edit: Stupid me this time :)

----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 18, 2009 5:33:39 PM]

[Jul 18, 2009 5:28:02 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

Let us know if you notice this happening again.

[Jul 18, 2009 9:00:17 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: This WU is Quite Strange

Will do and Thanks.

[Jul 18, 2009 10:14:03 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: This WU is Quite Strange

There was a report at the Berkeley forum by Aurora Borealis of a case of a job pausing at 100% and ending in error:

snippet

Result Name: CMD2_ 0017-MYH1.clustersOccur-MYH2A.clustersOccur_ 866_ 838043_ 838215_ 2--

<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish

</stderr_txt>
]]>


[Jul 20, 2009 6:20:49 PM]
