World Community Grid Forums
Thread Status: Active Total posts in this thread: 13
Author |
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I don't believe I have ever seen this before but in this case a picture is worth a thousand words. I let this run just out of curiosity and here is further progress and finally later progress. This is on an AMD/Linux system.
The WU finished at 100% and when uploaded to WCG resulted in an error. The error log indicates I processed this WU twice. How is this possible?
Result Log
Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 1--
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish
</stderr_txt>
]]>
I did reboot the system and as I recall, the WU was around 98% to 99% complete. The BOINC client is 6.2.15 for Linux. Really weird.
Edit: Stupid keyboard
[Edit 2 times, last edit by Former Member at Jul 18, 2009 4:48:23 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hmm, I don't know about the error.
However, going past 100% isn't a big deal. It has happened before; it's an issue related to the WU trying to reach the next point in the WU or docking or such (sorry for the lame info, been busy lately). I don't think going past 100% caused the error; I haven't heard of it doing that. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi ShadowJ,
Thanks for the reply. Yes, this one was weird in that my error log showed two "called boinc_finish" lines, whereas usually only one occurs. I suspect a timing glitch or something similar made the WU ready to report at about the time I shut the system down, but when the system was restarted, the WU thought it was a new one and started over. Definitely strange compared with what I have seen in the past.
[Edit 1 times, last edit by Former Member at Jul 18, 2009 4:49:23 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
STARBASEn,
Did you compare the log with a successful result [guess so]? If so, what differences did you find between the log entries? Your 1st screenshot indicates all results are unique. Other than that, the picture is not telling me many words, apart from the % going over 100, which happens at times since BOINC is not a very good progress timekeeper... how could it be on non-deterministic calculations... I've seen 900% and up on jobs that reverted back to 100% when wrapped up, and HCMD2 jobs run up to 4 hours and longer ;>) Processing twice? Well, it was not reported twice for sure, else you'd see a different message in the BOINC log. Jobs do resume from a prior checkpoint at times though, mostly on system/client restarts. In 99.99999% of cases it never does that by itself. Blaming the keyboard? Sure you can
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Jul 18, 2009 9:50:54 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
STARBASEn is correct, a double boinc_finish is unusual. When a job resumes from a checkpoint, the INFO line is repeated. The only other HCMD2 message that is commonly seen is: Finishing early because max runtime has been exceeded.14430.680299
STARBASEn, how did your quorum peers fare with this work unit? If they failed, then there is a problem with the work unit. Otherwise, we can only assume your computer threw a cog somewhere. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Now I wonder. There have been a few cases of exceeding 100%, but I can't remember any of them having been reported as ending in error. The ones I've had did not topple. Most odd compared with my own result logs.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"STARBASEn, how did your quorum peers fare with this work unit? If they failed, then there is a problem with the work unit. Otherwise, we can only assume your computer threw a cog somewhere."

Here are the quorum results:

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 2-- | 614 | Valid | 7/17/09 23:30:59 | 7/18/09 07:56:06 | 1.89 | 23.1 / 22.6 |
| CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 1-- | 614 | Error | 7/17/09 02:18:25 | 7/17/09 23:27:32 | 3.49 | 50.6 / 0.0 |
| CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 0-- | 614 | Valid | 7/17/09 02:12:27 | 7/17/09 06:23:09 | 2.17 | 22.1 / 22.6 |

And here are the valid return logs from the two successful wingmen. These were not yet completed when I posted this last night:

Result Log
Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 2--
<core_client_version>6.2.15</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing Platform.
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]>

Result Log
Result Name: CMD2_ 0018-MOESA.clustersOccur-MYH2A.clustersOccur_ 451_ 701018_ 701486_ 0--
<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]>

It appears that the WU is fine; mine just hiccuped somewhere.
Edit: Stupid me this time
[Edit 1 times, last edit by Former Member at Jul 18, 2009 5:33:39 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Let us know if you notice this happening again.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Will do and Thanks.
|
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
There was a report at the Berkeley forum by Aurora Borealis of a case of pausing at 100% that ended in error:
snippet
Result Name: CMD2_ 0017-MYH1.clustersOccur-MYH2A.clustersOccur_ 866_ 838043_ 838215_ 2--
<core_client_version>6.6.31</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish
</stderr_txt>
]]>
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
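For anyone who wants to check their own result logs for the pattern discussed in this thread, here is a rough Python sketch that counts the "called boinc_finish" lines inside the <stderr_txt> section of a pasted result log. The function names and the two sample strings are made up for illustration (the samples mirror the error and valid logs posted above); this is not a BOINC or WCG tool, and a count above one is only a hint that something unusual happened, not proof of what caused the error.

```python
import re

# Marker line that the logs in this thread show once per normal run.
FINISH_MARKER = "called boinc_finish"


def extract_stderr(result_log: str) -> str:
    """Return the text between <stderr_txt> and </stderr_txt>, or '' if absent."""
    match = re.search(r"<stderr_txt>(.*?)</stderr_txt>", result_log, re.DOTALL)
    return match.group(1) if match else ""


def count_finish_calls(result_log: str) -> int:
    """Count how many times the finish marker appears in the stderr section."""
    return extract_stderr(result_log).count(FINISH_MARKER)


def looks_double_finished(result_log: str) -> bool:
    """True when the marker appears more than once (the unusual case here)."""
    return count_finish_calls(result_log) > 1


# Hypothetical sample inputs, modelled on the logs quoted in the thread.
ERROR_LOG = """<core_client_version>6.2.15</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
called boinc_finish
</stderr_txt>
]]>"""

VALID_LOG = """<core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]>"""

if __name__ == "__main__":
    for name, log in (("error result", ERROR_LOG), ("valid result", VALID_LOG)):
        print(name, count_finish_calls(log), looks_double_finished(log))
```

Paste the full Result Log text into a string and pass it to `looks_double_finished`; a log without a <stderr_txt> section simply counts as zero.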
||
|