World Community Grid Forums
Thread Status: Active
Total posts in this thread: 11
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I happened to notice that one of my validated WUs was awarded precisely the points that it claimed. I thought this was unusual, if not odd, so I looked a little deeper. The wingman had claimed 0 (no) points. Again, odd. So I looked at the result logs. The wingman's log is shown here:
----------------------------------------
Result Log
Result Name: ARP1_0012607_002_0
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
[20:09:20] INFO: Checkpoint taken at 2018-07-05_18:00:00
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
[01:05:52] INFO: Checkpoint taken at 2018-07-06_00:00:00
[02:40:23] INFO: Checkpoint taken at 2018-07-06_06:00:00
[03:57:56] INFO: Checkpoint taken at 2018-07-06_12:00:00
[05:13:41] INFO: Checkpoint taken at 2018-07-06_18:00:00
INFO: Initializing
Starting WRFMain
[17:25:52] INFO: Checkpoint taken at 2018-07-07_00:00:00
INFO: Simulation complete compressing output.
17:29:06 (19023): called boinc_finish(0)
</stderr_txt>
]]>
You will see that the WU restarted several times, apparently without incident. But you might also notice that there is no checkpoint recorded for 2018-07-05_12:00:00. Surely something is wrong somewhere?
[Edit 1 times, last edit by Former Member at Jan 31, 2020 5:41:19 PM]
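For anyone who wants to check their own result logs for the same gap, here is a minimal Python sketch. It assumes checkpoints are meant to fall on a fixed 6-hour simulated interval, which is only inferred from the spacing of the checkpoint lines in this log, not from any project documentation; the function names are hypothetical. Note that it can only tell you a line is absent from the log, not that the checkpoint itself was never written.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00"
CHECKPOINT_RE = re.compile(r"Checkpoint taken at (\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})")
STAMP_FORMAT = "%Y-%m-%d_%H:%M:%S"


def checkpoint_times(log_text: str) -> list[datetime]:
    """Return the simulated time of every checkpoint line found in the log."""
    return [datetime.strptime(s, STAMP_FORMAT) for s in CHECKPOINT_RE.findall(log_text)]


def missing_checkpoints(log_text: str, step: timedelta = timedelta(hours=6)) -> list[datetime]:
    """List expected checkpoint times (first..last, every `step`) with no log line.

    ASSUMPTION: checkpoints fall on a fixed interval; 6 hours is inferred from this log.
    """
    seen = checkpoint_times(log_text)
    if not seen:
        return []
    seen_set = set(seen)
    missing = []
    t, end = min(seen), max(seen)
    while t <= end:
        if t not in seen_set:
            missing.append(t)
        t += step
    return missing
```

Fed the seven checkpoint lines from the log above, `missing_checkpoints` should report only 2018-07-05 12:00:00.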
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
I wonder if it's the same wingman as the 7.6.31 person in this post, who also claimed 0 points. I didn't get a chance to check whether that log was missing a checkpoint too.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
It's a pity that not much thought goes into the 'presentation' of the user-accessible result logs. I think giving every entry a local timestamp would let users correlate it more easily with their client event log and diagnose the 'why', particularly if they've enabled more verbose logging in their cc_config.xml.
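A minimal sketch of what that suggestion could look like, written in Python purely for illustration: the real science application is not written this way, and `log_line` is a hypothetical helper, not part of BOINC or WCG. In the log above only the checkpoint lines carry a `[hh:mm:ss]` prefix; here every entry gets a full local date and time, so it can be lined up against the BOINC client's event log, and each entry is flushed as soon as it is written.

```python
import sys
from datetime import datetime
from typing import TextIO


def log_line(message: str, stream: TextIO = sys.stderr) -> str:
    """Write one log entry prefixed with the local wall-clock time, then flush.

    Hypothetical helper for illustration; not an existing WCG/BOINC function.
    """
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")  # local time, no zone
    entry = f"[{stamp}] {message}"
    stream.write(entry + "\n")
    stream.flush()  # push the entry out of the in-memory buffer right away
    return entry
```

Calling `log_line("INFO: Initializing")` would produce something like `[2020-01-31 17:41:19] INFO: Initializing` on stderr.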
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline
Did you notice that the timestamps refer to 2018, i.e. before the project's release?
----------------------------------------
The timestamps are very strange.
Cheers, Yves
hchc
Veteran Cruncher USA Joined: Aug 15, 2006 Post Count: 865 Status: Offline
KerSamson said:
----------------------------------------
Did you notice, that the time stamps refer to 2018, i.e. prior project release ? The time stamps are very strange. Cheers, Yves
The 002 generation work units simulate the weather at the following 8 timestamps, using weather data from the past:
<core_client_version>7.14.2</core_client_version>
Not odd at all. What's very odd in your logs is that it's indeed missing the checkpoint at 2018-07-05_12:00:00! It'd be nice for a WCG staffer to look at that, as I'd hate for there to be data corruption.
KerSamson
Master Cruncher Switzerland Joined: Jan 29, 2007 Post Count: 1684 Status: Offline
OK
----------------------------------------
My mistake
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
What's very odd from your logs is it's indeed missing the checkpoint at 2018-07-05_12:00:00! It'd be nice for a WCG staffer to look at that, as I'd hate for there to be data corruption.
Yup, that was kinda the point of posting it in the first place...
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
The stderr output you see here is a log of what happened up to that point. However, if stderr is not flushed and, say, the power cord is pulled, the text never actually reaches the file; it exists only in memory, and that is lost. This does not mean that the actual checkpoint wasn't saved. We do a binary compare between the two results that are returned, so they have to match exactly for a result to be considered valid. As for the 0 points claimed, I'm not sure offhand why that would be. But the checkpoint data is not missing; it's just a missing line in the log.
Thanks,
-Uplinger
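Both of the points above can be illustrated with a small Python sketch. It assumes nothing about how the actual WCG application or validator is written; `unflushed_text_is_invisible` and `results_match` are hypothetical names. The first function shows that text written to an ordinary buffered file is still only in the writer's memory (and so would be lost on a power cut) until `flush()` is called, with `os.fsync()` additionally asking the operating system to commit it to disk. The second is a plain byte-for-byte comparison of two result files, the kind of strict check under which only identical results validate.

```python
import os
import tempfile


def unflushed_text_is_invisible() -> tuple[str, str]:
    """Return what a second reader sees before and after the writer flushes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # An ordinary buffered text file: writes collect in memory first.
        with open(path, "w") as log:
            log.write("INFO: Checkpoint taken\n")
            with open(path) as reader:
                before = reader.read()  # nothing has reached the file yet
            log.flush()  # hand the buffered text to the operating system
            os.fsync(log.fileno())  # and ask the OS to commit it to disk
            with open(path) as reader:
                after = reader.read()
        return before, after
    finally:
        os.remove(path)


def results_match(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> bool:
    """Hypothetical validator sketch: True only if both files are byte-identical."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False  # differing bytes, or one file is longer
            if not block_a:
                return True  # both files ended at the same point
```

The flush demonstration is only an analogy for the stderr behaviour described above; a real checkpoint is a separate file written by the application.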
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Thanks for the info, Keith. I find your logic reasonable, but I hope you can see why I wasn't the only one who found it (shall I say) disturbing.
I can't add anything on the potential zero-points issue, but as long as you guys know about it, I'm happy to have done my bit. Thanks again.
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12594 Status: Offline
I have seen the zero-points issue several times but have not investigated it. The zero-point claimant is always given the same points as his/her wingman. Maybe it is the "invisible" line that causes the zero points!
Mike