Thread Status: Active
Total posts in this thread: 11
This topic has been viewed 3362 times and has 10 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Missing Checkpoint? [Resolved]

I happened to notice that one of my validated WUs was awarded precisely the points that it claimed. I thought this was unusual, if not odd, so I looked a little deeper. The wingman claimed 0 (no) points. Again, odd. So I looked at the result logs. The wingman's log is shown here:

Result Log

Result Name: ARP1_ 0012607_ 002_ 0--
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
[20:09:20] INFO: Checkpoint taken at 2018-07-05_18:00:00
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
INFO: Initializing
Starting WRFMain
[01:05:52] INFO: Checkpoint taken at 2018-07-06_00:00:00
[02:40:23] INFO: Checkpoint taken at 2018-07-06_06:00:00
[03:57:56] INFO: Checkpoint taken at 2018-07-06_12:00:00
[05:13:41] INFO: Checkpoint taken at 2018-07-06_18:00:00
INFO: Initializing
Starting WRFMain
[17:25:52] INFO: Checkpoint taken at 2018-07-07_00:00:00
INFO: Simulation complete compressing output.
17:29:06 (19023): called boinc_finish(0)

</stderr_txt>
]]>

You will see that the WU restarted several times, apparently without incident. But you might also notice that there is no checkpoint recorded for 2018-07-05_12:00:00.
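For anyone who wants to run the same check on their own result logs, here is a minimal Python sketch. It assumes the checkpoints are evenly spaced in simulated time and takes the smallest gap found in the log as the true spacing; the function names are made up for illustration and are not part of BOINC or the ARP application.

```python
import re
from datetime import datetime

# Matches the simulated-time part of lines such as
# "[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00".
CHECKPOINT_RE = re.compile(r"Checkpoint taken at (\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2})")
FMT = "%Y-%m-%d_%H:%M:%S"


def checkpoint_times(log_text):
    """Return the simulated timestamps of every checkpoint line, sorted and de-duplicated."""
    found = {datetime.strptime(s, FMT) for s in CHECKPOINT_RE.findall(log_text)}
    return sorted(found)


def missing_checkpoints(times):
    """List the timestamps skipped between consecutive checkpoints.

    Assumption: checkpoints are evenly spaced, and the smallest gap seen
    in the log is the true spacing.
    """
    if len(times) < 2:
        return []
    step = min(b - a for a, b in zip(times, times[1:]))
    missing = []
    for a, b in zip(times, times[1:]):
        t = a + step
        while t < b:
            missing.append(t)
            t += step
    return missing


# Checkpoint lines copied from the wingman's log above (restart lines omitted).
WINGMAN_LOG = """\
[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00
[20:09:20] INFO: Checkpoint taken at 2018-07-05_18:00:00
[01:05:52] INFO: Checkpoint taken at 2018-07-06_00:00:00
[02:40:23] INFO: Checkpoint taken at 2018-07-06_06:00:00
[03:57:56] INFO: Checkpoint taken at 2018-07-06_12:00:00
[05:13:41] INFO: Checkpoint taken at 2018-07-06_18:00:00
[17:25:52] INFO: Checkpoint taken at 2018-07-07_00:00:00
"""

for t in missing_checkpoints(checkpoint_times(WINGMAN_LOG)):
    print("no checkpoint line for", t.strftime(FMT))
```

Note that this only reports lines absent from the log text; by itself it says nothing about whether the checkpoint was actually written.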

Surely something is wrong somewhere?
----------------------------------------
[Edited 1 time, last edit by Former Member at Jan 31, 2020 5:41:19 PM]
[Jan 27, 2020 12:59:03 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Missing Checkpoint?

Wonder if it's the same wingman as the 7.6.31 person in this post. Also claimed 0 points. I didn't get a chance to see if it was missing a checkpoint.
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jan 27, 2020 7:38:26 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Missing Checkpoint?

It's a pity that more thought hasn't gone into the 'presentation' of the user-accessible result logs. I think giving every entry a local timestamp would let users correlate it with their client event log more easily and diagnose the 'why', particularly if they've set more verbose logging in their cc_config.xml.
[Jan 27, 2020 11:33:21 AM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Missing Checkpoint?

Did you notice that the timestamps refer to 2018, i.e. before the project's release?
The timestamps are very strange.
Cheers,
Yves
----------------------------------------
[Jan 27, 2020 4:44:16 PM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 865
Status: Offline
Re: Missing Checkpoint?

KerSamson said:
Did you notice that the timestamps refer to 2018, i.e. before the project's release?
The timestamps are very strange.
Cheers,
Yves

The 002 generation work units simulate the weather at the following 8 timestamps, using weather data from the past:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
INFO: Initializing
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[22:00:34] INFO: Checkpoint taken at 2018-07-05_06:00:00
[00:40:11] INFO: Checkpoint taken at 2018-07-05_12:00:00
[02:38:58] INFO: Checkpoint taken at 2018-07-05_18:00:00
[03:59:21] INFO: Checkpoint taken at 2018-07-06_00:00:00
[05:55:26] INFO: Checkpoint taken at 2018-07-06_06:00:00
[08:20:44] INFO: Checkpoint taken at 2018-07-06_12:00:00
[10:14:17] INFO: Checkpoint taken at 2018-07-06_18:00:00
[11:35:29] INFO: Checkpoint taken at 2018-07-07_00:00:00
INFO: Simulation complete compressing output.
11:37:09 (10116): called boinc_finish(0)

</stderr_txt>
]]>

So the 2018 dates are not odd at all. What's very odd from your logs is it's indeed missing the checkpoint at 2018-07-05_12:00:00! It'd be nice for a WCG staffer to look at that, as I'd hate for there to be data corruption.
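The comparison between the two logs can also be done mechanically. This is only a rough Python sketch: the helper names are hypothetical, and it looks at the "Checkpoint taken at" lines in stderr and nothing else.

```python
import re

# Captures the simulated-time label that follows "Checkpoint taken at".
CHECKPOINT_RE = re.compile(r"Checkpoint taken at (\S+)")


def checkpoint_labels(log_text):
    """Return the set of checkpoint labels that appear in a stderr log."""
    return set(CHECKPOINT_RE.findall(log_text))


def missing_from(reference_log, other_log):
    """Labels present in reference_log but absent from other_log, sorted."""
    return sorted(checkpoint_labels(reference_log) - checkpoint_labels(other_log))


# Checkpoint lines from the 7.14.2 log in this post.
COMPLETE_LOG = """\
[22:00:34] INFO: Checkpoint taken at 2018-07-05_06:00:00
[00:40:11] INFO: Checkpoint taken at 2018-07-05_12:00:00
[02:38:58] INFO: Checkpoint taken at 2018-07-05_18:00:00
[03:59:21] INFO: Checkpoint taken at 2018-07-06_00:00:00
[05:55:26] INFO: Checkpoint taken at 2018-07-06_06:00:00
[08:20:44] INFO: Checkpoint taken at 2018-07-06_12:00:00
[10:14:17] INFO: Checkpoint taken at 2018-07-06_18:00:00
[11:35:29] INFO: Checkpoint taken at 2018-07-07_00:00:00
"""

# Checkpoint lines from the 7.6.31 log in the first post.
PARTIAL_LOG = """\
[10:46:49] INFO: Checkpoint taken at 2018-07-05_06:00:00
[20:09:20] INFO: Checkpoint taken at 2018-07-05_18:00:00
[01:05:52] INFO: Checkpoint taken at 2018-07-06_00:00:00
[02:40:23] INFO: Checkpoint taken at 2018-07-06_06:00:00
[03:57:56] INFO: Checkpoint taken at 2018-07-06_12:00:00
[05:13:41] INFO: Checkpoint taken at 2018-07-06_18:00:00
[17:25:52] INFO: Checkpoint taken at 2018-07-07_00:00:00
"""

print(missing_from(COMPLETE_LOG, PARTIAL_LOG))
```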
----------------------------------------
  • i5-7500 (Kaby Lake, 4C/4T) @ 3.4 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • i5-3570 (Ivy Bridge, 4C/4T) @ 3.4 GHz

[Jan 27, 2020 5:42:08 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1684
Status: Offline
Re: Missing Checkpoint?

OK
My mistake (rolling eyes)
----------------------------------------
[Jan 27, 2020 11:28:36 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Missing Checkpoint?

hchc said:
What's very odd from your logs is it's indeed missing the checkpoint at 2018-07-05_12:00:00! It'd be nice for a WCG staffer to look at that, as I'd hate for there to be data corruption.

Yup, that was kinda the point of posting it in the first place ...
[Jan 28, 2020 12:44:37 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Status: Offline
Re: Missing Checkpoint?

The stderr output you see here is a log of what had happened up to that point. However, if stderr has not been flushed and, say, the power cord is pulled, the text never reaches the file on disk; it was only held in memory, and that is lost. This does not mean that the actual checkpoint wasn't saved. We do a binary compare between the two results that are returned, so they have to match exactly for a result to be considered valid. As for the 0 points claimed, I'm not sure offhand why that would be. But the checkpoint data is not missing; only a line in the log is.
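To illustrate the buffering effect in general terms, here is a small Python sketch. It is not how the science application or the BOINC client writes stderr (that is an assumption-free statement only about this demo): it just shows that text written to a buffered file is not visible in the file until it is flushed, so a process that died in between would leave no trace of that line. The function name is made up for the example.

```python
import os
import tempfile


def buffered_write_demo(line):
    """Write one line to a log file; return the file's contents before and after flush()."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        log = open(path, "w")  # regular files get block buffering by default
        try:
            log.write(line)  # the text sits in an in-memory buffer
            with open(path) as reader:
                before_flush = reader.read()  # what a crash at this moment would leave behind
            log.flush()  # hand the buffered text to the operating system
            with open(path) as reader:
                after_flush = reader.read()
        finally:
            log.close()
    finally:
        os.remove(path)
    return before_flush, after_flush


before, after = buffered_write_demo("INFO: Checkpoint taken at 2018-07-05_12:00:00\n")
print("before flush:", repr(before))
print("after flush: ", repr(after))
```

Surviving an actual power cut would additionally need the operating system to push its own cache to disk (for example with os.fsync), which this sketch does not attempt.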

Thanks,
-Uplinger
[Jan 31, 2020 3:50:24 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Missing Checkpoint?

Thanks for the info, Keith. I find your logic reasonable, but I hope you can see why I wasn't the only one who found it (shall I say) disturbing.

I can't add anything to the potential zero points issue, but as long as you guys know about it I'm happy to have done my bit.

Thanks again.
[Jan 31, 2020 5:09:38 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12594
Status: Offline
Re: Missing Checkpoint?

I have seen the zero points issue several times but have not investigated it. The zero pointer is always given the same points as his/her wingman. Maybe it is the "invisible" line which causes the zero points!

Mike
[Feb 1, 2020 3:19:09 PM]