Thread Status: Active
Total posts in this thread: 16
Posts: 16   Pages: 2   [ Previous Page | 1 2 ]
This topic has been viewed 3914 times and has 15 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: checkpointing working correctly?

pvh513, only you and your wingmen can see the Workunit Status page that your link above points to. The solution is to copy and paste the status into the forum message, as you've done for the Result Log.

I agree with your conclusion, though.
[Oct 22, 2014 10:43:32 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: checkpointing working correctly?

We are currently testing a fix for the checkpoint issue. If things go well in our alpha test environment, there should be a beta test soon. We will provide an update once we are further into alpha testing. The bug occurs when a user restarts a task from a checkpoint and the task is then stopped again before a new checkpoint is taken. That second restart then does not resume from the proper point. A temporary workaround until the updated application is in production is to enable the setting to leave applications in memory while suspended. This should minimize the impact of the bug.

Thanks,
armstrdj
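As a rough illustration of the property the fix needs to guarantee, here is a minimal Python sketch of restart-safe checkpointing. It is a general pattern, not the UGM application's actual code: the JSON checkpoint format and the names `save_checkpoint`, `load_checkpoint` and `run` are hypothetical. The checkpoint file is only ever replaced atomically with a newer position, so a second stop before the next checkpoint still resumes from the last checkpoint written rather than from an older point.

```python
import json
import os
import tempfile


def save_checkpoint(path, position):
    """Atomically record the last completed position (hypothetical JSON format)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"position": position}, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in one step: a reader sees the old or the new
    # checkpoint, never a half-written one.
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved position, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)["position"]
    except FileNotFoundError:
        return 0


def run(path, total, checkpoint_every, stop_at=None):
    """Resume from the last checkpoint and work toward `total`.

    `stop_at` simulates the task being stopped at that position without
    writing a new checkpoint. Returns (start_position, end_position).
    """
    position = load_checkpoint(path)
    start = position
    while position < total:
        if stop_at is not None and position >= stop_at:
            return start, position  # stopped; no checkpoint written here
        position += 1  # stand-in for comparing one query sequence
        if position % checkpoint_every == 0:
            save_checkpoint(path, position)
    save_checkpoint(path, position)
    return start, position
```

In the scenario armstrdj describes, a first run checkpoints at 100 and 200 and is stopped at 250; the restart resumes at 200 and is stopped again at 260, before the next checkpoint; the following restart should still resume at 200, which is what this pattern guarantees.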
[Oct 23, 2014 6:52:32 PM]
dkester788
Cruncher
USA
Joined: May 3, 2007
Post Count: 44
Status: Offline
Re: checkpointing working correctly?

WU INVALID! It looks like the logs are still being analyzed for the checkpoint issue on the "ugm" WUs. Thanks to all the WCG techs and staff for everything they do to keep this running. Only a few of my "ugm" tasks are coming up INVALID, all with the same errors.


Result Log

Result Name: ugm1_ugm1_00376_0359_0--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
Unable to open checkpoint file starting from 0
500 query sequences compared.
1000 query sequences compared.
1500 query sequences compared.
2000 query sequences compared.
2500 query sequences compared.
3000 query sequences compared.
3500 query sequences compared.
4000 query sequences compared.
4500 query sequences compared.
5000 query sequences compared.
5500 query sequences compared.
6000 query sequences compared.
6500 query sequences compared.
7000 query sequences compared.
7500 query sequences compared.
8000 query sequences compared.
8500 query sequences compared.
9000 query sequences compared.
9500 query sequences compared.
10000 query sequences compared.
10500 query sequences compared.
11000 query sequences compared.
11500 query sequences compared.
12000 query sequences compared.
12500 query sequences compared.
13000 query sequences compared.
13500 query sequences compared.
14000 query sequences compared.
14500 query sequences compared.
15000 query sequences compared.
15500 query sequences compared.
16000 query sequences compared.
16500 query sequences compared.
17000 query sequences compared.
17500 query sequences compared.
18000 query sequences compared.
18500 query sequences compared.
19000 query sequences compared.
19500 query sequences compared.
20000 query sequences compared.
20500 query sequences compared.
21000 query sequences compared.
21500 query sequences compared.
22000 query sequences compared.
22500 query sequences compared.
23000 query sequences compared.
23500 query sequences compared.
24000 query sequences compared.
24500 query sequences compared.
25000 query sequences compared.
25500 query sequences compared.
26000 query sequences compared.
26500 query sequences compared.
Checkpoint restored: 25987
26000 query sequences compared.
26500 query sequences compared.
Checkpoint restored: 24414
24500 query sequences compared.
Checkpoint restored: 24414
24500 query sequences compared.
25000 query sequences compared.
25500 query sequences compared.
26000 query sequences compared.
26500 query sequences compared.
27000 query sequences compared.
27500 query sequences compared.
28000 query sequences compared.
28500 query sequences compared.
29000 query sequences compared.
29500 query sequences compared.
30000 query sequences compared.
30500 query sequences compared.
31000 query sequences compared.
31500 query sequences compared.
32000 query sequences compared.
32500 query sequences compared.
33000 query sequences compared.
33500 query sequences compared.
34000 query sequences compared.
34500 query sequences compared.
35000 query sequences compared.
35500 query sequences compared.
36000 query sequences compared.
36500 query sequences compared.
Run complete, CPU time: 15107.869665
15:53:52 (2700): called boinc_finish

</stderr_txt>
]]>
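The backward jump is visible in the Result Log above: a restore at 25987 is followed, after a further stop, by a restore at 24414. Below is a minimal Python sketch for scanning a pasted stderr log for that pattern. It relies only on the two line formats shown in the log, and its heuristic (flag any restore that lands earlier than the previous restore) is an assumption for illustration, not WCG's validation rule; the function name is hypothetical.

```python
import re

RESTORE = re.compile(r"Checkpoint restored: (\d+)")
PROGRESS = re.compile(r"(\d+) query sequences compared\.")


def find_checkpoint_regressions(stderr_text):
    """Return (restored_at, furthest_progress) for each restore that lands
    earlier than the restore before it in the same log."""
    regressions = []
    furthest = 0          # highest "query sequences compared" count seen so far
    last_restore = None   # position of the previous restore, if any
    for line in stderr_text.splitlines():
        m = PROGRESS.search(line)
        if m:
            furthest = max(furthest, int(m.group(1)))
            continue
        m = RESTORE.search(line)
        if m:
            restored = int(m.group(1))
            if last_restore is not None and restored < last_restore:
                regressions.append((restored, furthest))
            last_restore = restored
    return regressions
```

Run against the log above, it should report one regression, the restore at 24414 after progress had already reached 26500; the repeated restore at 24414 is not flagged because it does not go further back.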



Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
ugm1_ugm1_00382_1493_0-- | Dave-PC | Invalid | 10/20/14 21:11:54 | 10/23/14 23:20:55 | 2.46 / 2.46 | 68.0 / 31.4
ugm1_ugm1_00382_0223_0-- | Dave-PC | Invalid | 10/20/14 21:11:54 | 10/23/14 23:20:55 | 2.48 / 2.49 | 68.8 / 31.5
ugm1_ugm1_00376_0499_0-- | Dave-PC | Invalid | 10/20/14 19:18:15 | 10/23/14 22:11:29 | 4.16 / 4.16 | 117.4 / 61.3
ugm1_ugm1_00376_0130_1-- | Dave-PC | Invalid | 10/20/14 19:18:15 | 10/23/14 22:11:29 | 4.15 / 4.15 | 117.2 / 68.0
ugm1_ugm1_00376_0359_0-- | Dave-PC | Invalid | 10/20/14 19:18:15 | 10/23/14 22:11:29 | 4.20 / 4.20 | 118.5 / 64.5
----------------------------------------
[Oct 24, 2014 6:53:29 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: checkpointing working correctly?

Alpha testing for the checkpointing fix is going well and beta testing should begin soon.

Thanks,
armstrdj
[Oct 29, 2014 2:19:41 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: checkpointing working correctly?

For those who have not seen it, the beta test is running well, and several users have reported that checkpointing is behaving properly. Barring any new issues, this should be promoted to production soon. Thanks to the beta testers for their thorough job testing this issue.

Thanks,
armstrdj
[Oct 31, 2014 3:41:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: checkpointing working correctly?

Thanks for that. From reading the beta thread, the quasi-continuous writing to storage every few seconds is still an open item on the list of issues to resolve. Efficiency still degrades whenever more than 4 of my 8 threads are allowed to run ugm, so I have 'hard' coded ugm through app_config.xml never to run on more than 4 threads, even when the feeder has queued enough ugm tasks to occupy more.
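For anyone wanting to apply the same cap: a BOINC app_config.xml limits one application with an `<app>` entry containing `<name>` and `<max_concurrent>`. The sketch below builds that file with Python's standard `xml.etree.ElementTree`. The app short name `ugm1` is an assumption taken from the result names (check the `<app>` names in your client_state.xml), the helper name is hypothetical, and it assumes each ugm task uses one thread, so 4 concurrent tasks means 4 threads.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Build a minimal app_config.xml capping how many tasks of one app run at once."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    # encoding="unicode" returns a str with no XML declaration
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "ugm1" is an assumed app short name, not confirmed by this thread.
    print(build_app_config("ugm1", 4))
```

The output is saved as app_config.xml in the World Community Grid project directory under the BOINC data folder; the client picks it up after a restart, or via Options > Read config files in recent BOINC Manager versions.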
[Oct 31, 2014 4:02:01 PM]