World Community Grid Forums
Thread Status: Active Total posts in this thread: 6
RTS48
Veteran Cruncher Bolivia Joined: Aug 2, 2009 Post Count: 1353
I don't know if this has been reported before, but here goes.
Because I travel and take my MacBook with me, I have to either shut down or suspend the computer (by closing the lid) when boarding an aircraft. In every case, when the OET WUs restart, all seems fine: the completion percentages are unchanged from when the shutdown or suspension occurred. Within a few minutes, however, nearly every WU has dropped back to 10% completion, even though the elapsed time remains unchanged. This can make the time to completion jump from, say, 15 minutes (for a WU that is 90% complete after 2h 30m elapsed) to over a day. While I have not studied how these suspended/resumed WUs finish, it is clear that they do not take the projected time and complete much more quickly. I have all the correct preferences set (keeping WUs in memory during suspension, etc.), so I am puzzled about what is happening. Can anyone enlighten me?
Rod Peel
Santa Cruz, Bolivia, South America
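A rough sketch of why the estimate balloons. It assumes that the remaining time is a simple linear extrapolation from elapsed time and fraction done; this is an assumption for illustration, not necessarily how the BOINC client computes its estimate. With the 2h 30m elapsed from the post above, 90% done gives roughly 17 minutes left, while dropping to 10% with the same elapsed time gives roughly 22.5 hours, the same order of magnitude as the jump described.

```python
def remaining_estimate(elapsed_s: float, fraction_done: float) -> float:
    """Naive linear extrapolation of remaining run time in seconds.

    Assumption: the rest of the work takes proportionally as long as the
    part already done. Not necessarily BOINC's actual estimator.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def fmt(seconds: float) -> str:
    """Format seconds as 'Hh MMm', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


elapsed = 2.5 * 3600  # 2h 30m elapsed, as in the post
for fraction in (0.9, 0.1):
    left = remaining_estimate(elapsed, fraction)
    print(f"{fraction:.0%} done -> about {fmt(left)} left")
```

The elapsed time is the same in both cases; only the reported fraction changes, and that alone drives the estimate from minutes to most of a day.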
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846
I am going to hazard a guess that all of the jobs are reverting to their last checkpoint, even though you are keeping all WUs in memory. I don't run any Mac machines, so it could be something entirely different.
Cheers
Sgt. Joe
*Minnesota Crunchers*
RTS48
Veteran Cruncher Bolivia Joined: Aug 2, 2009 Post Count: 1353
Sgt.Joe, thanks for taking the time to reply. Yes, that's what I thought was happening too, and I was bemoaning the project coders' lack of foresight in not providing regular checkpoints. Then, on really thinking about it, it would make no sense to checkpoint at 10% and never again afterwards. I don't want to shut down one of my machines to prove it, but I am certain that all WUs revert to 10% and that all WUs had a CPU checkpoint no more than 30 minutes old (15 minutes on average). I don't know whether the checkpoints are time-based or progress-based. The mystery continues.....
Rod Peel
Santa Cruz, Bolivia, South America
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2346
What really happens is that the WUs revert to their latest checkpoint (at a rounded 10%) but have forgotten which rounded 10% that was, whether 0% (no checkpoint reached yet) or 90% (the final checkpoint before 100%). So the elapsed time is real, but the percentage lags behind: after a restart, the WU goes back to its latest checkpoint and soon forgets the corresponding percentage, and when the WU finishes, the percentage suddenly jumps to 100%. Conclusion: I wouldn't worry too much about it; it's only the percentage that is wrong, not the time spent.
[Edit 1 times, last edit by adriverhoef at Jul 5, 2017 9:44:30 AM]
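A minimal sketch of "reverting to the latest checkpoint", assuming evenly spaced, progress-based checkpoints. The function name and the model are hypothetical: adriverhoef describes 10% steps, while SekeRob's reply in this thread says OET uses 8 checkpoints of 12.5% each, so the checkpoint count is a parameter. It shows where a task at 90% would resume and how much work it would redo.

```python
import math


def resume_point(progress: float, checkpoints: int) -> float:
    """Fraction of the work a task resumes from after a restart.

    Hypothetical model: `checkpoints` evenly spaced, progress-based
    checkpoints, and the task restarts from the last one it reached.
    """
    if checkpoints < 1:
        raise ValueError("need at least one checkpoint")
    # Index of the last checkpoint reached; the tiny epsilon guards against
    # float results just below an integer when progress sits exactly on one.
    reached = math.floor(progress * checkpoints + 1e-9)
    return reached / checkpoints


for n in (10, 8):
    start = resume_point(0.90, n)
    print(f"{n} checkpoints: resume at {start:.1%}, redo {0.90 - start:.1%}")
```

Under this model a restart costs at most one checkpoint interval of work; it does not by itself explain a display stuck at 10%, which fits the view above that only the reported percentage is wrong.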
RTS48
Veteran Cruncher Bolivia Joined: Aug 2, 2009 Post Count: 1353
Thanks adriverhoef, that makes sense. It's just that when you see a day to completion for a WU that normally takes 2 hours, you get a little concerned, not only about the time to completion but also about the crunching time lost on other WUs.
Rod Peel
Santa Cruz, Bolivia, South America
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741
It's easy enough to go into the job's slot directory and look at the stderr.txt and checkpoint log files to verify what's fact and what's fiction. OET has 8 checkpoints of 12.5% each, and for whatever reason the sum of the step percentages is rounded on Linux ('nux) platforms. The checkpoints log the CPU time in seconds, IIRC.
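For anyone who wants to check as suggested, here is a small Python sketch that prints the tail of stderr.txt in each BOINC slot directory. The default path is an assumption (the usual macOS BOINC data location); point it at your own data directory if yours differs. The checkpoint log file names aren't given in the post, so the sketch only reads stderr.txt.

```python
from pathlib import Path

# Assumption: default BOINC data directory on macOS; adjust for your install.
DEFAULT_SLOTS = Path("/Library/Application Support/BOINC Data/slots")


def tail_stderr(slots_dir: Path = DEFAULT_SLOTS, lines: int = 5) -> dict[str, list[str]]:
    """Map each slot name to the last `lines` lines of its stderr.txt.

    Slots without a stderr.txt are skipped; a missing slots directory
    yields an empty dict.
    """
    result: dict[str, list[str]] = {}
    if not slots_dir.is_dir():
        return result
    for slot in sorted(p for p in slots_dir.iterdir() if p.is_dir()):
        stderr = slot / "stderr.txt"
        if stderr.is_file():
            text = stderr.read_text(errors="replace").splitlines()
            result[slot.name] = text[-lines:]
    return result


if __name__ == "__main__":
    for name, tail in tail_stderr().items():
        print(f"--- slot {name} ---")
        print("\n".join(tail))
```

Reading the files is harmless while tasks run, but you may need elevated permissions to read the BOINC data directory.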