| World Community Grid Forums
Thread Status: Active Total posts in this thread: 99
foxfire
Advanced Cruncher United States Joined: Sep 1, 2007 Post Count: 121 Status: Offline Project Badges:
|
Suspended at checkpoint, BOINC shutdown, BOINC started. All resumed from checkpoint.
BETA_OET1_0000297_xZAGP_0784_0
BETA_OET1_0000297_xZAGP_0785_0
BETA_OET1_0000297_xZAGP_0780_0
BETA_OET1_0000297_xZAGP_0721_1
BETA_OET1_0000297_xZAGP_0727_1
BETA_OET1_0000297_xZAGP_0742_1
BETA_OET1_0000297_xZAGP_0749_1
BETA_OET1_0000298_xEBGP-FA_rig_0920_1
Seeing some that have long intervals between checkpoints:
WU; Elap; (CPU); Since Checkpoint
---------------------------------
BETA_OET1_0000298_xEBGP-FA_rig_0503_0; 00:30:28; (00:30:27); [0] 00:30:27
BETA_OET1_0000298_xEBGP-FA_rig_0474_1; 00:33:58; (00:33:48); [0] 00:33:48
BETA_OET1_0000298_xEBGP-FA_rig_0276_0; 00:28:08; (00:27:57); [0] 00:27:57
|
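For anyone who wants to tabulate rows like the ones above, here is a minimal Python sketch that parses the semicolon-separated format and flags work units whose time since the last checkpoint equals their CPU time, which would suggest no checkpoint has been written yet. The function names are made up for illustration, and the meaning of the bracketed number is not stated in the post, so it is simply carried along as-is.

```python
from datetime import timedelta

def parse_hms(text):
    """Convert an 'HH:MM:SS' string to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def parse_row(row):
    """Split one 'WU; Elap; (CPU); [n] Since Checkpoint' row into a dict."""
    name, elapsed, cpu, since = (field.strip() for field in row.split(";"))
    bracket, since_time = since.split()  # e.g. '[0]' and '00:30:27'
    return {
        "name": name,
        "elapsed": parse_hms(elapsed),
        "cpu": parse_hms(cpu.strip("()")),
        "bracket": int(bracket.strip("[]")),  # meaning not stated in the post
        "since_checkpoint": parse_hms(since_time),
    }

def never_checkpointed(rows):
    """Names of WUs whose time since checkpoint equals their CPU time."""
    parsed = [parse_row(row) for row in rows]
    return [r["name"] for r in parsed if r["since_checkpoint"] == r["cpu"]]

# The three rows quoted in the post.
ROWS = [
    "BETA_OET1_0000298_xEBGP-FA_rig_0503_0; 00:30:28; (00:30:27); [0] 00:30:27",
    "BETA_OET1_0000298_xEBGP-FA_rig_0474_1; 00:33:58; (00:33:48); [0] 00:33:48",
    "BETA_OET1_0000298_xEBGP-FA_rig_0276_0; 00:28:08; (00:27:57); [0] 00:27:57",
]

if __name__ == "__main__":
    for name in never_checkpointed(ROWS):
        print(name)
```

On the three rows quoted, every work unit is flagged, since each "Since Checkpoint" value matches the CPU time exactly.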
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I'm now starting to believe that the rigid unit is checkpointing every 10% of the way through (but not based on time). When we know just how long these WUs might run we will have a better idea if that is often enough, but my feeling is that it probably isn't (though anything is better than nothing, of course).
I'm also seeing the progress jumping in 10% increments now, but after the 5th checkpoint it was showing 60% complete, not 50%, and it is now showing 70% without another checkpoint. That suggests it is going to sit for quite a while at 100% -- or maybe it will then drop back and increment apparently more normally -- but I don't think I'll still be awake when it gets there to see.
While we old hands know that this is not a problem, it would be good if this could be improved on, as previous posts by newbies have demonstrated that this confuses them, even to the point of killing WUs in the belief that they are "stuck". Just my 2p'th. |
||
|
|
Paul Schlaffer
Senior Cruncher USA Joined: Jun 12, 2005 Post Count: 278 Status: Offline Project Badges:
|
Suspended and resumed at 70%. No % complete loss, so it must have been very close to a checkpoint. (The CEP WU fell back as expected).
Edit: After running for several minutes it fell back to 10%. Concur that the progress indicator is not incrementing smoothly like the other work-units. Batch 298.
----------------------------------------
“Where an excess of power prevails, property of no sort is duly respected. No man is safe in his opinions, his person, his faculties, or his possessions.” – James Madison (1792)
----------------------------------------[Edit 2 times, last edit by Paul Schlaffer at Jan 8, 2015 1:41:02 AM] |
||
|
|
DadX
Advanced Cruncher Joined: Sep 9, 2006 Post Count: 56 Status: Offline Project Badges:
|
Exited the app and stopped the processing between checkpoints. The WU re-started cleanly losing about the amount of time I expected.
||
|
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4894 Status: Offline Project Badges:
|
I haven't found any problems. The 298s are completing without incident in between 1.50 and 1.98 hours. Congratulations to the techs.
|
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges:
|
Slightly off-topic, but ...
I haven't had any beta WUs for many months. I may have disabled participating in beta tests before leaving home for a week away, with all or most machines crunching away unattended, and forgotten to re-enable it upon my return.
Now I can't find the beta test participation option. I can't see it in Settings >> My Profile or anywhere in Device Manager, and the other functions in Settings would not be relevant. Before the website was unimproved, access was via a sidebar option, in our member's profile I think, and there was only 1 setting to cover all devices.
Update: I just found the settings, 1 for each device, under "My Contribution". Why is there not a box for this in each Device Profile, so that it can be reached under "Settings"? Please add to website "to do" list.
And good luck with the current beta test. |
||
|
|
NixChix
Veteran Cruncher United States Joined: Apr 29, 2007 Post Count: 1187 Status: Offline Project Badges:
|
I received four batch 295 WUs; all completed while I was at work, so I was not able to test checkpointing. All four ran simultaneously with no issues.
----------------------------------------
Cheers |
||
|
|
OldChap
Veteran Cruncher UK Joined: Jun 5, 2009 Post Count: 978 Status: Offline Project Badges:
|
With LAIM (Leave Applications In Memory) off, it seemed to fall back to a checkpoint somewhere between 10% and 20%. When I next looked a few moments later, it was back to 10%.
|
||
|
|
ca05065
Senior Cruncher Joined: Dec 4, 2007 Post Count: 328 Status: Offline Project Badges:
|
I had one 00298 work unit.
After the sequence LAIM off, suspend, "removed from memory" message, resume, running, I noticed the following: Properties showed CPU time at last checkpoint as 1 hour 48 minutes, but the stderr file shows zero CPU time at restart:
Result Log
Result Name: BETA_ OET1_ 0000298_ xEBGP-FA_ rig_ 1327_ 1--
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:41:20] Number of tasks = 1
[23:41:20] Starting task 0,CPU time is 0.000000
[23:41:20] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[00:18:01] Number of tasks = 1
[00:18:01] Starting task 0,CPU time is 0.000000
[00:18:01] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[10:34:12] Number of tasks = 1
[10:34:12] Starting task 0,CPU time is 0.000000
[10:34:12] ./ZINC13130211_1.pdbqt size = 24 4 ../../projects/www.worldcommunitygrid.org/beta20.xEBGP-FA_rig.pdbqt size = 2451 0
[11:01:36] Finished task #0 cpu time used 8109.087677
11:01:36 (192716): called boinc_finish
Note that the CPU time changes from 0 to 8109 seconds in 27 minutes (1620 seconds).
|
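The arithmetic behind that last observation can be checked with a short Python sketch that pulls the timestamps out of the stderr lines quoted above. It compares the wall-clock time between the final "Starting task" line and the "Finished task" line with the reported CPU time. The function names are illustrative only, and because the log carries no dates, the sketch assumes each interval is shorter than 24 hours and treats a negative difference as a midnight rollover.

```python
import re
from datetime import datetime, timedelta

# Timestamped lines copied from the stderr excerpt in the post.
LOG = """\
[23:41:20] Starting task 0,CPU time is 0.000000
[00:18:01] Starting task 0,CPU time is 0.000000
[10:34:12] Starting task 0,CPU time is 0.000000
[11:01:36] Finished task #0 cpu time used 8109.087677
"""

STAMP = re.compile(r"^\[(\d\d:\d\d:\d\d)\]\s+(.*)$")

def wall_seconds(start, end):
    """Seconds from start to end ('HH:MM:SS'), assuming less than 24 h apart."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # assumption: the clock passed midnight once
    return int(delta.total_seconds())

def last_run_summary(log):
    """Return (wall seconds since last start, reported CPU seconds)."""
    last_start = None
    for line in log.splitlines():
        match = STAMP.match(line)
        if not match:
            continue
        stamp, message = match.groups()
        if message.startswith("Starting task"):
            last_start = stamp
        elif message.startswith("Finished task") and last_start is not None:
            cpu_seconds = float(message.rsplit(" ", 1)[1])
            return wall_seconds(last_start, stamp), cpu_seconds
    return None

if __name__ == "__main__":
    wall, cpu = last_run_summary(LOG)
    print(f"wall since last restart: {wall} s, reported CPU: {cpu:.0f} s")
```

With these lines the final run covers 1644 seconds of wall-clock time (27 minutes 24 seconds) against 8109 seconds of reported CPU time. One reading, which the log alone cannot confirm, is that the CPU time was restored from the checkpoint even though the "Starting task" line printed zero.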
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
19 of batch 295/296 on Windows, most valid, BOINCTasks showing the 'normal' number of checkpoints taken, 5-7 per task, when the log indicates there were for instance 45 jobs included. This confirms the app follows the 'Write to Disk at Most' setting properly, which is set at 1000 seconds.
Result Name: BETA_ OET1_ 0000296_ xZAGP_ 0644_ 0--
<core_client_version>7.4.27</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[23:27:37] Number of tasks = 45
[23:27:37] Starting task 0,CPU time is 0.000000
[23:27:37] ./ZINC12785727_1.pdbqt size = 29 5 ../../projects/www.worldcommunitygrid.org/beta20.xZAGP.pdbqt size = 2321 0
[08:05:25] Finished task #0 cpu time used 482.167891
...
[10:31:30] ./ZINC12788076_1.pdbqt size = 30 6 ../../projects/www.worldcommunitygrid.org/beta20.xZAGP.pdbqt size = 2321 0
[10:34:56] Finished task #44 cpu time used 205.921320
10:34:56 (9948): called boinc_finish
|
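To illustrate why a 45-job work unit can end up with only a handful of checkpoints under a 1000-second "Write to Disk at Most" setting, here is a rough Python simulation. It assumes, without confirmation from the thread, that the app only considers checkpointing at job boundaries and writes one only when at least the minimum interval has passed since the previous write. The BOINC API offers `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()` for this kind of throttling, but whether this app uses them is not stated here. The 150-second job length is a made-up figure, not taken from the log.

```python
def count_checkpoints(job_seconds, min_interval=1000.0):
    """Count checkpoints written when checked only at job boundaries.

    A checkpoint is written after a job finishes if at least
    min_interval seconds of work have accumulated since the last one.
    """
    checkpoints = 0
    since_last = 0.0
    for duration in job_seconds:
        since_last += duration
        if since_last >= min_interval:
            checkpoints += 1
            since_last = 0.0
    return checkpoints

if __name__ == "__main__":
    # Hypothetical workload: 45 jobs of 150 s each (illustrative only).
    jobs = [150.0] * 45
    print("throttled at 1000 s:", count_checkpoints(jobs))
    print("no throttling:", count_checkpoints(jobs, min_interval=0.0))
```

With these illustrative numbers, the throttled run writes 6 checkpoints instead of 45.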
||
|
|
|