World Community Grid Forums
Thread Status: Active Total posts in this thread: 4
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
My machines were hit by several power outages this morning. When one restarted, I noticed that WU ZIKA_000446370_x5i3q_DENV3_NS5pol_s2_0399_0 seemed to have over 8 hours of runtime, with a run-rate of over 10%/hour, yet only a tiny amount of CPU time. Now that it has finished, I took a look at the result log. Here are some lines that may be relevant:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<stderr_txt>
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[00:40:31] Number of tasks = 48
[00:40:31] Running task 0,CPU time at start of task 0 was 0.000000
[00:40:32] ./ZINC000032550943.pdbqt size = 21 4 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[00:53:03] Finished task #0 cpu time used 658.953125
[00:53:03] Running task 1,CPU time at start of task 1 was 658.953125
...
[09:23:22] Finished task #46 cpu time used 544.312500
[09:23:22] Running task 47,CPU time at start of task 47 was 28643.108125
[09:23:22] ./ZINC000032565314.pdbqt size = 19 3 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[09:33:03] Finished task #47 cpu time used 538.921875
INFO: result number = 0
INFO: No state to restore. Start from the beginning.
[09:42:13] Number of tasks = 48
[09:42:13] Running task 0,CPU time at start of task 0 was 0.000000
[09:42:13] ./ZINC000032550943.pdbqt size = 21 4 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
INFO: result number = 0
[09:48:18] Number of tasks = 48
[09:48:18] Running task 0,CPU time at start of task 0 was 0.000000
[09:48:18] ./ZINC000032550943.pdbqt size = 21 4 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[09:56:34] Finished task #0 cpu time used 29805.596875
[09:56:34] Running task 1,CPU time at start of task 1 was 29805.596875
[09:56:34] ./ZINC000032550963.pdbqt size = 22 5 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[10:09:48] Finished task #1 cpu time used 745.109375
[10:09:48] Running task 2,CPU time at start of task 2 was 30550.706250
[10:09:48] ./ZINC000032550977.pdbqt size = 20 2 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[10:19:19] Finished task #2 cpu time used 520.812500
[10:19:19] Running task 3,CPU time at start of task 3 was 31071.518750
[10:19:19] ./ZINC000032550992.pdbqt size = 22 2 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[10:28:49] Finished task #3 cpu time used 535.265625
[10:28:49] Running task 4,CPU time at start of task 4 was 31606.784375
[10:28:49] ./ZINC000032551035.pdbqt size = 20 2 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[10:36:21] Finished task #4 cpu time used 417.562500
[10:36:21] Running task 5,CPU time at start of task 5 was 32024.346875
[10:36:21] ./ZINC000032551039.pdbqt size = 21 4 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
INFO: result number = 0
[10:52:04] Number of tasks = 48
[10:52:04] Running task 5,CPU time at start of task 5 was 32024.346875
[10:52:04] ./ZINC000032551039.pdbqt size = 21 4 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[11:00:00] Finished task #5 cpu time used 722.706250
[11:00:00] Running task 6,CPU time at start of task 6 was 32747.053125
...
[18:36:23] Finished task #46 cpu time used 554.468750
[18:36:23] Running task 47,CPU time at start of task 47 was 58049.053125
[18:36:23] ./ZINC000032565314.pdbqt size = 19 3 ../../projects/www.worldcommunitygrid.org/zika.x5i3q_DENV3_NS5pol_s2.pdbqt size = 6486 0
[18:46:18] Finished task #47 cpu time used 549.578125
18:46:19 (2820): called boinc_finish(0)

It looks like the power went off three times. The second and third times it restarted in the manner I would expect, but something strange happened the first time -- this is when it got reset back to the start, but was still recording time held over from the first time around. It looks like the first power outage was just as the last task finished. It seems that the checkpoint file was cleared but, upon restart, nothing noticed any output file that should have been sent.
So it seems that there might be some sort of timing window, during which the WU is in a non-recoverable state, that got hit here. I guess the project is too near to ending for it to be worth spending much time looking at this, but I wondered if it might provide any lessons for other projects. |
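For anyone who wants to check the arithmetic in the log, here is a minimal Python sketch. It uses only figures quoted in the post and assumes that "CPU time at start of task N" is a running total, which the quoted lines appear to support. The variable and function names are made up for illustration. It suggests that the rerun started from the first pass's total of about 29182 s rather than from zero, and that the final total is roughly double that.

```python
from decimal import Decimal

# (task, CPU time at start, CPU time used), copied from the quoted log
# lines of the second pass. Decimal avoids float rounding noise.
SECOND_PASS = [
    (1, Decimal("29805.596875"), Decimal("745.109375")),
    (2, Decimal("30550.706250"), Decimal("520.812500")),
    (3, Decimal("31071.518750"), Decimal("535.265625")),
    (4, Decimal("31606.784375"), Decimal("417.562500")),
    (5, Decimal("32024.346875"), Decimal("722.706250")),
]


def chain_is_consistent(entries):
    """True if every task starts at the previous start plus CPU used."""
    return all(
        start + used == next_start
        for (_, start, used), (_, next_start, _) in zip(entries, entries[1:])
    )


# Running total when task 47 finished, first pass and second pass.
first_pass_total = Decimal("28643.108125") + Decimal("538.921875")
final_total = Decimal("58049.053125") + Decimal("549.578125")

# What the rerun of task 0 added on top of the first-pass total
# (an interpretation of the "29805.596875" figure, not a confirmed fact).
task0_rerun = Decimal("29805.596875") - first_pass_total

print(chain_is_consistent(SECOND_PASS))
print(first_pass_total, task0_rerun, final_total)
print(round(final_total / first_pass_total, 3))
```

The roughly 624 s attributed to the rerun of task 0 is in line with the 400 to 750 s that other tasks in the log took, which is why the carried-over total looks like the most likely explanation.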
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7668 Status: Offline |
Was it valid or invalid?
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline |
The WCG checkpoint files and the BOINC task state file are different files.
The BOINC task state was probably written correctly, but the WCG checkpoint files were being written at the time and got corrupted during the power outage. |
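One common way to keep a checkpoint from being half-written when the power fails is to write it to a temporary file and then rename it over the old one. The Python sketch below shows that general technique only; it is not the WCG or BOINC implementation, and the function names and JSON format are assumptions made for illustration.

```python
import json
import os
import tempfile


def write_checkpoint_atomically(path, state):
    """Write state so a crash leaves either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # replaces the old file in one step (atomic on POSIX)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path):
    """Load a checkpoint previously written by write_checkpoint_atomically."""
    with open(path) as f:
        return json.load(f)
```

A fuller version would probably also fsync the containing directory after the rename. That step is left out here to keep the sketch short.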
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sgt.Joe: It went valid. It recorded double time and was awarded double points. I'm not complaining about that. I just think it's a waste of resources to run it twice.
Crystal Pellet: I don't disagree with your logic, not that I know any details. I just think that window could be closed with a bit more logic. It's the sort of problem that may not have been (fully) considered when the code was written. It might not be worth changing anything, but I wanted to bring it to people's attention. |
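As a rough illustration of what "a bit more logic" might look like, the Python sketch below orders the final steps so that the checkpoint is removed only after the output and a completion marker are safely on disk, and checks for that marker first on restart. Everything in it is hypothetical: the file names, the `crash_after` test hook, and the function names are assumptions, and nothing here reflects how the actual WCG application or the BOINC client behaves.

```python
import os
import tempfile

# Hypothetical file names, chosen only for this sketch.
OUTPUT = "result.out"
DONE_MARKER = "result.done"
CHECKPOINT = "checkpoint.json"


def _write_durably(path, data):
    """Write bytes and fsync so they survive a power cut."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def finish_work_unit(workdir, output_bytes, crash_after=None):
    """Finish in an order that is safe to interrupt between any two steps.

    crash_after ("output" or "marker") simulates a power cut for testing.
    """
    _write_durably(os.path.join(workdir, OUTPUT), output_bytes)
    if crash_after == "output":
        return
    _write_durably(os.path.join(workdir, DONE_MARKER), b"ok")
    if crash_after == "marker":
        return
    # Only now is it safe to drop the checkpoint.
    checkpoint = os.path.join(workdir, CHECKPOINT)
    if os.path.exists(checkpoint):
        os.remove(checkpoint)


def decide_on_restart(workdir):
    """Pick the restart action; the completion marker takes priority."""
    if os.path.exists(os.path.join(workdir, DONE_MARKER)):
        return "report_result"
    if os.path.exists(os.path.join(workdir, CHECKPOINT)):
        return "resume_from_checkpoint"
    return "start_from_beginning"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        open(os.path.join(workdir, CHECKPOINT), "w").close()
        finish_work_unit(workdir, b"docking results", crash_after="marker")
        print(decide_on_restart(workdir))
```

With this ordering, an interruption at any point leaves either a usable checkpoint or a completion marker, so a restart never has to fall back to "Start from the beginning" for work that is already done.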
||
|