| World Community Grid Forums
Thread Status: Active Total posts in this thread: 4
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I started za071_00581 on the 28th (at 10:05 local, if the filestamps are to be believed). It had clocked up quite a large time and was sitting at 80%, doing nothing much, when I checked last night (I don't usually babysit my work units, but this one had been running for 2 or 3 days). I decided to give it overnight to see if it made any further progress. Imagine my surprise, then, when I saw today that it reports only 6:55 hours of CPU time. This was not what it said last night. It is still using 90+% of my CPU, and should be reporting about 74 hours.
----------------------------------------
Here is the stderr:

[11:35:50] [INFO] Checkpoint complete
[12:05:34] [INFO] Checkpoint complete
[12:35:40] [INFO] Checkpoint complete
[12:59:08] [INFO] Checkpoint complete
[13:26:03] [INFO] Checkpoint complete
[13:50:07] [INFO] Checkpoint complete
[14:12:58] [INFO] Checkpoint complete
[14:26:06] [INFO] Checkpoint complete
[14:53:30] [INFO] Checkpoint complete
[15:12:55] [INFO] Checkpoint complete
[15:24:43] [INFO] Checkpoint complete
[15:44:03] [INFO] Checkpoint complete
[16:05:59] [INFO] Checkpoint complete
[16:17:34] [INFO] Checkpoint complete
[16:39:11] [INFO] Checkpoint complete
[17:00:46] [INFO] Checkpoint complete
No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting

And the messages from when it died:

29/07/2006 00:42:58|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
29/07/2006 00:42:58|World Community Grid|If this happens repeatedly you may need to reset the project.
29/07/2006 00:42:58|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507
31/07/2006 05:41:56|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
31/07/2006 05:41:56|World Community Grid|If this happens repeatedly you may need to reset the project.
31/07/2006 05:41:56|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507
31/07/2006 11:11:52|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
31/07/2006 11:11:52|World Community Grid|If this happens repeatedly you may need to reset the project.
31/07/2006 11:11:52|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507

I believe I will kill it now.

[Edit 1 times, last edit by Former Member at Jul 31, 2006 12:06:25 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
What do you think caused the odd behaviour? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Well, it's a beta workunit so trouble was to be expected. I'm posting about it here in the hope that someone can explain it.
|
||
|
|
BobCat13
Senior Cruncher Joined: Oct 29, 2005 Post Count: 295 Status: Offline
|
You had a stuck WU. The last checkpoint was 6:55 hours from the start of the WU. Each time you got the no heartbeat message, the WU restarted from the last checkpoint. From the 29/07 no heartbeat to the 31/07 no heartbeat was almost 53 hours. You would have been close to 60 hours CPU time before that first 31/07 no heartbeat reset it back to the last checkpoint (6:55 CPU time).
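
The arithmetic in this explanation can be checked with a short Python sketch. The timestamps are copied from the client messages quoted in the first post, and the 6:55 checkpoint value is the CPU time the task reported. The variable names and the `hours` helper are made up for illustration, and the result is only an upper bound, since it assumes the task used close to 100% of one CPU for the whole interval between restarts.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"

# "Restarting task" timestamps copied from the client messages in the first post.
restarts = [
    datetime.strptime("29/07/2006 00:42:58", FMT),
    datetime.strptime("31/07/2006 05:41:56", FMT),
    datetime.strptime("31/07/2006 11:11:52", FMT),
]

# CPU time at the last successful checkpoint, as reported by the task (6:55).
checkpoint_cpu = timedelta(hours=6, minutes=55)


def hours(td):
    """Convert a timedelta to fractional hours."""
    return td.total_seconds() / 3600


# Wall-clock time between the 29/07 restart and the first 31/07 restart.
gap = restarts[1] - restarts[0]

# Assumption: the task resumed from the 6:55 checkpoint on 29/07 and then
# accumulated CPU time at roughly wall-clock rate until the next restart.
cpu_before_reset = checkpoint_cpu + gap

print(f"Gap between restarts: {hours(gap):.1f} h")
print(f"Estimated CPU time before reset: {hours(cpu_before_reset):.1f} h")
```

Under those assumptions the gap comes out just under 53 hours and the estimated CPU time just under 60 hours, which matches the figures above. After each restart the counter falls back to the checkpoint value, which is why the task showed only 6:55.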
|
||
|
|
|