Thread Status: Active
Total posts in this thread: 4
This topic has been viewed 1086 times and has 3 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Odd behaviour of beta workunit

I started za071_00581 on the 28th (at 10:05 local, if the file timestamps are to be believed). It had clocked up quite a large time and was sitting at 80%, doing nothing much, when I checked last night (I don't usually babysit my work units, but this one had been running for 2 or 3 days). I decided to give it overnight to see if it made any further progress. Imagine my surprise, then, when I saw that it reports only 6:55 hours of CPU time today. This was not what it said last night. It is still using 90+% of my CPU, and should be reporting about 74 hours.

Here is the stderr:

[11:35:50] [INFO] Checkpoint complete
[12:05:34] [INFO] Checkpoint complete
[12:35:40] [INFO] Checkpoint complete
[12:59:08] [INFO] Checkpoint complete
[13:26:03] [INFO] Checkpoint complete
[13:50:07] [INFO] Checkpoint complete
[14:12:58] [INFO] Checkpoint complete
[14:26:06] [INFO] Checkpoint complete
[14:53:30] [INFO] Checkpoint complete
[15:12:55] [INFO] Checkpoint complete
[15:24:43] [INFO] Checkpoint complete
[15:44:03] [INFO] Checkpoint complete
[16:05:59] [INFO] Checkpoint complete
[16:17:34] [INFO] Checkpoint complete
[16:39:11] [INFO] Checkpoint complete
[17:00:46] [INFO] Checkpoint complete
No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting

And the messages from when it died:

29/07/2006 00:42:58|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
29/07/2006 00:42:58|World Community Grid|If this happens repeatedly you may need to reset the project.
29/07/2006 00:42:58|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507
31/07/2006 05:41:56|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
31/07/2006 05:41:56|World Community Grid|If this happens repeatedly you may need to reset the project.
31/07/2006 05:41:56|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507
31/07/2006 11:11:52|World Community Grid|Task za071_00581_7 exited with zero status but no 'finished' file
31/07/2006 11:11:52|World Community Grid|If this happens repeatedly you may need to reset the project.
31/07/2006 11:11:52|World Community Grid|Restarting task za071_00581_7 using hpf2 version 507

I believe I will kill it now.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jul 31, 2006 12:06:25 PM]
[Jul 31, 2006 11:58:12 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Odd behaviour of beta workunit

I started za071_00581 on the 28th (at 10:05 local, if the file timestamps are to be believed). It had clocked up quite a large time and was sitting at 80%, doing nothing much, when I checked last night (I don't usually babysit my work units, but this one had been running for 2 or 3 days). I decided to give it overnight to see if it made any further progress. Imagine my surprise, then, when I saw that it reports only 6:55 hours of CPU time today. This was not what it said last night. It is still using 90+% of my CPU, and should be reporting about 74 hours.

What do you think caused the odd behaviour?
[Jul 31, 2006 2:23:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Odd behaviour of beta workunit

Well, it's a beta workunit so trouble was to be expected. I'm posting about it here in the hope that someone can explain it.
[Jul 31, 2006 3:51:04 PM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295
Status: Offline
Re: Odd behaviour of beta workunit

You had a stuck WU. The last checkpoint was 6:55 hours from the start of the WU. Each time you got the no heartbeat message, the WU restarted from the last checkpoint. From the 29/07 no heartbeat to the 31/07 no heartbeat was almost 53 hours. You would have been close to 60 hours CPU time before that first 31/07 no heartbeat reset it back to the last checkpoint (6:55 CPU time).
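
For anyone who wants to check those numbers, here is a rough Python sketch of the arithmetic. It is only an illustration under some assumptions: the 10:05 start on the 28th is the original poster's estimate from file timestamps (seconds assumed to be :00), the undated stderr checkpoint at 17:00:46 is assumed to be from the 28th, and wall-clock time is treated as roughly equal to CPU time because the task was reported as using 90+% of the CPU. The variable names are made up for this example.

```python
from datetime import datetime, timedelta

# Times copied from the thread. The start time is the original poster's
# estimate, and the date of the last checkpoint is an assumption (the
# stderr lines carry no date).
wu_start = datetime(2006, 7, 28, 10, 5, 0)
last_checkpoint = datetime(2006, 7, 28, 17, 0, 46)

# First two "exited with zero status but no 'finished' file" messages.
first_restart = datetime(2006, 7, 29, 0, 42, 58)
second_restart = datetime(2006, 7, 31, 5, 41, 56)

# Work saved in the last checkpoint, i.e. what a restart falls back to.
checkpointed = last_checkpoint - wu_start

# Time the task ran between the two restarts without writing a checkpoint.
stuck_run = second_restart - first_restart

# Approximate CPU time shown just before the second restart, assuming
# wall-clock time is close to CPU time.
cpu_before_reset = checkpointed + stuck_run


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


print(f"Checkpointed work:    {hours(checkpointed):.2f} h")
print(f"Run between restarts: {hours(stuck_run):.2f} h")
print(f"CPU time before reset: {hours(cpu_before_reset):.2f} h")
```

Under those assumptions the checkpointed work comes out just under 7 hours and the run between restarts just under 53 hours, which is consistent with a total close to 60 hours before the counter dropped back to the checkpoint value.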
[Aug 1, 2006 6:06:24 PM]