Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 5586 times and has 4 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
WU in some sort of loop

Is this considered normal behavior and should I let it continue?

[03:22:58] Finished Job #13
[03:22:58] Starting job 14,CPU time has been restored to 22654.640625.
[04:55:30] Finished Job #14
[04:55:30] Starting job 15,CPU time has been restored to 28115.718750.
06:46:42 (3288): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[00:06:00] Number of jobs = 16
[00:06:00] Starting job 15,CPU time has been restored to 28115.718750.
00:08:30 (3436): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[00:08:47] Number of jobs = 16
[00:08:47] Starting job 15,CPU time has been restored to 28115.718750.
01:39:47 (2336): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[01:39:50] Number of jobs = 16
[01:39:50] Starting job 15,CPU time has been restored to 28115.718750.
02:40:36 (1524): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[02:40:38] Number of jobs = 16
[02:40:38] Starting job 15,CPU time has been restored to 28115.718750.
04:11:47 (2340): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[04:11:56] Number of jobs = 16
[04:11:56] Starting job 15,CPU time has been restored to 28115.718750.
05:12:38 (2576): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[05:12:39] Number of jobs = 16
[05:12:39] Starting job 15,CPU time has been restored to 28115.718750.
06:43:48 (3048): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[06:43:50] Number of jobs = 16
[06:43:50] Starting job 15,CPU time has been restored to 28115.718750.
07:44:37 (164): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[07:45:03] Number of jobs = 16
[07:45:03] Starting job 15,CPU time has been restored to 28115.718750.
08:45:26 (2380): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[08:45:37] Number of jobs = 16
[08:45:37] Starting job 15,CPU time has been restored to 28115.718750.
09:46:15 (4048): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[09:46:19] Number of jobs = 16
[09:46:19] Starting job 15,CPU time has been restored to 28115.718750.
10:47:02 (1352): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[10:47:04] Number of jobs = 16
[10:47:04] Starting job 15,CPU time has been restored to 28115.718750.
12:18:15 (4004): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[12:18:41] Number of jobs = 16
[12:18:41] Starting job 15,CPU time has been restored to 28115.718750.
12:48:41 (3688): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[12:49:04] Number of jobs = 16
[12:49:04] Starting job 15,CPU time has been restored to 28115.718750.


Windows XP SP3
BOINC Client 6.10.58 32bit
Quad core 2GB memory
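
For anyone who wants to confirm the pattern in a log like the one above, here is a rough sketch that counts heartbeat exits and flags jobs that keep restarting from the same restored CPU time. The function name `summarize` and the regular expressions are my own assumptions based only on the line formats visible in this log; they are not part of BOINC.

```python
import re
from collections import Counter

# Patterns assumed from the log lines pasted above (not an official format spec).
START_RE = re.compile(r"Starting job (\d+),CPU time has been restored to (\d+\.\d+)")
HEARTBEAT_RE = re.compile(r"No heartbeat from core client")


def summarize(log_text):
    """Count job starts and heartbeat exits; flag jobs restarted with no progress."""
    starts = Counter()
    cpu_seen = {}
    heartbeat_exits = 0
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            job = int(match.group(1))
            starts[job] += 1
            cpu_seen.setdefault(job, set()).add(match.group(2))
        elif HEARTBEAT_RE.search(line):
            heartbeat_exits += 1
    # A job is "stuck" if it started more than once and always from the same CPU time.
    stuck = sorted(j for j, n in starts.items() if n > 1 and len(cpu_seen[j]) == 1)
    return {"starts": dict(starts), "heartbeat_exits": heartbeat_exits, "stuck_jobs": stuck}


SAMPLE = """\
[03:22:58] Starting job 14,CPU time has been restored to 22654.640625.
[04:55:30] Finished Job #14
[04:55:30] Starting job 15,CPU time has been restored to 28115.718750.
06:46:42 (3288): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[00:06:00] Number of jobs = 16
[00:06:00] Starting job 15,CPU time has been restored to 28115.718750.
00:08:30 (3436): No heartbeat from core client for 30 sec - exiting
No heartbeat: Exiting
[00:08:47] Number of jobs = 16
[00:08:47] Starting job 15,CPU time has been restored to 28115.718750.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the excerpt in `SAMPLE`, job 15 is reported as stuck because it restarts repeatedly with an unchanged restored CPU time of 28115.718750.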
[Nov 13, 2010 6:57:00 PM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4894
Status: Offline
Re: WU in some sort of loop

It looks pretty hopeless.
[Nov 13, 2010 9:33:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in some sort of loop

That sure looks like an endless loop to me. I would reboot. If the loop continued, I would then Abort the Task.

Lawrence
[Nov 13, 2010 10:16:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in some sort of loop

I happened to catch it in the act once. The disk light comes on and stays on for several minutes, and Task Manager shows that the only process doing I/O is BOINC.EXE. Evidently, during this heavy I/O period it doesn't talk to the science processes, so all four reset to the beginning of the job they were working on. Task Manager shows ~26,000,000,000 I/O bytes and ~1.5 million I/O Reads for BOINC.EXE, which makes it the single largest process when sorted by I/O Reads.
[Nov 14, 2010 2:23:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in some sort of loop

Looks like a known issue with no resolution yet. Berkeley has ticket 336 open:

Problem with the heartbeat mechanism: if the client does something that blocks for > 30 secs (e.g. a synchronous DNS lookup, a disk-space scan, a debugger break) then all apps quit, producing confusing messages and possibly wasting CPU time.
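
To make the failure mode in that ticket text concrete, here is a minimal sketch of a heartbeat timeout as seen from the app side. It is not BOINC's actual code: the class `HeartbeatWatchdog` and the function `simulate` are hypothetical, and the only value taken from this thread is the 30-second limit shown in the log messages. Timestamps are passed in explicitly so the example runs without waiting.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds, per the "No heartbeat ... for 30 sec" log message


class HeartbeatWatchdog:
    """App-side bookkeeping: remembers when the client last checked in."""

    def __init__(self, now, timeout=HEARTBEAT_TIMEOUT):
        self.last_beat = now
        self.timeout = timeout

    def beat(self, now):
        """Record a heartbeat received from the client at time `now`."""
        self.last_beat = now

    def should_exit(self, now):
        """True once the client has been silent for longer than the timeout."""
        return now - self.last_beat > self.timeout


def simulate(client_blocked_for):
    """Return True if the app would exit when the client stalls this many seconds."""
    watchdog = HeartbeatWatchdog(now=0.0)
    watchdog.beat(now=1.0)  # last heartbeat before the client blocks
    return watchdog.should_exit(now=1.0 + client_blocked_for)


if __name__ == "__main__":
    print(simulate(10))   # short stall: app keeps running
    print(simulate(180))  # several minutes of blocked I/O: app exits
```

With this logic, the client does not have to crash for the apps to quit. It only has to be busy, for example with disk I/O, for longer than the timeout.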

Proposed solution: remove heartbeat mechanism. Include client process ID in the app_init_data file. The API periodically sees if that process is still alive, and exits if not.
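
The proposed replacement could be sketched roughly as follows. This is only an illustration under assumptions: `pid_alive` and `should_exit` are hypothetical names, `client_pid` stands in for whatever field the ticket would add to the app_init_data file, and the probe shown is POSIX-only. On Windows, Python's `os.kill(pid, 0)` terminates the target process instead of testing it, so a real Windows version would need a different check.

```python
import os


def pid_alive(pid):
    """POSIX-only liveness probe: signal 0 checks for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def should_exit(client_pid):
    """The app quits only if the client process is gone, not merely slow to respond."""
    return not pid_alive(client_pid)


if __name__ == "__main__":
    # The current process is certainly alive, so this prints False.
    print(should_exit(os.getpid()))
```

The design point is that a client blocked on disk I/O for several minutes is still a live process, so a check like this would not make the science apps quit the way the 30-second heartbeat does.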

Between not being able to upload results and not being able to run the science, this isn't a very productive project for me. I think I'm going to take my cue from the science and EXIT...
[Nov 14, 2010 4:31:57 AM]