Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
No heartbeat from core client for 30 sec - exiting

This is the name of a topic in the Start Here FAQ index (item 4-X at present), which has three sub-topics, link-marked (2), (3) and (4). Doneske is suffering from it too, seeing his WU in some sort of loop.

Why this new topic? Well, I'm testing a lot and saw the same thing last night in a CEP2 Result Log that came off my quad, tracked it back to the BOINC message log tab, and thought it needed a highlight.

14-Nov-2010 19:21:31 [CPDN] Task hadsm3fub_k2e2_006460276_6 exited with zero status but no 'finished' file
14-Nov-2010 19:21:31 [CPDN] If this happens repeatedly you may need to reset the project.
14-Nov-2010 19:21:31 [WCG] Task c4cw_target02_051025434_0 exited with zero status but no 'finished' file
14-Nov-2010 19:21:31 [WCG] If this happens repeatedly you may need to reset the project.
14-Nov-2010 19:21:31 [WCG] Task E200529_783_A.26.C19H10N2OS4.185.1.set1d06_1 exited with zero status but no 'finished' file
14-Nov-2010 19:21:31 [WCG] If this happens repeatedly you may need to reset the project.
14-Nov-2010 19:21:31 [WCG] Task c4cw_target02_051019134_0 exited with zero status but no 'finished' file
14-Nov-2010 19:21:31 [WCG] If this happens repeatedly you may need to reset the project.
14-Nov-2010 19:21:31 [WCG] Task E200529_784_A.26.C19H10N2OS4.99.1.set1d06_0 exited with zero status but no 'finished' file
14-Nov-2010 19:21:31 [WCG] If this happens repeatedly you may need to reset the project.

Did I RTFM?... Well, I wrote those FAQs, so I instantly knew the very likely cure, the cause being my own incomplete actions. I had moved the BOINC data_dir into its own partition, logical drive L:\BOINC, all clean to itself, with larger-than-normal block sizes of 64K to minimize file fragmentation, and had forgotten to tell my Avast AntiVirus software to exclude that data area from scanning. In the previous install trial the data dir had been restored to the best-for-all location, C:\ProgramData\BOINC (Vista and W7).

Why? Because AVs have to work very hard to keep scanning what BOINC is doing: in the CEP2 case some 6600+ intermediate task files, which during some checkpoints can take several minutes to read and update... the slower the disk, the longer it takes.
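To get a feel for how much an AV scanner has to chew through, one can count the files sitting under the BOINC data dir. The snippet below is only a rough sketch: `count_files` is a made-up helper name, and the path in the usage note is just the L:\BOINC location from this post, so substitute your own.

```python
import os

def count_files(root):
    """Return (file_count, total_bytes) for all readable files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is locked while a task is running; skip it.
                continue
            count += 1
            total += size
    return count, total
```

For example, `count_files(r"L:\BOINC")` would return the number of files and their combined size for that data dir at that moment. The numbers change constantly while tasks run, so treat them as a snapshot only.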

Thus: anyone incurring these "zero status" and "No heartbeat ... exiting" messages might want to do some RTFM. It's not a guaranteed cure, but on busy systems it will certainly help to reduce these lost-computing-time events. :D

Learn and reap: see something bad in the Result Log >>>> track it back to the message log at or before the same timestamp. It is all stored for weeks in the stdoutdae.txt log file found in the BOINC data dir (the path is printed at the start of each client session). Support likes to hear of those too, to round out the picture.
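For those who would rather not scroll through weeks of log by hand, here is a minimal sketch that pulls the "zero status" lines out of a message log. It assumes the lines in stdoutdae.txt have the same layout as the ones quoted earlier in this post (timestamp, project in square brackets, then the message); the function names are my own invention, not anything BOINC ships.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this post, e.g.
# 14-Nov-2010 19:21:31 [WCG] Task <name> exited with zero status but no 'finished' file
ZERO_STATUS = re.compile(
    r"^(?P<stamp>\d{1,2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)

def find_zero_status_exits(lines):
    """Return a list of (timestamp, project, task) for every matching line."""
    hits = []
    for line in lines:
        m = ZERO_STATUS.match(line.strip())
        if m:
            hits.append((m.group("stamp"), m.group("project"), m.group("task")))
    return hits

def count_by_project(hits):
    """Count how many zero-status exits each project logged."""
    return Counter(project for _stamp, project, _task in hits)
```

Open stdoutdae.txt from the BOINC data dir, pass the file object to `find_zero_status_exits`, and compare the timestamps it returns with the times shown in the Result Log.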

Good morning world.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Nov 15, 2010 8:05:21 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: No heartbeat from core client for 30 sec - exiting

I've had a few like that Sek.

Was wondering if I was maybe doing something else on the machine at the time? Dunno though, can't remember. It seems to get around it in the end though. :thumbsup:

Result Log

Result Name: E200535_702_A.23.C18H13NSSe2Si.6.2.set1d06_0--



<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
17:52:32 (428): No heartbeat from core client for 30 sec - exiting
17:52:33 (428): No heartbeat from core client for 30 sec - exiting
17:52:34 (428): No heartbeat from core client for 30 sec - exiting
17:52:35 (428): No heartbeat from core client for 30 sec - exiting
17:52:36 (428): No heartbeat from core client for 30 sec - exiting
[17:52:37] Number of jobs = 16
[17:52:37] Starting job 0,CPU time has been restored to 0.000000.
No heartbeat: Exiting
[17:52:39] Number of jobs = 16
[17:52:39] Starting job 0,CPU time has been restored to 0.000000.
[17:54:25] Finished Job #0
[17:54:25] Starting job 1,CPU time has been restored to 92.484375.
[17:59:09] Finished Job #1
[17:59:09] Starting job 2,CPU time has been restored to 365.109375.
[20:17:36] Finished Job #2
[20:17:36] Starting job 3,CPU time has been restored to 8527.312500.
[20:22:48] Finished Job #3
[20:22:48] Starting job 4,CPU time has been restored to 8826.203125.
[20:26:25] Finished Job #4
[20:26:25] Starting job 5,CPU time has been restored to 9034.250000.
[20:30:05] Finished Job #5
[20:30:05] Starting job 6,CPU time has been restored to 9244.828125.
[20:33:36] Finished Job #6
[20:33:36] Starting job 7,CPU time has been restored to 9449.343750.
[20:37:36] Finished Job #7
[20:37:36] Starting job 8,CPU time has been restored to 9679.421875.
[20:41:03] Finished Job #8
[20:41:03] Starting job 9,CPU time has been restored to 9877.156250.
[20:44:48] Finished Job #9
[20:44:48] Starting job 10,CPU time has been restored to 10089.515625.
[20:52:14] Finished Job #10
[20:52:14] Starting job 11,CPU time has been restored to 10529.000000.
[20:56:46] Finished Job #11
[20:56:46] Starting job 12,CPU time has been restored to 10791.562500.
[21:20:22] Finished Job #12
[21:20:22] Starting job 13,CPU time has been restored to 12175.812500.
[22:05:29] Finished Job #13
[22:05:29] Starting job 14,CPU time has been restored to 14855.796875.
[22:46:15] Finished Job #14
[22:46:15] Starting job 15,CPU time has been restored to 17275.546875.
[23:34:17] Finished Job #15
23:34:23 (1392): called boinc_finish

</stderr_txt>
]]>
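A Result Log like the one above can also be summarized with a few lines of script. This is only a sketch, assuming the stderr text keeps the exact "Starting job" / "Finished Job #" wording shown here; `summarize` is a hypothetical helper. It counts the heartbeat messages and works out the wall-clock seconds from the last start of each job to its finish.

```python
import re

START = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Starting job (\d+),")
FINISH = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] Finished Job #(\d+)")
HEARTBEAT = "No heartbeat from core client for 30 sec - exiting"

def _seconds(h, m, s):
    """Convert an HH, MM, SS triple of strings to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + int(s)

def summarize(stderr_lines):
    """Return (heartbeat_count, {job_number: wall_seconds}) for one result log."""
    heartbeats = 0
    started = {}  # job number -> most recent start time, seconds since midnight
    wall = {}     # job number -> seconds from last start to finish
    for raw in stderr_lines:
        line = raw.strip()
        if HEARTBEAT in line:
            heartbeats += 1
            continue
        m = START.match(line)
        if m:
            # A restarted job overwrites its earlier start time.
            started[int(m.group(4))] = _seconds(*m.group(1, 2, 3))
            continue
        m = FINISH.match(line)
        if m and int(m.group(4)) in started:
            job = int(m.group(4))
            elapsed = _seconds(*m.group(1, 2, 3)) - started[job]
            # The log carries no date, so assume at most one midnight rollover.
            wall[job] = elapsed % 86400
    return heartbeats, wall
```

Comparing the wall-clock figure for a job with the increase in "CPU time has been restored to" between consecutive jobs gives a rough idea of how much time was lost to waiting rather than crunching.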



Will suck it and see and let them run. :)
[Nov 15, 2010 3:02:21 PM]