anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
changing heartbeat settings?

I was checking on how one of my systems was doing when I got a message that said something like "can't reach BOINC heartbeat", and it asked me if I wanted to close BOINC Manager or cancel. I hit the Cancel button a few times before things started working properly. Then I looked under the "Messages" tab of BOINC and noticed that my WUs had to restart from the last checkpoint (message below). Is there a setting somewhere (such as in the config file) that I can change so that it does not restart my WUs unless it can't talk to the "core" BOINC for 1 minute, instead of a few seconds?


Message under the "Messages" tab:
Task XCCC exited with zero status but no 'finished' file. If this happens repeatedly you may need to reset the project.


I know that this problem has been reported before and that it is normal, but I am trying to reduce how often it happens, since CEP2 checkpoints are so far apart.
[Mar 18, 2011 3:18:15 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: changing heartbeat settings?

Nope, nothing of the kind. The heartbeat (2 ticks per minute) is the key element between the core client and the science app to say the science is alive; it is not between BOINC Manager and the core client. That said, when BOINC Manager is working really hard at grabbing interrupts, it's not unthinkable that it distracts the core client so much that it causes the rest to fall over.
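
As a rough illustration of the idea only (this is not BOINC's actual code), a heartbeat check boils down to one side recording when the last beat arrived and giving up once too much time has passed. The class name `HeartbeatWatchdog`, the `timeout_s` parameter and the 30-second value are all assumptions made up for this sketch.

```python
import time


class HeartbeatWatchdog:
    """Toy model of the receiving side of a heartbeat check.

    Hypothetical sketch: names and the 30 s default are assumptions,
    not taken from the BOINC source.
    """

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_beat = clock()      # treat start-up as the first beat

    def beat(self):
        """Record that a heartbeat has just been received."""
        self._last_beat = self._clock()

    def is_alive(self):
        """Return True while the last beat is no older than the timeout."""
        return (self._clock() - self._last_beat) <= self.timeout_s
```

In a model like this, a system that is too busy to deliver a beat in time looks exactly the same to the receiver as a sender that has died, which is why heavy disk or swap activity can trip the check.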

The cure is in your system. As Rickjb experienced, disk subsystem slowness when running multiple CEP2 tasks can cause this too, and so can large amounts of continuous swapping. I'm actually able to make it fall over intentionally, so I don't do certain things unless only HCC / HCMD2 are running.

Other than referring you to the various FAQs, the answer is really just "the system was too busy".

--//--

PS, you're lucky if the task resumes from the last checkpoint... often the task crashes instead. Talk to the developers for a fix... somehow it feels like we've been kicking this around for far too long and putting it off. I think there are Trac tickets for this at Berkeley... yes, a shopping list of them according to this search: http://boinc.berkeley.edu/trac/search?q=heartbeat
[Mar 18, 2011 4:21:17 PM]