World Community Grid Forums
Thread Status: Active Total posts in this thread: 2
anhhai
Veteran Cruncher Joined: Mar 22, 2005 Post Count: 839 Status: Offline
I was checking on how one of my systems was doing when I got a message saying something like "can't reach BOINC heartbeat", and it asked me whether I wanted to close BOINC Manager or cancel. I hit the Cancel button a few times before things started working properly. Then I looked under the "Messages" tab of BOINC and noticed that my WUs had restarted from the last checkpoint (message below). Is there a setting somewhere (such as in the config file) that I can change so that my WUs are not restarted unless they can't talk to the "core" BOINC for 1 minute, instead of a few seconds?
----------------------------------------
Message under the "Messages" tab: Task XCCC exited with zero status but no 'finished' file. If this happens repeatedly you may need to reset the project.
I know that this problem has been reported before and that it is normal, but I am trying to reduce how often it happens, since CEP2 checkpoints are so far apart.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Nada. Niente. The heartbeat (2 ticks per minute) is the key element between the core client and the science app that says the science is alive; it is not between BOINC Manager and the core client. That said, when BM is working really hard at grabbing interrupts, it's not unthinkable that it distracts the CC so much that the rest falls over.
The cure is in your system. As Rickjb experienced, disk subsystem slowness when running multiple CEP2s can cause this too, and large amounts of continuous swapping could as well. I'm actually able to make it fall over intentionally, so I don't do certain things unless only HCC / HCMD2 are running. Other than referring you to the various FAQs, the answer really is "the system was too busy".
--//--
PS: you're lucky if the task resumes from the last checkpoint... often the task crashes. Talk to the developers for a fix... somehow it feels like we've been throwing this about for way, way too long and putting it off. I think there are TRAC tickets for this at Berkeley... yes, a shopping list of them according to this search: http://boinc.berkeley.edu/trac/search?q=heartbeat
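For anyone wondering why a busy system makes a task "fall over", here is a rough Python sketch of the general heartbeat-watchdog idea described above. It is not BOINC's code: the class name `HeartbeatWatchdog`, the `timeout_s` parameter and the 30-second value are all made up for illustration, and nothing here implies that BOINC exposes such a timeout as a user setting.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Toy model (hypothetical, not BOINC's implementation) of a worker
    that considers its controller gone when heartbeats stop arriving."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s      # assumed grace period, in seconds
        self._clock = clock             # injectable so the demo needs no sleeping
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat arrived just now."""
        self._last_beat = self._clock()

    def is_alive(self) -> bool:
        """True while the last heartbeat is no older than the timeout."""
        return (self._clock() - self._last_beat) <= self.timeout_s


# Demo with a fake clock: a stalled system simply means beats arrive late.
now = [0.0]
watchdog = HeartbeatWatchdog(timeout_s=30.0, clock=lambda: now[0])

now[0] = 10.0
watchdog.beat()                 # heartbeat arrives on time
now[0] = 39.0
ok_before = watchdog.is_alive() # 29 s since last beat: still within timeout
now[0] = 41.0
ok_after = watchdog.is_alive()  # 31 s since last beat: worker would give up
```

The point of the sketch is that the worker cannot tell "the controller died" apart from "the controller was starved of CPU or disk"; either way the beat is late, which matches the "system was too busy" explanation.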