| World Community Grid Forums
Thread Status: Active Total posts in this thread: 3
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello !
I hoped the new version 5.8.8 of BOINC would solve the problem, but it seems it's still there! The problem is this: each time I put any of my computers into hibernation and restart it, the message below appears in BOINC and the WU resumes from its last checkpoint, so I lose CPU time. It's exactly the same for each and every WU. The message is as follows:

"World Community Grid|Task (...) exited with zero status but no 'finished' file
If this happens repeatedly you may need to reset the project."

Of course, resetting the project doesn't work...

I have this problem only with WCG, but I've seen on other message boards that people from other projects sometimes experience the very same problem. I run Rosetta@home and Einstein@home without any problem of this kind! Maybe it's BOINC related?

Here is the end of the stderr file from my last WU:

"No heartbeat from core client for 31 sec - exiting
World Community Grid AutoDock (projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.28_windows_intelx86) version
Failed to get VersionInfo size: 1812
Failed to get VersionInfo size: 1812
INFO: projects/www.worldcommunitygrid.org/wcg_faah_autodock_5.28_windows_intelx86
Start AutoGrid...
AG Check: Found receptor.A.map
Beginning AutoDock...
INFO: Setting num_generations: 27000
Setting maxGen to 6750
autodock4: WARNING: Unrecognized keyword in docking parameter file, in line:
compute_unbound_extended # compute extended ligand energy
About to enter main loop...(dockings already completed: 26)
call_glss(): pop_size: 200 num_evals: 10000000
start: [20:41:41]
_maxGenSeenSoFar changed: 6750"

Other people with this problem also had the "No heartbeat (...) for 31s - exiting" message.

To be complete: I have the same problem on both machines (laptop and PC), running XP SP2.

I hope somebody can help or has an idea on how to solve this problem. Many thanks in advance!

Duanra.

[Edit 1 times, last edit by Former Member at Feb 8, 2007 6:08:22 PM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Anyone?
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
duanra, if you look for the 'zero status' message in previous posts you will find the solution(s)... You may appreciate that answering the same question over and over is pretty tedious. We, the CAs, do this for fun and not as paid professionals, though we try to reply as professionally as possible, with the occasional quip and banter :P

I now do 'Suspend' through the Activity menu prior to hibernation or standby. The error still shows up on rare occasions, but in any case the work unit completes normally, though some progress may get lost. If you are able to wait until a checkpoint has been saved, you lose the least crunch time. Considering that one particular box has been up for 14 days now and put in hibernation every night, I'm happy with the methodology.

The 'no heartbeat' error occurs when BOINC.exe loses contact with the science application. Suspending should prevent that, particularly during the hibernation routine. Upon restart, once the OS is fully up, take BOINC out of suspend. Works for me, the ultra-safe approach.

5.8.8 is not the answer to that issue, and I don't think it will be going away as long as hibernation or standby is used.

cheers
Please help to make the Forums an enjoyable experience for All!

[Edit 1 times, last edit by Sekerob at Feb 8, 2007 6:37:59 PM]
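For anyone who would rather script the suspend-before-hibernate routine described above than click through the Activity menu, here is a minimal Python sketch. It assumes the BOINC command-line tool is installed and reachable on the PATH as `boinccmd` (older clients may ship it under a different name, so adjust `BOINCCMD` if needed), and it uses that tool's `--set_run_mode` option. The function names are made up for this illustration. Note that it does not wait for a checkpoint, so some progress can still be lost, just as with the manual method.

```python
import subprocess

# Assumption: the BOINC command-line tool is on PATH under this name.
BOINCCMD = "boinccmd"

# Run modes accepted by the tool's --set_run_mode option.
VALID_MODES = ("always", "auto", "never")


def build_run_mode_command(mode, tool=BOINCCMD):
    """Return the argument list that switches BOINC to the given run mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [tool, "--set_run_mode", mode]


def set_run_mode(mode):
    """Run the command; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_run_mode_command(mode), check=True)


# Hypothetical usage around a hibernation cycle:
#   set_run_mode("never")  # suspend computation before hibernating
#   ... hibernate, resume, wait until the OS is fully up ...
#   set_run_mode("auto")   # hand control back to the normal preferences
```

Calling `set_run_mode("never")` before hibernating and `set_run_mode("auto")` afterwards mirrors the manual Suspend and resume steps. Whether it avoids the 'no heartbeat' exit on a given machine is something to verify locally.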