| World Community Grid Forums
Thread Status: Active Total posts in this thread: 24
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Still getting work that goes in an endless loop.
So FAAH is off the menu for now, and cancer won't run on my rigs. Are the other projects running normally? I have lost more work than I have actually got done, so it's back to CP in the meantime. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi vaio,
I just looked at my Results Status page. All the FAAH units on it validated normally. I seem to recall a FAAH unit several days ago that errored out on all units it was sent to, but that was a surprising exception.
Rick Alther explained early in the year that many FAAH units would repeatedly cycle back a few percent as an attempt failed. Whatever an 'attempt' is. But apparently it always succeeds eventually. Just let it run and ignore the vacillating progress. It is HPF2 that gives us problems.
Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I have aborted 3 or 4 FAAH workunits in the past couple of days because the percentage complete went back to 0, or to less than 10%, each time they restarted. Not only does the percentage complete decrease, but the CPU time spent crunching drops too.
Having read some of the recent posts, I am now running a policy of aborting WUs where the CPU time does not match reality. Regarding previous answers: if the Progress were simply being reset to the appropriate percentage, the CPU time should still stay the same.
On a related (?) note, on many FAAH units I have noticed a slight loss of both CPU time and progress when the WU restarts. The CPU time lost is around 15 seconds, and the Progress lost is around 0.200%.
My suggestion is to look in your Messages window to see how long your WU has been running. If that is much longer than the CPU time (e.g. the restarts in the Messages window suggest your CPU time should be 4 hours, but your CPU time is showing as less than an hour), you have a problem. My 2c worth. |
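The check described above can be sketched in a few lines of Python. This is only an illustration: the function name, the `throttle` parameter and the 50% `tolerance` are assumptions made for the example, not anything the UD or BOINC agents provide, and the two time values have to be read off the Messages window and the work unit display by hand.

```python
def looks_stuck(wall_clock_hours, cpu_hours, throttle=1.0, tolerance=0.5):
    """Return True when the reported CPU time is far below what the run time implies.

    wall_clock_hours: how long the WU has been running according to the Messages window
    cpu_hours:        the CPU time the agent currently shows for the WU
    throttle:         fraction of CPU the agent is allowed to use (assumed, 1.0 = 100%)
    tolerance:        how far below the expected CPU time we accept (assumed, 0.5 = half)
    """
    expected_cpu = wall_clock_hours * throttle
    if expected_cpu <= 0:
        return False  # nothing has run yet, so there is nothing to compare
    return cpu_hours < expected_cpu * tolerance


# The example from the post: about 4 hours of running, under 1 hour of CPU time shown.
print(looks_stuck(4.0, 0.9))   # True
# A healthy unit that only lost a few seconds at each restart.
print(looks_stuck(4.0, 3.9))   # False
```

The tolerance is deliberately loose, since a loss of around 15 seconds per restart is reported as normal and should not trigger an abort.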
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I had the same symptoms: 33% done, then a fallback to 32%, in an endless loop. After stopping and restarting UD, it started from 0% again. I killed the task (ending WCGrid_AutoDock.exe in Task Manager). I then got another FAAH package, along with new software. I hope this fixes it; the current progress is 45%.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I had the same symptoms: 33% done, then a fallback to 32%, in an endless loop. After stopping and restarting UD, it started from 0% again. I killed the task (ending WCGrid_AutoDock.exe in Task Manager). I then got another FAAH package, along with new software. I hope this fixes it; the current progress is 45%."
The same is happening here. It has been doing this for three days now. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Hobwell,
"Has been doing this for three days now"
Do you mean that it is resetting to 0% with each reboot, or that it is stuck in a loop at some point? If you need speed to reach a checkpoint before rebooting, are you running UD with the throttle set to 60%? What is your computer system, what is your client, and what sort of schedule are you running?
Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Agent Version : 3.0 (2844)
Device Id : 316110
The task keeps reaching 33% completed, then resets to 0%, climbs back up to 33% and starts again. I am running the default profile. I have not rebooted; the machine has been on since 8.00 a.m. GMT, it's now 13.50 GMT, and it has already gone back to zero once.
[Edit 2 times, last edit by Former Member at Jul 26, 2006 12:47:25 PM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Hobwell,
This is really unusual. I do not have an explanation. What I would do is reboot, to reset the operating system to a known good state, then use Task Manager to terminate the wcgrid_autodock program in order to download a fresh copy and a new work unit. If problems showed up in the new unit, I would start running the diagnostic programs listed in 'Useful Utilities' at http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=2490
Lawrence |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Have just received a new work unit and everything is as it should be.
Thanks for your help. Must have been a rogue unit. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I am only getting this problem with the FAAH units, though I am crunching other units too. I am running BOINC ver 5.4.9 on Windows XP. It seems like the save points/checkpoints/whatever are set too far apart on some of the WUs.
Because WCG is not the only project running, I have 50% of my BOINC time devoted to it. If the WU won't update and save within one hour, it's never going to update and save, as BOINC switches over to another project at that time. This has only been happening in the past couple of weeks, and my time to finish analysis for the FAAH units is showing as either 6h 13m or 5h 27m, depending on the machine. Has there been a recent change to how the FAAH unit save points/checkpoints are set? Cheers |
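The reasoning above can be checked with a small Python sketch. It rests on assumptions that are not confirmed anywhere in this thread: that the task is unloaded from memory each time BOINC switches projects (so it resumes from its last checkpoint), and that checkpoints fall at fixed intervals of work. The function name and all the numbers are made up for illustration.

```python
def minutes_kept(checkpoint_every, switch_every, slices):
    """Minutes of work that survive after a number of scheduler time slices.

    checkpoint_every: minutes of work between checkpoints (assumed to be fixed)
    switch_every:     minutes BOINC runs the task before switching projects
    slices:           how many time slices the task has been given
    Assumes the task restarts from its last checkpoint at the start of each slice.
    """
    saved = 0
    for _ in range(slices):
        # Work reached by the end of this slice, starting from the last checkpoint.
        reached = saved + switch_every
        # Only whole checkpoints persist once the task is switched out.
        saved = (reached // checkpoint_every) * checkpoint_every
    return saved


# Checkpoints 90 minutes apart, switching every 60 minutes: nothing is ever saved.
print(minutes_kept(90, 60, 5))   # 0
# Checkpoints 45 minutes apart: progress accumulates, with some work lost per slice.
print(minutes_kept(45, 60, 3))   # 135
```

Under these assumptions, any checkpoint spacing longer than the switch interval leaves the unit stuck at its starting point no matter how many slices it gets, which matches the behaviour described in the post.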
||
|
|
|