| World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
I apologize if this has been posted before, but I thought I would mention it.
I had the following work unit get hung in one of my systems:
nh205_ 00017_ 5-- x-8 Pending Validation 4/19/10 11:00:51 4/22/10 10:43:06 11.65 103.5 / 0.0
It is a system which I only check every couple of days, and it is only periodically connected to the internet. When I did check it, it showed this work unit at 27% done at 32 hours of run time, with 36 hours to finish. I knew this was either wrong or it was a huge work unit. I watched it for a short while and saw no progress, so I exited BOINC and restarted my system. After the restart the unit sat at the same 27% done until BOINC finally got its bearings. Then it went back to the last checkpoint at 25%, reset its run time to about 3 1/2 hours, and showed 5 hours to completion. After this it appeared to run normally, showing continuing progress past the 27% mark, so I left it alone to finish. It appears to be OK at this point. This was on a 3.4 GHz HT P4 with Windows XP Service Pack 3. The second job, running in the other thread, experienced no problems. Hope this helps someone else if they experience one of these. Cheers
Sgt. Joe
----------------------------------------*Minnesota Crunchers* [Edit 1 times, last edit by Sgt.Joe at Apr 24, 2010 12:29:58 AM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Start apologizing, Sgt.Joe :D
Yes, these occasional infinite loopers with HPF2 are time suckers that make no progress [until the max runtime is exceeded]. Unloading the client or the science app and reloading it lets them finish properly and validate in 100% - 1 of the cases. I have hopes they will be a thing of the past when the beta-tested version is put into production.
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1460 Status: Offline Project Badges:
|
Yes, there are several such WUs reported in the "no funny" thread.
---------------------------------------- |
||
|
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7846 Status: Offline Project Badges:
|
"Start apologizing, Sgt.Joe :D"
OK, sorry. I did read the entire "no funny" thread and saw this issue was mentioned in there, but only briefly; it was mostly about the start-and-end-quickly problem. I did try the suspend option with no change, and then just thought to try the quit/reboot option, which did work. I just thought the techs might want to know of another one which got into the endless loop business but did successfully complete after a kick start. 'Nuff said. Perhaps this thread should get locked if it is redundant. Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
No worriez Sgt.Joe, there's been an FAQ on this and on how to resume without even leaving BOINC, for the least loss of up-time. The bad thing about the whole exercise is that on a 4-core machine with hyper-threading, all 8 jobs start again from their last checkpoint.
As noted, I'm hoping the v6.17 science app, when released, will catch this bug too. I somehow think it is related to the long-standing 401 error, which looked in the beta to have been licked in 90% of the cases. If you wish, you can edit the OP title and insert, as a first to appear at WCG, [REDUNDANT] :D