Thread Status: Active
Total posts in this thread: 5
This topic has been viewed 6806 times and has 4 replies
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
HPF2 anomaly ???[REDUNDANT]

I apologize if this has been posted before, but I thought I would mention it.
I had the following work unit:

nh205_ 00017_ 5-- x-8 Pending Validation 4/19/10 11:00:51 4/22/10 10:43:06 11.65 103.5 / 0.0

get hung on one of my systems. It is a system which I only check every couple of days, and it is only periodically connected to the internet. When I did check it, it showed this work unit at 27% done at 32 hours of run time with 36 hours to finish. I knew this was either wrong or it was a huge work unit. I watched it for a short while and saw no progress, so I exited BOINC and restarted my system. After the restart the unit sat at the same 27% done until BOINC finally got its bearings; then it went back to the last checkpoint at 25%, reset its run time to about 3 1/2 hours and showed 5 hours to completion. After this it appeared to run normally, showing continuing progress past the 27% mark, so I left it alone to finish. It appears to be OK at this point. This was on a 3.4 GHz HT P4 with Windows XP Service Pack 3. The second job, running in the other thread, experienced no problems. Hope this helps someone else if they experience one of these.
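
For anyone who would rather not eyeball the progress bar, the "watched it for a short while and saw no progress" check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are (hours since first look, fraction done) pairs that you note down by hand or collect some other way, and the names `is_stalled`, `window_hours` and `min_gain` are made up for this sketch, not part of BOINC or the HPF2 science app.

```python
from typing import List, Optional, Tuple

# One observation: (hours since the first observation, fraction done in 0..1).
# How the numbers are collected is left open; this sketch only compares them.
Sample = Tuple[float, float]


def is_stalled(samples: List[Sample],
               window_hours: float = 1.0,
               min_gain: float = 0.001) -> bool:
    """Return True if progress rose by less than min_gain over the last window_hours.

    Assumes samples are sorted by time. Returns False when the history is
    shorter than the window, since a stall cannot be judged yet.
    """
    if len(samples) < 2:
        return False

    latest_time, latest_frac = samples[-1]

    # Baseline = the most recent sample that is at least window_hours old.
    baseline: Optional[float] = None
    for t, frac in samples:
        if latest_time - t >= window_hours:
            baseline = frac
        else:
            break

    if baseline is None:
        return False  # not enough history to cover the window

    return (latest_frac - baseline) < min_gain


# Hypothetical readings resembling the stuck unit: 27% for a full hour.
stuck = [(0.0, 0.27), (0.5, 0.27), (1.0, 0.27)]
# Hypothetical readings resembling the unit after the restart: steady progress.
healthy = [(0.0, 0.25), (0.5, 0.27), (1.0, 0.30)]

print(is_stalled(stuck))    # True
print(is_stalled(healthy))  # False
```

A flat reading over the window is only a hint that the unit may be looping; the fix described in this thread is still a manual restart so the unit resumes from its last checkpoint.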

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Apr 24, 2010 12:29:58 AM]
[Apr 22, 2010 12:40:13 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: HPF2 anomaly ???

Start apologizing, Sgt.Joe :D

Yes, these occasional infinite loopers with HPF2 [they loop until the max runtime limit is exceeded] are time suckers that make no progress. Unloading the client or the science app and reloading it lets them finish properly and validate in 100% - 1 of the cases. I've got hopes they will be a thing of the past when the beta-tested version is put into production.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Apr 22, 2010 12:51:00 PM]
rilian
Veteran Cruncher
Ukraine - we rule!
Joined: Jun 17, 2007
Post Count: 1460
Status: Offline
Re: HPF2 anomaly ???

Yes, several such WUs have been reported in the "no funny" thread.
----------------------------------------
[Apr 22, 2010 1:12:22 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7844
Status: Offline
Re: HPF2 anomaly ???

Start apologizing, Sgt.Joe :D


OK, sorry. I did read the entire "no funny" thread and saw this issue was mentioned in there, but only briefly; it was mostly about the start-and-end-quickly problem. I did try the suspend option with no change, and then thought to try the quit/reboot option, which did work. Just thought the techs might want to know of another one which got into the endless loop business but did successfully complete after a kick start. 'Nuff said. Perhaps this thread should get locked if it is redundant.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Apr 23, 2010 2:46:26 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: HPF2 anomaly ???

No worriez, Sgt.Joe. There's been an FAQ on this and on how to resume without even leaving BOINC, for the least loss of up-time. The bad thing about the whole exercise is that on a quad-core with hyper-threading, all 8 jobs start again from their last checkpoint.

As noted, I'm hoping the v6.17 science app, when released, will catch this bug too. Somehow I think it is related to the long-standing 401 error, which looked in the beta to have been licked for 90% of the cases.

If you wish, you can edit the OP title and insert [REDUNDANT], which would be a first at WCG.

:D
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Apr 23, 2010 9:33:43 AM]