Total posts in this thread: 37
This topic has been viewed 3220 times and has 36 replies
Sandoor
Cruncher
Canada
Joined: May 22, 2008
Post Count: 8
Re: extremely long running w/u

Found one at 13h this morning; after a reboot, the CPU time dropped to 49 min. The work unit completed in regular time and is now in PV (pending validation), with the wingman erroring at 30h.
[Feb 2, 2011 9:46:05 PM]
JollyJimmy
Advanced Cruncher
USA
Joined: Aug 23, 2005
Post Count: 115
Re: extremely long running w/u

I'm in the process of testing several of the work units mentioned in this thread (and the other thread) to attempt to recreate the problem of very long work units that some users have reported.

Seippel
Thanks for checking into this, Seippel.
I've got one too:
Reports 3.5% done after almost 6h CPU and 16h remaining.

While you are crunching on bugs (tasty!), any advice? "Keep 'er running" or "abort and abandon"?
Would hate to time out after 12h or dump even 22h into an error.

Edit - The same task is now reporting 4.8% after almost 8h CPU and over 22h remaining!!
I sure hope the task is not going fungal by simulating the growth of mushrooms.
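
For what it's worth, here is a rough back-of-the-envelope check of those figures. It is only a sketch: it assumes progress is linear in CPU time (which may well not hold for a task that is misbehaving), it may differ from how the client works out its own "remaining" figure, and the function name is made up for illustration.

```python
def projected_hours(cpu_hours, percent_done):
    """Linear extrapolation: return (total, remaining) CPU hours.

    Assumption: progress is proportional to CPU time spent so far.
    This is a rough guess, not the client's own estimate.
    """
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    total = cpu_hours * 100.0 / percent_done
    return total, total - cpu_hours


# Figures reported in this post (both were "almost", so treat as approximate).
for cpu, pct in [(6.0, 3.5), (8.0, 4.8)]:
    total, remaining = projected_hours(cpu, pct)
    print(f"{pct}% after {cpu}h -> about {total:.0f}h total, {remaining:.0f}h to go")
```

On that assumption, both readings point to well over 100 hours in total, far beyond the 16h and 22h the client was showing.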
[Edit 1 times, last edit by JollyJimmy at Feb 3, 2011 8:02:25 PM]
[Feb 3, 2011 5:27:17 PM]
GB033533
Senior Cruncher
UK
Joined: Dec 8, 2004
Post Count: 198
Re: extremely long running w/u

I too have had a couple of these bad boys;

ts02_ c283_ sr67b1_ 1-- IBM-60B0387EC84 Error 2/1/11 13:53:51 2/3/11 18:14:30 13.13 207.8 / 0.0
ts02_ c283_ sr78b0_ 0-- IBM-60B0387EC84 Error 2/1/11 13:53:51 2/3/11 13:04:32 13.09 204.3 / 0.0

both with <message>Maximum elapsed time exceeded</message>

Am I likely to get any credit for these? The worrying thing is that the replacement wingman for the second one completed in a normal time of 1.35 hours;

ts02_ c283_ sr78b0_ 2-- 617 Pending Validation 2/3/11 13:27:40 2/3/11 16:10:39 1.35 23.7 / 0.0

Nothing yet from the original wingmen.
[Feb 3, 2011 6:25:40 PM]
verheyde
Cruncher
Belgium
Joined: Dec 7, 2004
Post Count: 25
Re: extremely long running w/u

Got one of those running. It has run for more than 16h CPU already and is only at 6.66%:
ts02_c395_sr45b1_0

I'll leave it running for now (and have sent the info to support).
[Feb 3, 2011 9:56:36 PM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325
Re: extremely long running w/u

I have also had a long running work unit:
ts02_c395_sr45b1_3
It was 2.33% complete after 2 hours 48 min, when they usually take only 1.5 hours.
I tried a re-boot of the PC: the work unit restarted and even checkpointed but so slowly that the time to completion was over 10 hours and increasing.
I looked around and found the wcg_checkpoint_00.ckp file. In this I noticed messages of the type:
EIPHIFS: WARNING. dihedral 5 is almost linear.
derivatives may be affected for atoms: 11 13 12 16
EIPHIFS> Total of 17 WARNINGs issued.

and

EPHI: WARNING. dihedral 9 is almost linear.
derivatives may be affected for atoms: 2 56 3 36
TOTAL OF 135 WARNINGS FROM EPHI

I did not see similar warnings in sr units which ran normally.
The above may or may not be relevant to the problem.
I have aborted the work unit. The wingmen's results were: in progress, user aborted, and error (max CPU exceeded).
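
In case anyone wants to compare their own units, here is a minimal sketch of how the warnings quoted above could be counted. It assumes the checkpoint file can be read as plain text and that the warning lines look exactly like the ones I pasted; the function name is just for illustration.

```python
import re
from collections import Counter

# Matches lines such as "EPHI: WARNING. dihedral 9 is almost linear."
WARNING_RE = re.compile(
    r"^\s*(\w+):\s*WARNING\.\s*dihedral\s+\d+\s+is almost linear",
    re.IGNORECASE,
)


def count_dihedral_warnings(text):
    """Count 'almost linear' dihedral warnings, grouped by routine name."""
    counts = Counter()
    for line in text.splitlines():
        match = WARNING_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = """\
EIPHIFS: WARNING. dihedral 5 is almost linear.
derivatives may be affected for atoms: 11 13 12 16
EPHI: WARNING. dihedral 9 is almost linear.
derivatives may be affected for atoms: 2 56 3 36
"""
print(count_dihedral_warnings(sample))
```

To run it against a real file, the text could be read with something like `open("wcg_checkpoint_00.ckp", errors="replace").read()`, again assuming the file is text. A non-zero count would not prove anything by itself, but it would make it easier to compare slow units with normal ones.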
[Feb 4, 2011 10:35:52 AM]
JollyJimmy
Advanced Cruncher
USA
Joined: Aug 23, 2005
Post Count: 115
Re: extremely long running w/u

The task I noted yesterday has timed out.
Here are the details:
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours)
ts02_ b483_ sr02b0_ 1-- | 617 | Error | 1/29/11 07:50:34 | 2/4/11 05:22:20 | 15.19


Result Log

Result Name: ts02_ b483_ sr02b0_ 1--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x77540004

Engaging BOINC Windows Runtime Debugger...
[snip]
[Feb 4, 2011 12:54:48 PM]
ov7
Cruncher
Joined: May 14, 2009
Post Count: 15
Re: extremely long running w/u

I have another one: ts02_c432_pda0004
It has been calculating for 17 hours, with 8 remaining, and is 0.000% complete!
BOINC Manager 6.2.28

I think I will shoot it...
[Feb 4, 2011 1:13:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: extremely long running w/u

Please do reboot the client (a soft reboot, of course). For some, including me, it kicked the task out of an endless loop and let it finish normally.

--//--
[Feb 4, 2011 1:19:38 PM]
Powhatan
Advanced Cruncher
Joined: Oct 20, 2009
Post Count: 58
Re: extremely long running w/u

While you are crunching on bugs (tasty!), any advice? "Keep 'er running" or "abort and abandon"?
Would hate to time out after 12h or dump even 22h into an error.

Didn't see a response to this. I've had 3 WUs like these go to 12.5 hours and then error out, and I have 3 more that look like they are going to do the same. If it's a waste of CPU cycles, I'd like to abort them. I've rebooted, but the problem persists. The device is Win 7 x64; my other devices (x64 and x86) do not have this problem.

ts02_ c283_ sqa008_ 0-- patawomeck Error 2/1/11 13:30:27 2/3/11 23:40:48 12.56 262.6 / 0.0
ts02_ c283_ sqa003_ 1-- patawomeck Error 2/1/11 13:30:07 2/3/11 15:26:33 12.67 264.9 / 0.0
ts02_ c283_ sda002_ 0-- patawomeck Error 2/1/11 13:13:05 2/3/11 15:26:33 12.52 261.8 / 0.0

ts02_ c283_ sr45a1_ 0-- patawomeck In Progress 2/1/11 13:52:50 2/11/11 13:52:50 0.00 0.0 / 0.0
ts02_ c283_ sr45a0_ 1-- patawomeck In Progress 2/1/11 13:52:50 2/11/11 13:52:50 0.00 0.0 / 0.0
ts02_ c284_ sr34b1_ 1-- patawomeck In Progress 2/1/11 13:55:45 2/11/11 13:55:45 0.00 0.0 / 0.0
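
In case it helps anyone sift through their own results list, here is a rough sketch that picks out the errored rows with a high CPU time from lines pasted in the format above. The column meanings are my assumption (name, device, status, sent, due/returned, CPU hours, then what I take to be claimed / granted credit), and the 12-hour threshold and function name are only illustrative.

```python
import re

ROW_RE = re.compile(
    r"^(?P<name>\S+ \S+ \S+ \S+)\s+"          # result name as displayed (4 tokens)
    r"(?P<device>\S+)\s+"                      # device name (or app version)
    r"(?P<status>.+?)\s+"                      # e.g. Error, In Progress
    r"(?P<sent>\d+/\d+/\d+ \d+:\d+:\d+)\s+"    # sent time
    r"(?P<due>\d+/\d+/\d+ \d+:\d+:\d+)\s+"     # time due / return time
    r"(?P<cpu_hours>\d+\.\d+)\s+"              # CPU time in hours
    r"(?P<claimed>\d+\.\d+) / (?P<granted>\d+\.\d+)$"  # assumed: credit
)


def long_running_errors(rows, min_cpu_hours=12.0):
    """Return (name, cpu_hours) for errored rows at or above the threshold."""
    found = []
    for row in rows:
        m = ROW_RE.match(row.strip())
        if not m:
            continue  # skip lines that do not look like a result row
        cpu = float(m.group("cpu_hours"))
        if m.group("status") == "Error" and cpu >= min_cpu_hours:
            # Remove the display spaces inside the result name.
            found.append((m.group("name").replace(" ", ""), cpu))
    return found


rows = [
    "ts02_ c283_ sqa008_ 0-- patawomeck Error 2/1/11 13:30:27 2/3/11 23:40:48 12.56 262.6 / 0.0",
    "ts02_ c283_ sr45a1_ 0-- patawomeck In Progress 2/1/11 13:52:50 2/11/11 13:52:50 0.00 0.0 / 0.0",
]
print(long_running_errors(rows))
```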
[Feb 4, 2011 2:50:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: extremely long running w/u

There was no "official" response on what to do. The techs have a range of reference WU names to work with to see if they can reproduce this in the labs. It seems to hit predominantly the 's' types, but I had a 'p' myself that did this. Maybe the techs can teach the validator to look for that "Maximum elapsed time exceeded" line in the Result log and then take these out of circulation automatically until a fix is in place.

If your wingmen are OK and your device produces an above-average number of these (I only had 2 in about 250 on my quad, of which 1 finished properly after a restart), then your particular device could be of interest, so you might want to post the startup section of the message log so that we can see the full setup. Certainly if the wingmen are OK, I'd abort the overlong-running ones unless instructed otherwise for diagnostic purposes.
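
To illustrate the kind of check I mean (this is only a sketch of the text match, not how the validator actually works, and the function name is invented), something along these lines would pick the message out of a Result log:

```python
import re

# Captures the text between <message> and </message>, across line breaks.
MESSAGE_RE = re.compile(r"<message>\s*(.*?)\s*</message>", re.DOTALL)


def exceeded_elapsed_time(result_log):
    """Return True if any <message> block reports the elapsed-time limit."""
    return any(
        "maximum elapsed time exceeded" in message.lower()
        for message in MESSAGE_RE.findall(result_log)
    )


# Excerpt in the shape of the Result log posted earlier in this thread.
sample_log = """\
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
"""
print(exceeded_elapsed_time(sample_log))
```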

--//--
[Feb 4, 2011 3:15:25 PM]