Thread Status: Active
Total posts in this thread: 37
This topic has been viewed 3219 times and has 36 replies
pramo
Veteran Cruncher
USA
Joined: Dec 14, 2005
Post Count: 703
Status: Offline
extremely long running DDD2 w/u

Wondering whether or not to let it run; I can go either way. ts02_c201_sda004
running 8:07 hours, progress 9.666%, 19:59:51 to go (and rising).
Progress is going up slowly... Wingman has not reported in.
Other units are starting and finishing around this one.

Haven't had one like this for... well, I don't remember when. This is the only odd unit out of 53 PV w/u's and 28+ pages of Valid (going back to 6 January) on this particular box.

Box runs 1 CEP too, no issues with that...
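
For a rough sanity check on the numbers above, here is a minimal sketch that extrapolates total run time from elapsed time and percent complete. It assumes progress stays linear, which may well not hold for this work unit; the function names are made up for illustration and are not part of BOINC.

```python
def parse_hms(text):
    """Convert 'H:MM' or 'H:MM:SS' into seconds (missing fields count as 0)."""
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:
        parts.append(0)
    hours, minutes, seconds = parts
    return hours * 3600 + minutes * 60 + seconds


def linear_projection(elapsed_s, progress_pct):
    """Return (total, remaining) seconds, assuming progress stays linear."""
    total = elapsed_s / (progress_pct / 100.0)
    return total, total - elapsed_s


elapsed = parse_hms("8:07")                    # 8 h 07 min, from the post
total, remaining = linear_projection(elapsed, 9.666)
client_estimate = parse_hms("19:59:51")        # "to go" value shown by the client

print(f"projected total:     {total / 3600:.1f} h")
print(f"projected remaining: {remaining / 3600:.1f} h")
print(f"client estimate:     {client_estimate / 3600:.1f} h")
```

On these figures the linear projection comes out at roughly 84 hours in total, far above the client's "to go" value of about 20 hours, which fits with that estimate still rising.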
----------------------------------------

----------------------------------------
[Edit 2 times, last edit by pramo at Feb 4, 2011 12:16:57 AM]
[Feb 1, 2011 6:07:49 PM]
Randzo
Senior Cruncher
Slovakia
Joined: Jan 10, 2008
Post Count: 339
Status: Offline
Re: extremely long running w/u

[Feb 1, 2011 7:36:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: extremely long running w/u

Just gave up (aborted) on these two after 2 hrs with < 1% progress on my Core i7-720QM. Other wingers had errors after long runs.

ts02_c201_sr89a0
ts02_c223_sr34a0

-j
----------------------------------------
[Edit 1 times, last edit by Former Member at Feb 2, 2011 7:00:19 AM]
[Feb 2, 2011 4:12:18 AM]
Jack007
Master Cruncher
CANADA
Joined: Feb 25, 2005
Post Count: 1604
Status: Offline
Re: extremely long running w/u

Yeah, I got one that will probably run into errorville.
Oh well... gonna let it run;
hopefully they can learn from them.
----------------------------------------

[Feb 2, 2011 5:38:04 AM]
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Status: Offline
Re: extremely long running w/u

I've been seeing the same thing. The WU runs for 11+ hours showing less than 10% complete, then ends at a little over 12 hours of CPU time with:
Unrecoverable error for result ts02_c237_sr89b1_0 (Maximum elapsed time exceeded)

Based on what I saw a couple of days ago, we do get credit for the time and points.
Interesting thing is that the "Error" detail shows:
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76F722A1

Engaging BOINC Windows Runtime Debugger...

and then a complete dump. The wingmen that errored show the same. One would think that a planned end should not invoke the error handler. I guess the coders could not figure out how to end gracefully, so they just branched to opcode zero.
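
For anyone comparing dumps, here is a minimal sketch that pulls the exception code and address out of the "Reason:" line. 0x80000003 is the standard Windows status code for a breakpoint exception (STATUS_BREAKPOINT). The small lookup table and the function name are illustrative assumptions, not anything taken from BOINC.

```python
import re

# Standard Windows NTSTATUS values; the table is deliberately tiny and
# only here for illustration.
KNOWN_CODES = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

REASON_RE = re.compile(
    r"Reason:\s*(?P<text>.+?)\s*\((?P<code>0x[0-9A-Fa-f]+)\)"
    r"\s*at address\s*(?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_reason(line):
    """Return the parsed fields of a 'Reason:' line, or None if it does not match."""
    m = REASON_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"), 16)
    return {
        "text": m.group("text"),
        "code": code,
        "name": KNOWN_CODES.get(code, "unknown"),
        "address": int(m.group("addr"), 16),
    }


sample = "Reason: Breakpoint Encountered (0x80000003) at address 0x76F722A1"
info = parse_reason(sample)
print(info["text"], hex(info["code"]), info["name"], hex(info["address"]))
```

A breakpoint code rather than an access violation is at least consistent with the task being stopped on purpose once the time limit was hit, but the dump alone does not prove that.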

Others I've seen in the same condition:

In Progress:
WU                  Elapsed (CPU)        CPU%   Progress%  Time Left
ts02_c237_sr78b0_0  11:17:11 (11:08:24)  98.67  7.166%     1d:05:16:21

Errors today in the same condition:
ts02_c237_sr56b0_0-- 617 Error 1/31/11 23:14:47 2/2/11 05:08:07 12.58 236.9 / 0.0 <- 2 wingmen IP
ts02_c237_sr34b0_0-- 617 Error 1/31/11 23:14:12 2/2/11 05:08:07 12.57 236.7 / 0.0 <- 2 wingmen IP
ts02_c201_sr89b1_0-- 617 Error 1/31/11 17:16:51 2/2/11 00:08:29 12.56 236.5 / 0.0 <- 1 wingman with same - 2 IP

From 2 days ago:
ts02_b483_sqb005_1-- 617 Error 1/29/11 07:31:10 1/30/11 12:53:34 12.47 226.8 / 226.8
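
To compare the errored results side by side, here is a minimal sketch that reads rows like the ones above. It assumes the last four tokens of each row are CPU hours, claimed credit, a slash, and granted credit; that reading of the columns is an assumption based on the numbers quoted in this post. The rows are copied from above with the spaces inside the result names removed.

```python
# Rows copied from the post, with the spaces inside the result names removed.
ROWS = [
    "ts02_c237_sr56b0_0-- 617 Error 1/31/11 23:14:47 2/2/11 05:08:07 12.58 236.9 / 0.0",
    "ts02_c237_sr34b0_0-- 617 Error 1/31/11 23:14:12 2/2/11 05:08:07 12.57 236.7 / 0.0",
    "ts02_c201_sr89b1_0-- 617 Error 1/31/11 17:16:51 2/2/11 00:08:29 12.56 236.5 / 0.0",
    "ts02_b483_sqb005_1-- 617 Error 1/29/11 07:31:10 1/30/11 12:53:34 12.47 226.8 / 226.8",
]


def parse_row(row):
    """Split one row; assumed tail layout: cpu_hours claimed / granted."""
    parts = row.split()
    return {
        "name": parts[0].rstrip("-"),
        "cpu_hours": float(parts[-4]),
        "claimed": float(parts[-3]),
        "granted": float(parts[-1]),
    }


for row in ROWS:
    r = parse_row(row)
    rate = r["claimed"] / r["cpu_hours"]
    print(f"{r['name']}: {r['cpu_hours']:.2f} h, "
          f"{rate:.1f} claimed credits/h, granted {r['granted']}")
```

All four rows stop between 12.47 and 12.58 CPU hours, and only the one from 1/30/11 shows granted credit so far.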

So I think you should let it continue.

Bill P
----------------------------------------
Bill P

[Feb 2, 2011 6:03:17 AM]
FAHE
Advanced Cruncher
Australia
Joined: Apr 27, 2007
Post Count: 122
Status: Offline
Re: extremely long running w/u

I have one of these long WUs too: ts02_c247_sr91a0_1, running 13 hours so far, progress 8.8%, 32 hours to go and rising. My wingman is still showing IP as well. Will let it run for a while longer and hopefully it will break out soon.
Peter
----------------------------------------

[Feb 2, 2011 7:26:37 AM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: extremely long running w/u

I have one as well... ts02_c283_sdb003_1, now at 7 hours, and I've seen progress climb from 6.500% to 6.666% as I type this. Might as well let it run...

Wonder if they made this one too big.

Edit: Decided to follow Sek's advice and restart the WCG client since it has just checkpointed. Will see what happens.

Edit 2: Didn't make any difference. Timed out after 12 hours.

Result Name: ts02_c283_sdb003_1--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Copying wcgrestart.rst


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 6.3.3


Dump Timestamp : 02/02/11 15:16:38
Install Directory : C:\Program Files\BOINC\
Data Directory : C:\Documents and Settings\All Users\Application Data\BOINC
Project Symstore :
Loaded Library : C:\Program Files\BOINC\\dbghelp.dll
Loaded Library : C:\Program Files\BOINC\\symsrv.dll
Loaded Library : C:\Program Files\BOINC\\srcsrv.dll
LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\Documents and Settings\All Users\Application Data\BOINC\slots\1;C:\Documents and Settings\All Users\Application Data\BOINC\projects\www.worldcommunitygrid.org


ModLoad: 00400000 16445000 C:\Documents and Settings\All Users\Application Data\BOINC\projects\www.worldcommunitygrid.org\wcg_dddt2_charmm_6.17_windows_intelx86 (-nosymbols- Symbols Loaded)

ModLoad: 7c900000 000b2000 C:\WINDOWS\system32\ntdll.dll (5.1.2600.5755) (-exported- Symbols Loaded)
File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5755

ModLoad: 7c800000 000f6000 C:\WINDOWS\system32\kernel32.dll (5.1.2600.5781) (-exported- Symbols Loaded)
File Version : 5.1.2600.5781 (xpsp_sp3_gdr.090321-1317)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5781

ModLoad: 77c00000 00008000 C:\WINDOWS\system32\VERSION.dll (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 7e410000 00091000 C:\WINDOWS\system32\USER32.dll (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 77f10000 00049000 C:\WINDOWS\system32\GDI32.dll (5.1.2600.5698) (-exported- Symbols Loaded)
File Version : 5.1.2600.5698 (xpsp_sp3_gdr.081022-1932)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5698

ModLoad: 77dd0000 0009b000 C:\WINDOWS\system32\ADVAPI32.dll (5.1.2600.5755) (-exported- Symbols Loaded)
File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5755

ModLoad: 77e70000 00093000 C:\WINDOWS\system32\RPCRT4.dll (5.1.2600.6015) (-exported- Symbols Loaded)
File Version : 5.1.2600.6015 (xpsp_sp3_gdr.100721-1631)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.6015

ModLoad: 77fe0000 00011000 C:\WINDOWS\system32\Secur32.dll (5.1.2600.5834) (-exported- Symbols Loaded)
File Version : 5.1.2600.5834 (xpsp_sp3_gdr.090624-1305)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5834

ModLoad: 76c90000 00028000 C:\WINDOWS\system32\imagehlp.dll (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 77c10000 00058000 C:\WINDOWS\system32\msvcrt.dll (7.0.2600.5512) (-exported- Symbols Loaded)
File Version : 7.0.2600.5512 (xpsp.080413-2111)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 7.0.2600.5512

ModLoad: 76390000 0001d000 C:\WINDOWS\system32\IMM32.DLL (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 629c0000 00009000 C:\WINDOWS\system32\LPK.DLL (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2105)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 74d90000 0006b000 C:\WINDOWS\system32\USP10.dll (1.420.2600.5969) (-exported- Symbols Loaded)
File Version : 1.0420.2600.5969 (xpsp_sp3_gdr.100416-1716)
Company Name : Microsoft Corporation
Product Name : Microsoft(R) Uniscribe Unicode script processor
Product Version : 1.0420.2600.5969

ModLoad: 77690000 00021000 C:\WINDOWS\system32\NTMARTA.DLL (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 774e0000 0013e000 C:\WINDOWS\system32\ole32.dll (5.1.2600.6010) (-exported- Symbols Loaded)
File Version : 5.1.2600.6010 (xpsp_sp3_gdr.100712-1633)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.6010

ModLoad: 71bf0000 00013000 C:\WINDOWS\system32\SAMLIB.dll (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 76f60000 0002c000 C:\WINDOWS\system32\WLDAP32.dll (5.1.2600.5512) (-exported- Symbols Loaded)
File Version : 5.1.2600.5512 (xpsp.080413-2113)
Company Name : Microsoft Corporation
Product Name : MicrosoftWindowsOperating System
Product Version : 5.1.2600.5512

ModLoad: 1fb00000 00115000 C:\Program Files\BOINC\dbghelp.dll (6.8.4.0) (-exported- Symbols Loaded)
File Version : 6.8.0004.0 (debuggers(dbg).070515-1751)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.8.0004.0

ModLoad: 00390000 00048000 C:\Program Files\BOINC\symsrv.dll (6.8.4.0) (-exported- Symbols Loaded)
File Version : 6.8.0004.0 (debuggers(dbg).070515-1751)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.8.0004.0

ModLoad: 1fd20000 0003b000 C:\Program Files\BOINC\srcsrv.dll (6.8.4.0) (-exported- Symbols Loaded)
File Version : 6.8.0004.0 (debuggers(dbg).070515-1751)
Company Name : Microsoft Corporation
Product Name : Debugging Tools for Windows(R)
Product Version : 6.8.0004.0



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 37523, Write: 0, Other 175581

- I/O Transfers Counters -
Read: 0, Write: 20833, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 382980, QuotaPeakPagedPoolUsage: 383100
QuotaNonPagedPoolUsage: 2368, QuotaPeakNonPagedPoolUsage: 2704

- Virtual Memory Usage -
VirtualSize: 538206208, PeakVirtualSize: 538329088

- Pagefile Usage -
PagefileUsage: 514895872, PeakPagefileUsage: 514895872

- Working Set Size -
WorkingSetSize: 27656192, PeakWorkingSetSize: 27656192, PageFaultCount: 8864

*** Dump of thread ID 7548 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 156250.000000, User Time: 1093750.000000, Wait Time: 49244624.000000

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

- Registers -
eax=00000000 ebx=00000000 ecx=00b2a746 edx=1cd5613c esi=7c802446 edi=00000001
eip=7c90120e esp=1cd5fb9c ebp=1cd5ffec
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
1cd5ffec 00000000 0043f280 00000000 00000000 00000000 ntdll!DbgBreakPoint+0x0

*** Dump of thread ID 5916 (state: Running): ***

- Information -
Status: Base Priority: Above Normal, Priority: Above Normal, , Kernel Time: 30937500.000000, User Time: 211600785408.000000, Wait Time: 49244624.000000

- Registers -
eax=124705b8 ebx=124b32c0 ecx=00001b98 edx=0000000b esi=1251b3a8 edi=124e99d0
eip=0092c64f esp=1884d290 ebp=1884d318
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
1884d318 0092c0e7 00000000 80000000 00000000 1884d450 wcg_dddt2_charmm_6!+0x0
1884d480 0092b5c0 1884d550 00a4841b 0ac26d50 11a00860 wcg_dddt2_charmm_6!+0x0
1884d574 009fa9f0 138e7600 1341dc40 135089c0 135f3740 wcg_dddt2_charmm_6!+0x0
1884d808 008b3126 1341dc40 135089c0 135f3740 11915ae0 wcg_dddt2_charmm_6!+0x0
1884da48 00983847 1341dc40 135089c0 135f3740 11915ae0 wcg_dddt2_charmm_6!+0x0
1884e158 0066ee5a 12308fa0 12306808 12304070 133a8580 wcg_dddt2_charmm_6!+0x0
1884ec08 006683d6 00357008 00000018 1884ec9c 12315598 wcg_dddt2_charmm_6!+0x0
1884eebc 00447312 14204000 138e61e0 00001388 00000020 wcg_dddt2_charmm_6!+0x0
1884f048 00445640 00b2b3b0 138e61e0 ffffffff 00445640 wcg_dddt2_charmm_6!+0x0
1884f24c 0042d6e2 00a700a6 00a900a8 00ab00aa 00ad00ac wcg_dddt2_charmm_6!+0x0
1884fee4 00b03052 0000001a 00352c10 00352e28 00000094 wcg_dddt2_charmm_6!+0x0
1884ffc0 7c817077 00592ef0 00000000 7ffde000 805512fa wcg_dddt2_charmm_6!+0x0
1884fff0 00000000 00b02ee6 00000000 00000000 00000000 kernel32!RegisterWaitForInputIdle+0x0

*** Dump of thread ID 5152 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 47528212.000000

- Registers -
eax=00b05e35 ebx=000000e0 ecx=1884eab0 edx=00000000 esi=000000e0 edi=00000000
eip=7c90e514 esp=1ed5fef0 ebp=1ed5ff54
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
1ed5ff54 7c802542 000000e0 ffffffff 00000000 1ed5ff80 ntdll!KiFastSystemCallRet+0x0
1ed5ff68 00ac5d2f 000000e0 ffffffff 0035f428 0035f428 kernel32!WaitForSingleObject+0x0
1ed5ff80 00b05ea1 00356c38 7c910041 00041318 0035f428 wcg_dddt2_charmm_6!+0x0
1ed5ffb4 7c80b729 0035f428 7c910041 00041318 0035f428 wcg_dddt2_charmm_6!+0x0
1ed5ffec 00000000 00b05e35 0035f428 00000000 00000008 kernel32!GetModuleFileNameA+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>
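
A small sketch for reading the thread times in a dump like this one. It assumes the Kernel Time and User Time values are counts of 100-nanosecond intervals, the unit Windows normally uses for thread times. The dump itself does not state the unit, so treat the result as a guess. The function name is illustrative.

```python
import re

# Assumption: Kernel/User Time values are 100 ns intervals.
TICKS_PER_SECOND = 10_000_000

TIME_RE = re.compile(r"(Kernel|User) Time: ([0-9.]+)")


def thread_cpu_seconds(status_line):
    """Return {'kernel': seconds, 'user': seconds} parsed from a Status line."""
    return {
        kind.lower(): float(value) / TICKS_PER_SECOND
        for kind, value in TIME_RE.findall(status_line)
    }


# Status line of the running thread (ID 5916), copied from the dump.
line = (
    "Status: Base Priority: Above Normal, Priority: Above Normal, , "
    "Kernel Time: 30937500.000000, User Time: 211600785408.000000, "
    "Wait Time: 49244624.000000"
)
times = thread_cpu_seconds(line)
print(f"kernel: {times['kernel']:.1f} s")
print(f"user:   {times['user'] / 3600:.2f} h")
```

Under that assumption the worker thread had used a little under 6 hours of user CPU time in this process. That would be roughly in line with a process restarted around the 7-hour mark and stopped near 12 hours, though the dump does not confirm it.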
----------------------------------------
[Edit 2 times, last edit by bieberj at Feb 2, 2011 8:49:48 PM]
[Feb 2, 2011 12:42:56 PM]
pramo
Veteran Cruncher
USA
Joined: Dec 14, 2005
Post Count: 703
Status: Offline
Re: extremely long running w/u

I let it run... and it errored
[ot] I wonder which one will come back first, the original with 8 days remaining or the 4 day rush job? [/ot]

ts02_c201_sda004_2-- - In Progress 2/1/11 21:07:45 2/5/11 21:07:45 0.00 0.0 / 0.0
ts02_c201_sda004_0-- 617 Error 1/31/11 16:44:34 2/1/11 21:03:51 11.27 214.2 / 0.0 <- me
ts02_c201_sda004_1-- - In Progress 1/31/11 16:44:33 2/10/11 16:44:33 0.00 0.0 / 0.0
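
The "8 days remaining" and "4 day rush job" figures can be checked with a bit of date arithmetic. This sketch assumes the two timestamps on each In Progress row are the sent time and the due time; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"


def parse(ts):
    """Parse a timestamp in the M/D/YY HH:MM:SS form used in the rows above."""
    return datetime.strptime(ts, FMT)


# Assumed meaning of the two timestamps on each row: sent time, due time.
original_sent = parse("1/31/11 16:44:33")
original_due = parse("2/10/11 16:44:33")
resend_sent = parse("2/1/11 21:07:45")
resend_due = parse("2/5/11 21:07:45")

original_window = original_due - original_sent
resend_window = resend_due - resend_sent
original_left_at_resend = original_due - resend_sent

print("original window:", original_window)
print("resend window:  ", resend_window)
print("original time left when the resend went out:", original_left_at_resend)
```

So the original copy had a 10-day window with a bit under 9 days left when the resend went out, while the resend got a 4-day window.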

Beginning of the error-
Result Name: ts02_c201_sda004_0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Copying wcgrestart.rst
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C81A3E1
Engaging BOINC Windows Runtime Debugger...
********************
<xnip>

@Randzo: Thanks for that. After reading it I tried restarting the client (exiting running apps). Tasks picked up at the checkpoint and that w/u still ran like molasses.
Other than losing a machine for some reason, the only time I've ever aborted a job is when a tech said to or there was overwhelming evidence posted in the forums... I figure if a job errors, points don't matter. If the result helps fix something, killing it out of hand seems pointless ;)
----------------------------------------

[Feb 2, 2011 12:45:51 PM]
seippel
Former World Community Grid Tech
Joined: Apr 16, 2009
Post Count: 392
Status: Offline
Re: extremely long running w/u

I'm in the process of testing several of the work units mentioned in this thread (and the other thread) to attempt to recreate the problem of very long work units that some users have reported.

Seippel
[Feb 2, 2011 8:49:22 PM]
bieberj
Senior Cruncher
United States
Joined: Dec 2, 2004
Post Count: 406
Status: Offline
Re: extremely long running w/u

Thanks Seippel! I just updated my post to reflect timeout with a dump.

Edit: P.S. As an afterthought, I recall that a few years back, when we had very long FAAH work units that took anywhere from 36 to 80 hours to complete, the CAs advised us to bump up the timeout for the specific work unit by a factor of ten. If I had done this for the task that I reported, it might have finished within 6 days.

I don't remember the procedure, but please don't try it without consulting a CA. Even if your job finishes, the rest of your wingmen will error out, and you won't have a wingman to validate your work unit so that you can earn over 2000 credits. :(
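
As a back-of-the-envelope check on the "factor of ten" idea, here is a minimal sketch. It only does the arithmetic and does not touch any client setting. It assumes linear progress and uses the figures reported earlier for ts02_c283_sdb003_1 (7 hours at 6.666%, timed out after about 12 hours); the function names are made up for illustration.

```python
def projected_total_hours(elapsed_hours, progress_pct):
    """Total run time if progress stays linear (a strong assumption here)."""
    return elapsed_hours / (progress_pct / 100.0)


def fits_limit(elapsed_hours, progress_pct, limit_hours, factor=1.0):
    """True if the projected total stays within limit_hours * factor."""
    return projected_total_hours(elapsed_hours, progress_pct) <= limit_hours * factor


# Figures reported earlier in the thread for ts02_c283_sdb003_1.
elapsed, progress, limit = 7.0, 6.666, 12.0

total = projected_total_hours(elapsed, progress)
print(f"projected total: {total:.0f} h ({total / 24:.1f} days)")
print("fits the 12 h limit:      ", fits_limit(elapsed, progress, limit))
print("fits ten times that limit:", fits_limit(elapsed, progress, limit, factor=10))
```

On those numbers the projection is about 105 hours, under a 120-hour limit, which agrees with the "within 6 days" guess. The validation problem described above still applies.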
----------------------------------------
[Edit 1 times, last edit by bieberj at Feb 2, 2011 9:17:39 PM]
[Feb 2, 2011 8:54:22 PM]