World Community Grid Forums
Category: Completed Research Forum: Discovering Dengue Drugs - Together - Phase 2 Forum Thread: extremely long running DDD2 w/u |
Thread Status: Active Total posts in this thread: 37
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 703 Status: Offline Project Badges: |
Wondering whether or not to let it run. I can go either way.
ts02_c201_sda004 running 8:07 hours, progress 9.666%. 19:59:51 to go (and rising). Progress is going up slowly... Wingman not reported in. Other units are starting and finishing around this one. Haven't had one like this for, well, I don't remember when. This is the only odd unit out of 53 PV w/u's and 28+ pages of Valid (going back to 6 January) on this particular box. Box runs 1 CEP too, no issues with that... [Edit 2 times, last edit by pramo at Feb 4, 2011 12:16:57 AM] |
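A rough way to judge whether a task like this is worth waiting for is to extrapolate its total run time from the elapsed time and the progress percentage. The sketch below assumes progress is linear in time, which may not hold for these work units, and the helper names `parse_hms` and `projected_total_hours` are made up for illustration. The figures are the ones quoted in the post above.

```python
def parse_hms(text):
    """Convert 'H:MM' or 'H:MM:SS' into hours as a float."""
    parts = [int(p) for p in text.split(":")]
    parts += [0] * (3 - len(parts))  # pad missing minutes/seconds with zero
    hours, minutes, seconds = parts
    return hours + minutes / 60 + seconds / 3600


def projected_total_hours(elapsed_hours, progress_percent):
    """Linear extrapolation (an assumption): total = elapsed / fraction done."""
    if progress_percent <= 0:
        raise ValueError("progress must be positive to extrapolate")
    return elapsed_hours * 100.0 / progress_percent


elapsed = parse_hms("8:07")               # elapsed time quoted in the post
total = projected_total_hours(elapsed, 9.666)
remaining = total - elapsed
client_estimate = parse_hms("19:59:51")   # "to go" figure shown by the client

print(f"projected total:     {total:.1f} h")
print(f"projected remaining: {remaining:.1f} h")
print(f"client's estimate:   {client_estimate:.1f} h")
```

Under that assumption the projection comes out far above the client's "to go" figure, which fits the observation that the estimate keeps rising. Later posts in this thread report such tasks being cut off at a little over 12 hours.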
||
|
Randzo
Senior Cruncher Slovakia Joined: Jan 10, 2008 Post Count: 339 Status: Offline Project Badges: |
Read last 3 posts from this thread:
http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,30200_offset,80 |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Just gave up (aborted) on these two after 2 hrs with < 1% progress on my Core i7-720QM. Other wingers had errors after long runs.
ts02_c201_sr89a0
ts02_c223_sr34a0
-j [Edit 1 times, last edit by Former Member at Feb 2, 2011 7:00:19 AM] |
||
|
Jack007
Master Cruncher CANADA Joined: Feb 25, 2005 Post Count: 1604 Status: Offline Project Badges: |
Yeah, I got one that will probably run into error-ville.
Oh well... gonna let it run, hopefully they can learn from them. |
||
|
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
pramo wrote: Wondering whether or not to let it run. I can go either way. ts02_c201_sda004 running 8:07 hours, progress 9.666%. 19:59:51 to go (and rising). Progress is going up slowly... Wingman not reported in. Other units are starting and finishing around this one. Haven't had one like this for, well, I don't remember when. This is the only odd unit out of 53 PV w/u's and 28+ pages of Valid (going back to 6 January) on this particular box. Box runs 1 CEP too, no issues with that...

I've been seeing the same thing. WU runs for 11+ hours showing less than 10% complete, then ends with:

Unrecoverable error for result ts02_c237_sr89b1_0 (Maximum elapsed time exceeded)

at a little over 12 hours CPU time. Based on what I saw a couple of days ago, we do get credit for the time and points.

Interesting thing is that the "Error" detail shows:

Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x76F722A1
Engaging BOINC Windows Runtime Debugger...

and then a complete dump. The wingmen that errored show the same. One would think that if it is a planned end it should not invoke the error handler? Guess the coders could not figure out how to end gracefully so just branched to op code zero.

Others I've seen in the same condition:

In Progress:
WU / Elapsed(CPU) / CPU% / Progress% / Time Left
ts02_c237_sr78b0_0 / 11:17:11(11:08:24) / 98.67 / 7.166% / 1d:05:16:21

Errors today in the same condition:
ts02_ c237_ sr56b0_ 0-- 617 Error 1/31/11 23:14:47 2/2/11 05:08:07 12.58 236.9 / 0.0 <- 2 wingmen IP
ts02_ c237_ sr34b0_ 0-- 617 Error 1/31/11 23:14:12 2/2/11 05:08:07 12.57 236.7 / 0.0 <- 2 wingmen IP
ts02_ c201_ sr89b1_ 0-- 617 Error 1/31/11 17:16:51 2/2/11 00:08:29 12.56 236.5 / 0.0 <- 1 wingman with same - 2 IP

From 2 days ago:
ts02_ b483_ sqb005_ 1-- 617 Error 1/29/11 07:31:10 1/30/11 12:53:34 12.47 226.8 / 226.8

So I think you should let it continue.

Bill P
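For anyone collecting rows like these from the Result Status page, a small parser makes it easier to compare CPU time and credit across tasks. This is only a sketch: the column meanings (application version, status, sent time, returned time, CPU hours, claimed / granted credit) are inferred from the pasted rows rather than taken from any official format, and `ROW` and `parse_row` are hypothetical names.

```python
import re

# Pattern inferred from the rows pasted in this thread (an assumption,
# not an official format): name-- version status sent returned cpu claimed / granted
ROW = re.compile(
    r"(?P<name>ts\d+_\s*[A-Za-z0-9]+_\s*[A-Za-z0-9]+_\s*\d+)--\s+"
    r"(?P<version>\d+|-)\s+"
    r"(?P<status>[A-Za-z ]+?)\s+"
    r"(?P<sent>\d+/\d+/\d+\s+\d+:\d+:\d+)\s+"
    r"(?P<returned>\d+/\d+/\d+\s+\d+:\d+:\d+)\s+"
    r"(?P<cpu_hours>\d+\.\d+)\s+"
    r"(?P<claimed>\d+\.\d+)\s*/\s*(?P<granted>\d+\.\d+)"
)


def parse_row(line):
    """Return a dict of fields for one pasted result row, or None if no match."""
    match = ROW.search(line)
    if match is None:
        return None
    row = match.groupdict()
    row["name"] = re.sub(r"\s+", "", row["name"])  # drop the copy/paste spaces
    for key in ("cpu_hours", "claimed", "granted"):
        row[key] = float(row[key])
    return row


LINES = [
    "ts02_ c237_ sr56b0_ 0-- 617 Error 1/31/11 23:14:47 2/2/11 05:08:07 12.58 236.9 / 0.0<- 2 wingmen IP",
    "ts02_ b483_ sqb005_ 1-- 617 Error 1/29/11 07:31:10 1/30/11 12:53:34 12.47 226.8 / 226.8",
]

rows = [parse_row(line) for line in LINES]
for row in rows:
    credited = "credit granted" if row["granted"] > 0 else "credit not granted yet"
    print(f'{row["name"]}: {row["status"]}, {row["cpu_hours"]} h CPU, {credited}')
```

The second row is the one from two days earlier, where the granted credit matches the claimed credit even though the status is Error.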
|
||
|
FAHE
Advanced Cruncher Australia Joined: Apr 27, 2007 Post Count: 122 Status: Offline Project Badges: |
I have one of these long WUs too. ts02_c247_sr91a0_1 running 13 hours so far, progress 8.8%, 32 hours to go and rising. My wingman is still showing IP as well. Will let it run for a while longer and hopefully it will break out soon.
Peter |
||
|
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline Project Badges: |
I have one as well... ts02_c283_sdb003_1 - now at 7 hours and I've seen progress climb from 6.500% to 6.666% as I type this. Might as well let it run...
----------------------------------------Wonder if they made this one too big. Edit: Decided to follow Sek's advice and restart the WCG client since it has just checkpointed. Will see what happens. Edit <2> Didn't make any difference. Timed out after 12 hours. Result Name: ts02_ c283_ sdb003_ 1-- <core_client_version>6.2.28</core_client_version> <![CDATA[ <message> Maximum CPU time exceeded </message> <stderr_txt> INFO: No state to restore. Start from the beginning. Copying wcgrestart.rst Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 6.3.3 Dump Timestamp : 02/02/11 15:16:38 Install Directory : C:\Program Files\BOINC\ Data Directory : C:\Documents and Settings\All Users\Application Data\BOINC Project Symstore : Loaded Library : C:\Program Files\BOINC\\dbghelp.dll Loaded Library : C:\Program Files\BOINC\\symsrv.dll Loaded Library : C:\Program Files\BOINC\\srcsrv.dll LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:\Documents and Settings\All Users\Application Data\BOINC\slots\1;C:\Documents and Settings\All Users\Application Data\BOINC\projects\www.worldcommunitygrid.org ModLoad: 00400000 16445000 C:\Documents and Settings\All Users\Application Data\BOINC\projects\www.worldcommunitygrid.org\wcg_dddt2_charmm_6.17_windows_intelx86 (-nosymbols- Symbols Loaded) ModLoad: 7c900000 000b2000 C:\WINDOWS\system32\ntdll.dll (5.1.2600.5755) (-exported- Symbols Loaded) File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5755 ModLoad: 7c800000 000f6000 C:\WINDOWS\system32\kernel32.dll (5.1.2600.5781) (-exported- Symbols Loaded) File Version : 5.1.2600.5781 (xpsp_sp3_gdr.090321-1317) 
Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5781 ModLoad: 77c00000 00008000 C:\WINDOWS\system32\VERSION.dll (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 7e410000 00091000 C:\WINDOWS\system32\USER32.dll (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 77f10000 00049000 C:\WINDOWS\system32\GDI32.dll (5.1.2600.5698) (-exported- Symbols Loaded) File Version : 5.1.2600.5698 (xpsp_sp3_gdr.081022-1932) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5698 ModLoad: 77dd0000 0009b000 C:\WINDOWS\system32\ADVAPI32.dll (5.1.2600.5755) (-exported- Symbols Loaded) File Version : 5.1.2600.5755 (xpsp_sp3_gdr.090206-1234) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5755 ModLoad: 77e70000 00093000 C:\WINDOWS\system32\RPCRT4.dll (5.1.2600.6015) (-exported- Symbols Loaded) File Version : 5.1.2600.6015 (xpsp_sp3_gdr.100721-1631) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.6015 ModLoad: 77fe0000 00011000 C:\WINDOWS\system32\Secur32.dll (5.1.2600.5834) (-exported- Symbols Loaded) File Version : 5.1.2600.5834 (xpsp_sp3_gdr.090624-1305) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5834 ModLoad: 76c90000 00028000 C:\WINDOWS\system32\imagehlp.dll (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product 
Version : 5.1.2600.5512 ModLoad: 77c10000 00058000 C:\WINDOWS\system32\msvcrt.dll (7.0.2600.5512) (-exported- Symbols Loaded) File Version : 7.0.2600.5512 (xpsp.080413-2111) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 7.0.2600.5512 ModLoad: 76390000 0001d000 C:\WINDOWS\system32\IMM32.DLL (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 629c0000 00009000 C:\WINDOWS\system32\LPK.DLL (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 74d90000 0006b000 C:\WINDOWS\system32\USP10.dll (1.420.2600.5969) (-exported- Symbols Loaded) File Version : 1.0420.2600.5969 (xpsp_sp3_gdr.100416-1716) Company Name : Microsoft Corporation Product Name : Microsoft(R) Uniscribe Unicode script processor Product Version : 1.0420.2600.5969 ModLoad: 77690000 00021000 C:\WINDOWS\system32\NTMARTA.DLL (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 774e0000 0013e000 C:\WINDOWS\system32\ole32.dll (5.1.2600.6010) (-exported- Symbols Loaded) File Version : 5.1.2600.6010 (xpsp_sp3_gdr.100712-1633) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.6010 ModLoad: 71bf0000 00013000 C:\WINDOWS\system32\SAMLIB.dll (5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 76f60000 0002c000 C:\WINDOWS\system32\WLDAP32.dll 
(5.1.2600.5512) (-exported- Symbols Loaded) File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : MicrosoftWindowsOperating System Product Version : 5.1.2600.5512 ModLoad: 1fb00000 00115000 C:\Program Files\BOINC\dbghelp.dll (6.8.4.0) (-exported- Symbols Loaded) File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 ModLoad: 00390000 00048000 C:\Program Files\BOINC\symsrv.dll (6.8.4.0) (-exported- Symbols Loaded) File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 ModLoad: 1fd20000 0003b000 C:\Program Files\BOINC\srcsrv.dll (6.8.4.0) (-exported- Symbols Loaded) File Version : 6.8.0004.0 (debuggers(dbg).070515-1751) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.8.0004.0 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 37523, Write: 0, Other 175581 - I/O Transfers Counters - Read: 0, Write: 20833, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 382980, QuotaPeakPagedPoolUsage: 383100 QuotaNonPagedPoolUsage: 2368, QuotaPeakNonPagedPoolUsage: 2704 - Virtual Memory Usage - VirtualSize: 538206208, PeakVirtualSize: 538329088 - Pagefile Usage - PagefileUsage: 514895872, PeakPagefileUsage: 514895872 - Working Set Size - WorkingSetSize: 27656192, PeakWorkingSetSize: 27656192, PageFaultCount: 8864 *** Dump of thread ID 7548 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 156250.000000, User Time: 1093750.000000, Wait Time: 49244624.000000 - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E - Registers - eax=00000000 ebx=00000000 ecx=00b2a746 edx=1cd5613c esi=7c802446 edi=00000001 eip=7c90120e esp=1cd5fb9c ebp=1cd5ffec cs=001b ss=0023 ds=0023 
es=0023 fs=003b gs=0000 efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 1cd5ffec 00000000 0043f280 00000000 00000000 00000000 ntdll!DbgBreakPoint+0x0 *** Dump of thread ID 5916 (state: Running): *** - Information - Status: Base Priority: Above Normal, Priority: Above Normal, , Kernel Time: 30937500.000000, User Time: 211600785408.000000, Wait Time: 49244624.000000 - Registers - eax=124705b8 ebx=124b32c0 ecx=00001b98 edx=0000000b esi=1251b3a8 edi=124e99d0 eip=0092c64f esp=1884d290 ebp=1884d318 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 1884d318 0092c0e7 00000000 80000000 00000000 1884d450 wcg_dddt2_charmm_6!+0x0 1884d480 0092b5c0 1884d550 00a4841b 0ac26d50 11a00860 wcg_dddt2_charmm_6!+0x0 1884d574 009fa9f0 138e7600 1341dc40 135089c0 135f3740 wcg_dddt2_charmm_6!+0x0 1884d808 008b3126 1341dc40 135089c0 135f3740 11915ae0 wcg_dddt2_charmm_6!+0x0 1884da48 00983847 1341dc40 135089c0 135f3740 11915ae0 wcg_dddt2_charmm_6!+0x0 1884e158 0066ee5a 12308fa0 12306808 12304070 133a8580 wcg_dddt2_charmm_6!+0x0 1884ec08 006683d6 00357008 00000018 1884ec9c 12315598 wcg_dddt2_charmm_6!+0x0 1884eebc 00447312 14204000 138e61e0 00001388 00000020 wcg_dddt2_charmm_6!+0x0 1884f048 00445640 00b2b3b0 138e61e0 ffffffff 00445640 wcg_dddt2_charmm_6!+0x0 1884f24c 0042d6e2 00a700a6 00a900a8 00ab00aa 00ad00ac wcg_dddt2_charmm_6!+0x0 1884fee4 00b03052 0000001a 00352c10 00352e28 00000094 wcg_dddt2_charmm_6!+0x0 1884ffc0 7c817077 00592ef0 00000000 7ffde000 805512fa wcg_dddt2_charmm_6!+0x0 1884fff0 00000000 00b02ee6 00000000 00000000 00000000 kernel32!RegisterWaitForInputIdle+0x0 *** Dump of thread ID 5152 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 47528212.000000 - Registers - eax=00b05e35 ebx=000000e0 ecx=1884eab0 edx=00000000 esi=000000e0 edi=00000000 eip=7c90e514 esp=1ed5fef0 ebp=1ed5ff54 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 
efl=00000246 - Callstack - ChildEBP RetAddr Args to Child 1ed5ff54 7c802542 000000e0 ffffffff 00000000 1ed5ff80 ntdll!KiFastSystemCallRet+0x0 1ed5ff68 00ac5d2f 000000e0 ffffffff 0035f428 0035f428 kernel32!WaitForSingleObject+0x0 1ed5ff80 00b05ea1 00356c38 7c910041 00041318 0035f428 wcg_dddt2_charmm_6!+0x0 1ed5ffb4 7c80b729 0035f428 7c910041 00041318 0035f428 wcg_dddt2_charmm_6!+0x0 1ed5ffec 00000000 00b05e35 0035f428 00000000 00000008 kernel32!GetModuleFileNameA+0x0 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> [Edit 2 times, last edit by bieberj at Feb 2, 2011 8:49:48 PM] |
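When comparing dumps like the one above across wingmen, usually only two things matter: the BOINC `<message>` text and the exception record. The sketch below pulls those out of a pasted result log. The function name `summarize` is hypothetical, and the sample is a shortened copy of the log in this post. The code 0x80000003 is the Windows breakpoint exception, which matches the `ntdll!DbgBreakPoint` frame in the first thread's call stack.

```python
import re

# Shortened copy of the result log pasted above.
SAMPLE = """<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E
"""


def summarize(text):
    """Extract the BOINC message and the exception reason, code and address."""
    message = re.search(r"<message>\s*(.*?)\s*</message>", text, re.S)
    reason = re.search(
        r"Reason:\s*(?P<reason>.+?)\s*\((?P<code>0x[0-9A-Fa-f]+)\)"
        r"\s*at address\s*(?P<address>0x[0-9A-Fa-f]+)",
        text,
    )
    return {
        "message": message.group(1) if message else None,
        "reason": reason.group("reason") if reason else None,
        "code": reason.group("code") if reason else None,
        "address": reason.group("address") if reason else None,
    }


summary = summarize(SAMPLE)
print(summary)
```

The address differs between machines in this thread, so the message and the exception code are the fields worth comparing.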
||
|
pramo
Veteran Cruncher USA Joined: Dec 14, 2005 Post Count: 703 Status: Offline Project Badges: |
I let it run... and it errored.
[ot] I wonder which one will come back first, the original with 8 days remaining or the 4 day rush job? [/ot]

ts02_ c201_ sda004_ 2-- - In Progress 2/1/11 21:07:45 2/5/11 21:07:45 0.00 0.0 / 0.0
ts02_ c201_ sda004_ 0-- 617 Error 1/31/11 16:44:34 2/1/11 21:03:51 11.27 214.2 / 0.0 <-me
ts02_ c201_ sda004_ 1-- - In Progress 1/31/11 16:44:33 2/10/11 16:44:33 0.00 0.0 / 0.0

Beginning of the error:

Result Name: ts02_ c201_ sda004_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Copying wcgrestart.rst
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C81A3E1
Engaging BOINC Windows Runtime Debugger...
********************
<xnip>

@Randzo- Thanks for that. After reading I tried restarting the client (exiting running apps). Tasks picked up at the checkpoint and that w/u still ran like molasses.

Other than losing a machine for some reason, the only time I've ever aborted a job is when a tech said to or there was overwhelming evidence posted in the forums... I figure if a job errors, points don't matter. If the result helps fix something, killing it out of hand seems pointless ;) |
||
|
seippel
Former World Community Grid Tech Joined: Apr 16, 2009 Post Count: 392 Status: Offline Project Badges: |
I'm in the process of testing several of the work units mentioned in this thread (and the other thread) to attempt to recreate the problem of very long work units that some users have reported.
Seippel |
||
|
bieberj
Senior Cruncher United States Joined: Dec 2, 2004 Post Count: 406 Status: Offline Project Badges: |
seippel wrote: I'm in the process of testing several of the work units mentioned in this thread (and the other thread) to attempt to recreate the problem of very long work units that some users have reported. Seippel

Thanks Seippel! I just updated my post to reflect the timeout with a dump.

Edit: P.S. As an afterthought, I recall that a few years back, when we had very long FAAH work units that took anywhere from 36-80 hours to complete, we were advised by the CAs to bump up the timeout for the specific work unit by a factor of ten. If I had done this for the task that I reported, it might have finished within 6 days. I don't remember the procedure, but please don't try it without consulting a CA. Even if your job finishes, the rest of your wingmen will error out and you won't have a wingman to validate your work unit so you can earn over 2000 credits. [Edit 1 times, last edit by bieberj at Feb 2, 2011 9:17:39 PM] |
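The factor-of-ten idea can be sanity-checked against the numbers posted in this thread. The sketch below assumes the cut-off is about 12 hours (tasks here ended at a little over 12 hours), that a tenfold limit would simply be 120 hours, and that progress is linear in time. None of that is confirmed by the techs, and the code says nothing about how such a limit would actually be changed.

```python
LIMIT_HOURS = 12.0               # approximate cut-off reported in this thread
RAISED_LIMIT = LIMIT_HOURS * 10  # hypothetical tenfold limit

# (task, elapsed hours, progress %) as posted by thread members
REPORTS = [
    ("ts02_c201_sda004", 8 + 7 / 60, 9.666),
    ("ts02_c247_sr91a0_1", 13.0, 8.8),
    ("ts02_c283_sdb003_1", 7.0, 6.666),
    ("ts02_c237_sr78b0_0", 11 + 17 / 60 + 11 / 3600, 7.166),
]


def projected_total_hours(elapsed_hours, progress_percent):
    """Linear extrapolation (an assumption): total = elapsed / fraction done."""
    return elapsed_hours * 100.0 / progress_percent


results = {}
for name, elapsed, progress in REPORTS:
    total = projected_total_hours(elapsed, progress)
    results[name] = (total, total <= RAISED_LIMIT)
    verdict = "fits" if total <= RAISED_LIMIT else "would still time out"
    print(f"{name}: ~{total:.0f} h (~{total / 24:.1f} days), {verdict}")
```

For ts02_c283_sdb003_1 this gives roughly 105 hours, a bit over four days, which is consistent with the "within 6 days" guess. Two of the other reported tasks project past 120 hours, so even a tenfold limit would not have saved them under these assumptions.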
||
|
|