World Community Grid Forums
Category: Completed Research Forum: Discovering Dengue Drugs - Together - Phase 2 Forum Thread: DDDT2 Wu Failures |
Thread Status: Active Total posts in this thread: 119
gb077492
Advanced Cruncher Joined: Dec 24, 2004 Post Count: 96 Status: Offline |
Hi Sek,
I checked the logs and the PV wingman shows:

Result Name: ts02_ b483_ sqb000_ 3--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]>

but one of the erroring wingmen shows:

Result Name: ts02_ b483_ sqb000_ 2--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Copying wcgrestart.rst
Copying wcgrestart.rst
Copying wcgrestart.rst
Copying wcgrestart.rst
Copying wcgrestart.rst
Copying wcgrestart.rst
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E

The other just shows:

Result Name: ts02_ b483_ sqb000_ 0--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x75AD22A1

A strange mix, but the "Copying wcgrestart.rst" suggests you're on the right track.

Thanks for your comments. I do have LAIM on. I'll bear your restart trick in mind if I do see this again (and my memory is good enough).

Mike |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi gb077492,
Caught one more "pr" in the act yesterday. At 4.5 hours, progress was only 10.2%, whereas this rig normally does them in just over 5 hours. After applying "the old trick" described above, the job resumed at 9.5% and stood at 39% at 5.5 hours, so I decided it was okay and went for shuteye. This morning the result showed as valid:

ts02_ b481_ pr23a1_ 1-- 617 Valid 1/29/11 06:43:47 1/29/11 19:23:26 4.26 89.0 / 163.0
ts02_ b481_ pr23a1_ 0-- 617 Valid 1/29/11 06:43:41 2/2/11 02:58:33 8.99 163.0 / 163.0 < moi

The log shows the (re)starts, i.e. 2 were logged.

Result Name: ts02_ b481_ pr23a1_ 0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
Calling gridPlatform.init()
Copying wcgrestart.rst
called boinc_finish
</stderr_txt>
]]>

Not saying it always works, but it does for me :D

Happy crunching. |
||
|
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1452 Status: Offline Project Badges: |
Got the same "grid" error on a Linux box after 3 hours of crunching:

Result Name: ts02_ b195_ sr67a0_ 0--
<core_client_version>6.10.56</core_client_version>
<![CDATA[
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
]]>

----------------------

And this one on Mac 10.6.6 after 20 minutes:

Result Name: ts02_ a360_ pr23a1_ 2--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 29 (0x1d, -227)
</message>
<stderr_txt>
Calling gridPlatform.init()
INFO: No state to restore. Start from the beginning.
CHARGE OUTSIDE INNER GSBP REGION
Encountered error. Exiting.
</stderr_txt>
]]> |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
rilian, nope, that's not the same as the symptom discussed in the last 5 posts. Your first log is for an uninterrupted run, and I'm surprised it ended in error without any sign; the second is a known older failure, error 29. These tasks are crashing prematurely. I had one last night that went home in 0.3 hours.

Edit: Logged per BOINCTasks:
6.17 dddt2 ts02_c259_pdb004_0 00:18:20 (00:18:08) 02-02-2011 04:17 02-02-2011 04:19 Reported: Computation error (29,)

[Edit 1 times, last edit by Former Member at Feb 2, 2011 10:21:16 AM] |
||
|
gb077492
Advanced Cruncher Joined: Dec 24, 2004 Post Count: 96 Status: Offline |
There's definitely something wacky going on in DDDT2-land. I just checked my result status, and a fast, remote machine is showing 2 different units (ts02_ c223_ sr02b0_ 0-- and ts02_ c223_ sr78a0_ 1--) that were killed with "Maximum elapsed time exceeded" after nearly 12 hours of CPU time. This type of unit normally takes just one hour on this box. All wingmen are still in progress.
|
||
|
rilian
Veteran Cruncher Ukraine - we rule! Joined: Jun 17, 2007 Post Count: 1452 Status: Offline Project Badges: |
SekeRob, thank you for the investigation! :)
|
||
|
kskjold
Senior Cruncher Norway Joined: May 20, 2008 Post Count: 469 Status: Offline Project Badges: |
I have a repair unit that has ended with one error so far and one in progress:
ts02_c283_sqa002
The repair unit is estimated to run over 14 hours on an i7 850, and that's a long time. So I wonder, what to do? |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've been getting these failures too, on both an XP system and a Windows 7 64-bit box. HPF2 still messes up on Windows 7 compared to XP. I hope this project gets stabilized and thrown into the mainstream soon. It would be nice to see a steady stream of work flowing in and out full time.
|
||
|
kskjold
Senior Cruncher Norway Joined: May 20, 2008 Post Count: 469 Status: Offline Project Badges: |
"I have a repair unit that has ended with one error so far and one in progress: ts02_c283_sqa002. The repair unit is estimated to run over 14 hours on an i7 850, and that's a long time. So I wonder, what to do?"

I aborted this one. By then it had been running for over 5 hours, and the estimated time had risen to over 15 hours. |
||
|
seippel
Former World Community Grid Tech Joined: Apr 16, 2009 Post Count: 392 Status: Offline Project Badges: |
I'm in the process of testing several of the work units mentioned in this thread (and the other thread) to attempt to recreate the problem of very long work units that some users have reported.
Seippel |
||
|
|