World Community Grid Forums
Category: Beta Testing Forum: Beta Test Support Forum Thread: New Beta Test starting Nov 4, 2013 [ Issues Thread ] Version 7.21 |
Thread Status: Active Total posts in this thread: 152
gomeyer
Senior Cruncher USA Joined: Jul 11, 2008 Post Count: 161 Status: Offline Project Badges: |
WU BETA_BETA_9999984_0498a_0 hung at just over 5% indicated. I let it run for over 2 hours, then stopped BOINC and restarted it. The WU immediately errored out.
The error message under Results Status is very lengthy but begins:
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7678FC16
Engaging BOINC Windows Runtime Debugger...
I'm guessing you can read the rest if you need to. This is a Windows Vista Home Premium 32 bit machine.
----------------------------------------
[Edit 1 times, last edit by gomeyer at Nov 5, 2013 2:06:50 AM] |
||
|
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline Project Badges: |
I had a laptop with 4 BETA WUs that were continually restarting after just over 2 minutes of running. They did not progress beyond 0% completed. I tried suspend/restart with no LAIM and got the same result.
I bumped the CPU usage to 100%, and after each WU restarted again they began to progress and so far seem to be running normally. Hopefully this helps to identify the cause of the restart error! |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I've got one Beta WU that has been restarting roughly every 4 minutes for a couple of hours now. I tried to suspend/resume it, but it didn't take. My other three cores are OK running FAAH WUs, but the one Beta WU just seems to spin its wheels at 0.50% done. Any suggestions? Do I just let it spin for a few more hours and see what happens? Or try some setting changes to get it moving along? This being my first run with Beta WUs, I just want to make sure I get the most help out of what is running/cycling here to help out the cause. By the way, I'm running Windows 7 on my laptop if that info helps.
When I ran into this problem, I did an Abort for all tasks having the problem. I have gotten a bunch of WUs today, so I will keep an eye on them to make sure they do not get into the multiple restart problem. When I first saw this, my machine had been running for over 24 hours and restarting the task after 2 to 3 minutes. |
||
|
ashrader330
Advanced Cruncher Joined: Jan 6, 2008 Post Count: 97 Status: Offline Project Badges: |
I just ran into an issue where a work unit that was over 80% done restarted back at 0.5%. I accidentally logged out without shutting everything down cleanly. When I brought everything back up, the work unit jumped back to 0.5%. I had just checked the properties, and it said it had checkpointed within the last 10 minutes. The work unit had been running long (>7 hours, while the rest have been in the 2-4 hour range). I am running OpenSUSE 12.3 64 bit. The 7 other CEP2 work units that were running at the same time picked up right from the last checkpoint without any issue.
EDIT: As soon as I hit submit, the progress bar jumped back up to 84%. It said it was at 0.5% for about 5 minutes or so. Other than running long and this weird progress bar refresh issue, it seems back on track.
----------------------------------------
Run time: 4.2y HPF2, 6.9y FAAH, 7.9y HFCC, 20.8y HCC, 26.0y CEP2, 26.0y MCM, 2.1y UGM, 2.0y OET
WU: 4.8k HPF2, 12.3k FAAH, 12.4k HFCC, 135k HCC, 34.3k CEP2, 43.8k MCM, 4.2k UGM, 19.7k OET
[Edit 2 times, last edit by ashrader330 at Nov 5, 2013 2:54:10 AM] |
||
|
yoro42
Ace Cruncher United States Joined: Feb 19, 2011 Post Count: 8976 Status: Offline Project Badges: |
Current status of 43 WU:
3 In Progress
7 Error - Maximum elapsed time exceeded
3 Error - upload failure: <file_xfer_error> <error_code>-131</error_code>
8 Valid
22 Pending Validation |
||
|
gomeyer
Senior Cruncher USA Joined: Jul 11, 2008 Post Count: 161 Status: Offline Project Badges: |
Another weird happening. A Win Vista Home Basic 32 bit machine completed 4 BETAs and the DCF jumped to over 48. All remaining FAAH WUs are showing 126 hours to completion, and of course everything is running in panic mode.
The DCF will correct itself, of course, or I can fix it manually, but you might want to check that out.
----------------------------------------
[Edit 1 times, last edit by gomeyer at Nov 5, 2013 4:08:09 AM] |
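To put the numbers above in perspective, here is a minimal sketch of how a large DCF (duration correction factor) inflates the displayed time to completion. It assumes the client simply multiplies its raw runtime estimate by the DCF; that linear model and the function names are illustrative assumptions, not taken from the BOINC source. Only the figures 48 and 126 hours come from the post.

```python
def displayed_estimate_hours(raw_hours: float, dcf: float) -> float:
    """Displayed time to completion under the assumed linear model."""
    return raw_hours * dcf


def implied_raw_hours(displayed_hours: float, dcf: float) -> float:
    """Invert the assumed model to recover the uncorrected estimate."""
    return displayed_hours / dcf


# Figures from the post: DCF of roughly 48, FAAH WUs showing 126 hours.
raw = implied_raw_hours(126.0, 48.0)
print(f"Implied uncorrected estimate: {raw} hours")
```

Under that assumption the 126-hour figure corresponds to an uncorrected estimate of only a few hours per FAAH WU, which is why the displayed estimates should fall back to normal as the DCF corrects itself.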
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Paraphrasing Knreed, the first run of betas was extremely overestimated. Assuming the unit you just completed was from that batch, it's already understood. The beta 7.19 units are the issue; 7.21 should not be as bad.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7546 Status: Offline Project Badges: |
BETA_BETA_9999985_0628a_1-- HomeComputer Error 11/4/13 19:08:13 11/5/13 02:54:17 0.54 / 0.54 6.7 / 0.0
Result Log
Result Name: BETA_BETA_9999985_0628a_1--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
- exit code -529697949 (0xe06d7363)
</message>
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_windows_intelx86 -SettingsFile BETA_9999985_0628a.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 750000
Running
Commandline = projects/www.worldcommunitygrid.org/wcgrid_beta17_7.21_windows_intelx86 -SettingsFile BETA_9999985_0628a.txt -DatabaseFile dataset-GDS2771-v1.txt
Initializing
wcg_learn_limit = 750000
Running
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7563C41F
----------------------------------------
Sgt. Joe
*Minnesota Crunchers* |
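For anyone comparing these logs: the negative exit code and the hexadecimal exception code in the result log are the same 32-bit value, and 0xE06D7363 is the code the Microsoft Visual C++ runtime uses for a thrown C++ exception (its low three bytes spell "msc" in ASCII). A minimal Python sketch of the conversion follows; the function names are made up for illustration.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format the code as hex and decode its low three bytes as ASCII."""
    code = to_unsigned32(exit_code)
    tag = (code & 0xFFFFFF).to_bytes(3, "big").decode("ascii", errors="replace")
    return f"0x{code:08x} (low three bytes as ASCII: {tag!r})"


# Exit code as reported in the result log above.
print(describe(-529697949))  # prints: 0x​e06d7363 (low three bytes as ASCII: 'msc')
```

So the exit code alone only says that an unhandled C++ exception ended the task; the "Out Of Memory" reason line is what identifies the actual cause.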
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The two BETA tasks I received for my laptop had an estimated run time of a little more than 4 minutes. I suspended my FAAH work so I could follow the execution of these tasks. Both tasks appeared to have the restart problem. I am pasting the two messages for that WU as I did not allow the WUs to restart more than once:
11/4/2013 7:56:56 PM World Community Grid Starting BETA_BETA_9999985_0055a_1
11/4/2013 7:56:56 PM World Community Grid Starting task BETA_BETA_9999985_0055a_1 using beta17 version 721
11/4/2013 8:00:26 PM World Community Grid Restarting task BETA_BETA_9999985_0055a_1 using beta17 version 721 |
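The interval between the start and the restart can be read straight off these messages. The sketch below assumes each message begins with a date, a time, and an AM/PM marker in the format shown above, and that an English locale is in use for parsing AM/PM; the helper names are illustrative only.

```python
from datetime import datetime

# The "Starting task" and "Restarting task" messages quoted in the post.
LOG_LINES = [
    "11/4/2013 7:56:56 PM World Community Grid Starting task "
    "BETA_BETA_9999985_0055a_1 using beta17 version 721",
    "11/4/2013 8:00:26 PM World Community Grid Restarting task "
    "BETA_BETA_9999985_0055a_1 using beta17 version 721",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading date, time, and AM/PM fields of a message line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


def seconds_between(first: str, second: str) -> float:
    """Elapsed seconds from the first message to the second."""
    return (parse_timestamp(second) - parse_timestamp(first)).total_seconds()


print(seconds_between(LOG_LINES[0], LOG_LINES[1]))
```

For this task the gap is three and a half minutes, in line with the 2 to 4 minute restart cycles reported earlier in the thread.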
||
|
gomeyer
Senior Cruncher USA Joined: Jul 11, 2008 Post Count: 161 Status: Offline Project Badges: |
"Paraphrasing Knreed, the first run of betas were extremely over estimated. Assuming the unit you just completed was from that batch, it's already understood. The beta 7.19 are the issue, 7.21 should not be as bad."
Thanks, KWSN - A Shrubbery. The other machines with BETAs didn't do that, and I should have mentioned that this machine is running an older version of BOINC, 6.10.58, which MAY make a difference, or not. |
||
|
|