| World Community Grid Forums
Thread Status: Active Total posts in this thread: 115
darth_vader
Veteran Cruncher A galaxy far, far away... Joined: Jul 13, 2005 Post Count: 514 Status: Offline Project Badges:
|
I've got a 765 WU where I and two wingmen have all received -161 errors. The WU has been sent out to two additional victims...

E000765_ 596C_ 005v04613_ 2-- 632 Error 6/27/09 10:00:56 6/27/09 15:13:34 5.12 58.6 / 0.0
E000765_ 596C_ 005v04613_ 1-- 632 Error 6/26/09 19:24:25 6/27/09 09:19:06 2.34 48.1 / 0.0
E000765_ 596C_ 005v04613_ 0-- 632 Error 6/26/09 19:23:15 6/27/09 10:05:05 2.33 60.0 / 0.0

All the logs look like this:

Result Log
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000765_596C_005v04613_2_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>E000765_596C_005v04613_2_3</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>

- D

Coincidence. Plz check the result logs of your wingmen. Your error is definitely different from the previous reported: ERR_NOT_FOUND -161

The first 764 was successful here:

E000764_ 284C_ 003a08409_ 0-- 632 Valid 26-6-09 09:03:42 27-6-09 03:08:37 7.90 121.0 / 126.5
E000764_ 284C_ 003a08409_ 1-- 632 Valid 26-6-09 09:01:09 27-6-09 17:59:38 8.04 132.0 / 126.5 < mine

Both wingmen also got -161 (note that one of the wingmen has a different BOINC version).

<core_client_version>6.5.0</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000765_596C_005v04613_0_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>E000765_596C_005v04613_0_3</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>

.... and .....

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000765_596C_005v04613_1_2</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>E000765_596C_005v04613_1_3</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>

The system that got the error for my WU has not had any invalid WUs since AC@H, and the last error it got was during the first round of CEP before the temporary shutdown.

- D |
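For anyone comparing several result logs by hand, the error entries can be pulled out mechanically. Here is a minimal Python sketch, assuming the log text is pasted as a string in the shape shown above. The function name `extract_xfer_errors` is hypothetical and not part of BOINC. It uses a regular expression rather than an XML parser because a pasted result log is a fragment, not a complete XML document.

```python
import re

# Matches one <file_xfer_error> block and captures the file name and error code.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)

def extract_xfer_errors(log_text):
    """Return a list of (file_name, error_code) pairs found in a result log."""
    return [(m.group("name"), int(m.group("code")))
            for m in XFER_ERROR.finditer(log_text)]

# Sample text copied from the first log in the post above.
sample_log = """<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
  <file_name>E000765_596C_005v04613_2_2</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>E000765_596C_005v04613_2_3</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>"""

errors = extract_xfer_errors(sample_log)
for name, code in errors:
    print(name, code)
```

Running the same function over each wingman's log makes it easy to confirm whether everyone is seeing the same code on the same output files.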
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Ack!
To all the people who have seen work units from batches 765+ crashing at the end, thank you very very much for posting here. We do watch the forums for signs of trouble, and this is obviously a big problem. The set of molecules starting in batch 765 looks to be the issue. And now for the good news: I have already been working to generate work units to fix this issue. The preliminary calculations that we have to do in-house are almost complete and hopefully the problematic work units will be flushed out of the system soon. Thank you for all the time you spend keeping ALL the projects on WCG going, and especially for letting us know when we need to clean up after problems. Leslie |
||
|
|
Trotador
Senior Cruncher Joined: Mar 26, 2009 Post Count: 154 Status: Offline Project Badges:
|
Hi

Also 766 units crashing at 100% here. Windows XP, 6.2.28. Good to read that the project staff are aware. I've cancelled the downloaded units and will wait for news on a solution.

regards, Trotador. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Sekerob

In reply to your comment: "What I'd like to hear is if anyone is finishing 765 or 766 or 767 successfully. Is it just some rogues and of course those most likely coming to the board."

My system (i7-920 2.67 GHz HyperThreaded running Vista Home Premium 64 bit service pack 2, and WCG recommended BOINC 6.2.28) just completed two 767 work units. One is now pending validation:

Project Name: The Clean Energy Project
Created: 6/24/09
Name: E000767_698C_002x01802
Minimum Quorum: 2
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000767_ 698C_ 002x01802_ 1-- | 632 | Pending Validation | 6/27/09 13:37:32 | 6/27/09 22:44:30 | 8.72 | 166.3 / 0.0

However, the other has the same error as the 765 and 766 work units:

Project Name: The Clean Energy Project
Created: 6/24/09
Name: E000767_700C_005v0200z
Minimum Quorum: 2
Replication: 2

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000767_ 700C_ 005v0200z_ 1-- | 632 | Error | 6/27/09 13:37:53 | 6/27/09 22:44:30 | 8.74 | 166.7 / 0.0

It would have been nice to have both come through pending, but unfortunately that's not the case. Better let the techs know that there could also be problems with the 767 work units. I am also currently working on a replacement unit for an errored 765 work unit, with about 2.5 hours still to go. I will advise upon completion.

Edit: the 765 work unit just came back with an error.

[Edit 1 times, last edit by Former Member at Jun 28, 2009 1:45:50 AM] |
||
|
|
RaymondFO
Veteran Cruncher USA Joined: Nov 30, 2004 Post Count: 561 Status: Offline Project Badges:
|
Hello Sekerob,

In reply to your request, this 766 CEP WU produced a "computational error" message in the BOINC Manager Tasks tab prior to being reported to WCG. Computer was a Windows 7 RC with an Intel quad core 2.66 (Q9400) processor.

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000766_584C_002x0240c_1_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>
</message>
]]>

Wingman has not reported, and the WU has been sent out to another computer. Hope this helps.

Edit: Here is the result from another CEP WU that resulted in an error:

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000767_854C_003007408_1_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>
</message>
]]>

Again a "computational error" message in the BOINC Manager Tasks tab prior to being reported to WCG. This computer is running Windows Vista 64. No more CEP WUs for me until this is resolved.

Edit#2: The wingmen are also reporting a -131 error code, and the specific WUs mentioned above are being sent out to other computers.

[Edit 3 times, last edit by RaymondFO at Jun 30, 2009 2:21:22 PM] |
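To see at a glance which batches and error codes are turning up, the file names reported in this thread can be split into their parts. A rough Python sketch follows. The naming layout (work unit name, then result number, then output file index) is only inferred from the names posted here, not taken from any WCG documentation, and `parse_output_name` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed layout, inferred from names posted in this thread:
#   E<batch>_<work unit parts>_<result number>_<output file index>
NAME = re.compile(r"^E(?P<batch>\d+)_(?P<rest>.+)_(?P<result>\d+)_(?P<file_index>\d+)$")

def parse_output_name(file_name):
    """Split an output file name into (batch, work unit name, result number, file index)."""
    m = NAME.match(file_name)
    if m is None:
        raise ValueError(f"unrecognised file name: {file_name}")
    batch = int(m.group("batch"))
    work_unit = f"E{m.group('batch')}_{m.group('rest')}"
    return batch, work_unit, int(m.group("result")), int(m.group("file_index"))

# (file name, error code) pairs copied from posts in this thread.
reported = [
    ("E000765_596C_005v04613_2_2", -161),
    ("E000766_584C_002x0240c_1_0", -131),
    ("E000767_854C_003007408_1_0", -131),
]

# Count how often each (batch, error code) combination was reported.
tally = Counter((parse_output_name(name)[0], code) for name, code in reported)
for (batch, code), count in sorted(tally.items()):
    print(f"batch {batch}: error {code} x{count}")
```

With more reports added to the `reported` list, the tally would show whether a given batch fails consistently or only for some work units.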
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I had two 765 WUs error out this morning

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
Calling initGraphics()
INFO: No state to restore. Start from the beginning.
called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>E000765_552C_005v0020x_0_0</file_name>
<error_code>-131</error_code>
</file_xfer_error>
</message>
]]>

AMD dual core, Win7 RC. By the amount of CPU time they accumulated I would say they ran to completion. I currently have two 766 WUs that will finish in 1 to 2 hours. Those will probably error out too. So, a wasted 3/4 day for two cores. NBD.

Funny thing is, when I searched for <error_code>-131</error_code> here in the forums, it found no matches. But when I searched for just "error 131" this thread came up on top of the 12000+ matches. Why wouldn't Search find the exact error in the brackets, when I see it posted numerous times, above?

From BOINCView Messages tab:

Output file E000765_552C_005v0020x_0_0 for task E000765_552C_005v0020x_0 exceeds size limit. File size: 22208689.000000 bytes. Limit: 15000000.000000 bytes
Output file E000765_833C_003e0430q_1_0 for task E000765_833C_003e0430q_1 exceeds size limit. File size: 35861249.000000 bytes. Limit: 15000000.000000 bytes
(Edit - added messages tab lines)

Well, task E000766_548C_00300980i_1 finished just fine.
(Edit2 - confirm 766 WU finishing with no error)

Output file E000767_541C_002y02110_1_0 for task E000767_541C_002y02110_1 exceeds size limit. File size: 34680836.000000 bytes. Limit: 15000000.000000 bytes
Output file E000767_688C_003e0380m_0_0 for task E000767_688C_003e0380m_0 exceeds size limit. File size: 26031296.000000 bytes. Limit: 15000000.000000 bytes
(Edit3 - Messages tab clips showing 767 WU's errors)

Output file E000768_642C_002y0920h_1_0 for task E000768_642C_002y0920h_1 exceeds size limit. File size: 38680736.000000 bytes. Limit: 15000000.000000 bytes
(Edit4 - Messages tab clip showing 768 WU error)

[Edit 4 times, last edit by Former Member at Jun 28, 2009 10:40:52 PM] |
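The size-limit messages quoted above are regular enough to parse, which makes it easy to see how far over the limit each output file is. This is a minimal Python sketch, assuming the messages are pasted as plain text in the wording shown in this post. `oversize_report` is a hypothetical name, not a BOINC or BOINCView function.

```python
import re

# Matches the "exceeds size limit" message wording quoted in this thread.
SIZE_MSG = re.compile(
    r"Output file (?P<file>\S+) for task (?P<task>\S+) exceeds size limit\.\s*"
    r"File size: (?P<size>[\d.]+) bytes\.\s*"
    r"Limit: (?P<limit>[\d.]+) bytes"
)

def oversize_report(messages):
    """Return (task, size_bytes, limit_bytes, ratio) for each size-limit message."""
    report = []
    for m in SIZE_MSG.finditer(messages):
        size = float(m.group("size"))
        limit = float(m.group("limit"))
        report.append((m.group("task"), size, limit, size / limit))
    return report

# Two of the messages copied from the post above.
messages = """
Output file E000765_552C_005v0020x_0_0 for task E000765_552C_005v0020x_0 exceeds size limit.
File size: 22208689.000000 bytes. Limit: 15000000.000000 bytes
Output file E000765_833C_003e0430q_1_0 for task E000765_833C_003e0430q_1 exceeds size limit.
File size: 35861249.000000 bytes. Limit: 15000000.000000 bytes
"""

report = oversize_report(messages)
for task, size, limit, ratio in report:
    print(f"{task}: {size / 1e6:.1f} MB against a {limit / 1e6:.1f} MB limit ({ratio:.2f}x)")
```

In the messages posted so far, every affected output file is well above the 15000000-byte limit, and the one task reported as finishing cleanly produced no such message.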
||
|
|
mreuter80
Advanced Cruncher Joined: Oct 2, 2006 Post Count: 83 Status: Offline Project Badges:
|
"Ack! To all the people who have seen work units from batches 765+ crashing at the end, thank you very very much for posting here. We do watch the forums for signs of trouble, and this is obviously a big problem. The set of molecules starting in batch 765 looks to be the issue. And now for the good news: I have already been working to generate work units to fix this issue. The preliminary calculations that we have to do in-house are almost complete and hopefully the problematic work units will be flushed out of the system soon. Thank you for all the time you spend keeping ALL the projects on WCG going, and especially for letting us know when we need to clean up after problems. Leslie"

Thanks for the quick response. What is your advice on what to do with the WUs that are already in the cache of our machines? Should we let them run, or is it safe to say they will error out anyway and can be aborted? Can you confirm that batch 766 and/or any higher batch is also affected?

Thanks again. Your support is very much appreciated.

[Edit 1 times, last edit by mreuter80 at Jun 28, 2009 3:42:42 AM] |
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline Project Badges:
|
As dseto has found, some 765 WUs are completing OK - the wingman's copy of one of "mine" is Pending Validation with a clean log (E000765_314C_005v06413).

I had not started that WU before finding this, but I am now running it. I suspended all other 765s and the 768s which comprise the remainder of CEPs in my queue. (Only 1 machine doing CEP - the AMD, which performs relatively well on them.)

Until instructed otherwise, I will hold back every other CEP until the wingman, who is unlikely to read the forum, returns his copy. If he gets to PV status with a clean log, I will then run my copy. The others I will hold "indefinitely". I have deselected CEP from my device profiles.

The micromanagement required is OK for 1 machine, but would be time-consuming for people running a room-full of them.

Note Sekerob's plea that we DO NOT ABORT these 765+ WUs yet, because that would cause another copy to be sent to someone else, who would waste their crunching time.

Comments/alternative suggestions welcome.

[Edit 1 times, last edit by Rickjb at Jun 28, 2009 3:22:12 AM] |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello Rickjb

Good luck with your 765 WU. I didn't have any 765 complete OK; if you check my post, it was a 767 that came through OK. |
||
|
|
newman3437
Cruncher Joined: Dec 5, 2008 Post Count: 4 Status: Offline Project Badges:
|
Does anyone already have any experience with batch 767?
|
||
|
|
|