| World Community Grid Forums
Thread Status: Active Total posts in this thread: 29
Viktors
Former World Community Grid Tech Joined: Sep 20, 2004 Post Count: 653 Status: Offline
|
We have made a change in the software to better handle certain conditions that were causing the program to exit with an error and not giving members points. As a result, members may see a one-time, larger download for this update the next time they receive a new work unit from this project. Thank you for your patience with this and thank you for contributing.
|
||
|
|
Flavio Bessa
Advanced Cruncher Brazil Joined: Aug 3, 2007 Post Count: 83 Status: Offline
|
And thanks for updating us!! It's very good to know that we hopefully won't have those dreadful errors anymore...
|
||
|
|
teletran
Senior Cruncher Joined: Jul 27, 2005 Post Count: 378 Status: Offline |
Thanks for the update. I was getting errors on the first work units I downloaded but I'll give it a try again.
---------------------------------------- |
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
|
@Viktors: Thanks for the software upgrade and for the info about it.
knreed said that the fix is designed to reduce/eliminate failures with Exit code 29 (0x1d). (For non-programmers/non-geeks: in some circles, e.g. the C/C++ programming languages, the 0x prefix denotes that a hexadecimal number follows. 1d hex = 29 decimal. I thought it time someone explained this.) Many of these Type A WUs have had some instances validate OK while other copies get Exit code 29.

Example 1:
E000505_095A_003b8700p_2 | Valid | 22/04/09 12:51:53 | 23/04/09 05:39:35 | 4.45 | 68.8 / 68.2 | Mine, CEP v6.31, AMD A64X2
E000505_095A_003b8700p_0 | Valid | 19/04/09 01:37:12 | 21/04/09 07:14:56 | 3.62 | 67.6 / 68.2 | Clean log
E000505_095A_003b8700p_1 | Error | 19/04/09 01:36:45 | 22/04/09 12:50:23 | 1.20 | 29.9 / 0.0 | Exit code 29 - NOT mine

Example 2:
E000508_989A_003c8k00d_2 | Valid | 21/04/09 04:04:40 | 22/04/09 02:59:29 | 7.07 | 122.1 / 119.0
E000508_989A_003c8k00d_0 | Valid | 19/04/09 22:57:32 | 20/04/09 10:51:20 | 7.39 | 115.9 / 119.0
E000508_989A_003c8k00d_1 | Error | 19/04/09 22:57:20 | 21/04/09 04:04:34 | 4.04 | 62.4 / 0.0 | Exit code 29 - Mine, AMD A64X2

Example 2 would suggest that I question the integrity of my machine, but it does not give errors running test programs or other WCG projects. And it's happening to other machines (example 1).

Has the phenomenon (athlonenon?) of getting different outcomes for one WU on different machines been explained? Does this fix address the problem? Is it possible to give a brief explanation of the problem and the fix?

Are random numbers used anywhere in the CEP program, e.g. for Monte Carlo Method modelling, and if so, is this likely to be the source of legitimate divergence of the computation paths of the WUs?

[Edit 5 times, last edit by Rickjb at Apr 23, 2009 6:06:15 AM] |
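For anyone who wants to check the hex arithmetic mentioned above (1d hex = 1 x 16 + 13 = 29 decimal), here is a minimal Python sketch. The helper names `parse_exit_code` and `format_exit_code` are made up for illustration and are not part of BOINC or the CEP application; the output string only mimics the "exit code 29 (0x1d)" wording seen in the result logs.

```python
def parse_exit_code(text: str) -> int:
    """Convert an exit code written as hex ('0x1d') or decimal ('29') to an int."""
    # Base 0 tells Python to infer the base from the prefix (0x means hexadecimal).
    return int(text, 0)


def format_exit_code(code: int) -> str:
    """Render a code in both notations, in the style of the result log."""
    return f"exit code {code} ({code:#x})"


print(parse_exit_code("0x1d"))   # 29
print(format_exit_code(29))      # exit code 29 (0x1d)
```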
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline
|
Yes, further to Rickjb's comment, I've got a CEP WU that my machine is currently crunching, which I did have doubts as to whether to simply abort or let run... in the end, I decided the latter - although, looking at the following, I did wonder...
Workunit Status
Project Name: The Clean Energy Project
Created: 16/04/09
Name: E000496_489A_00397i00t
Minimum Quorum: 2
Replication: 2

Result Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
E000496_489A_00397i00t_5-- | Pending Validation | 21/04/09 02:14:40 | 22/04/09 02:37:31 | 7.79 | 164.0 / 0.0
E000496_489A_00397i00t_4-- | Error | 20/04/09 12:03:56 | 21/04/09 02:14:00 | 1.06 | 19.3 / 0.0 | (0x1d) - exit code 29 (0x1d)
E000496_489A_00397i00t_3-- | Error | 19/04/09 15:13:41 | 20/04/09 12:03:24 | 4.22 | 73.4 / 0.0 | (0x1d) - exit code 29 (0x1d)
E000496_489A_00397i00t_2-- | Error | 16/04/09 17:32:25 | 19/04/09 15:13:25 | 4.93 | 97.5 / 0.0 | (0x1d) - exit code 29 (0x1d)
E000496_489A_00397i00t_0-- | Detached | 16/04/09 17:19:26 | 16/04/09 17:30:20 | 0.00 | 0.0 / 0.0
E000496_489A_00397i00t_1-- | In Progress | 16/04/09 16:42:35 | 26/04/09 16:42:35 | 0.00 | 0.0 / 0.0 | <- mine, currently processing (this has now finished, successfully - see below)

Once my machine has finished crunching this WU, I'll pop back and add in its result... but I, like Rickjb, would be curious about a simple explanation as to why this is happening...

Edit: Well, I'm extremely surprised (and pleased) to record that my WU did eventually finish and validate:
E000496_489A_00397i00t_1-- | Valid | 16/04/09 16:42:35 | 23/04/09 04:49:22 | 14.20 | 166.4 / 165.2 | <- mine

[Edit 1 times, last edit by gb009761 at Apr 23, 2009 9:30:52 AM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
So let's see, the announcement of a new version is made [Apr 21, 2009 5:37:00 PM]. It's actually version 6.31. Then what I see above is work units with original time stamps from before. Were these processed with version 6.30 or version 6.31?
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 3010 Status: Offline
|
Quote: "So let's see, the announcement of a new version is made [Apr 21, 2009 5:37:00 PM]. It's actually version 6.31. Then what I see above is work units with original time stamps from before. Were these processed with version 6.30 or version 6.31?"

Okay, thanks Sek for clarifying this. So, in my case, the WU that's in PV will have been resent as a repair WU with version 6.31, whilst any WUs that are still due to run (or are currently running) against version 6.30 still have a chance of failing with this error... Thus, with me, there's a strong chance that mine will abort the same way.

Edit: Well Sekerob, I'm extremely surprised to announce that my assumption that this WU, running with 6.30 code, would abort was in fact incorrect - it crunched all the way to completion and validated even though 3 others had gone down with this error. I don't know whether the Core Client Version had anything to do with it, although I did notice that the two that validated (WU #s 1 & 5) ran against 6.2.28, whilst the ones that failed ran with 5.10.13 (#2), 6.4.7 (#3) and 5.10.45 (#4).

[Edit 1 times, last edit by gb009761 at Apr 23, 2009 9:38:29 AM] |
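To make the client-version comparison above easier to eyeball, here is a rough Python sketch that groups the five returned copies by BOINC core client version. The data is only what is reported in the post; the function name `outcomes_by_client` is invented for illustration. Five results are far too few to prove anything, and the science application version was not held constant either (copy 5 was sent after the 6.31 release), so treat this as bookkeeping rather than evidence.

```python
from collections import defaultdict

# (copy number, BOINC core client version, outcome) as reported in the post above.
results = [
    (1, "6.2.28", "Valid"),
    (5, "6.2.28", "Valid"),
    (2, "5.10.13", "Error"),
    (3, "6.4.7", "Error"),
    (4, "5.10.45", "Error"),
]


def outcomes_by_client(rows):
    """Group result outcomes by the core client version that produced them."""
    table = defaultdict(list)
    for _copy, version, outcome in rows:
        table[version].append(outcome)
    return dict(table)


for version, outcomes in outcomes_by_client(results).items():
    print(version, outcomes)
```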
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It's actually a pity you can't determine the software version of a workunit anywhere on the website. It would be nice if it could be included somewhere on the results page, or possibly in the log popup.
|
||
|
|
Rickjb
Veteran Cruncher Australia Joined: Sep 17, 2006 Post Count: 666 Status: Offline
|
@Sekerob: The results up to my initial posting time for Example 1 and all of Ex 2 were crunched several days before the upgrade, so they ran v6.30. My copy of Ex 1 ran under v6.31 and validated against copy _1.
I have also received copy _3 of another WU that has 2x Exit code 29 error results so far. Another test for the v6.31 upgrade - mine crunched OK.

@_Sunnyboy_: Each "slot" subdirectory in the BOINC data directory contains a copy of the science program file being used for the WU, but these directories are cleared when the WU run finishes. ...

[Off-topic - Suggestion]: The science applications could write their version number in the log file. Would this info not be just as useful as the BOINC client version, which is already there?

[Edit 4 times, last edit by Rickjb at Apr 23, 2009 2:24:05 PM] |
||
|
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Quote: "It's actually a pity you can't determine the software version of a workunit anywhere on the website. Would be nice if it could be included somewhere on the results page, or possibly in the log popup."

A FAQ exists with the effective date of the new version (not the time). For a running job, of course, the Tasks window of BOINC Manager shows the version, as do the operating system's Task Manager and the slot directory. It's been suggested a few times in the past to also include the science version number in the Result Log, the same way the client version is recorded there. In a period of transition it's useful, as results from new versions were in the past found to be slightly different, causing validation to fail and needing more copies processed with the new version before succeeding.
||
|
|
|