World Community Grid Forums
Thread Status: Active Total posts in this thread: 44
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi Patrick, sorry you felt it necessary to get involved; we all know you have important science to attend to. I mean that seriously and sincerely. I am not trying to be a joker or to wind anyone up. I really do appreciate the opportunity to participate in your project in whatever small way I can.
The following is only a bit of detail on my thoughts, so feel free to stop reading now. I fully understand that the error rate is not particularly high, so it does not warrant the effort to resolve at this point in time, and that WCG is resilient enough to re-process WU results if necessary, so we can keep you supplied with valid results. I had decided to ignore the error when it was always happening within a few minutes of starting, because the only real effect was that my "reliability rating" took a hit. Now, however, there are a couple of examples of it happening after a couple of hours: on my i7 it was just over 2 hours into processing (normally about 50% complete). So, rather than waste crunching resources and potentially slow down the return of valid WU results to you (if enough failures occur, more instances need to be sent out to meet the minimum quorum), I will be unsubscribing from this project. I will keep an eye on this forum to see when progress is made in addressing this issue and will rejoin at that time.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Sekerob, I was thinking about this a bit and have a question about the effect of returning "Errors". Is it possible that these "Error" results cause a machine to be flagged as suspect and subject to the following, from your first post in the thread listed below?
"Each time a dubious result is returned (aborts of work in progress e.g.), the device is drafted for an in-depth physical ;>"
http://www.worldcommunitygrid.org/forums/wcg/...thread=24779&offset=0
Could this be the cause of what appears to be a higher-than-average rate of "Inconclusives" being returned on other projects, or am I seeing something that doesn't exist / whining too much?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Hi Snow Crash,
A high number of Inconclusives can be caused by A) previous errors and invalids, or B) a fast CPU that is rated reliable and used to double-check slower computers that are rated unreliable. You have to keep track of your machine and its quorums to decide which is happening.
Lawrence
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
If we start from the premise that the number of HPFII errors is small and only affects the crunchers, we may be missing the bigger picture, in which these failures are now causing unnecessary validation work on other projects, with an impact perhaps not as insignificant as originally suggested.
I think a shortcut to the answer would be to check whether the expected average of 1.2 returns per WU for the zero-redundancy projects is holding true. These are the results from my PC that have led me to post my concerns. I am throwing 1-3 HPFII errors per day. Since this PC averages ~40 results per day across all projects, I think I am always on the fence between reliable (15 valid returns in a row) and unreliable. Between FAAH and HFCC I am getting a combined 2-5 inconclusives per day, where a validating unit is sent out to another machine only after my result is returned. Also, that count does not include the cases where I am the validating PC, or where two units are sent out initially. When you take the three scenarios together, my zero-redundancy WUs average ~1.4-1.5 returns instead of the expected 1.2. Perhaps I am just on the wrong side of the curve.
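The arithmetic behind the 1.2 versus 1.4-1.5 comparison, and the "15 valid returns in a row" threshold, can be sketched in a few lines of Python. This is a hypothetical model, assuming each zero-redundancy WU starts with one copy, each inconclusive or error sends out exactly one extra copy, and any non-valid result resets the reliability streak. The function names and sample counts are illustrative only; they are not WCG's code or the poster's actual data.

```python
# Hypothetical model of the zero-redundancy (ZR) bookkeeping described above.
# Assumption: every WU starts with one copy, and each inconclusive or error
# triggers exactly one extra copy to a validating machine.


def average_copies_per_wu(num_wus: int, extra_copies: int) -> float:
    """Average number of copies (returns) sent out per workunit."""
    if num_wus <= 0:
        raise ValueError("num_wus must be positive")
    return (num_wus + extra_copies) / num_wus


# Assumption: a device counts as "reliable" once its most recent results
# form a run of RELIABLE_STREAK consecutive valid returns (the 15 quoted
# in the post); any non-valid result resets the run. Illustrative rule only.
RELIABLE_STREAK = 15


def is_reliable(outcomes: list[str]) -> bool:
    """outcomes is a chronological list such as ["valid", "error", ...]."""
    streak = 0
    for outcome in outcomes:
        streak = streak + 1 if outcome == "valid" else 0
    return streak >= RELIABLE_STREAK


if __name__ == "__main__":
    # Illustrative numbers only: 100 WUs, 20 or 45 of which needed a second copy.
    print(average_copies_per_wu(100, 20))  # 1.2
    print(average_copies_per_wu(100, 45))  # 1.45
    # An error after every 12 valid results keeps the run below 15.
    day = (["valid"] * 12 + ["error"]) * 3
    print(is_reliable(day))  # False
```

Under this model, an average of 1.2 means roughly 1 WU in 5 needs a second copy, while 1.4-1.5 means 40-50% of WUs do.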
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
It's an interesting postulation: if HPF2 regularly fails on a device, could that failure be what triggers the frequent "inconclusive" marking and the extra copies sent out for the ZR projects? Let me pass this by the techs, specifically knreed.
There is, as you may have seen commented, an objective to automatically cut devices off from specific projects on which they have a high failure rate. That would mitigate the 'extra' inconclusive duplications, if paragraph 1 were fully correct about cross-project effects.
Members have been asking for the option to have a secondary set of work if the primary set fails. I don't know how that fits into the philosophy of WCG vis-à-vis 'community' and all, but I think the "alternate" work option could do with boxes to opt out of a specific project. The snag is that there are only 4 profiles available, so how do you manage this for those with many devices? A juggle for sure, and it needs deeper analysis of how to implement it, if it fits in with the whole.
As for Lawrence's point B), it is not correct, unless I misread what he wrote. The one who has the 'Inconclusive' listed on the general Results Status page is the one forcing out a second copy to the 'reliable' devices. The other party (the reliables) will just see "In progress" and usually on return an instant "Valid", thus mostly unaware, but looking at the deadline which is currently about 33% of the original deadline.
edits: 4, for afterthoughts and augmentation.
WCG
Please help to make the Forums an enjoyable experience for All!
[Edit 4 times, last edit by Sekerob at Apr 4, 2009 11:12:46 AM]
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Sekerob is right; I was not thinking correctly.
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline
> The other party (the reliables) will just see "In progress" and usually on return an instant "Valid", thus mostly unaware, but looking at the deadline which is currently about 33% of the original deadline.

And if the reliable cruncher is "covering" a newbie from the beginning of the quorum creation, it will show "Pending Validation" as usual if it is first to return.
Cheers. Jean.
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
I think that's a different premise from what we were discussing regarding 'inconclusive'. In that case both copies show "In progress" up front on the RS (Results Status) overview page, and the one returning its result first sees PV (Pending Validation).
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3716 Status: Offline
Sorry, it was not obvious to me (and so probably not to a few silent readers either) that Lawrence's "B) a fast CPU that is rated reliable and used to double-check slower computers that are rated unreliable." was excluding the case where both WUs are distributed together from the start.
Cheers. Jean.
mclaver
Veteran Cruncher Joined: Dec 19, 2005 Post Count: 566 Status: Offline
I have also been getting a lot of errors over the last couple of days: all Quads, all Vista, both Intel and AMD, with at least two different error messages.
| Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| mi616_00017_3-- | ASUS-i7-965 | Error | 4/5/09 06:41:47 | 4/7/09 10:42:48 | 0.02 | 0.5 / 0.0 |
| mi616_00043_6-- | ASUS-i7-965 | Error | 4/5/09 06:24:24 | 4/7/09 10:27:00 | 0.02 | 0.4 / 0.0 |
| mi601_00071_3-- | ASUS-i7-965 | Error | 4/5/09 02:34:34 | 4/7/09 09:27:38 | 0.02 | 0.5 / 0.0 |
| mi717_00036_6-- | fox-amd-9950 | Error | 4/6/09 16:52:44 | 4/7/09 08:10:04 | 0.02 | 0.3 / 0.0 |
| mi598_00028_12-- | ASUS-i7-965 | Error | 4/5/09 01:17:52 | 4/7/09 06:20:06 | 0.02 | 0.4 / 0.0 |
| mi598_00008_12-- | ASUS-i7-965 | Error | 4/5/09 01:17:52 | 4/7/09 03:13:21 | 0.02 | 0.5 / 0.0 |
| mi699_00079_10-- | fox-amd-9950 | Error | 4/6/09 11:26:53 | 4/6/09 22:59:31 | 0.03 | 0.5 / 0.0 |
| mi696_00027_5-- | fox-amd-9950 | Error | 4/6/09 09:55:26 | 4/6/09 22:57:45 | 0.01 | 0.2 / 0.0 |
| mi678_00034_9-- | fox-amd-9950 | Error | 4/6/09 03:41:49 | 4/6/09 21:34:22 | 0.02 | 0.4 / 0.0 |
| mi570_00081_15-- | GIGA-Q9450 | Error | 4/4/09 15:49:21 | 4/6/09 18:03:02 | 0.02 | 0.3 / 0.0 |
| mi664_00080_17-- | fox-amd-9950 | Error | 4/5/09 22:05:33 | 4/6/09 16:52:44 | 0.02 | 0.4 / 0.0 |
| mi647_00009_12-- | fox-amd-9950 | Error | 4/5/09 16:27:57 | 4/6/09 13:06:58 | 0.02 | 0.3 / 0.0 |
| mi638_00013_3-- | fox-amd-9950 | Error | 4/5/09 14:00:46 | 4/6/09 13:06:58 | 0.01 | 0.2 / 0.0 |
| mi627_00051_8-- | fox-amd-9950 | Error | 4/5/09 10:38:30 | 4/6/09 09:55:26 | 0.03 | 0.4 / 0.0 |
| mi651_00011_12-- | MSI-I7-920 | Error | 4/5/09 17:41:30 | 4/5/09 19:10:18 | 0.05 | 0.4 / 0.0 |
| mi644_00036_7-- | MSI-I7-920 | Error | 4/5/09 15:49:38 | 4/5/09 17:56:55 | 0.02 | 0.2 / 0.0 |
| mi644_00071_18-- | MSI-I7-920 | Error | 4/5/09 15:49:38 | 4/5/09 17:41:29 | 0.97 | 8.0 / 0.0 |
| mi630_00047_17-- | MSI-I7-920 | Error | 4/5/09 11:41:18 | 4/5/09 15:49:38 | 0.02 | 0.2 / 0.0 |
| mi627_00085_7-- | MSI-I7-920 | Error | 4/5/09 10:59:21 | 4/5/09 15:49:38 | 0.05 | 0.4 / 0.0 |
| mi630_00041_13-- | MSI-I7-920 | Error | 4/5/09 11:38:07 | 4/5/09 15:49:38 | 0.21 | 1.7 / 0.0 |
| mi600_00021_18-- | MSI-I7-920 | Error | 4/5/09 01:54:39 | 4/5/09 15:11:33 | 4.65 | 39.0 / 0.0 |
| mi606_00002_2-- | MSI-I7-920 | Error | 4/5/09 03:56:06 | 4/5/09 15:00:14 | 0.15 | 1.3 / 0.0 |
| mi608_00038_11-- | MSI-I7-920 | Error | 4/5/09 04:25:00 | 4/5/09 14:57:35 | 0.01 | 0.1 / 0.0 |
| mi598_00067_5-- | MSI-I7-920 | Error | 4/5/09 02:05:55 | 4/5/09 14:57:35 | 3.04 | 25.5 / 0.0 |

Result Log

    <core_client_version>6.4.5</core_client_version>
    <![CDATA[
    <message>
    process exited with code 193 (0xc1, -63)
    </message>
    <stderr_txt>
    SIGSEGV: segmentation violation
    Stack trace (19 frames):
    [0x8789e9f] [0x877cfa4] [0xf7f70400] [0x8601f2b] [0x8645a99] [0x864f54c] [0x8654774] [0x866840e] [0x8669dac] [0x843b5d4] [0x870e50b] [0x85e9a87] [0x85eb7c5] [0x805cf24] [0x8331f6b] [0x83f3cdd] [0x83f3f5c] [0x87ed062] [0x8048131]
    Exiting...
    </stderr_txt>
    ]]>

Result Log

    <core_client_version>6.4.7</core_client_version>
    <![CDATA[
    <message>
    Incorrect function. (0x1) - exit code 1 (0x1)
    </message>
    <stderr_txt>
    ERROR:: Exit at: .\dock_structure.cc line:401
    </stderr_txt>
    ]]>
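The exit status in the first log, "process exited with code 193 (0xc1, -63)", is one value written three ways: decimal, hexadecimal, and the same byte read as a signed 8-bit (two's-complement) integer, since 193 - 256 = -63. The sketch below reproduces that formatting. `describe_exit_code` is a hypothetical helper, and the assumption that the client derives the third number this way is inferred from the arithmetic, not taken from BOINC's source.

```python
def describe_exit_code(code: int) -> str:
    """Format an 8-bit exit status as decimal, hex, and signed 8-bit value.

    Hypothetical helper: it only mirrors the "193 (0xc1, -63)" pattern seen
    in the log above; it is not BOINC client code.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("expected an 8-bit exit status (0-255)")
    # Two's-complement reinterpretation: bytes 0x80-0xFF map to -128..-1.
    signed = code - 0x100 if code >= 0x80 else code
    return f"{code} ({code:#x}, {signed})"


print(describe_exit_code(193))  # 193 (0xc1, -63)
print(describe_exit_code(1))    # 1 (0x1, 1)
```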