Thread Status: Active
Total posts in this thread: 44
Posts: 44   Pages: 5
This topic has been viewed 62504 times and has 43 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Many WU in error on one of my PC's

Hi Patrick - Sorry you felt it necessary to get involved; we all know you have important science to attend to. I mean that seriously and sincerely; I am not trying to be a joker or to wind anyone up. I really do appreciate the opportunity to participate in your project in whatever small way I can.

The following is only a bit of detail on my thoughts, so feel free to stop reading now :)

I fully understand that the error rate is not particularly high, so it does not warrant spending effort to resolve at this point in time, and that WCG is resilient enough to re-process WU results if necessary so we can keep you supplied with valid results. I had decided to ignore the error when it was always happening within a few minutes of starting, because the only real effect was that my "reliability rating" took a hit. Now, however, there are a couple of examples of it happening after a couple of hours. On my i7 it was just over 2 hours into processing (normally that is about 50% complete). So, instead of wasting crunching resources and potentially slowing down the return of valid WU results to you (if enough failures occur, more instances need to be sent out to meet the minimum quorum), I will be unsubscribing from this project. I will keep an eye on this forum to see when progress is made in addressing this issue and will rejoin at that time.
[Apr 3, 2009 5:47:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Many WU in error on one of my PC's

Sekerob - I was thinking about this a bit and have a question as to the effect of returning "Errors". Is it possible that these "Error" results are making a machine be flagged as suspect and subject to ...

"Each time a dubious result is returned (aborts of work in progress e.g.), the device is drafted for an in-depth physical ;>"

from your first post in the thread listed below?
http://www.worldcommunitygrid.org/forums/wcg/...thread=24779&offset=0

Could this be the cause of what appears to be a higher than average rate of "Inconclusives" being returned on other projects, or am I seeing something that doesn't exist / whining too much?
[Apr 3, 2009 10:16:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Many WU in error on one of my PC's

Hi Snow Crash,
A high number of Inconclusives can be caused by A) previous errors and invalids or B) a fast CPU that is rated reliable and used to double-check slower computers that are rated unreliable.

You have to keep track of your machine and its quorums to decide which is happening.

Lawrence
[Apr 4, 2009 12:04:58 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Many WU in error on one of my PC's

If we start from the premise that the number of HPFII errors is small and only affects the crunchers, we may be missing the bigger picture, where these failures are now causing unnecessary validation work on other projects, with perhaps not as insignificant an impact as originally suggested.

I think a shortcut to the answer would be to see if the expected average of 1.2 returns for the zero redundancy projects is holding true.

These are the results from my PC that are leading me to post my concerns:
I am throwing 1-3 errors per day for HPFII. Seeing as this PC averages ~40 results per day across all projects, I think I am always on the fence between reliable (15 valid returns in a row) and unreliable. Between FAAH and HFCC I am getting a combined 2-5 inconclusives per day, where it is only after my result is returned that a validating unit is sent out to another machine. Also, I am not including the cases when I am sent the validating PC or when there are two units initially sent out. When you take the three scenarios together, my Zero Redundancy WUs are at ~1.4 - 1.5 instead of the expected 1.2. Perhaps I am just on the wrong side of the curve.
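
A rough illustration of the arithmetic behind those ratios, as a sketch only: the function name and the sample counts below are made up for the example and are not taken from any results page. The average number of copies per zero redundancy WU is simply the copies sent out divided by the WUs issued.

```python
def copies_per_wu(wus_issued: int, extra_copies: int) -> float:
    """Average copies sent per work unit: one initial copy each, plus extras."""
    if wus_issued <= 0:
        raise ValueError("wus_issued must be positive")
    return (wus_issued + extra_copies) / wus_issued


# Hypothetical day: 10 zero redundancy WUs, 2 of which needed a second copy.
print(copies_per_wu(10, 2))  # 1.2

# The same 10 WUs with 4 or 5 extra copies land at 1.4 and 1.5.
print(copies_per_wu(10, 4), copies_per_wu(10, 5))
```

Comparing the figure for one machine against the project-wide 1.2 only makes sense over a reasonably long stretch of results, since a handful of WUs per day swings the ratio a lot.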
[Apr 4, 2009 10:41:48 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Many WU in error on one of my PC's

It's an interesting postulation that if HPF2 regularly fails, and it is the only project failing, it is affecting the frequent "Inconclusive" marking and the extra copies sent out for the ZR projects. Let me pass this by the techs, specifically knreed.

- There is, and you may have seen the comment, an objective to automatically switch devices off specific projects on which they have a high failure rate. That would mitigate the 'extra' inconclusive duplications, if para 1 were fully correct about cross-project effects.

- Members have been asking for the option to have a secondary set of work if the primary set fails. I don't know how that fits into the philosophy of WCG vis-à-vis 'community' and all, but I think the "alternate" work option could do with boxes to opt out of a specific project. The snag is that there are only 4 profiles available, so how do you manage for those with many devices? A juggle for sure, and it needs deeper analysis of how to implement this, if it fits in with the whole.

As for Lawrence's point B), it is not correct, unless I misread what he wrote. The one who has the 'Inconclusive' listed on the general Results Status page is the one forcing out a second copy to the 'reliable' devices. The other party (the reliables) will just see "In progress" and usually, on return, an instant "Valid", and is thus mostly unaware, except when looking at the deadline, which is currently about 33% of the original deadline.

edits: 4, for afterthoughts and augmentation.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 4 times, last edit by Sekerob at Apr 4, 2009 11:12:46 AM]
[Apr 4, 2009 11:07:51 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Many WU in error on one of my PC's

Sekerob is right, I was not thinking correctly.
[Apr 4, 2009 12:35:09 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: Many WU in error on one of my PC's

"The other party (the reliables) will just see "In progress" and usually, on return, an instant "Valid", and is thus mostly unaware, except when looking at the deadline, which is currently about 33% of the original deadline."

And if the reliable cruncher is "covering" a newbie from the beginning of the quorum creation it will be "Pending Validation" as usual if it is first to return.

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Apr 4, 2009 2:12:28 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: Many WU in error on one of my PC's

I think that's a different premise from what we were discussing on 'Inconclusive'. In that case both show "In progress" up front on the RS overview page, and the one returning the result first sees PV.
[Apr 4, 2009 2:18:19 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3716
Status: Offline
Re: Many WU in error on one of my PC's

Sorry, it was not obvious to me (so probably not to a few silent readers either) that Lawrence's
"B) a fast CPU that is rated reliable and used to double-check slower computers that are rated unreliable."
was excluding the case where both WUs are distributed together from the start.

Cheers. Jean.
[Apr 4, 2009 2:49:07 PM]
mclaver
Veteran Cruncher
Joined: Dec 19, 2005
Post Count: 566
Status: Offline
Re: Many WU in error on one of my PC's

I am also getting a lot of errors over the last couple of days: all Quads, all Vista, both Intel and AMD, with at least two different error messages.

Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
mi616_ 00017_ 3-- | ASUS-i7-965 | Error | 4/5/09 06:41:47 | 4/7/09 10:42:48 | 0.02 | 0.5 / 0.0
mi616_ 00043_ 6-- | ASUS-i7-965 | Error | 4/5/09 06:24:24 | 4/7/09 10:27:00 | 0.02 | 0.4 / 0.0
mi601_ 00071_ 3-- | ASUS-i7-965 | Error | 4/5/09 02:34:34 | 4/7/09 09:27:38 | 0.02 | 0.5 / 0.0
mi717_ 00036_ 6-- | fox-amd-9950 | Error | 4/6/09 16:52:44 | 4/7/09 08:10:04 | 0.02 | 0.3 / 0.0
mi598_ 00028_ 12-- | ASUS-i7-965 | Error | 4/5/09 01:17:52 | 4/7/09 06:20:06 | 0.02 | 0.4 / 0.0
mi598_ 00008_ 12-- | ASUS-i7-965 | Error | 4/5/09 01:17:52 | 4/7/09 03:13:21 | 0.02 | 0.5 / 0.0
mi699_ 00079_ 10-- | fox-amd-9950 | Error | 4/6/09 11:26:53 | 4/6/09 22:59:31 | 0.03 | 0.5 / 0.0
mi696_ 00027_ 5-- | fox-amd-9950 | Error | 4/6/09 09:55:26 | 4/6/09 22:57:45 | 0.01 | 0.2 / 0.0
mi678_ 00034_ 9-- | fox-amd-9950 | Error | 4/6/09 03:41:49 | 4/6/09 21:34:22 | 0.02 | 0.4 / 0.0
mi570_ 00081_ 15-- | GIGA-Q9450 | Error | 4/4/09 15:49:21 | 4/6/09 18:03:02 | 0.02 | 0.3 / 0.0
mi664_ 00080_ 17-- | fox-amd-9950 | Error | 4/5/09 22:05:33 | 4/6/09 16:52:44 | 0.02 | 0.4 / 0.0
mi647_ 00009_ 12-- | fox-amd-9950 | Error | 4/5/09 16:27:57 | 4/6/09 13:06:58 | 0.02 | 0.3 / 0.0
mi638_ 00013_ 3-- | fox-amd-9950 | Error | 4/5/09 14:00:46 | 4/6/09 13:06:58 | 0.01 | 0.2 / 0.0
mi627_ 00051_ 8-- | fox-amd-9950 | Error | 4/5/09 10:38:30 | 4/6/09 09:55:26 | 0.03 | 0.4 / 0.0
mi651_ 00011_ 12-- | MSI-I7-920 | Error | 4/5/09 17:41:30 | 4/5/09 19:10:18 | 0.05 | 0.4 / 0.0
mi644_ 00036_ 7-- | MSI-I7-920 | Error | 4/5/09 15:49:38 | 4/5/09 17:56:55 | 0.02 | 0.2 / 0.0
mi644_ 00071_ 18-- | MSI-I7-920 | Error | 4/5/09 15:49:38 | 4/5/09 17:41:29 | 0.97 | 8.0 / 0.0
mi630_ 00047_ 17-- | MSI-I7-920 | Error | 4/5/09 11:41:18 | 4/5/09 15:49:38 | 0.02 | 0.2 / 0.0
mi627_ 00085_ 7-- | MSI-I7-920 | Error | 4/5/09 10:59:21 | 4/5/09 15:49:38 | 0.05 | 0.4 / 0.0
mi630_ 00041_ 13-- | MSI-I7-920 | Error | 4/5/09 11:38:07 | 4/5/09 15:49:38 | 0.21 | 1.7 / 0.0
mi600_ 00021_ 18-- | MSI-I7-920 | Error | 4/5/09 01:54:39 | 4/5/09 15:11:33 | 4.65 | 39.0 / 0.0
mi606_ 00002_ 2-- | MSI-I7-920 | Error | 4/5/09 03:56:06 | 4/5/09 15:00:14 | 0.15 | 1.3 / 0.0
mi608_ 00038_ 11-- | MSI-I7-920 | Error | 4/5/09 04:25:00 | 4/5/09 14:57:35 | 0.01 | 0.1 / 0.0
mi598_ 00067_ 5-- | MSI-I7-920 | Error | 4/5/09 02:05:55 | 4/5/09 14:57:35 | 3.04 | 25.5 / 0.0
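
For anyone wanting to separate the quick failures from the ones that burn real CPU time, here is a minimal sketch of one way to tally a list like the one above. It uses only a hand-copied subset of the rows (device name and CPU hours), and the 0.5 hour cut-off for a "late" failure is an arbitrary assumption for illustration, not a WCG rule.

```python
from collections import Counter

# (device, cpu_hours) for a hand-copied subset of the Error rows listed above.
rows = [
    ("ASUS-i7-965", 0.02),
    ("fox-amd-9950", 0.02),
    ("GIGA-Q9450", 0.02),
    ("MSI-I7-920", 0.97),
    ("MSI-I7-920", 4.65),
    ("MSI-I7-920", 3.04),
    ("MSI-I7-920", 0.01),
]

LATE_HOURS = 0.5  # assumed cut-off between "fails at start" and "fails late"


def summarize(rows, late_hours=LATE_HOURS):
    """Return (errors per device, list of rows that failed after late_hours)."""
    per_device = Counter(device for device, _ in rows)
    late = [(device, hours) for device, hours in rows if hours >= late_hours]
    return per_device, late


per_device, late = summarize(rows)
print(dict(per_device))
print(late)
```

In this subset the late failures all come from one machine, which is the kind of pattern worth checking against the full list before drawing conclusions.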

Result Log

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (19 frames):
[0x8789e9f]
[0x877cfa4]
[0xf7f70400]
[0x8601f2b]
[0x8645a99]
[0x864f54c]
[0x8654774]
[0x866840e]
[0x8669dac]
[0x843b5d4]
[0x870e50b]
[0x85e9a87]
[0x85eb7c5]
[0x805cf24]
[0x8331f6b]
[0x83f3cdd]
[0x83f3f5c]
[0x87ed062]
[0x8048131]

Exiting...

</stderr_txt>
]]>
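
A side note on the message line of the log above: the three numbers in "code 193 (0xc1, -63)" are the same value written three ways (decimal, hexadecimal, and the same byte read as a signed 8-bit number). The sketch below only shows that conversion; the helper name is made up for the example, and it says nothing about what the code means to the application.

```python
def as_signed_byte(code: int) -> int:
    """Interpret the low 8 bits of an exit code as a signed 8-bit value."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low


code = 193
print(hex(code))             # 0xc1
print(as_signed_byte(code))  # -63
```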

Result Log

<core_client_version>6.4.7</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
ERROR:: Exit at: .\dock_structure.cc line:401

</stderr_txt>
]]>
[Apr 7, 2009 2:15:07 PM]