World Community Grid Forums
Thread Status: Active Total posts in this thread: 12
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18665 Status: Offline
I understand how/why devices are marked reliable or not, and how error tasks and such can get a device marked unreliable, but why does that include server-aborted tasks? Yesterday I got a resend, and the overdue wingman returned the original shortly afterwards, so the server aborted the task on my machine. Now every WU for that project on that machine has to go thru Pending Verification until enough get returned as valid. Yeah, it all evens out in the long run, but it doesn't seem like a valid reason to mark a device unreliable.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Makes no sense. I recently had an offline exchange with knreed about status class codes 202 and 203 [user / server aborts], which client 7 caused to be recorded as errors, which in turn cost the device its "do it alone" rating for the Zero Redundant sciences. It was fixed, and though these results are still listed as "error", they no longer knock out my -v7- clients, which continue to receive single distributions.

Re: FAO Techs: Client 7 Server and User aborted tasks misreporting.
I've changed this. User aborts and server aborts should no longer reset the 'consecutive valid' counter on the host_app_version table. That will allow them to retain their 'reliable' status. [Oct 10, 2012 8:59:58 PM]

Question is, what client are you running, or is there some other regression?

Edit: "consecutive valid" means that the last 5 validated results must not have an error or invalid code, which of course includes results that were returned earlier but were waiting on a wingman [those pesky interspersed X% that are required to re-rate whether a device is still reliable, for instance to serve as a repair man]. I still see those on ZR sciences, with either a _0 or _1 suffix.
[Edit 2 times, last edit by Former Member at Nov 14, 2012 1:55:32 PM]
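For anyone trying to picture the rule described in this post, here is a minimal sketch of a "consecutive valid" counter in which aborts are ignored rather than treated as errors. It is only an illustration of the behaviour as described in the thread, not the actual server code: the class, the outcome labels, and the way the threshold of 5 is applied are all assumptions made for this example.

```python
from dataclasses import dataclass

# Outcome labels are hypothetical names for this sketch, not BOINC identifiers.
VALID = "valid"
INVALID = "invalid"
ERROR = "error"
USER_ABORT = "user_abort"
SERVER_ABORT = "server_abort"

# "Last 5 validated" as described in the thread; the exact rule is an assumption.
RELIABLE_THRESHOLD = 5


@dataclass
class HostAppVersion:
    """Toy stand-in for one row of the host_app_version table."""

    consecutive_valid: int = 0

    def record(self, outcome: str) -> None:
        """Update the streak for one reported result."""
        if outcome == VALID:
            self.consecutive_valid += 1
        elif outcome in (USER_ABORT, SERVER_ABORT):
            # Behaviour described after the Oct 10, 2012 change:
            # aborts leave the counter untouched.
            pass
        else:
            # Errors and invalid results reset the streak.
            self.consecutive_valid = 0

    @property
    def reliable(self) -> bool:
        return self.consecutive_valid >= RELIABLE_THRESHOLD


host = HostAppVersion()
for outcome in [VALID] * 5 + [SERVER_ABORT]:
    host.record(outcome)
print(host.consecutive_valid, host.reliable)
```

If the abort were instead routed through the error branch, the same sequence would leave the counter at 0, which matches the symptom reported at the top of the thread.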
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18665 Status: Offline
Hmm, I'm running BOINC 6.10.58. How does that factor in? I would think this is all happening on the server end.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Something changed in the server 700 code, creating an incompatibility with the result classes sent back by v7 clients. Berkeley issued a patch so the server side would handle the codes correctly again. I'll leave it to knreed to look into this issue again... the fix may have had other implications, impacting v6 client result returns. I just know that the v6 clients never stopped getting the proper status printed on the RS pages.
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18665 Status: Offline
Thanks Sek - for the explanation and the follow-up.
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18665 Status: Offline
This is proving to be a bit painful. I would think that a device that is truly unreliable would get sent only WUs with a quorum of 2 (or more), even on single-redundancy projects, since, as I understand it, the device has to return at least five valid quorum-of-2+ WUs before it can get marked as reliable again. As it is, every WU I return for the project goes to Pending Verification, yet only 1 out of every 9 WUs sent to me is quorum of 2. I'm going to try suspending all -0 WUs for now and crunch only the -1 WUs to try to expedite my recovery from a simple Server Aborted WU. This happened to me once before, as I recall, but I didn't recognize the circumstances. It will be nice when Kevin (who undoubtedly has quite a plateful) has time to look at this.
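To put a rough number on why this feels slow, here is a back-of-the-envelope sketch using the figures from this post. It assumes (these are simplifications, not confirmed server behaviour) that each WU sent is independently quorum-of-2 with probability 1/9, that five such WUs are needed, and that every one of them validates. The function name is made up for this example.

```python
from fractions import Fraction


def expected_tasks_needed(valid_needed: int, quorum2_fraction: Fraction) -> Fraction:
    """Mean number of WUs received before `valid_needed` of them are quorum-of-2.

    This is the mean of a negative binomial count: successes / success rate.
    """
    return Fraction(valid_needed) / quorum2_fraction


# Five quorum-of-2 WUs needed, 1 in 9 WUs sent is quorum-of-2 (both from the post).
estimate = expected_tasks_needed(5, Fraction(1, 9))
print(f"Expected WUs to crunch before recovery: {estimate}")
```

Under those assumptions the machine would, on average, work through 45 WUs before it has returned five quorum-of-2 results, which is why crunching only the -1 WUs should shorten the wait.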
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I'm seeing more or less the opposite. I'm getting repair jobs on a device that, as I understand it, shouldn't be rated reliable, since its return times exceed 2 days. This is the information on the Results Status page:
X0930076441415200610161235_ 2-- In Pr 11/16/12 20:58:44 11/19/12 16:10:44
X0930076371173200610061050_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930076380978200610271524_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930074890511200608311301_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930074890380200608311304_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0960076350092200609290941_ 2-- In Pr 11/16/12 14:09:53 11/19/12 09:21:53
--- Above the repair jobs
X0930076310926200610061326_ 1-- In Pr 11/16/12 03:46:46 11/23/12 03:46:46
X0960075350585200609251039_ 0-- In Pr 11/15/12 03:19:06 11/22/12 03:19:06
X0930076180630200610041023_ 0-- In Pr 11/15/12 00:27:32 11/22/12 00:27:32
X0930076180633200610041023_ 0-- In Pr 11/15/12 00:27:32 11/22/12 00:27:32
X0930076180335200610041028_ 0-- In Pr 11/15/12 00:27:09 11/22/12 00:27:09
X0930076660745200610121844_ 1-- In Pr 11/14/12 20:52:37 11/21/12 20:52:37
X0960076551428200610171541_ 0-- In Pr 11/14/12 09:56:22 11/21/12 09:56:22
X0900076010707200609211003_ 0-- In Pr 11/14/12 06:53:03 11/21/12 06:53:03
--- Below the returned tasks, all of them >2 days
X0900075371085200609251121_ 1-- Valid 11/13/12 11:11:36 11/16/12 14:09:53
X0900075360995200609291456_ 1-- Valid 11/13/12 09:38:59 11/16/12 03:46:46
X0960075280142200610052031_ 1-- Valid 11/13/12 06:05:58 11/16/12 03:46:46
X0900075820486200610061800_ 1-- P Val 11/13/12 01:51:05 11/16/12 03:46:46
X0960075750494200610111627_ 1-- Valid 11/12/12 23:05:58 11/16/12 03:46:46
X0960075210853200610051725_ 1-- Valid 11/12/12 16:22:32 11/16/12 03:46:46
X0960075210408200610051732_ 1-- Valid 11/12/12 16:21:49 11/16/12 03:46:46
X0960075080249200609261613_ 1-- P Val 11/11/12 04:39:00 11/14/12 06:51:31
X0930075050096200609191516_ 0-- P Val 11/11/12 00:21:43 11/14/12 06:51:31
X0960074951072200609291842_ 1-- P Val 11/10/12 15:13:18 11/13/12 01:51:05
What am I missing?
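The ">2 days" claim can be checked directly from the returned rows in the listing. The sketch below assumes that the two timestamps on each returned row are the sent time and the return time, in that order, and that both are in the same time zone; the helper name is made up for this example.

```python
from datetime import datetime

FMT = "%m/%d/%y %H:%M:%S"  # timestamp layout used in the listing


def turnaround_days(sent: str, returned: str) -> float:
    """Days elapsed between the sent time and the return time."""
    delta = datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 86400


# (sent, returned) pairs copied from the ten returned rows of the listing.
returned_rows = [
    ("11/13/12 11:11:36", "11/16/12 14:09:53"),
    ("11/13/12 09:38:59", "11/16/12 03:46:46"),
    ("11/13/12 06:05:58", "11/16/12 03:46:46"),
    ("11/13/12 01:51:05", "11/16/12 03:46:46"),
    ("11/12/12 23:05:58", "11/16/12 03:46:46"),
    ("11/12/12 16:22:32", "11/16/12 03:46:46"),
    ("11/12/12 16:21:49", "11/16/12 03:46:46"),
    ("11/11/12 04:39:00", "11/14/12 06:51:31"),
    ("11/11/12 00:21:43", "11/14/12 06:51:31"),
    ("11/10/12 15:13:18", "11/13/12 01:51:05"),
]

turnarounds = [turnaround_days(sent, ret) for sent, ret in returned_rows]
for (sent, ret), days in zip(returned_rows, turnarounds):
    print(f"{sent} -> {ret}: {days:.2f} days")
print("All over 2 days:", all(days > 2 for days in turnarounds))
```

Note that this measures sent-to-returned time, which includes time spent waiting in the queue, not just crunching time.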
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline
I am just speculating here, but it may be that the sheer volume of HCC units being processed necessitated relaxing the definition of a "reliable" machine. Maybe they upped the threshold to 3 days.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers*
keithhenry
Ace Cruncher Senile old farts of the world ....uh.....uh..... nevermind Joined: Nov 18, 2004 Post Count: 18665 Status: Offline
I've looked at so many WUs in Results Status that my head is swimming, but it appears that when the Server Aborted task happened, I did start getting sent quorum-of-2 WUs. I just returned a quorum-1 WU from the machine involved and it was marked valid, so I think my machine is marked reliable again. Net is that this problem looks to be with how the servers handle the result classes from v6 clients. I think.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
It makes me wonder why WCG does not publish a logic flowchart showing the factors that determine when/if a computer is deemed 'reliable'.