Thread Status: Active
Total posts in this thread: 12
This topic has been viewed 1626 times and has 11 replies
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Server aborted task forces device to unreliable

I understand how and why devices are marked reliable or not, and how error tasks and the like can get a device marked unreliable, but why does that include server aborted tasks? Yesterday I got a resend, but the overdue wingman returned their result shortly after I received it, so the server aborted the task on my machine. Now every WU for that project on that machine has to go through Pending Verification until enough are returned as valid. Yeah, it all evens out in the long run, but it doesn't seem like a valid reason to mark a device unreliable.
----------------------------------------
Join/Website/IMODB



[Nov 14, 2012 12:23:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server aborted task forces device to unreliable

Makes no sense. I recently had an offline exchange with knreed about status class codes 202 and 203 [user / server aborts], which client 7 caused to be recorded as errors, with the loss of the "do it alone" rating for the Zero Redundant sciences as a result. It was fixed, and though these results are still listed as "error", they no longer knock out my v7 clients, which continue to receive single distributions.
Re: FAO Techs: Client 7 Server and User aborted tasks misreporting.

I've changed this. user aborts and server aborts should no longer reset the 'consecutive valid' counter on the host_app_version table. That will allow them to retain their 'reliable' status.
[Oct 10, 2012 8:59:58 PM]

Question is, what client are you running, or is there some other regression?

Edit: "consecutive valid" means that the last 5 validated results must not include an error or invalid code. That of course includes results that were returned earlier but were waiting on a wingman [the pesky interspersed X% of tasks that are required to re-rate whether a device is still reliable, for instance to serve as a repair man]. I still see those on ZR sciences, with either a _0 or _1 suffix.
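
For anyone who wants to picture the counter, here is a minimal sketch of the behaviour described in the quote above. It is not the actual server code: the class name, the outcome labels and the threshold of 5 are assumptions made for illustration, based only on what is said in this thread.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; the real server works with numeric codes.
VALID, INVALID, ERROR = "valid", "invalid", "error"
USER_ABORT, SERVER_ABORT = "user_abort", "server_abort"

RELIABLE_AFTER = 5  # assumed threshold, taken from "the last 5 validated"


@dataclass
class HostAppVersion:
    """Toy stand-in for one row of the host_app_version table."""
    consecutive_valid: int = 0

    def record(self, outcome: str) -> None:
        if outcome == VALID:
            self.consecutive_valid += 1
        elif outcome in (USER_ABORT, SERVER_ABORT):
            pass  # per the quoted change, aborts leave the counter untouched
        else:
            self.consecutive_valid = 0  # errors and invalids reset it

    @property
    def reliable(self) -> bool:
        return self.consecutive_valid >= RELIABLE_AFTER


host = HostAppVersion()
for outcome in [VALID] * 5 + [SERVER_ABORT]:
    host.record(outcome)
print(host.reliable)  # True: the server abort did not reset the counter

host.record(ERROR)
print(host.reliable)  # False: a genuine error does reset it
```

If a server abort were treated like an error instead, the first print would already show False, which is the symptom reported in the opening post.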
----------------------------------------
[Edit 2 times, last edit by Former Member at Nov 14, 2012 1:55:32 PM]
[Nov 14, 2012 1:41:22 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: Server aborted task forces device to unreliable

Hmm, I'm running BOINC 6.10.58. How does that factor in? I would think this is all happening on the server end.
[Nov 14, 2012 2:05:29 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server aborted task forces device to unreliable

Something changed in the server 700 code, creating an incompatibility with the result classes sent back by v7 clients. Berkeley issued a patch so that the server side would handle the codes correctly again. I'll leave it to knreed to look into this issue again... the fix may have had other implications, affecting v6 client result returns. I just know that the v6 clients never stopped getting the proper status printed on the RS pages.
[Nov 14, 2012 2:14:59 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: Server aborted task forces device to unreliable

Thanks Sek - for the explanation and the follow-up.
[Nov 15, 2012 6:05:27 AM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: Server aborted task forces device to unreliable

This is proving to be a bit painful. I would think that a device that is truly unreliable would be sent only WUs with a quorum of 2 (or more), even on single-redundancy projects, since, as I understand it, the device has to return at least five valid quorum-of-2+ WUs before it can be marked as reliable again. In this situation, every WU I return for the project goes to Pending Verification, yet only 1 out of every 9 WUs sent to me is quorum of 2. I'm going to try suspending all -0 WUs for now and crunch only the -1 WUs to try to expedite my recovery from a simple Server Aborted WU. This happened to me once before, as I recall, but I didn't recognize the circumstances. It will be nice when Kevin (who undoubtedly has quite a plateful) has time to look at this.
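
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each WU sent is independently a quorum-of-2 WU with probability 1 in 9 and that every one of those validates; both are simplifications, and the function name is made up for illustration. Under those assumptions the mean number of WUs received before collecting five quorum-of-2 results is the count needed divided by the share.

```python
from fractions import Fraction

NEEDED_VALID = 5                # valid quorum-of-2 results needed, per the post
QUORUM2_SHARE = Fraction(1, 9)  # observed share of quorum-of-2 WUs, per the post


def expected_tasks(needed: int, share: Fraction) -> Fraction:
    """Mean number of WUs received before `needed` of them are quorum-of-2."""
    return needed / share


print(expected_tasks(NEEDED_VALID, QUORUM2_SHARE))  # 45
```

So without suspending the -0 WUs, recovery would take about 45 WUs on average under these assumptions, versus 5 if only the -1 WUs are crunched.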
[Nov 16, 2012 7:46:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server aborted task forces device to unreliable

I'm seeing more or less the opposite. I'm getting repair jobs on a device that, as I understand it, shouldn't be considered reliable, since its return times exceed 2 days. This is the information on the Results Status page:

X0930076441415200610161235_ 2-- In Pr 11/16/12 20:58:44 11/19/12 16:10:44
X0930076371173200610061050_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930076380978200610271524_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930074890511200608311301_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0930074890380200608311304_ 2-- In Pr 11/16/12 14:10:12 11/19/12 09:22:12
X0960076350092200609290941_ 2-- In Pr 11/16/12 14:09:53 11/19/12 09:21:53
--- Above: the repair jobs
X0930076310926200610061326_ 1-- In Pr 11/16/12 03:46:46 11/23/12 03:46:46
X0960075350585200609251039_ 0-- In Pr 11/15/12 03:19:06 11/22/12 03:19:06
X0930076180630200610041023_ 0-- In Pr 11/15/12 00:27:32 11/22/12 00:27:32
X0930076180633200610041023_ 0-- In Pr 11/15/12 00:27:32 11/22/12 00:27:32
X0930076180335200610041028_ 0-- In Pr 11/15/12 00:27:09 11/22/12 00:27:09
X0930076660745200610121844_ 1-- In Pr 11/14/12 20:52:37 11/21/12 20:52:37
X0960076551428200610171541_ 0-- In Pr 11/14/12 09:56:22 11/21/12 09:56:22
X0900076010707200609211003_ 0-- In Pr 11/14/12 06:53:03 11/21/12 06:53:03
--- Below: the returned tasks, all of them >2 days
X0900075371085200609251121_ 1-- Valid 11/13/12 11:11:36 11/16/12 14:09:53
X0900075360995200609291456_ 1-- Valid 11/13/12 09:38:59 11/16/12 03:46:46
X0960075280142200610052031_ 1-- Valid 11/13/12 06:05:58 11/16/12 03:46:46
X0900075820486200610061800_ 1-- P Val 11/13/12 01:51:05 11/16/12 03:46:46
X0960075750494200610111627_ 1-- Valid 11/12/12 23:05:58 11/16/12 03:46:46
X0960075210853200610051725_ 1-- Valid 11/12/12 16:22:32 11/16/12 03:46:46
X0960075210408200610051732_ 1-- Valid 11/12/12 16:21:49 11/16/12 03:46:46
X0960075080249200609261613_ 1-- P Val 11/11/12 04:39:00 11/14/12 06:51:31
X0930075050096200609191516_ 0-- P Val 11/11/12 00:21:43 11/14/12 06:51:31
X0960074951072200609291842_ 1-- P Val 11/10/12 15:13:18 11/13/12 01:51:05

What am I missing?
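
For reference, the turnaround of the returned tasks can be checked with a short sketch like the one below. It assumes the two timestamps on each line are the sent time and the return time, and it uses the 2-day limit only as it is understood in this post, not as a confirmed server setting. The two rows are copied from the list above.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"   # timestamp layout used in the list above
LIMIT = timedelta(days=2)   # assumed "reliable" turnaround limit

# (result name, sent time, return time), copied from the returned tasks
rows = [
    ("X0900075371085200609251121_ 1--", "11/13/12 11:11:36", "11/16/12 14:09:53"),
    ("X0960075210408200610051732_ 1--", "11/12/12 16:21:49", "11/16/12 03:46:46"),
]


def turnaround(sent: str, returned: str) -> timedelta:
    """Elapsed time between a task being sent and being returned."""
    return datetime.strptime(returned, FMT) - datetime.strptime(sent, FMT)


for name, sent, returned in rows:
    elapsed = turnaround(sent, returned)
    verdict = "over limit" if elapsed > LIMIT else "within limit"
    print(name, elapsed, verdict)
```

Both rows come out at a little over 3 days, so under that assumed limit neither would count as a fast return.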
[Nov 16, 2012 10:46:19 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Status: Offline
Re: Server aborted task forces device to unreliable

I am just speculating here, but it may be that the sheer volume of HCC units being processed necessitated relaxing the definition of a "reliable" machine. Maybe they upped the threshold to 3 days.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Nov 17, 2012 2:45:55 AM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: Server aborted task forces device to unreliable

This is proving to be a bit painful. I would think that a device that is truly unreliable would be sent only WUs with a quorum of 2 (or more), even on single-redundancy projects, since, as I understand it, the device has to return at least five valid quorum-of-2+ WUs before it can be marked as reliable again. In this situation, every WU I return for the project goes to Pending Verification, yet only 1 out of every 9 WUs sent to me is quorum of 2. I'm going to try suspending all -0 WUs for now and crunch only the -1 WUs to try to expedite my recovery from a simple Server Aborted WU. This happened to me once before, as I recall, but I didn't recognize the circumstances. It will be nice when Kevin (who undoubtedly has quite a plateful) has time to look at this.


I've looked at so many WUs in Results Status that my head is swimming, but it would appear that when the Server Aborted task happened, I did start getting sent quorum-of-2 WUs. I just returned a quorum-of-1 WU from the machine involved and it was marked valid, so I think my machine is marked reliable again. The net is that this problem looks to be with how the servers handle the result classes from v6 clients. I think.
[Nov 17, 2012 6:26:53 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Server aborted task forces device to unreliable

I am just speculating here, but it may be that the sheer volume of HCC units being processed necessitated relaxing the definition of a "reliable" machine. Maybe they upped the threshold to 3 days.
It makes me wonder why WCG does not publish a logic flowchart showing the factors that determine when, or whether, a computer is deemed 'reliable'.
[Nov 17, 2012 12:46:51 PM]