Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 817
Server aborted: WU sent to 3 ppl

I'm finding that multiple UGM WUs are being sent to 3 computers within 45 minutes of each other, which turns into a race to see who finishes first. A quick search for "aborted" shows three WUs that I never started because two other people finished them first:
ugm1_ugm1_20518_0033
ugm1_ugm1_20527_0360
ugm1_ugm1_20298_1778

It looks like I didn't waste any CPU time, but bandwidth at 5 megs per WU will have people complaining, and the WCG servers are wasting cycles if WUs get sent to three people right off the bat.

Anyone else finding this?

Edit: 2 linux boxes, 1 Windows box.

I checked a bunch of my valids and am not seeing those WUs being sent to more than 2 people unless someone detached, errored, or didn't reply.
[Edit 1 times, last edit by Seoulpowergrid at Dec 3, 2015 1:48:47 PM]
[Dec 3, 2015 1:41:47 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Server aborted: WU sent to 3 ppl

Post copies of the result distribution for those work units; that will be a lot more informative in helping to guess what's going on with them.
----------------------------------------
[Edit 1 times, last edit by SekeRob* at Dec 3, 2015 3:27:32 PM]
[Dec 3, 2015 3:27:01 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7693
Re: Server aborted: WU sent to 3 ppl

I posted on this earlier in another thread: https://secure.worldcommunitygrid.org/forums/...ead,38555_offset,0#505428
Never really heard back on the issue. I have seen it more than once.

Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Dec 3, 2015 3:36:14 PM]
[Dec 3, 2015 3:34:33 PM]
Seoulpowergrid
Veteran Cruncher
Joined: Apr 12, 2013
Post Count: 817
Re: Server aborted: WU sent to 3 ppl

@Rob
So in one of the instances I was the first to get the WU (my result name ends in _0), and in another I was the 3rd to get it. I can't check the last instance, as WCG has already deleted it.

@Sgt. Joe
I was pretty sure I had seen this before, but I never checked again as it looked like a one-off. Today, as I saw three more aborted WUs, I decided to post this.


[Dec 3, 2015 4:51:54 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 772
Re: Server aborted: WU sent to 3 ppl

Found these two where I am _2:

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| ugm1_ugm1_20546_0679_2 | - | In Progress | 03/12/15 09:24:17 | 10/12/15 09:24:17 | 0.00 | 0.0 / 0.0 |
| ugm1_ugm1_20546_0679_1 | 728 | Pending Validation | 03/12/15 09:18:39 | 03/12/15 11:39:00 | 2.33 | 71.9 / 0.0 |
| ugm1_ugm1_20546_0679_0 | - | In Progress | 03/12/15 09:18:17 | 10/12/15 09:18:17 | 0.00 | 0.0 / 0.0 |
| ugm1_ugm1_20550_2115_2 | - | In Progress | 03/12/15 12:26:59 | 10/12/15 12:26:59 | 0.00 | 0.0 / 0.0 |
| ugm1_ugm1_20550_2115_1 | - | In Progress | 03/12/15 12:07:04 | 10/12/15 12:07:04 | 0.00 | 0.0 / 0.0 |
| ugm1_ugm1_20550_2115_0 | 728 | Server Aborted | 03/12/15 12:06:40 | 03/12/15 12:43:11 | 0.00 | 0.0 / 0.0 |

Edit: added 2nd.

Paul.
----------------------------------------
Paul.
----------------------------------------
[Edit 1 times, last edit by PMH_UK at Dec 3, 2015 5:10:19 PM]
[Dec 3, 2015 5:07:29 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Server aborted: WU sent to 3 ppl

It's a pity they were not caught at the point of distribution, so we could see the original deadlines. Those results are rather current, so it's not a case of troppo tardi [too late] with the techs in a rush to complete a quorum and finish off an old lingering batch. "So happens" [quoting Bill], I have two 8-cores running a combined 13 UGM and 3 CEP2, which they've been doing for over a month now. I'll take out the flea comb and see if any fresh work units show this, i.e. ones where nothing has been reported yet. Maybe something surfaces from this.

ttyl

... so 10 minutes later, after opening up 45 IPs [in-progress results], only one showed 1 returned result, and 1 had a 3rd copy because an original went legs up. In amongst the IPs were copies from batches 20546 and 20550. Just wondering [Bill's dartgun] if there are joysticks attached to the servers and shoot-'em-up [read: wind-'em-up] games are being played; kidding, of course. There's no raison d'être for this to happen. Maybe one or the other host [wingman] is spitting out intermittent errors soon after, and therefore some anticipatory action is taken [extra copy] to make sure the server lifetime of a batch does not run long.

Summation: without the techs 'fessing up, there's nothing to understand, so let it rest [shut up and crunch ;o]

(I admit, now that I have a solid Hunting Tool v2.0q (one has now had 76000+ results pass through, the other 12000), I hardly ever visit the RS [Results Status] pages... what I do not know cannot concern me ;P)
[Dec 3, 2015 5:42:47 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Server aborted: WU sent to 3 ppl

Wellll, seeing the added post by PMH_UK, the _2 copy has a normal deadline (due 10/12/15, the same 7 days as _0)... more puzzle.
[Dec 3, 2015 5:47:05 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Re: Server aborted: WU sent to 3 ppl

Seeing this as well:

Workunit Status

Project Name: Uncovering Genome Mysteries
Created: 12/01/2015 00:39:02
Name: ugm1_ugm1_20524_1200
Minimum Quorum: 2
Replication: 2

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| ugm1_ugm1_20524_1200_2 | 728 | Valid | 12/2/15 14:02:57 | 12/3/15 08:46:10 | 2.90 | 72.6 / 64.5 |
| ugm1_ugm1_20524_1200_0 | 728 | Valid | 12/2/15 13:58:53 | 12/2/15 16:00:41 | 2.00 | 56.4 / 64.5 |
| ugm1_ugm1_20524_1200_1 | 728 | Server Aborted | 12/2/15 13:58:43 | 12/3/15 11:00:39 | 0.00 | 0.0 / 0.0 |

I'm the _1 that got the WU before the _0. The _2 went out about FOUR minutes later. Either we have an anomaly in the spacetime continuum or BOINC is psychic!
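For anyone who wants to check the timing on their own results, here is a minimal Python sketch that works out how long after the first copy each of the other copies was sent. The timestamps are copied from the table above and read as month/day/year; the helper names `parse` and `send_gaps` are made up for illustration and are not part of BOINC or the WCG site.

```python
from datetime import datetime

# Sent times copied from the result table above (month/day/year).
sent = {
    "_0": "12/2/15 13:58:53",
    "_1": "12/2/15 13:58:43",
    "_2": "12/2/15 14:02:57",
}

def parse(ts: str) -> datetime:
    """Parse a timestamp in the month/day/year form used in that table."""
    return datetime.strptime(ts, "%m/%d/%y %H:%M:%S")

def send_gaps(sent_times):
    """Return (copy, seconds after the earliest send), ordered by send time."""
    parsed = {copy: parse(ts) for copy, ts in sent_times.items()}
    first = min(parsed.values())
    ordered = sorted(parsed.items(), key=lambda item: item[1])
    return [(copy, int((when - first).total_seconds())) for copy, when in ordered]

gaps = send_gaps(sent)
for copy, seconds in gaps:
    print(f"{copy}: sent {seconds} s after the first copy")
```

On these three rows it puts _1 first, _0 ten seconds behind it, and _2 a little over four minutes after both.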
[Dec 4, 2015 5:27:54 AM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 772
Re: Server aborted: WU sent to 3 ppl

More oddity:

The WU was re-sent as _2 after the error on _1, but it also went out to me as _3 while _0 is still in progress.

| Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit |
|---|---|---|---|---|---|---|
| ugm1_ugm1_20660_1733_3 | - | In Progress | 07/12/15 03:24:04 | 14/12/15 03:24:04 | 0.00 | 0.0 / 0.0 |
| ugm1_ugm1_20660_1733_2 | 728 | Pending Validation | 07/12/15 03:21:25 | 07/12/15 06:38:04 | 3.25 | 102.6 / 0.0 |
| ugm1_ugm1_20660_1733_1 | 728 | Error | 07/12/15 03:19:04 | 07/12/15 03:21:07 | 0.00 | 84.8 / 0.0 |
| ugm1_ugm1_20660_1733_0 | - | In Progress | 07/12/15 03:18:40 | 14/12/15 03:18:40 | 0.00 | 0.0 / 0.0 |
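To see what deadline the extra copy was given, the sent and due times of the in-progress rows can be subtracted. This is only a small sketch: the values are copied from the table above and read day-first (dd/mm/yy), and `deadline_days` is a made-up helper name, not anything from BOINC.

```python
from datetime import datetime

FMT = "%d/%m/%y %H:%M:%S"  # day-first, as in the table above

# (copy, sent, due) for the copies still in progress; values copied from the table.
in_progress = [
    ("_0", "07/12/15 03:18:40", "14/12/15 03:18:40"),
    ("_3", "07/12/15 03:24:04", "14/12/15 03:24:04"),
]

def deadline_days(sent: str, due: str) -> float:
    """Length of the deadline in days, from sent time to due time."""
    delta = datetime.strptime(due, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 86400

lengths = {copy: deadline_days(sent, due) for copy, sent, due in in_progress}
print(lengths)
```

Both copies come out at the same length, so the extra _3 was not sent with a shortened deadline.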

Paul.
----------------------------------------
Paul.
[Dec 7, 2015 8:32:50 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Re: Server aborted: WU sent to 3 ppl

Don't know why nobody from the tech team is responding with 'we're looking at it' or 'this is why', but I'm assuming [dangerous] that once there are N copies 'in progress', an 'error/no reply' will just trigger sending a new copy, until max copies is reached, without checking whether enough copies are already IP [in progress].
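To make that guess concrete, here is a toy Python sketch of the check I would expect: count the copies that can still contribute to the quorum before sending another one. It is not BOINC's or WCG's actual server code. The function name, the status strings and the `max_total` cap are all made up for illustration; only the quorum of 2 comes from the workunit status posted earlier in this thread.

```python
# Statuses that can still count towards the quorum (illustrative names).
USABLE = {"in_progress", "pending_validation", "valid"}

def extra_copies_needed(states, quorum=2, max_total=4):
    """Copies a scheduler would still need to send if it counted usable copies.

    `states` lists the status of every copy issued so far. `max_total` is a
    made-up cap for illustration; the real limit is not given in this thread.
    """
    usable = sum(1 for s in states if s in USABLE)
    shortfall = max(0, quorum - usable)
    room = max(0, max_total - len(states))
    return min(shortfall, room)

# ugm1_ugm1_20660_1733 just after _1 errored: one replacement is justified.
after_error = ["in_progress", "error"]
# The same WU once _2 had gone out: the quorum is covered again.
after_resend = ["in_progress", "error", "in_progress"]

print(extra_copies_needed(after_error))
print(extra_copies_needed(after_resend))
```

With a check like this, the _3 copy in the previous post would not have gone out, which is why I suspect the 'enough in progress' test is being skipped somewhere.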

Is 7 days the standard deadline these days? If so, those extras are also going out with seven [a film I will never ever watch again]. Spooky.
[Dec 7, 2015 9:23:41 AM]