World Community Grid - View Thread

World Community Grid Forums

Category: Support

Forum: BOINC Agent Support

Thread: One upon a quorum


Thread Status: Active
Total posts in this thread: 13


This topic has been viewed 1236 times and has 12 replies

Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Status: Offline


One upon a quorum

Once a quorum has been achieved for a WU, is there any point in further completing the calculations?

Given that I don't care about points - wouldn't it be better to have the processor crunching a fresh WU - and shouldn't there be an option to can such redundant processing automatically?

Or have I got the wrong end of the stick (not unusual..?)

[Jun 19, 2007 1:14:21 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: One upon a quorum

Once the work unit has validated successfully, all additional results are superfluous - for all projects except HPF2 (which uses a different quorum method).

However, extra copies aren't usually sent out unless one of the original copies is an error, or is not returned on time. There is very little wasted work.

[Jun 19, 2007 1:27:53 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Once upon a quorum

Occasionally the projects have an extra copy sent out to speed up completion of the quorum of 3 (all projects but HPF2), e.g. to clear a batch that needs to be returned to the scientists. It is done rarely, though, and is usually announced in the Member News, Known Issues or Active Research project forum sections.

Every opportunity is looked at to reduce the initial distribution or even extend the deadlines to minimize redundancy. A "No Reply" on a 7-day deadline triggers an extra copy with e.g. a 2-day return. The "No Reply" copy could still come back before the extra copy. With a 9-day deadline the chances of that increase, though only slightly, given that the bulk of the work, > 90%, is returned within 4 days. In cases where there is no known issue and you see that you are crunching the 4th copy (noting the exception for HPF2 in the post above), you could manually abort the unit. If it has not started yet, there is no loss at all, and if it has started, there is no loss to those who are not interested in points.

At the moment I don't know how much 'waste' is due to 'late' returns. I believe that if a late return comes back after the extra copy, no credit is given... now that is sad.

In general, crunching a spare copy is unsatisfying, but don't spend too much time on it... just let it run, set and forget (for most of the time).

----------------------------------------

WCG

Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!

[Jun 19, 2007 1:51:34 PM]

Mysteron347
Senior Cruncher
Australia
Joined: Apr 28, 2007
Post Count: 179
Status: Offline


Re: Once upon a quorum

Now just one cotton-pickin' minute there!!

Since a WU is distributed to a number of machines at substantially the same time, then the effect of a slow machine (in terms of return time on a WU) is this:

1. The points-allotment is delayed until the slowest returner has responded (this is only relevant to points-chasers)
2. The result returned by a slow machine is more than likely redundant.

It's this second point that is significant.

Take a simple case of a Quorum-2, Distribution-3 WU which is sent to 1 slow machine taking 8 hours and 2 fast machines taking 4 hours. After four hours, it's likely that a quorum would be achieved, yet the result is delayed until the arrival of the slow machine's response four hours later AND that response is redundant in any case. Sure - the two fast machines would be off doing the next WU, but the slow machine has been operating for 8 hours to produce a redundant result.
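The Quorum-2, Distribution-3 case above can be checked with a few lines of Python. This is only a sketch: it assumes every returned result validates, and the function name `classify_results` is made up for illustration, not part of BOINC or WCG.

```python
def classify_results(finish_hours, quorum):
    """Return (hour at which the quorum is reached, indices of redundant machines).

    Assumes every returned result is valid, so the first `quorum`
    returns fill the quorum and anything later is redundant.
    """
    # Machine indices ordered by when they report back (ties keep input order).
    order = sorted(range(len(finish_hours)), key=lambda i: finish_hours[i])
    counted = order[:quorum]
    redundant = order[quorum:]
    quorum_time = finish_hours[counted[-1]]
    return quorum_time, redundant


# Machine 0 is the slow one (8 h); machines 1 and 2 are fast (4 h each).
finish_hours = [8, 4, 4]
quorum_time, redundant = classify_results(finish_hours, quorum=2)
wasted_hours = sum(finish_hours[i] for i in redundant)

print(quorum_time, redundant, wasted_hours)  # 4 [0] 8
```

Calling it with `quorum=3` instead gives an empty redundant list, because every copy is then needed to fill the quorum.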

And so the slow machine lumbers on to the next WU - and again returns a redundant result. The nett effect is that despite running 24/7, the slow machine actually contributes NOTHING to the accumulated results, and delays the allotment of the precious points into the bargain.

The name of the game would be to get some nett work from the slow machines - not to ban them.

This could be achieved by better managing the distribution of WUs on a JIT-type basis. Send out the WUs in a pattern which would cause the responses to arrive at roughly the same time and set the deadline as say a day later. Only send out enough to fill the quorum. If the quorum hasn't been achieved by the deadline, THEN send out make-up jobs to historically-fast machines and wait for a quorum, repeating the make-up procedure if required.
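A rough sketch of what such JIT-type dispatch might look like, in Python. Everything here is an assumption made for illustration: the per-host turnaround estimates, the 24-hour grace period standing in for "say a day later", and the helper names `plan_dispatch` and `pick_makeup_hosts`. It is not how the WCG or BOINC scheduler works.

```python
def plan_dispatch(estimated_hours, grace_hours=24):
    """Stagger send times so all copies are expected back together.

    estimated_hours maps host name -> estimated turnaround in hours.
    Returns (send_at, deadline), both in hours from now.
    """
    slowest = max(estimated_hours.values())
    # The slowest host is sent its copy first; faster hosts are sent theirs later.
    send_at = {host: slowest - hours for host, hours in estimated_hours.items()}
    deadline = slowest + grace_hours
    return send_at, deadline


def pick_makeup_hosts(history_hours, shortfall):
    """Choose the historically fastest hosts for make-up copies."""
    return sorted(history_hours, key=history_hours.get)[:shortfall]


# Quorum of 2: send only two copies, one to a slow host and one to a fast host.
send_at, deadline = plan_dispatch({"slow": 8, "fast": 4})
print(send_at, deadline)  # {'slow': 0, 'fast': 4} 32

# If one result is still missing at the deadline, send one make-up copy.
makeup = pick_makeup_hosts({"a": 6, "b": 3, "c": 5}, shortfall=1)
print(makeup)  # ['b']
```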

The result should be that since the slow machines would then be making a contribution rather than perpetually returning redundant responses, then the efficiency of the entire grid would be improved. It would be like adding an extra perhaps 10 to 40% to the network by simply managing the resources better.

[Jun 21, 2007 3:08:41 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Once upon a quorum

You would be right, if we used a 2 quorum system. But we don't. WCG requires three results for a quorum.

The initial replication is the same as the quorum size.

[Jun 21, 2007 3:12:01 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Once upon a quorum

One more rule that became possible with the latest BOINC releases (5.8.16) and Server (509) updates: if there are known to be many redundant work units lurking around, or a bad batch, WCG has the ability to send a cancel/abort instruction. It is only executed upon passive contact, i.e. when the client contacts the server. knreed posted the rules somewhere, but they go more or less like this:

1. If the unit was not started, cancel the unit.
2. If the unit is known to be bad and started, abort the unit.
3. If the unit is known to be good, but started, let it run as WCG has no ability to discover if WU was in the first or the last minute. Credit is granted upon completion.
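Read as a decision table, the three rules as recalled above might look like the following Python sketch. The function and the returned strings are invented for illustration and are not actual BOINC messages.

```python
def server_instruction(started, known_bad):
    """Map the three recalled rules onto one decision.

    started   -- whether the client has begun crunching the unit
    known_bad -- whether the server knows the unit is bad
    """
    if not started:
        return "cancel"   # rule 1: nothing has been lost yet
    if known_bad:
        return "abort"    # rule 2: stop wasting time on a bad unit
    return "let_run"      # rule 3: good and started, credit on completion


print(server_instruction(started=False, known_bad=False))  # cancel
print(server_instruction(started=True, known_bad=True))    # abort
print(server_instruction(started=True, known_bad=False))   # let_run
```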

AFAIK this rule has only been deployed once or twice.

The passive contact may become active in the future, i.e. a push towards the client, but that does not help if the client is off-line. Many crunch that way, and e.g. my IP presently changes about every few hours (and I feel safer somehow).

Added: oh, and the HPF2 work units are each unique, and there is discussion about reviewing that logic of distribution and validation and applying it to future projects, but validity is paramount, and that is still deemed to need 3 work units.

To think we came from an initial distribution of 4 with a quorum of 3 just a year ago; efficiency was improved by about 30 percent (I seem to remember a published number of about 26%).
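As a back-of-the-envelope check on that figure, here is the arithmetic for going from 4 initial copies to 3, ignoring any re-sends for errors or no-replies (a simplification, so the real number would differ somewhat):

```python
old_copies, new_copies = 4, 3

# Fraction of crunching no longer needed per work unit.
work_saved = (old_copies - new_copies) / old_copies
# Extra work units completed for the same amount of crunching.
throughput_gain = old_copies / new_copies - 1

print(f"{work_saved:.0%} less work per WU, "
      f"{throughput_gain:.0%} more WUs for the same work")
# 25% less work per WU, 33% more WUs for the same work
```

Both the "about 30 percent" and the remembered 26% fall between those two ways of counting.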

----------------------------------------
[Edit 1 times, last edit by Sekerob at Jun 21, 2007 6:32:46 PM]

[Jun 21, 2007 6:26:10 PM]

knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Status: Offline


Re: Once upon a quorum

The rules that exist now are:

1) If the workunit has been assimilated and the related files deleted on the server, had an error or been canceled, then the server will tell the client to cancel the in-progress result the next time the client contacts the server.

The 5.10 client should be able to handle this next case (and I need to test this feature on the server before unleashing it on you, since the 5.8 client had a bug which would crash the core client when the abort-if-unstarted was sent - as a result the 5.10 client listens for an abort-if-not-started message):

2) Once the workunit has been assimilated on the server (i.e. result(s) have been accepted and any future results returned will not contribute to the science) tell the client to abort any results for that workunit that haven't started next time the client contacts the server.

At some future point BOINC will add a preference to the client to treat 'abort if not started' messages as 'abort' messages from the servers.

Rule #1 makes sure that you are not crunching work that neither earns you credit nor contributes to the science.

Rule #2 helps reduce the number of times that you would be crunching work that does not contribute to the science.

Once the preference is added to a future BOINC client, then rule #2 will work with that preference to ensure that the work you are doing will contribute to the science.
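To make the interplay of rule #1, rule #2 and the future preference concrete, here is a small Python sketch. It is only a model of the description above: the class, the function names, the message strings and the `treat_soft_abort_as_abort` flag are all hypothetical and are not the real BOINC server or client API.

```python
from dataclasses import dataclass


@dataclass
class WorkunitState:
    """Server-side view of a workunit (hypothetical fields)."""
    assimilated: bool = False
    files_deleted: bool = False
    errored: bool = False
    canceled: bool = False


def server_message(wu):
    """What the server tells the client on its next contact."""
    # Rule 1: the result can neither earn credit nor help the science.
    if (wu.assimilated and wu.files_deleted) or wu.errored or wu.canceled:
        return "abort"
    # Rule 2: the science is done, but a started result may still finish.
    if wu.assimilated:
        return "abort_if_not_started"
    return None


def client_action(message, started, treat_soft_abort_as_abort=False):
    """What the client does with the message (flag models the future preference)."""
    if message == "abort":
        return "abort"
    if message == "abort_if_not_started":
        if not started or treat_soft_abort_as_abort:
            return "abort"
    return "keep_running"


wu = WorkunitState(assimilated=True)
msg = server_message(wu)
print(msg)                                      # abort_if_not_started
print(client_action(msg, started=True))         # keep_running
print(client_action(msg, started=True,
                    treat_soft_abort_as_abort=True))  # abort
```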

In many cases these are caused by someone shutting off their computer and taking a vacation.

I can tell you that there is significant effort going into scheduling and doing what is possible to maximize the use of the volunteers' computers. A large part of the BOINC 5.10 client is improvements to the client's scheduling algorithm. A simulator was created and numerous test cases were run through it.

[Jun 21, 2007 6:46:08 PM]

jal2
Senior Cruncher
USA
Joined: Apr 28, 2007
Post Count: 422
Status: Offline


Re: Once upon a quorum

Sekerob, if WCG implemented a server push to the client, it would lose a number of security-minded people who believe that the server should only reply to a client request and never initiate the conversation.

----------------------------------------

Team Christians UAG

[Jun 21, 2007 8:29:39 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Once upon a quorum

I'm fairly sure Sekerob just made that up.

This has been suggested in the past, and the BOINC developers have slammed the idea.

WCG are equally security minded.

Sek, got a source?

[Jun 21, 2007 8:33:38 PM]

Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline


Re: Once upon a quorum

No one is more aware of that than WCG, and I share your sentiments on that.

D. No thumbs work.... read it in the not too distant past and if slammed, shed no tear.

----------------------------------------
[Edit 1 times, last edit by Sekerob at Jun 21, 2007 8:41:59 PM]

[Jun 21, 2007 8:34:54 PM]
