World Community Grid Forums
Thread Status: Active Total posts in this thread: 13
Mysteron347
Senior Cruncher Australia Joined: Apr 28, 2007 Post Count: 179 Status: Offline
Once a quorum has been achieved for a WU, is there any point in completing the remaining calculations?
Given that I don't care about points, wouldn't it be better to have the processor crunching a fresh WU, and shouldn't there be an option to cancel such redundant processing automatically? Or have I got the wrong end of the stick (not unusual...)?
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Once the work unit has validated successfully, all additional results are superfluous, for all projects except HPF2 (which uses a different quorum method).
However, extra copies aren't usually sent out unless one of the original copies errors out or is not returned on time. There is very little wasted work.
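As a rough illustration of when an extra copy becomes necessary, here is a minimal Python sketch. The `Result` type, its states and the function `extra_copies_needed` are hypothetical simplifications made up for this example; this is not BOINC's actual server code.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical, simplified result states (not BOINC's real ones).
    IN_PROGRESS = "in_progress"
    SUCCESS = "success"
    ERROR = "error"


@dataclass
class Result:
    state: State
    deadline_day: float  # day by which this copy must be returned


def extra_copies_needed(results: list[Result], quorum: int, today: float) -> int:
    """How many extra copies to send so the quorum can still be met.

    A copy still counts if it has succeeded, or if it is in progress and
    not yet past its deadline. Errored and overdue ("No Reply") copies
    do not count, and only those trigger extra copies.
    """
    still_viable = sum(
        1
        for r in results
        if r.state is State.SUCCESS
        or (r.state is State.IN_PROGRESS and today <= r.deadline_day)
    )
    return max(0, quorum - still_viable)
```

With quorum 3 and three healthy in-progress copies, the function returns 0: nothing extra is sent, which is why there is so little wasted work.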
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
Occasionally the projects have an extra copy sent out to speed up the completion of the quorum of 3 (all projects but HPF2), for example to clear a batch that needs to be returned to the scientists. This is done rarely, though, and is usually announced in the Member News, Known Issues or Active Research project forum sections.
Every opportunity is taken to reduce the initial distribution, or even to extend the deadlines, in order to minimize redundancy. A "No Reply" on a 7-day deadline sends out an extra copy with, e.g., a 2-day return. The "No Reply" copy could still come back before the extra copy. With a 9-day deadline the chances increase, though only slightly, given that the bulk of the work (> 90%) is returned within 4 days. Where there is no known issue and you see that you are crunching the 4th copy (bearing in mind the exception for HPF2 noted in the post above), you could manually abort the unit. If it hasn't started yet there is no loss at all, and if it has started there is no loss to those who aren't interested in points. At the moment I don't know how much 'waste' is due to 'late' returns. I believe that if a late return comes back after the extra copy, no credit is given... now that is sad. In general, crunching a spare copy is unsatisfying, but don't spend too much time on it... just let it run: set and forget (most of the time).
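The timing in the example above can be sketched in Python. This is a toy model: the 7-day and 2-day figures come from the post, but the assumption that the extra copy is issued exactly when the original misses its deadline, and the no-credit rule for late copies (which the post itself only "believes"), are unconfirmed. Both function names are hypothetical.

```python
# Figures from the post; the reissue timing is an assumption.
ORIGINAL_DEADLINE_DAYS = 7.0    # deadline of the original copy
EXTRA_COPY_DEADLINE_DAYS = 2.0  # e.g. return window of the reissued copy


def extra_copy_due(issue_day: float = ORIGINAL_DEADLINE_DAYS,
                   window: float = EXTRA_COPY_DEADLINE_DAYS) -> float:
    """Day by which the extra copy is due, assuming it is issued as soon
    as the original copy misses its deadline ("No Reply")."""
    return issue_day + window


def late_copy_gets_credit(original_return_day: float,
                          extra_return_day: float,
                          original_deadline: float = ORIGINAL_DEADLINE_DAYS) -> bool:
    """Unconfirmed rule from the post: an on-time copy always counts;
    an overdue copy counts only if it beats the extra copy back."""
    if original_return_day <= original_deadline:
        return True
    return original_return_day < extra_return_day
```

For example, an original copy that turns up on day 7.5 still counts if the extra copy only comes back on day 8, but one that turns up on day 8.5 would have been beaten to it.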
WCG
Please help to make the Forums an enjoyable experience for All!
Mysteron347
Senior Cruncher Australia Joined: Apr 28, 2007 Post Count: 179 Status: Offline
Now just one cotton-pickin' minute there!!
Since a WU is distributed to a number of machines at substantially the same time, the effect of a slow machine (in terms of return time on a WU) is this:
1. The points allotment is delayed until the slowest returner has responded (this is only relevant to points-chasers).
2. The result returned by a slow machine is more than likely redundant.
It's this second point that is significant. Take a simple case of a Quorum-2, Distribution-3 WU which is sent to 1 slow machine taking 8 hours and 2 fast machines taking 4 hours. After four hours it's likely that a quorum has been achieved, yet the result is delayed until the slow machine's response arrives four hours later, AND that response is redundant in any case. Sure, the two fast machines would be off doing the next WU, but the slow machine has been operating for 8 hours to produce a redundant result. And so the slow machine lumbers on to the next WU, and again returns a redundant result. The nett effect is that despite running 24/7, the slow machine actually contributes NOTHING to the accumulated results, and delays the allotment of the precious points into the bargain.
The name of the game would be to get some nett work from the slow machines, not to ban them. This could be achieved by better managing the distribution of WUs on a JIT-type basis. Send out the WUs in a pattern which would cause the responses to arrive at roughly the same time, and set the deadline as, say, a day later. Only send out enough to fill the quorum. If the quorum hasn't been achieved by the deadline, THEN send out make-up jobs to historically fast machines and wait for a quorum, repeating the make-up procedure if required.
Since the slow machines would then be making a contribution rather than perpetually returning redundant responses, the efficiency of the entire grid should improve. It would be like adding perhaps an extra 10 to 40% to the network simply by managing the resources better.
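The scenario and the JIT proposal above can be sketched as a toy model in Python. It assumes every returned result is valid and agrees with the others, and that each host's run time is known in advance; the functions `redundant_results`, `jit_plan` and `make_up_hosts`, and the host names, are hypothetical. This is not how the WCG scheduler actually works.

```python
GRACE_HOURS = 24.0  # "set the deadline as say a day later"


def redundant_results(return_hours: list[float], quorum: int) -> list[float]:
    """Return times of copies that arrive after the quorum is already met.

    Assumes every returned result is valid and agrees with the others.
    """
    return sorted(return_hours)[quorum:]


def jit_plan(run_hours: dict[str, float]) -> tuple[dict[str, float], float]:
    """Stagger send times so all copies are expected to finish together.

    `run_hours` should contain only as many hosts as the quorum needs.
    Returns (send hour per host, deadline hour).
    """
    finish = max(run_hours.values())
    send_at = {host: finish - hours for host, hours in run_hours.items()}
    return send_at, finish + GRACE_HOURS


def make_up_hosts(history: dict[str, float], needed: int,
                  exclude: set[str]) -> list[str]:
    """Pick the historically fastest hosts (lowest average hours) for
    make-up copies, skipping hosts that already hold this WU."""
    candidates = sorted((h for h in history if h not in exclude), key=history.get)
    return candidates[:needed]


# Quorum-2, Distribution-3: one 8-hour host and two 4-hour hosts.
print(redundant_results([8.0, 4.0, 4.0], quorum=2))  # [8.0]
```

Under this model, `jit_plan({"slow": 8.0, "fast": 4.0})` starts the slow host at hour 0 and the fast host at hour 4, so both results are due at hour 8 and both count towards a quorum of 2.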
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
You would be right if we used a quorum-2 system. But we don't: WCG requires three results for a quorum.
The initial replication is the same as the quorum size.
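Why matching the initial replication to the quorum removes the redundancy can be shown with a little arithmetic. This minimal sketch assumes an idealised grid where every copy comes back valid, agreeing and on time (no errors, no reissues); `wasted_fraction` is a hypothetical helper.

```python
def wasted_fraction(initial_replication: int, quorum: int) -> float:
    """Share of the initially distributed copies that cannot contribute
    to the quorum, assuming every copy returns a valid, agreeing result
    on time (no errors, no reissues)."""
    if quorum < 1 or initial_replication < quorum:
        raise ValueError("need 1 <= quorum <= initial_replication")
    return (initial_replication - quorum) / initial_replication


for replication, quorum in [(3, 2), (3, 3), (4, 3)]:
    print(f"distribution {replication}, quorum {quorum}: "
          f"{wasted_fraction(replication, quorum):.0%} redundant")
```

Under the same idealised assumptions, going from an initial distribution of 4 to 3 with quorum 3 lets the same amount of crunching cover 4/3 as many workunits, roughly a 33% gain; real figures will differ because errors and late returns still trigger extra copies.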
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
One more rule became possible with the latest BOINC release (5.8.16) and server (509) updates: if many redundant work units are known to be lurking around, or there is a bad batch, WCG can send a cancel/abort instruction. It is only executed upon passive contact, i.e. when the client contacts the server. knreed posted the rules somewhere, but they go like this, more or less:
1. If the unit has not started, cancel the unit.
2. If the unit is known to be bad and has started, abort the unit.
3. If the unit is known to be good but has started, let it run, as WCG has no way to tell whether the WU is in its first or its last minute. Credit is granted upon completion.
AFAIK this rule has only been deployed once or twice. The passive contact may become active in the future, i.e. a push towards the client, but that does not help if you are offline. Many crunch that way, and, e.g., my IP address currently changes every few hours (and I feel safer somehow).
Added: oh, and the HPF2 work units are each unique, and there is discussion about reviewing and applying that logic of distribution and validation to future projects, but validity is paramount, and that is deemed to still need 3 copies. To think that just a year ago we had an initial distribution of 4 with quorum 3: efficiency has improved by about 30 percent since (I seem to remember a published figure of about 26%).
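The three rules as paraphrased above can be written as a small decision function. This is a hedged Python sketch of that paraphrase only, not WCG's server code: the `Action` names, `instruction_for` and `deliver_on_contact` are hypothetical, and the "pending" dictionary simply models instructions waiting until the client makes contact.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    CANCEL = "cancel"    # rule 1: not started yet, drop it
    ABORT = "abort"      # rule 2: started, but known to be bad
    LET_RUN = "let run"  # rule 3: started and good, credit on completion


def instruction_for(redundant: bool, known_bad: bool, started: bool) -> Action:
    """Sekerob's paraphrase of the cancel/abort rules.

    Only units WCG has flagged (redundant, or part of a bad batch) get
    an instruction at all.
    """
    if not (redundant or known_bad):
        return Action.NONE
    if not started:
        return Action.CANCEL
    if known_bad:
        return Action.ABORT
    # The server cannot tell if the unit is in its first or last minute.
    return Action.LET_RUN


def deliver_on_contact(pending: dict[str, list[Action]], host: str) -> list[Action]:
    """Instructions wait on the server until the client itself makes
    contact ("passive contact"); nothing is pushed to the host."""
    return pending.pop(host, [])
```

Because delivery only happens inside `deliver_on_contact`, a host that stays offline simply never receives its instruction, which matches the point made above about a server push not helping offline crunchers either.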
[Edited 1 time, last edit by Sekerob at Jun 21, 2007 6:32:46 PM]
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline
The rules that exist now are:
1) If the workunit has been assimilated and the related files deleted on the server, or has had an error, or has been canceled, then the server will tell the client to cancel the in-progress result the next time the client contacts the server.
The 5.10 client should handle this next case (I need to test this feature on the server before unleashing it on you, since the 5.8 client had a bug which crashed the core client when the abort-if-unstarted message was sent; as a result, the 5.10 client listens for an abort-if-not-started message):
2) Once the workunit has been assimilated on the server (i.e. result(s) have been accepted and any future results returned will not contribute to the science), the server will tell the client to abort any results for that workunit that haven't started, the next time the client contacts the server.
At some future point BOINC will add a preference to the client to treat 'abort if not started' messages from the servers as 'abort' messages. Rule #1 makes sure that you are not crunching work that neither earns you credit nor contributes to the science. Rule #2 helps reduce the number of times that you would be crunching work that does not contribute to the science. Once the preference is added to a future BOINC client, rule #2 will work with that preference to ensure that the work you are doing contributes to the science.
In many cases, these situations are caused by someone shutting off their computer and taking a vacation. I can tell you that there is significant effort going into scheduling, and into doing what is possible to maximize the use of the volunteers' computers. A large part of the BOINC 5.10 client is improvements in the client's scheduling algorithm. A simulator was created, and numerous test cases were run through it.
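The two server rules and the planned client preference can be sketched in Python. This is an illustration under assumptions: `Workunit`, `Message`, `message_for` and `client_handles` are hypothetical names, and gating rule 2 on client version 5.10 or later is an inference from the 5.8 crash described in the post, not a confirmed server setting.

```python
from dataclasses import dataclass
from enum import Enum


class Message(Enum):
    NONE = "none"
    CANCEL = "cancel"                              # rule 1
    ABORT_IF_NOT_STARTED = "abort_if_not_started"  # rule 2


@dataclass
class Workunit:
    assimilated: bool = False
    files_deleted: bool = False
    errored: bool = False
    canceled: bool = False


def message_for(wu: Workunit, client_version: tuple[int, int]) -> Message:
    """Server side: what to tell a client holding a result for `wu`."""
    # Rule 1: no credit and no science possible any more.
    if (wu.assimilated and wu.files_deleted) or wu.errored or wu.canceled:
        return Message.CANCEL
    # Rule 2: science is done. Assumed to be sent only to 5.10+ clients,
    # since the 5.8 client crashed on this message.
    if wu.assimilated and client_version >= (5, 10):
        return Message.ABORT_IF_NOT_STARTED
    return Message.NONE


def client_handles(msg: Message, started: bool, treat_as_abort: bool = False) -> str:
    """Client side; `treat_as_abort` models the future BOINC preference."""
    if msg is Message.CANCEL:
        return "cancel"
    if msg is Message.ABORT_IF_NOT_STARTED and (not started or treat_as_abort):
        return "abort"
    return "keep running"
```

The split mirrors the post: rule 1 always stops the result, while rule 2 only stops unstarted results unless the cruncher opts in to treating it as a full abort.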
jal2
Senior Cruncher USA Joined: Apr 28, 2007 Post Count: 422 Status: Offline
Sekerob, if WCG implemented a server push to the client, you would lose a number of security-minded people who believe that the server should only reply to a client request and never initiate the conversation.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
I'm fairly sure Sekerob just made that up.
This has been suggested in the past, and the BOINC developers have slammed the idea. WCG are equally security-minded. Sek, got a source?
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline
No one is more aware of that than WCG, and I share your sentiments on that.
D., I didn't make it up.... I read it in the not too distant past, and if it was slammed, I shed no tear.
[Edited 1 time, last edit by Sekerob at Jun 21, 2007 8:41:59 PM]