World Community Grid Forums
Thread Status: Active | Total posts in this thread: 30

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
The part of the scheduler responsible for all this is the work distribution policy. At present, BOINC seems to use a fairly simple algorithm: it will send whatever's available that doesn't exceed the client resources.
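In pseudocode, that policy is something like this (a Python-flavoured sketch; the field names are invented for illustration and bear no resemblance to BOINC's actual source):

    from dataclasses import dataclass

    @dataclass
    class Host:
        flops: float           # benchmarked speed, FLOP/s
        free_disk: float       # bytes available to the client
        buffer_seconds: float  # how much work the client requested

    @dataclass
    class Result:
        est_flops: float       # estimated computation for this result
        disk_usage: float      # bytes it occupies on the client

    def pick_work(available, host):
        """Greedy 'send whatever fits': take results in queue order until
        the host's disk or requested work buffer is exhausted."""
        assigned, disk, secs = [], host.free_disk, host.buffer_seconds
        for r in available:
            runtime = r.est_flops / host.flops
            if r.disk_usage <= disk and runtime <= secs:
                assigned.append(r)
                disk -= r.disk_usage
                secs -= runtime
        return assigned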
I assume the WCG scheduler is slightly different, since it needs to balance multiple projects. I recall the techs mentioning a load balancer. At a guess, all it does is respect the project preferences, allow for client limitations, and then allocate work based on how much is available from each project. One refinement they could have added (but I suspect they didn't) is something to create an even mix on a per-host basis.

For those that care, BOINC uses a master/agent architecture for scheduling. I would love to see a more distributed solution, but scheduling doesn't break down that way easily. A distributed architecture would require agents to talk to one another, and besides the security problems, I don't think the WCG privacy policy would allow clients to maintain information about each other.

So if I want to take this any further, I need to set up a Monte Carlo scheduler simulation. Yay! Gaussian distributions! I'm really not very good at writing simulations. So, if you're still listening, help me get this straight: each host will have a number of properties: performance, reliability, crunching habits (that's where most of the guesswork comes in), and project preferences. Work units will have an estimated and an actual time to completion. All I need to do is write schedulers that do all the things we've discussed, and see what happens when I throw 100,000 "hosts" at them. Part of the challenge will be determining the reliability and crunching habits from the results returned, and from any additional data I add to the scheduling requests.

The aim, of course, is to increase output without unduly delaying results along the way. I can see the scheduler eventually taking factors like "it's Friday, and this computer is shut down over the weekend" into account. That's easy to say, but really hard to describe algorithmically.
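Roughly, I picture the simulation skeleton like this (Python; every distribution, cut-off and parameter below is pure guesswork on my part, not anything measured from WCG):

    import random
    from dataclasses import dataclass, field

    @dataclass
    class SimHost:
        performance: float      # relative speed multiplier
        reliability: float      # probability a result comes back valid
        hours_per_day: float    # crunching habits -- the guesswork part
        projects: list = field(default_factory=list)  # project preferences

    def make_hosts(n, seed=1):
        """Draw a synthetic host population from (guessed) Gaussians."""
        rng = random.Random(seed)
        return [SimHost(performance=max(0.1, rng.gauss(1.0, 0.4)),
                        reliability=min(1.0, max(0.0, rng.gauss(0.95, 0.05))),
                        hours_per_day=min(24.0, max(0.5, rng.gauss(12.0, 6.0))))
                for _ in range(n)]

    def run_workunit(host, est_cpu_hours, rng):
        """One trial: does the host return a valid result, and after how
        many wall-clock days?"""
        cpu_hours = est_cpu_hours / host.performance   # actual vs. estimated
        wall_days = cpu_hours / host.hours_per_day     # stretched by habits
        return rng.random() < host.reliability, wall_days

From there it's make_hosts(100_000), a candidate scheduler deciding who gets what, and counting throughput against turnaround time.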

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Wow, this is very complicated depending on how you go about it. And I thought "di-dactylos" implied a two-fingered typist... you're obviously way better than that :)

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Dear me, no... "two-fingered" is one of the primary roots, but not in that sense - in the British sense :-)
It also plays on "didactic" and "Diogenes". Credit goes to Terry Pratchett, whose character it is (cf. Small Gods).

keithhenry (Ace Cruncher) | "Senile old farts of the world ....uh.....uh..... nevermind" | Joined: Nov 18, 2004 | Post Count: 18667 | Status: Offline
Hmmm, it would seem then, as suggested by Trog Dog, that we at least try switching to sending a WU out to only three users. If/when one of those three errors out or does not return in time, the WU gets sent out to a fourth user. If the majority of WUs are successfully processed in time by three users, we have our quorum, right? The user who would have been the fourth is then crunching another WU instead.

Yes, this would mean that those WUs that do need to be sent to a fourth user would take longer (wall time) to get processed than otherwise, but that's already the case now when a WU gets sent to a fifth user (hopefully not to the same magnitude, of course). I suspect this would be technically simple to switch, as all that changes is the number of users in the initial send; the behavior for WUs needing a fourth would be the same as it is now for a fifth.

While we might see a small increase in the number of WUs "delayed" compared to the current process, it would seem we would still get a net increase in throughput. Yes, we could improve from there, but this may be a fairly simple change that we could gain from now......
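In sketch form the change is tiny - only the initial replication count moves (Python pseudocode just to make the logic concrete; the real WCG server code is of course nothing like this):

    INITIAL_COPIES = 3   # the proposed change: was 4
    QUORUM = 3           # valid results needed to validate a WU

    def extra_copies_needed(valid_returned, still_outstanding):
        """Called whenever a copy errors out or misses its deadline:
        top the WU back up so quorum can still be reached."""
        if valid_returned >= QUORUM:
            return 0   # quorum met, WU is done
        return max(0, QUORUM - valid_returned - still_outstanding)

Back-of-envelope: if a fraction f of WUs lose one of their first three copies, average sends per WU are roughly 3 + f instead of a flat 4, so total copies sent drop for any plausible f, at the price of extra wall time on that failed fraction.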

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> While we might see a small increase in the number of WUs "delayed" compared to the current process, it would seem that we would still get a net increase in throughput. Yes, we could improve from there but this may be a fairly simple change that we could gain from now......

Does it really matter to the bottom line if some WUs are "delayed"? If the WUs sent tomorrow are created on the basis of what was learned from the WUs validated by the quorum of three today, then there might be an argument for issuing 4 at once: it would guarantee that generation of new WUs is never delayed. But that does not seem to be the case, or am I mistaken? (Just a newb here.)
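To make the distinction concrete: the delay would only bite in a pipeline like this (a hypothetical stub; nothing I have seen suggests WCG actually works this way):

    def refine(result):
        """Stand-in for whatever analysis turns a validated result into
        the next round of work (purely hypothetical)."""
        return {"derived_from": result}

    def generate_next_batch(validated_results):
        """If tomorrow's WUs are derived from today's validated results,
        per-WU latency throttles the whole project; if batches are
        independent, a late straggler costs nothing downstream."""
        if not validated_results:
            return []   # generator starves -- the only case delay matters
        return [refine(r) for r in validated_results]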

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi Dagorath,

> Does it really matter to the bottom line if some WUs are "delayed"?

???? There has been a lot of development work on making the client that members see run smoothly, but occasional comments from the staff make it obvious that the server side we never see is much more jury-rigged. There have been problems with the sheer volume of results waiting around to be returned to the project using the UD client. Meanwhile, the BOINC scheduler has sometimes hiccuped and told members that no work is available until knreed gets on the server and bangs on the queue.

There may be pressure on the staff for a quick return of results to the project that we do not know of. I have always had the feeling that they are eager to clean out the old work units, even at the expense of sending out additional work units. But I do not know if there is any reason for this, so... [shrug]

Lawrence

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Another metric to monitor.
*makes a note*

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> There have been problems with the sheer volume of results waiting around to be returned to the project using the UD client. Meanwhile, the BOINC scheduler has sometimes hiccuped and told members that no work is available until knreed gets on the server and bangs on the queue.

So it doesn't work much better than the pop machine down the hall. Oh well, not many things do. We remember the tortoise vs. the hare: sometimes a slower, steadier pace wins the race.

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
> Not sure but wouldn't the WU you dump be sent out to somebody else? If so then the net result is that 2 crunchers waste time on it rather than just 1 cruncher.

> Not at all! I would not want to dump any WU if it has a chance of being useful, i.e. if 3 are returned and stated as valid - only then would I consider dumping in favour of new work.

OK, that was a dumb question on my part. Of course the scheduler is smart enough not to send the WU out again once it has been validated. So why not just enable the client software to do what retep57 does manually? The UD client could check with the scheduler periodically to see whether the WU it is working on (or is about to work on) has already been validated, and dump the WU if it has. Credit can be prorated when a WU is dumped midway.
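Something like this, client-side (a pure sketch: already_validated, claim_partial_credit and the client calls are all invented here; neither the UD client nor BOINC exposes any such API today):

    def crunch_with_abort_check(wu, server, client):
        """Periodically ask the scheduler whether this WU already reached
        quorum; if so, claim prorated credit and dump it for fresh work."""
        POLL_SECONDS = 6 * 3600   # made-up check interval
        while not client.finished(wu):
            client.crunch(wu, seconds=POLL_SECONDS)
            if server.already_validated(wu.id):
                server.claim_partial_credit(wu.id, client.fraction_done(wu))
                client.abort(wu)
                return False   # dumped in favour of useful work
        return True            # completed normally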

Former Member (Cruncher) | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hi Dagorath,

You are suggesting that our loose cluster of computers be more tightly coupled. Most grids do this as a matter of course, but our public grid is very loosely coupled. Eventually, as the infrastructure improves and almost everybody has cheap, fast networking, we probably will too. But as long as we have many members on dialup or with other connectivity problems, we probably will not bother.

Just my opinion,
Lawrence