World Community Grid Forums
Thread Status: Active | Total posts in this thread: 76
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
I was at the annual BOINC workshop last week and I had a chance to talk to David Anderson about the idea of sending workunits to computers of similar performance. We had some good ideas that we may be able to put into practice.
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
That, I presume, is part of the plan to soon be able to size work so that all machines see similar run times, not only on RICE but also for the AutoDock-based projects. Was there not mention of dynamic sizing?
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!
martin64
Senior Cruncher | Germany | Joined: May 11, 2009 | Post Count: 445 | Status: Offline
Thanks knreed for this explanation.
Although you say that the amount of work "lost" is small, wouldn't it be possible to do the verification on a (completed) parent workunit basis, rather than on single WUs with an identical range? In your example, that would mean that you would send out the same parents, but different children (starting at 5500 in the first case, at 5000 in the second). Credit could then be granted on structures that are valid in both replicas.

@mreuter80, there is also some overhead in, e.g., single-quorum projects where results are marked inconclusive and other participants re-calculate the entire WU. It doesn't add to the results, but only to the level of reliability of the results. In HCMD2 you could say that the overhead contributes neither to the results nor to the reliability, but to the simplicity of the mechanism.

Regards, Martin
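As a rough sketch of the "credit what both replicas validated" idea above (this is not WCG's actual validator; the record layout and names are invented for illustration), one could grant credit on the intersection of structures that both replicas finished and agree on, and reissue the rest:

```python
# Hypothetical sketch: each replica reports a map of structure index -> result
# hash for the structures it finished before hitting its time limit.

def validate_parent(replica_a, replica_b):
    common = set(replica_a) & set(replica_b)               # finished by both replicas
    agreed = {s for s in common if replica_a[s] == replica_b[s]}
    remaining = (set(replica_a) | set(replica_b)) - agreed
    return agreed, remaining                               # credit 'agreed', reissue the rest

# Example: one replica reached structure 5500, the slower one stopped at 5000.
fast = {i: f"h{i}" for i in range(4000, 5500)}
slow = {i: f"h{i}" for i in range(4000, 5000)}
credited, reissue = validate_parent(fast, slow)
print(len(credited), len(reissue))   # 1000 structures credited, 500 reissued
```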
Sekerob
Ace Cruncher | Joined: Jul 24, 2005 | Post Count: 20043 | Status: Offline
Don't think the BOINC distribution system is able to save up many little tasks and then do a batch-wide validation, if that's what you meant. Sizing tasks down is not an option as it creates too much network load... the reason the jobs were sized up in the first place. The optimization is in this device matching, which will also allow the heavier tasks of some sciences to be sent to the more powerful machines. There was talk some months ago of a profile option such as heavy/light work, plus some other bolts such as preferring longer or shorter tasks. I forget the exact plan, but it's in the vein of getting closer to what can be done at Rosetta: give me 2-4-6-12-24 hour jobs. Not all sciences are suitable to that approach.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All!
martin64
Senior Cruncher | Germany | Joined: May 11, 2009 | Post Count: 445 | Status: Offline
"Don't think the BOINC distribution system is able to save up many little tasks and then do a batch-wide validation, if that's what you meant."

More or less, yes. So it might still make sense to send out the 2 identical WUs to 2 computers of more or less identical speed, thus reducing the risk of having a 10-year-old Pentium compete with an overclocked i7 Extreme, with a lot of computer time wasted. You already indicated that this is under consideration.

Regards, Martin
knreed
Former World Community Grid Tech | Joined: Nov 8, 2004 | Post Count: 4504 | Status: Offline
Martin,
It would require substantial re-writing of BOINC to do what you are describing (i.e. after the two results are returned, send out a third that only completes the work not done by the shorter). I agree that this would be the best approach - but it simply wasn't feasible.

Sekerob,

Both bits of logic will be useful. The framework to have different-sized workunits that are then targeted at computers that can process them in a reasonable amount of real-world time is in the latest version of the server code; we need to apply the latest updates to get it. However, for some projects like HCMD2 our estimates of difficulty are so inaccurate that we cannot use that mechanism in quite the same way. We can instead simply say that workunits A, B, C will be processed by 'powerful' computers, workunits D, E will be processed by 'average' computers, and workunit F will be processed by 'less powerful' computers.

We are likely to implement a mechanism to handle this second case sooner, and then work with BOINC to implement several things that were discussed at the conference that will reduce the time it takes workunits (and batches) to complete. These changes benefit the members because credit will be awarded faster; they benefit us because they will reduce rows in the result and workunit tables and reduce file system usage; and they benefit the researchers because they will get their results a few days quicker.
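A minimal sketch of that second case (the class names, the benchmark-style host score and the thresholds are assumptions for illustration, not WCG's scheduler code): workunits are pre-labelled by power class and a host is only offered work from its own class.

```python
# Hypothetical sketch of class-based targeting, not actual BOINC/WCG code.

WU_CLASS = {"A": "powerful", "B": "powerful", "C": "powerful",
            "D": "average", "E": "average",
            "F": "less powerful"}

def host_class(gflops_per_core, cores):
    """Crude host classification from an assumed benchmark score."""
    score = gflops_per_core * cores
    if score >= 40:
        return "powerful"
    if score >= 15:
        return "average"
    return "less powerful"

def eligible_work(host, pending_workunits):
    """Return only the pending workunits whose class matches this host."""
    cls = host_class(host["gflops_per_core"], host["cores"])
    return [wu for wu in pending_workunits if WU_CLASS[wu] == cls]

# Example: a mid-range quad-core host only ever sees the 'average' workunits D and E.
print(eligible_work({"gflops_per_core": 4.0, "cores": 4}, list(WU_CLASS)))
```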
Mysteron347
Senior Cruncher | Australia | Joined: Apr 28, 2007 | Post Count: 179 | Status: Offline
Good description of the current strategy :)
And this explains why the child tasks seem to be consistent in size (processing time) - because the child-task size is determined by the original processing of the parent.

As I indicated, I've observed a 4:1 speed ratio in my partners. I believe this to be a problem, since the slower machine will always determine the amount of work discarded. What is required, in my view, is that each pair of machines applied to a workunit (of any generation) be as closely matched as possible.

If a parent is processed by a FAST pair, then the child size may be such that a SLOW pair would need to generate a grandchild to complete the work; provided the SLOW pair is matched, the discarded results would be minimised. Equally, if the parent is processed by a SLOW pair, the children may be more numerous, but smaller, and no grandchildren should be generated (a FAST pair would run to completion, and a SLOW pair should get an extension granted by the "I'm nearly finished" mechanism).

A reasonable indication of the relative speed of the machine in question would be (structures processed in the last n tasks) / (CPU time taken in the last n tasks) - and these figures appear to be easily available...

Remember, reducing waste is equivalent to bringing possibly thousands of new processors on-line. You could even claim it's a "green" initiative...
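A hedged sketch of that speed metric and pairing idea (the task-history format and the simple neighbour pairing are assumptions, not an existing WCG or BOINC feature):

```python
# Hypothetical sketch: estimate each host's speed as structures per CPU second
# over its last n tasks, then pair hosts of similar speed so both replicas of a
# workunit advance at roughly the same rate.

def relative_speed(task_history, n=10):
    """task_history: list of (structures_done, cpu_seconds) for recent tasks."""
    recent = task_history[-n:]
    structures = sum(s for s, _ in recent)
    cpu = sum(c for _, c in recent)
    return structures / cpu if cpu else 0.0

def matched_pairs(hosts):
    """hosts: dict host_id -> task_history. Sort by speed and pair neighbours."""
    ranked = sorted(hosts, key=lambda h: relative_speed(hosts[h]), reverse=True)
    return [tuple(ranked[i:i + 2]) for i in range(0, len(ranked) - 1, 2)]

# Example: the overclocked i7 gets paired with another fast box,
# not with the ten-year-old Pentium.
hosts = {
    "i7_extreme":  [(400, 3600)] * 10,
    "fast_quad":   [(350, 3600)] * 10,
    "old_pentium": [(100, 3600)] * 10,
    "laptop":      [(120, 3600)] * 10,
}
print(matched_pairs(hosts))   # [('i7_extreme', 'fast_quad'), ('laptop', 'old_pentium')]
```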
themoonscrescent
Veteran Cruncher | UK | Joined: Jul 1, 2006 | Post Count: 1320 | Status: Offline
Please excuse my lack of knowledge regarding HCMD2, but why is there a cut-off point for any system?
I'm sure there's a good reason for it, but without knowing why, it seems strange to have the work unit cut off after 6/12 hours of crunching as long as the result is returned within the 10 days.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Hello themoonscrescent,
You are a little unusual. When WCG was first started up, we quickly discovered that a large, vocal segment of our members detests lengthy work units. We try to accommodate them. It does not make any real difference to the science.

Lawrence
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1679 | Status: Offline
Hello themoonscrescent,
You have to think of people who do not keep their systems running 24 hours per day. Such people do not like very long work units. Additionally, each power off/on causes a restart from the last checkpoint, which slows down the computation of the work unit again. For all these good reasons, WCG designed the "parent / children / grandchildren" approach, even if the related validation process is not particularly trivial.

Cheers, Yves