World Community Grid Forums
Thread Status: Active · Total posts in this thread: 14
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Crunchers unite! (Seriously, I love you people.) Many years of work are coming to fruition this fall, and without you people who volunteered spare cycles we would not be here.
[progress chart image]

As you can see, we're progressing at a pretty constant rate now. With only a small fraction of the folding left and Dec-2005 right around the corner, we (Lars Malmstroem, Rich Bonneau, Mike Riffle :: at the UW and the ISB) have been focusing on getting the results formatted for biologists. A prototype of the database is up for yeast and some other bugs, and should be up for human soon.

Thanks to all you volunteers for cranking through so many proteins so far. On average, each volunteer has folded 5 million protein conformations, most of which are upstairs on the disk-pack (not to mention the bazillion conformations the scoring function has kicked out)!
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi Dr. Bonneau,

Good to hear from you again, and even better to hear that we are progressing so well. Thanks for the appreciation!
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
A big Thank You from Team Vulture Central III too. We really appreciate the feedback and, as usual, I will post the image on our DCZone forum to ensure more WCG crunchers get the opportunity to see it.
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Awesome!

It's also good to feel honestly informed about what's going on. Thanks.
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi Dr. Bonneau,

Something I've wondered about for the longest time, and that has been asked several times at grid.org's forums, is why your charts seem completely different from the statistics provided by the two projects. In a post this past Sunday at grid.org questioning this, the following statistics were posted (I doubt they've changed considerably in 2 days):

World Community Grid
Run Time: 17,635:193:02:09:15
Points Generated: 3,908,666,511
Results Returned: 16,379,426
Batches Completed: 70

Grid.org
Total CPU Time: 25,314:101:10:51:18
Points Generated: 5,088,869,442
Results Returned: 9,340,008
Batches Completed: 22

In 2/3 of the time grid.org has been running, WCG seems to have completed over 3 times as much data. Some explanations I've heard include grid.org having much slower machines (possible, though I don't think it would account for such a giant difference) and grid.org having higher redundancy (also possible, though even if grid.org's redundancy were twice that of WCG, judging by computing time, they shouldn't be that far behind on the charts). Another possibility is that WCG is getting easier batches than grid.org. Interestingly, it's not uncommon to receive a work unit there that runs 40+ hours (ask the people who have recently arrived here from grid.org; they'll probably tell you they're amazed at how short the units are here), and the minimum RAM requirement there is twice what is required here. It's like running two completely different projects, though for the past 11 months or so that the project has been running, they were advertised as being the same.

In summary, I'm just curious to hear your speculation on the differences between the two.
Viktors
Former World Community Grid Tech · Joined: Sep 20, 2004 · Post Count: 653 · Status: Offline
I cannot speculate about how UD runs grid.org. However, I can clear up a few points about what we do here.
First, the average work unit run time is tunable by us. The set of work units needed to process a given gene can be tuned in length and quantity: for example, 20 ten-hour work units cover the same total work as 10 twenty-hour work units. The overall result is the same, but members get more frequent feedback when shorter-running work units are used; on the other hand, our servers have to handle more network traffic with shorter work units. For now, we are quite happy with the 10-hour average. At the start of the project our average was 20 hours, and because of the wide variation in difficulty of particular work units (the part of the estimation we cannot predict), some work units were getting into the 200-300 hour run-time range. By cutting the average in half, the really long ones got cut in half on average (we still see some in the 100+ hour range) and the really short ones are just a couple of hours or so. So the trade-off is between network bandwidth and average work unit run time. We will continue to keep them shorter unless we find that for some reason they need to be lengthened (or shortened).

We do not run easier genes or anything like that. I have found relatively little variation in the overall time to process any given batch of work. A batch of work consists of approximately 1,000 genes. Once in a while a few genes create monster run times, but that is just part of the normal variation and unpredictability of how hard it is to fold a particular protein.

As for redundancy, currently each work unit is sent to a minimum of 6 machines. However, machines sometimes lose work for whatever reason and come back for new work, so not all of the assigned work gets finished and returned. We send out more copies of these straggler work units to additional machines, and as time goes on we also relax the redundancy compare requirement somewhat for the last small percentage of the work units.
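The work-unit tuning described above can be sketched with a little arithmetic. The difficulty multipliers below are illustrative, not actual WCG data; the point is only that the unpredictable long tail scales with the tuned average length.

```python
# Illustration of the granularity trade-off: the same total work per gene
# can be cut into many short units or few long ones, and the run-time
# tail scales with the tuned average. Multipliers are hypothetical.
difficulty = [0.1, 0.5, 1.0, 4.0, 12.5]  # illustrative per-unit spread

for avg_hours in (20, 10):
    times = [avg_hours * d for d in difficulty]
    # 20 units x 10h covers the same total work as 10 units x 20h, but
    # shorter units mean more result uploads (network traffic) while the
    # worst case drops: the ~250h monsters become ~125h.
    print(f"{avg_hours}h average -> run times from {min(times)}h "
          f"to {max(times)}h")
```

This matches the numbers in the post: halving the 20-hour average halved the 200-300 hour extremes into the 100+ hour range, at the cost of more server traffic.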
Without these measures, our disk storage would run out before the oldest batch gets finished. We would probably have to reprocess all the batches under way from a clean slate, because it would be hard to patch everything up after running out of space. We process the returned results around the clock so that we can free up space used by finished work units. Even so, we sometimes get dangerously close to running out of disk space, so we have to keep a close eye on everything. In fact, we are planning a large disk space increase to better accommodate this and new projects. We will soon be announcing a several-hour outage to let us restructure our storage layout.

We have set the minimum physical memory requirement at 128MB. We wanted as many machines as possible that could reasonably contribute, so we didn't want to set this value too high. However, we do depend very much on virtual memory. The Rosetta program allocates approximately 200MB of memory, which obviously does not fit in 128MB. However, once the computation gets under way, the working set size is only about 25MB, so the 128MB machines can typically handle this, albeit with some extra paging activity. We do find that quite a few member machines have their maximum virtual memory size set too low, or have other applications consuming most of the virtual memory allotment. This crashes Rosetta, and that work unit ends up not being finished by that machine.

Some of the future projects may have much higher hardware requirements and might not be suitable for all of the member machines. We hope to keep a mix of projects going, some of which require fewer resources, so that as many member machines as possible can contribute. Also, in the future we hope to develop the ability to assign shorter work units to slower machines, so that their run times are not so long.

FWIW, that is my long-winded 2 cents on the subject.
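The memory constraints described above can be summarized in a small sketch. The thresholds come from the post; the function and its name are mine for illustration, not WCG's actual agent logic.

```python
# Sketch of the memory constraints described above: a 128MB machine can
# run Rosetta as long as enough virtual memory is available, because the
# hot working set is far smaller than the initial allocation.
MIN_PHYSICAL_MB  = 128  # stated minimum physical memory requirement
ROSETTA_ALLOC_MB = 200  # virtual memory Rosetta allocates up front
WORKING_SET_MB   = 25   # what stays hot once computation is under way

def can_run_rosetta(physical_mb, free_virtual_mb):
    """A 128MB machine qualifies if ~200MB of virtual memory is free;
    it just pages more, since the working set is only ~25MB. Too little
    virtual memory crashes Rosetta mid-run instead."""
    if physical_mb < MIN_PHYSICAL_MB:
        return False  # below the project's stated minimum
    return free_virtual_mb >= ROSETTA_ALLOC_MB

print(can_run_rosetta(128, 512))  # old box with generous swap: runs
print(can_run_rosetta(128, 150))  # virtual memory capped too low: fails
```

The second case is exactly the crash scenario Viktors mentions: the physical minimum is met, but the virtual memory allotment cannot cover the 200MB allocation.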
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Thanks for the explanation of how WCG runs, Viktors.

Quote:
"As for redundancy, currently each work unit is sent to a minimum of 6 machines. However, machines sometimes lose work for whatever reasons and come back for new work, so not all of the assigned work gets finished and returned. We send out more of these straggler work units to additional machines."

Am I interpreting this right in thinking redundancy is set to 6 results here, then, since I assume you continue sending the unit out until 6 returns have been received? If so, that makes the statistical differences even stranger: a recent posting by UD indicated they were only using 3 results for redundancy.

Interesting about the disk space concerns, and glad to hear you have plans to improve the situation.

Still interested to hear from Dr. Bonneau himself if he gets a chance, as I assume he's been working with WCG as well as grid.org and may have further insight into the differences.
Viktors
Former World Community Grid Tech · Joined: Sep 20, 2004 · Post Count: 653 · Status: Offline
We don't keep sending out repeated work until 6 are received. That would cause considerable inefficiency, so we usually send out at least 6 so that we are very likely to get 5 back. We don't have precise control over how many get sent out because of the nature of the scheduler, so we have found ways to work around that to achieve acceptable efficiency. Our compares start at 5, and for a small percentage of the work units the compare count is lowered, as I mentioned above.
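The scheme as described can be sketched as a small quorum check. The constants 6 and 5 come from the posts above; the relaxed value of 3 is my assumption, since the posts only say the compare count is "lowered" for the last stragglers.

```python
# Sketch of the redundancy scheme as described: at least 6 copies go out,
# the compare normally needs 5 matching results, and the quorum is
# relaxed for the last stragglers instead of re-sending forever.
COPIES_SENT    = 6  # minimum copies dispatched per work unit
FULL_QUORUM    = 5  # results normally required for the compare
RELAXED_QUORUM = 3  # hypothetical lowered compare count (not stated)

def compare_can_run(results_back, is_straggler):
    """A work unit's compare runs once enough results have returned;
    stragglers get a relaxed quorum so batches can finish."""
    quorum = RELAXED_QUORUM if is_straggler else FULL_QUORUM
    return results_back >= quorum

print(compare_can_run(4, is_straggler=False))  # still waiting for a 5th
print(compare_can_run(4, is_straggler=True))   # accepted at relaxed quorum
```

This also answers the question above: redundancy is "send at least 6, compare at 5", not "wait for 6 returns", so the effective redundancy sits closer to 5 than 6.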
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi,
Thanks for the interest. As to the difference between the UD and IBM grids, I would not want to comment right now, as I work more closely with IBM and know less about the UD side of the grid. One additional explanation is that UD runs multiple projects and is a small company compared to IBM, so they may have devoted less of grid.org to the HPF project and more to other projects (I don't know what those projects are). Sorry I couldn't be more informative...
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
I find the lack of interest in, and knowledge of, one of one's partners rather disturbing.