World Community Grid Forums
Thread Status: Active · Total posts in this thread: 14
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Crunchers unite! (Seriously, I love you people.) Many years of work are coming to fruition this fall, and without you people who volunteered spare cycles we would not be here.
[progress chart image]

As you can see, we're progressing at a pretty constant rate now. With only a small fraction of the folding left and Dec-2005 right around the corner, we (Lars Malmstroem, Rich Bonneau, Mike Riffle :: at the UW and the ISB) have been focusing on getting the results formatted for biologists. A prototype of the database is up for yeast and some other bugs, and should be up for human soon.

Thanks to all you volunteers for cranking through so many proteins so far. On average, each volunteer has folded 5 million protein conformations, most of which are upstairs on the disk-pack (not to mention the bazillion conformations the scoring function has kicked out)!
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi Dr. Bonneau,

Good to hear from you again, and even better to hear that we are progressing so well. Thanks for the appreciation!
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
A big Thank You from Team Vulture Central III too. We really appreciate the feedback and, as usual, I will post the image on our DCZone forum to ensure more WCG crunchers get the opportunity to see it.
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Awesome!

It's also good to feel honestly informed about what's going on. Thanks.
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi Dr. Bonneau,

Something I've wondered about for the longest time, and that has been asked several times at grid.org's forums, is why your charts seem completely different from the statistics provided by the two projects. In a post this past Sunday at grid.org questioning this, the following statistics were posted (I doubt they've changed considerably in 2 days):

World Community Grid
Run Time: 17,635:193:02:09:15
Points Generated: 3,908,666,511
Results Returned: 16,379,426
Batches Completed: 70

Grid.org
Total CPU Time: 25,314:101:10:51:18
Points Generated: 5,088,869,442
Results Returned: 9,340,008
Batches Completed: 22

In 2/3 of the time grid.org has been running, WCG seems to have completed over 3 times as much data. Some explanations I've heard include grid.org having much slower machines (possible, though I don't think it would account for such a giant difference) and grid.org having higher redundancy (also possible, though even if grid.org's redundancy were twice that of WCG, judging by computing time, they shouldn't be that far behind on the charts). Another possibility is that WCG is getting easier batches than grid.org. Interestingly, it's not uncommon to receive a work unit there that runs 40+ hours (ask the people who have recently arrived here from grid.org; they'll probably tell you they're amazed at how short the units are here), and the minimum RAM requirement there is twice what is required here. It's like running two completely different projects, though for the past 11 months or so that the project has been running, they were advertised as being the same.

In summary, I'm just curious to hear your speculation on the differences between the two.
Viktors
Former World Community Grid Tech · Joined: Sep 20, 2004 · Post Count: 653 · Status: Offline
I cannot speculate about how UD runs grid.org. However, I can clear up a few points about what we do here.
First, the average work unit run time is tunable by us. The set of work units needed to process a given gene can be tuned in length and quantity: for example, 20 ten-hour work units cover the same total work as 10 twenty-hour work units. The overall result is the same, but members get more frequent feedback when shorter-running work units are used; on the other hand, our servers have to handle more network traffic with shorter work units. For now, we are quite happy with the 10-hour average. At the start of the project our average was 20 hours, and because of the wide variation in difficulty of particular work units (the part of the estimation we cannot predict), some work units were getting into the 200-300 hour run-time range. By cutting the average in half, the really long ones got cut in half on average (we still see some in the 100+ hour range) and the really short ones are just a couple of hours or so. So the trade-off is between network bandwidth and average work unit run time. We will continue to keep them shorter unless we find that for some reason they need to be lengthened (or shortened).

We do not run easier genes or anything like that. I have found relatively little variation in the overall time to process any given batch of work. A batch of work consists of approximately 1,000 genes. Once in a while a few genes create monster run times, but that is just part of the normal variation and unpredictability of how hard it is to fold a particular protein.

As for redundancy, currently each work unit is sent to a minimum of 6 machines. However, machines sometimes lose work for whatever reason and come back for new work, so not all of the assigned work gets finished and returned. We send out more copies of these straggler work units to additional machines, and as time goes on we also relax the redundancy compare requirement somewhat for the last small percentage of the work units.
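The work-unit tuning described above can be sketched with a little arithmetic. The difficulty multipliers below are illustrative, not actual WCG data; the point is only that the unpredictable long tail scales with the tuned average length.

```python
# Illustration of the granularity trade-off: the same total work per gene
# can be cut into many short units or few long ones, and the run-time
# tail scales with the tuned average. Multipliers are hypothetical.
difficulty = [0.1, 0.5, 1.0, 4.0, 12.5]  # illustrative per-unit spread

for avg_hours in (20, 10):
    times = [avg_hours * d for d in difficulty]
    # 20 units x 10h covers the same total work as 10 units x 20h, but
    # shorter units mean more result uploads (network traffic) while the
    # worst case drops: the ~250h monsters become ~125h.
    print(f"{avg_hours}h average -> run times from {min(times)}h "
          f"to {max(times)}h")
```

This matches the numbers in the post: halving the 20-hour average halved the 200-300 hour extremes into the 100+ hour range, at the cost of more server traffic.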
Without these measures, our disk storage would run out before the oldest batch gets finished. We would probably have to reprocess all the batches under way from a clean slate, because it would be hard to patch everything up after running out of space. We process the returned results around the clock so that we can free up space used by finished work units. Even so, we sometimes get dangerously close to running out of disk space, so we have to keep a close eye on everything. In fact, we are planning a large disk space increase to better accommodate this and new projects. We will soon be announcing a several-hour outage to let us restructure our storage layout.

We have set the minimum physical memory requirement at 128MB. We wanted as many machines as possible that could reasonably contribute, so we didn't want to set this value too high. However, we do depend very much on virtual memory. The Rosetta program allocates approximately 200MB of memory, which obviously does not fit in 128MB. However, once the computation gets under way, the working set size is only about 25MB, so the 128MB machines can typically handle this, albeit with some extra paging activity. We do find that quite a few member machines have their maximum virtual memory size set too low, or have other applications consuming most of the virtual memory allotment. This crashes Rosetta, and that work unit ends up not being finished by that machine.

Some of the future projects may have much higher hardware requirements and might not be suitable for all of the member machines. We hope to keep a mix of projects going, some of which require fewer resources, so that as many member machines as possible can contribute. Also, in the future we hope to develop the ability to assign shorter work units to slower machines, so that their run times are not so long.

FWIW, that is my long-winded 2 cents on the subject.
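The memory constraints described above can be summarized in a small sketch. The thresholds come from the post; the function and its name are mine for illustration, not WCG's actual agent logic.

```python
# Sketch of the memory constraints described above: a 128MB machine can
# run Rosetta as long as enough virtual memory is available, because the
# hot working set is far smaller than the initial allocation.
MIN_PHYSICAL_MB  = 128  # stated minimum physical memory requirement
ROSETTA_ALLOC_MB = 200  # virtual memory Rosetta allocates up front
WORKING_SET_MB   = 25   # what stays hot once computation is under way

def can_run_rosetta(physical_mb, free_virtual_mb):
    """A 128MB machine qualifies if ~200MB of virtual memory is free;
    it just pages more, since the working set is only ~25MB. Too little
    virtual memory crashes Rosetta mid-run instead."""
    if physical_mb < MIN_PHYSICAL_MB:
        return False  # below the project's stated minimum
    return free_virtual_mb >= ROSETTA_ALLOC_MB

print(can_run_rosetta(128, 512))  # old box with generous swap: runs
print(can_run_rosetta(128, 150))  # virtual memory capped too low: fails
```

The second case is exactly the crash scenario Viktors mentions: the physical minimum is met, but the virtual memory allotment cannot cover the 200MB allocation.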
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Thanks for the explanation of how WCG runs, Viktors.

Quote:
"As for redundancy, currently each work unit is sent to a minimum of 6 machines. However, machines sometimes lose work for whatever reasons and come back for new work, so not all of the assigned work gets finished and returned. We send out more of these straggler work units to additional machines."

Am I interpreting this right in thinking redundancy is set to 6 results here, then, since I assume you continue sending the unit out until 6 returns have been received? If so, that makes the statistical differences even stranger: a recent posting by UD indicated they were only using 3 results for redundancy.

Interesting about the disk space concerns, and glad to hear you have plans to improve the situation.

Still interested to hear from Dr. Bonneau himself if he gets a chance, as I assume he's been working with WCG as well as grid.org and may have further insight into the differences.
Viktors
Former World Community Grid Tech · Joined: Sep 20, 2004 · Post Count: 653 · Status: Offline
We don't keep sending out repeated work until 6 are received. That would cause considerable inefficiency, so we usually send out at least 6 so that we are very likely to get 5 back. We don't have precise control over how many get sent out because of the nature of the scheduler, so we have found ways to work around that to achieve acceptable efficiency. Our compares start at 5, and for a small percentage of the work units the compare count is lowered, as I mentioned above.
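The scheme as described can be sketched as a small quorum check. The constants 6 and 5 come from the posts above; the relaxed value of 3 is my assumption, since the posts only say the compare count is "lowered" for the last stragglers.

```python
# Sketch of the redundancy scheme as described: at least 6 copies go out,
# the compare normally needs 5 matching results, and the quorum is
# relaxed for the last stragglers instead of re-sending forever.
COPIES_SENT    = 6  # minimum copies dispatched per work unit
FULL_QUORUM    = 5  # results normally required for the compare
RELAXED_QUORUM = 3  # hypothetical lowered compare count (not stated)

def compare_can_run(results_back, is_straggler):
    """A work unit's compare runs once enough results have returned;
    stragglers get a relaxed quorum so batches can finish."""
    quorum = RELAXED_QUORUM if is_straggler else FULL_QUORUM
    return results_back >= quorum

print(compare_can_run(4, is_straggler=False))  # still waiting for a 5th
print(compare_can_run(4, is_straggler=True))   # accepted at relaxed quorum
```

This also answers the question above: redundancy is "send at least 6, compare at 5", not "wait for 6 returns", so the effective redundancy sits closer to 5 than 6.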
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
Hi,
Thanks for the interest. As to the difference between the UD and IBM grids, I would not want to comment right now, as I work more closely with IBM and know less about the UD side of the grid. One additional explanation is that UD runs multiple projects and is a small company compared to IBM, so they may have devoted less of grid.org to the HPF project and more to other projects (I don't know what those projects are). Sorry I couldn't be more informative...
Former Member
Cruncher · Joined: May 22, 2018 · Post Count: 0 · Status: Offline
I find the lack of interest in, and knowledge of, one of one's partners rather disturbing.