| World Community Grid Forums
|
|
Thread Status: Active Total posts in this thread: 27
|
|
| Author |
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Human Proteome Folding Scientist: Please provide an update on how far along the Human Proteome Folding project is relative to results processing. In looking at the results that you have seen, have you spotted any exciting prospects?
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes, is anything useful coming out of all this processing? Is it also possible to provide a full ranking of the top 1000 users, just to see who is in front? I guess the big unis etc. would take that honour.
|
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I am copying this post by bbover3 from the Member News forum to consolidate all the Human Proteome Folding project information in one forum. Here is the status update on the Human Proteome Folding project dated 24 January 2005, from http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=1594
Here's an update on the Human Proteome Folding project from the Institute for Systems Biology (ISB), which owns the project.
Overview: World Community Grid has sent large batches of results back to ISB, and ISB is running them through the post-processing stage of the process. The results look good, which means that World Community Grid is working just as it should. Our best estimates at this time are that Human Proteome Folding is 15 to 20 percent complete. ISB will be delivering real information to biologists within 3 to 6 months (the delay being the post-processing and human interpretation of a few of the results).
Validation: All internal tests indicate that Rosetta is running correctly, that we're making good structures, and that we're not corrupting the input or output files anywhere. Several proteins have been run through the model-selection phase and show that the grid is operating correctly.
General Progress / General Results: As of January 21, 2005, World Community Grid has processed approximately 9% of the protein sequences in the entire project. Grid.org is probably close to processing a comparable number. For each batch of 1,000 sequences, the work is subdivided into about 50,000 (±20,000) individual work units. A heuristic, based on the length of a gene sequence, is used to decide how much to subdivide the work so that the work units average about 10 hours of run time. Even with this, the actual run times vary over a range of about 10 to 1 or more. Each work unit is sent to more than one device so that results can be compared to eliminate those that may have encountered some sort of problem. Because of the wide range of device characteristics and run-time patterns, the results arrive in a bell-curve-like manner over time. Furthermore, because some work units run unpredictably long, it takes extra time to receive the last results for a batch of work. This leads to some imprecision about how far along we are.
ISB is starting the post-processing phase and will soon be populating a database of structures that biologists will actually use. So another way to state our progress is that we've accumulated enough results to start the data-analysis/post-processing phase.
Biological Results: After the structures are preprocessed, scientists will compare them with proteins of known function to narrow their studies to the most likely candidate functions. The shape information will be used to look for interactions that may occur with other proteins, disease pathogens, etc. Currently our post-processing is revealing the shapes of these proteins. Images of interesting examples will be posted at the ISB site after some time. It is still too early in the process to tell whether we have enough information for curing specific diseases. However, the scientists at ISB are very positive and upbeat about the potential of this project.
Going forward: In late February we plan to create a forum specifically dedicated to questions and discussions about the Human Proteome Folding project. This forum will be monitored by scientists from ISB.
For more details about Human Proteome Folding, please refer to the following:
http://www.worldcommunitygrid.org/projects_showcase/human_proteome.html
http://www.systemsbiology.org/Default.aspx?pagename=humanproteome |
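For a rough sense of scale, the batch figures in the update above can be turned into a back-of-the-envelope estimate. This is only a sketch: the sequence count, work unit count, spread, and 10-hour target come from the update, but the number of copies per work unit is an assumption (the update says only "more than one device"), and `batch_estimate` is a made-up helper name, not part of any World Community Grid tool.

```python
# Figures quoted in the 24 January 2005 status update.
SEQUENCES_PER_BATCH = 1_000
WORK_UNITS_PER_BATCH = 50_000     # "about 50,000"
WORK_UNIT_SPREAD = 20_000         # the quoted plus-or-minus range
TARGET_HOURS_PER_WORK_UNIT = 10   # average run time the heuristic aims for

# ASSUMPTION: the update only says "more than one device"; 2 is illustrative.
ASSUMED_COPIES_PER_WORK_UNIT = 2


def batch_estimate(work_units: int, copies: int = ASSUMED_COPIES_PER_WORK_UNIT) -> dict:
    """Return rough per-batch figures for a given number of work units."""
    device_hours = work_units * TARGET_HOURS_PER_WORK_UNIT * copies
    return {
        "work_units_per_sequence": work_units / SEQUENCES_PER_BATCH,
        "device_hours": device_hours,
        "device_years": device_hours / (24 * 365),
    }


if __name__ == "__main__":
    low = WORK_UNITS_PER_BATCH - WORK_UNIT_SPREAD
    high = WORK_UNITS_PER_BATCH + WORK_UNIT_SPREAD
    for work_units in (low, WORK_UNITS_PER_BATCH, high):
        print(work_units, batch_estimate(work_units))
```

At the central figure this works out to 50 work units per sequence on average. The device-hour totals scale directly with the assumed number of copies, so treat them as order-of-magnitude only.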
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Yes. I would also like to know how much of the project has been completed.
1. How long would it take to complete the total project? Any estimate?
2. What percentage of the project has been completed so far?
3. What can we do to help expedite the progress? |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
1. Six months.
2. The project is 20 percent completed.
3. Put the World Community Grid Agent on more computers, preferably in the 3 GHz plus range.
Crunch ON!!!! |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As of today (mid-February 2005) we are 22% of the way through.
We are now in the process of post-processing that data and getting the resulting protein annotations into our database, which will be free to the public. From those 22k runs we will likely generate 11k fold predictions with confidence greater than 90%. I won't know for sure the total number of good predictions for human until the project is over, including resending failed runs and other touch-up efforts at the end. Also, work units that will in effect be internal controls are in the process of being completed. This work will help us refine the confidence function that we use to tell good predictions from bad ones, and will ultimately make the technology more useful. Rich Bonneau ISB |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Rich,
Before joining WCG, I participated in "Folding at Home" for years for Stanford University. Never was there any feedback on anything I had done over the years for Stanford. I jumped to IBM's WCG when I saw it forming because I was just sure that the feedback to users would be much better. Have we found a cure for some disease? Have we mapped the whole "human whatever thingy" and now we are working on a cure for cancer? What have you... Sadly, now with WCG, I still have no feedback. Feedback is so important to maintaining momentum and both Stanford and IBM's WCG have missed that mark so far. Since WCG has complete control over our screen savers anyway, why not use that fact to give us progress reports that would encourage us? Queue up some messages to give me when my screensaver kicks in, or email me an update once every week or two to let me know what great things are happening for humanity. How 'bout it? Joe Underwood |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
beekeepr,
Have we found a cure for some disease? Have we mapped the whole "human whatever thingy" and now we are working on a cure for cancer?
Waxing sarky, are we?? Well, let's try asking Rich some specific questions about our immediate progress and about the program that we see running in the Application View of the Grid Agent.
First off, let's start by stipulating that the precise number of proteins to be run is still not determined, but for the sake of discussion is taken to be 100K. Also, there will be some touchup units at the end, so the project will run full steam ahead until it is 99% complete, and then will slow sharply. By this time we can expect to mostly be running another project or projects, but some HPF units will be occasionally doled out. Perhaps, for PR purposes, we will call the project complete and give the touchup some other name. In essence this was done with the Human Genome Project. Everybody hates a PR black eye, but real scientific research seldom meets the standards of Madison Avenue, which is why all those people in white lab coats in the ads are actors.
Specific questions about project progress. You have received results for about 22K proteins back from our 2 grids, GRID.ORG and the World Community Grid.
1) How many have you sent from ISB?
2) How many have you got batched up ready to send at ISB?
3) Are you still running a computer program over the Human Genome to prepare more proteins to send or is it all done?
4) Are you using some specific database to prepare only a specific subset of the human proteins for us to crunch on?
5) If so, is there any question we could ask about which subset that would not be impossibly long to answer?
6) Could you drop a note when a batch of results is received back at the ISB, so we can mark another step toward project completion?
7) How do you batch up proteins to send to us – is there some set number of proteins per batch?
8) What sorts of batches of results come back? I assume that all the results for a protein will be kept together. But is an entire batch of proteins that you sent returned as a group? Several batches at a time? Are individual proteins, regardless of how they were received, just queued up as they are completed and returned? How many proteins are returned at a time?
9) Please note that I am not asking for any specific breakdown by grid. That is a public relations decision that should be decided jointly by Paul and Bill. But with that granted, I would personally prefer information about progress of this sort correctly attributed to each grid.
Added: [Happy camper here. In accordance with Rick Alther's post below, note the terminology change. Read 'fold prediction' wherever I use the term 'result' below. 'Result' should only be used to describe a completed work unit.]
Then there are specific questions about Rosetta. The Application View shows progress jumping up several tenths of a percent when a result is completed and written to the file on the hard drive. If it jumps up 0.2%, I assume that the work unit requires about 500 results. If it jumps up 0.7%, I assume that the work unit only requires about 140 results and so on.
1) Is there a standard number of results required per protein?
2) Is this variation in individual work units just an attempt to split the work into short chunks?
3) Or, does the number of results required per protein vary with the nature of the protein?
4) For that matter, what is a result? I have been assuming that it represents an attempted fold, with some parameters varied.
5) Is each result returned with an attached Rosetta value?
There are more questions that could be asked about Rosetta, but I may already be wandering far off the path in the wilderness. I do not ask for a long essay attempting to answer everything at once. You need to sleep and get a lot of work done while keeping up paperwork and attending meetings. Still, I think the questions presented here are of general interest.
Lawrence ( p.s: beekeeper, see the difference between specific questions and generalized discontent? ) [Edit 1 times, last edit by Former Member at Feb 25, 2005 9:02:31 PM] |
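The progress-jump reasoning in the post above is a simple reciprocal, sketched here for anyone who wants to try it on their own Application View readings. It assumes, as the post does, that every fold prediction in a work unit advances the progress display by the same amount; `predictions_per_work_unit` is a made-up name for illustration, not anything in the Grid Agent.

```python
def predictions_per_work_unit(step_percent: float) -> int:
    """Estimate how many fold predictions a work unit contains.

    step_percent is the jump in the progress display (in percent)
    seen each time one prediction is written to disk.
    ASSUMPTION: every prediction advances progress by the same step.
    """
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    return round(100 / step_percent)


if __name__ == "__main__":
    for step in (0.2, 0.7):
        print(f"{step}% per step -> about {predictions_per_work_unit(step)} predictions")
```

A 0.2% step gives 500 and a 0.7% step gives 143, in line with the "about 500" and "about 140" figures in the post. Since the display only shows tenths of a percent, the real count could sit somewhat either side of these estimates.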
||
|
|
Alther
Former World Community Grid Tech United States of America Joined: Sep 30, 2004 Post Count: 414 Status: Offline
|
I can't answer the ISB questions but I can answer the following. Let's clear up a little terminology first. You are using the term "results" and having it mean 2 things: the result from a device and the "results" contained in a result.
We use the term "result" to mean something returned by the client. Each "result" contains n fold predictions for a specific protein. On to the questions:
1) Is there a standard number of results required per protein?
Yes and no. The number of results we need varies based on the protein sequence length. The number of fold predictions we need is constant though. We currently compute roughly 10,000 predictions for each protein.
2) Is this variation in individual work units just an attempt to split the work into short chunks? 3) Or, does the number of results required per protein vary with the nature of the protein?
Yes and no. Proteins of longer sequence length are "harder" than those of shorter sequence length. But work units for longer sequence lengths compute fewer predictions than those for shorter sequence lengths, so we try to even it out that way. However, the real reason for the large variation is that Rosetta works by randomly trying to fold the protein based on rules. Depending on where it starts and what it picks to fold next, Rosetta may fold the protein quickly or it may take a very long time. Because it's random (well... pseudo-random), we don't know until a device completes a work unit how "hard" it is.
4) For that matter, what is a result? I have been assuming that it represents an attempted fold, with some parameters varied. 5) Is each result returned with an attached Rosetta value?
A result from a device consists of n fold predictions. Each prediction contains the energy and positional information for each amino acid along with various scores associated with that prediction.
Rick Alther
Former World Community Grid Developer |
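To make the "yes and no" answer above concrete: with a roughly constant target of 10,000 fold predictions per protein, the number of results needed depends only on how many predictions each result carries. The sketch below is an illustration under that reading. The per-result counts of 500 and 140 are the estimates from the earlier question, not official figures, `results_needed` is a made-up helper, and redundant copies and resent failed runs are ignored.

```python
import math

# "We currently compute roughly 10,000 predictions for each protein."
PREDICTIONS_PER_PROTEIN = 10_000


def results_needed(predictions_per_result: int) -> int:
    """Number of results (completed work units) needed for one protein.

    ASSUMPTION: every result for the protein carries the same number of
    fold predictions; redundancy and failed runs are not counted.
    """
    if predictions_per_result <= 0:
        raise ValueError("predictions_per_result must be positive")
    return math.ceil(PREDICTIONS_PER_PROTEIN / predictions_per_result)


if __name__ == "__main__":
    # Longer sequences get fewer predictions per result, so more results.
    for n in (500, 140):
        print(f"{n} predictions per result -> {results_needed(n)} results")
```

With 500 predictions per result a protein needs 20 results, and with 140 it needs 72, which is why the result count varies with sequence length while the prediction count stays fixed.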
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Rich, Since WCG has complete control over our screen savers anyway, why not use that fact to give us progress reports that would encourage us? Queue up some messages to give me when my screensaver kicks in, or email me an update once every week or two to let me know what great things are happening for humanity. How 'bout it? Joe Underwood
Hey Joe, I share your need for feedback, but I must say there has been at least SOME feedback on the progress, even if you still have to ask for it. The UD software is still very much in development, and aside from other stuff, a Linux client is supposed to be in the works, so these people probably have their hands full. Also, at relatively short notice, I expect to hear about the freely accessible database with the results that WCG produced.
As a last thing, I'd say it's nice to have your conscience patted on the back relatively often, but it's not beneficial to anything. What matters is that you keep crunching, and that eventually people can be cured with what we do here. I also asked for enough reason to keep crunching, and I received enough information to decide for myself that this project is indeed a good cause to invest free cycles in, and that I (Joe Public) get something back out of this.
WCG, Grid.org and UD are still in the process of establishing a concept for free contribution of resources to good causes, protecting the contributors' rights and preventing commercial interests from running off with the good stuff. I hope they are sincere in their endeavors. I believe that in the long term this model can change the way the medical industry is built around its multi-billion-dollar research investments, which it needs to see repaid. I hope projects like this can be to the medical industry what the free software movement (Linux et al.) has been to the general software industry. So there is more to this than just results.
I hope this can make you feel better about participating? :) Cheers |
||
|
|
|