Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Essay about distributed computing, WCG and the future. Comments welcome.

First I would like to state that I'm not a scientist (I've done 1½ years of a biotechnology education) and I don't do computer programming, so some of my thoughts may (will) be unrealistic compared to what is actually possible.

So WCG has almost completed this project.

What I can't understand about this project is how we can fold so many proteins, while Folding@Home uses thousands of times more computer time to do the same.

There is obviously some simplification in our model compared to the Folding@Home model. Another odd thing I noticed about this project is that, as I understand it, the sequences being folded are up to 150 amino acids long, while proteins are generally 100-1000 amino acids. That must mean that most existing proteins in the human genome fall outside the model limitation of WCG.

This leads me to the next question, where I would like to give my thoughts about what WCG could do next.

There should be a thorough analysis of how valid the results generated by WCG are, and if computer modeling can help determine that, then we should do it. If the results are good enough that scientists can actually use them, then that is fine, but there should be some way to identify proteins that have a high probability of being misfolded in the simulation. If that is possible, then I think we should work on it.

As I understand it, every completed task simulates several foldings. For the long proteins, let's say 1000 amino acids, why don't we run the model properly on them and decrease the number of foldings per task? Would that be too time consuming, or demand too high hardware standards (e.g. too much RAM)?

Let's suppose all these results (maybe after some refolding) turn out to be really useful to scientists in real life, then what?

I see three options. The first is that we use the results to make a better model for predicting structures, if possible, and then redo the project, if that would be of any benefit to scientists.

We could also fold the rest of the sequenced genomes (over 60 bacterial genomes have been sequenced), so that we could build up a database that all biological scientists could use, not primarily scientists studying humans. That would also (if the folds are close to real-world folds) be a great tool for further bioinformatic analysis.

The third option is what it seems WCG will do: run an entirely different project. There are a lot of possibilities here; I think it could be interesting to study directional mutations in bacterial genomes. Directional mutation is the accumulation of mutations in DNA that originates from another bacterial species: these sequences mutate to give rise to proteins that are better suited to the biochemical environment of the new species. I think that could be interesting to study because it would give insights into evolution, genetics and biochemistry, and there is a lot of partial or full sequence information available. Further, I think it could be interesting because such a project isn't being conducted by any other distributed computing project.

Whatever they decide to run, I think it should first and foremost be beneficial, next innovative (please bring new kinds of projects to the public distributed computing domain, because there is so much potential in it), and finally it should be nonprofit.

The final thing they should consider is that they could pursue more than one option. I think it would be great if a scientist who needed a folding done could upload the sequence to WCG and have the folding done in days instead of months.

Anyway, I would like to hear comments on my thoughts. What do you all think WCG and distributed computing should be all about, and what are the prospects?
[May 13, 2005 3:22:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Essay about distributed computing, WCG and the future. Comments welcome.

The Folding@Home project attempts to fold a protein as it would fold in nature, microsecond by microsecond. I believe IBM originally conceived Blue Gene (the fastest supercomputer in the world) to do this very same thing.
Apparently, folding a protein exactly takes huge amounts of compute power, so Folding@Home attempts to use every available resource.
The Rosetta program uses a basic set of "rules" to predict, in a statistical manner, a LIKELY fold that a protein sequence may take. Look up the CASP competitions to glean more info.
[May 13, 2005 9:26:37 PM]
Viktors
Former World Community Grid Tech
Joined: Sep 20, 2004
Post Count: 653
Status: Offline
Re: Essay about distributed computing, WCG and the future. Comments welcome.

RE: [What I can't understand about this project is how we can fold so many proteins, while Folding@Home uses thousands of times more computer time to do the same.]

The Folding@Home project does its folding by simulating the process down to the atomic level, and is in part actually studying the folding process itself. World Community Grid is using Rosetta, which takes a much higher-level approach and works with whole amino acids and small sub-sequences of them, applying rules that have proven to predict rather accurately how proteins fold. This speeds up the computation by an enormous amount, but some of the very fine details of the folded protein may not be accurate. For the proteins identified to be of particular interest to scientists, we are considering running a second phase of the project to further refine the detailed structure of those proteins using much longer Rosetta runs, configured to operate at a much more detailed level. Even with these higher-resolution runs, it should take much less time than using atomic simulation approaches.
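To give a rough feel for the difference, here is a toy sketch in Python of the fragment-insertion style of search. It is purely illustrative and not anything like the actual Rosetta code: the chain is built from short pre-computed pieces rather than by moving individual atoms, and a cheap score decides which moves to keep. All the numbers and the scoring function below are invented for the example.

# Toy fragment-insertion Monte Carlo. Illustration only; not Rosetta's real algorithm or data.
import math
import random

def toy_energy(structure):
    # Stand-in scoring function: rewards backbone angles near one arbitrary "ideal" pair.
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2 for phi, psi in structure)

def fold(sequence_length, fragment_library, steps=2000, temperature=500.0):
    # Start from an extended chain, then repeatedly splice in library fragments,
    # keeping a move if it lowers the score (or occasionally anyway, Metropolis-style).
    structure = [(180.0, 180.0)] * sequence_length
    energy = toy_energy(structure)
    for _ in range(steps):
        frag = random.choice(fragment_library)              # a short run of (phi, psi) pairs
        pos = random.randrange(sequence_length - len(frag) + 1)
        trial = structure[:pos] + frag + structure[pos + len(frag):]
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            structure, energy = trial, trial_energy
    return structure, energy

# A tiny made-up fragment library: one "helix-like" and one "strand-like" 3-residue piece.
library = [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3]
best_structure, best_score = fold(sequence_length=150, fragment_library=library)

An atom-level simulation would instead integrate forces on every atom at every tiny timestep, which is why it costs so much more computer time.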

Disclaimer: This is just my vague characterization and not necessarily the precise scientific answer.
[May 14, 2005 3:42:08 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Essay about distributed computing, WCG and the future. Comments welcome.

Great news, Viktors!

The page about the Human Proteome Folding Project at ISB here ( http://www.systemsbiology.net/Default.aspx?pagename=humanproteome ) has been updated to make it more informative. Rosetta uses tables of possible configurations of amino acid sequences 3-9 residues (amino acids) long, taken from the Protein Data Bank. This way it does not have to calculate the atomic forces within a segment; instead it just examines the forces between a segment and its neighborhood to see whether that configuration is likely or not as it folds the protein. This is the low-resolution folding method that Rosetta uses.

Years of experimentation have shown that the high-resolution method (whatever that is, I cannot tell from the papers I have read) produces more accurate, and hence more useful, results than the low-resolution method I have described. The University of Washington has spent several years developing a confidence function (which I can only guess about, since I have never found a description of it) to predict the probability that Rosetta's prediction is good. We make about 10,000 trial predictions for each protein, and the ISB post-processes them to produce a prediction with a given confidence level (the only level I have heard mentioned is at least 90%). One limitation is that we stick to short proteins (<= 150 residues), but we sometimes concentrate on 'domains', which apparently are parts of longer proteins.
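For anyone who wants a mental picture, here is how I imagine the fragment table and the decoy selection fitting together. This is only my guess at the shape of the data: the window keys, angles and scoring function below are all made up by me and are not taken from ISB or Rosetta.

# Illustrative guess only; not ISB's or Rosetta's actual data layout.
# A fragment table maps a short sequence window (3-9 residues) to backbone
# conformations (phi/psi angle runs) observed in the Protein Data Bank for similar windows.
fragment_table = {
    "AGS": [[(-60.0, -45.0)] * 3],     # invented helix-like fragment for window "AGS"
    "VLI": [[(-120.0, 130.0)] * 3],    # invented strand-like fragment for window "VLI"
}

def best_decoy(decoys, score):
    # Out of the ~10,000 independent trial folds ("decoys") made for one protein,
    # keep the lowest-scoring one; post-processing would then attach a confidence
    # estimate to that prediction.
    return min(decoys, key=score)

# Example with two dummy decoys and a placeholder score that favours helix-like angles.
decoys = [[(-60.0, -45.0)] * 150, [(-120.0, 130.0)] * 150]
chosen = best_decoy(decoys, score=lambda d: sum(abs(phi + 60.0) for phi, _ in d))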

Apparently, according to the ISB page, we are also folding some proteins from human pathogens, so we are not confined strictly to the human proteome.

Viktors is being cautious when he speaks of microseconds in the simulation run by Folding@Home. I tried to write an earlier response to your post yesterday, but stopped just part way through. Let me quote a section from my unfinished draft.
If I understand correctly, Folding@Home is simulating molecules, which means potentially splitting up each second into quadrillions of quantum intervals. They must be trying to figure out how few intervals they can get away with for different configurations. The end goal? Well, the Hollywood version of the end goal is a large screen showing a changing vector-drawn outline writhing before the cool gaze of a scientist in a white lab coat. Suddenly he clicks on a mouse, runs the screen back slightly in time, enlarges the one crucial area, and starts it running forward much more slowly. Then he bounces to his feet [OK, Hollywood would have her bounce, a lot] and shouts: 'That's it! Eureka!!'
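To put rough numbers on the intervals I mentioned there (my own back-of-envelope, assuming the commonly quoted molecular dynamics timestep of around one femtosecond):

# Back-of-envelope only: molecular dynamics codes typically step in ~1 femtosecond increments.
timestep = 1e-15                              # seconds per simulation step (1 fs)
steps_per_second = 1.0 / timestep             # ~1e15 steps, a quadrillion intervals per simulated second
steps_per_microsecond = 1e-6 / timestep       # ~1e9 steps just to reach one microsecond
print(f"{steps_per_second:.0e} steps/s, {steps_per_microsecond:.0e} steps/us")

Which is why progress on that kind of simulation gets reported microsecond by microsecond rather than second by second.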

I had better add that as far as I know Predictor@Home is trying to develop alternative methods to fold proteins. We are not trying to improve Rosetta's methods but are just using it to try to predict protein folds (structures). The hope is that this will prove useful to medical and biological researchers.

Lots more I could say. That was a wide-ranging essay you wrote, Tex. But I will let my fingers cool off now.

mycrofth
[May 14, 2005 5:15:50 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Essay about distributed computing, WCG and the future. Comments welcome.

For the proteins identified to be of particular interest to scientists, we are considering running a second phase of the project to further refine the detailed structure of those proteins using much longer Rosetta runs, configured to operate at a much more detailed level. Even with these higher-resolution runs, it should take much less time than using atomic simulation approaches.


This would be a good first use of the low-resolution database that ISB will post-process the results into.
[May 14, 2005 8:23:09 PM]