Thread Status: Active
Total posts in this thread: 20
Posts: 20   Pages: 2
This topic has been viewed 11680 times and has 19 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: RNA

Back in the spotlight for the benefit of the New Members
[May 29, 2005 11:49:13 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
In Silico Techniques Tell How the Protein Turns

Here is an article from the July 2005 issue of ‘Genomics & Proteomics’ which Rosetta@home links to on its News Article page at http://boinc.bakerlab.org/rosetta/rah_articles.php

It contains a number of interesting images that I have omitted.

In Silico Techniques Tell How the Protein Turns

• Computational methods that can predict the structure of a protein from its amino acid sequence are improving. Developers of such programs say that the technology may one day completely supplant experimental structure determination.

• By Elizabeth Tolchin, Reed Life Science News Editor

The biological function of a protein depends on the protein folding into the correct structure. Predicting the structure of a protein from its amino acid sequence was once considered a distant goal. For this reason, researchers have determined protein structures experimentally by X-ray crystallography and nuclear magnetic resonance (NMR), both laborious processes.

Computational methods have now reached the point where they can predict the structure of a protein with a relatively high degree of accuracy. An important question is how these structures can be used to determine the function of a protein.

"The [computational] methods are finally getting good enough and computers are getting fast enough that we are getting closer to being able to solve some of these fundamental problems that have been around for 40 years, such as predicting a protein's structure accurately from its amino acid sequence," says David Baker, PhD, professor of biochemistry, University of Washington, Seattle. Baker's lab designed a well-known computer program, called Rosetta, which is used for modeling macromolecular structures and interactions.

[image omitted]

Rosetta incorporates several algorithms not only for determining protein structure but also for determining protein-protein interactions and designing completely new proteins. It combines two software elements: Rosetta ab initio predicts the three-dimensional structure of a folded protein from its linear sequence of amino acids, and Rosetta Design is used to create better proteins by determining amino acid sequences that are good for a particular protein structure. It can also be used to enhance protein stability and create alternative sequences for naturally occurring proteins. (Source: David Baker, PhD, University of Washington)

"Given the sequences of proteins, you should be able to predict their three-dimensional structures and predict how they interact," says Baker. "Once this is possible, it will have huge practical implications, because you won't have to solve structures experimentally anymore."

Rosetta incorporates several algorithms not only for determining protein structure but also for determining protein-protein interactions and designing completely new proteins. "We developed algorithms for finding the lowest energy structure for a protein of a given sequence, the lowest energy structure for two proteins coming together, and the lowest energy amino acid sequence for a protein," says Baker. "In the area of structure prediction, we have made some real breakthroughs. We can now predict the structure of small proteins at really high accuracy in some cases. The goal for structure prediction is being able to compute all of structural biology and supplant experimental structure determination, so you shouldn't need X-ray crystallography, NMR, or mass spectroscopy."
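Baker frames Rosetta's core task as a search for the lowest-energy structure. The sketch below is only a generic illustration of that kind of energy search and is not Rosetta's method: it assumes a hypothetical toy energy over a handful of torsion angles (`toy_energy`) and minimizes it with a Metropolis Monte Carlo loop whose temperature is lowered over time (simulated annealing). All names and parameter values here are our own assumptions.

```python
import math
import random


def toy_energy(angles, preferred):
    """Hypothetical energy: each torsion angle (radians) is penalized for
    deviating from a preferred value. Real force fields are far richer."""
    return sum(1.0 - math.cos(a - p) for a, p in zip(angles, preferred))


def monte_carlo_minimize(start, preferred, steps=5000, temp_start=1.0,
                         temp_end=0.01, max_move=0.5, seed=0):
    """Metropolis Monte Carlo with geometric cooling (simulated annealing).

    Returns the lowest-energy conformation seen and its energy."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    current = list(start)
    current_e = toy_energy(current, preferred)
    best, best_e = list(current), current_e
    for step in range(steps):
        # Temperature falls geometrically from temp_start to temp_end.
        temp = temp_start * (temp_end / temp_start) ** (step / max(1, steps - 1))
        # Propose a move: perturb one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-max_move, max_move)
        trial_e = toy_energy(trial, preferred)
        # Metropolis rule: always accept downhill moves, and accept uphill
        # moves with probability exp(-(increase in energy) / temperature).
        if trial_e <= current_e or rng.random() < math.exp((current_e - trial_e) / temp):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e
```

For example, `monte_carlo_minimize([3.0, -2.5, 2.0, -3.0], [-1.0, 0.5, 1.0, -0.5])` returns the best angles found and their energy. Accepting some uphill moves while the temperature is high is what lets this kind of search leave local minima that a purely downhill search would get stuck in.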

Even if researchers have an experimentally determined high-resolution X-ray crystal structure, it is not always possible to infer the function of the protein. Baker says this is a problem in all aspects of structural genomics. "In some cases when you solve a new structure you can determine its function, but in other cases you can't. Obviously if you can't tell what the function of a protein is from a high-resolution crystal structure, you can't tell from a predicted structure."

But Baker says that there are other cases where the predicted structure may look a lot like one or more proteins whose structures have already been solved. In those cases, you can make inferences about the function of the protein.

"Currently when a new protein sequence comes out, the first way to predict its function is by going to the sequence database and finding the protein sequences that are similar," says Sung-Hou Kim, PhD, professor of chemistry, University of California, Berkeley. "If the function of that protein is known you can attach that function to the new sequence. This is called transitive annotation. For most protein sequences, their functions are now inferred this way."
[images omitted]

(Top image) The protein structure space map (SSM). Altogether 1,898 nonredundant protein structures are represented by spheres, color-coded according to the Structural Classification of Proteins (SCOP) database. The spheres are distributed along three elongated regions. Proteins that are close together usually share similar structure and function with each other. The peripheral structures illustrate selected structure examples in the SSM.

(Bottom image) The top 10 most populated SCOP superfamilies. The names of the superfamilies and their corresponding colors are listed in the upper right corner. All superfamilies have their members clustered together. P-loop-containing proteins are more spread out because they are defined by a shared sequence motif rather than global structure similarity.


Kim is working in collaboration with the Protein Structure Initiative at the NIH, which concentrates on solving structures of proteins for which there is no sequence match in the database. For this, they must first solve the three-dimensional structure and then, if the structure is similar to that of a known protein, do a transitive annotation of the function based on the structural similarity.
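Transitive annotation, as Kim describes it, copies a function from the most similar protein whose function is already known. Production pipelines use alignment tools such as BLAST for the similarity step; the minimal sketch below swaps in a much cruder stand-in, the Jaccard overlap of shared three-residue substrings (k-mers), so that it runs with the Python standard library alone. The sequences, function labels, and the 0.2 cutoff are hypothetical.

```python
def kmers(seq, k=3):
    """Set of overlapping length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transitive_annotation(query, database, k=3, min_similarity=0.2):
    """Copy the function of the most similar annotated sequence to the query.

    database: list of (sequence, function) pairs with known functions
    (hypothetical data). Returns (function, similarity), or
    (None, best similarity) when no hit clears the threshold."""
    q = kmers(query, k)
    best_func, best_sim = None, 0.0
    for seq, func in database:
        sim = jaccard(q, kmers(seq, k))
        if sim > best_sim:
            best_func, best_sim = func, sim
    if best_sim < min_similarity:
        return None, best_sim  # nothing similar enough: leave unannotated
    return best_func, best_sim
```

With a database of `("MKTAYIAKQRQISFVKSHFSRQ", "kinase")` and `("GSHMLEDPVDAFQKVTEAL", "transport")`, the query `"MKTAYIAKQRQISFVK"` shares 14 of 20 distinct 3-mers with the first entry and is annotated `"kinase"` with similarity 0.7. The cutoff matters: without it, every query would inherit some function, however weak the match.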

Kim and his colleagues have created what they call a "protein structure space map" (see sidebar "Protein Structure Space") that plots proteins based on their structural similarity. "This is a little bit like trying to map the physical universe," says Kim. "Here, we take all of the known protein structures and we map them in three dimensions. This is done in such a way that if two structures are very similar, they are placed next to each other. If the structures are very different, they are placed very far apart.

"If we get a structure and find there is no other structure that belongs to the same family of proteins, we will use the map as a reference to find its closest neighboring family and do a transitive annotation based on its proximity to a neighboring family," says Kim.

Computations on a global scale

The strategy of the Human Proteome Folding Project at the Institute for Systems Biology (ISB), Seattle, is to bridge the gap between the computational and the experimental, and align structural genomics with what is known about biology.

"The degree with which we can defer [sic - infer] protein function from structure alone, whether it is experimentally solved or structurally predicted, is actually quite small," says Richard Bonneau, PhD, a senior scientist at ISB. "In the end, having a completely structure-centric approach is not going to be very useful. It has to be aligned with totally orthogonal sets of data like proteomics data, microarray data, measured interaction, predicted interaction, and genomics. A great number of things will have to come into play before we can convince biologists that they should use all of this fuzzy information."

Structural Alignment
While Rosetta aims to solve the problem of determining new protein structures, other programs deal with the problem of aligning two protein structures, or establishing a correspondence between their residues. Researchers look for structural similarity in the hope of discovering a common function, and for a newly determined structure, fast methods that can correctly identify known structures that align with it are an indispensable tool.

In a recent study [J. Mol. Biol., vol. 346, pp. 1173-1188 (2005)], Rachel Kolodny, department of structural biology, Stanford University, Stanford, Calif., and others reported what they said was the largest and most comprehensive comparison of protein structural alignment programs. Six programs were included: SSAP, STRUCTAL, DALI, LSQMAN, CE, and SSM.

Their comparison found that STRUCTAL and SSM performed best, followed by LSQMAN and CE. They also proposed a new "Best-of-All" method that combines the best results of all methods.

"The programs all use a search procedure," says Kolodny. "In doing structural alignment you want to find sub-pieces or residues of each protein that look the same. The problem is that their relative positioning in space is not given to you, so you have to find both the sub-pieces and the relative orientation at the same time."

"A protein is an object in three-dimensional space," says Kolodny. "It has X, Y, and Z coordinates for each one of its residues. In protein alignment, you have to take one protein and fix it in space and then take the other protein and rotate it in such a way that it can be placed on top of the other one. Some of the programs focus on finding the sub-parts that are similar and then, based on these sub-parts, finding the rotation that will place it in a good way. Some try to find the rotation and, based on the rotation, find the sub-parts."
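Kolodny's "fix one protein, rotate the other onto it" step can be made concrete for the easy case in which the residue correspondence is already known; finding that correspondence is the search problem she describes, and it is not attempted here. The sketch below computes the lowest root-mean-square deviation (RMSD) achievable by translating and rotating one matched coordinate list onto another, using Horn's closed-form quaternion method: the best rotation's score is the largest eigenvalue of a 4×4 matrix built from the two point sets. It is our illustration, not code from any of the six programs, and a small Jacobi eigenvalue routine stands in for a linear-algebra library.

```python
import math


def jacobi_eigenvalues(matrix, max_sweeps=50):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    scale = sum(v * v for row in a for v in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break  # off-diagonal part is negligible: diagonal holds eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotation zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def centered(points):
    """Translate points so that their centroid sits at the origin."""
    n = len(points)
    c = [sum(p[d] for p in points) / n for d in range(3)]
    return [[p[d] - c[d] for d in range(3)] for p in points]


def superposition_rmsd(fixed, moving):
    """Minimum RMSD between two equally long, already-matched coordinate
    lists after optimal translation and rotation (Horn's quaternion method)."""
    if len(fixed) != len(moving) or not fixed:
        raise ValueError("need two non-empty lists of matched points")
    x, y = centered(moving), centered(fixed)
    # Correlation matrix S[a][b] = sum_i x_i[a] * y_i[b].
    s = [[sum(xi[a] * yi[b] for xi, yi in zip(x, y)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n_mat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # The largest eigenvalue equals the best achievable sum_i y_i . (R x_i).
    lam = max(jacobi_eigenvalues(n_mat))
    e0 = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    msd = max(0.0, (e0 - 2.0 * lam) / len(x))  # clamp tiny negative rounding
    return math.sqrt(msd)
```

Superimposing a point set on a copy of itself rotated 90° about the z axis and shifted gives an RMSD of essentially zero, while a mirror image cannot be matched by any rotation: for the four points (0,0,0), (1,0,0), (0,1,0), (0,0,1) and their reflection through the xy plane, the minimum RMSD is 0.5.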

The core scientific application of the project is Rosetta. "There are a number of different programs to predict protein structure," says Bonneau, who was a graduate student in Baker's lab. "What's special about Rosetta is that it does not require you to know anything other than the protein sequence. Most programs predict protein structure based on a primary sequence or a word match to other proteins of known structure, but for roughly 40% to 50% of any newly sequenced genome, we have no annotations and no way of matching new proteins to other previously studied proteins. [With these programs,] we can do nothing but turn a blind eye to a fairly large portion of most genomes."

Bonneau says that Rosetta is one of the only programs available that can predict proteins de novo. He knows this, he says, because of a competition called CASP (Critical Assessment of Structure Prediction), organized by the Protein Structure Prediction Center, Lawrence Livermore National Laboratory, Livermore, Calif. Every two years, participants are given the sequence of a protein before the structure is publicly known, and they have to predict what that structure is and submit that to the organizers, who evaluate the results. "Because of CASP, we know that Rosetta is the best structure prediction program," says Bonneau. "We also have a good idea of how accurate it will be."

The plan for the project, he says, is to take the 40% of proteins that no one knows anything about and predict their structure using Rosetta. They will then take whatever confident predictions result from this very large computational effort and try to get biologists to look at those structures and see if they make biological sense.

Performing these computational calculations is very expensive. "It takes several days to do a single protein, and proteins sometimes have multiple domains," Bonneau says. "Each domain needs to be predicted. So you have multiple domains per protein and many proteins per genome, and many genomes. It would take us somewhere around 100 years to perform these calculations at the ISB. It will probably take us a matter of months on the World Community Grid."

The World Community Grid is a distributed computing platform created by IBM, Armonk, N.Y., to tap into the unused computational power of idle computers across the globe. There are now two grids; one is operated by IBM, the other by United Devices, Austin, Texas. There are about 3 million volunteers on the two grids. Each volunteer downloads a client program that includes the code for Rosetta. That code sits inside a program that talks to the IBM central server.
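The arrangement in this paragraph amounts to splitting one large job into many small, independent work units and collecting the results centrally. A minimal sketch of that bookkeeping, assuming a plain list of protein IDs, splits the list into fixed-size work units and regroups the returned predictions into batches. `predict_unit` is a hypothetical placeholder for a volunteer's client running Rosetta; the grid's real client/server protocol is not described in the article and is not modelled here.

```python
def make_work_units(items, per_unit):
    """Split a list of jobs (e.g. protein IDs) into fixed-size work units."""
    if per_unit < 1:
        raise ValueError("per_unit must be at least 1")
    return [items[i:i + per_unit] for i in range(0, len(items), per_unit)]


def predict_unit(unit):
    """Hypothetical stand-in for a volunteer's client running Rosetta."""
    return [(protein_id, f"model_of_{protein_id}") for protein_id in unit]


def collect_in_batches(results, batch_size):
    """Group returned results into batches for delivery to the researcher."""
    batch = []
    for result in results:
        batch.append(result)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # deliver any final, partly filled batch
        yield batch


# Usage: 10 proteins, 3 per work unit, results delivered 4 at a time.
proteins = [f"protein_{i}" for i in range(10)]
units = make_work_units(proteins, 3)
results = (r for unit in units for r in predict_unit(unit))
batches = list(collect_in_batches(results, 4))
```

Because each unit is independent, units can be handed to any number of machines in any order; only the batching step needs to see all the results.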

"The program grabs data that I put there, which is a protein that needs to be predicted, and feeds that back to Rosetta," says Bonneau. "Rosetta makes the prediction and the program sends it back to the central server. IBM collects several thousand of these results and sends them to me in big batches. The process itself isn't very exciting, but it effectively means that if you can cut your job up into several million pieces, you can effectively use a 3 million processor cluster. It distributes the cost and complexity of running this project. That is exciting."

So far, the project has predicted about 60,000 proteins, about halfway to the first-phase goal of predicting between 100,000 and 150,000 proteins and protein domains, derived from 16 complete genomes and the non-redundant set of large protein sequences at NCBI. The project excludes proteins that have any easy comparative modeling hit to a previously seen protein, because the project leaders don't want to waste computations on proteins that can be determined using fold recognition. They also restrict proteins to fewer than 150 amino acids. "We also don't want to put in proteins that are so large they are likely to fail," says Bonneau.

"The 100,000 to 150,000 proteins and protein domains we are using were run through a domain parsing algorithm called Ginzu," he says. "Ginzu takes a protein sequence, runs a number of different primary-sequence-based methods, outputs domains, and tells you how best to handle those domains, using comparative modeling, fold recognition, or Rosetta. The domains can then be assembled at a later time." A second phase of the project will perform higher-resolution structure predictions.

"The basic thing we can do with these structures is look at matches between these predicted folds and folds in the databank," says Bonneau, "and use that as a set of raw fold annotations that will then go into a more sophisticated data integration method."

Aside from finding a way to navigate and present all of this data at once, he says the larger problem will be knowing how much confidence to put into combining fold predictions with the other types of data. "How confident we can be about the conclusions that come from integrated data sets is a bigger long-term research question for a lot of different groups, including mine."
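The first-phase selection rules Bonneau describes above (leave out anything comparative modeling or fold recognition can handle, keep Rosetta targets under 150 amino acids, and let Ginzu decide how each domain should be handled) can be expressed as a small routing function. This is a hypothetical sketch: the `Domain` fields, the order of the checks, and the "deferred" label are our assumptions, and it does not reproduce Ginzu's actual algorithm, which the article describes only at a high level.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    length: int                  # number of amino acids
    comparative_hit: bool        # confident match to a solved structure
    fold_recognition_hit: bool   # confident fold-recognition match


def choose_method(domain, max_rosetta_length=150):
    """Hypothetical routing rule: prefer the cheaper template-based methods."""
    if domain.comparative_hit:
        return "comparative modeling"
    if domain.fold_recognition_hit:
        return "fold recognition"
    if domain.length < max_rosetta_length:
        return "Rosetta"
    return "deferred"  # too large for de novo prediction in this sketch


def grid_queue(domains):
    """Names of the domains that would be sent out for Rosetta prediction."""
    return [d.name for d in domains if choose_method(d) == "Rosetta"]
```

For example, of four domains where one has a comparative-modeling hit, one has a fold-recognition hit, one is 130 residues with no hit, and one is 400 residues with no hit, only the 130-residue domain would be queued for Rosetta.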
Protein Structure Space
Researchers like Sung-Hou Kim, of the University of California, Berkeley, and his colleagues are taking protein structure alignment a step further by seeing whether they can create a system for mapping all the known structures on the basis of their similarity to one another, and then using the resulting map to infer the function of unknown proteins based on where they fit.

A mathematical method is used to generate the map. "You take two protein structures and align them together in all possible ways until you find the best position and then you calculate the score of how well they match to each other," says Kim. "This similarity score is then converted to a negative value representing their dissimilarity. The dissimilarity score is used to create the distance between the two proteins." There is then another mathematical method that performs a long calculation which combines all these distances and creates the final "protein structure space map" that Kim's group recently published [J. Hou et al., Proc. Natl. Acad. Sci. U.S.A., vol. 102, pp. 3651-3656 (2005)].
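The article does not name the "long calculation" that turns pairwise distances into a map. Classical (Torgerson) multidimensional scaling is one standard way to do it, and the sketch below uses it purely as an illustration: given a symmetric matrix of dissimilarities, it double-centers the squared distances and uses the top three eigenvectors as 3D coordinates. How alignment scores become distances is left to the caller, and the small Jacobi eigensolver stands in for a linear-algebra library such as NumPy.

```python
import math


def jacobi_eigen(matrix, max_sweeps=50):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    a = [list(row) for row in matrix]
    n = len(a)
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q]; v accumulates the rotations.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(dist, dims=3):
    """Coordinates in `dims` dimensions whose Euclidean distances approximate dist."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]  # row means (equal to column means)
    total = sum(row) / n            # grand mean
    # Double centering: B = -1/2 * J D^2 J with J = I - (1/n) 11^T.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    order = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dims]
    # Scale each kept eigenvector by sqrt(eigenvalue); clamp rounding negatives.
    return [[math.sqrt(max(vals[k], 0.0)) * vecs[i][k] for k in order] for i in range(n)]
```

If the input distances come from genuine 3D points, the embedding reproduces them up to rotation, reflection, and translation; with real alignment-derived dissimilarities the fit is generally only approximate.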

Any one point on the map represents a family of protein structures that are similar to each other. An adjacent point on the map would then be another family of structures, which are still similar but are not as closely related as those within the family. Further neighboring families represent relationships that are more remote. The map now includes about 30,000 known protein structures belonging to about 2,000 structure families.
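Kim's description of placing a new structure (find its closest neighboring family, then annotate by proximity) can be sketched on top of such a map. As an assumption of this sketch rather than the published method, each family is reduced to a single center point on the 3D map, and an unplaced structure borrows the function of the family whose center is nearest in Euclidean distance. Family names, coordinates, and functions are hypothetical.

```python
import math


def nearest_family(query_xyz, family_centers):
    """Return (family, distance) for the family center closest to query_xyz.

    family_centers: dict mapping a family name to its (x, y, z) position
    on the map (hypothetical values for illustration)."""
    best_name, best_dist = None, math.inf
    for name, center in family_centers.items():
        d = math.dist(query_xyz, center)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist


def annotate_by_proximity(query_xyz, family_centers, family_functions):
    """Transitive annotation: borrow the function of the nearest family."""
    family, dist = nearest_family(query_xyz, family_centers)
    return family_functions.get(family), family, dist
```

Returning the distance alongside the label matters in practice: a structure that lands far from every family is a much weaker candidate for borrowed annotation than one sitting next to a family center.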

"We then asked what the functions of all of these proteins are," says Kim. "When we looked within the same family, we found that the functions are either the same or related to each other. It also turns out that members of the neighboring families have functions that are a little bit more remote than within the family but are still somewhat related. So we find that when we map proteins based on their structural similarity to one another, it maps their functional similarity as well. Form and function are always related."

The map gives the demographic distribution of all protein structures in one space, with very clear demarcation based on the similarity of the structures. It also gives an indication of how proteins may have evolved, a subject on which Kim's group is currently writing a paper. "It looks like a protein structure that was invented early in evolution is in one part of the universe and then, out of this, more mature proteins, which have existed for a long time, evolve out," Kim says. "So not only does it give a demographic of the protein structure but it also looks like it's given us an evolution or migration map, which is something we never expected."

[Sep 29, 2005 6:27:20 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Back in the spotlight for the benefit of the new members here
[Oct 14, 2005 4:25:38 PM]
Johnny Cool
Ace Cruncher
USA
Joined: Jul 28, 2005
Post Count: 8621
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

I have received many questions on this subject. I have copied and pasted these posts to Team Mates.

Not being a scientist, I have many questions and have fielded many questions from Team Mates concerning this matter.

These so-called "easy questions and answers" have left many puzzled.

I have read all the FAQs here.

What diseases are we fighting here? It sounds too generic.

With an understanding of how each protein affects human health, scientists can develop new cures for human diseases such as cancer, HIV/AIDS, SARS, and malaria.

What about Diabetes or Migraine headaches? These are common questions that I get from Team Mates.

And yes, what *about* Cancer???
----------------------------------------

Team Andrax Co-Captain
Free-DC Stats
Join Team Andrax at WCG
[Nov 2, 2005 4:15:01 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Hi Johnny Cool,

The HPF project is basic research aimed at understanding how the human cell operates. It is just another brick in a massive structure that biologists are building. It may help contribute to understanding that will eventually lead to better medical diagnostic techniques and / or cures. Or it might not. Basic research is like that. It adds to the knowledge base that medical researchers searching for cures draw on, but does not point in any particular direction.

What I have said is true, but misleading. A lot of drugs are targeted to interfere with some cellular pathway (functional network). The HPF project should help scientists identify many proteins involved in these networks that are currently unknown. So after a few years of study, drug researchers should be able to derive a long list of new protein targets and try to develop drugs to bind with them and inactivate them. So the HPF project is basic research that is very closely linked with applied medical research, which is why it qualified to run on the World Community Grid as a project to benefit humanity, rather than just an academic exercise. But it will probably be several years before there are any new drug research projects as a result, since we still need scientific researchers to study our protein results and annotate which cellular pathway they are involved in.

mycrofth
[Nov 2, 2005 4:18:04 PM]
TiBaal89
Cruncher
Joined: Nov 1, 2005
Post Count: 1
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Back in the spotlight for the benefit of the new members here


Yes, thank you. Very informative and interesting; glad to be here.
[Nov 3, 2005 3:06:45 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

I believe you are wrong.

Protein folding is described in this link:

http://tinyurl.com/9ufpa

Thanks for the *real* links, an interesting read
[Nov 19, 2005 5:37:31 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Who wrote the original program? Is he still around?
[Dec 3, 2005 8:59:35 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Here is an article about protein folding published in Wired in July 2001. It starts with the 2000 CASP 4 conference.
http://www.wired.com/wired/archive/9.07/blue.html?pg=1

Added: [I ran across this article over at Rosetta@home.]
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 12, 2005 11:06:19 AM]
[Dec 12, 2005 9:17:42 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Protein Folding Overview - For Those Who Are Curious - A Fascinating Read

Back in the spotlight for the benefit of the New Members
[Mar 18, 2006 4:39:31 PM]