Thread Status: Active
Total posts in this thread: 4
Author
This topic has been viewed 2110 times and has 3 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Interesting Fact regarding Rosetta

Leeched from Slashdot.org today by Graham

Posted by the Author at Slashdot.org on 30/11/2004

Rosetta was developed on Linux

I'm one of the authors of the code they are running as the first application of the world grid. This is Rosetta, the protein structure prediction program. Rosetta was born on Linux. It can run on a Mac too, but not as well. There never was a version developed for Windows. But hand it to the IBM folks for creating a wrapper that lets it run as a grid "screen saver" scavenger application on Windows. Pretty remarkable.

Of course the reason for this is obvious, right? Windows dominates the planet not only in installed systems but in installed systems with cycles to spare, i.e. desktops. So don't cry your eyes out over it not being Linux compatible. The excess Linux bandwidth after you subtract out the servers is not going to be a lot. Console yourself that the TCO of Linux is really a lot less when you figure that Linux computers are already too busy to be bothered with Grid computing. :-)

Rosetta itself was written in Fortran and only recently converted to C++. The C++ conversion was done using the incredibly well designed Objexx Library by Stuart Mentzer and colleagues. This is a library that lets you write Fortran-style code in C++. Before this, people who tried to rewrite this behemoth in C++ just died in the process. The Objexx library let the whole thing be converted to C++ in one fell swoop. Now the program will slowly evolve from Fortran style to C++ object orientation as it continues to grow. But in the meantime the code is productive. Nice, eh? The cool thing is that with a bit of optimization the code did not lose any appreciable speed in the conversion. So if you have legacy Fortran you use for speed, consider converting it using Objexx. I was one of the people who argued for going to Fortran 95, not C++, because I feared a speed loss; I've become a convert.

In any event the program is not like Folding@home. That program tries to study in detail the picosecond evolution of a single protein as it folds. Rosetta simply predicts the folded structure. It's actually quite fast at doing that. But it turns out it makes lots of different predictions. So you have to do it tens of thousands of times and then see which geometries of folded structures are favored statistically. Then you do the next protein. Eventually you work your way through the whole human genome.
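The "run it tens of thousands of times and see what is favored" idea can be sketched in a few lines. This is only a toy, assuming each run can be reduced to a coarse fold label: `TOY_FOLDS`, `predict_once` and `favored_fold` are hypothetical names, the weights are made up, and a real run would return a full 3D model that has to be clustered by geometry rather than counted by label.

```python
import random
from collections import Counter

# Hypothetical coarse fold labels with made-up weights (not real Rosetta output).
TOY_FOLDS = {"fold_A": 0.6, "fold_B": 0.3, "fold_C": 0.1}


def predict_once(rng):
    """Stand-in for one stochastic prediction run; returns a fold label."""
    labels = list(TOY_FOLDS)
    weights = [TOY_FOLDS[name] for name in labels]
    return rng.choices(labels, weights=weights, k=1)[0]


def favored_fold(n_runs, seed=0):
    """Repeat the prediction n_runs times and report the most frequent result."""
    rng = random.Random(seed)  # seeded so the toy is reproducible
    counts = Counter(predict_once(rng) for _ in range(n_runs))
    label, hits = counts.most_common(1)[0]
    return label, hits / n_runs, counts


if __name__ == "__main__":
    label, fraction, counts = favored_fold(20000)
    print(label, round(fraction, 3), dict(counts))
```

Any single run may land on a rare label; only the tally over many runs shows which result is statistically favored.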

Also, unlike Folding@home, the potential surface in Rosetta is less physics based and more Bayesian statistics. It has a statistical potential for the probability of a peptide backbone structure occurring. And it has a probability for a sidechain amino acid sequence given a backbone structure. Multiply those together and Bayes' rule says the result is proportional to the probability of a structure given a sequence. You can read more about this here [washington.edu]. Click on publications.
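Written out, the argument above is Bayes' rule with the normalizing term dropped. The labels are chosen here for readability and are not notation taken from the Rosetta publications:

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure}) \, P(\text{structure})}
         {P(\text{sequence})}
  \;\propto\; P(\text{sequence} \mid \text{structure}) \, P(\text{structure})
```

For a fixed sequence, P(sequence) is the same for every candidate structure, so ranking candidates by the product on the right gives the same order as ranking them by the probability on the left.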

This statistical potential turns out to be so accurate that it can not only be used to predict the structure of proteins but can also be used in reverse to design a novel structured protein. Recently it was used to design a protein with a topology that had never previously existed in nature. This is a rather amazing result. Others had previously redesigned the sequences of existing topologies, perturbed those topologies, or created some special-case topologies. But Brian Kuhlman in David Baker's lab actually started from a napkin sketch and designed a protein from scratch.

After you predict the structure of a protein, one thing you can do is ask if that structure is like another protein you have seen before. You can compare the structure of a model to a real protein using a program known as MAMMOTH. While there are a variety of programs for comparing two proteins, this one is particularly good for the case of comparing an inaccurate model to an experimentally known structure. If they match then you can assume the proteins may share a related function or evolutionary origin (or not!).

Which brings us to what proteins are. Think of DNA as a disk drive that stores programs. Proteins are the CPU and all the peripherals. Proteins read the DNA, and the DNA codes for proteins that carry out functions in your body. Unlike DNA, proteins have complex topologies. And a given amino acid sequence, corresponding to a given DNA sequence, "always" folds to "exactly" the same 3D structure. This 3D structure modulates how the protein interacts with other molecules. You can think of it as a scaffold that holds certain chemically functional groups in precise geometries. In essence, complex structure is what differentiates biology from ordinary chemistry. The cool thing is that all proteins with the same amino acid sequence fold to the same structure. Thus structure is theoretically predictable from just knowing the amino acid sequence. Hence Rosetta.

Comment From Slashdot.org member

This is a nice description of what the program does. I'd use a different metaphor (DNA is the recipe, Protein is the cake), but it makes the concept understandable. This comment is also written by the author of the software in question, which makes for a very knowledgeable provenance, and also very interesting!
[Dec 31, 2004 2:48:31 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Interesting Fact regarding Rosetta

I recently built 2 different 3 GHz Linux machines (HDTV MythTV boxes), so they will never run Windows. These machines will be on 24/7, 365 days a year. I mean, come on. You can build an app in Linux and port it to almost any other platform. Imagine all the additional power Solaris, Linux, FreeBSD, and OS X machines could lend. Think of what a 64-bit version for SPARC systems could do!

Also, since these machines will not be running spyware or adware, you get more processing power than from an equivalent Windows machine. Most web servers are Linux, and let's admit it, you can build a great, fast server on a 333 (I have). All that additional power is WASTED!

Develop a custom version for hosting companies. Have it sense when to turn off so it won't hurt performance (transferring a bunch of files, Slashdotting) and owners won't complain. I know because I OWN a hosting company with 20 powerful servers. I would like to contribute but I can't.
[Jan 1, 2005 6:07:01 AM]
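One possible shape for the "sense when to turn off" idea, as a rough sketch only: compare the machine's 1-minute load average against a fraction of its CPU count. `should_crunch`, `current_decision` and the `busy_fraction` threshold are hypothetical names invented for illustration; nothing here reflects how the World Community Grid agent actually decides.

```python
import os


def should_crunch(load_1min, cpu_count, busy_fraction=0.5):
    """Return True when the 1-minute load is below busy_fraction of CPU capacity.

    busy_fraction is an assumed tuning knob, not a value from any real agent.
    """
    return load_1min < busy_fraction * cpu_count


def current_decision(busy_fraction=0.5):
    """Check the live system; os.getloadavg() is only available on Unix-like hosts."""
    load_1min, _, _ = os.getloadavg()
    return should_crunch(load_1min, os.cpu_count() or 1, busy_fraction)
```

A real agent would also have to discount the load its own work unit adds, otherwise it would pause itself as soon as it started crunching.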
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: Interesting Fact regarding Rosetta

Well, since the cat's out of the bag I'll throw in my 2 cents.

There are 2 major reasons why Rosetta takes up as much memory as it does:
1. Rosetta does a LOT more than what we are currently using it for.
2. Rosetta is basically a direct port from Fortran. This means ALL data structures are statically allocated.

Together, these two factors ensure that Rosetta allocates all the memory it would ever need for everything it could possibly do. This is why it allocates ~200MB but only really uses ~25MB. Actually, when we first received the program, it was allocating over 500MB, so while 200MB may seem like a lot, it's a vast improvement over what we started with.

Rosetta is a very large and complex program. We can't just start cutting pieces out willy-nilly. Most pieces are tied together. Both we and the Baker lab folks are working on identifying structures and code that we can trim out without affecting what we need to use it for. With the current code base, it's not possible to completely weed out "wasted" memory allocations, since nothing is dynamically allocated.

Finally, other than the initial swap hit and the disk space, once Windows permanently swaps out the "unused" memory of the program, it's just a typical program that takes up 25MB. With disk sizes in the tens and hundreds of gigabytes these days, 175MB of hard disk space is minuscule. Not that we can't make it better, and we are working on it, but let's try to keep things in a little perspective.
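A back-of-the-envelope sketch of why statically dimensioned arrays inflate the footprint. All limits here (`MAX_RESIDUES`, `MAX_ATOMS_PER_RESIDUE`) are invented for illustration and are not Rosetta's real dimensions; the point is only that a static array is sized for the largest case the program could ever handle, not for the protein actually being processed.

```python
# Hypothetical compiled-in capacity limits (NOT Rosetta's actual values).
MAX_RESIDUES = 1000
MAX_ATOMS_PER_RESIDUE = 30
BYTES_PER_COORD = 8  # one double-precision number


def coords_bytes(n_residues):
    """Bytes needed to hold x, y, z for every atom slot of n_residues residues."""
    return n_residues * MAX_ATOMS_PER_RESIDUE * 3 * BYTES_PER_COORD


def static_footprint():
    """Fortran-style static array: always sized for the worst case."""
    return coords_bytes(MAX_RESIDUES)


def dynamic_footprint(n_residues):
    """Dynamically sized array: only as large as the protein at hand."""
    if n_residues > MAX_RESIDUES:
        raise ValueError("protein exceeds the assumed capacity limit")
    return coords_bytes(n_residues)


if __name__ == "__main__":
    print(static_footprint(), dynamic_footprint(125))
```

With these made-up limits a 125-residue protein needs one eighth of the statically reserved space for this single array, and the same gap repeats for every other fixed-size structure in the program.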

Oh, and Happy New Year and thanks to everyone who has been contributing! Thus far, it's going very well!
----------------------------------------
Rick Alther
Former World Community Grid Developer
----------------------------------------
[Edit 1 times, last edit by Alther at Jan 1, 2005 5:53:14 PM]
[Jan 1, 2005 5:52:09 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Interesting Fact regarding Rosetta

Thanks for the response. It's rare that you find a company/project tech willing to talk to you!

I guess I will just wait with my 2 windows machines until you get a Linux version. I await that day with much glee =)
[Jan 2, 2005 12:24:08 AM]