World Community Grid Forums
Thread Status: Active | Total posts in this thread: 2
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
In terms of actual data storage, how big is the entire project expected to be? 100 gigabytes, 100 terabytes, bigger?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Let's see. One protein string (or part of a protein, a domain) is annotated by approximately 10,000 Rosetta fold predictions, and possibly by a predicted structure together with an associated probability. Then there will be proteins and domains that were not run on Rosetta for various reasons: some will be too long (greater than 150 amino acids), and some will have almost the same sequence as known proteins. There will also be many annotations added over the years relating one protein to another, and annotated data for the proteins might be collected from other institutions as well, depending on ambition. When you add it up, our predicted structures for proteins and domains will take up very little of the database, assuming that everything of possible interest is saved. At one time, simple cost constraints made saving data too expensive to be practical, but today I expect everything to be kept in the database, just in case.
I am not predicting any particular size. I am predicting that the database could be condensed to only a tiny fraction of its size without losing a great deal of information.
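For a rough sense of scale, here is a minimal back-of-envelope sketch. The only figure taken from the post above is the ~10,000 decoys per domain; the number of domains and the per-decoy sizes are hypothetical placeholders, so treat the totals as illustrative rather than as actual project numbers.

```python
# Back-of-envelope estimate of fold-prediction database size.
# Only DECOYS_PER_DOMAIN comes from the post above; every other
# number is a hypothetical placeholder chosen for illustration.

DOMAINS           = 100_000   # assumed number of protein domains run through Rosetta
DECOYS_PER_DOMAIN = 10_000    # ~10,000 Rosetta fold predictions per domain (from the post)
BYTES_FULL_DECOY  = 50_000    # assumed size of one decoy with full atomic coordinates (~50 KB)
BYTES_SUMMARY     = 200       # assumed size of one decoy reduced to a score/probability record

def tb(n_bytes: float) -> float:
    """Convert bytes to terabytes (decimal, 1 TB = 10**12 bytes)."""
    return n_bytes / 1e12

full_size    = DOMAINS * DECOYS_PER_DOMAIN * BYTES_FULL_DECOY
summary_size = DOMAINS * DECOYS_PER_DOMAIN * BYTES_SUMMARY

print(f"Keeping every decoy in full:   ~{tb(full_size):.1f} TB")
print(f"Keeping only condensed scores: ~{tb(summary_size):.1f} TB")
```

With these assumed inputs, keeping full coordinates lands in the tens of terabytes, while keeping only condensed score records is a couple of orders of magnitude smaller, which is the kind of compression the post alludes to.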