World Community Grid Forums
Thread Status: Active | Total posts in this thread: 2
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
In terms of actual data storage, how big is the entire project expected to be? 100 gigabytes, 100 terabytes, bigger?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Let's see. One protein string (or part of a protein, a domain) is annotated by approximately 10,000 Rosetta fold predictions, and possibly by a predicted structure together with an associated probability. Then there will be proteins and domains that were not run on Rosetta for various reasons: some will be too long (greater than 150 amino acids), and some will have almost the same sequence as known proteins. There will also be many annotations added over the years relating one protein to another, and annotated data for the proteins might be collected from other institutions as well, depending on ambition. When you add it up, our predicted structures for proteins and domains will take up very little of the database, assuming that everything of possible interest is saved. At one time, simple cost constraints made saving data too expensive to be practical, but today I expect everything to be kept in the database, just in case.
I am not predicting any particular size. I am predicting that the database could be condensed to only a tiny fraction of its size without losing a great deal of information.
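For a rough sense of scale, here is a minimal back-of-envelope sketch. The only figure taken from the post above is the ~10,000 decoys per domain; the number of domains and the per-decoy sizes are hypothetical placeholders, so treat the totals as illustrative rather than as actual project numbers.

```python
# Back-of-envelope estimate of fold-prediction database size.
# Only DECOYS_PER_DOMAIN comes from the post above; every other
# number is a hypothetical placeholder chosen for illustration.

DOMAINS           = 100_000   # assumed number of protein domains run through Rosetta
DECOYS_PER_DOMAIN = 10_000    # ~10,000 Rosetta fold predictions per domain (from the post)
BYTES_FULL_DECOY  = 50_000    # assumed size of one decoy with full atomic coordinates (~50 KB)
BYTES_SUMMARY     = 200       # assumed size of one decoy reduced to a score/probability record

def tb(n_bytes: float) -> float:
    """Convert bytes to terabytes (decimal, 1 TB = 10**12 bytes)."""
    return n_bytes / 1e12

full_size    = DOMAINS * DECOYS_PER_DOMAIN * BYTES_FULL_DECOY
summary_size = DOMAINS * DECOYS_PER_DOMAIN * BYTES_SUMMARY

print(f"Keeping every decoy in full:   ~{tb(full_size):.1f} TB")
print(f"Keeping only condensed scores: ~{tb(summary_size):.1f} TB")
```

With these assumed inputs, keeping full coordinates lands in the tens of terabytes, while keeping only condensed score records is a couple of orders of magnitude smaller, which is the kind of compression the post alludes to.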