World Community Grid - Predict and hypothetical protein

Dear Colleagues:

In a typical bacterial genome, only about 5% of the proteins have actually been studied experimentally. About 30% are annotated as hypothetical, putative, probable, "similar to", and so on; the exact terminology depends on how the annotations were added to the genome. These 30% normally have no function attributed to them at all.

This means that the remaining 65% or so of the proteins have had their functions inferred from previously annotated proteins. That is the basic principle behind comparing biological sequences: the more similar two sequences are (whether nucleotide or protein), the more likely they are to share the same function. Of course, many biological factors complicate this picture, and dealing with them is one of the main goals of the GC project: to test and develop criteria for defining protein families and for reannotation.
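To make the similarity principle a bit more concrete, here is a minimal sketch under toy assumptions: it aligns two sequences with a plain Needleman-Wunsch global alignment (match +1, mismatch -1, gap -2, all arbitrary choices), reports percent identity, and copies the function of the best-matching annotated protein only if identity clears a cutoff. The function names, the scoring values and the 40% cutoff are hypothetical and are not taken from the GC project; real annotation pipelines typically use local alignment tools such as BLAST, substitution matrices such as BLOSUM62, and E-values rather than raw identity.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    aln_a, aln_b = global_align(a, b)
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


def transfer_annotation(query, annotated, min_identity=40.0):
    """Copy the function of the most similar annotated protein, if similar enough.

    `annotated` maps a protein id to a (sequence, function) pair.
    The 40% cutoff is a hypothetical choice, not a standard.
    """
    best_function, best_identity = None, 0.0
    for sequence, function in annotated.values():
        pid = percent_identity(query, sequence)
        if pid > best_identity:
            best_function, best_identity = function, pid
    if best_function is None or best_identity < min_identity:
        return "hypothetical protein"
    return best_function
```

For example, with a one-entry database `{"P1": ("MKTAYIAKQR", "example function A")}`, the query `MKTAYIAKQL` differs at one of ten positions (90% identity) and inherits "example function A", while an unrelated query falls back to "hypothetical protein". The fallback label mirrors the situation described above for proteins with no sufficiently similar annotated counterpart.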

As for the other 30%, the fact that these proteins are labelled hypothetical or putative does not mean they are not real, although that can happen. It is quite possible for a putative protein to have counterparts in one or more other genomes. In that case, even though no information is available about that particular protein, it is very likely to be a real protein, since it is present in more than one genome, and we usually call it a conserved hypothetical protein. On the other hand, sometimes you find an "orphan" protein, that is, a protein with no counterparts at all. Such a protein is usually tagged as an unknown protein.
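The labelling rule described above can be written down as a small decision function. This is only a sketch: it assumes the homolog hits have already been produced by some similarity search (for example BLAST) and are given as (genome id, percent identity) pairs, and the function name and the 30% identity cutoff are hypothetical choices for illustration.

```python
from typing import Iterable, Tuple


def label_unannotated(own_genome: str,
                      hits: Iterable[Tuple[str, float]],
                      min_identity: float = 30.0) -> str:
    """Label a protein that has no attributed function.

    `hits` are (genome_id, percent_identity) pairs for homologs found by a
    similarity search. Hits in the protein's own genome (e.g. paralogs) are
    ignored, since the question is whether it recurs in *other* genomes.
    The 30% cutoff is an illustrative assumption.
    """
    other_genomes = {genome for genome, identity in hits
                     if genome != own_genome and identity >= min_identity}
    if other_genomes:
        # Found in more than one genome: probably real, function still unknown.
        return "conserved hypothetical protein"
    # No counterpart anywhere else: an orphan.
    return "unknown protein"
```

Usage: `label_unannotated("genome_A", [("genome_B", 52.0)])` returns "conserved hypothetical protein", whereas a protein whose only hit is itself, or whose hits elsewhere fall below the cutoff, is tagged "unknown protein".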

There are several documented cases of organism-specific proteins. In fact, one cannot rule out that a given orphan protein is real based only on its presence or absence across different genomes. Remember that the genetic code constrains the DNA sequence of a protein-coding region to be structured in a particular way, and these biases relative to a random sequence can be measured and quantified. So it is possible to be almost completely certain that an orphan protein is indeed a real gene, even though it has no counterparts in other genomes and no attributed function.
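As a rough illustration of how such a bias can be quantified, the sketch below measures how GC content differs between the three codon positions of an in-frame open reading frame, then compares that spread against shuffled copies of the same sequence (same base composition, but no codon structure). This is a deliberately simplified stand-in chosen for illustration, not the method used in the GC project; real gene finders such as GeneMark and Glimmer use Markov models of coding sequence. The function names and the number of shuffles are assumptions.

```python
import random


def gc_by_codon_position(seq):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame ORF.

    Any trailing bases that do not form a complete codon are ignored.
    """
    seq = seq.upper()
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for i, base in enumerate(seq[: len(seq) - len(seq) % 3]):
        pos = i % 3
        total[pos] += 1
        if base in "GC":
            gc[pos] += 1
    return [g / t if t else 0.0 for g, t in zip(gc, total)]


def position_skew(seq):
    """Spread between the most and least GC-rich codon positions."""
    gc = gc_by_codon_position(seq)
    return max(gc) - min(gc)


def empirical_p_value(seq, n_shuffles=200, seed=0):
    """Fraction of composition-preserving shuffles with a skew at least as large.

    Shuffling keeps the base composition but destroys any codon-position
    structure, so it serves as a simple 'random sequence' null model.
    The +1 terms keep the estimate from ever being exactly zero.
    """
    observed = position_skew(seq)
    rng = random.Random(seed)
    bases = list(seq.upper())
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if position_skew("".join(bases)) >= observed:
            at_least_as_extreme += 1
    return (1 + at_least_as_extreme) / (1 + n_shuffles)
```

A small p-value here only says that the sequence has codon-position structure unlikely to arise by chance in a sequence of the same composition; on its own it is weak evidence, and real coding-potential measures combine many such signals.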

Cheers,

Antonio