| World Community Grid Forums
Thread Status: Active Total posts in this thread: 5
trongnguyen_82
Cruncher Joined: Aug 10, 2006 Post Count: 10 Status: Offline |
I just have two curious questions:
1. So far, many comparisons in the GC project have included a predicted protein, a hypothetical protein, or both. Because they are only "predicted/hypothetical", they're not real. Is there a high chance that we're wasting our crunching time on some useless protein that is totally different from the real one?
2. Just as an estimate, how many proteins do we know so far (proteins that are not in the predicted/hypothetical class), compared with the total number of proteins in Mother Nature?
Thank you. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hopefully one of the project scientists can give a better answer.
However, from my understanding of this, Genome Comparison is using information from the genomes of different species, rather than just known proteins. The way proteins are expressed on the genome is not as simple as the DNA mechanism might make it appear. I understand this will let us learn more about many proteins that haven't been studied in detail yet, and make the annotation database complete in this respect. |
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Dear Colleagues:
Only about 5% of the proteins have actually been studied experimentally. In a typical bacterial genome, about 30% of the proteins are labelled hypothetical, putative, probable, "similar to", etc. The terminology used depends on the way the annotations were added to the genome. These 30% normally have no attributed function whatsoever. This means that about 65% of the proteins have had their functions inferred from previously annotated proteins.

Therein lies the basic principle behind the comparison of biological sequences: the more similar two sequences are (either nucleotide or protein), the more probable it is that they share the same function. Obviously, many biological factors can complicate this, which is one of the main points of the GC project: to check and develop criteria for the definition of protein families and for reannotation.

As for the other 30%, the fact that they are hypothetical or putative does not mean that they are not real, although this can happen. Indeed, it is possible that a putative protein has a counterpart in one or more other genomes. This means that, although no information is available about that particular protein, it is very likely that it is a real protein, since it is present in more than one genome. In this case, we usually say that this is a conserved hypothetical protein.

On the other hand, sometimes you can find an "orphan" protein, that is, a protein without any counterparts. In this case, the protein is usually tagged as an unknown protein. There are several documented cases of organism-specific proteins. In fact, one cannot rule out the real existence of a certain orphan protein based only on its presence or absence among different genomes. We have to remember that the genetic code forces the DNA sequence of a protein-coding region to be structured in a certain way, and these biases relative to a random sequence can be measured and quantified.

So it is possible to have almost absolute certainty that an orphan protein is indeed a real gene, although there are no counterparts in other genomes and no attributed function. Cheers, Antonio |
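For readers who want to play with the two ideas in the post above, here is a minimal Python sketch. It is only an illustration, not the method the GC project uses. All function names are made up for this example. The identity measure assumes the two sequences are already aligned to the same length, and the chance calculation assumes the standard genetic code (3 stop codons out of 64) and a random sequence with independent, equally frequent bases.

```python
# Hypothetical helper names; a sketch under the assumptions stated above.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def longest_open_run(dna: str) -> int:
    """Longest run of consecutive non-stop codons, reading from position 0."""
    dna = dna.upper()
    best = run = 0
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            run = 0  # a stop codon closes the current open stretch
        else:
            run += 1
            best = max(best, run)
    return best


def chance_of_open_frame(n_codons: int) -> float:
    """Probability that n random codons contain no stop codon.

    Assumes uniform, independent bases, so each codon is a stop
    with probability 3/64.
    """
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    print(percent_identity("MKTAYIAK", "MKTAYLAK"))
    print(longest_open_run("ATGAAATAAGGGCCC"))
    print(chance_of_open_frame(300))
```

Under these simplifying assumptions, a stretch of 300 codons without a stop codon would turn up by chance less than once in a million tries, which gives a feel for why a long open reading frame, even an orphan one, is unlikely to be an accident. Real gene finders use richer statistics (codon usage, base composition), which this sketch does not attempt.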
||
|
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
thanks, amazing stuff !
|
||
|
|
trongnguyen_82
Cruncher Joined: Aug 10, 2006 Post Count: 10 Status: Offline |
Thank you for the reply. |
||
|
|
|