Semantic Similarity over the Gene Ontology: Family Correlation and Selecting Disjunctive Ancestors

by Francisco M. Couto, Mario J. Silva, Pedro M. Coutinho
Abstract:
Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity of proteins was computed from their annotated GO terms. However, proteins sharing a biological role do not necessarily have a similar sequence. This paper introduces our study of the correlation between GO and family similarity. Family similarity overcomes some of the limitations of sequence similarity, thus we obtained a strong correlation between GO and family similarity. Additionally, this paper introduces GraSM, a novel method that uses all the information in the graph structure of the GO, instead of considering it as a hierarchical tree. When calculating the semantic similarity of two concepts, GraSM selects the disjunctive common ancestors rather than only using the most informative common ancestor. GraSM produced a higher family similarity correlation than the original semantic similarity measures.
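As a rough illustration of the baseline the abstract contrasts GraSM against, the sketch below computes a similarity from the single most informative common ancestor of two terms in a small directed acyclic graph. The term names, the parent links, the probabilities, and the function names are all hypothetical and not taken from the paper, and the code does not implement GraSM's selection of disjunctive common ancestors.

```python
import math

# Hypothetical toy ontology: each term maps to its set of parents.
# Term "c" has two parents, so the structure is a graph, not a tree.
PARENTS = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Hypothetical annotation probabilities (assumed, not real GO statistics).
PROB = {"root": 1.0, "a": 0.5, "b": 0.4, "c": 0.2, "d": 0.1}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Information content as the negative log of the term's probability."""
    return -math.log(PROB[term])


def most_informative_common_ancestor(t1, t2):
    """Pick the shared ancestor with the highest information content."""
    common = ancestors(t1) & ancestors(t2)
    return max(common, key=information_content)


def baseline_similarity(t1, t2):
    """Similarity based only on the most informative common ancestor."""
    return information_content(most_informative_common_ancestor(t1, t2))
```

In this toy graph, `most_informative_common_ancestor("c", "d")` selects `"a"`, and the second parent of `"c"` plays no part in the score. That kind of discarded graph information is what the abstract says GraSM takes into account.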
Reference:
Semantic Similarity over the Gene Ontology: Family Correlation and Selecting Disjunctive Ancestors (Francisco M. Couto, Mario J. Silva, Pedro M. Coutinho), In Conference on Information and Knowledge Management, ACM, 2005.
Bibtex Entry:
@inproceedings{Couto2005,
abstract = {Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity of proteins was computed from their annotated GO terms. However, proteins sharing a biological role do not necessarily have a similar sequence. This paper introduces our study of the correlation between GO and family similarity. Family similarity overcomes some of the limitations of sequence similarity, thus we obtained a strong correlation between GO and family similarity. Additionally, this paper introduces GraSM, a novel method that uses all the information in the graph structure of the GO, instead of considering it as a hierarchical tree. When calculating the semantic similarity of two concepts, GraSM selects the disjunctive common ancestors rather than only using the most informative common ancestor. GraSM produced a higher family similarity correlation than the original semantic similarity measures.},
author = {Couto, Francisco M. and Silva, Mario J. and Coutinho, Pedro M.},
booktitle = {Conference on Information and Knowledge Management},
doi = {10.1145/1099554.1099658},
keywords = {GO sim,SML-LIB-BIBLIO,lang:ENG,semantic similarity},
mendeley-tags = {GO sim,SML-LIB-BIBLIO,lang:ENG,semantic similarity},
pages = {343--344},
publisher = {ACM},
title = {{Semantic Similarity over the Gene Ontology: Family Correlation and Selecting Disjunctive Ancestors}},
year = {2005}
}