A graph-based semantic similarity measure for the gene ontology.

Alvarez, Marco; Yan, Changhui

doi:10.1142/S0219720011005641

Abstract:

Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.

Reference:

A graph-based semantic similarity measure for the gene ontology. (Marco Alvarez, Changhui Yan), In Journal of Bioinformatics and Computational Biology, volume 9, 2011.

Bibtex Entry:

@article{Alvarez2011a,
abstract = {Existing methods for calculating semantic similarities between pairs of Gene Ontology (GO) terms and gene products often rely on external databases like Gene Ontology Annotation (GOA) that annotate gene products using the GO terms. This dependency leads to some limitations in real applications. Here, we present a semantic similarity algorithm (SSA) that relies exclusively on the GO. When calculating the semantic similarity between a pair of input GO terms, SSA takes into account the shortest path between them, the depth of their nearest common ancestor, and a novel similarity score calculated between the definitions of the involved GO terms. In our work, we use SSA to calculate semantic similarities between pairs of proteins by combining pairwise semantic similarities between the GO terms that annotate the involved proteins. The reliability of SSA was evaluated by comparing the resulting semantic similarities between proteins with the functional similarities between proteins derived from expert annotations or sequence similarity. Comparisons with existing state-of-the-art methods showed that SSA is highly competitive with the other methods. SSA provides a reliable measure of semantic similarity independent of external databases of functional-annotation observations.},
author = {Alvarez, Marco and Yan, Changhui},
doi = {10.1142/S0219720011005641},
issn = {0219-7200},
journal = {Journal of Bioinformatics and Computational Biology},
keywords = {SML-LIB-BIBLIO,Semantics,graph,lang:ENG,ontology},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = dec,
number = {6},
pages = {681--695},
pmid = {22084008},
title = {{A graph-based semantic similarity measure for the gene ontology}},
url = {http://www.worldscientific.com/doi/abs/10.1142/S0219720011005641},
volume = {9},
year = {2011}
}