Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation

by Phillip Lord
Abstract:
Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. Availability: Software available from http://www.russet.org.uk
Reference:
Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation (Phillip Lord), In Bioinformatics, volume 19, 2003.
Bibtex Entry:
@article{Lord2003,
abstract = {Motivation: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or ‘semantic similarity’ between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. Results: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. Availability: Software available from http://www.russet.org.uk},
author = {Lord, Phillip},
doi = {10.1093/bioinformatics/btg153},
issn = {1460-2059},
journal = {Bioinformatics},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = jul,
number = {10},
pages = {1275--1283},
title = {{Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation}},
url = {http://www.bioinformatics.oupjournals.org/cgi/doi/10.1093/bioinformatics/btg153},
volume = {19},
year = {2003}
}