Finding disease similarity based on implicit semantic similarity

by Sachin Mathur, Deendayal Dinakarpandian
Abstract:
Genomics has contributed to a growing collection of gene–function and gene–disease annotations that can be exploited by informatics to study similarity between diseases. This can yield insight into disease etiology, reveal common pathophysiology and/or suggest treatment that can be appropriated from one disease to another. Estimating disease similarity solely on the basis of shared genes can be misleading as variable combinations of genes may be associated with similar diseases, especially for complex diseases. This deficiency can be potentially overcome by looking for common biological processes rather than only explicit gene matches between diseases. The use of semantic similarity between biological processes to estimate disease similarity could enhance the identification and characterization of disease similarity. We present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The similarity measure is shown to outperform other measures used to detect similarity. A manually curated dataset with known disease similarities was used as a benchmark to compare the estimation of disease similarity based on gene-based and Gene Ontology (GO) process-based comparisons. The detection of disease similarity based on semantic similarity between GO Processes (Recall=55%, Precision=60%) performed better than using exact matches between GO Processes (Recall=29%, Precision=58%) or gene overlap (Recall=88% and Precision=16%). The GO-Process based disease similarity scores on an external test set show statistically significant Pearson correlation (0.73) with numeric scores provided by medical residents. GO-Processes associated with similar diseases were found to be significantly regulated in gene expression microarray datasets of related diseases.
Reference:
Finding disease similarity based on implicit semantic similarity (Sachin Mathur, Deendayal Dinakarpandian), In Journal of Biomedical Informatics, volume 45, number 2, pages 363--371, 2012.
Bibtex Entry:
@article{Mathur2012,
abstract = {Genomics has contributed to a growing collection of gene–function and gene–disease annotations that can be exploited by informatics to study similarity between diseases. This can yield insight into disease etiology, reveal common pathophysiology and/or suggest treatment that can be appropriated from one disease to another. Estimating disease similarity solely on the basis of shared genes can be misleading as variable combinations of genes may be associated with similar diseases, especially for complex diseases. This deficiency can be potentially overcome by looking for common biological processes rather than only explicit gene matches between diseases. The use of semantic similarity between biological processes to estimate disease similarity could enhance the identification and characterization of disease similarity. We present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The similarity measure is shown to outperform other measures used to detect similarity. A manually curated dataset with known disease similarities was used as a benchmark to compare the estimation of disease similarity based on gene-based and Gene Ontology (GO) process-based comparisons. The detection of disease similarity based on semantic similarity between GO Processes (Recall=55\%, Precision=60\%) performed better than using exact matches between GO Processes (Recall=29\%, Precision=58\%) or gene overlap (Recall=88\% and Precision=16\%). The GO-Process based disease similarity scores on an external test set show statistically significant Pearson correlation (0.73) with numeric scores provided by medical residents. GO-Processes associated with similar diseases were found to be significantly regulated in gene expression microarray datasets of related diseases.},
author = {Mathur, Sachin and Dinakarpandian, Deendayal},
journal = {Journal of Biomedical Informatics},
keywords = {Disease similarity,Gene Ontology,Ontology based disease similarity,Ontology perturbation,Ontology terms,SML-LIB-BIBLIO,Semantic similarity,Similarity measure},
mendeley-tags = {SML-LIB-BIBLIO},
number = {2},
pages = {363--371},
title = {{Finding disease similarity based on implicit semantic similarity}},
url = {http://www.sciencedirect.com/science/article/pii/S1532046411002073},
volume = {45},
year = {2012}
}