Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective

by David Sánchez, Montserrat Batet
Abstract:
Semantic similarity estimation is an important component of analysing natural language resources like clinical records. Proper understanding of concept semantics allows for improved use and integration of heterogeneous clinical sources as well as higher information retrieval accuracy. Semantic similarity has been the focus of much research, which has led to the definition of heterogeneous measures using different theoretical principles and knowledge resources in a variety of contexts and application domains. In this paper, we study several of these measures, in addition to other similarity coefficients (not necessarily framed in a semantic context) that may be useful in determining the similarity of sets of terms. In order to make them easier to interpret and improve their applicability and accuracy, we propose a framework grounded in information theory that allows the measures studied to be uniformly redefined. Our framework is based on approximating concept semantics in terms of Information Content (IC). We also propose computing IC in a scalable and efficient manner from the taxonomical knowledge modelled in biomedical ontologies. As a result, new semantic similarity measures expressed in terms of concept Information Content are presented. These measures are evaluated and compared to related works using a benchmark of medical terms and a standard biomedical ontology. We found that an information theoretical redefinition of well-known semantic measures and similarity coefficients, and an intrinsic estimation of concept IC result in noticeable improvements in their accuracy.
Reference:
Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. (David Sánchez, Montserrat Batet), In Journal of Biomedical Informatics, volume 44, number 5, pages 749-759, 2011.
Bibtex Entry:
@article{Sanchez2011b,
abstract = {Semantic similarity estimation is an important component of analysing natural language resources like clinical records. Proper understanding of concept semantics allows for improved use and integration of heterogeneous clinical sources as well as higher information retrieval accuracy. Semantic similarity has been the focus of much research, which has led to the definition of heterogeneous measures using different theoretical principles and knowledge resources in a variety of contexts and application domains. In this paper, we study several of these measures, in addition to other similarity coefficients (not necessarily framed in a semantic context) that may be useful in determining the similarity of sets of terms. In order to make them easier to interpret and improve their applicability and accuracy, we propose a framework grounded in information theory that allows the measures studied to be uniformly redefined. Our framework is based on approximating concept semantics in terms of Information Content (IC). We also propose computing IC in a scalable and efficient manner from the taxonomical knowledge modelled in biomedical ontologies. As a result, new semantic similarity measures expressed in terms of concept Information Content are presented. These measures are evaluated and compared to related works using a benchmark of medical terms and a standard biomedical ontology. We found that an information theoretical redefinition of well-known semantic measures and similarity coefficients, and an intrinsic estimation of concept IC result in noticeable improvements in their accuracy.},
author = {S\'{a}nchez, David and Batet, Montserrat},
issn = {1532-0480},
journal = {Journal of Biomedical Informatics},
keywords = {SML-LIB-BIBLIO,biomedical ontologies,information content,information theory,lang:ENG,semantic similarity},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG,semantic similarity},
month = apr,
number = {5},
pages = {749--759},
pmid = {21463704},
title = {{Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective}},
volume = {44},
year = {2011}
}