An ontology-based measure to compute semantic similarity in biomedicine.

by Montserrat Batet, David Sánchez, Aida Valls
Abstract:
Proper understanding of textual data requires the exploitation and integration of unstructured and heterogeneous clinical sources, healthcare records or scientific literature, which are fundamental aspects in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing, classification and structuring of textual resources. In the past, several approaches for assessing word similarity by exploiting different knowledge sources (ontologies, thesauri, domain corpora, etc.) have been proposed. Some of these measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies (such as MeSH or SNOMED CT). In this paper, these approaches are introduced and analyzed in order to determine their advantages and limitations with respect to the considered knowledge bases. After that, a new measure based on the exploitation of the taxonomical structure of a biomedical ontology is proposed. Using SNOMED CT as the input ontology, the accuracy of our proposal is evaluated and compared against other approaches according to a standard benchmark of manually ranked medical terms. The correlation between the results of the evaluated measures and the human experts' ratings shows that our proposal outperforms most of the previous measures avoiding, at the same time, some of their limitations.
Reference:
An ontology-based measure to compute semantic similarity in biomedicine. (Montserrat Batet, David Sánchez, Aida Valls), In Journal of Biomedical Informatics, Elsevier Inc., volume 4, 2010.
Bibtex Entry:
@article{Batet2010f,
abstract = {Proper understanding of textual data requires the exploitation and integration of unstructured and heterogeneous clinical sources, healthcare records or scientific literature, which are fundamental aspects in clinical and translational research. The determination of semantic similarity between word pairs is an important component of text understanding that enables the processing, classification and structuring of textual resources. In the past, several approaches for assessing word similarity by exploiting different knowledge sources (ontologies, thesauri, domain corpora, etc.) have been proposed. Some of these measures have been adapted to the biomedical field by incorporating domain information extracted from clinical data or from medical ontologies (such as MeSH or SNOMED CT). In this paper, these approaches are introduced and analyzed in order to determine their advantages and limitations with respect to the considered knowledge bases. After that, a new measure based on the exploitation of the taxonomical structure of a biomedical ontology is proposed. Using SNOMED CT as the input ontology, the accuracy of our proposal is evaluated and compared against other approaches according to a standard benchmark of manually ranked medical terms. The correlation between the results of the evaluated measures and the human experts' ratings shows that our proposal outperforms most of the previous measures avoiding, at the same time, some of their limitations.},
author = {Batet, Montserrat and S\'{a}nchez, David and Valls, Aida},
doi = {10.1016/j.jbi.2010.09.002},
issn = {1532-0480},
journal = {Journal of Biomedical Informatics},
keywords = {Processed,SML-LIB-BIBLIO,SSM\_comparison,Semantic Similarity,biomedicine,information content,lang:ENG,ontologies,semantic similarity,web},
mendeley-tags = {Processed,SML-LIB-BIBLIO,SSM\_comparison,Semantic Similarity,lang:ENG},
month = sep,
number = {1},
pages = {39--52},
pmid = {20837160},
publisher = {Elsevier Inc.},
title = {{An ontology-based measure to compute semantic similarity in biomedicine}},
url = {http://www.ncbi.nlm.nih.gov/pubmed/20837160},
volume = {4},
year = {2010}
}