A novel view on information content of concepts in a large ontology and a view on the structure and the quality of the ontology.

by Carl Van Buggenhout, Werner Ceusters
Abstract:
Semantic distance and semantic similarity are two important information retrieval measures used in word sense disambiguation as well as for the assessment of how relevant concepts are with respect to the documents in which they are found. A variety of calculation methods have been proposed in the literature, whereby methods taking into account the information content of an individual concept outperform those that do not. In this paper, we present a novel recursive approach to calculate a concept's information content based on the information content of the concepts to which it relates. The method is applicable to extremely large ontologies containing several million concepts and relationships amongst them. It is shown that a concept's information content as calculated by this method provides additional information with respect to an ontology that cannot be approximated by hierarchical edge-counting or human insight. In addition, it is suggested that the method can be used for quality control within large ontologies and that it can give you an impression on the structure and the quality of the ontology.
Reference:
A novel view on information content of concepts in a large ontology and a view on the structure and the quality of the ontology. (Carl Van Buggenhout, Werner Ceusters), In International Journal of Medical Informatics, volume 74, number 2-4, pages 125-132, 2005.
Bibtex Entry:
@article{VanBuggenhout2005,
abstract = {Semantic distance and semantic similarity are two important information retrieval measures used in word sense disambiguation as well as for the assessment of how relevant concepts are with respect to the documents in which they are found. A variety of calculation methods have been proposed in the literature, whereby methods taking into account the information content of an individual concept outperform those that do not. In this paper, we present a novel recursive approach to calculate a concept's information content based on the information content of the concepts to which it relates. The method is applicable to extremely large ontologies containing several million concepts and relationships amongst them. It is shown that a concept's information content as calculated by this method provides additional information with respect to an ontology that cannot be approximated by hierarchical edge-counting or human insight. In addition, it is suggested that the method can be used for quality control within large ontologies and that it can give you an impression on the structure and the quality of the ontology.},
author = {{Van Buggenhout}, Carl and Ceusters, Werner},
doi = {10.1016/j.ijmedinf.2004.03.009},
issn = {1386-5056},
journal = {International Journal of Medical Informatics},
keywords = {Algorithms,Evaluation Studies as Topic,SML-LIB-BIBLIO,Semantic Similarity,Vocabulary,information content,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,Semantic Similarity,information content,lang:ENG},
month = mar,
number = {2-4},
pages = {125--132},
pmid = {15694617},
title = {{A novel view on information content of concepts in a large ontology and a view on the structure and the quality of the ontology}},
url = {http://www.ncbi.nlm.nih.gov/pubmed/15694617},
volume = {74},
year = {2005}
}