Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content

by Giuseppe Pirró, Nuno Seco
Abstract:
In many research fields such as Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper we present a new semantic similarity metric that exploits some notions of the early work done using a feature-based theory of similarity and translates them into the information-theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, we conducted an online experiment asking the community of researchers to rank a list of 65 word pairs. The experiment’s web setup allowed us to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset enables us to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.
Reference:
Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content (Giuseppe Pirró, Nuno Seco), In Lecture Notes in Computer Science (Robert Meersman, Zahir Tari, eds.), Springer Berlin Heidelberg, volume 5332, 2008.
Bibtex Entry:
@article{Pirro2008,
abstract = {In many research fields such as Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence, computing semantic similarity between words is an important issue. In this paper we present a new semantic similarity metric that exploits some notions of the early work done using a feature-based theory of similarity and translates them into the information-theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. In order to evaluate this metric, we conducted an online experiment asking the community of researchers to rank a list of 65 word pairs. The experiment’s web setup allowed us to collect 101 similarity ratings and to differentiate native and non-native English speakers. Such a large and diverse dataset enables us to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.},
address = {Berlin, Heidelberg},
annote = {Related to intrinsic IC. Cited by Sanchez Batet 2011.},
author = {Pirr\'{o}, Giuseppe and Seco, Nuno},
doi = {10.1007/978-3-540-88873-4},
editor = {Meersman, Robert and Tari, Zahir},
isbn = {978-3-540-88872-7},
issn = {0302-9743},
journal = {Lecture Notes in Computer Science Volume 5332},
keywords = {SML-LIB-BIBLIO,based similarity,feature,intrinsic information content,java wordnet similarity library,lang:ENG,semantic similarity},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
pages = {1271--1288},
publisher = {Springer Berlin Heidelberg},
series = {Lecture Notes in Computer Science},
title = {{Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content}},
url = {http://www.springerlink.com/content/j283405616tv8t43/},
volume = {5332},
year = {2008}
}