Exploring Knowledge Bases for Similarity

Agirre, Eneko; Cuadros, Montse; Rigau, German; Soroa, Aitor

Abstract:

Graph-based similarity over WordNet has previously been shown to perform very well on word similarity. This paper presents a study of the performance of such a graph-based algorithm when using different relations and versions of WordNet. The graph algorithm is based on Personalized PageRank, a random-walk algorithm that computes the probability that a random walk initiated at the target word reaches any given synset by following the relations in WordNet (Haveliwala, 2002). Similarity is computed as the cosine of the probability distributions for each word over WordNet. The best combination of relations includes all relations in WordNet 3.0, including disambiguated glosses, and automatically disambiguated topic signatures called KnowNets. All relations are part of the official release of WordNet, except KnowNets, which have been derived automatically. The results show that, with adequate relations, performance improves over previously published WordNet-based results on the WordSim353 dataset (Finkelstein et al., 2002). The similarity software and some graphs used in this paper are publicly available at http://ixa2.si.ehu.es/ukb.
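
The two steps named in the abstract (a Personalized PageRank walk seeded at a word, then the cosine of the resulting distributions) can be illustrated with a minimal sketch. This is not the authors' software: the toy graph, the node names, the word-to-synset map, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration, and the teleport mass is simply spread uniformly over the word's synsets.

```python
import math


def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank on an undirected graph.

    graph: dict mapping each node to a non-empty list of neighbours.
    seeds: set of nodes that receive all of the teleport probability.
    Returns the probabilities as a list, ordered by sorted node name.
    """
    nodes = sorted(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # With probability (1 - damping) the walk jumps back to a seed node.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        # Otherwise it follows one of the current node's edges uniformly.
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbour in graph[n]:
                new_rank[neighbour] += share
        rank = new_rank
    return [rank[n] for n in nodes]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy graph standing in for WordNet relations (not real data).
TOY_GRAPH = {
    "car#1": ["vehicle#1", "truck#1"],
    "truck#1": ["car#1", "vehicle#1"],
    "vehicle#1": ["car#1", "truck#1", "entity#1"],
    "entity#1": ["vehicle#1", "fruit#1"],
    "fruit#1": ["entity#1", "banana#1"],
    "banana#1": ["fruit#1"],
}

# Hypothetical map from words to the synset nodes that seed the walk.
WORD_TO_SYNSETS = {
    "car": {"car#1"},
    "truck": {"truck#1"},
    "banana": {"banana#1"},
}


def similarity(word_a, word_b):
    """Cosine of the two words' Personalized PageRank distributions."""
    dist_a = personalized_pagerank(TOY_GRAPH, WORD_TO_SYNSETS[word_a])
    dist_b = personalized_pagerank(TOY_GRAPH, WORD_TO_SYNSETS[word_b])
    return cosine(dist_a, dist_b)
```

On this toy graph, similarity("car", "truck") comes out higher than similarity("car", "banana"), because the two vehicle walks concentrate their probability on the same neighbourhood.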

Reference:

Exploring Knowledge Bases for Similarity (Eneko Agirre, Montse Cuadros, German Rigau, Aitor Soroa), In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2010), 2010.

Bibtex Entry:

@inproceedings{Agirre2010,
abstract = {Graph-based similarity over WordNet has previously been shown to perform very well on word similarity. This paper presents a study of the performance of such a graph-based algorithm when using different relations and versions of WordNet. The graph algorithm is based on Personalized PageRank, a random-walk algorithm that computes the probability that a random walk initiated at the target word reaches any given synset by following the relations in WordNet (Haveliwala, 2002). Similarity is computed as the cosine of the probability distributions for each word over WordNet. The best combination of relations includes all relations in WordNet 3.0, including disambiguated glosses, and automatically disambiguated topic signatures called KnowNets. All relations are part of the official release of WordNet, except KnowNets, which have been derived automatically. The results show that, with adequate relations, performance improves over previously published WordNet-based results on the WordSim353 dataset (Finkelstein et al., 2002). The similarity software and some graphs used in this paper are publicly available at http://ixa2.si.ehu.es/ukb.},
author = {Agirre, Eneko and Cuadros, Montse and Rigau, German and Soroa, Aitor},
booktitle = {Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2010)},
keywords = {SML-LIB-BIBLIO,Semantic Similarity,lang:ENG},
pages = {373--377},
title = {{Exploring Knowledge Bases for Similarity}},
url = {http://adimen.si.ehu.es/~rigau/publications/lrec10-acrs.pdf},
year = {2010}
}