Lexical Semantic Relatedness with Random Graph Walks

by Thad Hughes, Daniel Ramage
Abstract:
Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at ρ = .90.
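As a rough illustration of the idea in the abstract, the sketch below runs a random walk with restart (a personalized PageRank) from each word on a tiny hand-made graph and then compares the resulting word-specific distributions. Everything concrete here is an assumption made for the example: the toy words and edges, the uniform edge weights, the restart probability of 0.15, and the function names. The abstract does not define ZKL, so cosine similarity is used as a plainly labeled stand-in for it; this is not the authors' implementation.

```python
import math

# Toy undirected graph. The words and links are invented for illustration;
# the paper derives its graph from WordNet links and corpus statistics.
EDGES = [
    ("car", "automobile"),
    ("car", "vehicle"),
    ("automobile", "vehicle"),
    ("vehicle", "bicycle"),
    ("banana", "fruit"),
    ("fruit", "apple"),
    ("fruit", "food"),
    ("food", "vehicle"),  # weak bridge so the graph is connected
]


def build_neighbors(edges):
    """Return a dict mapping each node to the set of its neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def personalized_pagerank(neighbors, seed, restart=0.15, iterations=200):
    """Approximate the stationary distribution of a walk that restarts at `seed`.

    At each step the walker jumps back to `seed` with probability `restart`
    (an assumed value), otherwise it moves to a uniformly chosen neighbor.
    """
    nodes = sorted(neighbors)
    dist = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass evenly over the neighbors of n.
            share = (1.0 - restart) * dist[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        nxt[seed] += restart  # total restart mass, since dist sums to 1
        dist = nxt
    return dist


def cosine(p, q):
    """Cosine similarity of two distributions over the same node set.

    Stand-in for the paper's ZKL divergence, which the abstract does not define.
    """
    dot = sum(p[n] * q[n] for n in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    nb = build_neighbors(EDGES)
    car = personalized_pagerank(nb, "car")
    automobile = personalized_pagerank(nb, "automobile")
    banana = personalized_pagerank(nb, "banana")
    print("car vs automobile:", cosine(car, automobile))
    print("car vs banana:    ", cosine(car, banana))
```

On this toy graph the distribution seeded at "car" should resemble the one seeded at "automobile" far more than the one seeded at "banana", because the walk accumulates evidence along every path between the words rather than a single shortest path.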
Reference:
Lexical Semantic Relatedness with Random Graph Walks (Thad Hughes, Daniel Ramage), In Computational Linguistics, Association for Computational Linguistics, volume 7, 2007.
Bibtex Entry:
@article{Hughes2007,
abstract = {Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at $\rho$ = .90.},
author = {Hughes, Thad and Ramage, Daniel},
journal = {Computational Linguistics},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
number = {June},
pages = {581--589},
publisher = {Association for Computational Linguistics},
title = {{Lexical Semantic Relatedness with Random Graph Walks}},
url = {http://acl.ldc.upenn.edu/D/D07/D07-1061.pdf},
volume = {7},
year = {2007}
}