Using Roget’s Thesaurus to Measure Semantic Similarity

by Mario Jarmasz, Stan Szpakowicz
Abstract:
Measuring semantic similarity with the ELKB allows us to present a first application of the system as well as to perform a qualitative evaluation. In this chapter, we define the notions of synonymy and semantic similarity and explain a metric for calculating similarity based on Roget’s taxonomy. We evaluate it using a few typical tests. The experiments in this chapter compare the synonymy judgments of the system to gold standards established by Rubenstein and Goodenough (1965), Miller and Charles (1991), and Finkelstein et al. (2002; Gabrilovich 2002) for assessing the similarity of pairs of words. We further evaluate the metric by using the system to answer Test of English as a Foreign Language (TOEFL) questions (Landauer and Dumais, 1997), English as a Second Language (ESL) test questions (Turney, 2001), and Reader’s Digest Word Power Game (RDWP) questions (Lewis, 2000-2001), in which a correct synonym must be chosen from among four target words. We compare the results to six other WordNet-based metrics and two statistical methods.
Reference:
Using Roget’s Thesaurus to Measure Semantic Similarity (Mario Jarmasz, Stan Szpakowicz), In Paragraph, 2002.
Bibtex Entry:
@article{Jarmasz2002,
abstract = {Measuring semantic similarity with the ELKB allows us to present a first application of the system as well as to perform a qualitative evaluation. In this chapter, we define the notions of synonymy and semantic similarity and explain a metric for calculating similarity based on Roget’s taxonomy. We evaluate it using a few typical tests. The experiments in this chapter compare the synonymy judgments of the system to gold standards established by Rubenstein and Goodenough (1965), Miller and Charles (1991), and Finkelstein et al. (2002; Gabrilovich 2002) for assessing the similarity of pairs of words. We further evaluate the metric by using the system to answer Test of English as a Foreign Language (TOEFL) questions (Landauer and Dumais, 1997), English as a Second Language (ESL) test questions (Turney, 2001), and Reader’s Digest Word Power Game (RDWP) questions (Lewis, 2000-2001), in which a correct synonym must be chosen from among four target words. We compare the results to six other WordNet-based metrics and two statistical methods.},
author = {Jarmasz, Mario and Szpakowicz, Stan},
journal = {Paragraph},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
number = {1965},
pages = {37--53},
title = {{Using Roget’s Thesaurus to Measure Semantic Similarity}},
year = {2002}
}