Large-scale learning of word relatedness with constraints

by Guy Halawi, Gideon Dror, Evgeniy Gabrilovich, Yehuda Koren
Abstract:
Prior work on computing semantic relatedness of words focused on representing their meaning in isolation, effectively disregarding inter-word affinities. We propose a large-scale data mining approach to learning word-word relatedness, where known pairs of related words impose constraints on the learning process. We learn for each word a low-dimensional representation, which strives to maximize the likelihood of a word given the contexts in which it appears. Our method, called CLEAR, is shown to significantly outperform previously published approaches. The proposed method is based on first principles, and is generic enough to exploit diverse types of text corpora, while having the flexibility to impose constraints on the derived word similarities. We also make publicly available a new labeled dataset for evaluating word relatedness algorithms, which we believe to be the largest such dataset to date.
Reference:
Large-scale learning of word relatedness with constraints (Guy Halawi, Gideon Dror, Evgeniy Gabrilovich, Yehuda Koren), In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD'12, ACM Press, 2012.
Bibtex Entry:
@inproceedings{Halawi2012,
address = {New York, New York, USA},
author = {Halawi, Guy and Dror, Gideon and Gabrilovich, Evgeniy and Koren, Yehuda},
booktitle = {Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD'12},
doi = {10.1145/2339530.2339751},
isbn = {9781450314626},
keywords = {SML-LIB-BIBLIO,lang:ENG,semantic similarity,word relatedness},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = aug,
pages = {1406},
publisher = {ACM Press},
title = {{Large-scale learning of word relatedness with constraints}},
url = {http://dl.acm.org/citation.cfm?id=2339530.2339751},
year = {2012}
}