A Non-dual approach to measuring semantic distance by integrating ontological and distributional information within a network-flow framework

by Vivian Tsang
Abstract:
Text comparison is a key step in many natural language processing (NLP) applications in which texts can be classified based on their semantic distance (how similar or different the texts are). For example, comparing the local context of an ambiguous word with that of a known word can help identify the sense of the ambiguous word. Typically, a distributional measure is used to capture the implicit semantic distance between two pieces of text. In this thesis, we introduce an alternative method of measuring the semantic distance between texts as a non-dual combination of distributional information and ontological knowledge. We define non-dualism as combining two distinct components such that they are seamless in the combination. We achieve this non-dual combination by proposing a novel distance measure within a network-flow formalism. First, we represent each text as a collection of frequency-weighted concepts within an ontology. Then, we make use of a network-flow method which provides an efficient way of measuring the semantic distance between two texts by taking advantage of the ontological structure. We evaluate our method in a variety of NLP tasks. In our task-based evaluation, we find that our method performs well on two of three tasks. We introduce a novel approach to analysing the sensitivity of our network-flow method to any dataset (represented as a collection of frequency-weighted concepts). Given that the ontological and the distributional components are intricately knitted together in our method, we find that a non-dual approach, rather than a purely distributional or graphical analysis, is more appropriate and more effective in explaining the performance inconsistency. Finally, we address a complexity issue that arises from the overhead required to incorporate more sophisticated concept-to-concept distances into the network-flow framework.
We propose a graph transformation method which generates a pared-down network that requires less time to process. The new method achieves a significant speed improvement, and does not seriously hamper performance as a result of the transformation, as indicated in our analysis.
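The network-flow idea in the abstract can be illustrated with a minimal sketch, assuming a special case that the thesis itself does not restrict to: the ontology is a pure tree-shaped is-a hierarchy with a fixed cost on each parent link, and each text has already been mapped to concept frequencies. The `parent`/`cost` dictionaries, the toy concepts and the function names below are hypothetical, not taken from the thesis. In this tree case, the minimum-cost flow that ships one text's normalized concept mass onto the other's has a closed form: every edge must carry exactly the net imbalance of the subtree below it, so the distance is the cost-weighted sum of those imbalances.

```python
from collections import defaultdict


def normalize(freqs):
    """Turn raw concept frequencies into a distribution that sums to 1."""
    total = sum(freqs.values())
    return {concept: f / total for concept, f in freqs.items()}


def tree_flow_distance(parent, cost, text_a, text_b):
    """Minimum-cost flow distance between two concept profiles on a TREE ontology.

    Hypothetical simplification: `parent` maps each non-root concept to its
    parent (hypernym), `cost` maps each non-root concept to the cost of the
    edge to its parent. General graphs would need a real min-cost flow solver.
    """
    ontology = set(parent) | set(parent.values())
    unknown = (set(text_a) | set(text_b)) - ontology
    if unknown:
        raise ValueError(f"concepts not in ontology: {sorted(unknown)}")

    # Net mass per concept: positive = surplus to ship out, negative = deficit.
    net = defaultdict(float)
    for concept, w in normalize(text_a).items():
        net[concept] += w
    for concept, w in normalize(text_b).items():
        net[concept] -= w

    def depth(node):
        d = 0
        while node in parent:
            node = parent[node]
            d += 1
        return d

    total = 0.0
    # Deepest concepts first, so each subtree's imbalance is complete
    # before it is pushed across the edge to the parent.
    for node in sorted(ontology, key=depth, reverse=True):
        if node in parent:
            total += cost[node] * abs(net[node])
            net[parent[node]] += net[node]
    return total


# Toy ontology (hypothetical): entity -> {animal, artifact} -> leaves.
parent = {"animal": "entity", "artifact": "entity",
          "dog": "animal", "cat": "animal",
          "car": "artifact", "bicycle": "artifact"}
cost = {concept: 1.0 for concept in parent}

text_a = {"dog": 2, "cat": 2}
text_b = {"dog": 1, "cat": 3}
text_c = {"car": 1, "bicycle": 1}

print(tree_flow_distance(parent, cost, text_a, text_b))  # 0.5
print(tree_flow_distance(parent, cost, text_a, text_c))  # 4.0
```

Normalizing both profiles to total mass 1 keeps supply equal to demand, which a balanced flow problem requires. Texts about closely related concepts (dogs and cats) end up near each other because their mass only has to travel through `animal`, while mass moving to `car` and `bicycle` must pass through the root. Ontologies with multiple inheritance or richer concept-to-concept costs would instead call for a general solver such as network simplex.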
Reference:
A Non-dual approach to measuring semantic distance by integrating ontological and distributional information within a network-flow framework (Vivian Tsang), PhD thesis, 2008.
Bibtex Entry:
@phdthesis{TsangThesis2008,
abstract = {Text comparison is a key step in many natural language processing (NLP) applications in which texts can be classified based on their semantic distance (how similar or different the texts are). For example, comparing the local context of an ambiguous word with that of a known word can help identify the sense of the ambiguous word. Typically, a distributional measure is used to capture the implicit semantic distance between two pieces of text. In this thesis, we introduce an alternative method of measuring the semantic distance between texts as a non-dual combination of distributional information and ontological knowledge. We define non-dualism as combining two distinct components such that they are seamless in the combination. We achieve this non-dual combination by proposing a novel distance measure within a network-flow formalism. First, we represent each text as a collection of frequency-weighted concepts within an ontology. Then, we make use of a network-flow method which provides an efficient way of measuring the semantic distance between two texts by taking advantage of the ontological structure. We evaluate our method in a variety of NLP tasks. In our task-based evaluation, we find that our method performs well on two of three tasks. We introduce a novel approach to analysing the sensitivity of our network-flow method to any dataset (represented as a collection of frequency-weighted concepts). Given that the ontological and the distributional components are intricately knitted together in our method, we find that a non-dual approach, rather than a purely distributional or graphical analysis, is more appropriate and more effective in explaining the performance inconsistency. Finally, we address a complexity issue that arises from the overhead required to incorporate more sophisticated concept-to-concept distances into the network-flow framework.
We propose a graph transformation method which generates a pared-down network that requires less time to process. The new method achieves a significant speed improvement, and does not seriously hamper performance as a result of the transformation, as indicated in our analysis.},
author = {Tsang, Vivian},
keywords = {SML-LIB-BIBLIO,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
title = {{A Non-dual approach to measuring semantic distance by integrating ontological and distributional information within a network-flow framework}},
year = {2008}
}