Ranking documents with a thesaurus.

by Roy Rada, Ellen Bicknell
Abstract:
This article reports on exploratory experiments in evaluating and improving a thesaurus through studying its effect on retrieval. A formula called DISTANCE was developed to measure the conceptual distance between queries and documents encoded as sets of thesaurus terms. DISTANCE references MeSH (Medical Subject Headings) and assesses the degree of match between a MeSH-encoded query and document. The performance of DISTANCE on MeSH is compared to the performance of people in the assessment of conceptual distance between queries and documents, and is found to simulate with surprising accuracy the human performance. The power of the computer simulation stems both from the tendency of people to rely heavily on broader-than (BT) relations in making decisions about conceptual distance and from the thousands of accurate BT relations in MeSH. One source for discrepancy between the algorithm's measurement of closeness between query and document and people's measurement of closeness between query and document is occasional inconsistency in the BT relations. Our experiments with adding non-BT relations to MeSH showed how these non-BT relations could improve document ranking, if DISTANCE were also appropriately revised to treat these relations differently from BT relations.
Reference:
Ranking documents with a thesaurus. (Roy Rada, Ellen Bicknell), In Journal of the American Society for Information Science, volume 40, 1989.
Bibtex Entry:
@article{Rada1989b,
abstract = {This article reports on exploratory experiments in evaluating and improving a thesaurus through studying its effect on retrieval. A formula called DISTANCE was developed to measure the conceptual distance between queries and documents encoded as sets of thesaurus terms. DISTANCE references MeSH (Medical Subject Headings) and assesses the degree of match between a MeSH-encoded query and document. The performance of DISTANCE on MeSH is compared to the performance of people in the assessment of conceptual distance between queries and documents, and is found to simulate with surprising accuracy the human performance. The power of the computer simulation stems both from the tendency of people to rely heavily on broader-than (BT) relations in making decisions about conceptual distance and from the thousands of accurate BT relations in MeSH. One source for discrepancy between the algorithm's measurement of closeness between query and document and people's measurement of closeness between query and document is occasional inconsistency in the BT relations. Our experiments with adding non-BT relations to MeSH showed how these non-BT relations could improve document ranking, if DISTANCE were also appropriately revised to treat these relations differently from BT relations.},
author = {Rada, Roy and Bicknell, Ellen},
doi = {10.1002/(SICI)1097-4571(198909)40:5<304::AID-ASI2>3.0.CO;2-6},
issn = {0002-8231},
journal = {Journal of the American Society for Information Science},
keywords = {Abstracting and Indexing as Topic,Evaluation Studies as Topic,MEDLARS,Models,National Library of Medicine (U.S.),SML-LIB-BIBLIO,Subject Headings,Theoretical,United States,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = sep,
number = {5},
pages = {304--310},
pmid = {10303917},
title = {{Ranking documents with a thesaurus}},
url = {http://www.ncbi.nlm.nih.gov/pubmed/10303917},
volume = {40},
year = {1989}
}