Free-text medical document retrieval via phrase-based vector space model.

by Wenlei Mao, Wesley W Chu
Abstract:
Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as index terms to improve retrieval accuracy. However, past research has shown that such systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources has the potential to improve retrieval accuracy, yet the incompleteness of the knowledge sources precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. Document similarity can in turn be derived from phrase similarities. Using OHSUMED as the test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16% increase in retrieval accuracy compared to the stem-based model.
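The abstract does not state the formulas, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes a phrase is a pair of sets (concepts and word stems), that phrase similarity is the larger of the best concept-pair similarity and the cosine overlap of the stem sets, and that document similarity is a cosine-style ratio built from pairwise phrase similarities. The CONCEPT_SIM table, the concept identifiers, and all function names are hypothetical stand-ins for a real knowledge source such as UMLS.

```python
from itertools import product
from math import sqrt

# Hypothetical concept-pair similarities standing in for a knowledge source.
# The concept identifiers below are invented for illustration only.
CONCEPT_SIM = {frozenset({"C_EDEMA", "C_SWELLING"}): 0.8}


def concept_similarity(c1, c2):
    """Return 1.0 for identical concepts, else the table value (default 0.0)."""
    if c1 == c2:
        return 1.0
    return CONCEPT_SIM.get(frozenset({c1, c2}), 0.0)


def phrase_similarity(p, q):
    """Combine conceptual similarity and stem overlap (assumed rule: take the max)."""
    conceptual = max(
        (concept_similarity(a, b) for a, b in product(p["concepts"], q["concepts"])),
        default=0.0,
    )
    common = len(p["stems"] & q["stems"])
    # Cosine of the two binary stem vectors; 0.0 when no stems are shared.
    stem_overlap = common / sqrt(len(p["stems"]) * len(q["stems"])) if common else 0.0
    return max(conceptual, stem_overlap)


def document_similarity(doc_a, doc_b):
    """Cosine-style similarity in which phrase pairs contribute their phrase similarity."""
    def inner(x, y):
        return sum(phrase_similarity(p, q) for p, q in product(x, y))

    norm = sqrt(inner(doc_a, doc_a) * inner(doc_b, doc_b))
    return inner(doc_a, doc_b) / norm if norm else 0.0


query = [{"concepts": {"C_EDEMA"}, "stems": {"brain", "edema"}}]
doc_related = [{"concepts": {"C_SWELLING"}, "stems": {"brain", "swell"}}]
doc_unrelated = [{"concepts": set(), "stems": {"kidney", "stone"}}]

print(document_similarity(query, doc_related))    # 0.8
print(document_similarity(query, doc_unrelated))  # 0.0
```

In this toy example the related document scores 0.8 through the concept table even though only one stem is shared, while a purely stem-based comparison of the same two phrases would give 0.5. The shared stem acts as a fallback when the knowledge source has no entry for a concept pair.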
Reference:
Free-text medical document retrieval via phrase-based vector space model. (Wenlei Mao, Wesley W Chu), In AMIA Symposium. American Medical Informatics Association, 2002.
Bibtex Entry:
@inproceedings{Mao2002,
abstract = {Many information retrieval systems are based on the vector space model (VSM), which represents a document as a vector of index terms. Concepts have been proposed as replacements for word stems as index terms to improve retrieval accuracy. However, past research has shown that such systems did not outperform traditional stem-based systems. Incorporating conceptual similarity derived from knowledge sources has the potential to improve retrieval accuracy, yet the incompleteness of the knowledge sources precludes significant improvement. To remedy this problem, we propose to represent documents using phrases. A phrase consists of multiple concepts and word stems. The similarity between two phrases is jointly determined by their conceptual similarity and their common word stems. Document similarity can in turn be derived from phrase similarities. Using OHSUMED as the test collection and UMLS as the knowledge source, our experimental results show that the phrase-based VSM yields a 16\% increase in retrieval accuracy compared to the stem-based model.},
author = {Mao, Wenlei and Chu, Wesley W},
booktitle = {AMIA Symposium. American Medical Informatics Association},
issn = {1531-605X},
keywords = {Abstracting and Indexing as Topic,Algorithms,Brain Edema,Humans,Information Storage and Retrieval,Information Storage and Retrieval: methods,SML-LIB-BIBLIO,Subject Headings,Unified Medical Language System,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = jan,
pages = {489--493},
pmid = {12463872},
title = {{Free-text medical document retrieval via phrase-based vector space model}},
url = {http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2244442\&tool=pmcentrez\&rendertype=abstract},
year = {2002}
}