Indexing high-dimensional data for main-memory similarity search

Yu, Xiaohui; Dong, Junfeng

doi:10.1016/j.is.2010.05.001

by Xiaohui Yu, Junfeng Dong

Abstract:

As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR+-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR+-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR+-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR+-tree consistently outperforms other index structures.

View PDF

Reference:

Indexing high-dimensional data for main-memory similarity search (Xiaohui Yu, Junfeng Dong), In Information Systems, Elsevier, volume 35, 2010.

Bibtex Entry:

@article{Yu2010c,
abstract = {As RAM gets cheaper and larger, in-memory processing of data becomes increasingly affordable. In this paper, we propose a novel index structure, the CSR+-tree, to support efficient high-dimensional similarity search in main memory. We introduce quantized bounding spheres (QBSs) that approximate bounding spheres (BSs) or data points. We analyze the respective pros and cons of both QBSs and the previously proposed quantized bounding rectangles (QBRs), and take the best of both worlds by carefully incorporating both of them into the CSR+-tree. We further propose a novel distance computation scheme that eliminates the need for decompressing QBSs or QBRs, which results in significant cost savings. We present an extensive experimental evaluation and analysis of the CSR+-tree, and compare its performance against that of other representative indexes in the literature. Our results show that the CSR+-tree consistently outperforms other index structures.},
author = {Yu, Xiaohui and Dong, Junfeng},
doi = {10.1016/j.is.2010.05.001},
issn = {03064379},
journal = {Information Systems},
keywords = {Cache-conscious,Indexing,SML-LIB-BIBLIO,Similarity search,high-dimensional data,lang:ENG},
mendeley-tags = {SML-LIB-BIBLIO,lang:ENG},
month = nov,
number = {7},
pages = {825--843},
publisher = {Elsevier},
title = {{Indexing high-dimensional data for main-memory similarity search}},
url = {http://linkinghub.elsevier.com/retrieve/pii/S0306437910000402},
volume = {35},
year = {2010}
}