Authors
Guoliang Li, Jian He, Dong Deng, Jian Li
Publication date
2015/5/27
Book
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
Pages
1137-1151
Description
In this paper we study similarity join and search on multi-attribute data. Traditional methods on single-attribute data have pruning power only on single attributes and cannot efficiently support multi-attribute data. To address this problem, we propose a prefix tree index which has holistic pruning ability on multiple attributes. We propose a cost model to quantify the prefix tree which can guide the prefix tree construction. Based on the prefix tree, we devise a filter-verification framework to support similarity search and join on multi-attribute data. The filter step prunes a large number of dissimilar results and identifies some candidates using the prefix tree, and the verification step verifies the candidates to generate the final answer. For similarity join, we prove that constructing an optimal prefix tree is NP-complete and develop a greedy algorithm to achieve high performance. For similarity search, since one prefix tree …
Total citations
[Chart: citations per year, 2016-2024]
Scholar articles
G Li, J He, D Deng, J Li - Proceedings of the 2015 ACM SIGMOD international …, 2015