

An efficient filter for approximate membership checking

Authors

Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin

Publication date

2008/6/9

Book

Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages

805-818

Description

We consider the problem of identifying substrings of input text strings that approximately match some member of a potentially large dictionary. This problem arises in several important applications, such as extracting named entities from text documents and identifying biological concepts in biomedical literature. In this paper, we develop a filter-verification framework and propose a novel in-memory filter structure. That is, we first quickly filter out substrings that cannot match any dictionary member, and then verify the remaining substrings against the dictionary. Our method does not produce false negatives. We demonstrate the efficiency and effectiveness of our filter on real datasets and show that it significantly outperforms the previous best-known methods in terms of both filtering power and computation time.
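The sketch below illustrates the filter-then-verify idea. It is not the filter structure proposed in the paper. It uses the standard prefix-filtering technique instead and rests on several assumptions:

- Similarity is token-set Jaccard similarity with a threshold `theta`.
- Text is tokenized by lowercasing and splitting on whitespace.
- Candidate substrings are capped at `max_len` tokens.

`PrefixFilter`, `approx_matches`, `prefix_len`, and the other names are hypothetical. Prefix filtering also produces no false negatives: a substring is pruned only when it provably cannot reach the threshold with any dictionary entry.

```python
import math
from collections import Counter, defaultdict

# Sketch only: standard prefix filtering with Jaccard similarity (assumed),
# not the filter structure proposed in the paper.


def tokens(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


def prefix_len(n, theta):
    # If Jaccard(s, r) >= theta, then |s & r| >= ceil(theta * |s|) and
    # >= ceil(theta * |r|), so under one fixed global token order the first
    # n - ceil(theta * n) + 1 tokens of s and of r must share a token.
    # The epsilon can only lengthen the prefix, never shorten it.
    return n - math.ceil(theta * n - 1e-9) + 1


class PrefixFilter:
    """Hypothetical in-memory filter: an inverted index over dictionary prefixes."""

    def __init__(self, dictionary, theta):
        self.theta = theta
        self.entries = [frozenset(tokens(e)) for e in dictionary]
        # Global token order: rarest dictionary tokens first, ties by string.
        self.freq = Counter(t for e in self.entries for t in e)
        self.index = defaultdict(set)  # prefix token -> ids of dictionary entries
        for i, e in enumerate(self.entries):
            for t in self.prefix(e):
                self.index[t].add(i)

    def prefix(self, toks):
        ordered = sorted(toks, key=lambda t: (self.freq.get(t, 0), t))
        return ordered[:prefix_len(len(ordered), self.theta)]

    def candidates(self, toks):
        # Entries sharing no prefix token with toks cannot reach theta: pruned.
        ids = set()
        for t in self.prefix(toks):
            ids |= self.index.get(t, set())
        return ids


def approx_matches(text, dictionary, theta, max_len):
    """Return (matches, verified): matching (substring, entry) pairs and the
    number of pairs that survived the filter and were verified exactly."""
    filt = PrefixFilter(dictionary, theta)
    words = tokens(text)
    matches, verified = [], 0
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            sub = frozenset(words[i:j])
            for k in sorted(filt.candidates(sub)):  # filter step
                verified += 1
                if jaccard(sub, filt.entries[k]) >= theta:  # verification step
                    matches.append((" ".join(words[i:j]), dictionary[k]))
    return matches, verified


if __name__ == "__main__":
    dictionary = ["Microsoft Research", "Surajit Chaudhuri", "SQL Server"]
    text = "papers by Surajit Chaudhuri at Microsoft Research Redmond"
    found, checked = approx_matches(text, dictionary, theta=0.6, max_len=3)
    for sub, entry in found:
        print(f"{sub!r} ~ {entry!r}")
    print("pairs verified:", checked)
```

In this toy run, the text has 21 substrings of at most 3 tokens, giving 63 substring/entry pairs. The filter passes only 9 of them on to exact Jaccard verification. It still reports every pair that a brute-force comparison would report.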

Total citations

Cited by 128

[Chart: citations per year, 2008 to 2024]
