Authors
Alistair Moffat, Justin Zobel
Publication date
1996/10/1
Journal
ACM Transactions on Information Systems (TOIS)
Volume
14
Issue
4
Pages
349-379
Publisher
ACM
Description
Query-processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Retrieval time for inverted lists can be greatly reduced by the use of compression, but this adds to the CPU time required. Here we show that the CPU component of query response time for conjunctive Boolean queries and for informal ranked queries can be similarly reduced, at little cost in terms of storage, by the inclusion of an internal index in each compressed inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show that the self-indexing strategy adds less than 20% to the size of the compressed inverted file, which itself occupies less than 10% of the indexed text, yet can reduce processing time for Boolean queries of 5-10 terms to under one fifth of the previous cost. Similarly, ranked queries …
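The self-indexing idea in the description can be illustrated with a small sketch. It splits each inverted list into blocks, stores a skip entry (first document number and byte offset) for every block, and answers a conjunctive Boolean query. It takes candidates from the shortest list and, for each longer list, decodes only the one block that the internal index says could contain each candidate. Several parts of this sketch are assumptions and do not come from the paper. Variable-byte coding of d-gaps stands in for the bit-level codes (such as Golomb codes) commonly used in compressed inverted files. The block length `block_size`, with a default of 4, is a hypothetical parameter. The skip entries are searched with binary search. The names `SelfIndexedList` and `conjunctive_query` are illustrative only.

```python
from bisect import bisect_right


def vbyte_encode(n: int, out: bytearray) -> None:
    """Append n using variable-byte coding: 7 data bits per byte, high bit marks the last byte."""
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)


def vbyte_decode(buf: bytearray, pos: int) -> tuple[int, int]:
    """Decode one integer starting at buf[pos]; return (value, next position)."""
    n, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        if b & 0x80:  # final byte of this integer
            return n | ((b & 0x7F) << shift), pos
        n |= b << shift
        shift += 7


class SelfIndexedList:
    """Compressed inverted list with an internal skip index (hypothetical layout)."""

    def __init__(self, doc_ids: list[int], block_size: int = 4) -> None:
        self.skip_docs: list[int] = []     # first document number in each block
        self.skip_offsets: list[int] = []  # byte offset of each block's compressed d-gaps
        self.block_lens: list[int] = []    # number of pointers in each block
        self.data = bytearray()
        for start in range(0, len(doc_ids), block_size):
            block = doc_ids[start:start + block_size]
            self.skip_docs.append(block[0])
            self.skip_offsets.append(len(self.data))
            self.block_lens.append(len(block))
            # d-gaps restart at each block, so any block can be decoded on its own
            for prev, d in zip(block, block[1:]):
                vbyte_encode(d - prev, self.data)
        self._cached_block = -1
        self._cached_docs: set[int] = set()

    def __len__(self) -> int:
        return sum(self.block_lens)

    def decode_block(self, i: int) -> list[int]:
        docs = [self.skip_docs[i]]
        pos = self.skip_offsets[i]
        for _ in range(self.block_lens[i] - 1):
            gap, pos = vbyte_decode(self.data, pos)
            docs.append(docs[-1] + gap)
        return docs

    def all_docs(self) -> list[int]:
        return [d for i in range(len(self.skip_docs)) for d in self.decode_block(i)]

    def contains(self, d: int) -> bool:
        # Use the internal index to find the only block that could hold d,
        # then decode just that block (reused for consecutive lookups in it).
        i = bisect_right(self.skip_docs, d) - 1
        if i < 0:
            return False
        if i != self._cached_block:
            self._cached_block = i
            self._cached_docs = set(self.decode_block(i))
        return d in self._cached_docs


def conjunctive_query(lists: list[SelfIndexedList]) -> list[int]:
    """AND query: candidates from the shortest list, filtered through the others."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    candidates = ordered[0].all_docs()
    for lst in ordered[1:]:
        candidates = [d for d in candidates if lst.contains(d)]
        if not candidates:
            break
    return candidates
```

For example, intersecting a list built from `[1, 3, 5, 7, 9, 11, 13, 15, 200, 300]` with `block_size=3` against one built from `[3, 4, 9, 15, 300, 1000]` yields `[3, 9, 15, 300]`. Each candidate costs one skip-entry lookup and at most one block decode in the longer list. The sketch never decompresses that whole list, which is where the CPU saving for conjunctive queries comes from. The skip entries are the extra storage the self-index adds on top of the compressed d-gaps.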
Total citations
(per-year citation chart, 1995 to 2024; individual counts not recoverable from the extracted text)