Authors
Zaiqing Nie, Subbarao Kambhampati, Ullas Nambiar
Publication date
2005/4/4
Journal
IEEE Transactions on Knowledge and Data Engineering
Volume
17
Issue
5
Pages
638-651
Publisher
IEEE
Description
Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping the number of needed statistics low enough to keep the storage and learning costs manageable. In this paper, we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the needed statistics tightly under control. Our approach uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the …
Total citations
[Chart: citations per year, 2005 to 2022]
Scholar articles
Z Nie, S Kambhampati, U Nambiar - IEEE Transactions on Knowledge and Data …, 2005