Authors
Eike Schallehn, Kai-Uwe Sattler, Gunter Saake
Publication date
2002/2/26
Journal
Proceedings of the International Conference on Data Engineering
Pages
277-277
Publisher
IEEE Computer Society Press; 1998
Description
Data integration, as required in applications such as data warehousing and information system integration, places great demands on features for dealing with overlapping and inconsistent data. Object-relational and other data management systems available today provide only limited concepts for meeting these requirements. The general concept of grouping and aggregation appears to be a fitting paradigm for many current issues in data integration, but in its common form of equality-based grouping a number of problems remain unsolved. Various extensions to this concept have been introduced in recent years in the form of user-defined functions for aggregation and grouping. In particular, existing extensions to the grouping operation, such as simple derivations of group-by values, do not meet the requirements of data integration applications. We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance the integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications and illustrate the approach by introducing new concepts for similarity-based duplicate detection and elimination. For both approaches, implementation and optimization issues are considered.
Total citations
[Chart: citations per year, 2003 to 2017; individual yearly counts are not recoverable from the extracted text]