On schema matching with opaque column names and data values

Authors

Jaewoo Kang, Jeffrey F Naughton

Publication date

2003/6/9

Book

Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Pages

205-216

Description

Most previous solutions to the schema matching problem rely in some fashion on identifying "similar" column names in the schemas to be matched, or on recognizing common domains in the data stored in the schemas. While each of these approaches is valuable in many cases, they are not infallible, and there exist instances of the schema matching problem to which they do not even apply. Such problem instances typically arise when the column names in the schemas and the data in the columns are "opaque" or very difficult to interpret. In this paper we propose a two-step technique that works even in the presence of opaque column names and data values. In the first step, we measure the pair-wise attribute correlations in the tables to be matched and construct a dependency graph using mutual information as a measure of the dependency between attributes. In the second step, we find matching node pairs in …
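
The first step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-of-columns table layout, and the choice to estimate mutual information from empirical value frequencies are all assumptions made for illustration. The second step is not sketched, because the description is truncated before it is specified.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def dependency_graph(table):
    """Build a weighted dependency graph for one table.

    `table` is a hypothetical layout: a dict mapping each column name to a
    list of values, all lists having the same length. The result maps each
    ordered pair of column names to their mutual information. For a pair
    (a, a) the value equals the entropy of column a, since I(X;X) = H(X).
    """
    names = list(table)
    return {
        (a, b): mutual_information(table[a], table[b])
        for a in names
        for b in names
    }


# Illustrative table with opaque column names and opaque values (made up).
example_table = {
    "c1": ["x9", "x9", "q2", "q2"],
    "c2": ["k4", "k4", "m7", "m7"],  # fully determined by c1
    "c3": ["a", "b", "a", "b"],      # independent of c1
}
example_graph = dependency_graph(example_table)
```

Because the weights depend only on how values co-occur, not on what the values or column names mean, a graph of this kind can be built for each table even when both are uninterpretable.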

Total citations

Cited by 342

Citations per year: 2003: 5, 2004: 17, 2005: 32, 2006: 38, 2007: 19, 2008: 24, 2009: 15, 2010: 11, 2011: 23, 2012: 18, 2013: 19, 2014: 26, 2015: 9, 2016: 8, 2017: 16, 2018: 12, 2019: 11, 2020: 12, 2021: 5, 2022: 5, 2023: 6, 2024: 3
