Authors
Li Ding, Tim Finin, Yun Peng, Paulo Pinheiro Da Silva, Deborah L McGuinness
Publication date
2005/4/30
Journal
TR-CS-05-06
Description
The Semantic Web can be viewed as one large 'universal' RDF graph distributed across many Web pages. Working with this single graph is impractical for many reasons, so we usually work with a decomposition into RDF documents, each of which corresponds to an individual Web page. While this is natural and appropriate for most tasks, it is still too coarse for some. For example, many RDF documents may redundantly contain the same data, and some documents comprise large amounts of weakly related or unrelated data. Decomposing a document into its RDF triples is usually too fine a decomposition, because information may be lost if the graph contains blank nodes. We define an intermediate decomposition of an RDF graph G into a set of RDF 'molecules', each of which is a connected sub-graph of the original. The decomposition is 'lossless' in that the molecules can be recombined to yield G even if their blank node IDs are 'standardized apart'. RDF molecules provide a useful granularity for tracking the provenance of, or evidence for, information found in an RDF graph. Doing so at the document level (e.g., finding other documents with identical graphs) may find too few matches. Working at the triple level fails for any triples containing blank nodes. RDF molecules are the finest granularity at which we can do this without loss of information. We define the RDF molecule concept in more detail, describe an algorithm to decompose an RDF graph into its molecules, and show how these can be used to find evidence to support the original graph. The decomposition algorithm and the provenance application have both been prototyped in a simple Web-based demonstration.
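The abstract does not spell out the decomposition algorithm, so the following is only a minimal sketch of one simple way to obtain a lossless split, not the report's own method. It assumes triples are plain Python tuples of strings, that blank nodes are the terms starting with `_:`, and that two triples belong to the same molecule exactly when they are linked through shared blank nodes. The names `is_blank` and `decompose` are hypothetical.

```python
from collections import defaultdict


def is_blank(term):
    """Assumption: blank nodes are strings written as '_:label'."""
    return isinstance(term, str) and term.startswith("_:")


def decompose(triples):
    """Group triples into molecules: triples that share a blank node
    (directly or transitively) stay together; a triple with no blank
    node forms a molecule on its own."""
    triples = list(dict.fromkeys(triples))  # drop duplicates, keep order
    parent = list(range(len(triples)))      # union-find forest over triple indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_seen = {}  # blank node -> index of the first triple that mentions it
    for idx, (s, _p, o) in enumerate(triples):
        for term in (s, o):
            if is_blank(term):
                if term in first_seen:
                    union(first_seen[term], idx)
                else:
                    first_seen[term] = idx

    groups = defaultdict(set)
    for idx, triple in enumerate(triples):
        groups[find(idx)].add(triple)
    return [frozenset(g) for g in groups.values()]


# Illustrative data only; the URIs and literals are made up.
graph = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "_:b1"),
    ("_:b1", "foaf:name", "Bob"),
]
molecules = decompose(graph)
recombined = set().union(*molecules)
```

In this sketch the two triples that mention `_:b1` end up in one molecule, the blank-node-free triple is a molecule by itself, and the union of all molecules equals the original triple set. Matching molecules whose blank node IDs have been renamed would additionally need a graph-isomorphism style comparison, which is not shown here.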
Total citations
[Bar chart: citations per year, 2005 to 2023]
Scholar articles
L Ding, T Finin, Y Peng, PP Da Silva, DL McGuinness - TR-CS-05-06, 2005