Schema-Directed XML Publishing and Integration

A common paradigm for data exchange on the Web is to first convert data to XML, and then send the XML data over the network to another party. Transformation from databases to XML is often referred to as XML publishing. In practice, this is always done with a predefined XML schema: a community or industry agrees on a certain schema, and subsequently all members of the community create XML documents for their data such that the documents conform to the predefined schema. An XML schema typically consists of a (recursive) DTD and a set of integrity constraints, and schema-conformance requires that the published XML documents both conform to the DTD and satisfy the constraints.

Schema-directed XML publishing. We consider a systematic DTD-directed approach to publishing data while guaranteeing schema-conformance. It is based on a novel notion of attribute translation grammars (ATGs). An ATG extends a DTD by associating with it semantic rules expressed as SQL queries. It differs from conventional attribute grammars in that it does not "parse" data; instead, it selectively extracts relevant data from databases to construct an XML document that conforms to the DTD.
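
The idea can be illustrated with a small sketch. It is not the ATG formalism or the system described in the papers below, only a toy approximation: the course/prereq tables, the three-production DTD, and the names STAR_RULES and publish are all assumptions made for this example. Each starred production is paired with a SQL query, and the document is built top-down so that every element receives exactly the children its production allows.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical source database of courses and their direct prerequisites.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course(cno TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE prereq(cno TEXT, pre TEXT);
    INSERT INTO course VALUES ('cs1', 'Programming'), ('cs2', 'Algorithms'), ('cs3', 'Databases');
    INSERT INTO prereq VALUES ('cs2', 'cs1'), ('cs3', 'cs2');
""")

# Toy recursive DTD:  db -> course*   course -> cno, title, prereq   prereq -> course*
# Rules for the starred productions: the query returns one row per child,
# and '?' is bound to the attribute inherited from the parent element.
STAR_RULES = {
    "db": ("course", "SELECT cno FROM course ORDER BY cno"),
    "prereq": ("course", "SELECT pre FROM prereq WHERE cno = ? ORDER BY pre"),
}

def publish(conn, tag, attr=None):
    """Build the subtree for element type `tag`, driven top-down by the DTD."""
    elem = ET.Element(tag)
    if tag in STAR_RULES:
        # Production of the form A -> B*: one child per row of the rule's query.
        child_tag, query = STAR_RULES[tag]
        params = () if attr is None else (attr,)
        for (value,) in conn.execute(query, params):
            elem.append(publish(conn, child_tag, value))
    elif tag == "course":
        # Production course -> cno, title, prereq: children in the fixed DTD order.
        row = conn.execute("SELECT title FROM course WHERE cno = ?", (attr,)).fetchone()
        ET.SubElement(elem, "cno").text = attr
        ET.SubElement(elem, "title").text = row[0]
        elem.append(publish(conn, "prereq", attr))
    return elem

doc = publish(conn, "db")
print(ET.tostring(doc, encoding="unicode"))
```

Because the recursion follows the DTD rather than the data, the output conforms to the toy DTD by construction. The sketch assumes the prerequisite relation is acyclic; with cyclic data the recursion would not terminate.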

Incremental XML publishing. Since the underlying source data changes constantly, XML publishing must be able to reflect source updates in the published XML document accurately and efficiently. Recomputing the entire document from scratch whenever source data is updated is typically costly. Thus one needs to evaluate schema-directed XML publishing incrementally: propagate the updates from the data sources to the target XML document (the XML view) without violating the predefined schema and with minimal recomputation.
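
A minimal sketch of the contrast between recomputation and update propagation follows. It is not the algorithm of the SIGMOD'04 paper; the flat view (db -> dept*, dept -> emp*), the in-memory dictionary standing in for the source database, and the function names are assumptions chosen to keep the example short. The published view keeps an index from source keys to XML nodes, so that a single source insertion touches only the affected subtree.

```python
import xml.etree.ElementTree as ET

def publish(depts):
    """Full recomputation: `depts` maps a department name to a set of employee names."""
    root = ET.Element("db")
    index = {}  # department name -> its <dept> element, kept for later updates
    for name in sorted(depts):
        dept = ET.SubElement(root, "dept", name=name)
        for emp in sorted(depts[name]):
            ET.SubElement(dept, "emp").text = emp
        index[name] = dept
    return root, index

def insert_emp(root, index, dept_name, emp):
    """Propagate one source insertion to the view, touching only one subtree."""
    dept = index.get(dept_name)
    if dept is None:
        # First employee of a new department: add the <dept> node in sorted position.
        dept = ET.Element("dept", name=dept_name)
        pos = sum(1 for d in root if d.get("name") < dept_name)
        root.insert(pos, dept)
        index[dept_name] = dept
    # Insert the new <emp> child in sorted position within its department.
    pos = sum(1 for e in dept if e.text < emp)
    node = ET.Element("emp")
    node.text = emp
    dept.insert(pos, node)

source = {"cs": {"ann"}, "math": {"bob"}}
view, index = publish(source)

# Apply two source updates and propagate each one to the view.
source["cs"].add("carl")
insert_emp(view, index, "cs", "carl")
source["bio"] = {"dan"}
insert_emp(view, index, "bio", "dan")

# The incrementally maintained view equals a fresh recomputation.
recomputed, _ = publish(source)
print(ET.tostring(view) == ET.tostring(recomputed))
```

Both insertion cases keep the view within the toy DTD, since they only add a dept child under db or an emp child under dept. Deletions and recursive views, which the real problem must handle, are left out of this sketch.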

Schema-directed XML integration. While XML publishing is nontrivial, schema-directed XML integration is much harder. It typically needs to extract data from a collection of distributed, heterogeneous data sources, and construct an XML document that conforms to a predefined schema.

Schema-directed XML-to-XML transformations. Given an XML source and a (recursive) target XML schema, one often wants to transform the data from the source to a target XML document that conforms to the given target schema. XML query languages and systems do not provide any guidance on how to define a transformation that is guaranteed to type-check.

Lossless XML data merging. One often needs to collect data from multiple distributed XML sources, and integrate the data in an XML archive without loss of information, such that queries over the source data can be effectively translated to equivalent queries over the XML archive. This calls for a notion of schema embedding that enables a source schema to be effectively matched to (or embedded in) a target schema with larger "information capacity". This notion should allow for powerful schema-restructuring transformations that capture data-structuring variants (e.g., different hierarchical element groupings) commonly encountered in practice, while ensuring lossless instance mappings and effective translation of source queries over the target.
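
As a rough illustration, one way to picture such an embedding is as a mapping from each parent/child edge of the source schema to a path of element types in the target schema. The sketch below is built on that assumption only; the db/person/name source, the archive target, and the names PATH_MAP, map_instance, and translate are hypothetical, and the actual notion imposes further conditions that are not modelled here. The same edge-to-path table drives both the instance mapping and the translation of simple path queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical embedding: each source edge (parent type, child type) is mapped
# to a path of element types in the target schema.
ROOT_MAP = {"db": "archive"}
PATH_MAP = {
    ("db", "person"): ["people", "entry"],
    ("person", "name"): ["info", "name"],
}

def map_instance(src, tgt_tag=None):
    """Copy a source tree into the target structure along the mapped paths."""
    tgt = ET.Element(tgt_tag or ROOT_MAP[src.tag])
    tgt.text = src.text
    for child in src:
        path = PATH_MAP[(src.tag, child.tag)]
        node = tgt
        # Reuse or create the intermediate grouping elements on the path.
        for step in path[:-1]:
            found = node.find(step)
            node = found if found is not None else ET.SubElement(node, step)
        # The last step of the path holds the image of the source child.
        node.append(map_instance(child, path[-1]))
    return tgt

def translate(query, root_type="db"):
    """Rewrite a source child-axis path query into a path over the target."""
    steps, current = [], root_type
    for step in query.split("/"):
        steps.extend(PATH_MAP[(current, step)])
        current = step
    return "/".join(steps)

src = ET.fromstring(
    "<db><person><name>Ann</name></person><person><name>Bob</name></person></db>"
)
tgt = map_instance(src)

query = "person/name"
source_answer = [e.text for e in src.findall(query)]
target_answer = [e.text for e in tgt.findall(translate(query))]
print(translate(query), source_answer == target_answer)
```

In this toy case the translated query returns the same values over the archive as the original query returns over the source, which is the property the paragraph above asks for. Only child-axis paths are handled; richer query classes would need more than this string rewrite.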

People:

Philip Bohannon, Michael Benedikt, Chee Yong Chan, Byron Choi, Wenfei Fan, Xibei Jia, Rajeev Rastogi

Publications:

  • Information Preserving XML Schema Embedding.
    The 31st International Conference on Very Large Data Bases (VLDB'05)
    Philip Bohannon, Wenfei Fan, Michael Flaster, P.P.S. Narayan

  • Composable XML Integration Grammars [.pdf]
    ACM Thirteenth Conference on Information and Knowledge Management (CIKM'04)
    Wenfei Fan, Ming Xiong, Minos N. Garofalakis and Xibei Jia

  • Incremental Evaluation of Schema-Directed XML Publishing [.pdf]
    ACM SIGMOD Conference on Management of Data (SIGMOD'04)
    Philip Bohannon, Byron Choi, Wenfei Fan

  • Capturing both Types and Constraints in Data Integration [.pdf]
    ACM SIGMOD Conference on Management of Data (SIGMOD'03)
    Michael Benedikt, Chee Yong Chan, Wenfei Fan, Juliana Freire, and Rajeev Rastogi

  • DTD-Directed Publishing with Attribute Translation Grammars [.pdf]
    The 28th International Conference on Very Large Data Bases (VLDB'02)
    Michael Benedikt, Chee Yong Chan, Wenfei Fan, Rajeev Rastogi, Shihui Zheng and Aoying Zhou

  • A Uniform System for Publishing and Maintaining XML Data [.pdf]
    The 30th International Conference on Very Large Data Bases (VLDB'04), demo
    Byron Choi, Wenfei Fan, Xibei Jia and Arek Kasprzyk

  • TREX: DTD-Conforming XML to XML Transformations [.pdf]
    ACM SIGMOD Conference on Management of Data (SIGMOD'03), demo
    Qing Wang, ..., Wenfei Fan

Posters:

  • A Uniform System for Publishing and Maintaining XML Data [ JPG (203K) ] [PDF (A0, 1.3M) ] [PS (A0, 1.4M) ] [GZ (A0, 566K) ]
    for the demo at the International Conference on Very Large Data Bases (VLDB), 2004
Topic revision: r1 - 17 Jan 2006 - 11:44:39 - XibeiJia
 