Storing XML inDatabases

queries, scalability, and backup andrestoration utilities.This article explains how native-XML databases efficiently and reliablystore XML content better than relation-al or partial XML databases. It includesa “walk-through” of the storage andretrieval process as it supports an e-bus-iness transaction.XML Database ManagementApproachesWhile there are good reasons to storeXML persistently in a database, thispresents a challenge. There are funda-mental mismatches between the XML-structured data and the data model vir-tually all mainstream RDBMS productssupport. Entire books have been writtenon this subject, but Figure 1 summarizesthe essential differences. You can think of solutions to thechallenge as points along a spectrumfrom the “pure relational” approach to“post relational” approach (though mostreal-world systems are a combination).Pure RelationalThe relational model of databasesoffers one answer: Normalize the XMLdata into rows and columns, with eachcell containing an “atomic” text value.Any hierarchical or network data modelcan be translated into normalized rela-tions, so in principle any document can bedecomposed for relational storage. Thetechniques for normalizing tree structuresare discussed in advanced RDBMS text-books. Similarly, Wrox Press’s Profes-sional XML Databases provides a check-list of 18 rules (Chapter 3) that givedetailed suggestions about how to mode astructure specified by an XML DTD in arelational database.The problem is that there’s a signifi-cant gap between relational model prin-ciples and the actual practice of RDBMSvendors and users. Normalizing XMLstructures into RDBMS relations can befiendishly complex for designers, time-consuming for programmers, and opera-tionally inefficient for Database Admin-istrators (DBAs) or users. The more“document-like” the data model — thatis, when there are recursive elements,mixed content, and a less rigid structure— the more difficult it is to devise prac-tical RDBMS models for XML data.Furthermore, there are well-known chal-lenges in using Structured QueryLanguage (SQL) to effectively querynormalized recursive data models suchas a bill of material. Relational puristsuse the challenges of building bills ofmaterial applications to illustrate SQL’slack of adherence to the theories underly-ing the relational model (e.g., FabianPascal’s Practical Issues in DatabaseManagement: A Reference for the Think-ing Practitioner, Chapter 7). Ordinaryusers can be forgiven for being dauntedby the challenges of effectively normaliz-ing and querying hierarchical XML doc-uments using today’s SQL databases!Post-RelationalRelational database vendors haveresponded to the challenges of serializ-ing the data from object-oriented pro-grams and object-oriented databases byadding features useful for simplifyingXML data management. Most funda-mentally, RDBMS systems have addedLarge Object (LOB) data types thatallow arbitrary types and amounts ofdata to be stored and retrieved in a sin-gle “cell” of a table. Similarly, RDBMSvendors (and the SQL standard) haveadded other “post-relational” featuressuch as:• Support for “cells” containing repeat-ing groups of data• Full-text search capabilities.These features make it easier for non-specialists to build effective databaseapplications that don’t fit the constraintsof the pure relational model. It’s not abig stretch for the RDBMS turned“object-relational” vendors to add con-venient XML extensions to their prod-ucts that exploit new post-relational andtext-retrieval features. Additional utili-ties ease the burden of modeling XMLhierarchies to work with the underlyingrelational and post-relational storagemodels. World Wide Web Consortium(W3C) recommendations for XML,such as the XPath query syntax and theDocument Object Model (DOM) Appli-cation Program Interface (API), are pri-marily supported in the utilities. OnceSQL or proprietary text-search exten-sions have located records, the XMLutilities provide tools for representingthe results as XML and manipulating theXML with DOM, Xpath, etc. CurrentXML-enabled database systems don’tsupport a complete, seamless round tripof arbitrary XML content into and out ofa database. None of them support thecomplete XPath specification as a querylanguage into the database itself.We see several features added toobject-relational databases to make itrelatively easy to store and retrieveXML data. Different vendors take dif-ferent approaches, but all add XMLsupport on top of existing featuresrather than as a fundamentally new stor-age model inside the database engine.Even Oracle 9i’s “native XML SQL datatype” is essentially just a CharacterLarge Object (CLOB) that supportssome proprietary extensions to SQL forXML processing. Native XMLTo work with XML data in a main-stream DBMS, either the end user or theDBMS vendor must use relatively sophis-ticated techniques to overcome the mis-match between XML data and existingtechnologies. An alternative approach isto build a DBMS from the bottom up toeasily store, retrieve, and query XML-structured data. Such “native XML” data-base systems expose the data and the pro-cessing model via XML standards. AnXML document is the fundamental unitof storage. XML DTDs or schemas,rather than RDBMS schemas, define theproperties of document collections.XPath or another XML-specific querylanguage locates documents meetingsome search criteria. Some productsallow XML data to be processed in theactual database engine (as opposed tosome external utility) with Simple APIfor XML (SAX), DOM, and XML Style-sheet Language Transformations(XSLTs), XLink, etc.A native XML database doesn’t askthe user to worry about how to map54eAI Journal • October 2001 .XML structures onto some non-XMLunderlying data model or processinglanguage. No native XML databaseproduct supports every detail of everyXML specification, but the XML stan-dards define a large percentage of theinterfaces to native XML DBMS prod-ucts. There’s little need for a 1,000-page book on native XML databaseprogramming, because those familiarwith XML specifications and toolsalready know about 95 percent of whatthey need to know to use native XMLdatabase products!This conceptual advantage of using anative XML DBMS to handle XMLdata can translate into concrete opera-tional advantages, too. The coderequired to partition XML documentsinto multiple tables and CLOBs and totranslate queries and merge results frommultiple underlying database structuresprovides additional processing overheadand points of failure compared with themore straightforward native XMLapproach. While performance and relia-bility depend on a complex mixture offactors, native XML databases provide amore scalable, reliable platform inwhich to store XML than object-rela-tional databases. Since native XMLdatabases tend to focus on doing onething well rather than being a “universaldatabase” solution, the workload forDBAs and Web administrators is con-siderably reduced. In short, the “totalcost of ownership,” (considering thehardware, programming, administra-tion, and training) is likely to be signif-icantly lower for XML applicationsbuilt on a native XML database.Choosing an RDBMS orXMLDBOf course, native XML databasesaren’t a universal solution for all datamanagement needs — or even XMLdata management needs. When is anRDBMS the more appropriate back-endor when is a native XML DBMS thebetter choice? RDBMS systems areprobably best for maintaining theintegrity of data, and XML DBMS sys-tems are best for maintaining XML doc-uments. The distinction between “docu-ments” and “data” is fuzzy and the riseof XML has further blurred the bound-ary. Let’s try to clarify it a bit. “Data” describes propositions aboutthe world. RDBMS systems (to theextent they’re truly based on E. F. Codd


Storing XML inDatabases
.XML Database ManagementApproaches,There aremismatches betweenthe XML-structureddata and the datamodel RDBMSproducts support.Native XMLdatabases aren’t auniversal solutionfor all data manage-ment needs.

Website design company in kolkata