

Informix & XML

 in, out, and shakin’ all about
Paul Brown
Chief Plumber
INFORMIX Software Inc., 4100 Bohannon Drive
Menlo Park, California 94025, USA
Phone: 650 926 6300  Fax: 510 628 3951
email: paul.brown@informix.com  web site: www.informix.com
 Biography
 Paul Brown, who works in INFORMIX’s Chief Technology Office, is INFORMIX’s ‘Chief Plumber’. He was part of the team at UC Berkeley that created Postgres, an early Object-Relational DBMS prototype, and he worked for Illustra, a California start-up that was among the first to commercialize ORDBMS technology. Mr. Brown is the co-author of "Object-Relational DBMS: Tracking the Next Great Wave" and "Developing Object-Relational Database Applications", and of numerous research papers.
 Abstract
 An object-relational database is often described as an extensible server software system. XML is the eXtensible Markup Language. In this talk, you will learn what happens when the most advanced data management technology and the most advanced internet technology collide. In a nutshell, you will learn how easy it is to get XML data into an object-relational DBMS, how easy it is to get XML out, and how you can take advantage of an ORDBMS to ‘shake XML all about’.

Introduction

 The web changes everything. By expanding access to information, and our ability to share information, the web enhances individual productivity and improves overall organizational efficiency. The scale of these changes has been trivialized and obscured by the recent dot-com mania, but it is hard to overestimate the impact the Internet will have on our lives, and on the lives of our children.
 Once, IT professionals built information systems that were like islands. Travel between these data islands was very difficult, so exchanges between them were rare. As a result, other challenges (language differences, standards, and so on) attracted little attention. But the web makes travel between our data islands far easier, and this makes XML important, because it offers a powerful way to overcome semantic barriers to information exchange.
 So technology vendors like INFORMIX are changing. We are making XML an integral part of our products and solutions. Traditionally, INFORMIX has been a leading provider of DBMS software. With more people sharing more information, there is greater demand than ever for our scalable, transactional data management products. Also, one of the basic requirements for web software is flexibility: web sites evolve rapidly, changing their look-and-feel, content, and the kinds of services they provide. This makes declarative, query-centric interfaces, where a web application can ask and answer ad hoc questions, very useful.
 But how do we view the combination of XML and DBMSs? And how do we think it ought to be done?

Extensible or object-relational DBMSs

 Fortunately, recent changes in DBMS technology make this integration easier than it would have been before. Today, the best DBMS engines are extensible: they allow developers to embed modules of procedural code within them, and to use these modules within an abstracted, logical data model. In other words, instead of storing only INTEGER, VARCHAR, DECIMAL, FLOAT and BLOB data types, and relying on middleware or client-side logic to turn this data into information, the columns in an object-relational database’s table can contain instances of atomic objects as exotic and varied as Java beans, temperature readings {120 F, 41 C, 304 K}, physical quantities {85 Kg, 180 Lb}, geographic points and polygons, fingerprints and so on.
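 SQLite is not an object-relational DBMS, but Python's standard sqlite3 module is enough for a minimal sketch of the two ideas in this paragraph: a column whose values are application objects, and procedural code registered with the engine and called from SQL. The Mass class, the shipment table and the mass_kg function are hypothetical names chosen for illustration, modelled on the {85 Kg, 180 Lb} example; they are not INFORMIX interfaces.

```python
import sqlite3
from dataclasses import dataclass

# Conversion factors to kilograms (1 lb = 0.45359237 kg exactly).
TO_KG = {"Kg": 1.0, "Lb": 0.45359237}


@dataclass(frozen=True)
class Mass:
    """Hypothetical atomic object: a physical quantity with its unit."""
    value: float
    unit: str  # "Kg" or "Lb"

    def kg(self) -> float:
        return self.value * TO_KG[self.unit]


def parse_mass(text: str) -> Mass:
    """Parse the stored text form, e.g. '180 Lb'."""
    value, unit = text.split()
    return Mass(float(value), unit)


# Store Mass objects as text such as '85 Kg', and turn columns declared
# as MASS back into Mass objects when rows are read.
sqlite3.register_adapter(Mass, lambda m: f"{m.value!r} {m.unit}")
sqlite3.register_converter("MASS", lambda raw: parse_mass(raw.decode("ascii")))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
# Embed procedural code in the engine so SQL can compare masses
# regardless of the unit each one was recorded in.
conn.create_function("mass_kg", 1, lambda text: parse_mass(text).kg())

conn.execute("CREATE TABLE shipment (label TEXT, weight MASS)")
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?)",
    [("crate_a", Mass(85, "Kg")), ("crate_b", Mass(180, "Lb"))],
)

# The query itself reasons about the objects stored in the column.
lightest_first = [
    label for (label,) in conn.execute(
        "SELECT label FROM shipment ORDER BY mass_kg(weight)")
]
print(lightest_first)  # ['crate_b', 'crate_a'] since 180 Lb is about 81.6 Kg
```

Because the unit conversion lives inside the engine, the ORDER BY clause ranks 85 Kg and 180 Lb correctly without any client-side logic.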
 Moreover, ORDBMSs allow developers to reason about these objects in the query language. For example, consider a business-to-business (B2B) e-commerce exchange where retail buyers locate perishable foodstuffs, check the availability of space in a refrigerated moving van, and send bids to buy the inventory and reserve van space. Within an ORDBMS, the schema and a query for such an exchange might look like this:
 
Object-relational schema and query example
 Illustrates object-relational schema storing extensible objects within tables, and example of declarative query expression using these objects.
CREATE TABLE Perishable_Food (
    What       Food_Type  NOT NULL,
    Where      Geo_Point  NOT NULL,
    Available  Period     NOT NULL
);

CREATE TABLE Freight_Space (
    From_To    Geo_Path   NOT NULL,
    When       Period     NOT NULL,
    Capacity   Mass       NOT NULL,
    Space      Volume     NOT NULL,
    Goods      SET(Packages NOT NULL)
);

SELECT F.From_To, F.When
  FROM Freight_Space F, Perishable_Food P
 WHERE Geo_Within    ( Circle(P.Where, '10 Miles'), From(F.From_To) )
   AND Time_Within   ( P.Available, Start(F.When) )
   AND Has_Space_For ( P.What, F.Capacity, F.Space, F.Goods );
 The point of this figure is to illustrate how sophisticated modern data management systems can be. It also hints at why this application needs XML. Where do the values in these tables come from? Given the variety of data islands involved (each wholesale supplier and trucking company probably has its own existing management information system, each with its own formats and structures for storing data), how can all of this be unified? The answer is XML.
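 The figure's query relies on user-defined types and functions such as Geo_Within and Time_Within. The following sketch approximates it in SQLite under several assumptions: the types are flattened into plain columns (degrees, ISO dates, kilograms), Geo_Within(Circle(...)) is replaced by a hypothetical geo_within_miles function based on the haversine great-circle formula, and Has_Space_For is reduced to a mass check. The coordinates and dates are taken from the sample XML messages in the next section, read as (longitude,latitude) and day/month/year; the three freight rows are invented test data.

```python
import math
import sqlite3

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def geo_within_miles(lat1, lon1, lat2, lon2, miles):
    """Hypothetical stand-in for Geo_Within(Circle(...), ...): returns 1 if
    the haversine distance between the two points is at most `miles`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    distance = 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
    return int(distance <= miles)


conn = sqlite3.connect(":memory:")
conn.create_function("geo_within_miles", 5, geo_within_miles)
conn.executescript("""
CREATE TABLE perishable_food (
    what TEXT, lat REAL, lon REAL, mass_kg REAL,
    avail_from TEXT, avail_to TEXT);        -- flattened Period
CREATE TABLE freight_space (
    id INTEGER PRIMARY KEY, from_lat REAL, from_lon REAL,
    start_date TEXT, capacity_kg REAL);     -- Start(When), Capacity in kg
INSERT INTO perishable_food VALUES
    ('Apples', 48.937, 2.371, 12, '2000-02-05', '2000-02-10');
INSERT INTO freight_space VALUES
    (1, 48.937, 2.371, '2000-02-05', 1500),  -- leaves from the apples
    (2, 49.24,  4.01,  '2000-02-05', 1500),  -- leaves well over 10 miles away
    (3, 48.937, 2.371, '2000-02-12', 1500);  -- leaves after availability ends
""")

matches = [row[0] for row in conn.execute("""
    SELECT f.id
      FROM freight_space f, perishable_food p
     WHERE geo_within_miles(p.lat, p.lon, f.from_lat, f.from_lon, 10)
       AND f.start_date BETWEEN p.avail_from AND p.avail_to
       AND f.capacity_kg >= p.mass_kg
     ORDER BY f.id""")]
print(matches)  # [1]
```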
 The good news is that DBMS extensibility also means much of the plumbing necessary to make XML a reality can now be embedded directly into the DBMS. (This does not mean, of course, that XML is the only way to talk to an ORDBMS, or that an ORDBMS is the only use for XML!) Over the next few pages, we will see how this can be done.

Getting XML in

 The problem with building this kind of system is the number and variety of islands of data involved. XML excels at overcoming this problem. Independently of the DBMS, our B2B site developers can create a set of DTD specifications describing how information is to be communicated. In our example application, sample messages might look like this:
 
Examples of XML exchange of business information
 Illustrates how XML might be used as a standard means of representing complex business data. The XML data in these examples would comply with an appropriate, standardized Document Type Definition or Style Sheet.
<goods>
  <food_item>
    <food_type>
      <name>Apples</name>
      <mass unit="Kg">12</mass>
      <space unit="M">1x1x1</space>
      <store unit="C">12</store>
    </food_type>
    <loc>(2.371,48.937)</loc>
    <available>
      <from>05/02/2000</from>
      <to>10/02/2000</to>
    </available>
  </food_item>
  <food_item>
    <!-- etc. -->
  </food_item>
</goods>

<freight_spaces>
  <transport>
    <trip>(2.371,48.937,4.01,49.24)</trip>
    <when>
      <from>05/02/2000</from>
      <to>06/02/2000</to>
    </when>
    <cap unit="T">1.5</cap>
    <vol unit="M">3x3x2</vol>
    <goods>
      <item>Wine
        <mass unit="Kg">175</mass>
        <space unit="M">.5x.5x.5</space>
        <store unit="C">20</store>
      </item>
      <item>
        <!-- etc. -->
      </item>
    </goods>
  </transport>
</freight_spaces>
 The trick, of course, is bridging the gap between the kind of data you see in the XML exchange examples, which might come from a variety of sources, and the kind of structure you see in the object-relational schema and query example, where end users answer their questions.
 One of XML’s strengths lies in its use of plain, standard Unicode text. Although you must process an XML document before you can access the data within it, XML’s simple structure makes writing a parser a relatively simple programming assignment. Consequently, a variety of commercial-quality parsers are available for free from various sources on the web. Many of these parsers are written in Java.
 Extensible DBMSs can take Java code and run it natively within the DBMS. Consequently, we are able to embed several free Java XML parsers directly into the framework of our server. The figure below illustrates the general architecture.
 For large documents and systems with high volumes of information exchange, this approach has a performance advantage because it avoids the overhead of moving queries and data between an external program and the DBMS. It is also attractive from an ongoing administration and maintenance perspective because the embedded code is not statically linked into the ORDBMS, as it might be with more conventional programs. Extensible DBMSs use dynamic linking and invocation techniques that make replacing such a module as easy as dropping an empty database table.
 
Architecture for embedding XML parser into ORDBMS
 Illustrates embedding an XML parser into the ORDBMS. The parser picks apart the XML document, modifying the state of the database depending on what it finds. Note that the parser may elect to store the entire document unparsed. The parser may use an XSL specification to map data in the XML document to locations in the schema.
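 The caption describes a parser that runs inside the server. As a minimal stand-in that runs outside any DBMS, the sketch below uses Python's built-in xml.etree.ElementTree parser with sqlite3 to show both paths the caption mentions: the whole document is kept unparsed in an xml_document table, and each food_item is picked apart into a perishable_food row. The table and column names, the Kg/Lb conversion table and the reading of loc as (longitude,latitude) are assumptions; the dates are stored exactly as received because their day/month order is not specified.

```python
import sqlite3
import xml.etree.ElementTree as ET

TO_KG = {"Kg": 1.0, "Lb": 0.45359237}  # assumed set of mass units

GOODS_XML = """<goods>
  <food_item>
    <food_type>
      <name>Apples</name>
      <mass unit="Kg">12</mass>
      <space unit="M">1x1x1</space>
      <store unit="C">12</store>
    </food_type>
    <loc>(2.371,48.937)</loc>
    <available>
      <from>05/02/2000</from>
      <to>10/02/2000</to>
    </available>
  </food_item>
</goods>"""


def load_goods(conn, xml_text):
    """Keep the raw document, then insert one row per <food_item>."""
    with conn:  # one transaction: all rows or none
        doc_id = conn.execute(
            "INSERT INTO xml_document (body) VALUES (?)", (xml_text,)).lastrowid
        for item in ET.fromstring(xml_text).iter("food_item"):
            mass = item.find("food_type/mass")
            lon, lat = (float(v) for v in item.findtext("loc").strip("()").split(","))
            conn.execute(
                "INSERT INTO perishable_food "
                "(doc_id, what, mass_kg, lon, lat, avail_from, avail_to) "
                "VALUES (?, ?, ?, ?, ?, ?, ?)",
                (doc_id,
                 item.findtext("food_type/name"),
                 float(mass.text) * TO_KG[mass.get("unit")],
                 lon, lat,
                 item.findtext("available/from"),   # kept as received
                 item.findtext("available/to")))
    return doc_id


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE xml_document (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE perishable_food (
    doc_id INTEGER REFERENCES xml_document(id),
    what TEXT, mass_kg REAL, lon REAL, lat REAL,
    avail_from TEXT, avail_to TEXT);
""")
load_goods(conn, GOODS_XML)
rows = conn.execute(
    "SELECT what, mass_kg, lon, lat, avail_from, avail_to FROM perishable_food").fetchall()
print(rows)  # [('Apples', 12.0, 2.371, 48.937, '05/02/2000', '10/02/2000')]
```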
 This process is made easier when the overall bundle also includes:
 
  • Tools to map DTD and style sheets to corresponding relational schemas.
  • Advanced data model features in the ORDBMS, such as compound (multi-part) types, collections (sets), and facilities for semi-structured data, such as text indexing.
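 Production mapping tools work from the DTD or style sheet itself. As a much simpler stand-in, the sketch below infers a flat table from a sample document: every path to a leaf element inside a record element (here food_item) becomes a TEXT column. The function names are hypothetical, and the sketch ignores attributes and repeated elements; repeated elements are where ORDBMS collection types would come in.

```python
import sqlite3
import xml.etree.ElementTree as ET


def leaf_paths(elem, prefix=""):
    """Yield underscore-joined paths from `elem` to each leaf element."""
    for child in elem:
        path = prefix + child.tag
        if len(child):            # has child elements: descend
            yield from leaf_paths(child, path + "_")
        else:                     # leaf element: becomes a column
            yield path


def table_ddl(xml_text, record_tag, table):
    """Derive a CREATE TABLE statement from the records in a sample document."""
    columns = []
    for record in ET.fromstring(xml_text).iter(record_tag):
        for path in leaf_paths(record):
            if path not in columns:   # keep first-seen order, no duplicates
                columns.append(path)
    body = ",\n    ".join(f"{name} TEXT" for name in columns)
    return f"CREATE TABLE {table} (\n    {body}\n)"


SAMPLE = """<goods><food_item>
  <food_type><name>Apples</name><mass unit="Kg">12</mass></food_type>
  <loc>(2.371,48.937)</loc>
  <available><from>05/02/2000</from><to>10/02/2000</to></available>
</food_item></goods>"""

ddl = table_ddl(SAMPLE, "food_item", "food_item")
print(ddl)

conn = sqlite3.connect(":memory:")
conn.execute(ddl)  # check that the generated statement is valid SQLite DDL
columns = [row[1] for row in conn.execute("PRAGMA table_info(food_item)")]
```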

Getting XML out

 Both XML and HTML are derived from SGML. For some time, DBMS vendors have been providing tools that take the results of a SQL query and return them marked up with HTML tags. It is a fairly straightforward engineering assignment to rework these tools to handle XML too.
 Web development tools like these rely on the fact that query results consist of a set of named columns. In an ORDBMS, a column’s data can be compound (a single column containing multiple elements) or a COLLECTION (a single row/column data object consisting of a set of values). Fortunately, both of these novelties map easily onto the XML data model.
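 A minimal sketch of such a tool, using only Python's standard library, shows why the mapping is straightforward: each named column becomes an element, a compound value becomes an element with one child per part, and a COLLECTION becomes an element with one item child per member. Python dicts and lists stand in for ORDBMS compound types and collections here; that representation, the element names and the sample values are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def add_value(parent, name, value):
    """Append `value` to `parent` as an element called `name`."""
    elem = ET.SubElement(parent, name)
    if isinstance(value, dict):              # compound value: one child per part
        for part_name, part in value.items():
            add_value(elem, part_name, part)
    elif isinstance(value, (list, tuple)):   # COLLECTION: one <item> per member
        for member in value:
            add_value(elem, "item", member)
    elif value is not None:                  # scalar: becomes element text
        elem.text = str(value)
    return elem


def rows_to_xml(root_name, row_name, columns, rows):
    """Mark up query results: one element per row, one child per named column."""
    root = ET.Element(root_name)
    for row in rows:
        row_elem = ET.SubElement(root, row_name)
        for column, value in zip(columns, row):
            add_value(row_elem, column, value)
    return ET.tostring(root, encoding="unicode")


# Plain relational results: column names come from the cursor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE freight_space (trip TEXT, cap_t REAL)")
conn.execute("INSERT INTO freight_space VALUES ('Paris-Reims', 1.5)")
cur = conn.execute("SELECT trip, cap_t FROM freight_space")
flat = rows_to_xml("freight_spaces", "transport",
                   [d[0] for d in cur.description], cur.fetchall())

# Object-relational results: a compound column and a collection column.
nested = rows_to_xml(
    "freight_spaces", "transport", ["trip", "cap", "goods"],
    [("Paris-Reims", {"value": 1.5, "unit": "T"}, ["Wine", "Apples"])])
print(flat)
print(nested)
```

Recursion in add_value means compound values may themselves contain collections, and vice versa, without any extra code.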
 In terms of our islands of data, getting XML out of the database makes it possible to encapsulate the functionality of a central server like the one in our examples. Other systems wishing to exchange information with it can send in their contributions in XML form and receive responses in XML. An overall architecture that adopted this approach might look like the B2B infrastructure architecture figure that follows.
 In this figure we see how multiple, heterogeneous islands of data can share their information so that each business operates more efficiently. Subsets of the information in systems developed by trucking companies and food wholesalers are exchanged (using XML) with a central B2B service. Using this service, other businesses can bid for allotments of perishable goods, basing their valuation decisions not merely on the quality of the goods on offer, but also on their geographic location and on whether or not they can be delivered.
 In this example, we see XML being used both to get data into the central store and to get information out of the central store and back into each external information system.

B2B infrastructure architecture
 The "bigger picture": how the XML/ORDBMS strategies described in this paper fit into an overall B2B infrastructure.

Shakin’ it all about

 Most data management companies and many web applications will adopt this kind of model. But it is not suitable for every kind of XML. Another potential use for XML is document exchange. In this problem domain, XML data usually exhibits much less structure than in the scenario we envisioned earlier. Nevertheless, it is still highly desirable to store the XML data in a transactional system, and then to allow external users to interact with it: to query it, read it, and so on. In other words, in addition to getting it in and getting it out, any complete XML story also needs to deal with shakin’ it all about.
 Ultimately, an XML document can be completely unstructured but ‘marked up’. In this kind of document, key words or phrases are tagged with a label that conveys semantic information. Sometimes these tags are directions for a user-interface program, but sometimes users want to ask questions about the contents of such documents. For example, they may ask, “Show me the documents in the repository where the word ‘Paris’ is tagged as a ‘destination’.”
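 For marked-up documents like these, the question can be answered by matching the tagged element rather than the bare word. The brute-force sketch below scans a small, hypothetical in-memory repository with ElementTree; the memo documents, their ids and the tag names are invented for illustration, and a real repository would consult an index instead of parsing every document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical repository: document id -> marked-up text.
REPOSITORY = {
    "d1": "<memo>We fly to <destination>Paris</destination> from <origin>London</origin>.</memo>",
    "d2": "<memo>We fly to <destination>Rome</destination> from <origin>Paris</origin>.</memo>",
    "d3": "<memo>Notes from the meeting about Paris.</memo>",
}


def tagged_word_in(xml_text, tag, word):
    """True if `word` occurs inside any <tag> element of the document."""
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        words = re.findall(r"\w+", "".join(elem.itertext()).lower())
        if word.lower() in words:
            return True
    return False


def documents_where(repository, tag, word):
    """Ids of documents in which `word` is tagged as `tag`, sorted."""
    return sorted(doc_id for doc_id, text in repository.items()
                  if tagged_word_in(text, tag, word))


print(documents_where(REPOSITORY, "destination", "Paris"))  # ['d1']
```

Document d2 mentions Paris only as an origin and d3 mentions it untagged, so a plain full-text search would return all three documents where the tag-aware query returns one.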
 This kind of document data is best stored using data management techniques such as document indexing, query by document content, and so on. Object-relational DBMSs can be extended with this kind of functionality too. Whether or not you ultimately use a DBMS to store the documents themselves, an ORDBMS can play an invaluable role as an index and scalable subject catalog.
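 One way an ORDBMS can act as that index is to hold (tag, term, document) entries, built as each document arrives, so that the ‘destination’ question becomes an ordinary keyed lookup. The sketch below uses sqlite3 and ElementTree; the tag_index table and the choice to index each word under every element that encloses it are assumptions rather than features of any particular product. Only document ids are stored, so the documents themselves can live anywhere.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tag_index (
        tag    TEXT NOT NULL,
        term   TEXT NOT NULL,
        doc_id TEXT NOT NULL,
        PRIMARY KEY (tag, term, doc_id))""")  # the key doubles as the lookup index


def index_document(conn, doc_id, xml_text):
    """Record each word under every element that encloses it."""
    entries = set()
    for elem in ET.fromstring(xml_text).iter():
        for term in re.findall(r"\w+", "".join(elem.itertext()).lower()):
            entries.add((elem.tag, term, doc_id))
    with conn:
        conn.executemany("INSERT OR IGNORE INTO tag_index VALUES (?, ?, ?)", entries)


def documents_where(conn, tag, word):
    """Ids of indexed documents in which `word` appears inside a <tag> element."""
    rows = conn.execute(
        "SELECT doc_id FROM tag_index WHERE tag = ? AND term = ? ORDER BY doc_id",
        (tag, word.lower()))
    return [doc_id for (doc_id,) in rows]


index_document(conn, "d1", "<memo>We fly to <destination>Paris</destination> "
                           "from <origin>London</origin>.</memo>")
index_document(conn, "d2", "<memo>We fly to <destination>Rome</destination> "
                           "from <origin>Paris</origin>.</memo>")
print(documents_where(conn, "destination", "Paris"))  # ['d1']
print(documents_where(conn, "memo", "paris"))         # ['d1', 'd2']
```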
 Alternatively, in the absence of a style sheet or DTD, it might be desirable to shred an XML document. Shredding involves parsing the XML and, instead of storing the document intact, assigning the values in its elements to corresponding rows in a table. An obvious challenge with this strategy is maintaining the XML document’s original structure within the object-relational model.
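 One well-known way to keep that structure, when no DTD describes the document, is to shred into a generic node table rather than a document-specific one: each element becomes a row recording its parent, its position among its siblings and its text, with attributes in a side table. The sketch below, using sqlite3 and ElementTree, shreds a small document and rebuilds it to check that nothing was lost. The table and column names are hypothetical, and this is one possible technique among several.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),  -- NULL for the root element
    ordinal   INTEGER NOT NULL,             -- position among siblings
    tag       TEXT NOT NULL,
    txt       TEXT,                         -- text before the first child
    tail      TEXT                          -- text after the closing tag
);
CREATE TABLE attr (
    node_id INTEGER NOT NULL REFERENCES node(id),
    seq     INTEGER NOT NULL,               -- keeps attribute order
    name    TEXT NOT NULL,
    val     TEXT NOT NULL
);
"""


def shred(conn, elem, parent_id=None, ordinal=0):
    """Store `elem` and its descendants; return the new node id."""
    node_id = conn.execute(
        "INSERT INTO node (parent_id, ordinal, tag, txt, tail) VALUES (?, ?, ?, ?, ?)",
        (parent_id, ordinal, elem.tag, elem.text, elem.tail)).lastrowid
    for seq, (name, val) in enumerate(elem.attrib.items()):
        conn.execute("INSERT INTO attr VALUES (?, ?, ?, ?)", (node_id, seq, name, val))
    for position, child in enumerate(elem):
        shred(conn, child, node_id, position)
    return node_id


def rebuild(conn, node_id):
    """Reassemble the element stored at `node_id`, children in original order."""
    tag, txt, tail = conn.execute(
        "SELECT tag, txt, tail FROM node WHERE id = ?", (node_id,)).fetchone()
    elem = ET.Element(tag)
    elem.text, elem.tail = txt, tail
    for name, val in conn.execute(
            "SELECT name, val FROM attr WHERE node_id = ? ORDER BY seq", (node_id,)):
        elem.set(name, val)
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY ordinal", (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(conn, child_id))
    return elem


DOC = ('<food_item><name>Apples</name><mass unit="Kg">12</mass>'
       '<loc>(2.371,48.937)</loc></food_item>')

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
root_id = shred(conn, ET.fromstring(DOC))
round_trip = ET.tostring(rebuild(conn, root_id), encoding="unicode")
print(round_trip == DOC)  # True
```

The parent_id and ordinal columns are what preserve the original structure; without them the rows would record only the element values, not how they were nested or ordered.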
     

Summary and conclusions

 In this paper we have explored how XML and extensible, or object-relational, DBMS technology complement one another. In the short term, XML’s importance and usefulness in building web applications lie in its role as a data interchange format, enabling information exchange between islands of data. But using XML efficiently requires changes to how database management systems are built, and developers wishing to build effective web applications would do well to use object-relational DBMSs somewhat differently from how they used relational DBMSs in the past.
 The key points of this paper are:

1. The core extensibility of an object-relational DBMS allows vendors and our customers to embed logic into the DBMS that parses XML data and then modifies the database based on the contents of the document. Similarly, the best DBMS technology allows developers to embed logic that converts SQL query results into XML.
2. Because the object-relational data model includes facilities like compound (multi-element) data structures and COLLECTIONS (sets), the task of mapping between XML and ORDBMS SQL is not as complex as the task of mapping between XML and earlier versions of SQL. Further, using XML is necessary to truly support such systems, because of the complexity of the data involved.
 In summary, the three things you need to support XML are the capacity to get XML data into your database, to get XML out when an external system requires it, and to shake XML all about when you need to store it.
