
 

The Marriage of XML and Databases: How to Sustain a Lasting Relationship
John Chelsom
Managing Director
CSW Informatics Ltd
Oxford Centre For Innovation
Mill Street
Oxford OX2 0JX
United Kingdom
Phone: +44 1865 794 789
Fax: +44 1865 205 008
Email: john@csw.co.uk
Web: www.csw.co.uk
 
Biographical notice:
 
John Chelsom is Managing Director of CSW Informatics, a company dedicated to providing object-level information management solutions using XML, SGML and database technology. Originally trained as an electrical engineer, John worked first as an X-ray engineer and later gained a PhD for work on the application of knowledge-based systems in medicine. From there it was a short step to the world of structured information management, where he has been responsible for the design and development of XML and SGML information management systems using both object and relational database technology. John is also the presenter of the Technology Appraisals seminar series on XML Document Databases.
 
ABSTRACT:
 
Application developers and solution providers are well aware of the close relationship between XML and database technology. However, the explosion of new standards, techniques, tools and products that combine the two is forcing developers to focus on the most fundamental issues and objectives before they can choose the right combination of tools and techniques for a successful information system. This paper highlights five key issues that must be addressed before choosing the components of an XML database solution.
 

Introduction

 
Database technology provides the means of storing and accessing structured data in an efficient and secure manner. XML now provides a standard representation for the exchange and storage of structured information. It seems only natural that the two technologies should be combined to deliver more effective information management solutions. The problem lies in deciding the best way to do this. How should the solutions architect or developer choose among XML, SGML, RDBMS, OODBMS, SQL, ODMG, OQL, XQL, ODBC, JDBC, CORBA, WIDL, COM, Java, DOM, SAX, CGI, ASP, DCD, RDF, PICS, DSSSL, XSL, CSS2, ...? And that is before choosing among the hundreds of products now jostling for position in the burgeoning XML marketplace.
 
One of the first steps towards a successful fusion of XML and database technologies should be to consider some of the basic objectives and issues in systems development. At the implementation level the choice of technology is influenced by factors such as:
 
  •  ease of implementation
  •  speed of implementation
  •  speed of operation
  •  ease of operation
  •  portability
  •  extensibility
  •  maintainability
 
Although important from a systems engineering perspective, none of these factors is specific to the implementation of XML database systems. This paper explores five issues that are specific to XML database systems and must be considered when designing or selecting products for such systems:
 
  •  Is the application driven by structured information or structured documents?
  •  Which type of XML database should be used?
  •  Is the system for information production, information delivery, or both?
  •  What type of distributed computing environment is required?
  •  At what level does the solution use proprietary software?
 

Is the application driven by structured information or structured documents?

 
From one perspective, XML can be regarded simply as a standard, logically structured format for electronic documents. As such, there is a role for database technology in the management and delivery of XML documents, just as with any other form of electronic document. Databases can provide secure, controlled and indexed access to XML documents, as well as helping to manage specific features that XML brings to electronic documents, such as hyperlinking and document configuration.
 
However, XML has another dimension as a data representation, one that SGML users have not always exploited fully. The rise of XML, and the host of vertical applications it has spawned, has refocused attention on SGML/XML as a mechanism for creating, managing and exchanging structured information, rather than merely a representation for structured documents.
 
From this perspective, the boundaries between XML and database representations of information become blurred and there is a requirement for software applications which access and manipulate XML to span those boundaries seamlessly. Ideally, the users of such applications should be able to gain the benefits derived from both representations, without being aware of the underlying data format.
 

Which type of XML database should be used?

 
Several years ago this question would have been taken to mean: should relational or object-oriented database technology be used to store XML documents? To some extent the fierce intellectual debate on this issue has now subsided, giving way to more pragmatic concerns (for example, whether the technology performs the functions required of it, at sufficient speed, for the required volumes of data and number of users). The question can now be interpreted as a choice between three basic functional types of database:
  •  XML generating databases generate XML documents as an interface to the outside world, but are not internally aware of XML structures.
  •  XML document databases hold XML documents (or fragments) as blobs of text within a more general database schema.
  •  XML component databases hold each element and entity of an XML document as an individual database object.
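To make the first two types concrete, here is a minimal sketch using Python's standard sqlite3 and xml.etree.ElementTree modules. The table names, columns and sample data are hypothetical. In the generating case the data lives in ordinary relational columns and XML is produced only when a client asks for it; in the document case each document is stored whole as text, with its title extracted into an indexed column so that documents can be found without parsing every blob.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")

# --- XML generating database (hypothetical "product" table) ---
# Data is held in plain relational columns; XML exists only at the interface.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany("INSERT INTO product (name, price) VALUES (?, ?)",
                 [("Widget", 2.5), ("Gadget", 10.0)])

def products_as_xml(conn):
    """Generate an XML document from relational rows on request."""
    root = ET.Element("products")
    for pid, name, price in conn.execute("SELECT id, name, price FROM product ORDER BY id"):
        item = ET.SubElement(root, "product", id=str(pid))
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "price").text = f"{price:.2f}"
    return ET.tostring(root, encoding="unicode")

# --- XML document database (hypothetical "document" table) ---
# Whole documents are stored as opaque text; only the title is extracted
# into an indexed column for fast lookup.
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("CREATE INDEX document_title ON document(title)")

def store_document(conn, xml_text):
    """Store a document as a text blob and return its row id."""
    title = ET.fromstring(xml_text).findtext("title")
    cur = conn.execute("INSERT INTO document (title, body) VALUES (?, ?)", (title, xml_text))
    return cur.lastrowid

def fetch_document(conn, title):
    """Return the stored document text for a title, or None if absent."""
    row = conn.execute("SELECT body FROM document WHERE title = ?", (title,)).fetchone()
    return row[0] if row else None

doc_id = store_document(conn, "<article><title>Intro</title><para>Hello</para></article>")
print(products_as_xml(conn))
print(fetch_document(conn, "Intro"))
```

Note that the document database never looks inside the body after storing it, so updating one paragraph means rewriting the whole blob.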
 
To some extent the type of XML database determines the underlying technology, since the first two types are most commonly implemented using relational database technology and the third using object-oriented technology.
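A component database stores each element as an object in its own right. The sketch below uses only the Python standard library: a dictionary keyed by hypothetical object identifiers (OIDs) stands in for a persistent object store, and the class names are illustrative. It shows the property that distinguishes this type of database: a single element can be located and updated without rewriting the document that contains it, and any subtree can be reassembled on demand.

```python
import itertools
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ElementObject:
    """One stored XML element: the unit of storage in a component database."""
    oid: int
    tag: str
    attrib: dict
    text: Optional[str]
    tail: Optional[str]          # text following the element (mixed content)
    parent: Optional[int]
    children: list = field(default_factory=list)  # OIDs of child elements, in order

class ComponentStore:
    """Hypothetical component store; a dict stands in for an object database."""

    def __init__(self):
        self.objects = {}                  # oid -> ElementObject
        self._oids = itertools.count(1)    # simple OID generator

    def shred(self, elem, parent=None):
        """Store elem and every descendant as separate objects; return elem's OID."""
        oid = next(self._oids)
        obj = ElementObject(oid, elem.tag, dict(elem.attrib), elem.text, elem.tail, parent)
        self.objects[oid] = obj
        for child in elem:
            obj.children.append(self.shred(child, oid))
        return oid

    def rebuild(self, oid):
        """Reassemble the subtree rooted at any stored element."""
        obj = self.objects[oid]
        elem = ET.Element(obj.tag, dict(obj.attrib))
        elem.text = obj.text
        for child_oid in obj.children:
            child = self.rebuild(child_oid)
            child.tail = self.objects[child_oid].tail
            elem.append(child)
        return elem

store = ComponentStore()
root_oid = store.shred(ET.fromstring(
    '<manual><section id="s1"><title>Install</title></section>'
    '<section id="s2"><title>Use</title></section></manual>'))

# Update a single component in place; no other element is touched.
title_oid = next(o.oid for o in store.objects.values()
                 if o.tag == "title" and o.text == "Use")
store.objects[title_oid].text = "Operation"
print(ET.tostring(store.rebuild(root_oid), encoding="unicode"))
```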
 

Is the system for information production, information delivery, or both?

 
This is such a basic question that it is sometimes easy to overlook. Systems that manage information production are generally much more difficult to implement since they must account for both read and write access to the database. The designer of an effective information production system must address issues such as data locking, reuse and versioning which add to the complexity of the design.
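One common way to handle locking and versioning for shared fragments is optimistic locking: each fragment carries a version number, and a save succeeds only if the version has not changed since the author read it, with the replaced version archived. The sketch below uses Python's sqlite3 module; the fragment and fragment_history tables and the check_out/check_in helpers are hypothetical, and a real production system might use pessimistic check-out locks instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fragment (id INTEGER PRIMARY KEY, version INTEGER NOT NULL, body TEXT NOT NULL);
    CREATE TABLE fragment_history (id INTEGER, version INTEGER, body TEXT,
                                   PRIMARY KEY (id, version));
    INSERT INTO fragment VALUES (1, 1, '<para>Draft</para>');
""")

class ConflictError(Exception):
    """Raised when another author has saved the fragment since it was read."""

def check_out(conn, frag_id):
    """Return (version, body) for the copy an author starts editing."""
    return conn.execute("SELECT version, body FROM fragment WHERE id = ?",
                        (frag_id,)).fetchone()

def check_in(conn, frag_id, base_version, new_body):
    """Save an edit only if the fragment is still at base_version."""
    with conn:  # single transaction: commit on success, roll back on exception
        # Archive the version being replaced so earlier versions stay retrievable.
        conn.execute("INSERT INTO fragment_history "
                     "SELECT id, version, body FROM fragment WHERE id = ? AND version = ?",
                     (frag_id, base_version))
        cur = conn.execute("UPDATE fragment SET version = version + 1, body = ? "
                           "WHERE id = ? AND version = ?",
                           (new_body, frag_id, base_version))
        if cur.rowcount == 0:
            raise ConflictError(f"fragment {frag_id} has changed since version {base_version}")

version_a, _ = check_out(conn, 1)   # author A reads version 1
version_b, _ = check_out(conn, 1)   # author B reads version 1 as well
check_in(conn, 1, version_a, "<para>Edited by A</para>")   # succeeds; now version 2
rejected = False
try:
    check_in(conn, 1, version_b, "<para>Edited by B</para>")  # stale copy
except ConflictError:
    rejected = True   # B must re-read version 2 and merge before saving
```

Optimistic locking suits systems where concurrent edits to the same fragment are rare; where they are common, explicit check-out locks avoid wasted editing effort.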
 
When databases are used solely for XML delivery, the most important factor is simply the efficiency with which information can be located, extracted and presented to access clients. Here, the most challenging aspects of design often revolve around the dynamic assembly of information in response to a specific client request. Since speed of response is usually a critical success factor, the system designer must find the most effective way to distribute processing among the client, server and middle tiers of the architecture.
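As an illustration of dynamic assembly, the sketch below builds a response document containing only the fragments a client requested. The in-memory FRAGMENTS dictionary and the response and missing element names are assumptions standing in for a real fragment store and delivery format; where this function runs (client, middle tier or server) is exactly the distribution decision described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment store; in a delivery system these would come from the database.
FRAGMENTS = {
    "intro":   "<section id='intro'><title>Introduction</title></section>",
    "install": "<section id='install'><title>Installation</title></section>",
    "faq":     "<section id='faq'><title>FAQ</title></section>",
}

def assemble(request_ids, fragments=FRAGMENTS):
    """Build one response document containing only the requested fragments, in order."""
    root = ET.Element("response")
    for frag_id in request_ids:
        text = fragments.get(frag_id)
        if text is None:
            # Report unknown ids inside the response instead of failing the request.
            ET.SubElement(root, "missing", ref=frag_id)
        else:
            root.append(ET.fromstring(text))
    return ET.tostring(root, encoding="unicode")

print(assemble(["install", "faq", "glossary"]))
```

Reporting unknown identifiers inside the response is one possible policy; a delivery system might equally reject the whole request.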
 
In cases where a system covers both production and delivery, it is important to remember that the database system managing the production side may not be the same as the system used for delivery. Witness, for example, the rapid proliferation of systems where an XML database server is deployed to extract information from one or more legacy database systems and deliver that information to external users over the web.
 

What type of distributed computing environment is required?

 
Or, to put it another way, does the marriage of XML and databases always end in tiers? First there were mainframe architectures, in which the server did all the processing and the client was a dumb terminal. Then came client/server, with most of the processing devolved to the client side, followed by thin (web-based) client/server, in which the bulk of the processing moved back to the server side. Now there is multi-tier client/server, in which processing is split between at least three levels of the architecture (though not always on three different processors). There are many good reasons to implement multi-tiered architectures, such as:
 
  •  Running a web server in the middle tier to insulate a thin client layer from database servers, enabling the database to serve clients over low bandwidth networks.
  •  Using an XML server in the middle tier to integrate multiple legacy systems.
  •  Using multiple database client processes in the middle tier to balance the load on the database server.
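The third reason can be sketched as a middle tier that hands each incoming request to the next of several database client workers in turn. This is a minimal round-robin sketch in Python; the worker functions are hypothetical stand-ins for separate database client processes or connections, and a production balancer might instead route by current load or use a connection pool.

```python
import itertools
from typing import Callable, List

class RoundRobinTier:
    """Hypothetical middle tier: spreads client requests across database
    client workers in strict rotation (round-robin)."""

    def __init__(self, workers: List[Callable[[str], str]]):
        if not workers:
            raise ValueError("at least one worker is required")
        self._cycle = itertools.cycle(workers)

    def handle(self, query: str) -> str:
        worker = next(self._cycle)   # pick the next worker in turn
        return worker(query)

def make_worker(name: str) -> Callable[[str], str]:
    """Stand-in for a separate database client process or connection."""
    def run(query: str) -> str:
        return f"{name}: {query}"
    return run

tier = RoundRobinTier([make_worker("db-client-1"), make_worker("db-client-2")])
results = [tier.handle(q) for q in ["q1", "q2", "q3"]]
# results == ["db-client-1: q1", "db-client-2: q2", "db-client-1: q3"]
```

Round-robin ignores how busy each worker is, which is acceptable when requests cost roughly the same and less so when some queries are far heavier than others.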
 

At what level does the solution use proprietary software?

 
In practice, it is not generally possible to implement systems using entirely non-proprietary data structures or process encodings. Developers are usually keenly aware of the trade-off between using open standards and creating systems that can be easily implemented and that perform adequately. When deciding how far to rely on non-proprietary standards, longer-term objectives must be considered.
 
In the case of XML, one important objective is to invest in reusable information, by separating that information resource from the applications that manipulate it, rather than embedding information inside proprietary application code.
 
Another consideration is that by using XML encoding for proprietary or application-specific data structures (e.g. message structures, interface definitions, forms designs), those structures become easier to process, because off-the-shelf parsers and processors supporting standard interfaces such as DOM (or SAX) are already available.
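The sketch below illustrates the point with Python's standard xml.dom.minidom and xml.sax modules: the same application-specific message is read once as a DOM tree and once as a SAX event stream, with no custom parser required. The order/line message vocabulary is invented for illustration.

```python
import xml.dom.minidom
import xml.sax

# A hypothetical application-specific message, encoded in XML rather than a private format.
MESSAGE = """<order id="42">
  <line sku="A1" qty="2"/>
  <line sku="B7" qty="1"/>
</order>"""

# DOM: load the whole message as a tree and navigate it.
dom = xml.dom.minidom.parseString(MESSAGE)
order = dom.documentElement
dom_lines = [(line.getAttribute("sku"), int(line.getAttribute("qty")))
             for line in order.getElementsByTagName("line")]

# SAX: stream through the message and react to events without building a tree.
class QuantityCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.total_qty = 0

    def startElement(self, name, attrs):
        if name == "line":
            self.total_qty += int(attrs["qty"])

counter = QuantityCounter()
xml.sax.parseString(MESSAGE.encode("utf-8"), counter)
print(order.getAttribute("id"), dom_lines, counter.total_qty)
```

DOM is convenient when the whole structure must be navigated or modified; SAX suits large messages that only need to be scanned once.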
 
With these objectives in mind, it is relatively easy to decide where to draw the line beyond which proprietary applications or encodings should be excluded.
