The Marriage of XML and Databases |
| John Chelsom |
| Managing Director |
| CSW Informatics Ltd
Oxford Centre For Innovation, Mill Street, Oxford, United Kingdom, OX2 0JX
Phone: +44 1865 794 789
Fax: +44 1865 205 008
Email: john@csw.co.uk
Web: www.csw.co.uk |
Biographical notice: |
ABSTRACT: |
Introduction |
Is the application driven by structured information or structured documents? |
From one perspective, XML can be regarded simply as a standard and logically structured format for electronic documents. As such there is a role for database technology in the management and delivery of XML documents, just as with any other form of electronic documents. Databases can provide secure, controlled and indexed access to XML documents as well as helping to manage specific features that XML brings to electronic documents, such as hyper-linking and document configuration. |
However, there is another dimension to XML as a data representation which has not always been exploited fully by SGML users. The ascent of XML, and the host of vertical applications it has spawned, has re-focussed attention on SGML/XML as the mechanism for creating, managing and exchanging structured information, rather than merely a representation for structured documents. |
From this perspective, the boundaries between XML and database representations of information become blurred and there is a requirement for software applications which access and manipulate XML to span those boundaries seamlessly. Ideally, the users of such applications should be able to gain the benefits derived from both representations, without being aware of the underlying data format. |
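As a minimal sketch of how one piece of structured information can move between the two representations, the Python fragment below maps a small XML record onto a relational row and rebuilds the XML from that row. The `order` vocabulary, the `orders` table and the function names are hypothetical, chosen only for illustration, and an in-memory SQLite database stands in for whatever database a real system would use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample record; the element and attribute names are illustrative only.
ORDER_XML = "<order id='42'><customer>Acme</customer><total>19.95</total></order>"


def shred_order(conn, xml_text):
    """Map one <order> element onto a row of a relational table."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
        (
            int(root.get("id")),
            root.findtext("customer"),
            float(root.findtext("total")),
        ),
    )


def rebuild_order(conn, order_id):
    """Rebuild the XML form of an order from its relational row."""
    row = conn.execute(
        "SELECT id, customer, total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    root = ET.Element("order", id=str(row[0]))
    ET.SubElement(root, "customer").text = row[1]
    ET.SubElement(root, "total").text = f"{row[2]:.2f}"
    return ET.tostring(root, encoding="unicode")


# An in-memory database stands in for the real data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
shred_order(conn, ORDER_XML)
round_tripped = rebuild_order(conn, 42)
```

An application built on functions of this kind can answer a request from either representation, so the caller need not know which one holds the data.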
Which type of XML database should be used? |
Several years ago this question would have been taken to mean: should relational or object-oriented database technology be used to store XML documents? To some extent the fierce intellectual debate on this issue has now subsided and been replaced by more pragmatic questions: does the technology perform the functions required of it, at sufficient speed, for the required volumes of data and number of users? Now the question can be interpreted as a choice between three basic functional types of database: |
Definitions
|
To some extent the type of XML database determines the underlying technology, since the first two types are most usually implemented using relational database technology and the third using object-oriented technology. |
Is the system for information production, information delivery, or both? |
This is such a basic question that it is sometimes easy to overlook. Systems that manage information production are generally much more difficult to implement, since they must account for both read and write access to the database. The designer of an effective information production system must address issues such as data locking, reuse and versioning, all of which add to the complexity of the design. |
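One common way to handle concurrent writes, shown here only as an illustrative sketch, is optimistic versioning: each stored fragment carries a version number, and an update succeeds only if that number is unchanged since the author read the fragment. This is one possible approach among several (pessimistic check-out locks are another). The `fragments` table and the `save_fragment` function are hypothetical names, and SQLite is used purely for convenience.

```python
import sqlite3


def save_fragment(conn, frag_id, new_xml, expected_version):
    """Write new_xml only if the fragment is still at expected_version.

    Returns True when the update was applied, False when another writer
    has already moved the fragment on to a later version.
    """
    cur = conn.execute(
        "UPDATE fragments SET xml = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_xml, frag_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fragments (id TEXT PRIMARY KEY, xml TEXT, version INTEGER)")
conn.execute("INSERT INTO fragments VALUES ('intro', '<para>Draft</para>', 1)")

# Two authors both read version 1; only the first write is accepted.
first = save_fragment(conn, "intro", "<para>Edit by author A</para>", 1)
second = save_fragment(conn, "intro", "<para>Edit by author B</para>", 1)

stored_xml, stored_version = conn.execute(
    "SELECT xml, version FROM fragments WHERE id = 'intro'"
).fetchone()
```

The second author's write is rejected rather than silently overwriting the first, which leaves the application free to decide how the conflict should be resolved.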
When databases are used solely for XML delivery, the most important factor is simply the efficiency with which information can be located, extracted and presented to access clients. Here, the most challenging aspects of design often revolve around the dynamic assembly of information in response to a specific client request. Since speed of response is usually a critical success factor, the system designer must find the most effective way to distribute processing between the client, server and middle tiers of the architecture. |
In cases where a system covers both production and delivery, it is important to remember that the database system managing the production side may not be the same as the system used for delivery. Witness, for example, the rapid proliferation of systems where an XML database server is deployed to extract information from one or more legacy database systems and deliver that information to external users over the web. |
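The delivery pattern described above can be sketched in a few lines of Python: run a query against the legacy database and wrap each row in XML elements named after the columns. The `parts` table, its contents and the `rows_to_xml` function are hypothetical, an in-memory SQLite database stands in for the legacy system, and the sketch assumes that every column name is also a valid XML element name.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, query, root_tag, row_tag):
    """Run a query and return the result set as an XML string.

    Each row becomes a <row_tag> element with one child element per column.
    Assumes the column names are valid XML element names.
    """
    cur = conn.execute(query)
    columns = [description[0] for description in cur.description]
    root = ET.Element(root_tag)
    for row in cur:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Stand-in for a legacy database; table and data are illustrative only.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE parts (part_no TEXT, description TEXT, stock INTEGER)")
legacy.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [("A-100", "Bracket & bolt", 12), ("B-200", "Washer", 0)],
)

catalogue_xml = rows_to_xml(
    legacy,
    "SELECT part_no, description, stock FROM parts ORDER BY part_no",
    "parts",
    "part",
)
```

Because the XML is generated by a library serializer rather than by string concatenation, characters such as `&` in the legacy data are escaped automatically.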
What type of distributed computing environment? |
Or, to put it another way, does the marriage of XML and databases always end in tiers? First there were mainframe architectures, where the server did all the processing and the client was a dumb terminal. Then came client/server, with most of the processing devolved to the client side, followed by thin (web-based) client/server, where the bulk of the processing went back to the server side. Now there is multi-tier client/server, where processing is split between at least three levels of the architecture (though not always on three different processors). There are many good reasons to implement multi-tiered architectures, such as: |
|
At what level does the solution use proprietary software? |
In practice, it is not generally possible to implement systems using entirely non-proprietary data structures or process encodings. Developers are usually keenly aware of the trade-off between using open standards and creating systems which can be easily implemented and which perform adequately. When deciding how far to adopt non-proprietary standards, longer-term objectives must be considered. |
In the case of XML, one important objective is to invest in reusable information, by separating that information resource from the applications that manipulate it, rather than embedding information inside proprietary application code. |
Another consideration is that by using XML encoding for proprietary or application-specific data structures (e.g. message structures, interface definitions, forms designs) those structures become easier to process, because off-the-shelf parsers and processors are already available using standard interfaces such as DOM (or SAX). |
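To illustrate the point, the sketch below reads the same application-specific message with both interfaces, using the DOM and SAX implementations in the Python standard library. The `message` and `field` vocabulary and the `FieldHandler` class are hypothetical, invented for this example; only the parser calls are standard.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical application message; the vocabulary is illustrative only.
MESSAGE = (
    "<message type='request'>"
    "<field name='user'>jc</field>"
    "<field name='action'>fetch</field>"
    "</message>"
)

# DOM: build the whole tree in memory, then navigate it.
doc = minidom.parseString(MESSAGE)
dom_fields = {
    element.getAttribute("name"): element.firstChild.data
    for element in doc.getElementsByTagName("field")
}


# SAX: receive events as the parser streams through the message.
class FieldHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the <field> being read, if any

    def startElement(self, name, attrs):
        if name == "field":
            self._current = attrs.get("name")
            self.fields[self._current] = ""

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._current is not None:
            self.fields[self._current] += content

    def endElement(self, name):
        if name == "field":
            self._current = None


handler = FieldHandler()
xml.sax.parseString(MESSAGE.encode("utf-8"), handler)
sax_fields = handler.fields
```

Neither route required a purpose-built parser for the message format. DOM suits small structures that must be navigated freely, while SAX avoids holding a large document in memory.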
With this objective in mind it is relatively easy to draw the line at which any proprietary applications or encodings should be excluded. |