The promised land of XML   Table of contents   Indexes   Executive briefings

 

The state of XML

Edd Dumbill
Managing Editor, XML.com
O'Reilly Network, 101 Morris Street
Sebastopol, California 95472, USA
Phone: +44 7020 936870 Fax: +44 8701 640230 email: edd@xml.com web site: xml.com
 Biography
 Edd Dumbill is Managing Editor of XML.com, a web site for XML developers published by O'Reilly Network. As well as writing about XML, Edd develops XML applications and web sites. He founded XMLhack.com, the XML developer news site, and was co-founder of Pharmalicensing.com, an intellectual property exchange for the pharmaceutical industry.
 Abstract
 This paper examines the current state of XML, from standards initiatives to commercial tool support. From its catalytic effect in electronic business to its long term influence on the future of the Web, XML is having a radical impact on the world of computing. Yet we are still at the beginning of a long and exciting journey.
 

Introduction

 Writing about "the state of XML" is an ambitious undertaking, and one in which I am almost doomed to fail. Happily, the reason for this is the widespread success and adoption of XML, such that you can find it almost anywhere you care to look right now. As well as the obvious data exchange and publishing applications, new uses of XML, from Internet messaging clients to GUI design programs, are popping up everywhere.
 I'm delighted to be able to say I'm also utterly overwhelmed by some of the new inventions and techniques in the XML world. At the XTech 2000 conference in San Jose earlier this year, the quality and quantity of innovation was outstanding. Each day of presentations was at the same time completely fascinating and completely exhausting!
 As editor of XML.com, I'm all too aware of the increasingly widespread nature of XML applications. Receiving press releases on subjects ranging over anything from healthcare to legal matters, I feel hard-pushed to be as versatile as XML is! Tim Bray is fond of the aphorism that "XML is the new ASCII". Soon XML will be everywhere. Bray foresees the doom of conferences like XML Europe, where people gather to talk about new ways of handling XML. Well, yes, and no. Later in this paper I'll observe that some areas of XML, once solved, do indeed become dull. It is my contention, however, that with XML lies the long-term future of the Web itself: that's a topic that won't run out of steam!
 In this paper I will attempt to provide a "long view" on XML, taking in where XML is now, and where it's going.
 

XML standards

 It is most appropriate to begin any review of XML with a look at standards. XML and web standards are inextricably linked. Interoperability of data - XML's core strength - requires that we all agree and implement certain things. Vendors have a responsibility to their users to implement standards, and users have a responsibility to demand such implementations.
 W3C 
 
Yet the standards bodies themselves also have a large responsibility to address their activities to the right area: to solve the correct problems, and to solve them in a way that vendors and programmers can readily adopt. Last year the W3C introduced the "Candidate Recommendation" phase into its standards development process. This means that there is a mandatory period of implementation time, during which feedback is solicited from developers implementing a standard. The move was long overdue, but it should ensure that fewer retrofits are needed, and it is some insurance against a standard withering and dying for want of tools.
 

XPath and XSLT

 XSLT has undeniably been the W3C's success story of the last year. Reinforcing the need for the Candidate Recommendation phase, XSLT's success rides partly on the back of several quality XSLT processor implementations from James Clark, Michael Kay and Lotus. These garnered both grassroots support for XSLT, and feedback as the standard progressed. It is interesting to note that XSLT processors were developed more or less in parallel with the XSLT Working Draft. Whether the standard would have progressed in the same fashion had implementation been done when the specification was less malleable, as with the Candidate Recommendation phase, is uncertain.
 One must also applaud Microsoft for their support of XSLT. Although it is widely regarded as unfortunate that IE5.0's XSLT implementation is somewhat non-standard, by providing a tool for getting instant utility out of XSLT, Microsoft made a definite contribution to the adoption of the standard.
 As a technology XSLT has shown itself useful for far more than just straight transformations. Two inventions in particular have caught my attention. One of these is Rick Jelliffe's Schematron (see http://www.ascc.net/xml/resource/schematron/schematron.html; also Simon St.Laurent's interview with Rick Jelliffe at http://xmlhack.com/read.php?item=121). The Schematron is an XSLT-based tool for validating and constraining XML, and is an alternative to using DTDs or XML Schemas. In Jelliffe's words, "Schematron rejects the idea that the result of validation is a binary valid/invalid... Schematron puts natural language descriptions on an equal footing to machine-usable expressions. Diagnosis is just as important as prescription". Utilizing the power of XPath, Schematron allows users to write constraints like "if there are 3 'foo' elements, there must be at least 2 'bar' elements". It allows usage recommendations to be built into the schema, rather than simple valid/invalid constraints.
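 The flavour of such a rule can be sketched in a few lines of Python. This is not Schematron itself, which is expressed in XML and compiled to XSLT; it is only a rough illustration of pairing a machine-checkable test with a natural-language message. The sample document, the rule list format and the function names are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names follow the
# 'foo'/'bar' constraint quoted in the text.
SAMPLE = "<doc><foo/><foo/><foo/><bar/></doc>"


def foo_bar_rule(root):
    """Pass unless there are exactly 3 'foo' elements and fewer than 2 'bar'."""
    foos = len(root.findall(".//foo"))
    bars = len(root.findall(".//bar"))
    return not (foos == 3 and bars < 2)


# Each rule pairs a test with a human-readable diagnosis, so the result of
# checking is a list of messages rather than a bare valid/invalid flag.
FOO_BAR_MESSAGE = (
    "A document with 3 'foo' elements should have at least 2 'bar' elements."
)
RULES = [(foo_bar_rule, FOO_BAR_MESSAGE)]


def check(xml_text):
    """Return the messages of every rule the document fails."""
    root = ET.fromstring(xml_text)
    return [message for test, message in RULES if not test(root)]


for report in check(SAMPLE):
    print(report)
```

 A document that satisfies every rule yields an empty list, and further rules can be appended to the list without touching the checking function.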
 The second innovative use of XSLT that caught my eye was a project at Sun, presented at XTech 2000 earlier this year by Jacek Ambroziak. It uses XSLT to drive the indexing process for documentation. The stylesheets perform such functions as selecting elements to index or ignore, assigning tokenizers to process text content of elements and computing metadata to be stored in the index.
 

XML Schemas

 The road forward has been less than easy for the XML Schemas specification. Its completion has been eagerly awaited by many XML developers. Accompanying the clamour was the concern that those who had proceeded with their own schema technologies (Microsoft, Commerce One, etc.) needed to adopt the new W3C XML Schemas. Unfortunately, the need to please everyone, and the sheer complexity of the task, has delayed the delivery of the specification. The spec itself has not been without its detractors, having been described by some as "monstrously complex", and criticized by others as omitting required features. Whether the schema Working Group has effectively reached the "80/20" point with the specification, as they claim, will doubtless be discovered during the Candidate Recommendation phase.
 More than any other XML technology so far, XML Schemas will depend on the volume of available tool support for their adoption. Authoring the various constraints in a schema is not a trivial task. Other schema initiatives, such as the above-mentioned Schematron, and Murata Makoto's "RELAX" schema language, may well gain ground with those who feel no need to utilize (or memorize) the full depth of XML Schemas. Nevertheless, the XML Schema specification is an important one for establishing the "contracts" used in machine-to-machine XML communication: it delivers the necessary tools to those using strongly typed languages and wishing to use XML for data exchange.
 

SVG and XHTML

 As the Web has grown older, the rate of progress in user-interface and presentation technology has gradually slowed further. One can attribute this to various reasons: the increasing spread of the Web to non-technical folk who are reluctant to upgrade browsers, and the cooling of the "browser wars" as Internet Explorer gained dominance. Things reached a point where people wanted to concentrate on selling things over the Web rather than furthering the browser technology itself. Happily, things are set to hot up again on that front, with the advent of the Mozilla browser.
 Two new technologies this year are set to change the face of web browsing. The first of these is SVG, the Scalable Vector Graphics standard. Widely acclaimed as one of the W3C's success stories, SVG already has multiple implementations in the form of browser plug-ins. Its integration with the DOM and JavaScript means that it is in a position to radically improve the toolset available for web designers to convey information. One note of warning: what will inevitably happen is that people will script user interface elements (e.g. buttons, maybe even windows) with SVG to make up for the continuingly woeful lack of UI elements in HTML. Such a move would be unfortunate, and to the long-term disadvantage of both the user and developer. There is a W3C initiative underway, called XForms, to enhance the UI facilities of browsers, but this looks like a long way off. Perhaps the best bet for now is XUL, the XML-based user interface language built into the Mozilla browser.
 XHTML, the reformulation of HTML 4 into XML, paves the way for more XML on the web. While not an especially large step forward from the user's point of view, bringing XHTML support into Web browsers sets the scene for the transfer of web markup from HTML to XML. The formulation in XML also enables XML document authors to import XHTML semantics by using namespaces, rather than continually reinventing the <p> tag, for instance.
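 As a rough sketch of what importing XHTML semantics by namespace looks like in practice, the Python fragment below reads a document in a made-up "report" vocabulary that borrows the XHTML paragraph element. The report vocabulary and its namespace URI are assumptions for illustration; only the XHTML namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical 'report' vocabulary that reuses XHTML's <p> via a namespace
# prefix instead of defining its own paragraph element.
DOC = """<report xmlns="http://example.org/report"
        xmlns:h="http://www.w3.org/1999/xhtml">
  <title>Quarterly summary</title>
  <body>
    <h:p>Sales rose in the second quarter.</h:p>
    <h:p>Costs were flat.</h:p>
  </body>
</report>"""

root = ET.fromstring(DOC)

# Any XHTML-aware tool can pick out the paragraphs by namespace-qualified
# name, without knowing anything about the surrounding vocabulary.
paragraphs = [p.text for p in root.iter("{%s}p" % XHTML_NS)]

for text in paragraphs:
    print(text)
```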
 

XLink and RDF

 XLink is the final missing piece for using XML on the Web, supplying as it does the componentry to turn XML documents into hypertext. Within its relatively simple specification lies a whole host of questions, from the issues of implementation through to the intellectual property and legal ramifications. XLink could make Napster look like child's play.
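 To give a feel for how little markup a simple link needs, here is a minimal sketch in Python that collects simple XLinks from a document. The sample document and the function name are assumptions for illustration, and only the simple-link case is handled; extended links, which raise most of the harder questions, are not.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical document: any element can become a link by carrying
# xlink:type and xlink:href attributes.
DOC = """<notes xmlns:xlink="http://www.w3.org/1999/xlink">
  <ref xlink:type="simple" xlink:href="http://example.org/spec">the spec</ref>
  <para>No link here.</para>
  <ref xlink:type="simple" xlink:href="chapter2.xml">next chapter</ref>
</notes>"""


def simple_links(xml_text):
    """Return (href, link text) pairs for every simple XLink in the document."""
    root = ET.fromstring(xml_text)
    type_attr = "{%s}type" % XLINK_NS
    href_attr = "{%s}href" % XLINK_NS
    return [
        (el.get(href_attr), el.text)
        for el in root.iter()
        if el.get(type_attr) == "simple" and el.get(href_attr)
    ]


for href, text in simple_links(DOC):
    print(href, text)
```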
 Within RDF, frequently maligned and misunderstood, lurk some of the most exciting possibilities for the Web. Ubiquitous machine-readable metadata all over the web presents great possibilities. There is not room here to present a detailed vision for RDF. I will say, however, that for RDF as well as XLink tool support is probably the critical factor in its long-term success. Most likely that tool support needs to come from the web browser, or client software that is in as regular use (e.g. "Outlook" or its equivalents). RDF is one of those technologies that sets the mind racing with ideas of exciting implementations. However, for it to become reality, somebody needs to write the "killer" application for it.
 

OASIS and vertical consortia

 OASIS's flagship effort is undoubtedly ebXML. With ebXML, OASIS is working with the United Nations to produce standard vocabularies and mechanisms for the establishment and conduct of electronic commerce with XML. Involvement here is from the big vendors such as IBM and Sun. Microsoft has pledged support for ebXML, but meanwhile continues to implement its own BizTalk technologies. One suspects that the constituency for BizTalk is likely to be larger than that for ebXML and, at the very least, tomorrow's eBusiness servers must be compatible with both initiatives.
 This year will be a testing time for OASIS in its relationship with the grassroots XML community. Having introduced personal memberships, it provides a forum for support for more grassroots initiatives, as well as those that are outside of the scope of the W3C. OASIS's remit is as a "standards organization", that is, they will aid standardization of existing technology, rather than as an institution like the W3C, which also invents the technology they standardize. It is by no means certain at the time of writing that OASIS has won the confidence of the grassroots community: suggestions that OASIS should be the guardian for SAX met a decidedly mixed response earlier in the year. As far as OASIS and the XML community are concerned, one might ask whether there is any appeal for the small developer in unpaid positions on committees. Despite this, Jon Bosak, the "Father of XML", and several other respected XML developers, believe strongly in OASIS participation, a significant enough reason for most to consider OASIS. The contribution of OASIS in hosting the XML-DEV mailing list shows their commitment to the community.
 The number of vertical consortia focusing on implementing XML in specific markets has exploded over the last year. Yet it is by no means clear that members of the same vertical industry are even aware of these efforts, despite several notable successes. Just because standards work is going on in a particular area, it does not mean that it will be implemented. Many companies will just plough straight ahead with their own requirements. For a lot of companies, where 100% integration isn't business critical, this is probably the most pragmatic route to take. Committees and standardization can take many months, frequently years, and business cannot wait that long. The beauty of XML is that it allows companies to do that, and to translate their data to interchange standards at a later date. The core constraint is that a company must ensure that it still retains all the information items which might be needed at a later date. That problem is nothing new to XML, however.
 XML has a tendency to place a magnifying glass on whatever you're doing. Bringing your data out into a readily readable format that encourages structure will highlight the weaknesses of your information infrastructure. Perhaps the biggest challenge of XML usage inside an organization is the focus it brings to the way you gather, structure and store your information. The problems there need solving before you even touch XML.
 

General comments on standards

 XML is still too young for us to draw general conclusions about which standards are effective and which aren't. It has been said often that XML itself came in "Fast and low and under the radar", rather than being conceived by committee. Groups working on XML technologies through the W3C work via committees, and probably face more procedural challenges than the original XML 1.0 Working Group. Happily, we've not seen any XML technology go down in flames yet (or, more likely, die quietly) but it is not inconceivable that some future recommendations may go by the wayside.
 Currently, the W3C's problem seems to be the reverse. There is demand for standardization on subjects such as XML Protocols and XML Packaging that is not being met. It was remarked earlier in the year that although many businesses are embracing and building upon XML, the core number of developers at the heart of XML doesn't seem to have increased. There are some companies that need to pay back to XML by contributing resource towards W3C activities.
 One factor having a major bearing on the success of standards is tool support. The most finely honed and thoroughly agreed business interchange vocabulary doesn't mean a thing if there's no software to use it with.
 

XML tools

 The last year has seen a lot of growth in commercial XML support. It is an interesting time, as we're only just starting to figure out what we want to do with XML. One relatively stable product category is that of SGML repositories and content management systems that have adapted to XML. Perhaps the most interesting change in the industry so far is that effected by eXcelon Corp, formerly Object Design, who have migrated their whole business from object databases to XML storage.
 However, as most of us still haven't figured out what kind of XML-based products we require, many so-called "XML products" are simply relaunches of existing products with simple XML compatibility. While this is good news, it's not a sign of much progress: import/export facilities are relatively simple features. XML has an accompanying market hype which causes companies to get their XML product and press releases out fast, despite the fact that often, to quote Elliotte Rusty Harold, "there's no there there"! I'm heartily sick of 95% of the press releases I receive, containing unremarkable news about a product that they don't even define to the point where I can tell what it does.
 XML is delivering today for the marketers. We are at the beginning of a journey in terms of seeing commercial XML products deliver value for systems developers. There are many wrinkles to iron out, and many production situations to test. Interoperability between tools, despite being a key promise of XML, is not yet a reality. At the moment, even though you use XML, you still have to go for a "platform" decision.
 

XML databases and application servers

 Support for XML in relational databases is advancing in leaps and bounds, with Oracle in particular making an excellent contribution in this area.
 There is also an emerging class of "XML servers". In the general case, I am unsure as to whether this kind of product is a good idea or not. Clearly, if you are storing documents, then an XML-aware repository makes sense. I detect, however, a missionary enthusiasm in some to use XML everywhere. What's wrong with that? Well, I applaud XML everywhere outside your application, but not necessarily everywhere inside your application. It may make a lot more sense to keep your lean, business-specific, data structures and databases, and use XML purely for externalization and interchange, than to completely replace your structures with DOMs and your databases with XML servers. Several developers have complained to me of this kind of XML-misuse, and have had to forcibly remove XML from the internals of some projects.
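 The "XML at the edges" approach might look something like the following Python sketch, in which the application works with a plain, business-specific structure and XML appears only when data crosses the application boundary. The Order type, its fields and the element names are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Order:
    """Lean internal structure: no DOM nodes, just the data the business needs."""
    order_id: str
    sku: str
    quantity: int


def to_xml(order):
    """Externalize an Order as XML for interchange."""
    el = ET.Element("order", {"id": order.order_id})
    ET.SubElement(el, "sku").text = order.sku
    ET.SubElement(el, "quantity").text = str(order.quantity)
    return ET.tostring(el, encoding="unicode")


def from_xml(xml_text):
    """Rebuild the internal structure from received XML."""
    el = ET.fromstring(xml_text)
    return Order(el.get("id"), el.findtext("sku"), int(el.findtext("quantity")))


order = Order("A1", "X-42", 3)
wire_format = to_xml(order)
print(wire_format)
```

 Everything between from_xml and to_xml operates on ordinary typed values, so the internals can change without affecting the interchange format, and vice versa.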
 A sure win for XML though is the growing breed of Enterprise Integration applications, whose main purpose in life is to act as a glorified import/export filter, bringing disparate data together. Both in the Intranet and e-business exchange situation, this is an application of XML that delivers value today.
 The farthest I will commit myself on commercial XML support, and XML servers in particular, would be to say that the game is hardly played out yet. There is plenty of time to see what will sink and what will swim. My advice is to carefully consider changes to infrastructure and big platform decisions. XML-awareness alone, as we know, is no silver bullet.
 

XML browsers

 By the end of this year, we can look forward to widely-available cross-platform XML browsing support. This is an important step forward to a web full of XML.
 The open-source Mozilla browser, although distinctly laggardly in its development, is finally coming to fruition. It is notable for its excellent support of W3C standards: so much so in fact, that you can use W3C Recommendations as developer documentation.
 Microsoft's Internet Explorer, although making huge leaps forward in XML support last year, is currently lagging a little in its support for W3C standards. The exception here is IE5 for the Macintosh, whose attention to detail in implementing CSS is commendable. It is clear, though, that Microsoft's core commitment is to its customers. Clearly that isn't the same as commitment to an open platform for web authoring. For this reason, it is vital that diversity in web browsers continues.
 Opera 4.0 now contains XML and CSS support, a welcome move that means all mainstream web browsers now support direct display of XML.
 For all the browsers though, display isn't enough: implementation of the XML DOM is what counts if we are to get true browser-based applications. On this score Mozilla alone currently delivers.
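 The difference between display and DOM access can be illustrated with a small sketch. Python's minidom is used here only as a stand-in for a browser's scripting environment; the DOM calls themselves (getElementsByTagName, getAttribute, createElement, createTextNode, appendChild) are the W3C ones. The cart document is a made-up example.

```python
from xml.dom import minidom

# Hypothetical document standing in for XML loaded into a browser.
doc = minidom.parseString("<cart><item price='4'/><item price='6'/></cart>")

# Read from the tree: an application can compute over the data,
# not merely render it.
items = doc.getElementsByTagName("item")
total = sum(int(item.getAttribute("price")) for item in items)

# Write to the tree: add a new element holding the computed result.
total_el = doc.createElement("total")
total_el.appendChild(doc.createTextNode(str(total)))
doc.documentElement.appendChild(total_el)

print(doc.documentElement.toxml())
```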
 There still remain many open questions about interoperability: whether it's possible to create cross-platform web applications using W3C technology that will run identically on every web browser. From XML's perspective, this support is a necessary precondition to filling the Web with XML - however exciting the prospect of a "Semantic Web" is, it won't gain broad support until tools are available for the widespread user creation and manipulation of XML.
 

XML community

 One of XML's crowning glories is its community. From the beginning, XML has been supported by a lot of generous spirited, intelligent developers. In the same way that today's Internet companies stand on the shoulders of free software giants, who wrote programs such as "BIND" and "sendmail", companies making money out of XML benefit from the great work achieved by such people as James Clark and Tim Bray.
 Today the XML community is growing and diversifying. We're seeing more work done in application and language specific areas. In particular, the Python and Perl XML communities are strong. New ideas about XML processing are coming out of these groups and into the general arena of interest.
 

Conclusions

 One of the most remarkable things about XML has been its social effects. It has sparked new co-operation within industries, and changes in the way information is distributed on the Web. Those effects are worthy of a paper in themselves.
 Although many feel they've been working on XML for a long time, practically speaking we're still at the beginning of the story. Tim Berners-Lee, in his book "Weaving the Web", suggested that Amazon, AOL and others are merely the background for the Web: the Web will move on, fuelled by these "incidental economies". In a similar way it is my view that the B2B world and its associated alphabet soup forms the next generation background for the Web and XML, but the technology will move on. Once you can exchange business info with XML (and it isn't that hard), then XML ceases to be interesting in that sphere: much in line with Tim Bray's XML/ASCII comparison. The ultimate future is rooted more deeply in technologies such as XLink and RDF.
 We are only just beginning to imagine the possibilities of a Web full of XML. Today's business models won't work for that future. There's got to be a lot of invention and hard work yet.
