XML: What HTML Wanted to Be! |
Norma Haakonstad
National Accounts Manager, Arbortext, Inc.
Biographical notice: As National Accounts Manager for ArborText, Ms. Norma Haakonstad works closely with many of the nation's largest publishing, automotive, heavy equipment, telecommunications, and pharmaceutical companies implementing enterprise SGML/XML applications. Before she became National Accounts Manager, Ms. Haakonstad served as ArborText's Midwest Regional Sales Manager for five years. Prior to joining ArborText, Ms. Haakonstad was one of the owners of Integrated Engineering Software Inc., where she ran marketing and sales operations. Before that, she was Sales Manager for Electrocon International, Inc., a company that develops software for electric utility applications. Ms. Haakonstad holds a degree in Business Management. |
Introduction |
| There's no doubt that there's huge hype behind the XML frenzy, but there's also a lot of substance. As the number of vendors pledging support for XML has climbed from a few to a few dozen to a hundred or more, it's clear that XML is rocketing into the mainstream. |
| The reason that XML is important -- in fact, the reason that it's crucial for you to gain an early grasp of XML's implications for your organization -- is that it's now clear that XML will be the next-generation language of the Web. With HTML, we saw explosive growth of the Web even though its primary business use was merely advertising and public relations. But with XML, businesses can finally realize the full potential of the Web by putting it to work with high-value-added information in enterprise-critical applications. These applications will bring meaningful competitive differentiation and high ROI to those who can move quickly to exploit them. |
| The purpose of this paper is to let you know the current state of XML, its exciting future, what is realistic to expect from it, and why it's important to you. |
| Most of you know that ArborText has been involved with standards since day one. As a leading vendor of SGML-based software for authoring, managing, and delivering structured documents, we have a lot of experience in the strengths and weaknesses of SGML. We are contributing our experience to the entire set of XML-related standards, all of which are under the auspices of the World Wide Web Consortium, or W3C. Eleven companies including ArborText were part of the original XML Working Group that formed in late 1996. As a result of the efforts of the working group, the standard was formally adopted in February, 1998. |
| The Extensible Style Language, or XSL, is the XML way to attach style to XML content. ArborText, Inso, and Microsoft jointly proposed an XSL specification to the W3C to kick off the XSL effort, and Paul Grosso of ArborText is our representative on the XSL Working Group. But XSL is about more than just describing how an element is formatted on the screen or in print -- it's about attaching any kind of behavior to an element, not just formatting. |
| XLL, or Extensible Link Language, will bring additional capabilities to today's HTML linking. The XLL specification goes beyond traditional linking to allow you to attach links to documents even when you don't control (and therefore can't change) those documents. |
| The DOM Working Group is developing the Document Object Model, a standardized API for accessing and manipulating HTML and XML elements. By standardizing the API, you'll be able to write software that's reusable across a variety of different tools. |
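| To make the idea of a standardized API concrete, here is a minimal sketch that uses the DOM implementation shipped in Python's standard library (`xml.dom.minidom`). The sample document, the element names, and the `add_warning` helper are invented for illustration; only the DOM calls themselves (`getElementsByTagName`, `createElement`, `insertBefore`, and so on) come from the DOM interface. |

```python
from xml.dom import minidom

# Hypothetical sample document; the element names are invented for illustration.
SOURCE = (
    "<manual>"
    "<step id='s1'>Remove the cover.</step>"
    "<step id='s2'>Replace the filter.</step>"
    "</manual>"
)


def add_warning(xml_text, step_id, message):
    """Insert a <warning> element immediately before the step with the given id."""
    doc = minidom.parseString(xml_text)
    for step in doc.getElementsByTagName("step"):
        if step.getAttribute("id") == step_id:
            warning = doc.createElement("warning")
            warning.appendChild(doc.createTextNode(message))
            step.parentNode.insertBefore(warning, step)
    # Serialize only the root element, without the XML declaration.
    return doc.documentElement.toxml()


result = add_warning(SOURCE, "s2", "Engine must be cool.")
print(result)
```

| Because the same method names are defined by the DOM rather than by one vendor's tool, code written in this style should carry over to other DOM implementations with little change. |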
| The last on the list is XML-Data, currently at the proposal stage. XML-Data was jointly developed by ArborText, DataChannel, Inso, and Microsoft and has been acknowledged by the W3C. The purpose of XML-Data is to create a "schema" to specify not only the validity and relationships of XML elements, but also the content of those elements. What does this mean for us? It allows us to go beyond what the DTD will provide. It allows us to validate the data without supplementary and proprietary software routines. |
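| The difference between validating structure and validating content can be sketched in a few lines of Python. A DTD can require that an order contains a quantity, but it cannot require that the quantity is a whole number. The rule table below is a hypothetical stand-in for a schema; it is not XML-Data syntax, and the element names are assumptions made for the example. |

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical content rules: element name -> converter that raises
# ValueError when the element's text is not acceptable content.
CONTENT_RULES = {
    "quantity": int,
    "price": float,
    "ship-date": date.fromisoformat,
}


def content_errors(xml_text):
    """Return a list of messages for elements whose content breaks a rule."""
    errors = []
    root = ET.fromstring(xml_text)
    for element in root.iter():
        check = CONTENT_RULES.get(element.tag)
        if check is None:
            continue  # no content rule for this element
        try:
            check((element.text or "").strip())
        except ValueError:
            errors.append(f"{element.tag}: invalid content {element.text!r}")
    return errors


# Structurally fine, but month 13 is not a real date.
ORDER = (
    "<order><quantity>12</quantity><price>4.50</price>"
    "<ship-date>1998-13-01</ship-date></order>"
)
print(content_errors(ORDER))
```

| A schema language would let you state such rules declaratively, so that every tool applies the same checks without custom routines like this one. |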
What is XML? |
| XML stands for "Extensible Markup Language" -- it's extensible because it is not a fixed set of elements like HTML. XML was originally developed by SGML people to enable delivery of SGML documents over the Web. |
| But during the process of defining XML, the vision of that group expanded to include other ways to apply XML, including basing general data formats on XML, and using XML as the data encoding scheme for metadata and transaction data. |
| Momentum behind XML has grown to a frenzy, so much so that it's now certain that XML is going to be broadly supported. Companies already supporting XML in their products include ArborText, Chrystal, DataChannel, Grif, Inso, Microsoft, and WebMethods. Far more companies have pledged support in the future, including Adobe, IBM, and Netscape. And dozens of companies are cooperating to develop XML-based standards for metadata and transactions, not only IT companies but also banks and credit card companies. |
| We have been thinking of XML as SGML minus minus instead of HTML plus plus. Why? Because XML offers 95% of the capabilities of SGML while it's vastly more powerful than HTML. |
| But now we're starting to see that XML is on the way to becoming SGML plus plus -- it's got the power of SGML and the simplifications needed to address a mainstream market. Also, emerging standards such as XSL, XLL, and XML-Data promise to deliver even more power and functionality than we would ever have seen from SGML. And we can begin to work with data beyond traditional document applications. |
Uses of XML |
| When we look at XML to decide where it will be used and who will be using it, we'll be looking beyond comparing it to other document content formats. We'll be comparing XML to other data formats, other metadata formats, and other transaction formats as well. |
| Consider the available formats for document content. Existing formats include SGML, the international standard, HTML, the way almost all Web documents are formatted, and a variety of different proprietary word processing and desktop publishing formats. |
| Expanding our view to data formats, we see that there are almost as many data formats as there are applications for those formats. There's the result of a database query, the contents of a configuration or initialization file, a graphics image, a few seconds of sound, a video, and on and on. |
| Another area to consider is metadata. We think of metadata as information about documents as opposed to the contents of the documents themselves. For example, metadata might include author, date of original creation, date of last revision, permissions to read and change, and so on. With XML, there will be a standardized method of adding metadata to documents. |
| Finally, we have broadened the discussion further to include transactions such as electronic transfers of funds, purchase orders, inventory checks, and other forms of electronic transfers. Today, in the world of EDI, there are a huge number of overlapping and incompatible formats. |
XML vs SGML |
| While XML is substantially based on SGML, it improves on SGML in several crucial ways: |
DTDs in XML |
| DTD, which stands for "Document Type Definition," establishes all the rules and relationships for a particular document. For those of you who don't know SGML but are familiar with HTML, you may find it helpful to learn that HTML is defined by a DTD. The DTD for HTML defines which elements you can use on a Web page and how you can use them. |
| DTDs are extremely valuable because the consistency they enforce on the creation side supports automatic processing on the assembly and delivery side. Ever wonder why you couldn't just add your own tags to HTML? Because if you did, the HTML applications downstream from you wouldn't know how to handle them. |
| XML does not require a DTD for processing. That means, for example, that a Web browser can process and display XML data without requiring its DTD as well. The primary benefit of eliminating the DTD is to simplify the design of processing applications because they don't have to be capable of interpreting a DTD. |
| Eliminating the DTD is possible thanks to minor data format changes that provide embedded cues within XML data that SGML only provides through a DTD. However, eliminating the requirement to send a DTD along with its associated data does not mean that "anything goes" when creating the data. To obtain all of the benefits you traditionally associate with SGML -- reuse, interchange, and automation -- you'll still want to use a DTD when authoring XML in order to ensure the absolute data consistency you need to achieve those benefits. |
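| Two of the embedded cues are easy to show: XML requires every element to be closed explicitly, and an empty element is written with a trailing slash. With those cues, a parser can process data it has never seen a DTD for. The sketch below uses the parser in Python's standard library; the element names are invented, and `is_well_formed` is a helper written for this example. |

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML without consulting any DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented elements; <obsolete/> uses the XML empty-element syntax.
WITH_CUES = "<part><name>Oil filter</name><obsolete/></part>"

# SGML-style empty element: without a DTD the parser cannot know that
# <obsolete> has no end tag, so the document is rejected.
WITHOUT_CUES = "<part><name>Oil filter</name><obsolete></part>"

print(is_well_formed(WITH_CUES), is_well_formed(WITHOUT_CUES))
```

| Note that this check only establishes that the data is well-formed. It says nothing about whether `part` is allowed to contain `obsolete`, which is exactly the consistency a DTD enforces at authoring time. |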
| There are a couple of good reasons that you'll want the flexibility to create new elements on the fly: rapid prototyping and "personal" applications of XML. That's why Arbortext intends to support this use of XML in upcoming releases of its software. |
| The effort to make XML simpler than SGML focused on the capabilities of the DTD. The result of that effort was to omit capabilities from SGML DTDs. The list of capabilities dropped from SGML is quite long, but we've developed workarounds for most of these. However, there are a couple of capabilities that were omitted from XML that you might notice, especially if you are working in one of the industries that use standard interchange DTDs such as those developed for the aerospace and automotive industries. There are two capabilities I'd like to touch on: |
| We see two potential solutions to the missing features. First, it's possible that later revisions of XML will address these issues. You all know that lots of changes occur as both software and standards are revised through 1.1, 1.2, 2.0, and so on. We know that XML 1.1 is coming -- we just don't know when it's coming or exactly what it will support. |
| Another potential solution, and the one that we think is far more likely, is XML-Data. XML-Data was designed by ArborText, DataChannel, Inso, and Microsoft to replace and improve upon DTDs. These four companies jointly submitted their design to the W3C as a proposal for a formal specification. The W3C has not yet formally launched any activity related to XML-Data. Even so, there's a lot of interest in it. |
| XML-Data prescribes the format of "schemas" for XML documents and data. Schemas are commonly used in database applications to specify the valid content of various fields and to indicate the relationships among fields and records. An XML-Data schema describes the rules for creating valid XML data for a specific application. XML-Data schemas include three key improvements over DTDs: |
XSL is more than style |
| XSL is about more than style; it's about defining any behavior of an element. In other words, XSL lets you define what you want to do with that element. Do you want to run a database query, or bring up a dialog in Internet Explorer? |
| XSL allows us to reorder text, suppress the display of text, and automatically generate calculated text. |
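| These three operations are easier to picture with a small example. The following Python sketch is not XSL; it simply performs by hand the kind of work a stylesheet would describe declaratively. The `procedure` document and its element names are invented for illustration. |

```python
import xml.etree.ElementTree as ET

# Invented source document: the title comes last, and an internal note
# sits between the steps.
SOURCE = (
    "<procedure>"
    "<step>Drain the oil.</step>"
    "<internal-note>Check wording with legal.</internal-note>"
    "<step>Replace the filter.</step>"
    "<title>Oil change</title>"
    "</procedure>"
)


def render(xml_text):
    """Return display lines: title first, numbered steps, notes omitted."""
    root = ET.fromstring(xml_text)
    # Reorder: the title is emitted first although it is last in the source.
    lines = [root.findtext("title", default="")]
    # Generate calculated text: step numbers do not exist in the source.
    for number, step in enumerate(root.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    # Suppress: <internal-note> is never written to the output.
    return lines


for line in render(SOURCE):
    print(line)
```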
| While XSL is fully compatible with CSS, the Cascading Style Sheet format for HTML documents supported by the Microsoft and Netscape browsers, it has many improvements over CSS. XSL has the capability to examine all ancestors, descendants, and siblings in order to establish context. CSS is limited to setting style based only on an element's immediate parent. |
| XSL's primary additional capabilities include: |
| We expect Version 1.0 of the specification to be released this year, although the W3C has not published a formal timetable. |
| XSL also forms the basis of a transformation language, so we expect to see applications emerge that convert information from one DTD to another. |
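| At its simplest, converting from one DTD to another means mapping the element names of the source onto those of the target. The Python sketch below shows only that simplest case and is not XSL; the mapping table and the element names are hypothetical. |

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from element names in a source DTD to the
# corresponding names in a target DTD.
TAG_MAP = {"para": "p", "emph": "i", "title": "h1"}


def convert(xml_text, tag_map):
    """Rename every element that appears in tag_map; leave the rest as-is."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = tag_map.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


converted = convert(
    "<doc><title>Brakes</title><para>Check <emph>both</emph> pads.</para></doc>",
    TAG_MAP,
)
print(converted)
```

| Real conversions usually also need to restructure content, which is where a transformation language with access to an element's context becomes valuable. |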
XLL - Linking |
| The XML linking specification, XLL, is being designed to improve on HTML's existing URL linking while remaining compatible. XLL provides additional functionality that will make the Web easier to use and more functional. |
| The primary capabilities that XLL will provide are bidirectional, conditional, and indirect linking. |
| External links allow you to create links to and from a read-only document. Today, of course, you must be able to change a document in order to add links to it. |
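| The idea behind external links is that the links live in a separate file and point into the target document, so the target never has to change. The Python sketch below illustrates that idea only. The `links` and `link` elements and their `from` and `to` attributes are hypothetical names chosen for the example; they are not XLL syntax, and the URL is a placeholder. |

```python
import xml.etree.ElementTree as ET

# A document we can read but not modify.
READ_ONLY_DOC = (
    "<manual>"
    "<section id='brakes'>Brake service</section>"
    "<section id='oil'>Oil change</section>"
    "</manual>"
)

# A separate link file that we do control. Names are hypothetical.
LINK_FILE = (
    "<links>"
    "<link from='brakes' to='http://example.com/recall-notice'/>"
    "<link from='wipers' to='http://example.com/wipers'/>"
    "</links>"
)


def links_for(doc_text, links_text):
    """Map each id found in the document to the link targets attached to it."""
    doc = ET.fromstring(doc_text)
    known_ids = {e.get("id") for e in doc.iter() if e.get("id")}
    attached = {}
    for link in ET.fromstring(links_text).findall("link"):
        source = link.get("from")
        if source in known_ids:  # ignore links to ids the document lacks
            attached.setdefault(source, []).append(link.get("to"))
    return attached


print(links_for(READ_ONLY_DOC, LINK_FILE))
```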
XML, SGML, and HTML |
| The diagram in Figure 1 helps explain where HTML, XML, and SGML fit into the traditional world of documents. There are two continuums, one representing the complexity of the structure and the second representing the complexity of the data. The information production products you develop will fall somewhere within the four quadrants illustrated. |
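| [Figure 1: document types plotted on two axes, complexity of structure and complexity of data, with ellipses marking where HTML, XML, and SGML fit best.] |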
| A novel is an example of a publication that has a very simple structure. The body of a novel generally only contains chapters and paragraphs. A phone book also has simple data, but it is highly structured. You have one SECT that contains last name followed by first name or initial followed by address followed by the phone number of residential customers and another SECT that contains information on commercial customers, etc. Structure helps you find the information you are seeking more quickly. A newspaper contains complex data, but like a novel, has minimal formal structure. In the upper right corner -- representing a document that is highly structured and contains very complex data -- is airline documentation (ATA2100), automotive documentation (J2008) and computer hardware and software documentation (DocBook). These documents contain warnings, cautions, assembly procedures, disassembly procedures, bill of material listings, part numbers, and so on. Consistent structure and reliable content are key factors for ensuring usability. |
| Some people like to argue that "SGML is for everything" or "XML is for everything" or "HTML is for everything". We don't believe that's true. The ellipses represent the areas where we see these markup schemes fitting the best. |
Using XML beyond documents |
| XML can be used in some applications that don't involve documents. |
| First, consider the use of XML to exchange data between applications. Let's look at what Microsoft is planning. They have announced that Office 98 will use XML to store Office-specific data within HTML documents so that those HTML documents will "round trip" from Office to HTML back to Office. When today's version of Office saves a file as HTML, information is lost that cannot be regained when the file is loaded back into the Office application. But with XML elements preserving the Office-specific data, nothing will be lost on conversion to HTML. |
| Then there's the use of XML for metadata, which can provide a standard way of categorizing, locating, and indexing files regardless of their content. For example, you could attach XML metadata to a Word document without having to convert the entire document to XML. |
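| As a rough sketch of what such metadata could look like, the Python example below builds a small XML record that describes a file without touching the file itself. The `metadata` element, its child elements, the file name, and the sample values are all invented for illustration; they do not come from RDF or any other specification. |

```python
import xml.etree.ElementTree as ET


def metadata_record(file_name, author, created, revised):
    """Build an XML metadata record describing a file of any type."""
    record = ET.Element("metadata", {"for": file_name})
    for tag, value in (("author", author), ("created", created), ("revised", revised)):
        ET.SubElement(record, tag).text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical Word document and sample values.
record = metadata_record("proposal.doc", "J. Smith", "1998-03-01", "1998-04-15")
print(record)
```

| An indexing tool could then search such records by author or revision date without needing to understand the Word file format at all. |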
| Several efforts are either finished or well under way to establish XML-based metadata formats. For example, RDF is a W3C proposal to establish a standardized method of applying metadata to many types of content, which can be stored as any file type. RDF is expected to support applications such as indexing Internet or intranet sites, building site maps for navigation, and carrying fields for content ratings and push channel definitions. |
| Channel Definition Format was designed specifically for push applications and has already been deployed. ICE is a standard set of XML elements that contain metadata information to support secure and reliable exchanges of content and transactions among independent websites. The objective behind ICE is to enable several companies to get together on the Web and create superstores of products or content. |
| XML provides the crucial enabling technology to support the long-predicted explosion of Web-based electronic commerce. Examples include Open Financial Exchange (OFX) from Microsoft, which was designed for consumer financial transactions on the Web, and Open Trading Protocol (OTP), for purchases and sales over the Web. The OTP Consortium is quite large and includes leading companies such as AT&T, Hewlett-Packard, MasterCard, Hitachi, Royal Bank of Canada, CyberCash, Fujitsu, IBM, Netscape, Nokia, Oracle, Sun Microsystems and Wells Fargo. |
| OSD ("Open Software Description") describes the delivery of software applications over the Internet. It allows Web developers to create application "channels" by defining versions, underlying structure, dependencies, relationships to other components, etc. |
| CommerceNet is a non-profit consortium of companies seeking to use the Web for a broad array of business-to-business e-commerce applications. CNgroup is the R&D affiliate of CommerceNet that will also provide expertise in the use of XML for related technologies such as electronic catalogs. |
Is XML Easier? |
| One of the questions we're hearing a lot these days is "Isn't XML easier than SGML? Because if it is, why wouldn't I use XML and forget about SGML?" Let's review both of those questions. |
| XML is certainly easier than SGML to deliver over the Web. Since Microsoft already supports XML and Netscape has promised support later this year, you already have tools that can almost effortlessly deliver rich structure and content to the desktop. Delivering SGML data over the Web today relies on tools that are expensive and outside the mainstream. |
| We expect that XSL, the stylesheet standard for XML, will make it possible to exchange stylesheets between applications. Eventually, you'll be able to build one stylesheet and use it across multiple tools for multiple deliveries. |
| Maybe XML is easier than SGML for building tools. Certainly, some tools that support XML will be easier to build. If you're a software developer and you want to use XML as a data interchange format, you'll be able to find a freely available parser that will examine an XML data stream. Now, since you can get a freely available SGML parser just as easily as an XML parser, you may wonder why it really matters. And the answer is that for any application where a freeware parser is sufficient, the only real difference is code size and speed. An SGML parser is a lot bigger and a little slower. But many application developers, especially those who are working on non-document applications, prefer to write their own parser. And that's way too big a job with SGML. |
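| To give a feel for how little code is needed when a freely available parser is sufficient, the Python sketch below uses the expat parser bundled with Python's standard library to walk an XML data stream and report each element as it starts. The sample order data and the `element_names` helper are invented for this example. |

```python
import xml.parsers.expat


def element_names(xml_text):
    """Return element names in the order their start tags are encountered."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # The parser calls this handler once for every start tag it reads.
    parser.StartElementHandler = lambda name, attributes: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk of data
    return names


# Invented transaction data, not a document in the traditional sense.
STREAM = "<order><item sku='A1'/><item sku='B2'/></order>"
print(element_names(STREAM))
```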
| XML might be easier than SGML for "personal" use, where the rigors of analysis aren't required. For example, small websites might easily be developed and maintained in XML without ever creating a DTD. |
| But let's look at one way that XML is not easier than SGML: most of you are aiming to build a database of modular document components that you can easily reuse, interchange, and automate. For those kinds of applications, you'll still need to perform all of the up-front requirements analysis and rigidly enforce rules to ensure an absolutely consistent data format. In other words, if you decide that you want not just well-formed but also "well-structured" (or "valid") XML, you must still make the investment to achieve it. |
XML documents are data |
| One of the most fascinating opportunities for using XML on the Web is hybrid applications that cross the boundaries of documents and data. These applications will bring static Web documents to life in a way that existing technologies do not. |
| For example, consider being able to develop an interactive parts catalog that lets users select parts from a picture, sort the parts based on type or location, check the inventory levels of selected parts, and enter purchase orders for parts, all from the same interface. |
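| Because the catalog is XML, the same content that is displayed as a document can be queried as data. The Python sketch below shows two of the operations mentioned above, sorting parts and checking inventory, against a tiny catalog. The catalog structure, attribute names, and part numbers are invented for illustration. |

```python
import xml.etree.ElementTree as ET

# Invented catalog; in a real application this would be the same XML
# that drives the illustrated parts pages.
CATALOG = """<catalog>
  <part number="P-300" type="filter" location="engine" stock="0"/>
  <part number="P-100" type="belt" location="engine" stock="14"/>
  <part number="P-200" type="bulb" location="dash" stock="3"/>
</catalog>"""


def sorted_parts(xml_text, key):
    """Return part numbers sorted by the given attribute, e.g. type or location."""
    parts = ET.fromstring(xml_text).findall("part")
    return [p.get("number") for p in sorted(parts, key=lambda p: p.get(key))]


def in_stock(xml_text, number):
    """Return True if the part exists and its stock level is above zero."""
    for part in ET.fromstring(xml_text).iter("part"):
        if part.get("number") == number:
            return int(part.get("stock")) > 0
    return False


print(sorted_parts(CATALOG, "type"))
print(in_stock(CATALOG, "P-300"))
```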
| Another example is an electronic service manual that knows what step you're on, can automatically branch to different steps based on your answer to a question, allows you to enter questions or corrections about each step, and records the time you spent on each step. |
| An interesting side effect of these hybrid applications is that those responsible for developing interactive content will find themselves in the application development business and not just in the content development business. |
XML today |
| XML is clearly on the way to becoming the mainstream technology of the Web for a broad array of applications. |
| Today, XML itself is approved, and we're still waiting for the related standards, such as XSL and XLL, to reach that same level. Nonetheless, you can move forward with XML right now and transition later to the additional capabilities that the remaining standards will provide. In the meantime, tools are already available to help you take advantage of the enormous potential of XML today. |