
 

Most Frequently Asked Business Questions About XML

 AIS 
 El Andaloussi, Jeanne 
 France 
 Paris 
 
Jeanne  El Andaloussi
Director of Operations,  AIS 
 17 rue Remy Dumoncel
75014  Paris  (France)
Email: jela@ais.berger-levrault.fr

Biographical notice

Jeanne El Andaloussi is director of operations, training, and distribution at AIS S.A., an SGML/XML systems integrator. She has long experience in SGML training and corporate documentation standards, tools, and methodologies, and in recent years has conducted and overseen several European projects for general and specialized publication systems.

Ms El Andaloussi is a co-author of "Developing SGML DTDs: From Text to Model to Markup", published by Prentice Hall PTR. She has presented and published her DTD development methodology and the results of its use in a number of SGML and publishing industry forums.

 

Introduction

 With the sudden advent of XML in 1997 (many did not take it seriously in 1996) and the relative lack of non-technical information about it, a number of current or prospective users have found their plans thrown into doubt. They now have serious questions to ask of the "big names" in the SGML community, the software providers and integrators who promised stability, longevity, flexibility and above all peace of mind. This talk will provide pragmatic answers to questions that the presenter has collected from several encounters with users, integrators, and vendors.
 

The Questions

 We have collected these questions from discussions with customers and colleagues who are current or prospective users of SGML and who are curious, or even worried, about the arrival of XML on the market.
 1. What is XML?
 It is a recommendation from the World Wide Web Consortium, which was issued on the 10th of February 1998. It stands for the Extensible Markup Language. It is a language designed to deliver structured information over the Web.
 2. Is it a new version of HTML?
 It has been designed to replace HTML for deploying large and complex applications, not only on the Web, but also on intranets. HTML will keep on evolving but will always be presentational. XML will allow the distribution of semantic data over the Web and intranets.
 3. Is it a real standard?
 No. It is not an ISO standard, but since the World Wide Web Consortium and the leading software companies endorse it, it is a de facto standard. The XML Working Group intends for Version 1.0 of the XML recommendation to be stable for a long time, and intends for future versions to be upward compatible to the extent possible.
 Note that XSL, XLink, and XPointer have not reached the same level of standardization, so in business terms one should be cautious about contemplating using them in the short term.
 4. In what way is XML different from SGML?
 XML is essentially a subset of SGML. XML and SGML are strongly similar in practice, since the features that were dropped from SGML are the ones that were almost never used. Thus, it can be said that XML is simpler than SGML. For example:
 
  • When XML documents are delivered over the Web, they don't need to be accompanied by or validated against a DTD; they need only to meet standards for "well formedness."
  • XML DTDs have slightly less validation power than full SGML DTDs. For example, they can't have content model shortcuts where elements are required but in any order, and they can't specify "exceptions."
  • In SGML, special characters are typically generated by means of "system-specific" character entities. XML doesn't allow this kind of entity; instead, it makes use of the large Unicode character set.
  • In XML documents, empty elements and "processing instructions" have a slightly different syntax.
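 The well-formedness point above can be illustrated with a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser, which checks well-formedness only and never consults a DTD; the element names para and pagebreak are hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (no DTD is consulted)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# XML-style empty element (<pagebreak/>) and a numeric character
# reference into Unicode (&#233; is "e" with an acute accent).
good = "<para>Caf&#233; menu<pagebreak/></para>"

# SGML-style empty element without the closing slash: in XML the
# <pagebreak> element is never closed, so the document is rejected.
bad = "<para>Cafe menu<pagebreak></para>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

 The first document is accepted without any DTD, while the second fails only because the empty element does not use the XML empty-element syntax.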
     5. Why was it created in the first place?
     Because HTML was too limited to publish sophisticated semantic data on the web and because SGML was too complicated to achieve this task.
     HTML is too limited for a number of reasons. First, the document structure it expresses is too limited. For example, it offers six levels of section structure, but offers nothing above the "page" level (such as chapters). In addition, its content structure is largely presentation-oriented rather than semantically oriented. For example, it offers tagging for bold and italic information and allows the setting of font characteristics, but does not allow the tagging of "procedures" or "articles of law." This limits interchange, reuse, automation, and clever searching capabilities. Finally, unlike SGML, HTML cannot be extended by its users: a standards body must change the HTML standard before you can properly deploy any new tagging features.
     SGML is limited for other reasons. First, there are no mainstream, free SGML Web browsers available. Also, no standardized support for document presentation is available, despite the Formatting Output Specification Instance (FOSI) and Document Style Semantics and Specification Language (DSSSL) efforts, which have received little vendor support. Finally, the full SGML language with all its intricacies has appeared, from the Web community's perspective, to be too complex and too expensive to implement.
     So XML was created to give data the semantic tagging power of SGML and the delivery facility of the Web, in other words the best of both worlds in terms of DELIVERY.
     6. What is it useful for?
     To date, XML has been targeted not at document creation but at the electronic delivery of documents and data. As a result it is useful for:
     
  • Delivering on the Web large document sets that are rich in meaning
  • Allowing fast and easy interchange of structured non-textual data (such as EDI messages or E-commerce transactions) in a vendor-neutral way
  • Processing structured data without having to validate it
  • Publishing static legacy data with a minimum of effort and cost; there is no need to force it into an impossible DTD because XML well-formedness is enough
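     As a rough illustration of interchanging and processing non-textual data without validation, the following Python sketch reads a small order message and computes its total. The message layout (order, item, sku, quantity, unit-price) is entirely hypothetical and does not correspond to any real EDI or E-commerce vocabulary; the only assumption about tooling is the standard-library xml.etree.ElementTree parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical order message; no DTD accompanies it.
message = """<order id="A-1001">
  <item sku="P-17" quantity="2" unit-price="9.50"/>
  <item sku="Q-03" quantity="1" unit-price="20.00"/>
</order>"""


def order_total(xml_text: str) -> float:
    """Sum quantity * unit-price over every <item> in a well-formed message."""
    root = ET.fromstring(xml_text)
    return sum(
        int(item.get("quantity")) * float(item.get("unit-price"))
        for item in root.iter("item")
    )


total = order_total(message)
print(f"Order total: {total:.2f}")
```

     The receiving program relies only on the tag and attribute names it expects to find, which is what makes the exchange independent of any particular vendor's tools.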
     7. Who is it for?
     According to the papers and vendors at this conference, XML is for everybody. If we try to be more specific, XML will work best in its current state for:
     
  • Organizations which want to publish structured data cost-effectively on the Web
  • Organizations which want to interchange non-hierarchical data efficiently and cost-effectively
     As additional pieces of the XML puzzle are completed, the potential population who might benefit from XML will increase.
     8. Do I need to switch to XML? What does it imply?
     If your data is already in SGML and you want to publish it on the Web, you have two choices. You either convert all your data to HTML, in which case you will define the layout of that data but lose the semantic markup, or you can convert to XML. The work involved in either choice is more or less equivalent, but converting to XML lets you keep all the intelligent markup you currently have. The problem is that today, no browser supports XML out of the box. However, the leading browser companies are committed to it and their next versions will likely contain some generalized XML support. Your choice will actually depend on your need to access semantic markup for interaction with other applications on the desktop, for instance.
     If you do not want to publish on the Web, there is no rationale today for switching all your data to XML - yet.
     If your document data is not in SGML and you need to publish it on the Web a single time, without further revision cycles, DTDless XML will soon become your best choice. It can be delivered as easily as HTML, while allowing you to retain, cheaply and quickly, whatever intelligence your original data carried.
     What does "switching to XML" imply? The physical process of switching your document instances from SGML to XML is almost trivial. However, transforming an SGML DTD to an XML-compliant DTD can involve major changes to your markup model. See question #14 for more information on this topic.
     9. Are there reliable XML tools available on the market?
     Let's go through the types of tools one by one:
     
  • Browsers: There are no commercially available browsers today that, when given a URL corresponding to an XML file, display it with the appropriate formatting. Netscape's Mozilla browser appears to support this feature, but it is in extremely early release.
  • Parsers: XML, like SGML, relies on four major types of software components. An XML processor is the parser that validates an XML document initially and then hands it to the next application. There are several commercial-quality XML processors available; some are for validation and others only do basic parsing.
  • Editors: Native XML document creation tools are available. The most robust ones come from the SGML world, and their support of XML tends to be thorough.
  • Content management tools: All the tools in this category claim XML support, in general making as much use of the XML as they were making of SGML.
  • Transformation tools: The SGML transformation tools all support XML, and some even offer out-of-the-box support for transforming SGML instances to XML.
     10. I have SGML data, can I convert it to XML? Is it expensive?
     Yes, you can convert SGML data to XML. It can generally be accomplished at very little cost. The exception is if you used certain unusual features in your DTD that would require some rethinking of your entire model (for example, SUBDOC).
     11. In my company we have MS Word files. Should I convert them to SGML or to XML?
     It depends what you want to do with them. If you want to keep editing in Word, and if you plan to perform routine transformation aiming at electronic delivery, then DTDless XML is probably your best bet. If you want to increase the intelligence in your documents permanently while gaining validation and content management power, then you should be aiming at SGML for content creation and editing.
     12. With XML, do you need a DTD?
     No; the minimum requirement is for your documents to be well formed. However, all applications that perform interesting processing (including formatting) on structured documents need consistent and constrained tag sets. Therefore, even if you choose to deliver "topless" XML documents on the Web, you will almost certainly want to create your XML documents under the control of a DTD whose minimum role is to define the legal tags.
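     The minimum role described above, defining the legal tags, can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the tag set (procedure, title, step) is hypothetical, and the check is far weaker than real DTD validation because it ignores content models and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of legal tags, standing in for what a DTD would declare.
LEGAL_TAGS = {"procedure", "title", "step"}


def illegal_tags(xml_text: str, legal=LEGAL_TAGS) -> list:
    """Return the sorted tag names used in the document but not declared legal."""
    root = ET.fromstring(xml_text)
    used = {element.tag for element in root.iter()}
    return sorted(used - set(legal))


doc = (
    "<procedure><title>Reset</title>"
    "<step>Hold the button</step><stepp>Release</stepp></procedure>"
)
print(illegal_tags(doc))
```

     The document is well formed, so a non-validating parser accepts it, yet the misspelled tag would be missed by any stylesheet or search tool that expects "step". This is the kind of inconsistency that creating documents under DTD control prevents.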
     13. What is an XML DTD?
     It is the same as an SGML DTD, except that its syntax is a bit simpler and, as a result, it is somewhat less able to constrain your documents' content. For example, if you want to ensure that a particular attribute value is a "number" (that is, contains only digit characters), you can achieve this in SGML but not in XML.
     In the future, it is expected that schemas will offer even more powerful constraints on documents than full-SGML DTDs do. For example, if you want an attribute value to contain a series of numbers and dashes corresponding to a "date" (such as 1998-05-21), you should be able to write a schema that requires the string to be "4 digits, a dash, two digits no more than 12, a dash, and two digits no more than 31."
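     The date rule quoted above can be written out as an ordinary program, which may make clearer what such a schema constraint would express. The sketch below is plain Python, not schema syntax (none was standardized at the time of writing), and the function name is hypothetical. It implements the rule exactly as worded and nothing more.

```python
import re

# Four digits, a dash, two digits, a dash, two digits.
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def matches_date_rule(value: str) -> bool:
    """Apply the rule as stated: month no more than 12, day no more than 31."""
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    month = int(match.group(2))
    day = int(match.group(3))
    return month <= 12 and day <= 31


print(matches_date_rule("1998-05-21"))
print(matches_date_rule("1998-13-01"))
```

     Note that the rule as worded is purely lexical: it would still accept 1998-02-31 or a month of 00, so a real date type would need further constraints.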
     14. Do I need to change my existing SGML DTDs? If yes, how?
     No, you shouldn't need to redesign your SGML DTDs. The reason is that, since you can easily turn your SGML document instances into DTDless XML for delivery, when they're in the SGML stage you can keep using your current tools with your current DTD.
     You may find it useful to deliver your XML data along with an XML DTD, but that DTD can be a variant of your original SGML DTD, perhaps with features solely related to publishing.
     15. Up to a year ago, you swore SGML meant reliability and longevity of my company data. Now what does it mean with this new XML standard?
     I can still swear it and look you straight in the eye. The proof is that your SGML data can easily take advantage of the new XML data format that has come along in the last year. In other words, XML has allowed SGML to prove that it delivers on its promises.
     16. Are there things that I can do in XML that I can't do in SGML? And inversely?
     Today, the answer is no. In the future, there will be a standardized way to do more powerful links and to define more powerful data types (like the date example in question #13). These features were not available in SGML, but we will have to wait for the standards to be finished and for the vendor community to deliver the tools that make the standard real.
     There are things you can do today in SGML that you can't do in XML. However, other than the DTD validation power already discussed, most of the features exclusive to SGML are so obscure that they were rarely used and rarely implemented. In other words, you're unlikely to miss them.
     17. Is it less expensive than SGML? In production and in distribution?
     Most of the tasks involved in building an SGML-based production line are concerned with planning, analysis, and design. These tasks might include designing markup models, stylesheets, database structures, transformation mappings, creation interfaces, and so on. This is labor-intensive and will probably cost as much in an XML scenario as in SGML.
     What might end up being less expensive are the distribution and delivery tools (indexing tools, search engines, and so on), which, because they must conform to a simpler standard, will likely be less expensive to develop and thereby less expensive to buy, customize, and deploy.
     18. Is XML a Microsoft invention?
     No. It was created by a working group under the auspices of the World Wide Web Consortium. At the time, the group had about a dozen representatives from both the SGML and Web communities, and included both a Microsoft and a Netscape representative.
     19. How do XSL and XLink relate to XML?
     They relate on two levels. First, they will be additional standards in the XML family. XSL stands for Extensible Style Language, and it will provide a standardized way to define stylesheets linked to XML documents. XLL (now called XLink) stands for XML Linking Language, and it will provide a standardized way to expand the power of HTML links in XML documents.
     Second, XSL and XLink are both applications of XML. That is, an XSL stylesheet is an XML document itself, and XLink linking elements are true XML elements that reside in an XML document.
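     Because linking elements are ordinary XML elements, any XML parser can read them. The Python sketch below assumes the attribute names and namespace of the XLink 1.0 Recommendation, which was finalized after this paper was written, so the syntax may differ from the drafts discussed here. The article and citation element names and the target law.xml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace from the later XLink 1.0 Recommendation (an assumption here).
XLINK_NS = "http://www.w3.org/1999/xlink"

doc = f"""<article xmlns:xlink="{XLINK_NS}">
  <para>See <citation xlink:type="simple" xlink:href="law.xml">the statute</citation>.</para>
</article>"""


def find_links(xml_text: str) -> list:
    """Return (element name, link target) pairs for elements carrying xlink:href."""
    root = ET.fromstring(xml_text)
    href_key = f"{{{XLINK_NS}}}href"  # ElementTree spells qualified names as {namespace}local
    return [
        (element.tag, element.get(href_key))
        for element in root.iter()
        if element.get(href_key) is not None
    ]


print(find_links(doc))
```

     Nothing link-specific is needed to parse the document; the linking information is just attributes on an element, which is what it means for XLink to be an application of XML.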
     20. Will my native SGML database take XML?
     If you talk to your vendor, you should expect the answer to be Yes, since what is involved in supporting XML is mainly a change in the parser component.
     21. Does XML mean SGML death?
     I don't think so. Obviously, SGML has needed for some time to become more user-friendly, which meant some additions along with drastic simplification. These improvements to SGML are finally being undertaken, in the guise of the WebSGML work in the ISO WG4 committee. It appears that this work was motivated by the breathtaking speed of Internet growth and the realization that XML provided the only scalable solution.
     Because WebSGML is a real standards effort that will result in an ISO standard, I strongly believe that it can be trusted in terms of quality and stability. I also strongly believe that the underlying concepts of SGML will keep being relevant to upstream document creation and management.
     The development of "new SGML" and future versions of XML will take some time, but I'm confident that the results will converge and be fully compatible with each other, bringing you the safety of ISO standardization.
