Most Frequently Asked Business Questions About XML |
Jeanne El Andaloussi
Director of Operations, AIS
Biographical notice |
| Jeanne El Andaloussi is director of operations, training, and distribution at AIS S.A., an SGML/XML systems integrator. She has long experience in SGML training and corporate documentation standards, tools, and methodologies, and in recent years has conducted and overseen several European projects for general and specialized publication systems. Ms El Andaloussi is a co-author of "Developing SGML DTDs: From Text to Model to Markup", published by Prentice Hall PTR. She has presented and published her DTD development methodology and the results of its use in a number of SGML and publishing industry forums. |
Introduction |
| XML arrived suddenly in 1997 (many did not take it seriously in 1996), and little information about it has been available beyond the purely technical. As a result, a number of current or prospective users have found their plans thrown into doubt. They now have serious questions for the "big names" in the SGML community: the software providers and integrators who promised stability, longevity, flexibility and, above all, peace of mind. This talk provides pragmatic answers to questions that the presenter has collected from several encounters with users, integrators, and vendors. |
The Questions |
| We have collected these questions from discussions with customers and colleagues who are current or prospective users of SGML and who are curious, or even worried, about the arrival of XML on the market. |
| 1. What is XML? |
| XML is a recommendation of the World Wide Web Consortium, issued on 10 February 1998. The name stands for Extensible Markup Language. It is a language designed to deliver structured information over the Web. |
| 2. Is it a new version of HTML? |
| Not exactly. XML has been designed to replace HTML for deploying large and complex applications, not only on the Web but also on intranets. HTML will keep evolving but will always be presentational, whereas XML will allow the distribution of semantic data over the Web and intranets. |
| 3. Is it a real standard? |
| No. It is not an ISO standard, but since the World Wide Web Consortium and the leading software companies endorse it, it is a de facto standard. The XML Working Group intends Version 1.0 of the XML recommendation to be stable for a long time, and intends future versions to be upward compatible to the extent possible. |
| Note that XSL, XLink, and XPointer have not reached the same level of standardization, so in business terms one should be cautious about relying on them in the short term. |
| 4. In what way is XML different from SGML? |
| XML is essentially a subset of SGML. The two are very similar in practice, since the features that were dropped from SGML are the ones that were almost never used. Thus, it can be said that XML is simpler than SGML. For example: |
| 5. Why was it created in the first place? |
| Because HTML was too limited to publish sophisticated semantic data on the Web, and because SGML was too complicated for the task. |
| HTML is too limited for a number of reasons. First, the document structure it expresses is too shallow. For example, it offers six levels of section structure, but offers nothing above the "page" level (such as chapters). In addition, its content structure is largely presentation oriented rather than semantically oriented. For example, it offers tagging for bold and italic information and allows the setting of font characteristics, but does not allow the tagging of "procedures" or "articles of law." This limits interchange, reuse, automation, and clever searching capabilities. Finally, unlike an SGML application, HTML must be changed by a standards body before you can properly deploy any new tagging features. |
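To make the contrast between presentational and semantic tagging concrete, here is a minimal sketch in Python, using only the standard library. The tag names (`statute`, `article`, `title`, `para`) and the sample text are hypothetical, invented for illustration; they do not come from any real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic markup: the tags say what the content IS.
SEMANTIC = """<statute>
  <article num="1"><title>Scope</title><para>This act applies to all filings.</para></article>
  <article num="2"><title>Procedure</title><para>Filings are made in writing.</para></article>
</statute>"""

# The same text with presentational markup only: the tags say how it LOOKS.
PRESENTATIONAL = """<body>
  <p><b>Scope</b></p><p>This act applies to all filings.</p>
  <p><b>Procedure</b></p><p>Filings are made in writing.</p>
</body>"""

semantic_root = ET.fromstring(SEMANTIC)
# A program can ask directly for "the title of every article".
article_titles = [a.findtext("title") for a in semantic_root.iter("article")]

presentational_root = ET.fromstring(PRESENTATIONAL)
# Here the best a program can do is collect bold text and hope it is a title.
bold_runs = [b.text for b in presentational_root.iter("b")]

print(article_titles)
print(bold_runs)
```

Both queries happen to return the same strings for this sample, but only the first one is reliable: any bold phrase in running text would end up in the second list as well.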
| SGML is limited for other reasons. First, there are no mainstream and free SGML Web browsers available. Also, no standardized support for document presentation is available, despite the Formatting Output Specification Instance (FOSI) and Document Style Semantics and Specification Language (DSSSL) efforts, which have received little vendor support. Finally, the full SGML language with all its intricacies has appeared, from the Web community's perspective, to be too complex and too expensive to implement. |
| So XML was created to give data the semantic tagging power of SGML and the delivery facility of the Web, in other words the best of both worlds in terms of delivery. |
| 6. What is it useful for? |
| To date, XML has been targeted not at document creation but at the electronic delivery of documents and data. As a result, it is useful for: |
| 7. Who is it for? |
| According to the papers and vendors at this conference, XML is for everybody. If we try to be more specific, XML will work best in its current state for: |
| As additional pieces of the XML puzzle are completed, the potential population who might benefit from XML will increase. |
| 8. Do I need to switch to XML? What does it imply? |
| If your data is already in SGML and you want to publish it on the Web, you have two choices. You either convert all your data to HTML, in which case you will define the layout of that data but lose the semantic markup, or you can convert to XML. The work involved in either choice is more or less equivalent, but converting to XML lets you keep all the intelligent markup you currently have. The problem is that today, no browser supports XML out of the box. However, the leading browser companies are committed to it and their next versions will likely contain some generalized XML support. Your choice will actually depend on your need to access semantic markup for interaction with other applications on the desktop, for instance. |
| If you do not want to publish on the Web, there is no rationale for switching all your data to XML, at least not yet. |
| If your document data is not in SGML and you need to publish it on the Web a single time, without further revision cycles, DTDless XML will soon become your best choice. This is because it can be delivered as easily as HTML, while allowing you to retain, cheaply and quickly, whatever intelligence your original data carried. |
| What does "switching to XML" imply? The physical process of switching your document instances from SGML to XML is almost trivial. However, transforming an SGML DTD to an XML-compliant DTD can involve major changes to your markup model. See question #14 for more information on this topic. |
| 9. Are there reliable XML tools available on the market? |
| Let's go through the types of tools one by one: |
| 10. I have SGML data. Can I convert it to XML? Is it expensive? |
| Yes, you can convert SGML data to XML. It can generally be accomplished at very little cost. The exception is if you used certain unusual features in your DTD that would require some rethinking of your entire model (for example, SUBDOC). |
| 11. In my company we have MS Word files. Should I convert them to SGML or to XML? |
| It depends on what you want to do with them. If you want to keep editing in Word, and if you plan to perform routine transformations for electronic delivery, then DTDless XML is probably your best bet. If you want to increase the intelligence in your documents permanently while gaining validation and content management power, then you should be aiming at SGML for content creation and editing. |
| 12. With XML, do you need a DTD? |
| No; the minimum requirement is for your documents to be well formed. However, all applications that perform interesting processing (including formatting) on structured documents need consistent and constrained tag sets. Therefore, even if you choose to deliver "topless" (that is, DTDless) XML documents on the Web, you will almost certainly want to create your XML documents under the control of a DTD whose minimum role is to define the legal tags. |
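The difference between "well formed" and "using only the legal tags" can be sketched as follows in Python, using only the standard library. The tag set `ALLOWED_TAGS` and the `memo` documents are hypothetical, and the tag check is only a stand-in for what a DTD would do: real DTD validation also constrains nesting, order, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set standing in for the "legal tags" a DTD would define.
ALLOWED_TAGS = {"memo", "to", "body"}

def is_well_formed(text):
    """Return True if text parses as well-formed XML; no DTD is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def unknown_tags(text):
    """Return the sorted tag names in a well-formed document that are not allowed."""
    found = {element.tag for element in ET.fromstring(text).iter()}
    return sorted(found - ALLOWED_TAGS)

good = "<memo><to>Ann</to><body>Hello</body></memo>"
broken = "<memo><to>Ann</memo>"              # end tag does not match: not well formed
stray = "<memo><to>Ann</to><cc>Bob</cc></memo>"  # well formed, but uses an unknown tag

print(is_well_formed(good), is_well_formed(broken), unknown_tags(stray))
```

The third document is the interesting case: a DTDless parser accepts it without complaint, so only a check against a declared tag set catches the stray element.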
| 13. What is an XML DTD? |
| It is the same as an SGML DTD, except that its syntax is a bit simpler and, as a result, it is somewhat less able to constrain your documents' content. For example, if you want to ensure that a particular attribute value is a "number" (that is, contains only digit characters), you can achieve this in SGML but not in XML. |
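Where the XML DTD cannot express the "digits only" constraint, the check has to move into the application. A minimal sketch in Python follows; the element name `item`, the attribute name `qty`, and the sample order are hypothetical.

```python
import xml.etree.ElementTree as ET

def non_numeric_attrs(xml_text, tag, attr):
    """Return the values of attr on tag elements that are not made of ASCII digits only.

    A missing attribute is reported as an empty string.
    """
    bad = []
    for element in ET.fromstring(xml_text).iter(tag):
        value = element.get(attr, "")
        if not (value.isascii() and value.isdigit()):
            bad.append(value)
    return bad

# Hypothetical document: the second quantity is not a number.
order = '<order><item qty="12"/><item qty="1a"/><item qty="007"/></order>'
print(non_numeric_attrs(order, "item", "qty"))
```

The cost of this approach is that every consuming application has to repeat the check, which is exactly the work a stronger declaration in the DTD would have saved.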
| In the future, it is expected that schemas will offer even more powerful constraints on documents than full-SGML DTDs do. For example, if you want an attribute value to contain a series of numbers and dashes corresponding to a "date" (such as 1998-05-21), you should be able to write a schema that requires the string to be "4 digits, a dash, two digits no more than 12, a dash, and two digits no more than 31." |
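The date rule quoted above can be written out as a small check. This is a sketch in Python of the constraint itself, not of any schema language; the function name is hypothetical, and the lower bound of 1 for month and day is an added assumption, since the text only states the upper bounds.

```python
import re

# Four digits, a dash, two digits, a dash, two digits (ASCII digits only).
DATE_PATTERN = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")

def is_valid_date_string(value):
    """Check the shape described in the text: month at most 12, day at most 31.

    Assumption: month and day must also be at least 1.
    This does not check calendar validity (for example, 1998-02-31 passes).
    """
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    month = int(match.group(2))
    day = int(match.group(3))
    return 1 <= month <= 12 and 1 <= day <= 31

for candidate in ("1998-05-21", "1998-13-21", "1998-05-32", "98-05-21"):
    print(candidate, is_valid_date_string(candidate))
```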
| 14. Do I need to change my existing SGML DTDs? If yes, how? |
| No, you shouldn't need to redesign your SGML DTDs. The reason is that, since you can easily turn your SGML document instances into DTDless XML for delivery, when they're in the SGML stage you can keep using your current tools with your current DTD. |
| You may find it useful to deliver your XML data along with an XML DTD, but that DTD can be a variant of your original SGML DTD, perhaps with features solely related to publishing. |
| 15. Up to a year ago, you swore SGML meant reliability and longevity of my company data. Now what does it mean with this new XML standard? |
| I can still swear it and look you straight in the eye. The proof is that your SGML data can easily take advantage of the new XML data format that has come along in the last year. In other words, XML has allowed SGML to prove that it delivers on its promises. |
| 16. Are there things that I can do in XML that I can't do in SGML? And inversely? |
| Today, the answer is no. In the future, there will be a standardized way to do more powerful links and to define more powerful data types (like the date example in question #13). These features were not available in SGML, but we will have to wait for the standards to be finished and for the vendor community to deliver the tools that make the standard real. |
| There are things you can do today in SGML that you can't do in XML. However, other than the DTD validation power already discussed, most of the features exclusive to SGML are so obscure that they were rarely used and rarely implemented. In other words, you're unlikely to miss them. |
| 17. Is it less expensive than SGML? In production and in distribution? |
| Most of the tasks involved in building an SGML-based production line are concerned with planning, analysis, and design. These tasks might include designing markup models, stylesheets, database structures, transformation mappings, creation interfaces, and so on. This is labor-intensive and will probably cost as much in an XML scenario as in SGML. |
| What might end up being less expensive are the distribution and delivery tools (indexing tools, search engines, and so on), which, because they must conform to a simpler standard, will likely be less expensive to develop and thereby less expensive to buy, customize, and deploy. |
| 18. Is XML a Microsoft invention? |
| No. It was created by a working group under the auspices of the World Wide Web Consortium. At the time, the group had about a dozen representatives from both the SGML and Web communities, and included both a Microsoft and a Netscape representative. |
| 19. How do XSL and XLink relate to XML? |
| They relate on two levels. First, they will be additional standards in the XML family. XSL stands for Extensible Style Language, and it will provide a standardized way to define stylesheets linked to XML documents. XLL (now called XLink) stands for XML Linking Language, and it will provide a standardized way to expand the power of HTML links in XML documents. |
| Second, XSL and XLink are both applications of XML. That is, an XSL stylesheet is an XML document itself, and XLink linking elements are true XML elements that reside in an XML document. |
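The point that stylesheets and links are themselves XML can be illustrated by loading both with an ordinary XML parser. The sketch below is in Python and rests on stated assumptions: it uses the namespace URIs and element names from the W3C specifications as they were later finalized (XSLT 1.0 and XLink), which postdate this paper, and the `report`, `seealso`, and `annex.xml` names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the later W3C recommendations (assumption: not the 1998 drafts).
XSL_NS = "http://www.w3.org/1999/XSL/Transform"
XLINK_NS = "http://www.w3.org/1999/xlink"

# A stylesheet is an XML document whose elements happen to be XSL elements.
STYLESHEET = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="title"><h1><xsl:apply-templates/></h1></xsl:template>
</xsl:stylesheet>"""

# A link is an ordinary element that carries XLink attributes.
DOCUMENT = f"""<report xmlns:xlink="{XLINK_NS}">
  <seealso xlink:type="simple" xlink:href="annex.xml">Annex</seealso>
</report>"""

# The same generic parser reads both; no stylesheet-specific tool is needed.
sheet = ET.fromstring(STYLESHEET)
template_matches = [t.get("match") for t in sheet.iter(f"{{{XSL_NS}}}template")]

document = ET.fromstring(DOCUMENT)
href_key = f"{{{XLINK_NS}}}href"
link_targets = [e.get(href_key) for e in document.iter() if e.get(href_key)]

print(template_matches)
print(link_targets)
```

Note that the parser only reads the stylesheet here; applying it to a document would require an XSL processor, which the Python standard library does not provide.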
| 20. Will my native SGML database take XML? |
| If you talk to your vendor, you should expect the answer to be Yes, since what is involved in supporting XML is mainly a change in the parser component. |
| 21. Does XML mean SGML death? |
| I don't think so. Obviously, SGML has for some time needed to become more user-friendly, which means some additions along with drastic simplification. These improvements to SGML are finally being undertaken, in the guise of the WebSGML work in the ISO WG4 committee. It appears that this work was motivated by the breathtaking speed of Internet growth and the realization that XML provided the only scalable solution. |
| Because WebSGML is a real standards effort that will result in an ISO standard, I strongly believe that it can be trusted in terms of quality and stability. I also strongly believe that the underlying concepts of SGML will keep being relevant to upstream document creation and management. |
| The development of "new SGML" and future versions of XML will take some time, but I'm confident that the results will converge and be fully compatible with each other, bringing you the safety of ISO standardization. |