| Using an XML Audit to Move SGML Data towards XML |
Charlie Halpern-Hamu, Ph.D.
Structured Information Consultant, Incremental Development, Inc.
Toronto, Ontario, Canada
Biographical notice |
| Charlie Halpern-Hamu completed his doctorate in Computer Science at the University of Toronto. He has published papers in the areas of denotational semantics, programming-language design tools and graphical control of robots by the disabled. He has been a structured-information consultant for seven years. |
Abstract |
| This paper describes, at a technical level, how to assess the XML-readiness of your SGML data as a first step towards moving it towards XML. |
| This paper suggests an 'XML audit': a technical review of current markup practice with an eye towards simplification. The goal of an XML audit is to understand which portions of your current SGML application are not XML. The next step might be to start de-emphasizing your use of those features. |
| Moving all the way to XML allows you to use XML tools that do not support full SGML. Even getting part way there means you can use a wider variety of SGML tools. In either case you will be simplifying work for both editorial and programming staff. Simpler is better. |
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium Note (
|
Introduction |
| This talk describes, at a technical level, how to assess the XML-readiness of your SGML data as a first step towards moving it towards XML. |
| XML Audit |
| This talk introduces the concept of an 'XML audit': a review of current markup practice with an eye towards simplification. An XML audit lets you know where you stand. Your next step might be to de-emphasize those SGML features that are not XML. |
Motivation |
| Moving all the way to XML allows you to use XML tools that do not support full SGML. Even getting part way there means you can use a wider variety of SGML tools. In either case you will be simplifying work for both editorial and programming staff. This simplification may result in reduced training requirements, less confusion and fewer errors. |
| But even if you choose to make no immediate change to your markup practices, an XML audit will give you valuable information that will help inform future decisions. You may discover that, give or take an angle-bracket or two, you are already doing XML. |
Notes on Style |
All discussion assumes the reference concrete syntax. So I will say 'left angle-bracket' or '
|
| Where SGML and XML vary slightly in their nomenclature, I tend towards the SGML, since that's our starting point. Or I fall back towards spelling things out using the reference concrete syntax as described above. |
| I use the term 'URL' ('uniform resource locator') where the XML standard uses the term 'URI' ('uniform resource identifier'). The expectation is that the URL standard will be updated to define 'URI'. Until then, speaking of URIs is getting a bit ahead of ourselves. |
Acknowledgments |
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium Note (
|
| Clark's Note discusses XML options not available in SGML. This paper ignores these, only discussing those SGML options that are not available in XML. |
| In this paper, and to an even greater degree in the corresponding presentation, I've tried to give more prominence to the more commonly-used SGML features that are missing in XML. |
| I'd like to thank Larry Sulky for his copy edit. The only suggestion I didn't take was to change 'a journey of a thousand miles' to 'a journey of sixteen-hundred kilometres'. |
| How to Conduct an XML Audit |
| The key idea in conducting an XML audit is resisting the temptation to do more than simply review where you stand. |
Who Should Attend |
| You need a selection of technical people: someone who knows the DTD, someone who knows editorial tagging practices, someone who knows about the programs that operate on the data as it flows in, through, and out of the organization. |
| An XML audit is for figuring out where you are, not where you are going. Consequently, you don't need managerial or technical decision-makers at the meeting, though they will want to understand and act on the final assessment. |
How to Prepare |
| Make printouts of this paper, your SGML declaration(s), DTD(s), some sample data, and programs that act on this data. Distribute these items in advance to your attendees. Each attendee should review these items, especially those about which she is the designated expert. So the data architect should focus on the DTDs, the programmer the programs, etc. Ask attendees to note those aspects of your current SGML that are not XML, perhaps in the margins of this paper. |
How to Proceed |
| Designate one person as the note-taker. As with the individual preparation step, it may be convenient to use a copy of this paper as a note-taking template. Move systematically through the headings in this paper and determine if they apply to your application. |
Postpone discussions about how to recast SGML usage as simpler XML usage. Focus on simply listing those aspects of your SGML usage that go beyond XML. When you do find non-XML usages, include details of where. Do you have one use of the '
|
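If the note-taker prefers to keep notes in a form that can be sorted and counted later, something like the following might do. It is only a sketch: the `Finding` record, the `tally` helper, and the file names are all made up for illustration, and the feature names are simply headings borrowed from this paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One non-XML usage noted during the audit (hypothetical record layout)."""
    feature: str    # a heading from this paper, e.g. "Processing Instructions"
    location: str   # the file, DTD, or program where the usage was seen
    note: str = ""  # free-form detail from the note-taker


def tally(findings):
    """Count findings per feature, so a single stray usage can be told
    apart from one that pervades the data."""
    return Counter(f.feature for f in findings)


# Illustrative notes only; the file names are invented.
findings = [
    Finding("Processing Instructions", "chapter1.sgm", "closed with '>'"),
    Finding("Processing Instructions", "chapter2.sgm", "closed with '>'"),
    Finding("Comments", "book.dtd"),
]
counts = tally(findings)
```

Recording the location with every finding keeps the 'where' details attached to each item when the notes are later transcribed into the assessment report.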
Results |
| The result of an XML audit should be an assessment report. Transcribe your notes into a complete list of the non-XML things you do. The next step will be to decide if it makes sense to change all or some of your markup practices. |
| Stupid SGML Tricks |
| Those aspects of SGML that are not available in XML are listed in the sections that follow. The following organization has been used: |
| The Big Three |
| Out of Band |
| Miscellaneous |
| The Big Three: Elements |
|
|
|
|
|
|
|
|
| The Big Three: Attributes |
|
|
|
|
|
| The Big Three: Entities |
| XML places various restrictions on entity declarations and entity references. |
|
|
|
|
|
|
|
|
|
|
|
| Out of Band: Comments |
| XML restricts the variation in syntax and location of comments that SGML allows. |
| A typical SGML comment looks like this: |
<!--Okay XML.--> |
The '
|
|
|
|
|
| Out of Band: Marked Sections |
XML severely restricts the usage of SGML's marked sections. The only type of marked section allowed is a
|
|
|
|
|
|
|
|
| Out of Band: Processing Instructions |
| XML uses a special syntax for processing instructions. You can imitate this XML syntax by using a similar convention for your SGML processing instructions. Processing instructions are closed in SGML with a right angle-bracket. In XML, they are closed by a question-mark right angle-bracket sequence: |
<?isnt-xml This is a processing instruction.>
<?okay-xml This is a processing instruction.?> |
In XML, the
|
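During preparation, a programmer might want a rough count of how many processing instructions are closed the SGML way. The following is a minimal sketch, assuming the reference concrete syntax and assuming that no processing instruction contains a right angle-bracket. The function name `sgml_style_pis` is my own, and the pattern makes no attempt to skip comments or marked sections.

```python
import re

# A processing instruction in the reference concrete syntax:
# '<?', then anything up to the first '>'.
PI = re.compile(r"<\?([^>]*)>")


def sgml_style_pis(text):
    """Return the processing instructions in text that are closed by '>'
    alone rather than by the XML-style '?>'."""
    return [m.group(0) for m in PI.finditer(text)
            if not m.group(1).endswith("?")]


sample = ("<?isnt-xml This is a processing instruction.> "
          "<?okay-xml This is a processing instruction.?>")
flagged = sgml_style_pis(sample)
```

Run over the two examples above, only the first is flagged, since the second already ends with a question-mark before its right angle-bracket.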
| It is good practice to categorize your SGML processing instructions by always starting them with a name that says to which processor they are directed. In XML, this practice is a requirement. This name is called the PI 'target': |
<??> <!--This isn't XML because it has no target.-->
<?okay-xml?>
<?okay-xml2 The target is 'okay-xml2'.?>
<?okay-xml3The target is 'okay-xml3The'.?> |
The target '
|
<?xml This isn't XML.?>
<?XML This isn't XML.?>
<?XmL This isn't XML.?>
<?xmlx This is technically okay but tempting fate.?>
<?sgml This is okay XML.?> |
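The two target rules, that a target must be present and that it must not be 'xml' in any mix of upper and lower case, can be checked mechanically. Here is a rough sketch. The helper `target_problem` is hypothetical, and its `NAME` pattern is an ASCII-only approximation of an XML name, which in fact admits many more characters.

```python
import re

# ASCII-only approximation of an XML name (an assumption made for brevity).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def target_problem(pi_content):
    """Given the text between '<?' and the closing delimiter, return a short
    description of what is wrong with the target, or None if it looks fine."""
    match = NAME.match(pi_content)
    if match is None:
        return "no target"
    if match.group(0).lower() == "xml":
        return "reserved target"
    return None


results = [target_problem(content) for content in (
    "",                          # as in <??>
    "XmL This isn't XML.",
    "xmlx This is technically okay but tempting fate.",
    "sgml This is okay XML.",
)]
```

The check lower-cases the target before comparing, so 'xml', 'XML' and 'XmL' are all caught, while 'xmlx' passes because it is a different name.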
| Miscellaneous: Characters |
|
|
|
|
|
| Miscellaneous: Minimization |
| XML does not include a wide variety of markup minimization features available in SGML. This section lists the more common types of minimization. Less commonly used minimization techniques are listed under 'Obscure Features'. |
|
|
|
|
|
|
| Miscellaneous: Other Restrictions |
|
|
|
|
| Miscellaneous: Obscure SGML Features |
There are a number of features of SGML of which you may be only dimly aware. You likely won't notice their absence from XML. The
|
|
|
|
|
|
Conclusion |
| A journey of a thousand miles starts with the first step. But before you take that step, you ought to determine where you stand. This will help you start out in the right direction. Or realize you're happy right where you are. |
| References |
| SGML is defined by 'ISO 8879:1986(E). Information processing - Text and Office Systems - Standard Generalized Markup Language (SGML). First edition - 1986-10-15', available from the International Organization for Standardization in Geneva. |
XML is defined by 'Extensible Markup Language (XML) 1.0', a World Wide Web Consortium (W3C) Recommendation dated 1998 February 10 (
|
This paper is derived from James Clark's 'Comparison of SGML and XML', a World Wide Web Consortium (W3C) Note dated 1997 December 15 (
|