
 

Validation Is Good

 Eve Maler
  Solutions Marketing Manager
  Arbortext, Inc.
  3 University Office Park
  95 Sawyer Road
  Waltham, Massachusetts, USA 01803
Phone: +1 781 529 1912
Fax: +1 781 529 1099
Email: elm@arbortext.com
Web: http://www.arbortext.com
 
Biographical notice:
 
As a Solutions Marketing Manager at Arbortext, Eve Maler is responsible for product solutions design and for coordinating standards compliance. Previously she built and led Arbortext's DTD Team, which specializes in SGML/XML data modeling and DTD development consulting. Maler was a founding member of the World Wide Web Consortium's XML Working Group. She has served as co-editor of the XLink and XPointer specifications, and currently represents Arbortext on the W3C XML Schema and XML Linking Working Groups.
 
Maler is co-author (with Jeanne El Andaloussi) of Developing SGML DTDs: From Text to Model to Markup, the only book published to date on a complete methodology for designing DTDs. She was a long-time technical contributor to the Davenport Group, and served for several years as a maintainer of the popular DocBook DTD for software documentation.
 
Before joining Arbortext, Maler spent ten years at Digital Equipment Corporation as a technical editor and documentation specialist, where she developed DTDs, converters, and ADEPT environments for Digital's UNIX documentation group.
 
Maler holds a BA in Linguistics from Brandeis University in Waltham, Massachusetts.
 
ABSTRACT:
 
Validation is necessary for sophisticated information processing, and need not be a burden when the constraints and creation environment are designed correctly. DTDs may, but need not, be an essential part of validation and constraint. I illustrate my position by presenting two challenges (which were posed to me in real life!) and my response to them.
 


Challenge #1

 
"DTDs constrain the authoring process from being flexible and creative to being 'rigidly structured.' The great thing about XML is that it eliminates the requirement for a DTD." My response is as follows:
 
A DTD can be as rigid or as loose as your purposes dictate. For example, there are DTDs that basically encode all the same information that RTF does, and if one of these were to be used for authoring, it would impose no particular constraints.
 
Nonetheless, the most important role of a DTD is to constrain data, so that processing applications can operate on the data reliably. In this sense, it is little different from a relational database schema that enables sophisticated report writing and data analysis.
 
For example, if you expect your sections to have titles but can't rely on the title being present, your tables of contents will need manual checking and fixing, and keyword searching based on title text will be compromised. Validating against a DTD automates the check on the sections, and therefore makes automated, high-quality TOC building and navigation possible. The trick is in careful requirements gathering and schema design.
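
The section-title requirement can be illustrated with a minimal sketch. It assumes hypothetical element names (doc, section, title, para) and uses a hand-written check in Python as a stand-in for a validating parser, since Python's standard library does not validate against DTDs. A DTD would state the same rule declaratively, for instance with a content model such as (title, para+) for section.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the second section has no title.
DOC = """<doc>
  <section><title>Installing</title><para>Run the installer.</para></section>
  <section><para>A section whose title was forgotten.</para></section>
  <section><title>Troubleshooting</title><para>Check the log.</para></section>
</doc>"""


def check_sections(root):
    """Return (toc, missing).

    toc     -- title text of every section whose first child is a title
    missing -- 1-based positions of sections that break that rule
    """
    toc, missing = [], []
    for position, section in enumerate(root.iter("section"), start=1):
        first = section[0] if len(section) else None
        if first is not None and first.tag == "title":
            toc.append("".join(first.itertext()).strip())
        else:
            missing.append(position)
    return toc, missing


toc, missing = check_sections(ET.fromstring(DOC))
print("TOC entries:", toc)
print("Sections without a title:", missing)
```

Only when the list of offending sections is empty can the table of contents be built without a manual pass; that guarantee is what validation provides up front.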
 
Many of the perceived downsides to using DTD-driven authoring environments have been the result of application-side lack of sophistication, rather than XML-side lack of flexibility. It is certainly possible to create an interface that enforces the DTD's rules while feeling very word-processor-like from the user's perspective.
 
Finally, just as software development has become more methodical and modular (but no less creative) in order to support creation of more/better/cheaper software products, text information development is becoming more methodical and modular in order to support more/better/cheaper information products. This sometimes does require the imposition of constraints, such as dictating the manner and kind of cross-references and the order of information. However, the payoff can be huge, and the creativity simply moves "up" in abstraction.
 

Challenge #2

 
"Being merely well-formed (rather than complying with a DTD) is sufficient because every XML document is self-describing. And anyway, a DTD can be inferred from the data if a DTD is needed." My response is as follows:
 
Well-formed documents do have the advantage of self-description. Note, however, that a well-formed XML document describes only itself, and not necessarily a useful class of documents that it falls into. For example, if you are in possession of a single document A that does not have an abstract, and then another similar document B comes along that does have an abstract, any DTD you derived for document A cannot possibly account for document B.
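
The document A and document B scenario can be sketched in a few lines. This is only an illustration under stated assumptions: the element names (article, title, abstract, para) are hypothetical, and the "inferred DTD" is reduced to a simple map from each element to the child elements observed under it, which is far cruder than a real content model.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents: A has no abstract, B does.
DOC_A = "<article><title>A</title><para>Body text.</para></article>"
DOC_B = ("<article><title>B</title><abstract>Summary.</abstract>"
         "<para>Body text.</para></article>")


def infer_children(root):
    """Map each element name to the set of child element names seen under it."""
    model = defaultdict(set)
    for elem in root.iter():
        model[elem.tag].update(child.tag for child in elem)
    return dict(model)


def unexpected_children(root, model):
    """List (parent, child) pairs that the inferred model never observed."""
    problems = []
    for elem in root.iter():
        allowed = model.get(elem.tag, set())
        for child in elem:
            if child.tag not in allowed:
                problems.append((elem.tag, child.tag))
    return problems


model_a = infer_children(ET.fromstring(DOC_A))
problems_a = unexpected_children(ET.fromstring(DOC_A), model_a)
problems_b = unexpected_children(ET.fromstring(DOC_B), model_a)
print("Model inferred from A:", model_a)
print("Problems in B under that model:", problems_b)
```

The model inferred from A accepts A by construction, but it has no way to anticipate the abstract in B; a model designed from requirements would have made the abstract optional from the start.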
 
However, in general, as long as a document's markup falls into predictable patterns that a processing application can anticipate, there is no need for a DTD. That is to say, as long as a document was constrained in predictable ways during creation, there is rarely a need to transmit the formal set of constraints along with the document.
 
You can use any constraint method you like in creating an XML document and preparing it for transmission. Following are some of the possibilities, and you can use as many as you like at one time:
 
  •  An SGML DTD used during or after authoring
  •  An XML DTD used during or after authoring
  •  Code based on a technology such as OmniMark or Balise that constructs a document (e.g., from a database) or is used as a validator after authoring
  •  A schema based on a language such as SOX or DCD used during authoring
  •  Code in your own private proprietary language used as a validator during or after authoring
  •  Human copy editors used as "validators" after authoring
 
You might choose to transmit a partial or full XML DTD along with the document. But if you're not interested in allowing the recipients to apply the same constraints that you did (and you may very well not be, since this is often your added value), you might want to transmit only a well-formed document, or a document with a tiny partial DTD that provides only attribute defaults and internal parsed entity declarations. If you used any non-DTD means of constraining the document, sending the DTD won't compromise all aspects of your added value, because those other constraints stay with you.
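
A tiny partial DTD of this kind might look like the internal subset in the sketch below. The names (memo, status, vendor) and the values are hypothetical. The sketch reads the document with Python's xml.etree.ElementTree, a non-validating parser built on expat; it checks no content models, yet it still applies the attribute default and expands the internal entity, which is all the recipient needs from such a subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document carrying only a tiny internal DTD subset:
# one attribute default and one internal parsed entity, no content models.
DOC = """<!DOCTYPE memo [
  <!ATTLIST memo status CDATA "draft">
  <!ENTITY vendor "Example Corp.">
]>
<memo><para>Prepared by &vendor;</para></memo>
"""

memo = ET.fromstring(DOC)

# The status attribute is absent from the start tag; the parser
# supplies the default declared in the internal subset.
status = memo.get("status")

# The entity reference is replaced by its declared replacement text.
para_text = memo.findtext("para")

print("status:", status)
print("para:", para_text)
```

The recipient learns nothing here about which elements are required or in what order; those constraints remain with the sender.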
