
 
 

XML and the ATA Interchange Model


 
Dave Cruikshank
Senior Principal Scientist
The Boeing Company
PO Box 3707 M/S 2L-17
Seattle, Washington 98124
Phone: 206 544 8876
Fax: 206 544 9878
Email: david.w.cruikshank@boeing.com
 
Biographical notice:
 
Dave Cruikshank
 
Mr. Cruikshank is currently co-chair of the ATA (Air Transport Association) / AIA (Aerospace Industries Association) Graphics Working Group and is a member of the TICC (Technical Information Communication Committee) executive committee. He has extensive experience in both SGML (Standard Generalized Markup Language) and graphics interchange.
 
ABSTRACT:
 
The ATA has been doing interchange DTD (Document Type Definition) development since 1989, and several maintenance, training, and operations documents are currently being delivered in SGML. With the introduction of XML (eXtensible Markup Language) as an approved standard for web applications of SGML, the ATA must review its interchange DTDs to determine the impact of supporting web delivery of XML documents in the future. This paper explores the impact of XML on the ATA interchange model. The conference presentation will demonstrate, using representative ATA content constructs, the steps required to move from SGML to XML for a typical industry interchange and will address how a data model might be used to facilitate this interchange.
 
 

Disclaimer

 
The author is writing from his experience with the ATA interchange model and his experience in the industry. The author in no way intends to present an official ATA position on XML in this paper.
 
 

Background

 
The ATA, under the direction of the TICC, has been developing SGML document DTDs since 1989. Initially, three specifications were developed by the ATA Text Working Group to define requirements for DTD development. Those specifications are the DTD General Requirements, the DTD Technical Requirements, and the Guide to Functional Requirements. There are currently sixteen document DTDs published in the ATA Spec 2100 covering digital data that were developed according to those requirements. In addition, several other DTDs have been developed by manufacturers to deliver manuals not explicitly covered by ATA requirements. Most of these DTDs have been developed using the requirements defined in Spec 2100. Extensive processes have been put into place for the generation, distribution, and manipulation of SGML instances by both the data producers and the data receivers.
 
The development of XML as an application profile of SGML comes at a time when the ATA is stabilizing its document interchange model and developing a data model based on business processes and information interchange. The introduction of XML, and its potential for web applications, has a significant impact on the ATA interchange model. XML has the potential to provide a new direction for the implementation of the ATA information interchange model. The following sections address each of the parts of the XML specification and how the ATA model might be impacted.
 
 

XML Part I

 
 

Approach

 
The ATA requirements documents that guided the development of the industry DTDs were guidelines, and not true application profiles of SGML. The actual ATA DTD development activity, however, excluded many of the more complex syntactical constructs of SGML. In taking this approach, the ATA DTD syntax was developed using many of the same constraints used to define the XML specification.
 
 

Common Structures

 
The ATA Technical Requirements Document defines the required markup for common structures used in many of the digital documents. The common structures identified are revision management and identification, effectivity management, COC (Customer Originated Change) markup, graphics callouts and referencing mechanisms, character sets, the SGML declaration, standard text entities and paragraphs, cellular tables, document level attributes, referencing mechanisms, list structures, document hierarchy, public identifiers, and DTD presentation. There are several major areas where these requirements or implementation practices are in conflict with the base XML syntax.
 
 

Terminology

 
The terminology used in the XML specification is not precise enough to lead to unambiguous interpretation of the specification. There is a disturbing term defined in section 1.3 of the XML specification. The use of the word "may" is defined to mean "Conforming data and XML processors are permitted to, but need not behave as described". A quick review of the occurrences of the word "may" in the specification revealed a large number of unintentional, incidental occurrences, which invalidate many of the requirements. Testing conformance of applications to a profile cannot be performed unless requirements are explicitly stated. If, as the abstract of the specification states, XML "is an extremely simple dialect of SGML", optional behavior of conforming applications cannot be allowed.
 
 

Element Content

 
Since tag omission has been eliminated in XML, the standard SGML "- -" tag omission indicators (<!ELEMENT mytag - - (#PCDATA)>) have been eliminated from the element declaration (<!ELEMENT mytag (#PCDATA)>). All ATA DTDs would have to be processed to convert them to the valid XML syntax. This would be a one-time activity and would not have a great impact on implementations.
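
The one-time conversion described above could be scripted. The following is a minimal Python sketch, not an ATA tool: the function name strip_tag_omission is hypothetical, and the pattern assumes that each declaration names a single element (no SGML name groups).

```python
import re

# Matches the start of an SGML element declaration that carries the two
# tag-omission indicators ("-" or "O"), e.g. <!ELEMENT mytag - - (#PCDATA)>.
_OMISSION = re.compile(r"(<!ELEMENT\s+\S+)\s+[-Oo]\s+[-Oo]\s+")


def strip_tag_omission(dtd_text: str) -> str:
    """Drop the tag-omission indicators from every element declaration."""
    return _OMISSION.sub(r"\1 ", dtd_text)


if __name__ == "__main__":
    print(strip_tag_omission("<!ELEMENT mytag - - (#PCDATA)>"))
```

Declarations that already lack the indicators are left unchanged, so the script can be rerun safely on a partly converted DTD.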
 
 

Exceptions

 
The XML specification explicitly forbids the use of exceptions (inclusion and exclusion constructs). In the ATA DTDs, inclusions are used extensively for revision and COC markup and for effectivity markup. Exclusions are used occasionally to avoid recursive definitions in content models.
 
In order to satisfy XML syntax requirements, the ATA DTDs would have to be redesigned so that each content model states explicitly what the exceptions now permit or prohibit.
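
As a first step toward such a redesign, the declarations that rely on exceptions can be inventoried. The Python sketch below is an illustration only: the function name find_exceptions and the element names in the sample are hypothetical, and it assumes that exceptions are written in the usual form, a space followed by +(...) or -(...) after the content model.

```python
import re

# One element declaration: the element name, then everything up to ">".
_DECL = re.compile(r"<!ELEMENT\s+(\S+)([^>]*)>")
# An exception group: whitespace, "+" (inclusion) or "-" (exclusion), then (...).
_EXCEPTION = re.compile(r"\s([+-])\(([^)]*)\)")


def find_exceptions(dtd_text):
    """Return (element, kind, members) for every inclusion or exclusion."""
    found = []
    for name, body in _DECL.findall(dtd_text):
        for sign, group in _EXCEPTION.findall(body):
            kind = "inclusion" if sign == "+" else "exclusion"
            members = [m.strip() for m in group.split("|")]
            found.append((name, kind, members))
    return found


if __name__ == "__main__":
    sample = """<!ELEMENT task - - (title, para+) +(revst | revend)>
<!ELEMENT entry - - (#PCDATA | para)* -(table)>
<!ELEMENT title - - (#PCDATA)>"""
    for row in find_exceptions(sample):
        print(row)
```

Each reported inclusion marks an element that would have to be written into the affected content models by hand; each exclusion marks a content model that needs a narrower definition.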
 
 

Revision and COC Markup

 
Revision and COC identification are implemented by using empty start tags to indicate the start and end of revised or customer-revised data. While this model allows markup to occur across tag boundaries, many implementations only apply the markup around PCDATA. Under XML, revision and COC constructs would have to be paired tag sets with PCDATA content, included in an OR group wherever else PCDATA is allowed, using the mixed content model allowed in XML syntax. Priority of application of revision and COC markup would have to be explicitly defined in the content model.
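
The change from empty marker tags to a paired element can be sketched for the common case in which the markers enclose only character data. This is an assumption-laden Python illustration: the element names revst, revend, and rev and the function name pair_revision_markers are hypothetical, and marker pairs that span other tags are deliberately left untouched for manual review.

```python
import re

# A start marker, a run of character data with no markup, and an end marker.
_MARKED_TEXT = re.compile(r"<revst>([^<]*)<revend>")


def pair_revision_markers(text: str) -> str:
    """Rewrite <revst>text<revend> as <rev>text</rev> when only text is enclosed."""
    return _MARKED_TEXT.sub(r"<rev>\1</rev>", text)


if __name__ == "__main__":
    print(pair_revision_markers("<para>Torque to <revst>45 in-lb<revend>.</para>"))
```

Any markers that remain after the pass identify revisions that cross tag boundaries, which is where the content models would need the most attention.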
 
 

Effectivity

 
Effectivity is often allowed as an inclusion to account for authoring practices, rather than intended authoring rules. The ATA has recently revisited its effectivity authoring practices and a revised set of effectivity authoring rules has been generated. Based on those rules, the DTDs could be redesigned to explicitly specify effectivity as part of the appropriate content models.
 
 

Exclusion in Table Model

 
The ATA Technical Requirements Document specifies the use of the CALS (Continuous Acquisition and Life-cycle Support) cellular table model. In order to avoid tables within tables at the entry level, table was excluded from the table content model in the ATA requirements. A more careful definition of the entry content model would solve this conflict with XML syntax.
 
 

The SGML Declaration

 
XML explicitly defines all the parameters in the SGML declaration for all XML applications. No SGML declarations are allowed for specific applications. In general, the SGML declaration specified for the XML syntax supports the ATA implementations of SGML. The only real variance occurs in the specification of the LCNMCHAR and UCNMCHAR values. In addition to the use of the hyphen (-) and the period (.), the ATA SGML Declaration allows for the use of the underscore (_) as a valid NAME character. The underscore is used in many applications as a separator in the NAME value to support the ID/IDREF construct. When dealing with IDs, the underscore provides a convenient separator that is distinct from the hyphen. Conforming to XML syntax in this case has no impact on the DTDs, but does have an impact on many systems that are currently in a production environment.
 
 

Character Sets

 
The ATA Technical Requirements Document specifies several ISO Entity character sets as defined in ISO 8879:1986. These character sets include ISOtech, ISOpub, ISOnum, and ISOgrk1. In addition, data providers call out additional ISO Entity character sets and, in some cases, private character sets, as required by the content of a document. The adoption of ISO 10646:1993 UCS-2 (Unicode) by the XML specification opens the way to a much richer character set. The majority of character entities used in ATA applications would have to be mapped to the Unicode set and many processes modified to support this change.
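
The mapping step can be illustrated with a small table-driven substitution. The Python sketch below is not a complete mapping: it covers five entities only (deg and plusmn from ISOnum, mdash from ISOpub, agr from ISOgrk1, infin from ISOtech), the function name to_char_refs is hypothetical, and references that are not in the table are passed through unchanged so that they can be reviewed.

```python
import re

# A deliberately tiny sample of ISO 8879 entity names and their code points.
ENTITY_TO_CODEPOINT = {
    "deg": 0x00B0,     # ISOnum: degree sign
    "plusmn": 0x00B1,  # ISOnum: plus-or-minus sign
    "mdash": 0x2014,   # ISOpub: em dash
    "agr": 0x03B1,     # ISOgrk1: Greek small letter alpha
    "infin": 0x221E,   # ISOtech: infinity
}

_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def to_char_refs(text: str) -> str:
    """Replace known entity references with hexadecimal character references."""
    def replace(match):
        codepoint = ENTITY_TO_CODEPOINT.get(match.group(1))
        if codepoint is None:
            return match.group(0)  # unknown entity: leave as written
        return f"&#x{codepoint:X};"
    return _ENTITY_REF.sub(replace, text)


if __name__ == "__main__":
    print(to_char_refs("Heat to 120&deg;F &plusmn; 5"))
```

Private character sets would need their own rows in the table, agreed between the data provider and the receivers.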
 
 

Markup of EMPTY Elements

 
Since the ability to declare end tags as omissible has been eliminated in XML, EMPTY elements (start tags with no end tags) must be written with a new close tag delimiter, etagc (/>), to signal empty elements to an application (<STAG/>). There are several examples of the use of EMPTY elements in the ATA DTDs. Alignment with XML syntax would not involve any major changes to ATA specifications, but would require data generating applications to change the way empty elements are marked up.
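
For existing instances, the change in empty-element markup is mechanical once the set of EMPTY elements is known from the DTD. The following Python sketch assumes that set is supplied by the caller; the function name close_empty_elements and the element names graphic and pgbrk are hypothetical, and attribute values containing ">" are not handled.

```python
import re


def close_empty_elements(text: str, empty_names) -> str:
    """Rewrite <name ...> as <name .../> for each element declared EMPTY."""
    names = "|".join(re.escape(n) for n in sorted(empty_names))
    # The lookbehind skips tags that already end in "/>".
    pattern = re.compile(rf"<({names})(\s[^<>]*?)?\s*(?<!/)>")
    return pattern.sub(lambda m: f"<{m.group(1)}{m.group(2) or ''}/>", text)


if __name__ == "__main__":
    source = '<para>See <graphic name="f1"> before <pgbrk>the test.</para>'
    print(close_empty_elements(source, {"graphic", "pgbrk"}))
```

Tags that are already in the XML form are not matched, so running the conversion twice gives the same result as running it once.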
 
 

NUMBER and NUTOKEN Attributes

 
Among other attribute types, NUMBER and NUTOKEN attributes are forbidden in XML applications. Attributes of type NUMBER are used frequently in the ATA DTDs. In many cases this was done to clearly define the intent of the attribute value. The XML specification disallows the use of NUMBER attributes and specifies application specific validation to ensure proper values. In general, this would not be a problem for the ATA applications, since these values are typically produced to conform to valid values.
 
The main problem with this change occurs because of the use of the CALS cellular table model. Both NUMBER and NUTOKEN values are specified in that model. Even updating the ATA requirements to align with the Exchange Table Model Declaration Module as defined by SGML Open (now OASIS (Organization for the Advancement of Structured Information Standards)) would not solve this problem, since NUMBER and NUTOKEN are also used as attribute values in that model. It is assumed that OASIS will address this issue and produce an XML compliant table model that could be adopted. Adopting changes to the table model would require one-time changes to most of the ATA DTDs, but would not impact current processes.
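
The application-specific validation mentioned above might look like the following Python sketch. It is an illustration under stated assumptions: the rule table lists a few table attributes as they are commonly declared in the CALS model and should be checked against the DTD actually in use, the function name invalid_attributes is hypothetical, and the NUTOKEN pattern assumes the reference concrete syntax (a digit followed by letters, digits, periods, or hyphens).

```python
import re
import xml.etree.ElementTree as ET

NUMBER = re.compile(r"[0-9]+\Z")                 # one or more digits
NUTOKEN = re.compile(r"[0-9][A-Za-z0-9.-]*\Z")   # name token starting with a digit

# Assumed sample of (element, attribute) pairs and the SGML type each once had.
RULES = {
    ("tgroup", "cols"): NUMBER,
    ("colspec", "colnum"): NUMBER,
    ("colspec", "charoff"): NUTOKEN,
    ("entry", "morerows"): NUMBER,
}


def invalid_attributes(xml_text: str):
    """Return (element, attribute, value) for each value that breaks its rule."""
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        for attr, value in elem.attrib.items():
            rule = RULES.get((elem.tag, attr))
            if rule is not None and not rule.match(value):
                problems.append((elem.tag, attr, value))
    return problems


if __name__ == "__main__":
    table = ('<table><tgroup cols="2"><colspec colnum="one" charoff="50"/>'
             '<tbody><row><entry morerows="1">a</entry></row></tbody>'
             '</tgroup></table>')
    print(invalid_attributes(table))
```

A check of this kind restores, outside the parser, the constraint that the NUMBER and NUTOKEN declarations used to enforce inside it.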
 
 

Whitespaces

 
The XML specification addresses whitespace by providing a special attribute (XML-SPACE) to control treatment of whitespace. Allowable values are DEFAULT and PRESERVE. In many ATA applications, long character strings (as in paragraphs or prior to begin tags where the mixed content model allows #PCDATA) will have embedded newline characters to facilitate readability and editability. The default behavior of an XML application may be to preserve whitespace and this may not produce the intended result. Current SGML applications tend to ignore extra whitespace, allowing for the use of newline characters for readability/editability purposes. There are cases where preserving whitespace will allow the elimination of awkward constructs used in ATA applications, such as the text graphic. Implementing new rules for whitespace in the data production process will require changes to many systems in the industry.
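
One way a receiving application could recover the familiar SGML behavior is to collapse whitespace itself, except where preservation is requested. The Python sketch below uses the spelling xml:space with the values default and preserve, which is how XML 1.0 as published names this attribute; the function name collapse and the element names (including txtgrphc for a text graphic) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
_WHITESPACE_RUN = re.compile(r"\s+")


def collapse(elem, preserve=False):
    """Collapse whitespace runs to one space unless xml:space='preserve' applies."""
    # An explicit xml:space on this element overrides the inherited setting.
    preserve = {"preserve": True, "default": False}.get(elem.get(XML_SPACE), preserve)
    if not preserve and elem.text:
        elem.text = _WHITESPACE_RUN.sub(" ", elem.text)
    for child in elem:
        collapse(child, preserve)
        # Text after a child belongs to this element, so this element's setting applies.
        if not preserve and child.tail:
            child.tail = _WHITESPACE_RUN.sub(" ", child.tail)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<task><para>Remove the\n   access panel.</para>"
        '<txtgrphc xml:space="preserve">A  B\nC</txtgrphc></task>'
    )
    collapse(doc)
    print(repr(doc[0].text), repr(doc[1].text))
```

The paragraph loses its editing newline while the text graphic keeps its layout, which is the distinction the paragraph above describes.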
 
 

XML Part II

 
The base XML syntax allows for the use of the ID/IDREF syntax for internal linking. The current ATA linking mechanism makes use of the ID/IDREF mechanism for intra-document navigation and a reference locator that must be post-processed for inter-document navigation. From a purely interchange standpoint, the current model can continue to be used.
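
The split between intra-document and inter-document references can be made visible with a simple check. The Python sketch below assumes, for illustration only, that identifiers are carried in an attribute named id and references in an attribute named refid; both names, the element names in the sample, and the function name unresolved_refs are hypothetical.

```python
import xml.etree.ElementTree as ET


def unresolved_refs(xml_text: str, id_attr: str = "id", ref_attr: str = "refid"):
    """List reference values that match no identifier in the same document."""
    root = ET.fromstring(xml_text)
    ids = {e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None}
    return [
        e.get(ref_attr)
        for e in root.iter()
        if e.get(ref_attr) is not None and e.get(ref_attr) not in ids
    ]


if __name__ == "__main__":
    sample = '<doc><task id="t1"/><refint refid="t1"/><refint refid="t9"/></doc>'
    print(unresolved_refs(sample))
```

References that resolve are ordinary ID/IDREF links; the ones reported are candidates for the post-processed, inter-document path.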
 
The XLL (XML Linking Language) proposal on additional linking techniques provides for the use of web oriented URLs, HyTime links, and TEI extended pointers. These techniques provide mechanisms that could facilitate the delivery of data on the web and address the inter-document navigation process that must be performed after delivery in the current ATA methodology. Incorporating these new mechanisms into the current DTDs would require a great deal of work and testing. The XML linking proposal represents an opportunity that must be considered in the implementation of an information interchange model according to the ATA data model.
 
While the XLL proposal offers various methods of linking, it will be extremely important to the ATA that applications do not implement the linking specification only partially. Additional costs to implement the more complicated linking schemes will make XML less attractive as an industry solution.
 
 

XML Part III

 
The proposal for XSL (Extensible Style Language) and its adoption by vendors as a common style syntax for XML delivery is key to making this a successful standard. In the current SGML interchange transaction, receivers often use different applications for browsing, editing, and publishing. This has required the development of different style sheets using different syntaxes for each application. The development of these style sheets is time consuming and expensive to the data receivers. A common style syntax would alleviate some of this burden and allow XML to become a truly usable standard.
 
Again, while the XSL proposal offers a couple of methods of specifying style, it will be extremely important to the ATA that applications do not implement the style specification only partially. Additional costs to implement the more complicated style schemes will make XML less attractive as an industry solution.
 
 

Conclusions

 
The impacts of introducing XML into the ATA processes fall into two general categories.
 
 

Current - Document Interchange Model

 
Within the ATA, the document oriented DTDs that are currently in use for interchange are in a stabilized mode and participants have invested a great deal of money in implementing them. The benefits of digital data are only now beginning to be realized for these participants. The current prevailing attitude in the industry is that any significant change to the DTDs would have a serious impact on the business processes of all concerned.
 
While there are many features of XML that would greatly enhance and tighten the quality of the interchange, the cost to implement them may be prohibitive. Even the availability of low-cost tools for web applications will probably not offset the investment made to date in this model. It would take a powerful business case to reverse this attitude, especially considering the time it may take for XML to become a standard and for XML tools to become readily available and truly robust.
 
Until then, the document oriented interchange model currently in use has to be considered an interim model until a data model is available to support information interchange.
 
 

Future - Information Interchange Model

 
The development of an ATA data model aligned with business processes provides an opportunity to redesign the digital data delivery process within the industry.
 
When the data model is complete, an XML DTD module should be designed to correspond to each of the entities in the model. Databases in both the data generators' and data receivers' environments should correspond to the data model, and XML should be employed as the delivery mechanism, either for populating those databases or for viewing in a web environment.
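
The correspondence between data model entities and DTD modules could be generated rather than written by hand. The Python sketch below is purely illustrative, since the ATA data model is not described in this paper: the entity name part, its fields, and the function name dtd_module are hypothetical, and treating every field as plain character data is a simplifying assumption.

```python
def dtd_module(entity: str, fields) -> str:
    """Emit XML element declarations for one entity and its fields."""
    lines = [f"<!ELEMENT {entity} ({', '.join(fields)})>"]
    # Assumption: each field holds character data only.
    lines += [f"<!ELEMENT {field} (#PCDATA)>" for field in fields]
    return "\n".join(lines)


if __name__ == "__main__":
    print(dtd_module("part", ["partno", "nomen"]))
```

Generating the modules from the model keeps the database definitions and the delivery DTDs from drifting apart, which is the alignment the paragraph above calls for.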
 
The web will play a significant role in this delivery scenario, and the availability of XML tools for viewing and editing will significantly reduce the investment required in digital data implementations. The XML linking mechanisms could even provide direct access to databases, reducing the replication of data.
