A Topic Map for SGML 97 Proceedings   Table of contents   Indexes   Bottoms-Up, A Paradigm Shift

 Bradley  Neil 
 

The Role of Industry Standard DTDs

 

Abstract:

 DTDs that are created for use in a specific industry often invoke strong feelings. Those who see SGML primarily as a means to exchange documents see them as essential to this task. Those implementing SGML primarily for its other benefits, such as multiple media publishing, may see such DTDs as an irrelevance. In this paper it is argued that neither position is totally satisfactory, and that while an industry standard DTD (if there is one) should play some part in the process of implementing an SGML solution, it should be compared to the real requirement, and if necessary tailored to suit this need.
 In addition, two separate issues need to be distinguished. Whilst an entire DTD may be targeted at a specific industry, fragments of that DTD may solve more general problems. These fragments will be discussed separately.
 

Why use an IS DTD?

 Some governments and industry-wide standardisation committees have produced Industry Standard (IS) DTDs to facilitate the interchange of documentation between organisations. There are some powerful reasons for using such a DTD.
 Consider the case of two organisations that wish both to produce and to exchange documentation. By adopting an IS DTD, both organisations avoid the substantial costs associated with document analysis and DTD design.
 In addition, information can be exchanged between these organisations with a minimum of fuss. The recipient of such information is at a particular advantage: it will already have a data repository that understands the document structures, and style-sheet filters configured to render the documents on screen or on paper. The DTD itself is already understood and does not have to be exchanged along with the data, and products that require some kind of compiled version of the DTD do not have to be re-configured.
 

Why avoid an IS DTD?

 There are also some drawbacks to the use of IS DTDs, which derive from the fact that the needs of one organisation will rarely match exactly the needs of another. Each organisation tends to make different decisions concerning the content of their documentation (perhaps for reasons of commercial advantage), even in tightly regulated industries.
 First, the IS DTD may not identify every feature of the documents produced by the implementor. If this problem is ignored, important information will either not be tagged at all, or will be tagged inappropriately. Such information will be difficult to identify for searching or extraction purposes.
 Second, the IS DTD may contain elements that will never be used by the implementor. The presence of these elements on selection menus will confuse authors, and in the DTD they will likewise confuse programmers.
 Note that a subtle variant or combination of these first two factors may be encountered. For example, if an IS DTD contains several elements to describe paragraph levels, such as "P0", "P1" and "P2", but the implementor wishes instead to use a single level of paragraph, then there are both unnecessary elements which are to be ignored ("P1" and "P2"), and also an inappropriate element name for a simple paragraph, "P0", which would be better named "PARA".
 Third, in order to satisfy the varying needs of many organisations in the industry, the IS DTD rules may be too flexible. For example, the DTD may allow an author name to appear before or after a publication title, and also allow it to be absent. Document authors could then break an in-house rule stating that the author name must always appear, and must be located before the title. Every unnecessary degree of freedom may also add to the difficulty of writing translation filters, and certainly adds to the difficulty of developing and testing style-sheets. Worse still, there may be more than one mechanism included to model a particular data structure, and it would be unfortunate if document authors were able to choose a model at random.
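 The author and title example can be made concrete with a small sketch. It assumes two hypothetical content models, written here as regular expressions over the sequence of child element names rather than in DTD syntax, and lists the orderings that each accepts. The names LOOSE, TIGHT, AUTHOR and TITLE are illustrative and are not taken from any real IS DTD.

```python
import re
from itertools import product

# Hypothetical content models, expressed as regular expressions over a
# sequence of child element names (each name is followed by one space).
# Loose (industry-wide): author may precede the title, follow it, or be absent.
LOOSE = re.compile(r"(AUTHOR TITLE |TITLE (AUTHOR )?)")
# Tight (in-house): author is mandatory and must precede the title.
TIGHT = re.compile(r"AUTHOR TITLE ")


def accepts(model, children):
    """Return True if the sequence of child element names matches the model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None


def accepted_sequences(model, names=("AUTHOR", "TITLE"), max_len=3):
    """Enumerate every sequence of up to max_len children that the model accepts."""
    return [
        seq
        for n in range(max_len + 1)
        for seq in product(names, repeat=n)
        if accepts(model, seq)
    ]


if __name__ == "__main__":
    print("loose:", accepted_sequences(LOOSE))
    print("tight:", accepted_sequences(TIGHT))
```

 Under these assumptions the loose model accepts three orderings and the tight model only one, so every extra ordering is a case that a style-sheet or translation filter must handle and test.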
 

The Pragmatic Solution

 Are the needs for exchange of information more important than the needs of accurate in-house data modelling? In most cases, it is possible to take a middle road, perhaps veering toward one extreme or the other, depending on the circumstances. This means taking account of an IS DTD, but modifying it to fulfil the real need.
 Although this approach hinders the transfer of documents between two organisations that have both modified the IS DTD, at least there will be some degree of commonality. To take a simple example, if the IS DTD contains an element that describes a paragraph, and calls it "PARA", then both derived DTDs would most likely contain an element of this name. If no account of the IS DTD had been taken, one implementor might have called it "P", and the other might have called it "PAR", which would confuse anyone trying to write a translation filter (well, perhaps not in this case, but certainly in more complex cases).
 The DTD designer should therefore first analyse current documents and the future needs of the organisation, then compare the results against a suitable IS DTD. Redundant elements should be removed, additional ones added, and loose occurrence rules tightened.
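 The comparison step can be sketched as a simple set comparison between the element names found during document analysis and those declared in the IS DTD. This is only an illustration: the function name compare_with_is_dtd and the element names are hypothetical, and the sketch says nothing about tightening occurrence rules, which still requires inspecting each content model.

```python
def compare_with_is_dtd(needed, declared):
    """Split element names into those to keep, remove and add.

    needed   -- element names identified by in-house document analysis
    declared -- element names declared in the IS DTD
    """
    needed, declared = set(needed), set(declared)
    return {
        "keep": sorted(needed & declared),     # declared and actually required
        "remove": sorted(declared - needed),   # redundant for this implementor
        "add": sorted(needed - declared),      # missing from the IS DTD
    }


if __name__ == "__main__":
    # Hypothetical element names, for illustration only.
    is_dtd = {"TITLE", "AUTHOR", "P0", "P1", "P2"}
    in_house = {"TITLE", "AUTHOR", "P0", "PARTNO"}
    print(compare_with_is_dtd(in_house, is_dtd))
```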
 This process is not always as destructive as it first looks when considering how to transfer data to an organisation using the original IS DTD. Tightening of context and occurrence rules has no effect on the validity of documents when they are later parsed against the IS DTD (the parser does not know or care that the documents were created under a less liberal regime). Likewise, removing unnecessary elements usually means only that they are not available to document authors; the elements that authors can still access remain compliant (the exception being when the IS DTD actually requires the deleted element to be present). Only the addition of new elements guarantees problems, and these can be resolved simply by either removing or renaming the added elements before the documents are transferred.
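 A pre-transfer filter of this kind might look like the following sketch. It assumes that the document instance is fully tagged and well formed enough to be read by Python's xml.etree.ElementTree; an SGML instance that relies on tag minimisation would first have to be normalised by an SGML parser. The function names and the element names PARTNO, NOTE and EMPH are hypothetical. Here "removing" means deleting the element together with its content, which is only one possible reading, and the root element itself is never renamed or removed.

```python
import xml.etree.ElementTree as ET


def _drop_child(parent, child):
    """Delete child (and its content) but keep the text that followed it."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def prepare_for_transfer(root, rename=None, remove=()):
    """Rename or delete locally added elements in place, then return root.

    rename -- mapping from an added element name to an IS DTD element name
    remove -- names of added elements to delete together with their content
    """
    rename = rename or {}
    # Take a snapshot of the tree first, so that edits do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in remove:
                _drop_child(parent, child)
            elif child.tag in rename:
                child.tag = rename[child.tag]
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        "<DOC><PARA>See <PARTNO>X1</PARTNO> and <NOTE>n</NOTE> end.</PARA></DOC>"
    )
    prepare_for_transfer(doc, rename={"PARTNO": "EMPH"}, remove={"NOTE"})
    print(ET.tostring(doc, encoding="unicode"))
```

 Renaming keeps the content and merely loses the more specific markup, whereas removal loses the content as well, so the choice between the two should be made element by element.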
 

Useful DTD Fragments

 Some industry standard DTDs may attempt to solve more general problems. For example, the CALS DTDs had to include a model for producing tabular material, and product vendors wishing to sell to the defence market had to translate this model into a true tabular representation for displaying or printing tables. With this problem solved, and with the same products now widely used outside the defence industry, other DTDs have incorporated the CALS table fragment so as to use the existing capability of these products.
 In these situations, there should be far less room for compromise. Where many SGML-aware products must be aware of the model, it is vital that people do not tinker with it. In this case it is far better to have a simple but robust model, widely supported by software, than to have many subtle variants, each unsupported and incompatible.
 The current situation as regards mathematical formulae proves the point. There are at least three models still vying for attention, with most products supporting a subset of them.
