Using XML in a Multi-Format Publishing Environment
Geoff Nolan
Senior Systems Engineer
Turn-Key Systems Pty Limited
203/511 Pacific Highway
Crows Nest NSW Australia 2065
Phone: +61 2 9906 1577
Fax: +61 2 9906 1342
Email: gjn@turnkey.com.au
Web: www.turnkey.com.au
Biographical notice: |
Geoff Nolan |
ABSTRACT:
XML has been developed primarily as a means of information storage and interchange. However, it can also be used very effectively for real world publishing. It is particularly suitable as a means of maintaining material which is to be published in a variety of output formats (such as print, CD, Web). |
Introduction |
|
Turn-Key has been developing publishing software for over 25 years, so we approach XML primarily as a means of document markup in real-world production environments. As a result, some may view our approach as novel, others as unorthodox. Nevertheless, we have found the techniques described below highly effective in producing flexible and useful markup with very low design and implementation overheads.
Tip 1: Develop a Consistent Markup Design Philosophy |
Levels |
|
Two typical level definitions might be as follows: |
Content and Effects |
|
There are three main content elements: <label>, <desc> and <p>. The last introduces a general content unit at the current level; its name reflects the fact that it typically (though not universally) manifests as a paragraph in print. We reserve the <para> tag for those documents which actually contain a paragraph level. Content tags mark the transition from element content to mixed content.
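Whether a given element has element content or mixed content can be checked mechanically. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; it treats an element as having mixed content if it carries any non-whitespace character data of its own, so indentation whitespace between level elements is ignored. The level element <item1> and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(elem: ET.Element) -> bool:
    """True if elem holds non-whitespace character data (mixed content)."""
    # Text before the first child, and the tail text after each child,
    # both count as character data belonging to elem itself.
    chunks = [elem.text or ""] + [child.tail or "" for child in elem]
    return any(chunk.strip() for chunk in chunks)

# Hypothetical level element <item1> holding the three content elements.
doc = ET.fromstring(
    "<item1>"
    "<label>(a)</label>"
    "<desc>Scope</desc>"
    "<p>This Part applies to <quote>all</quote> contracts.</p>"
    "</item1>"
)

for elem in doc.iter():
    kind = "mixed" if has_mixed_content(elem) else "element"
    print(f"<{elem.tag}>: {kind} content")
```

Here <item1> reports element content, while <label>, <desc> and <p> (and the effect <quote> inside <p>) report mixed content.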
The final class of tags comprises the effects. These range from the trivial (such as <sub> for subscripts) to the highly sophisticated (such as <quote>, which can change fonts, infer quote marks of varying types, and invoke hyperlinks in electronic products).
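As an illustration of how an effect like <quote> can supply its own punctuation, here is a minimal Python sketch that renders <quote> by inferring the quote marks from nesting depth, so the marks never have to be keyed. The house style in QUOTE_MARKS (single quotes outermost, double quotes for a nested quotation) is an assumption, and font changes and hyperlinks are not modeled.

```python
import xml.etree.ElementTree as ET

# Assumed house style (hypothetical): single quotes outermost, double quotes
# for a quotation nested inside another; the pattern repeats for deeper levels.
QUOTE_MARKS = [("\u2018", "\u2019"), ("\u201c", "\u201d")]

def render(elem: ET.Element, depth: int = 0) -> str:
    """Flatten elem to plain text, inferring quote marks for each <quote>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "quote":
            open_mark, close_mark = QUOTE_MARKS[depth % len(QUOTE_MARKS)]
            parts.append(open_mark + render(child, depth + 1) + close_mark)
        else:
            # Other effects are passed through as plain text in this sketch.
            parts.append(render(child, depth))
        parts.append(child.tail or "")
    return "".join(parts)

p = ET.fromstring(
    "<p>He said <quote>the term <quote>reasonable</quote> is undefined</quote>.</p>"
)
print(render(p))  # He said ‘the term “reasonable” is undefined’.
```

Changing the house style then means editing one table rather than re-keying every quotation in the source.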
Heresy 1: The DTD is Used to Support the Markup, Not to Enforce It! |
|
HTML has shown us that a strict DTD is not necessary to provide a functional markup, while XML offers the option of doing away with the DTD altogether. In fact, we are not advocating the abolition of the DTD just yet, though that time may come.
This is caused by two main weaknesses in DTD functionality. Firstly, there is no facility for context-sensitive content models. Thus, one cannot specify that a <p> may contain an <item3> only when the <p> is itself within an <item2>. One could, of course, define <item1.p>, <item2.p> and so on, but this reduces the level of tag abstraction, simplicity and reusability.
Secondly, the DTD cannot validate the text of an element; for example, it cannot ensure that the text inside a <date>..</date> pair is actually a valid date.
At Turn-Key, we have been considering using XSL for this purpose. The draft specification for XSL suggests the possibility of designing a validating stylesheet which can supplement (or even replace) the DTD by testing the document instance against a number of context-sensitive (and content-sensitive) rules. Such issues are, however, outside the scope of this presentation.
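To make the two kinds of rule concrete, here is a minimal sketch of such a rule-based check. It is written in Python with xml.etree.ElementTree rather than as an XSL stylesheet, so it only illustrates the idea, not the XSL mechanism. Two interpretations are assumptions: "within an <item2>" is read as "whose immediate parent is <item2>", and dates are assumed to be keyed in ISO 8601 form (YYYY-MM-DD).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: dates are keyed as ISO 8601 (YYYY-MM-DD).
DATE_FORMAT = "%Y-%m-%d"

def validate(root: ET.Element) -> list[str]:
    """Apply two rules a DTD cannot express; return one message per violation."""
    errors = []
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Context rule: a <p> may contain <item3> only if the <p> sits in an <item2>.
    for p in root.iter("p"):
        parent = parent_of.get(p)
        if parent is None or parent.tag != "item2":
            for item in p.findall("item3"):
                errors.append(f"<item3> in <p> outside <item2>: {item.text!r}")

    # Content rule: the text of every <date> must be a real calendar date.
    for date in root.iter("date"):
        text = "".join(date.itertext()).strip()
        try:
            datetime.strptime(text, DATE_FORMAT)
        except ValueError:
            errors.append(f"invalid <date>: {text!r}")
    return errors

doc = ET.fromstring(
    "<item1>"
    "<p>Made <date>1998-11-30</date>. <item3>misplaced</item3></p>"
    "<item2><p>Commenced <date>1998-02-30</date>. <item3>allowed</item3></p></item2>"
    "</item1>"
)
for message in validate(doc):
    print(message)
```

Both violations pass a DTD that simply allows <item3> in <p> and #PCDATA in <date>; only the rule-based pass catches them.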
Tip 2: Handy Hints for Generalized Markup |
|
The following are a few guidelines to follow in your markup. They may seem simple enough, but we hardly ever see a document style where at least one of these rules is not broken. |
Remember that, in general, printed output makes significantly greater demands on the markup than CD or HTML formats. Some of the items to be specified include: continuation lines; telltales and running heads; page number gapping; vertical justification; placement and numbering of footnotes; multi-page tables; rotated text; production of title pages, contents, indexes etc. If you can get your print formatting under control, there should be very little to add to accommodate other output media, since most items to be tagged require (or have the potential for) typesetting effects. |
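Footnote numbering is one of the print requirements listed above that can be derived from the markup rather than keyed by hand. The following is a minimal Python sketch, assuming a hypothetical <fn> element that holds the note text inline and a hypothetical empty <fnref n="..."/> marker element for output. Notes are numbered in document order and pulled out of the text stream, so each output format can place them as it sees fit.

```python
import xml.etree.ElementTree as ET

def number_footnotes(root: ET.Element) -> list[tuple[int, str]]:
    """Replace each <fn> with a numbered <fnref n="..."/> marker.

    Returns (number, note text) pairs so the notes can be placed later:
    at the foot of the page in print, or as pop-ups or endnotes elsewhere.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    notes = []
    # Snapshot the footnotes in document order before mutating the tree.
    for n, fn in enumerate(list(root.iter("fn")), start=1):
        notes.append((n, "".join(fn.itertext()).strip()))
        marker = ET.Element("fnref", n=str(n))
        marker.tail = fn.tail  # keep the text that followed the note
        parent = parent_of[fn]
        position = list(parent).index(fn)
        parent.remove(fn)
        parent.insert(position, marker)
    return notes

doc = ET.fromstring(
    "<p>First claim<fn>Note A.</fn> and second claim<fn>Note B.</fn>.</p>"
)
notes = number_footnotes(doc)
print(notes)  # [(1, 'Note A.'), (2, 'Note B.')]
print(ET.tostring(doc, encoding="unicode"))
```

Because the numbers are generated, inserting or deleting a note never requires renumbering the source.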
Heresy 2: Too Much Tagging is as Bad as Too Little! |
|
Many practitioners of SGML regard extensive tagging as a sign of good markup. However, each additional tag carries with it a significant cost. |
Second, and perhaps even more important, the smaller the number and variety of tags, the greater the likelihood that text can be freely exchanged between publications. Accordingly, we take pains to eliminate as much tagging as possible, particularly style-related tagging, from our documents. This is discussed further in the following section.
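The cost of tagging can at least be measured. Here is a minimal Python sketch that profiles a fragment by element count, number of distinct tags, and the share of source characters spent on markup. The heavily tagged citation uses hypothetical <year>, <vol>, <sc> and <page> tags purely for comparison; they are not part of the markup described here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def tag_profile(xml_text: str) -> dict:
    """Summarize how much tagging a fragment carries, and how varied it is."""
    root = ET.fromstring(xml_text)
    counts = Counter(elem.tag for elem in root.iter())
    text_chars = sum(len(chunk) for chunk in root.itertext())
    return {
        "elements": sum(counts.values()),
        "distinct_tags": len(counts),
        # Fraction of the source characters spent on markup rather than text.
        "markup_ratio": round(1 - text_chars / len(xml_text), 2),
    }

light = "<l.ref>(1992) 175 CLR 1</l.ref>"
heavy = ("<l.ref>(<year>1992</year>) <vol>175</vol> "
         "<sc>CLR</sc> <page>1</page></l.ref>")
print(tag_profile(light))  # {'elements': 1, 'distinct_tags': 1, 'markup_ratio': 0.48}
print(tag_profile(heavy))  # {'elements': 5, 'distinct_tags': 5, 'markup_ratio': 0.79}
```

Both fragments carry exactly the same 16 characters of text; the second spends roughly four characters in five on markup.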
Tip 3: Inferred Effects |
|
One method used for reducing the tag density and complexity in our documents is to use inferred effects. These are usually stylistic effects, such as font changes. However, it is possible to infer complex structures, such as multiple hyperlinks. |
To achieve all this using classical markup techniques, we'd need something like:
This is perfectly correct XML, but far more difficult to read and to enter accurately.
To infer the required effects in print, we use tag conversion applets. These are currently implemented as plug-ins to our typesetting system, though they could equally well be implemented as ECMAScript (JavaScript) routines in an XSL action set. The first plug-in effectively swallows the entire <l.ref>..</l.ref> block and emits a transformed equivalent; in this case the only required effect is to set capitals that follow numbers in small caps. The second plug-in handles the <legn> block and italicizes the Act name, but not the year.
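The plug-ins themselves belong to Turn-Key's typesetting system and are not reproduced here; the following Python sketch only approximates the two described behaviors with regular expressions. The output tags <sc> and <i>, the citation pattern (capitals following a number, as in a law report series abbreviation), and the assumption that the Act name ends with the word "Act" immediately before a four-digit year are all assumptions.

```python
import re

# Hypothetical stand-ins for the two plug-ins; <sc> and <i> are assumed
# output effect tags, not part of the source markup.
CAPS_AFTER_NUMBER = re.compile(r"(\d+\s+)([A-Z]{2,})\b")
ACT_NAME_AND_YEAR = re.compile(r"^(.*?\bAct)\s+(\d{4})\b")

def transform_lref(text: str) -> str:
    """<l.ref>: set capitals that follow a number in small caps."""
    return CAPS_AFTER_NUMBER.sub(r"\1<sc>\2</sc>", text)

def transform_legn(text: str) -> str:
    """<legn>: italicize the Act name but leave the year roman."""
    return ACT_NAME_AND_YEAR.sub(r"<i>\1</i> \2", text, count=1)

print(transform_lref("(1992) 175 CLR 1"))          # (1992) 175 <sc>CLR</sc> 1
print(transform_legn("Trade Practices Act 1974"))  # <i>Trade Practices Act</i> 1974
```

The source text stays free of style tags; the effects exist only in the output the plug-in emits.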
Heresy 3: The PI Can Be Your Friend <? :=) ?> |
|
It is said that the processing instruction is introduced by a question mark as a constant reminder that its use is dubious at best. However, the much-maligned PI can prove very valuable if used with discretion.
The fact that the DTD is unable to verify such PIs is largely irrelevant, as the DTD by its very nature cannot be relied upon to strictly enforce even normal tag usage. |
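Although the DTD ignores processing instructions, an XML parser still reports them to the application, which can check them itself. The following is a minimal Python sketch using the standard xml.sax module; the PI target "tk" and its vocabulary of instructions are hypothetical.

```python
import xml.sax

# Hypothetical vocabulary of print-only instructions for a PI target "tk".
KNOWN_TK_INSTRUCTIONS = {"page-break", "keep-with-next"}

class PICollector(xml.sax.ContentHandler):
    """Record every processing instruction the parser reports."""

    def __init__(self) -> None:
        super().__init__()
        self.pis: list[tuple[str, str]] = []

    def processingInstruction(self, target: str, data: str) -> None:
        self.pis.append((target, data))

doc = b"<p>A long table follows.<?tk page-break?><?tk pagebreak?></p>"
handler = PICollector()
xml.sax.parseString(doc, handler)
print(handler.pis)  # [('tk', 'page-break'), ('tk', 'pagebreak')]

# The DTD never checks PIs, so the application does it instead.
unknown = [data for target, data in handler.pis
           if target == "tk" and data not in KNOWN_TK_INSTRUCTIONS]
print(unknown)  # ['pagebreak']
```

A misspelt instruction is therefore caught at the same point a misused tag would be, just by a different mechanism.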
The moral here is that you can and should use all the tools that XML provides to maximize the effectiveness of your markup. |
Conclusion: |
|
The techniques outlined above have proven highly effective in practice. We have a number of examples of products being converted from various proprietary formats to fully functional XML with a one-off overhead of about 30% on normal production costs and timings. When properly designed and implemented, the XML production environment eliminates most of the problems associated with suspect data and inconsistent tagging. In addition, the new tagging style is generally neater, and easier to learn and maintain. The conversion costs are therefore quickly recovered. |
The lesson here is that, with suitable planning and design, converting even large and complex publishing environments to XML can be both relatively painless and highly profitable. |