
 
 

Aspects, Effectivities, and Variants: Tailoring Documents with SGML
 
Dr. Hans Holger Rath
  Director Consulting
  STEP Stürtz Electronic Publishing GmbH
Technologiepark Würzburg-Rimpar
Pavillon 7
 D-97222 Rimpar   Germany
Phone: +49.(0)9365.8062.0
Fax: +49.(0)9365.8062.66
Email: consulting@step.de Web: www.step.de
 
Biographical notice:
 
Dr. Hans Holger Rath
 
Hans Holger Rath has been director of STEP's 'Consulting' department since April 1998. He started at STEP in April 1996 as senior consultant/project manager. Before he joined STEP he was head of the 'Document Computing' department at ZGDV (Computer Graphics Center, Darmstadt, Germany). Dr. Hans Holger Rath studied computer science in Karlsruhe 1984–1990 and graduated from the TU Darmstadt with the doctoral thesis 'Literate Specifying of Hypermedia Documents' in 1996.
 
He was involved in the DTD development for the DIN (Deutsches Institut für Normung e.V., the German Standards Institute) and ISO (International Organization for Standardization) and cooperates very closely with several publishing houses and the German aircraft industry. All in all he has more than eight years of experience in information architectures and related topics.
 
ABSTRACT:
 

Tailored/customized documents are becoming increasingly important in technical documentation and commercial publications. Technical documents have to be tailored to the customer's environment (country, user skills). Commercial publications have to fit the readers' needs (interests, knowledge, level of requested detail). This paper presents the basic ideas behind variants, aspects, and effectivities. It shows where they are useful and how they are applied. It presents several SGML-based approaches and discusses their advantages and disadvantages.
 
 

Introduction

 

Technical documentation is the classic application field for documents with general and customer/reader-specific content. The aerospace industry in particular, with its huge amounts of documents and its customer-oriented products, uses various techniques to produce all the customer documentation with all its variants controlled by effectivities or aspects. The telecommunication industry (switches) is a second application field of technical documentation with aspects. Its products (mainly hardware and software) are also very individual solutions and need individual documentation. The individual documents are controlled by parameters like 'installed features', 'delivered parts', 'country', and 'user skills'.
 

Technical documentation is not the only field that needs tailored/customized documents; even mass publications like newspapers or magazines should be produced in a customer-oriented way by the publishing houses. This could result in personal editions: some readers prefer sports, others prefer financial information, and still others look only for politics and local news. Electronic delivery would be the best way to bring the information to the consumer. The generation of the personalized publications depends on reader profiles like 'interests', 'knowledge', and 'level of requested detail'.
 
The need for tailored/customized documents cannot be satisfied by capturing and publishing each document individually. This would be far too inefficient and far too expensive. Therefore corporate as well as commercial publishers are forced to deal with document variants from the first step of the document life-cycle: capturing of the data. The other steps, 'management' and 'production', have to deal with variants, too.
 
How can SGML help? Several SGML concepts could be used to solve the problem (attributes on almost every element, EMPTY elements, marked sections, several elements for the different contents, etc.). Unfortunately none of them is perfect, and their disadvantages show that tool support is needed. No SGML concept addresses the problem directly. We have to define our own semantics based on the existing concepts.
 
 

The Problem

 

Variants are controlled by so-called aspects or effectivities. Examples of aspects/effectivities are 'user skill level', 'country', 'installed features for customers A, B, and C', or 'modifications for customers X, Y, and Z'. Each text block belonging to a variant carries a condition. The condition assigns concrete values to aspects, e.g., skill-level=expert and country=USA. The document carries a field for variant control. When a variant is to be generated, this field is filled with the required aspect combination, e.g., (skill-level=intermediate and skill-level=expert) or country=UK. Every text block whose condition evaluates to 'true' is displayed; all other text blocks are hidden. In this example, all texts for intermediate and expert users as well as all texts for UK customers are visible.
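
The evaluation of block conditions can be sketched in a few lines of Python. This is only an illustration under assumptions that the text does not fix: a condition is taken to be a boolean expression over aspect=value assignments joined by 'and'/'or' with optional parentheses, and the variant control field is modelled as a hypothetical `selection` mapping each aspect to the set of values chosen for the variant. The name `evaluate` is likewise invented for this sketch; other readings of the control field are possible.

```python
import re

# Tokens: parentheses, "aspect=value" assignments, and the words and/or.
TOKEN = re.compile(r"\(|\)|[\w-]+=[\w-]+|\band\b|\bor\b", re.IGNORECASE)


def evaluate(condition, selection):
    """Return True if a block condition holds for the selected aspect values.

    `selection` maps a lower-case aspect name to the set of lower-case
    values selected for the variant (an assumed data model).
    """
    tokens = TOKEN.findall(condition)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos].lower() == "or":
            pos += 1
            right = parse_and()
            result = result or right
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos].lower() == "and":
            pos += 1
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing parenthesis
            return result
        aspect, value = token.split("=")
        return value.lower() in selection.get(aspect.lower(), set())

    return parse_or()
```

With `selection = {"skill-level": {"intermediate", "expert"}, "country": {"uk"}}`, a block marked skill-level=expert would be shown and a block marked skill-level=beginner would be hidden.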
 
Production of tailored/customized documents requires 'knowledge' about the content and its structure. On the one hand, SGML provides this knowledge through neutral and semantic markup. On the other hand, SGML's strong structural approach complicates the management of the general and the variant-specific text parts. In most cases content structure and variant structure will differ. In addition, the structure of a document including all its variants is different from the structure of the document with one specific variant.
 
Some real-life examples supporting these statements:
  • A chapter should have one heading, but each variant requires a different heading, which results in a sequence of headings. → In one DTD version <heading> has to be mandatory but not repeatable, in the other DTD it has to be mandatory and repeatable (= repeat problem).
  • The variant for beginners consists of one paragraph; the variant for experts contains additional text that splits this single paragraph into two or more paragraphs, with the beginners' text surrounding the additional text. → Markup surrounding the experts' text starts in one paragraph and ends in another paragraph (= boundary problem).
 

The problem becomes even more complex when link targets are inside variants. Link source and target anchors have to be synchronized considering the possible variants of the document to ensure that no dangling link exists.
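
A minimal sketch of such a synchronization check, assuming a deliberately simple model: each condition is a dictionary of aspect/value assignments that must all hold (an empty dictionary means unconditional), and every combination of aspect values is tried in turn. The names `ASPECTS`, `visible`, and `dangling_links` and the data layout are hypothetical.

```python
from itertools import product

# Assumed aspects and values, taken from the kind of example used in this paper.
ASPECTS = {"skill": ["EXPERT", "BEGINNER"], "country": ["USA", "GERMANY"]}


def visible(cond, setting):
    """A block is visible if every assignment in its condition matches."""
    return all(setting[aspect] == value for aspect, value in cond.items())


def dangling_links(links, targets, aspects):
    """Report (target id, setting) pairs where a visible link has no visible target.

    links   -- list of (condition of the link source, target id)
    targets -- dict mapping target id to the condition of the target block
    """
    problems = []
    names = sorted(aspects)
    for values in product(*(aspects[name] for name in names)):
        setting = dict(zip(names, values))
        for source_cond, target_id in links:
            if not visible(source_cond, setting):
                continue  # the link itself is hidden in this variant
            target_cond = targets.get(target_id)
            if target_cond is None or not visible(target_cond, setting):
                problems.append((target_id, setting))
    return problems
```

An unconditional link to a target that exists only for beginners is reported once for every combination in which skill is EXPERT. Because the number of combinations grows exponentially with the number of aspects, a real system would need something smarter than this brute-force loop.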
 

These differences between SGML structure and variant structure, as well as the links, have to be taken into account by the following three stages of the document life-cycle: capturing, management, publication.
  • During capturing (or editing) of a document the writer wants to know/see the document with all variants as well as with only one specific variant. Both views should allow structure validation. This stage is the most important one, because the writer has to keep all variants under control. Documents with up to 16 different aspects are often encountered.
  • Document management working on SGML entities or element level has to 'know' where variant text parts start and end — even if they do not harmonize with the SGML structure. Especially the conditions for variants have to be under the control of the document management system. Dangling links for all aspect combinations have to be detected.
  • Publication on paper produces one variant and is rather simple. On-line rendering of the data in an SGML viewer requires evaluation of the variant conditions. This could be done in a separate preparation process or on-the-fly in the viewer.
 
Publishing the data on-line requires an additional distinction between dynamic variants and static variants.
  • Dynamic variants are visible at the customer site. The reader can view the different variants. Good examples for the controlling aspects of dynamic variants are 'user skill level' or 'country'.
  • Static variants are not visible at the customer site. The owner of the data has removed those variants from the electronic document which should not be delivered to the customer, e.g., for reasons of confidentiality. Examples for the controlling aspects are 'installed features for customers A, B, and C' as well as 'modifications for customers X, Y, and Z'.
 

A further problem occurs when the effectivities are evaluated for publication. From a logical point of view, lower effectivities might be stronger (= more restrictive) than upper effectivities, or lower effectivities might contradict upper effectivities. A clever algorithm and a powerful condition language are needed to detect these contradictions. But these problems are out of the scope of this paper.

Note: 'Lower' and 'upper' refer to the element hierarchy level in the SGML instance.
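
Although a full treatment is out of scope here, the simplest kind of contradiction can be illustrated with a short sketch. It assumes that every condition is a plain set of aspect/value assignments and that the element hierarchy is available as nested dictionaries with the hypothetical keys `name`, `cond`, and `children`. A richer condition language with 'or' and negation would need a real satisfiability check.

```python
def find_contradictions(block, inherited=None):
    """Find lower conditions that contradict a condition further up.

    Returns tuples (element name, aspect, upper value, lower value).
    """
    inherited = inherited or {}
    found = []
    cond = block.get("cond", {})
    for aspect, value in cond.items():
        if aspect in inherited and inherited[aspect] != value:
            found.append((block["name"], aspect, inherited[aspect], value))
    # Conditions of the ancestors remain in force for all descendants.
    merged = {**inherited, **cond}
    for child in block.get("children", []):
        found.extend(find_contradictions(child, merged))
    return found
```

A paragraph restricted to country=GERMANY inside a chapter restricted to country=USA would be reported, because the paragraph can never be visible.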
 
 

Example

 
The following example is used below for each SGML concept presented.
 
 

Given Doctype

 
This simple doctype serves as the basis for arguing for and against the various SGML concepts.
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)      >
<!ELEMENT chapter - O (head, (para)+) >
<!ELEMENT head    - - (#PCDATA)       >
<!ELEMENT para    - - (#PCDATA)       >
]>
 
 

Given Variants

 
There are the aspects skill (= user skill level) and country (= country where the product is installed) with the possible values EXPERT / BEGINNER and USA / GERMANY.
 
The following four document variants result from these aspects and settings:

Note: Notable differences are rendered in italics.

Expert in USA
 
Chapter 1: Hardware Installation
 
Before you start with the installation of your hardware check if you have 110 voltage.
Beginner in USA
 
Chapter 1: Hardware Installation
 
Before you start with the installation of your hardware check if you have 110 voltage.
 
You can check the voltage with a voltmeter.
Expert in Germany
 
Chapter 1: Hardware Installation
 
Before you start with the installation of your hardware check if you have 220 voltage.
Beginner in Germany
 
Chapter 1: Hardware Installation
 
Before you start with the installation of your hardware check if you have 220 voltage.
 
You can check the voltage with a voltmeter.
 
These four variants are marked up as shown below:
 
Expert in USA:
 
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have 110 voltage.</para>
</doc>
 
Beginner in USA:
 
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have 110 voltage.</para>
<para>You can check the voltage with a voltmeter.</para>
</doc>
 
Expert in Germany:
 
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have 220 voltage.</para>
</doc>
 
Beginner in Germany:
 
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have 220 voltage.</para>
<para>You can check the voltage with a voltmeter.</para>
</doc>
 
 

SGML Helps ... or not?

 
SGML offers several ways to mark up variant text blocks: EMPTY elements, processing instructions, marked sections, the CONCUR feature, attributes, elements, and ANY elements. But all of them require tool support, ranging from simple to complex, to be user-friendly.
 
 

EMPTY Elements

 
Two EMPTY elements added as inclusion exceptions to the root element mark the start and end of a variant text block. ID and IDREF links connect the elements, whereby this connection is only semantic.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)
                      +(var-st, var-end)  >
<!ELEMENT chapter - O (head, (para)+)     >
<!ELEMENT head    - - (#PCDATA)           >
<!ELEMENT para    - - (#PCDATA)           >
<!ELEMENT var-st  - O EMPTY               >
<!ATTLIST var-st  cond   CDATA  #REQUIRED
                  id     ID     #REQUIRED
                  refid  IDREF  #REQUIRED >
<!ELEMENT var-end - O EMPTY               >
<!ATTLIST var-end id     ID     #REQUIRED
                  refid  IDREF  #REQUIRED >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <VAR-ST COND="COUNTRY=USA" ID=VS1 REFID=VE1>110<VAR-END
ID=VE1 REFID=VS1><VAR-ST COND="COUNTRY=GERMANY" ID=VS2 REFID=VE2>220<VAR-END
ID=VE2 REFID=VS2> voltage.</para>
<VAR-ST COND="SKILL=BEGINNER" ID=VS3 REFID=VE3>
<para>You can check the voltage with a voltmeter.</para>
<VAR-END ID=VE3 REFID=VS3>
</doc>
 
Pro : EMPTY elements support free placement of variants even over element boundaries.
 
Contra : They mislead the writer with false hopes about SGML structure validation and complicate the generation of valid specific variants, which might become impossible. Complex tool support is needed to validate and build the variants.
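
To give an impression of the tool support involved, the following Python sketch builds one variant from text that uses such start/end markers. It is a rough illustration only: it works on the raw text with regular expressions instead of a real SGML parser, assumes the markers are written with a plain hyphen (VAR-ST, VAR-END) and with the attributes in the order shown in the example, and handles only single aspect=value conditions. The function name `resolve` is invented for this sketch.

```python
import re

# Start marker with a single "ASPECT=VALUE" condition (assumed attribute order).
START = re.compile(
    r'<VAR-ST\s+COND="(?P<aspect>[^="]+)=(?P<value>[^"]+)"'
    r'\s+ID=(?P<id>\w+)\s+REFID=(?P<end>\w+)>',
    re.IGNORECASE,
)


def resolve(text, setting):
    """Keep or drop each marked block and remove the markers themselves."""
    while True:
        start = START.search(text)
        if start is None:
            return text
        end_pattern = re.compile(
            rf'<VAR-END\s+ID={re.escape(start["end"])}'
            rf'\s+REFID={re.escape(start["id"])}>',
            re.IGNORECASE,
        )
        end = end_pattern.search(text, start.end())
        if end is None:
            raise ValueError(f"no end marker for {start['id']}")
        keep = setting.get(start["aspect"].upper()) == start["value"].upper()
        body = text[start.end():end.start()] if keep else ""
        text = text[:start.start()] + body + text[end.end():]


SAMPLE = (
    'if you have <VAR-ST COND="COUNTRY=USA" ID=VS1 REFID=VE1>110<VAR-END\n'
    'ID=VE1 REFID=VS1><VAR-ST COND="COUNTRY=GERMANY" ID=VS2 REFID=VE2>220<VAR-END\n'
    'ID=VE2 REFID=VS2> voltage.'
)
print(resolve(SAMPLE, {"COUNTRY": "GERMANY"}))
```

Nothing in this procedure guarantees that the result is a valid instance of the DTD, and marker pairs that overlap without nesting are not handled, which is exactly the weakness named above.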
 
 

Processing Instructions

 
Two processing instructions mark the start and end of a variant text block. The processing instructions have to carry the same information as the empty elements do.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)      >
<!ELEMENT chapter - O (head, (para)+) >
<!ELEMENT head    - - (#PCDATA)       >
<!ELEMENT para    - - (#PCDATA)       >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <?VAR ST COND="COUNTRY=USA" ID=VS1 REFID=VE1>110<?VAR END
ID=VE1 REFID=VS1><?VAR ST COND="COUNTRY=GERMANY" ID=VS2 REFID=VE2>220<?VAR END
ID=VE2 REFID=VS2> voltage.</para>
<?VAR ST COND="SKILL=BEGINNER" ID=VS3 REFID=VE3>
<para>You can check the voltage with a voltmeter.</para>
<?VAR END ID=VE3 REFID=VS3>
</doc>
 
Pro : Processing instructions support free placement of variants even over element boundaries.
 
Contra : They mislead the writer with false hopes about SGML structure validation and complicate the generation of valid specific variants, which might become impossible. Complex tool support is needed to validate and build the variants. Implementation is more difficult than for EMPTY elements.
 
 

Marked Sections

 
 
The use of marked sections provides an SGML solution which guarantees correct nesting and validation but supports only simple conditions (INCLUDE or IGNORE, no boolean expressions).
 
 

Marked Section and EMPTY Element

 
Variant text blocks are surrounded by marked sections. An EMPTY element carrying the condition attribute is placed directly before the marked section. During editing all marked sections are set to INCLUDE. When a variant is needed the conditions of the EMPTY elements are evaluated and the following marked sections are set to INCLUDE or IGNORE.
 
The EMPTY element is added as an inclusion exception to the root element of the DTD.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)
                      +(var)              >
<!ELEMENT chapter - O (head, (para)+)     >
<!ELEMENT head    - - (#PCDATA)           >
<!ELEMENT para    - - (#PCDATA)           >
<!ELEMENT var     - O EMPTY               >
<!ATTLIST var     cond   CDATA  #REQUIRED >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <VAR COND="COUNTRY=USA"><![ INCLUDE [110]]>
<VAR
COND="COUNTRY=GERMANY"><![ INCLUDE [220]]> voltage.</para>
<VAR COND="SKILL=BEGINNER"><![ INCLUDE [
<para>You can check the voltage with a voltmeter.</para>
]]>
</doc>
 
Pro : The use of marked sections provides an SGML solution which guarantees correct nesting and validation. Empty elements can carry complex conditions.
 
Contra : Tool support is needed to validate and build the variants. Most viewers and browsers cannot generate variants on-the-fly, because marked sections are static.
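
The preparation step described above (evaluate the condition of each EMPTY element, then switch the following marked section) could look roughly like this in Python. The sketch assumes that every marked section directly follows its VAR element, that the keyword is written literally as INCLUDE or IGNORE, and that conditions are single aspect=value assignments. The name `set_marked_sections` is hypothetical.

```python
import re

# A VAR element followed directly by the opening of a marked section.
PATTERN = re.compile(
    r'(<VAR\s+COND="([^="]+)=([^"]+)">\s*<!\[\s*)(?:INCLUDE|IGNORE)(\s*\[)',
    re.IGNORECASE,
)


def set_marked_sections(text, setting):
    """Rewrite each marked-section keyword according to the preceding condition."""
    def switch(match):
        keep = setting.get(match.group(2).upper()) == match.group(3).upper()
        keyword = "INCLUDE" if keep else "IGNORE"
        return match.group(1) + keyword + match.group(4)

    return PATTERN.sub(switch, text)


SAMPLE = (
    'if you have <VAR COND="COUNTRY=USA"><![ INCLUDE [110]]>\n'
    '<VAR\nCOND="COUNTRY=GERMANY"><![ INCLUDE [220]]> voltage.'
)
print(set_marked_sections(SAMPLE, {"COUNTRY": "GERMANY"}))
```

The rewritten instance can then be handed to any SGML parser, which validates and outputs only the included sections.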
 
 

SGML Feature CONCUR

 
SGML feature CONCUR is an SGML solution which allows overlapping structures but is not supported by tools and makes markup too complex. It would not help either way, because during parsing only one doctype is active — the parser will not get the information concerning the element structure and the application at the same time.
 
 

Attributes and Elements

 
Each existing element which might be a variant border receives a condition attribute cond for the aspect settings. An additional inline element has to be inserted if variants inside text are needed.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)           >
<!ELEMENT chapter - O (head, (para)+)      >
<!ELEMENT head    - - (#PCDATA)            >
<!ELEMENT para    - - (#PCDATA | inl-var)* >
<!ATTLIST para    cond  CDATA  #IMPLIED    >
<!ELEMENT inl-var - - (#PCDATA)            >
<!ATTLIST inl-var cond  CDATA  #REQUIRED   >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <INL-VAR COND="COUNTRY=USA">110</INL-VAR><INL-VAR 
COND="COUNTRY=GERMANY">220</INL-VAR> voltage.</para>
<para COND="SKILL=BEGINNER">You can check the voltage with a voltmeter.</para>
</doc>
 
Pro : SGML structure and variant structure are harmonized. The concept is easy to implement in tools.
 
Contra : The condition attribute has to be added to nearly all elements. Elements restrict the possible content of the variants. The placement of variant boundaries is restricted to the balanced SGML element boundaries. The added inline element inl-var cannot cover all possible inline models.
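
Because the variant boundaries coincide with element boundaries, building a variant reduces to pruning a tree. The following Python sketch shows the idea with the standard library's ElementTree. It assumes that the instance has been normalized to well-formed XML (all end tags present, lower-case names) and that conditions are single aspect=value assignments; `holds` and `prune` are names chosen for this sketch.

```python
import xml.etree.ElementTree as ET


def holds(cond, setting):
    """Evaluate a single 'ASPECT=VALUE' condition against the variant setting."""
    aspect, value = cond.split("=", 1)
    return setting.get(aspect.strip().upper()) == value.strip().upper()


def prune(parent, setting):
    """Remove every child whose cond attribute is false, keeping following text."""
    previous = None
    for child in list(parent):
        cond = child.get("cond")
        if cond is not None and not holds(cond, setting):
            # Text after the removed element still belongs to the parent.
            tail = child.tail or ""
            if previous is None:
                parent.text = (parent.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
        else:
            prune(child, setting)
            previous = child


SOURCE = """<doc><chapter><head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <inl-var cond="COUNTRY=USA">110</inl-var><inl-var
cond="COUNTRY=GERMANY">220</inl-var> voltage.</para>
<para cond="SKILL=BEGINNER">You can check the voltage with a voltmeter.</para>
</chapter></doc>"""

root = ET.fromstring(SOURCE)
prune(root, {"COUNTRY": "GERMANY", "SKILL": "EXPERT"})
print(ET.tostring(root, encoding="unicode"))
```

The result still contains the surviving inl-var wrapper; a publication step would typically unwrap it or simply ignore it during formatting.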
 
 

Special Variant Elements

 
Instead of adding a condition attribute to the existing elements, new variant elements are added to the DTD. They cover variants on paragraph level and on inline level. Further variant elements might be needed for dedicated content models.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)               >
<!ELEMENT chapter - O (head, (para | pl-var)+) >
<!ELEMENT head    - - (#PCDATA)                >
<!ELEMENT para    - - (#PCDATA | inl-var)*     >
<!ELEMENT pl-var  - - (para)+                  >
<!ATTLIST pl-var  cond  CDATA  #REQUIRED       >
<!ELEMENT inl-var - - (#PCDATA)                >
<!ATTLIST inl-var cond  CDATA  #REQUIRED       >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <INL-VAR COND="COUNTRY=USA">110</INL-VAR><INL-VAR 
COND="COUNTRY=GERMANY">220</INL-VAR> voltage.</para>
<PL-VAR COND="SKILL=BEGINNER">
<para>You can check the voltage with a voltmeter.</para>
</PL-VAR>
</doc>
 
Pro : SGML structure and variant structure are harmonized. The concept is easy to implement in tools.
 
Contra : All possible/needed content models of the variants lead to separate variant elements or, to avoid a large number of variant elements, the models are too loose. The placement of variant boundaries is restricted to the balanced SGML element boundaries.
 
 

ANY Element

 
Only one variant element with content ANY is added as inclusion exception to the root element of the DTD. A condition attribute carries the aspect settings.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)
                      +(var)             >
<!ELEMENT chapter - O (head, (para)+)    >
<!ELEMENT head    - - (#PCDATA)          >
<!ELEMENT para    - - (#PCDATA)          >
<!ELEMENT var     - - ANY                >
<!ATTLIST var     cond  CDATA  #REQUIRED >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <VAR COND="COUNTRY=USA">110</VAR><VAR 
COND="COUNTRY=GERMANY">220</VAR> voltage.</para>
<VAR COND="SKILL=BEGINNER">
<para>You can check the voltage with a voltmeter.</para>
</VAR>
</doc>
 
Pro : SGML structure and variant structure are harmonized. Only one element is needed.
 
Contra : The content model of the variant element has to be restricted by tool support; otherwise the writer could create wrong sub-contents. The placement of variant boundaries is restricted to the balanced SGML element boundaries.
 
 

New Content Model INHERIT

 
To avoid the specialized tool support required for the ANY element, a new element content model INHERIT is needed. An element with INHERIT content allows only the content that its parent element allows at the place where the element occurs.
 
The variant element is added as inclusion exception to the root element.
Note: The INHERIT content model is not part of the SGML standard ISO 8879:1986. It is just an idea to improve SGML for variant control.
 
Example:
 
<!DOCTYPE doc [
<!ELEMENT doc     - - (chapter+)
                      +(var)             >
<!ELEMENT chapter - O (head, (para)+)    >
<!ELEMENT head    - - (#PCDATA)          >
<!ELEMENT para    - - (#PCDATA)          >
<!ELEMENT var     - - INHERIT            >
<!ATTLIST var     cond  CDATA  #REQUIRED >
]>
<doc>
<chapter>
<head>Hardware Installation</head>
<para>Before you start with the installation of your hardware check
if you have <VAR COND="COUNTRY=USA">110</VAR><VAR
COND="COUNTRY=GERMANY">220</VAR> voltage.</para>
<VAR COND="SKILL=BEGINNER">
<para>You can check the voltage with a voltmeter.</para>
</VAR>
</doc>
 
Pro : SGML structure and variant structure are harmonized. Only one element is needed. No specialized tool support would be needed if INHERIT became part of SGML.
 
Contra : The placement of variant boundaries is restricted to the balanced SGML element boundaries.
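
Since INHERIT is only a proposal, no parser implements it. What such a check would have to do can nevertheless be sketched in Python, in a heavily simplified form: content models are reduced to sets of allowed child names (sequence and occurrence are ignored), the instance is assumed to be normalized to well-formed XML, and the table `ALLOWED` and the function `check` are hypothetical. The same check is what a tool would have to add to the ANY approach.

```python
import xml.etree.ElementTree as ET

# Simplified content models of the example doctype: allowed children only.
ALLOWED = {
    "doc": {"chapter"},
    "chapter": {"head", "para"},
    "head": {"#PCDATA"},
    "para": {"#PCDATA"},
}


def check(element, inherited=None, errors=None):
    """Collect violations, letting <var> inherit what its parent allows."""
    errors = [] if errors is None else errors
    allowed = inherited if element.tag == "var" else ALLOWED[element.tag]
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in element
    )
    if has_text and "#PCDATA" not in allowed:
        errors.append(f"text not allowed in <{element.tag}>")
    for child in element:
        # <var> itself is an inclusion and may appear anywhere.
        if child.tag != "var" and child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed in <{element.tag}>")
        check(child, allowed, errors)
    return errors
```

A var element placed inside a paragraph would thus accept text only, while a var element placed between paragraphs would accept what the chapter allows.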
 
 

Conclusion

 
What is the conclusion? Variants are needed in various documents. They are controlled by aspect/effectivity conditions which are set by the writers. Because this can become a complicated task, writers need tool support. Document management systems and publishing/rendering software (paper, on-line) have to take the variants into account, too.
 
A reasonable SGML solution is needed to get a useful implementation in the tools. But this reasonable solution is not very well supported by SGML — every approach has its disadvantages. Therefore it is necessary to know which solution fits which requirements best.
 
If ISO's SGML committee introduced the new content model INHERIT, some of the problems would be solved.
