
Keywords: equivalence class, parameter entity, schema, type


XML Schema types and equivalence classes

 reconstructing DTD best practice
 Henry S. Thompson
 HCRC Language Technology Group, Division of Informatics, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland
 email: ht@cogsci.ed.ac.uk
 Biography
 Henry S. Thompson is Reader in Artificial Intelligence and Cognitive Science at the University of Edinburgh, where he currently holds a World Wide Web Consortium Fellowship. He was a member of the W3C SGML Working Group, and is a member of the W3C XSL and XML Schema Working Groups. He is the author of DSC, the only publicly available implementation of the DSSSL transformation language, and of XED, the first free XML instance editor. He is editor of the Structures part of the XML Schema draft W3C recommendation. He has presented many papers and tutorials on SGML, DSSSL, XML, XSL and XML Schemas in both industrial and public settings over the last five years.
 Abstract
 Eve L. Maler and Jeanne El Andaloussi in their book "Developing SGML DTDs" describe a flexible and powerful methodology for DTD design and development which is widely used in a range of application environments, and is generally recognised as constituting 'best practice' in this area. It makes heavy use of parameter entities to define and exploit a class hierarchy of element types.
 XML Schema is a W3C-sponsored effort to define an alternative to DTDs for defining the structure of XML documents, using XML instance syntax. Not surprisingly, therefore, it defines element types for declaring elements and attributes.
 Despite an official requirement to at least reproduce the functionality of DTDs, XML Schema nonetheless contains no text macro facility, which might be expected to reproduce the functionality of parameter entities. How then is 'best practice' to be carried forward from DTDs to XML Schemas?
 The answer lies in two powerful mechanisms which XML Schema introduces: user-defined types (distinct from element types, but crucially involved in their declaration) and element equivalence classes. This paper describes in detail XML Schema's concepts of complex type, type definition by derivation and element equivalence class, shows how they relate to one another, and illustrates their use to define type hierarchies and element class hierarchies without recourse to parameter entities.
 

Introduction

 Eve L. Maler and Jeanne El Andaloussi in their book "Developing SGML DTDs" describe a flexible and powerful methodology for DTD design and development which is widely used in a range of application environments, and is generally recognised as constituting 'best practice' in this area. It makes heavy use of parameter entities to define and exploit a class hierarchy of element types.
 Our understanding of formal language design has progressed since SGML was born, and text substitution macros, which is what parameter entities are, have come to be recognised as a less-than-ideal mechanism for enabling re-use and sharing of structure in formal definitions. Accordingly, the design of the XML Schema document type definition language supplies no text-substitution macro mechanism, but instead makes explicit provision for a hierarchically-structured approach to the definition of document types and their component parts. This paper explores the utility of these mechanisms in reconstructing best practice in structured DTD design without recourse to text substitution.
 

Document structure definition in XML Schema

 The XML Schema language distinguishes between element declarations and type definitions. A type definition is a collection of constraints on the names and forms of attributes and children which an element may have, for example:
 
<xs:complexType name="meeting">
<xs:all>
<xs:element ref="venue"/>
<xs:element ref="organiser"/>
<xs:element ref="participants"/>
</xs:all>
<xs:attribute name="when" type="xs:date" use="required"/>
</xs:complexType>
 Note:
 The above example uses the xs prefix for names taken from XML Schema, as do all the subsequent XML Schema examples, without providing a namespace declaration. In practice, of course, any prefix could be used (or none), given the correct namespace declaration, which as of this writing should use http://www.w3.org/1999/XMLSchema as the namespace URI.
 An element declaration associates a tag with a type definition, thereby requiring elements in instances with that tag to conform to the definition, for example:
 
<xs:element name="appointment" type="meeting"/>
 Taken together the above pair of definition and declaration is similar in effect to the following SGML declarations:
 
<!ELEMENT appointment (venue & organiser & participants)>
<!ATTLIST appointment when CDATA #REQUIRED>
 XML Schema differs from SGML or XML DTDs in separating type definitions from element declarations, thereby providing a mechanism for re-use which in DTDs would have necessarily involved parameter entities.
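 As an illustration of this re-use, the sketch below declares a second element against the same type definition. The element name review is hypothetical, and the meeting type definition from the example above is assumed to be present in the same schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema">
  <!-- the complexType named "meeting" shown earlier is assumed to be defined here -->
  <xs:element name="appointment" type="meeting"/>
  <!-- hypothetical second declaration re-using the same type definition -->
  <xs:element name="review" type="meeting"/>
</xs:schema>
```

 An instance along the following lines would then be expected to conform to either declaration, with only the tag changed. The text content of the children is invented, since their declarations are not shown here, and the date is written in ISO 8601 form on the assumption that xs:date accepts it.

```xml
<appointment when="2000-06-14">
  <!-- xs:all, like the SGML & connector, permits the children in any order -->
  <organiser>Conference office</organiser>
  <venue>Room 4.12</venue>
  <participants>Programme committee</participants>
</appointment>
```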
 

Structuring the type space: type definition derivation in XML Schema

 We observe that many uses of parameter entities in well-structured DTDs imply some sort of family relationship between element types. Whenever a parameter entity is used, for instance, to add one or more attribute declarations to an ATTLIST, every instance of an element type with such an ATTLIST is a member of a larger set, that is, the set of all elements which have, or may have, the relevant attributes.
 XML Schema makes explicit provision for defining types which specifically contain exactly what such families have in common, and then allowing other type definitions to be derived from the shared core. Consider the following fragment from the XHTML strict DTD:
 
<!ENTITY % cellhalign
"align      (left|center|right|justify|char) #IMPLIED
char       %Character;    #IMPLIED
charoff    %Length;       #IMPLIED"
>
 This defines the cellhalign parameter entity with text for three attribute declarations. Virtually all the table-related element types then use this parameter entity, for example:
 
<!ATTLIST thead
%attrs;
%cellhalign;
%cellvalign;
>
 In fact all the element types in XHTML which allow the cellhalign attributes allow the others referenced here as well, so in an XML Schema schema for XHTML, we would probably want to provide an abstract type definition with all these attributes, for example:
 
<xs:complexType name="tabular" content="empty">
<xs:attribute name="align" type="halignPos"/>
<xs:attribute name="char" type="Character"/>
. . .
</xs:complexType>
 Then all the relevant elements would have type definitions derived from this one, for example:
 
<xs:complexType name="tableBlock" content="elementOnly"
base="tabular" derivedBy="extension">
<xs:element ref="tr" minOccurs="1" maxOccurs="unbounded"/>
</xs:complexType>

<xs:element name="thead" type="tableBlock"/>
<xs:element name="tbody" type="tableBlock"/>
<xs:element name="tfoot" type="tableBlock"/>
 Several new bits of XML Schema syntax are introduced above. The type definition for tableBlock identifies itself as derived from that for tabular using the base attribute, and furthermore specifies that its relation to that definition is one of extension. What this means is that the attributes allowed and the content model enforced by the derived definition are, respectively, the union and the concatenation of those specified explicitly and those 'inherited' from the base definition. In the example above, this has the desired effect, in that thead, tbody and tfoot are all declared with reference to a type definition which, by the definition of derivation by extension, allows all the tabular attributes as well as having an appropriate list-of-rows content model.
 Alongside the economy and transparency of the syntactic aspects of this approach to re-use, there is a parallel and equally important gain in semantic transparency: it is now manifest that the similarity of the content models and attribute inventory of the three elements is not accidental. The most straightforward approach to changing one of them would change them all, which is probably what is wanted. Applications which find it appropriate can treat them all similarly, by dealing with them at the level of type definition. To facilitate this, XML Schema-compliant processors must record the identity of the type definition used in schema-validating every element and attribute in instance documents.
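 To make the union-and-concatenation reading of extension concrete, the following sketch shows what the tableBlock definition above would amount to if it were written out in full with no base. The name tableBlockExpanded is hypothetical, the syntax simply follows the draft forms used in the examples above, and the attributes elided in the tabular definition remain elided here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema">
  <!-- hypothetical expansion of tableBlock, for illustration only -->
  <xs:complexType name="tableBlockExpanded" content="elementOnly">
    <!-- content model: tabular is declared empty, so the concatenation
         of base and extension is just the explicit list of rows -->
    <xs:element ref="tr" minOccurs="1" maxOccurs="unbounded"/>
    <!-- attributes: tableBlock adds none of its own, so the union
         is exactly the attribute set of tabular -->
    <xs:attribute name="align" type="halignPos"/>
    <xs:attribute name="char" type="Character"/>
    <!-- remaining tabular attributes elided, as in the definition above -->
  </xs:complexType>
</xs:schema>
```

 A definition written this way should accept the same elements, but it no longer records any relationship to tabular, which is precisely the information that derivation keeps visible.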
 

Type definition derivation details: extension

 A complex type definition (one constraining an element's content and attribute inventory) may be derived by extension from another type definition, called the base type definition. If the base is a simple type definition (constraining text content), then the only allowed extension is to add attribute declarations. If the base is itself a complex type definition, then not only may attribute declarations be added, but the base's content model may also be extended, if it allows this.
 It follows that an important relationship holds between the members of a type defined by extension (that is, the element instances which satisfy its definition): every member of a type defined by extension contains within it a member of its base, where in version 1 of XML Schema we understand 'within' to mean 'subset' for attributes and 'prefix' for content model.
 Consider the following two type definitions:
 
<xs:complexType name='name'>
<xs:element name='title'
minOccurs='0'/>
<xs:element name='forename'
minOccurs='0'
maxOccurs='unbounded'/>
<xs:element name='surname'/>
</xs:complexType>

<xs:complexType name='fullName'
base='name'
derivedBy='extension'>
<xs:element name='suffix'
minOccurs='0'/>
</xs:complexType>
 Now consider members of the two types defined above:
 
<...>
<forename>George</forename>
<forename>W</forename>
<surname>Bush</surname>
</...>

<...>
<forename>Albert</forename>
<surname>Gore</surname>
<suffix>Jr.</suffix>
</...>
 The second, a member of the derived type, contains as a prefix a member of the base type.
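 The other case described above, extension of a simple type definition, can be sketched in the same draft syntax. This is a minimal illustration only; the names price, currency and cost are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema">
  <!-- hypothetical: text content constrained by xs:decimal,
       extended in the only way allowed, by adding an attribute -->
  <xs:complexType name="price" base="xs:decimal" derivedBy="extension">
    <xs:attribute name="currency" type="xs:string"/>
  </xs:complexType>
  <xs:element name="cost" type="price"/>
</xs:schema>
```

 Under these assumptions an element such as <cost currency="GBP">12.50</cost> would be a member of price: its text content is a member of the base, and the only addition is an attribute.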
 

Type definition derivation details: restriction

 A complex type definition (one constraining an element's content and attribute inventory) may also be derived by restriction from its base type definition, which must be a complex type definition. Restriction amounts to closing down flexibility allowed in the base definition:
 
  • Eliminating optional attributes;
  • Removing members of choice groups;
  • Reducing allowed occurrence ranges on content model particles (perhaps all the way to elimination, if minOccurs is 0 in the base);
  • Restricting the type definitions of attributes or content.

 A different important relationship holds between the members of a type defined by restriction: every member of a type defined by restriction is necessarily also a member of its base.
 Simple types may also be defined by restricting other simple type definitions, for instance by reducing the membership of an enumerated type or narrowing a value range.
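 A sketch of such a simple type restriction follows. It assumes the draft convention of naming the base on the definition and listing facets directly inside it; the definition given for halignPos is an assumption modelled on the align values of the XHTML fragment quoted earlier, and basicAlign is a hypothetical name.

```xml
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema">
  <!-- assumed definition: an enumerated type with five values -->
  <xs:simpleType name="halignPos" base="xs:NMTOKEN">
    <xs:enumeration value="left"/>
    <xs:enumeration value="center"/>
    <xs:enumeration value="right"/>
    <xs:enumeration value="justify"/>
    <xs:enumeration value="char"/>
  </xs:simpleType>
  <!-- hypothetical restriction: membership reduced to three of the five values -->
  <xs:simpleType name="basicAlign" base="halignPos">
    <xs:enumeration value="left"/>
    <xs:enumeration value="center"/>
    <xs:enumeration value="right"/>
  </xs:simpleType>
</xs:schema>
```

 Every value allowed by basicAlign is also allowed by halignPos, which is the same member-of-the-base relationship that holds for complex types derived by restriction.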
     Consider the following three type definitions, a simplified version of definitions from the schema for schemas:
     
    <xs:complexType name="group">
    <xs:element ref="particle" minOccurs="0" maxOccurs="unbounded"/>
    <xs:attribute name="name" use="optional" type="xs:NCName"/>
    <xs:attribute name="ref" use="optional" type="xs:QName"/>
    </xs:complexType>
    
    <xs:complexType name="topLevelGroup" base="group" derivedBy="restriction">
    <xs:element ref="particle" minOccurs="1" maxOccurs="unbounded"/>
    <xs:attribute name="name" use="required" type="xs:NCName"/>
    <xs:attribute name="ref" use="prohibited" type="xs:QName"/>
    </xs:complexType>
    
    <xs:complexType name="refToGroup" base="group" derivedBy="restriction">
    <xs:element ref="particle" minOccurs="0" maxOccurs="0"/>
    <xs:attribute name="name" use="prohibited" type="xs:NCName"/>
    <xs:attribute name="ref" use="required" type="xs:QName"/>
    </xs:complexType>
    
     The first definition defines a group as having optional name and ref attributes and any number of <particle> as content. The second restricts this for use at the top level, to define a group, in which case the name and at least one <particle> are required, while the ref is prohibited. For use within content models, the third restricts in the other direction, requiring a ref (to a top-level defined group, by name and namespace), and forbidding either name or content. It should be clear that any member of either of the two derived types is a member of the more general base.
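 The membership relationships can be illustrated with a few instance fragments. This is a sketch under stated assumptions: the element name group and the wrapper element examples are hypothetical (only the types are defined above), and <particle/> is shown as an empty placeholder because its declaration is not given here.

```xml
<examples>
  <!-- member of topLevelGroup, and therefore of group:
       name present, at least one particle, no ref -->
  <group name="rows">
    <particle/>
    <particle/>
  </group>

  <!-- member of refToGroup, and therefore of group:
       ref present, no name, no content -->
  <group ref="rows"/>

  <!-- member of group only: the base makes everything optional,
       but topLevelGroup requires name and refToGroup requires ref -->
  <group/>
</examples>
```

 The last fragment shows that the relationship runs in one direction only: members of the derived types are members of the base, but not every member of the base belongs to a derived type.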
     

Element equivalence classes

 The mechanism of type definition derivation described above allows XML Schema authors to reconstruct usages of parameter entities which reflect commonality of structure:

  • Elements with the same structure can be declared using the same type definition;
  • Elements with similar structure can be declared so that the type definition of one is derived from that of the other, or so that both have definitions derived from a common base.

 But elements may also be related because they appear in the same context. The following (slightly simplified) extracts from the XML specification DTD illustrate how this is annotated and exploited in the Maler and El Andaloussi style:
     
    <!ENTITY % local.list.class     "">
    <!ENTITY % list.class           "ulist|olist|slist|glist
    %local.list.class;">
    . . .
    <!ELEMENT div1 (head, (. . .|%list.class;|. . .)*, div2*)>
    
 References to %list.class; appear in other content models elsewhere in the DTD as well, and no member of the class appears anywhere else on its own. XML Schema provides for reflecting this kind of element commonality using the notion of (asymmetric) equivalence class: any top-level element declaration can nominate another top-level declaration as one it is equivalent to, using the equivClass attribute. The set of all declarations which identify a given declaration in this way (perhaps via several steps) forms its equivalence class. Wherever that declaration is referenced in content models, instances may contain not only the element it declares, but also any member of its equivalence class. One possible XML Schema reconstruction of the above example would look like this:
     
    <xs:element name="div1">
    <xs:complexType>
    <xs:element ref="head"/>
    <xs:choice minOccurs="0" maxOccurs="unbounded">
    . . .
    <xs:element ref="list"/>
    . . .
    </xs:choice>
    <xs:element ref="div2" minOccurs="0" maxOccurs="unbounded"/>
    </xs:complexType>
    </xs:element>
    
    <xs:element name="list" abstract="true" type="listType"/>
    
    <xs:element name="slist" equivClass="list" type="simpleListType"/>
    
    <xs:element name="flaggedList" abstract="true" equivClass="list"/>
    
    <xs:element name="ulist" equivClass="flaggedList" type="bulletedListType"/>
    
    <xs:element name="olist" equivClass="flaggedList" type="enumeratedListType"/>
    
    <xs:element name="glist" equivClass="list" type="glossaryListType"/>
    
 Via one or two steps, all four of glist, olist, slist and ulist are declared as part of list's equivalence class, so all of them may occur within <div1> in the indicated place. list itself and flaggedList are declared as abstract, meaning they can't themselves actually appear in documents: they are included in the schema simply to provide a potentially useful layer of structuring.
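 An instance fragment that would be expected to satisfy the div1 declaration above might look like this. The children shown inside each list (sitem, item, p) loosely follow the XML specification DTD and are assumptions here, since the list type definitions are not given.

```xml
<div1>
  <head>Notation</head>
  <!-- slist nominates list directly -->
  <slist>
    <sitem>first simple item</sitem>
  </slist>
  <!-- olist nominates flaggedList, which in turn nominates list -->
  <olist>
    <item><p>step one</p></item>
  </olist>
  <!-- neither list nor flaggedList may appear here: both are abstract -->
</div1>
```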
 Two further aspects of this design deserve mention. First, no special provision is required for subsequent extension of the membership of the classes involved (this is what the local.list.class entity is for in the original DTD). Another XML Schema document which includes one with the above definitions by reference (by one of several mechanisms provided for modular and/or multi-namespace schema specification) can add its own elements to one or the other class simply by referring to them via the equivClass attribute in its own declarations. Second, in order to enforce a degree of coherency, XML Schema requires that the type definition of an element declared as equivalent to another be derived from that other element's type definition. In the above example, this means that, for instance, enumeratedListType would have to be derived from listType (the type definition of flaggedList, by default).
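 Both points can be seen in a sketch of a second schema document that adds a new member to the list class. The names checklist, checkListType and done are hypothetical, and the step that brings in the original schema is deliberately left out because its mechanism is not described here.

```xml
<xs:schema xmlns:xs="http://www.w3.org/1999/XMLSchema">
  <!-- inclusion of the schema that declares list and listType is assumed
       and not shown -->

  <!-- coherency requirement: the new element's type is derived from
       listType, the type definition of the element it nominates -->
  <xs:complexType name="checkListType" base="listType" derivedBy="extension">
    <xs:attribute name="done" type="xs:boolean"/>
  </xs:complexType>

  <!-- joining the class needs nothing more than the equivClass attribute;
       the original schema is left untouched -->
  <xs:element name="checklist" equivClass="list" type="checkListType"/>
</xs:schema>
```

 With this in place, <checklist> would be allowed wherever list is referenced, including inside <div1>, without any hook such as local.list.class having been planned in advance.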
     

Conclusion

     The above examples have introduced two distinct but related mechanisms which the XML Schema language provides for reconstructing some common uses of parameter entities in structured DTD design. By bringing these constructs inside the language, rather than relegating them to the status of conventions for the use of text substitution, XML Schema has endorsed and facilitated an approach to structured document type definition of recognised power and generality.
Bibliography

 MA 1996 Maler, Eve L. and Jeanne El Andaloussi, 1996. Developing SGML DTDs, Prentice Hall PTR, New Jersey, USA. ISBN 0-13-309881-8.

 MA 1999 Maler, Eve L., 1999. XML specification DTD, W3C, Cambridge, MA, USA. Available online as http://www.w3.org/XML/1998/06/xmlspec-19990429.dtd .

 XHTML 2000 Pemberton, Steven et al., 2000. XHTML™ 1.0: The Extensible HyperText Markup Language, W3C, Cambridge, MA, USA. Strict DTD also available as http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd .

 TH 2000 Thompson, Henry S., David Beech, Murray Maloney and Noah Mendelsohn, eds, 2000. XML Schema part 1: Structures, W3C, Cambridge, MA, USA. Also available as http://www.w3.org/TR/xmlschema-1 .
