Infoloom
Semantic Integration Technologies
Michel Biezunski
Brooklyn, New York
mb@infoloom.com

How to read the XTM Document Type Definition?
What is an XTM document?
What is a Document Type Definition (DTD)?
Elements
Content Model
Containment rules
Attributes
XML Features
A Guide to the XTM Syntax
The XTM DTD
The XTM DTD (annotated)
XTM Code examples

What is an XTM document?

An XTM document is an XML document that describes a topic map and its properties.
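
As an illustration, here is a minimal sketch of what such a document might look like. The nesting of topicMap, topic, and baseName, the two namespace addresses, and the id value are assumptions based on the XTM 1.0 DTD as commonly published; they should be checked against the DTD itself.

```xml
<?xml version="1.0"?>
<!-- A topic map with a single topic that has a single name. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- The id value "new-york" is hypothetical. -->
  <topic id="new-york">
    <baseName>
      <baseNameString>New York</baseNameString>
    </baseName>
  </topic>
</topicMap>
```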

What is a Document Type Definition (DTD)?

The XTM DTD expresses the list of components (elements) that are allowed within an XTM document, what they are made of (content model), how they are connected together (containment), and their internal properties (attributes).

The XTM DTD declares 19 element types: topicMap, topic, instanceOf, subjectIdentity, topicRef, subjectIndicatorRef, baseName, baseNameString, variant, variantName, parameters, occurrence, resourceRef, resourceData, association, member, roleSpec, scope, and mergeMap.

Elements

Elements are building blocks for an XML document. The types to which they belong are declared in the DTD as "element declarations" and appear in documents as tags, with angle brackets.

For example, here is an element declaration:

<!ELEMENT baseNameString (#PCDATA)>

The keyword #PCDATA ("Parsed Character Data") is the conventional way to indicate that the content of this element is a string of characters. Thus an XTM document may contain elements such as:

<baseNameString>New York</baseNameString>

Content Model

The content model of an element declaration describes what the element is made of. In the XTM DTD, there are three possible cases:

  1. The element contains a string of characters.
  2. The element is empty.
  3. The element contains other elements.

Cases #1 and #2 are indicated by reserved keywords. In case #1, (#PCDATA), which must always be typed in upper case and must appear between parentheses, indicates that the element contains characters (possibly none). In case #2, EMPTY, which must always be typed in upper case, indicates that the element does not contain anything; the useful information in such an element must be provided by its attribute values (see below).

Case #3 is expressed with a grammar for containment rules (see below).
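
The three cases can be seen side by side in the following sketch, written as a small XML document that carries its own declarations. The baseName, scope, and baseNameString declarations are the ones quoted in this guide. The EMPTY declaration of topicRef and the two attribute declarations are simplified assumptions made so that the example stands alone; in particular, placing xmlns:xlink on baseName is a shortcut for this sketch only.

```xml
<?xml version="1.0"?>
<!DOCTYPE baseName [
  <!-- Case 3: the element contains other elements. -->
  <!ELEMENT baseName (scope?, baseNameString, variant*)>
  <!ELEMENT scope (topicRef | resourceRef | subjectIndicatorRef)+>
  <!-- Case 1: the element contains a string of characters. -->
  <!ELEMENT baseNameString (#PCDATA)>
  <!-- Case 2: the element is empty; its information is in an attribute. -->
  <!ELEMENT topicRef EMPTY>
  <!-- Assumed, simplified attribute declarations for this sketch. -->
  <!ATTLIST topicRef xlink:href CDATA #REQUIRED>
  <!ATTLIST baseName xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
]>
<baseName xmlns:xlink="http://www.w3.org/1999/xlink">
  <scope><topicRef xlink:href="#english"/></scope>
  <baseNameString>New York</baseNameString>
</baseName>
```

A validating parser (for example xmllint with its --valid option) can check a document like this one against the embedded declarations.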

Containment rules

The group of element types contained in an element appears within parentheses. Groups can be nested within a group.

If an element is allowed to contain more than one element type, those types form a list whose items are separated by "sequence indicators". Two kinds of sequence indicators are used in the XTM DTD:

  1. Sequence, represented by a comma (,): the items must appear in the order listed.
  2. Or, represented by a vertical bar (|): exactly one of the items appears.

For example, the following element type declaration:

<!ELEMENT instanceOf
   (topicRef | subjectIndicatorRef) >

indicates that an instanceOf element contains either a topicRef or a subjectIndicatorRef element, but not both.

The DTD also contains ways to indicate that an element or group of elements can occur once only, one or more times, optionally, or any number of times including zero.

The notation used to describe these properties is as follows:

  • If the element type name or group is immediately followed by a question mark (?), it is optional.
  • If it is immediately followed by a plus sign (+), it must occur at least once, and may occur more often.
  • If it is immediately followed by an asterisk (*), it can occur any number of times, including zero.
  • If it is not followed by any of these symbols ("?", "+", or "*"), it must occur once and once only.

For example, the following element declaration:

<!ELEMENT baseName
            (scope?, baseNameString, variant*)>

indicates that a baseName element may begin with a scope element (optional, because of the question mark), must then contain exactly one baseNameString element (no indicator follows it), and may end with any number of variant elements, including none (because of the asterisk).

In the following example, the content model is expressed as an "OR" group, to which an occurrence indicator is applied:

<!ELEMENT scope
   (topicRef | 
    resourceRef | 
    subjectIndicatorRef)+
>

This element declaration for scope reads: a scope element contains one or more elements, each of which may be any of the three types listed (topicRef, resourceRef, and subjectIndicatorRef), repeated and in any order.
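
The following sketch shows a scope element that satisfies this declaration by mixing two of the three types, with one of them repeated. The scope declaration is the one quoted above. The EMPTY declarations, the attribute declarations, and all xlink:href values are simplified assumptions added so that the example can be checked on its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE scope [
  <!ELEMENT scope (topicRef | resourceRef | subjectIndicatorRef)+>
  <!-- Assumed, simplified declarations for the three referenced types. -->
  <!ELEMENT topicRef EMPTY>
  <!ELEMENT resourceRef EMPTY>
  <!ELEMENT subjectIndicatorRef EMPTY>
  <!ATTLIST scope xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
  <!ATTLIST topicRef xlink:href CDATA #REQUIRED>
  <!ATTLIST resourceRef xlink:href CDATA #REQUIRED>
  <!ATTLIST subjectIndicatorRef xlink:href CDATA #REQUIRED>
]>
<!-- Three children: a topicRef, a subjectIndicatorRef, then another topicRef. -->
<scope xmlns:xlink="http://www.w3.org/1999/xlink">
  <topicRef xlink:href="#english"/>
  <subjectIndicatorRef xlink:href="http://example.org/psi/informal"/>
  <topicRef xlink:href="#nickname"/>
</scope>
<!-- An empty scope element would not be valid, because the plus sign
     requires at least one child. -->
```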

Attributes

Attributes are properties that serve to further differentiate elements. They are declared within an attribute list introduced by the expression <!ATTLIST. Each attribute list refers to a given element and, in the XTM DTD, is declared immediately after the corresponding element declaration (this is not a requirement, and some DTDs declare attributes in a different location).

An ATTLIST declaration is made of declarations of individual attributes. Each individual attribute declaration is a set of three fields: the attribute name, a keyword which expresses the type of attribute it is, and a default declaration.

The attribute names used in the XTM DTD are: xmlns, xmlns:xlink, xml:base, id, xlink:type, and xlink:href.

Three attribute types are used in the XTM DTD: ID, CDATA, and NMTOKEN.

  • ID means that the attribute is an identifier that has to be unique throughout the XTM document. In other words, no two elements can have the same value for their identifier attribute.
  • CDATA means that the attribute is a string of characters.
  • NMTOKEN means that the attribute is a name token: a string with no whitespace, made up only of name characters such as letters, digits, periods, hyphens, underscores, and colons. For more details, see the XML specification.

The attribute default declaration indicates whether the attribute is required, optional, or fixed. The keyword #REQUIRED indicates that an attribute value must be provided in the document. The keyword #FIXED followed by a value indicates that this attribute is pre-defined for all documents, and may not be redefined in a document. The keyword #IMPLIED means that a value may be provided in a document, but that the application knows how to handle the case where no value is provided; in other words, this attribute is optional.
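
To make the three fields and the three default keywords concrete, here is a sketch of an attribute list for topicRef, followed by an element that uses it. The declaration is modeled on the XTM 1.0 DTD as commonly published and should be treated as an assumption; declaring xmlns:xlink on topicRef itself is a shortcut for this example, and the xlink:href value is hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE topicRef [
  <!ELEMENT topicRef EMPTY>
  <!-- Each line: attribute name, attribute type, default declaration. -->
  <!ATTLIST topicRef
     id           ID       #IMPLIED
     xlink:type   NMTOKEN  #FIXED "simple"
     xlink:href   CDATA    #REQUIRED
     xmlns:xlink  CDATA    #FIXED "http://www.w3.org/1999/xlink"
  >
]>
<!-- id is omitted (it is #IMPLIED), xlink:type is omitted (its #FIXED
     value applies anyway), and xlink:href is present (it is #REQUIRED). -->
<topicRef xlink:href="#city" xmlns:xlink="http://www.w3.org/1999/xlink"/>
```

Leaving out xlink:href, or writing xlink:type with any value other than simple, would make the element invalid under this declaration.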

XML Features

  • The xmlns attribute is defined in the XML Namespaces Recommendation.
  • The xmlns:xlink attribute is used to say that the xlink namespace is being used within the XTM document type definition.
  • The xml:base attribute is defined by the World Wide Web Consortium's recommendation called XML Base (27 June 2001) and "may be inserted in XML documents to specify a base URI other than the base URI of the document or external entity."
  • The xlink:type attribute, defined by the XLink recommendation, is fixed for the XTM DTD and indicates that all links used are of the simple XLink type. A simple link is essentially equivalent to an HTML hyperlink: it points directly to an address, which is generally a URL (Uniform Resource Locator) as defined by the IETF.
  • The xlink:href attribute contains the address (URL) which is the target of the link. This attribute plays the same role as the href attribute used in HTML to express the target of a link with an a element.
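
The sketch below shows these attributes working together in one document. The element nesting and the two namespace addresses are assumptions based on the XTM 1.0 DTD as commonly published; the example.org addresses and the id value are hypothetical.

```xml
<?xml version="1.0"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xml:base="http://example.org/maps/">
  <topic id="new-york">
    <instanceOf>
      <!-- A relative address, resolved against xml:base, giving
           http://example.org/maps/types.xtm#city -->
      <topicRef xlink:href="types.xtm#city"/>
    </instanceOf>
    <baseName>
      <baseNameString>New York</baseNameString>
    </baseName>
  </topic>
</topicMap>
```

The xlink:type attribute does not need to be written on topicRef, since its value is fixed by the DTD.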

© 2005, Michel Biezunski