
 
William C. Burkett
 Senior Information Engineer
Product Data Integration Technologies, Inc.
100 W Broadway, Suite 540, Long Beach, California 90802, USA
Email: wburkett@pdit.com   Web site: www.pdit.com
 Biography
 William Burkett has over 15 years of experience as an industrial and systems engineer specializing in system analysis and data modeling, information system integration, and product data exchange (PDE) technologies. Prior to joining P.D.I.T., he worked for McDonnell-Douglas and Lockheed on PDE technology and standards development programs. Mr. Burkett has been an active participant in the development of the STandard for the Exchange of Product Model Data (STEP - ISO 10303, TC184/SC4) since its inception in 1984. More recently, he has been applying PDE principles to the design of XML standards for the integration of Defense legacy systems and the deployment of product catalogs.
 

The Search for Meaning on the World Wide Web

 In order to build a "Semantic Web" [2] where automated agents effectively search the Internet and find the right information that a user is looking for, a transition is necessary in the way that web resources are encoded. It is well recognized that HTML is simply an easy-to-use presentation format for arranging and publishing content on the Internet for the visual consumption of human interpreters. This was the purpose of HTML, the underlying objective that guided the design of the HTML vocabulary and structure. As such, the data encoded in HTML is intended for a human audience, not for automated processing by "intelligent" applications.
 The problems that this paradigm has forced on the web are evident in the mechanisms that directories and search engines must resort to in order to organize web content and make it easier for members of the Internet community to find the information they are looking for. These mechanisms are reduced to:
 
  •  ASCII text-based searches;
  •  Manual cataloging of web resources in human-designed taxonomies.
 In other words, the data contains no mechanisms or clues by which applications and automated agents could recognize and "understand" the content of web resources; it lacks the physical markers needed to figure out what all the character data in a resource "means". The eXtensible Markup Language (XML) was created to provide those markers, those "clues" in the data that enable or help applications "understand" what the data is about.
 Data models have been used since the 1960s to design, document, deploy, and manage data structures in information systems. Data models play two major roles with respect to the design and implementation of information systems:
 
  1.  Specify the structure of the data;
  2.  Specify the meaning of the data.
 XML (and HTML, both SGML derivatives) offers a formal approach for providing the former capability (within the domain of character-based encoding). However, XML is weak with respect to the latter capability, semantics, because it originated in a narrow application domain: document publishing and text processing. The anticipated "meanings" that XML/SGML are intended to represent were the components of published works, not application data; therefore, many features of the language were aimed directly at handling text strings rather than, for example, specifying data integrity constraints. In other words, XML/SGML was not designed to meet the same "meaning requirements" as those addressed by data modelling languages; it is ill-prepared to handle the diversity of data and information requirements that users will expect of it.
 Although far from perfect themselves, logical data models are better suited to the role of specifying the meaning of web resource content. Data modelling has a long history of theory and practice that can contribute directly to the specification of content schemas and the deployment of meaningful data on the web. The PDML program (Product Data Markup Language) provides an example of the use of data models in this role: PDML used a data modelling language called EXPRESS to specify the content of specialized web resources for exchanging data between DoD product data systems. The EXPRESS schemas were then mapped to XML DTD's, and these DTD's governed the XML documents used to exchange data. The purpose of this paper is to present and explain how EXPRESS was used as a Web Resource Content Specification Language.
 Note: The term "Content Schema" is used to refer to the formal specification of the contents of a web resource, rather than "XML schema" [1, 3], to avoid confusion with the W3C XML Schema work. A Content Schema is a specification of the elements, structure, and meaning of a web resource that defines how automated agents are to interpret the content of the resource. The term "Content Schema" also subsumes what is popularly known as an "XML Vocabulary".
 "Web resource" is used in the RDF sense [10] to denote an identifiable object on the World Wide Web, such as a web page.
 

Product Data Markup Language - PDML

 PDML is a suite of XML vocabularies (i.e., Content Schemas) and an associated usage structure that was developed to integrate Department of Defense legacy data systems over the Internet. The structural architecture of PDML is analogous to the "star-satellite" structure of the client-server model. PDML is composed of the following components:
 
  •  A collection of Application Transaction Sets;
  •  The Integration Schema;
  •  Mapping specifications between the Application Transaction Sets and the Integration Schema.
 The relationship between these components is illustrated in Figure 1. As PDML grows, additional transaction sets will be added to the specification.
 
 Figure 1 - Product Data Markup Language
 Recognizing that as a data specification language, XML DTD syntax is rather impoverished with respect to semantic features (e.g., datatypes), PDML chose the EXPRESS language [9] to formally specify the semantic content of the Application Transaction Sets and the Integration Schema. Because it was defined within an industrial data management environment, EXPRESS has all the semantic features and integrity constraint mechanisms for specifying unambiguous (or much less ambiguous) content for XML documents. Also, PDML approached the problem of XML vocabulary development from the "data management" perspective rather than the "document" perspective, so EXPRESS was more suited to PDML objectives by the nature of its design.
 The EXPRESS Schema was the master specification for the content of the Application Transaction Sets. In order to bind the EXPRESS schema to a DTD, a conversion algorithm was written mapping EXPRESS to an XML DTD. The algorithm was implemented in a small software tool that produced an XML DTD from the EXPRESS schema, and the DTD was then used as the specification for creating and validating Application Transaction Set XML Documents. See Figure 2.
 
 Figure 2 - Relationship of EXPRESS Schemas to XML Documents
 

Data Models, XML, and Web Resources

 In order to understand the relevance of data models and data management practices to the evolution of the World Wide Web, the nature and purposes of both data models and XML must be presented, compared, and contrasted. In particular, the primary feature that is common between them - the semantics of data - must be examined.
 

Data models

 In "Web Architecture: Describing and Exchanging Data" [2], Berners-Lee discusses the role of data models in the creation of the "Semantic Web". Swick and Thompson furthered this discussion of data models in the "Cambridge Communique" [13]. However, their use of the term "data model" differs slightly from that used in this paper. The term "data model" is variously defined as:
 
  •  "a specification of the data structures and business rules needed to support a business area." [4]
  •  "a set of concepts that can be used to describe the structure of a database…Most include a set of operations for specifying retrievals and updates on the database." [7]
  •  A conceptual representation of data consisting of a collection of logical concepts (e.g., objects, their properties and interrelationships) which hides/omits storage details to make the data easier to understand by users. [7]
  •  "a collection of data structure types…operators or rules of inference…[and] general integrity rules." [14]
 In other words, the term "data model" is used in two primary senses:
 
  •  The abstract concepts and their interrelationships that describe how data is stored in a particular class of databases, e.g., the Relational Data model.
  •  A collection of concept types, relationships and rules that describe the data in a particular database, e.g., a relational database schema.
 The "Cambridge Communique" [13] uses "data model" in the former sense. In this paper, the latter sense is used, since this definition ostensibly reflects the objectives of XML and Content Schemas: to specify the meaning and structure of the content of web resources. The important feature of this use of data models is that they hide the internal, storage-dependent aspects of the data and concentrate on the information that is known by and available to the users.
 There are many different kinds of data modelling languages, each of which has its strengths and shortcomings. What they all share is a structure which
 
  •  can be represented by a finite digraph,
  •  consists of named concepts (nodes) and relationships between the concepts,
  •  allows the named concepts to contain properties.
 This description can also be applied to XML.
 

The role of XML on the World Wide Web

 The hope of XML is to bring meaning to the web, provide a mechanism for organizing and categorizing the huge cacophony of content deployed on the web, and make the content more useful for users of the web. XML itself, however, provides virtually no semantics; XML is simply a data encoding mechanism. The semantics of XML documents are ostensibly specified in Content Schemas (that can assume a variety of specification formats, such as DTD's, DCD, XML Schema or XML Data) and - if one is lucky - natural language definitions of the elements in the schema.
 A very important aspect of the role of XML in "bringing meaning to the web" is that this "meaning" is not primarily intended for humans, but rather for applications and automated agents that access and exchange the data. The web already has a format intended for human consumption: HTML. What this means to the design of Content Schemas is that the schema must be well-structured enough and semantically clear enough for the creators of these applications and agents to write code against.
 This is exactly what data models and data modelling languages are intended to provide! The question, then, is whether data models offer any significant advantages over current languages for specifying Content Schemas (e.g., DTD's, XML Schema) in fulfilling the role envisioned for XML. This question is explored below after a brief introduction to EXPRESS.
 

EXPRESS

 Despite the fact that the name of the language is written in upper case letters, EXPRESS is not an acronym. It is an ISO standard (ISO 10303-11 [9]) and a self-described "information modelling language" developed to specify the semantics of industrial product data for the purpose of exchanging and sharing data between and among industrial product data systems. It is a synthesis of features of the Entity-Relationship and Object-Oriented data modelling approaches and, in its lexical form, bears a strong resemblance to Pascal record declarations.
 While this presentation cannot provide a comprehensive tutorial on the EXPRESS language, an explanation of the following major concepts will provide a valuable introduction to the language:
 
  •  Entity
  •  Schema
  •  Attribute
  •  Data type
  •  Constraints
 

Entities

 The fundamental construct of the EXPRESS language is the entity. An entity is the representation of a concept-of-significance within an application domain. It specifies the name, the properties, and the meaning of the domain concept and the data instances governed by it. The following is an example entity (data type) declaration in EXPRESS:
 
ENTITY product;
  name        : STRING;
  identifier  : STRING;
  description : STRING;
END_ENTITY;

 There is also a graphical version of EXPRESS called EXPRESS-G; the graphical equivalent of the declaration above is:
 
 Figure 3 - EXPRESS-G entity declaration
 An entity is comprised of properties called attributes. The product entity above has three attributes: name, identifier, and description.
 An instance of an entity is an identifiable member of a data population that conforms to the entity data type declaration. To conform to the declaration, the instance must have identity, a type, and values for each of the declared attributes. See Figure 4.
 
 Figure 4 - Entity declaration and entity instance
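 The three conformance conditions (identity, type, and a value for every declared attribute) can be made concrete with a minimal sketch. It assumes a hypothetical in-memory representation in which an entity declaration is a name plus a mapping of attribute role names to Python types standing in for the EXPRESS simple types; none of these names come from EXPRESS tooling.

```python
from dataclasses import dataclass, field
from itertools import count

_oids = count(1)  # hypothetical source of unique instance identities

@dataclass
class EntityDecl:
    name: str
    attributes: dict  # attribute role name -> Python type standing in for the EXPRESS type

@dataclass
class EntityInstance:
    decl: EntityDecl  # the instance's type
    values: dict      # attribute role name -> value
    oid: int = field(default_factory=lambda: next(_oids))  # the instance's identity

def conforms(inst, decl):
    """True if `inst` has type `decl` and a correctly typed value for every
    declared attribute, with no undeclared attributes."""
    if inst.decl is not decl:
        return False
    if set(inst.values) != set(decl.attributes):
        return False
    return all(isinstance(inst.values[a], t) for a, t in decl.attributes.items())

# The product entity declared above, in this hypothetical representation.
PRODUCT = EntityDecl("product", {"name": str, "identifier": str, "description": str})
```

 Two instances with identical attribute values still receive different `oid` values, which reflects the point that identity is separate from content.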
 

Schemas

 A schema is a collection of EXPRESS declarations that establishes a bounded scope for the declarations and may be considered as a "container" for declarations like entities. Schemas cannot be nested.
 A schema governs the structure and meaning of the instances in a given data population.
 The following is an example of a schema declaration containing some entity declarations:
 
SCHEMA product_definition_schema;

ENTITY product;
  id                 : identifier;
  name               : label;
  description        : text;
  frame_of_reference : SET [1:?] OF product_context;
UNIQUE
  UR1: id;
END_ENTITY;

ENTITY product_category;
  name        : label;
  description : OPTIONAL text;
END_ENTITY;

ENTITY product_related_product_category
  SUBTYPE OF (product_category);
  products : SET [1:?] OF product;
END_ENTITY;

(* ... *)

END_SCHEMA;
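 Unlike an attribute type, the UNIQUE rule UR1 on product cannot be checked by looking at one instance; it constrains the whole population governed by the schema. A minimal sketch, assuming instances are represented as plain dictionaries (a hypothetical representation, not an EXPRESS API):

```python
from collections import Counter

def unique_violations(instances, attr):
    """Return the values of `attr` shared by more than one instance.
    An empty list means the population satisfies a UNIQUE rule such as UR1: id."""
    counts = Counter(inst[attr] for inst in instances)
    return sorted(v for v, n in counts.items() if n > 1)
```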
 

Attributes and Data Types

 Attributes are named properties of an entity. An attribute consists of a role name and a data type. In the following entity declaration…
 
ENTITY product;
name : STRING;
identifier : STRING;
description : STRING;
END_ENTITY;
 …"name", "identifier", and "description" are attributes, each of which has a data type of "STRING". The role name describes the relationship of a datatype value to the entity. The datatype is the name of a domain from which values in instances are drawn. STRING is one of several simple data types defined in EXPRESS; others include INTEGER, REAL, and BOOLEAN.
 Relationships between entities are established by using an entity as the data type of an attribute rather than a simple type. For example, the statement "person owns a car" is modelled with the following EXPRESS declarations:
 
ENTITY person;
  name : STRING;
  owns : car;
END_ENTITY;

ENTITY car;
  year  : INTEGER;
  make  : STRING;
  model : STRING;
END_ENTITY;
 The equivalent declarations in EXPRESS-G are:
 
 Figure 5 - Relationship between entities
 The attribute datatypes may be specified as an aggregate, which changes the cardinality of the relationship from exactly one to, for example, one or more. The following declaration:
 
ENTITY person;
  name : STRING;
  owns : SET [1:?] OF car;
END_ENTITY;
 …states that a "person owns 1 or more cars". Note that the inverse cardinality of this relationship is zero or more: "a car may be owned by zero, one, or many persons".
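 What a processor must verify for an attribute such as `owns : SET [1:?] OF car` can be sketched as a small check. This is an illustration under stated assumptions: the function name is hypothetical, `upper=None` stands in for the unbounded `?`, and SET membership is compared by instance identity because an EXPRESS SET may not contain the same instance twice.

```python
def check_set_attribute(members, lower, upper=None):
    """Check a value for SET [lower:upper] OF ...; upper=None stands for '?'.
    Members are compared by identity, since a SET may not repeat an instance."""
    if len({id(m) for m in members}) != len(members):
        return False  # the same instance appears twice
    if len(members) < lower:
        return False
    return upper is None or len(members) <= upper
```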
 

Constraints

 There are two principal constraint mechanisms in EXPRESS: Local Rules and Global Rules. Local Rules are part of an entity declaration and specify constraints applicable to each instance of an entity. For example:
 
ENTITY time;
  hour   : INTEGER;
  minute : INTEGER;
  second : INTEGER;
WHERE
  WR1: hour < 24;
  WR2: minute < 60;
  WR3: second < 60;
END_ENTITY;
 The three "where" rules of the time entity declaration specify constraints on the permissible values of the attributes of an instance of time.
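 Because each local rule is a labelled predicate over a single instance, an application enforcing the time entity's rules can evaluate them one instance at a time. A minimal sketch, assuming instances are dictionaries keyed by attribute name (the names below are hypothetical):

```python
# Each WHERE rule of the time entity as a labelled predicate over one instance.
TIME_WHERE_RULES = {
    "WR1": lambda t: t["hour"] < 24,
    "WR2": lambda t: t["minute"] < 60,
    "WR3": lambda t: t["second"] < 60,
}

def failed_where_rules(instance, rules=TIME_WHERE_RULES):
    """Return the labels of the local rules that the instance violates."""
    return [label for label, rule in rules.items() if not rule(instance)]
```

 Reporting rule labels rather than a bare boolean mirrors the labelled WHERE clauses, so a violation can be traced back to the declaration.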
 Global Rules are declarations within a schema (peer-level with entities) that constrain existence, relationships, and values of and among entity instances. For example, without using the aggregate bound specifications, a Global Rule could be used to specify that a person must own 3 or more cars:
 
RULE owns_three_cars FOR (person);
LOCAL
  too_few : SET OF person;
END_LOCAL;
  too_few := QUERY(p <* person | SIZEOF(p.owns) < 3);
WHERE
  WR1: SIZEOF(too_few) = 0;
END_RULE;
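 Where a local rule sees one instance, a global rule sees the whole population of the entities named in its FOR clause. A minimal sketch of evaluating owns_three_cars, assuming (hypothetically) that each person is a dictionary whose owns value is a list of cars:

```python
def owns_three_cars(people):
    """Global rule over the whole person population: every person owns at
    least three cars. Returns the names of violators (empty list = satisfied)."""
    return [p["name"] for p in people if len(p["owns"]) < 3]
```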
 Additional information and tutorials about the EXPRESS language can be found at http://www.epmtech.jotne.com/learn
 

Functional comparison of data models and XML

 The principal requirements of data models and data modelling languages (derived from the definitions above) are to specify
 
  •  the structure of data in a physical encoding;
  •  the constraints necessary to ensure data integrity;
  •  the meaning (semantics) of data
 In the following comparison of features, the EXPRESS language will be used as an example of a data modelling language. EXPRESS is richer in features than many other data modelling languages, but shares the same graph-based structuring common to all data modelling languages.
 

Structure

 The first objective - the structure of data - is the entire raison d'être for XML. XML provides a formal structuring syntax that is well-understood by the web development community (due to exposure to HTML). The primary structural mechanism of XML is containment - the hierarchical nesting of elements within elements.
 If one looks at the evolution of data model structuring paradigms:
 
  •  Flat file
  •  Hierarchical
  •  Network
  •  Relational
  •  Object
 it is evident that the progression is toward graph-structured representations of data. The reason for this is simple: although the simpler structuring approaches (e.g., hierarchies) are easy to understand and process, they are far too limiting and prone to errors. Graph structures permit the reuse of data objects through common reference to a shared object. The graph structure is reflected in EXPRESS in the "pointing" relationship between one entity and another.
 As the design of web resources evolves to accommodate applications and automated agents, there is every reason to believe that it, too, will evolve along this same cline. Thus, the underused ID/IDREF feature of XML takes on a new significance and importance in XML documents.
 

Integrity Constraints

 The strength of modern data modelling languages such as EXPRESS is in the specification of data integrity constraints. Other than constraints imposed by structure and by cardinality operators, XML is essentially bereft of integrity constraints. (Constraints can be included as metadata in XML, but mechanisms are not inherent in the language.)
 It should be noted that data models are exactly the same as XML DTD's with respect to the data instances (e.g., XML documents), in that any integrity constraints must be enforced by the applications that produce and consume the data; the data itself does not contain the constraints. The difference is that data modelling languages such as EXPRESS provide formal constraint specification features as part of the language.
 

Semantics

 Given the objective of XML bringing meaning to the web, the most important comparison between data modelling languages and XML is in their ability to specify the semantics of structured data. It is ironic that, despite the importance of semantics to XML, there has been virtually no investigation into or definition of semantics from the point of view of linguistics, cognitive psychology, or epistemology. Even investigations into the semantics of XML with respect to legal contracts [12], an extremely important topic in e-commerce development, fail to examine the slipperiness of semantics from the natural language standpoint.
 Linguistically, semantics is defined as:
 
  1.  "The various phenomena pertaining to the meaning of words and sentences; the study of meaning in human language." [11]
  2.  "…the study of meaning expressed by elements of language or combination thereof." [6]
 The meaning of these definitions, of course, hinges upon the meaning of the term "meaning":
 
  1.  "The meaning of a sign is … the concepts it evokes to the users of the [language] system…" [11]
  2.  "…something that is signified, something that one wishes to convey, especially by language." [6]
 This paper is not the proper place for a complete explanation of the relevance and relationship of linguistic theory to data models and XML. However, the important aspect of semantics and meaning highlighted in the above definitions and relevant to this discussion is that meaning always pertains to the human mind. This leads to the potentially controversial assertion that data - XML or otherwise - has no meaning unless it is interpreted by a human. "Interpretation" by applications or agents is simply an indirect interpretation of the programmer; applications/agents only process data - they don't "understand" or "interpret" it.
 Therefore, the effectiveness of a Content Schema as the specification of meaning of an XML document depends upon how well it evokes the same interpretations in different readers, and thereby its correct use by programmers. The ability of the Content Schema to evoke the same interpretations depends on:
 
  1.  The features of the Content Schema Language;
  2.  The practices and conventions used in designing the Content Schema, addressing design characteristics such as:
     
    1.  the clarity and precision of the scope of application of the content schema;
    2.  the clarity and precision of the definitions and overall presentation/documentation.
 Because of the interplay of these factors, data models are not inherently better than conventional Content Schema languages in the specification of meaning, but data models do provide more objective features with which to specify the meaning. For example, data modelling languages such as EXPRESS offer
 
  •  Stronger data typing
  •  Stronger integrity constraints
  •  More flexible data structuring and reuse/shared use of constructs
  •  Complete omission of semantically-irrelevant lexical mechanisms (e.g., minimization, parameter entities)
 All of these features support the argument that data models are better than XML DTDs at specifying the semantics of data. However, the Devil's Advocate must point out that a surfeit of semantic features does not, in itself, make a data modelling language better than conventional content specification approaches.
 Good practices and conventions can overcome any shortcomings within a language; in addition, it could easily be argued that too many features are an impediment to designing good Content Schemas.
 

Mapping of EXPRESS to XML DTD syntax

 Because EXPRESS and XML DTD's are both composed of named, primary objects, the initial mapping of EXPRESS entities to XML elements is almost a no-brainer. For example, the EXPRESS declaration:
 
ENTITY product;
  name        : STRING;
  identifier  : STRING;
  description : STRING;
END_ENTITY;
 …is converted to the following XML declarations:
 
<!ELEMENT product (product.name, product.identifier, product.description)>
<!ELEMENT product.name (#PCDATA)>
<!ELEMENT product.identifier (#PCDATA)>
<!ELEMENT product.description (#PCDATA)>
 …and the following XML instance:
 
<product>
  <product.name>printer</product.name>
  <product.identifier>PS775</product.identifier>
  <product.description>color inkjet</product.description>
</product>
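 The early-binding declarations for an entity whose attributes are all simple types follow a mechanical pattern. The sketch below reproduces that pattern for the product example; it is a hypothetical illustration of the idea, not the EXML tool, and it ignores aggregates, references, and non-string types.

```python
def early_binding_dtd(entity, attributes):
    """Emit early-binding DTD declarations for an entity whose attributes are
    all simple types: one element for the entity and one #PCDATA element per
    attribute, named <entity>.<attribute> so attribute names stay entity-local."""
    qualified = [f"{entity}.{a}" for a in attributes]
    lines = [f"<!ELEMENT {entity} ({', '.join(qualified)})>"]
    lines += [f"<!ELEMENT {q} (#PCDATA)>" for q in qualified]
    return "\n".join(lines)
```

 Calling `early_binding_dtd("product", ["name", "identifier", "description"])` yields the four declarations shown above.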
 The more challenging questions arise in mapping details and have to do with semantic subtleties of the languages. For example, this particular mapping is just one way of converting the EXPRESS to XML declarations. This particular approach is called an early binding approach because the resulting DTD uses the terminology and structure of a particular EXPRESS schema. The alternative is a late binding approach in which the concepts of the EXPRESS language itself are mapped to XML:
 
<!ELEMENT entity (attribute+)>
<!ATTLIST entity
    name CDATA #REQUIRED>
<!ELEMENT attribute (…)>
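 For contrast with the early binding, a late-binding document names EXPRESS concepts (entity, attribute) as its elements and carries the schema-specific names as data. Because the content model of the attribute element is left elided, the sketch below assumes, purely for illustration, that each attribute element carries its role name in a name attribute and its value as text.

```python
import xml.etree.ElementTree as ET

def late_binding_xml(entity, values):
    """Serialize an entity instance with generic <entity>/<attribute> elements.
    ASSUMPTION: <attribute> holds its role name in a 'name' attribute and its
    value as text; this content model is hypothetical."""
    root = ET.Element("entity", name=entity)
    for role, value in values.items():
        ET.SubElement(root, "attribute", name=role).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

 The same function works for every schema, which is the appeal of late binding; the cost is that a DTD can no longer say anything about which attributes a product must have.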
 In the following discussion, an early binding will be assumed. A complete EXPRESS schema and corresponding DTD based on an early binding mapping can be found at http://www.pdit.com/pdml/EXP2DTD.txt
 There are a number of particular aspects of the mapping between EXPRESS and XML DTD's that require some discussion.
 These include:
 
  •  Schemas
  •  Attributes
  •  References
  •  Conformance
  •  Mapping conventions
 

Schemas

 Schemas actually map very nicely from EXPRESS into XML. In EXPRESS, a schema specifies a domain of values and thus "bounds" the collection of values. Thus, a schema maps naturally to the root element of an XML DTD; this element then serves as a container for entity instances.
 Because EXPRESS schemas are structured as a network rather than as a hierarchy, entities generally bear a peer-to-peer relationship to one another. As such, a valid XML document corresponding to the schema can contain zero, one, or more entity instances; the content model of the root element, therefore, is a giant choice particle containing the names of the independent entities in the schema:
 
<!ELEMENT product_definition_schema ((application_context | application_context_element | document | document_type | effectivity | product | product_category | product_category_relationship | product_definition | product_definition_formation | product_definition_formation_relationship | product_definition_relationship | product_definition_substitute)*)>
 The schema element also provides the appropriate place for specifying schema-related metadata, such as origin and date of the schema.
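 The root element declaration can be generated directly from the list of independent entities in a schema. A minimal sketch, with a hypothetical function name, that produces the same "repeatable choice" form as the product_definition_schema declaration above:

```python
def root_element_decl(schema, entities):
    """Declare the schema's root element as a repeatable choice over its
    entities, so a document may hold any number of instances in any order."""
    return f"<!ELEMENT {schema} (({' | '.join(entities)})*)>"
```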
 

Attributes

 A rather amusing difference between EXPRESS and XML is that both languages contain a feature called "attribute", but not only do these uses of "attribute" not mean the same thing, the thing that each means within its own language is not present at all in the other.
 This requires a bit of explanation. In EXPRESS, an attribute of an entity is a named property of the entity, where the name of the attribute describes the role of the datatype with respect to the entity. For example, in the following declaration:
 
ENTITY geometric_point;
  x : REAL;
  y : REAL;
  z : REAL;
END_ENTITY;
 "x" describes the role of a REAL value with respect to the entity "geometric_point." XML has no equivalent facility!
 The only way that role names can be introduced is by adding an additional level of element declaration that captures the role name:
 
<!ELEMENT geometric_point (geometric_point.x, geometric_point.y, geometric_point.z)>
<!ELEMENT geometric_point.x (real)>
<!ELEMENT geometric_point.y (real)>
<!ELEMENT geometric_point.z (real)>
<!ELEMENT real (#PCDATA)>
 On the other hand, EXPRESS has no mechanisms corresponding to XML tag attributes for providing metadata about the content. Any such metadata would be indistinguishable from other EXPRESS-declared data. Another thing to note about EXPRESS attributes is that names of EXPRESS attributes are local to the scope of the entity declaration.
 Therefore, if two entities have an attribute called "name", then they are two different "names". This is reflected in the mapping by prepending the entity name to the attribute name when declaring an element for the attribute, as can be seen in the examples throughout this paper.
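 The role-name convention can be sketched as a serializer: each attribute becomes an entity-prefixed role element that wraps an element named for the value's type. This is a minimal illustration, assuming a single simple type (`real`) for all attributes; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def early_binding_xml(entity, values, simple_type="real"):
    """Serialize an instance with one role-name element per attribute
    (<entity>.<attribute>), each wrapping an element named for the value's type."""
    root = ET.Element(entity)
    for role, value in values.items():
        role_el = ET.SubElement(root, f"{entity}.{role}")
        ET.SubElement(role_el, simple_type).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

 Prefixing the entity name keeps geometric_point.x distinct from an x attribute of any other entity, at the cost of one extra element level per attribute.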
 

References

 The primary relationship between elements in an XML Document/DTD is that of containment - an element is contained within another element according to the ordering and cardinalities specified in the content model of the parent element. The hierarchical structure is illustrated and highlighted in most XML tools, as exemplified in Figure 6:
 
 Figure 6 - XML Structure
 As already pointed out, EXPRESS does not have the same "container semantics", but rather treats all data objects as first-class objects and establishes relationships between entities by "pointing" from one entity to another. The natural structure is a network:
 
 Figure 7 - EXPRESS Structure
 Mimicking the network structure of EXPRESS in XML required the development of a convention for handling "pointers" in XML. The convention adopted was the creation of a "handle" element that was a companion of and named after an entity. Given the person-owns-car example from above:
 
ENTITY person;
  name : STRING;
  owns : car;
END_ENTITY;

ENTITY car;
  year  : INTEGER;
  make  : STRING;
  model : STRING;
END_ENTITY;
 The XML declarations would be:
 
<!ELEMENT person (person.name, person.owns)>
<!ELEMENT person.name (#PCDATA)>
<!ELEMENT person.owns (car_ref)>
<!ELEMENT car (car.year, car.make, car.model)>
<!ATTLIST car
  id ID #REQUIRED>
<!ELEMENT car.year (#PCDATA)>
<!ELEMENT car.make (#PCDATA)>
<!ELEMENT car.model (#PCDATA)>
<!ELEMENT car_ref EMPTY>
<!ATTLIST car_ref
  refid IDREF #REQUIRED>
 The "handle" or "pointing device" is an EMPTY element called car_ref. This element appears in the content model of person.owns, the element declared for the "owns" attribute of person. The car_ref element carries a refid attribute whose value, by convention, references the "car" element with an equal id value. (The car element appears at the same level as the "person" element.)
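 A validating XML parser only checks that each IDREF value matches some ID in the document; that the target is a car is a matter of convention, so an application has to check it. The sketch below resolves car_ref handles back into person-to-car links. The document and its root element name (schema) are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<schema>
  <car id="c1"><car.year>1998</car.year><car.make>Ford</car.make><car.model>Escort</car.model></car>
  <person><person.name>Ann</person.name><person.owns><car_ref refid="c1"/></person.owns></person>
</schema>"""

def resolve_owns(doc):
    """Follow each person's car_ref handle to the car whose id equals its refid.
    Returns {person name: car model}, with None for a handle that points at no car."""
    root = ET.fromstring(doc)
    cars = {car.get("id"): car for car in root.iter("car")}
    owned = {}
    for person in root.iter("person"):
        name = next(person.iter("person.name")).text
        for ref in person.iter("car_ref"):
            car = cars.get(ref.get("refid"))  # None if the refid names no car element
            owned[name] = None if car is None else next(car.iter("car.model")).text
    return owned
```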
 

Conformance

 With the introduction of a data model as the specification of the content of an XML Document, the notion of Schema Validity is introduced as well:
 
  •  Well-formedness is a fundamental requirement for XML.
  •  Validity is the adherence of an XML Document to the structure and rules specified in a DTD.
  •  Schema Validity is the adherence of the content of an XML document to the rules and structures specified by the Content Schema.
 The extra level of conformance called Schema Validity is recognized in the Cambridge Communique [13].
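 The three levels are cumulative, so a checker can apply them in order and report the first level a document fails. A minimal sketch, assuming a hypothetical structural check as a stand-in for DTD validation (Python's ElementTree does not validate against DTDs) and Content Schema rules written as predicates, here the time entity's WR1:

```python
import xml.etree.ElementTree as ET

def conformance_report(doc, structure_ok, schema_rules):
    """Apply the levels in order: well-formedness (the parser accepts it),
    validity (stand-in: a caller-supplied structural check), and
    Schema Validity (every Content Schema rule holds)."""
    try:
        root = ET.fromstring(doc)
    except ET.ParseError:
        return "not well-formed"
    if not structure_ok(root):
        return "well-formed but not valid"
    if not all(rule(root) for rule in schema_rules):
        return "valid but not schema-valid"
    return "schema-valid"

def time_structure(root):
    """Hypothetical stand-in for the DTD content model of time."""
    return root.tag == "time" and [c.tag for c in root] == ["time.hour", "time.minute", "time.second"]

def hour_rule(root):
    """WR1 of the time entity: hour < 24."""
    return int(root[0].text) < 24
```

 A document with an hour of 25 passes the first two levels and fails only the third, which is exactly the gap that Schema Validity is meant to close.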
 

Frequently Asked Questions

 

Where is this work being used? Where is it being done?

 The use of the EXPRESS data modelling language for the content specification of an XML document was introduced in the PDML Project (www.pdit.com/pdml). As part of this project, the initial Early Binding specifications were developed and applied, and a small tool called EXML was developed to convert the EXPRESS schema to a DTD. The examples included in the paper were produced with EXML. EXML is available free of charge at:
 http://www.pdit.com/pdml/exmlintro.html
 The EXPRESS language was developed in and standardized through ISO TC 184/SC4. The Early Binding work presented here is a contribution to a larger effort within SC4 to develop standardized bindings between EXPRESS and XML DTD syntax. The project conducting the work is a joint effort of SC4 and ISO/IEC JTC1/SC34 (SGML). The ISO designation of the bindings once they become standardized will be ISO 10303-28 (Binding of EXPRESS to XML).
 

Why not use UML? XML Schema? XML Data? XML Information set?

 As a data modelling language, UML could have been used as a content specification language in PDML. UML (Unified Modeling Language [8]) is a more widely known and popular language than EXPRESS, and has richer, more expressive features than EXPRESS. However, UML, as an object-modelling language, has a different purpose than EXPRESS. "Objects" are not "entities". Objects in UML "do" something: they have functionality and capabilities and lend themselves to the development of application systems. Entities in EXPRESS, on the other hand, don't "do" anything other than represent a real-world concept, and don't lead to application system designs or functionality. It was felt that UML is over-featured with respect to the requirements of PDML; too many aspects of the language are irrelevant to project objectives.
 XML Schema and XML Data are intended to perform the same function as a DTD, but do so as an XML document rather than in DTD syntax. They are equivalent to DTDs and, thus, are neither better nor worse than DTDs as content specification languages.
 XML Information Set is "an abstract data set …[which is] a description of the information available in a well-formed XML document." [5] Like XML Schema and XML Data, XML Information Set also takes as its domain concepts the things that are found in DTDs and XML documents. The primary difference is that XML Infoset appears to be trying to describe abstractly (i.e., describe the contents without specifying the physical structure) the kinds of information that may be obtained from an XML document by, for example, an API. The objectives of the XML Infoset work don't seem to be directed at the role of a Content Schema specification language.
 

Do web resources need all the rigor introduced by data models?

 No.
 For a large number of applications, web resources do not require the rigorous mechanisms entailed in data models, because the purpose of the resource might not include data management. Presentation-oriented pages and simple, one-off data exchanges do not need to meet the requirements of a long-term data resource.
 However, since XML is ostensibly aimed at the automated processing of web resources, the ease and correctness of that processing would be greatly aided by good data management principles and practice. Therefore, while data models would be overkill for many web applications, the growth of the web toward a Semantic Web, where agents can find the semantically correct information they are searching for, will require strong semantic specification languages, and a ton of good practice!
 

Summary and Conclusions

 The World Wide Web is still evolving and will probably continue to evolve in perpetuity. The growing recognition of, and desire for, more data semantics on the web (i.e., "intelligent" data that supports and encourages application interoperability) will drive the evolution of web resources and the sophistication (and complexity!) of encoding techniques. XML provides a step in that direction, but it is not enough. Semantic Content Specification Languages are needed that are applicable and usable across platforms and that clearly specify the semantics and structure of data available on the web. Furthermore, the evolution of the technology must follow a growth curve: mistakes will be made, and lessons will be learned.
 The long history of data model usage, data exchange, and application interoperability in industrial information technology development provides a wealth of mistakes and lessons that can directly support the semantic evolution of web resources. The "Cambridge Communique" [13] recognizes the importance and role of data models with respect to web resources and the Semantic Web, but it does not acknowledge the applicability of existing data model usage and research.
 The Content Schemas that specify the semantics of web resources must be independent of the encoding syntax. XML Infoset takes a step in this direction, but retains vestiges of its document-oriented origin. Data modelling languages such as EXPRESS provide a mechanism that is both rich in semantic features and mappable to XML and other encoding syntaxes. This paper has presented an example, and some of the details, of the use of EXPRESS as a Content Schema for XML documents.
 

Bibliography

 
  1.  Beech, D., et al. XML Schema Part 1: Structures. (1999) http://www.w3.org/TR/xmlschema-1/. Date of page: 1999-05-06.
 
  2.  Berners-Lee, T., Connolly, D., and Swick, R.R. Web Architecture: Describing and Exchanging Data. (1999) www.w3c.org/1999/06/07-webdata. Date of page: 1999-06-07.
 
  3.  Biron, P.V. and Malhotra, A. XML Schema Part 2: Datatypes. (1999) http://www.w3.org/TR/xmlschema-2/. Date of page: 1999-05-06.
 
  4.  Bruce, T.A. Designing Quality Databases with IDEF1X Information Models. Dorset House Publishing, New York, 1992. ISBN 0-932633-18-8.
 
  5.  Cowan, J. and Megginson, D. XML Information Set. (1999) www.w3c.org/TR/xml-infoset. Date of page: 1999-05-17.
 
  6.  de Swart, H. Introduction to Natural Language Semantics. CSLI Publications, Stanford, 1998. ISBN 1-57586-138-0.
 
  7.  Elmasri, R. and Navathe, S.B. Fundamentals of Database Systems. Benjamin/Cummings, Redwood City, CA, 1989. ISBN 0-8053-0145-3.
 
  8.  Fowler, M. and Scott, K. UML Distilled: Applying the Standard Object Modeling Language. Addison-Wesley Object Technology Series, G. Booch, I. Jacobson, and J. Rumbaugh, eds. Addison Wesley Longman, Reading, Mass., 1997. ISBN 0-201-32563-2.
 
  9.  ISO. Industrial automation systems and integration - Product data representation and exchange - Part 11: EXPRESS Language Reference Manual. ISO 10303-11:1994, International Standard, Geneva, 1994.
 
  10.  Lassila, O. and Swick, R.R. Resource Description Framework (RDF) Model and Syntax Specification. World Wide Web Consortium, 1998.
 
  11.  O'Grady, W., Dobrovolsky, M., and Aronoff, M. Contemporary Linguistics: An Introduction. St. Martin's Press, New York, 1989. ISBN 0-312-01878-9.
 
  12.  Reagle, J. Eskimo Snow and Scottish Rain: Legal Considerations of Schema Design. (1999) http://www.w3.org/TR/1999/NOTE-md-policy-design-19990910.html. Date of page: 1999-09-10.
 
  13.  Swick, R.R. and Thompson, H.S. The Cambridge Communique. (1999) www.w3c.org/TR/1999/NOTE-schema-arch-19991007. Date of page: 1999-10-07.
 
  14.  ter Bekke, J.H. Semantic Data Modeling. Prentice Hall International, 1992. ISBN 0-13-806050-9.
