An XML-Based Interchange Format for EXPRESS-Driven Data |
| W. Eliot Kimber |
| Senior Consulting SGML Engineer |
| ISOGEN International
2200 N. Lamar St., Suite 230 Dallas Texas U.S.A. 75202 Phone: +1 214-953-0004 Fax: +1 214-953-3152 Email: eliot@isogen.com Web: www.isogen.com |
Biographical notice: |
W. Eliot Kimber has been involved with generalized markup, SGML, electronic publishing, and hypertext for all of his career, mostly at IBM, more recently for Passage Systems and ISOGEN International. Eliot is co-editor (with Charles F. Goldfarb, Steve Newcomb, and Peter Newcomb) of the HyTime standard and a member of ISO/IEC JTC1/SC34, the ISO committee responsible for SGML and its related standards. Eliot was a founding member of the XML Working Group. Eliot is the author of the soon-to-be published book Practical Hypermedia: An Introduction to HyTime, part of the C.F. Goldfarb Series on Open Information Management. When he is not working on, writing about, or teaching about standards, Eliot works as a systems integrator, helping clients use SGML, HyTime, DSSSL, and related standards to their best effect. In his spare time, Eliot is a devoted husband and dog owner. |
| Robin LaFontaine |
| Director |
| Monsell EDM Ltd
Midpoint Monsell House, Monsell Lane Upton-on-Severn Worcestershire United Kingdom WR8 0QN Email: robin@monsell.co.uk |
Biographical notice: |
ABSTRACT: |
Use of XML to Serialize Data |
XML provides a character syntax that is naturally suited to the string representation of arbitrarily complex data structures. After all, traditional documents are nothing more than a collection of hierarchically organized nodes with properties. The XML syntax is robust and well standardized. XML documents can be reliably validated for syntactic correctness. Because XML tools are ubiquitous, the use of XML for representation syntax removes the need to document or implement scanner-parsers for the representation syntax. A single object is a collection of properties that can be naturally represented using XML elements and subelements (and, optionally, attributes). Relationships among objects can be represented using either direct containment or references using some addressing notation. There is no great difficulty in defining an XML markup language that will capture all the information content in a given set of objects. |
Thus, XML provides a convenient and useful representation syntax for interchanging abstract objects as physical data sets (as opposed to "in-memory" objects managed by some running program and accessed through an API). The use of XML facilitates interchange because it is a standard and easy-to-use syntax. Thus it is easier to write programs that can accept XML-based data. |
However, the use of XML only makes the parsing easier. It does not do anything to solve the problem of interpreting the data on the receiving end: you must know both how to reconstitute objects from their serialization and how to interpret the objects once they have been reconstituted. |
Thus, serializing and deserializing data using XML is, fundamentally, no different from using some purpose-built character syntax. The processing that has to be performed is the same. To get from the XML abstraction to the original objects, you must apply a process that knows how to map from elements and attributes to the original objects, that is, apply the serialization algorithm in reverse (the deserialization or reconstitution algorithm). Such processing will always be specific to the serialization method, which will usually be specific to a particular set of objects or object representation technology (e.g., STEP, CORBA, COM, proprietary object database, etc.). There is no magic to using XML, at least for solving the fundamental problem of interchanging objects between object management systems. Of course, the use of XML makes the job of processing the character representation easier because the tools to build such processors are readily available and relatively easy to use. |
When using XML to serialize objects, there is a source of potential confusion. All XML documents are, inherently, serializations of the abstract objects that make up an XML document, that is, elements. In other words, the XML specification defines two things: an abstract data model for documents and a string serialization of that abstract data model. The XML data model is essentially a tree of elements and data characters. The XML specification formally defines the string representation and the XML Data Model specification (still in early stages of definition) defines the abstraction the string represents. The Document Object Model (DOM) is an API to the XML abstraction. All processing that is not simply string manipulation is applied to the abstraction of the document, not to the original XML character string. For example, in DOM-based processors, an XML document is first parsed and a DOM tree constructed, then the processor operates on the DOM tree. |
Thus in any XML (or SGML) processing system there is always a layer that manages the parsed document more or less according to the base XML data model, that is, as a tree of elements. This layer may be more or less formalized depending on the nature of the tool, but it is always there. |
The potential confusion lies in mistaking the layer of XML objects (elements) for the objects originally serialized into XML syntax. For example, if I have a "person" object in my repository and serialize it to an element type called "person", it is tempting to think that the object that represents the element with a GI of "person" is the same as the original person object. But of course it is not. |
The element object with a GI of "person" exhibits the properties of an element, not the properties of a person. That is, it has properties like "GI", "element type", "content", "attributes", and so on, whereas the person object will have properties like "Name", "age", "sex", and "employer". These different object layers represent two different layers of abstraction serving different purposes and they should not be confused. |
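The distinction between the two layers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Person` class, the `reconstitute_person` function, the subelement names, and the sample values for age and employer are all hypothetical and are not taken from any real repository or schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical domain object: its properties are those of a person."""
    name: str
    age: int
    employer: str


def reconstitute_person(element: ET.Element) -> Person:
    """Map an element whose GI is "person" back to a Person object.

    The knowledge of which subelement feeds which property, and that
    "age" is an integer, lives in this function, not in the XML.
    """
    return Person(
        name=element.findtext("name"),
        age=int(element.findtext("age")),
        employer=element.findtext("employer"),
    )


# Illustrative serialization; the age and employer values are made up.
doc = "<person><name>Nigel Shaw</name><age>42</age><employer>Acme</employer></person>"
element = ET.fromstring(doc)

# The element object exposes element properties: GI (tag), attributes, content.
print(element.tag, element.attrib, [child.tag for child in element])

# Person properties exist only after the reconstitution step has run.
person = reconstitute_person(element)
print(person.name, person.age, person.employer)
```

The parser hands back an object that knows about tags and children. Everything that makes the result a person had to be written by hand in `reconstitute_person`.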
The danger is that it is tempting to think that because XML documents have standardized access APIs like SAX and the DOM that simply by putting your data into XML form you get an access API for free. You do not. You only get a convenient layer on which you can build an access API. That is, you still have to write the reconstitution algorithm--there is no magic in the use of XML that will let you avoid this. While the existence of things like SAX, the DOM, XSL, and query languages for working with XML documents may make it easier to create reconstitutors (because they provide a robust infrastructure for working with XML documents), they do not remove the need to create the reconstitutors. |
For every unique data model or meta-data-model, there will be at least one serializer/reconstitutor. Even if two different data models use XML, the only reliable relation between them is that you can use the same parsing tools to read the serialization documents. There will be no necessary or predictable relation between the serialization and reconstitution algorithms. In the absence of a universal fundamental data model, there is no way to achieve universal object interchange at the semantic level and the use of XML does not change this fact. |
It is also important to keep in mind that for any non-trivial data objects, the translation into XML cannot be simple or obvious such that one could intuit the reconstitution algorithm simply by examining the XML document instance. This is because the XML data model (elements, attributes, and character data content) is fundamentally different from other data models. XML has no notion of data type beyond string, which means that for most data types, there is no obvious or standardized way to represent that data type using XML syntax. A simple example is real numbers. If you have a data object that has a real number as a property value, you will have to decide how to serialize it. There are many choices: do you want compactness? Precision? Consistency with some existing string representation? This is only one example. Multiply this by the number of different possible data types, the complications of complex data types, and other such issues, and it becomes clear that there are many possible, equally useful ways to serialize objects. This means that accessing the XML data model alone cannot tell you how to reconstitute a set of serialized objects. |
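The real-number example can be made concrete with a small Python sketch. The three function names are hypothetical, and the three formats are merely plausible choices. None of them is prescribed by XML or by any standard discussed here.

```python
def serialize_compact(x: float) -> str:
    """Short and readable, but lossy: keeps three significant digits."""
    return f"{x:.3g}"


def serialize_decimal(x: float) -> str:
    """Shortest decimal string that reads back as the same float."""
    return repr(x)


def serialize_hex(x: float) -> str:
    """Exact binary value in hexadecimal floating-point notation."""
    return x.hex()


value = 2.0 / 3.0
for label, text in [
    ("compact", serialize_compact(value)),
    ("decimal", serialize_decimal(value)),
    ("hex", serialize_hex(value)),
]:
    # Each string is a legitimate element content for the same property.
    print(f"{label:8} {text}")

# A receiver must know which convention was used in order to read it back:
# float() for the decimal forms, float.fromhex() for the hexadecimal form.
```

All three strings are valid character data in XML, and nothing in the markup says which convention the sender chose or whether precision was lost.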
Having a set of DTD declarations for the serialization format doesn't help either because the DTD declarations only define the syntax rules for the document, they don't tell you what the syntactic elements mean or formally define the rules for going from objects to the serialization syntax. The DTD can help you validate that the resulting document is syntactically valid, but it can't help in understanding how to interpret the document. |
Thus, serializing and deserializing data using XML is, fundamentally, no different from using some purpose-built character syntax. The processing that has to be performed is the same. There is no magic to using XML, at least for solving the fundamental problem of interchanging objects between object management systems. |
I'm making this point because there seems to be a common misconception that having a DOM, for example, somehow solves a problem of object access, which of course it does not. You see people talking about "putting a DOM" over some set of objects. All this is doing is serializing the data objects and then making the serialization available. It is not making the objects themselves directly available. While this may be a useful thing to do (because it eliminates the need to literally create the serialization document), it doesn't really add any great value because you still have to convert the serialized form back into objects in order to process them in terms of their original semantics. Thus, if you have a set of objects and provide a DOM API to them, you've simply added an unnecessary layer of translation between the original objects and the software that wants to process them. Mapping the objects to a DOM is the same as writing a serialization process that generates a literal XML document. Translating the DOM representation back into semantic objects is, of course, the reconstitution process. While this provides a potentially useful integration layer, it is a weak one because it is defined in terms of the abstraction of a string serialization, not in terms of a more general object model. Note:
|
Of course, the use of XML makes the job of processing the character representation easier because the tools to build such processors are readily available. Thus, no-one need ever invent a syntax or write a parser ever again (not that that will stop people from doing it). In addition, the robustness of the XML language, coupled with an existing software infrastructure, makes it possible to use data representation techniques that might otherwise be too expensive to build into a purpose-built syntax. |
It also means that traditional document processing tools such as browsers and search engines can be applied to the serialized forms of objects, treating the serialized form as a document (for example, to browse the data with an XML browser). But note that operating on the serialized form of the objects is not the same as operating on the objects: you still have to reconstitute the objects in order to operate on them. In other words, there is a fundamental difference between a tool that can generically render any XML document (say a simple tree viewer) and a tool that formats and renders a serialization document in terms of the semantics of the objects serialized. In the latter case, the style sheets or transformations that create the rendition have effectively reconstituted the objects in order to reflect their semantics (even if the objects are not literally reconstituted). That is, a style sheet that renders a serialization document in terms of the semantics of the serialized objects has to have the same degree of understanding of the semantics of the original objects and their serialization algorithm that a processor that operated directly on the original objects would have to have. |
The use of XML only provides convenience, it does not eliminate the need to write software that understands the original objects. It doesn't matter whether you literally reconstitute the objects and then process them or reconstitute them virtually as part of the processing, it's the same amount of work. The only difference is the convenience with which you can apply a particular type of processing to the data. |
Assume that you have a collection of objects in some repository that provides a programming API by which you can access those objects in terms of their base schema. That is, if you have a "person" object, you can, in the API, ask for a "person" and get it. If you want to render these objects to a Web browser, you could write a program to this API that gets the objects and then generates HTML that reflects the desired rendition. However, it's unlikely that you have at hand an HTML generation tool that is also easy to integrate with your repository's API. You would probably end up building a lot of infrastructure just to be able to easily generate HTML from your objects. What you really want to be able to do is apply a style sheet to your objects to declaratively (as much as possible) describe their HTML rendition. It is unlikely that you have a style-based processor that is also able to easily connect to your repository. |
You could, alternatively, serialize your objects into XML, parse the XML into a DOM tree, and then apply DOM-based tools to derive the rendition you want. (You can also serialize your objects directly into a DOM tree without actually writing out and reparsing the XML document. This is what "putting a DOM over something" means.) For example, you could use an XSL style sheet within Internet Explorer 5 or Mozilla to define the mapping from the serialization of your objects to the desired HTML representation. But, you'd have to build into those XSL style sheets the knowledge of how to map from XML elements and attributes to the real objects. For non-trivial data sets, this mapping will always be non-trivial and, to some degree, non-obvious. |
Note that in both scenarios there is a middleman that would be best avoided. In the first case, the middleman is the infrastructure needed to provide a style-based processing environment for your objects as held in the repository. In the second case, the middleman is the DOM layer and that part of the style sheet that does the deserialization. |
It would be much better if you could eliminate the middleman. That is, if you could apply style sheets directly to your objects as they exist in the repository without having to either create a new layer of infrastructure or add extra complexity to your style sheet. Unfortunately, nothing in the use of XML syntax provides this. This is because the problem is one of abstraction mapping and not syntactic interchange. Processing occurs in the abstract domain, while XML operates in the syntactic domain. XML companion standards operate in the abstract domain, but one that is tightly bound to the specific abstraction of parsed XML documents. There is no higher-level, more general abstraction in which the XML abstractions (and thus the XML-specific processing) are defined. Thus, if you use XML-based tools, you must always provide the mapping from XML's abstraction (elements and attributes) to your abstraction. There is no defined mechanism for applying XML tools (that is, XSL, XLink, etc.) to non-XML data abstractions. Serializing the data abstractions into XML doesn't change this. |
In the SGML world, there is a higher-level abstraction, groves, which is used to define style and hyperlinking processing. This means that these processes (and thus the tools that implement them) can be applied to any data objects, not just parsed SGML documents. When using these tools (DSSSL and HyTime), it is not necessary to first serialize your objects in order to be able to conveniently apply presentation and hyperlinking processing. The details of that process are explained in the two other papers on this panel. |
The point of all this is to make it clear that while the use of XML for serialization is useful and important, it does not make the fundamental problems any easier, it can only make solving them a bit more convenient. In other words, don't think that you're getting more than you are when you use XML for data serialization. |
Requirements on the XML Representation of EXPRESS-Driven Data |
EXPRESS-driven data is data that conforms to EXPRESS schemas (including the schemas themselves). The STEP standard defines an abstract representation of data conforming to EXPRESS schemas but not a concrete representation form for data held in EXPRESS-driven repositories. EXPRESS-driven data consists of "entities", which are abstract data objects (not to be confused with SGML and XML entities, which are abstract storage objects). Entities are collections of "attributes". Every entity instance has one or more classes. Classes are organized into subclasses and superclasses. EXPRESS allows multiple inheritance (that is, a given class or entity type can have multiple supertype classes). Entity instances are related to other entity instances when one entity includes another entity in the value of an attribute. In the abstract, these relations are simply object-to-object pointers. Note:
|
Entity instances are organized into "repositories", where a repository is simply an identifiable collection of entity instances. Entity instances have identity but the STEP standard does not provide or require that entity instance identifiers be exposed or persistent. (This means, for example, that there is no generalized mechanism for referring to particular entity instances from outside a repository in a non-repository-specific way--that is, there is no concept analogous to SGML element IDs or entity public identifiers.) |
The XML representation is an attempt to represent both schemas (entity type definitions) and entity instances for interchange using XML syntax. The current requirements on the XML representation effort are:
|
These requirements reflect our initial attempt to define requirements and are not completely refined or agreed upon, but they give a general idea of the nature of the requirements. |
Some key requirements are the enabling of early and late binding and the enabling of a range of instance to schema bindings. |
The current serialization format for EXPRESS-driven data, defined in Part 21 of ISO 10303, is early bound in that the keywords in the data stream reflect the names used in the schema that governs the data being interchanged. Attribute values are not labeled, but are associated with their schema definitions positionally (according to the order the attributes are defined in the schema). This makes for a compact syntax but one that requires deep knowledge of the schema (which is not part of the interchange data set) simply to correctly parse the interchange file. The XML representation must enable the same approach, in which the schema is not transmitted with the entity instances. |
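Positional association can be illustrated with a minimal Python sketch. It does not parse the Part 21 syntax. It only shows why unlabeled values cannot be interpreted without the schema. The `SCHEMA` table and the `bind_positional` function are hypothetical names introduced for this illustration.

```python
# Hypothetical schema knowledge held by the receiver. It is NOT part of the
# interchanged data: attribute names are listed in declaration order.
SCHEMA = {
    "person": ["fullname", "sex"],
}


def bind_positional(entity_type: str, values: list, schema: dict = SCHEMA) -> dict:
    """Associate unlabeled values with attribute names by position."""
    names = schema[entity_type]  # raises KeyError if the schema is unknown
    if len(names) != len(values):
        raise ValueError(
            f"{entity_type}: expected {len(names)} values, got {len(values)}"
        )
    return dict(zip(names, values))


# The data stream carries only the entity type keyword and the values.
record = bind_positional("person", ["Nigel Shaw", "Male"])
print(record)
```

Without the `SCHEMA` table, the receiver would see two strings and have no way to tell which is the name and which is the sex.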
At the same time, it can be convenient to include some or all of the schema information in the interchanged data, either through the labeling of attribute values or by including the schema definition as part of the interchange package. Note that EXPRESS schemas already have a character syntax for their definition, so there is not necessarily a requirement to define an XML-based syntax for EXPRESS schemas, although it is probably a useful thing to do. |
The binding-time requirement reflects the trade-off between generality (late binding) and compactness and ease of interpretation (early binding). In a late-bound syntax, the element types reflect the basic STEP model, that is, "entity" and "attribute". The connection between data instances and constructs in the governing schema is made in the data, e.g.: |
<entity class="person">
  <attribute name="fullname">Nigel Shaw</attribute>
  <attribute name="sex">Male</attribute>
</entity>
|
Late binding has the advantage that there is only one document type definition that can be used for any collection of data. There is no need to define potentially complex schema-to-markup algorithms. Likewise, generic processors become easy to write (by "generic" I mean processors that do not require an understanding of the schema in order to perform their function). It has the disadvantage that the resulting data set is larger than the equivalent early-bound data set. Reconstitution of the original objects requires a bit more programming effort. |
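A generic processor for the late-bound form can be sketched briefly in Python. The function name `read_late_bound` is hypothetical. The markup is the late-bound person example from this section.

```python
import xml.etree.ElementTree as ET

LATE_BOUND = """
<entity class="person">
  <attribute name="fullname">Nigel Shaw</attribute>
  <attribute name="sex">Male</attribute>
</entity>
"""


def read_late_bound(entity: ET.Element):
    """Return (class name, {attribute name: string value}) for one entity.

    No schema is consulted: the element types "entity" and "attribute"
    are the same for every data set, so this reader is fully generic.
    """
    attributes = {
        attribute.get("name"): attribute.text
        for attribute in entity.findall("attribute")
    }
    return entity.get("class"), attributes


entity_class, attributes = read_late_bound(ET.fromstring(LATE_BOUND))
print(entity_class, attributes)
```

The reader recovers names and string values for any entity type, but the values remain uninterpreted strings. Turning "Male" into a member of an enumeration still requires the schema.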
In an early-bound syntax, the entity types and attribute names are reflected directly in element types or attribute names: |
<person>
  <name>Nigel Shaw</name>
  <sex>Male</sex>
</person>
|
Early binding has the advantage that it can be more compact and a more intuitive representation of the data, with the XML syntax closely modeling the structure of the original objects. It has the disadvantage that it requires a knowledge of both the serialization algorithm and the underlying metamodel. In this example, you have to know that the "person" element is in fact an entity instance and that the "name" and "sex" elements are attribute instances. It could just as easily be the case that the "name" element is an entity instance contained within an attribute named "person". There isn't enough information in the markup alone to know which it is. |
Another problem with early-bound forms is that they represent optimizations in particular directions. No optimization is universally optimal, so no early-bound representation will be optimal for all uses. By contrast, late-bound representations are equally suboptimal for all uses. Thus the choice between late or early binding is one of optimization vs. generality. |
For a given late-bound representation, there are infinitely many possible equivalent early-bound representations. That makes it difficult, if not ill-advised, to standardize on a single early-bound representation, because doing so precludes the use of any other possible early binding. However, if you standardize only the late-bound form, you ensure that all uses of the representation will be suboptimal. |
However, you must standardize exactly one late-bound form, as this represents the lowest common denominator, the format that all conforming processors must be able to read and write. |
Having standardized the late-bound representation, it is sufficient to define a formal mechanism for mapping any early-bound representation to the late-bound form. Thus, instead of standardizing specific early-bound forms, you standardize the mechanism by which early-bound forms are mapped back to the late-bound form (and thus to the form that all processors are guaranteed to know how to process). This leaves the optimization choices in the hands of individual users. When interchange is between parties that use the same early binding, there is no need to resolve the early-to-late binding mapping. When interchange is between parties that use different forms of early binding (or one party only uses late binding), resolving the early-to-late binding mapping allows the other party to process the data. |
The mapping might be through attributes, à la SGML architectures: |
<person expr-type="entity">
  <name expr-type="attribute">Nigel Shaw</name>
  <sex expr-type="attribute">Male</sex>
</person>
|
(Note that XML name spaces do not solve the problem because they only indicate the base type of an element, drawn from some public vocabulary of element types. Namespaces do not provide a mechanism for indicating derivation of an element type from a more general type. That is, name spaces do not express IS-A relationships between local, specialized names and generalized names.) |
The requirement to enable a choice of bindings leads to a number of design challenges. It also significantly complicates the design because it must include a mechanism for defining the early-to-late binding mapping in addition to the representation mechanism itself. The SGML architecture mechanism, standardized in ISO 10744:1997, provides a ready-made mapping mechanism, but it also imposes some limitations that may be too constraining. In particular, the early and late bound forms must have essentially the same element structure. For example, if the late-bound form has entities with attribute subelements, then all early-bound forms must follow this same model. Specializations can introduce additional layers of containment, but they cannot omit layers required by the general model. Thus, the late-bound representation must be carefully designed so as not to be too constraining on early-bound designs. On the other hand, these constraints make it relatively easy to resolve the early-to-late mapping because it does not require structural reordering. It also helps that there are free architecture-aware tools for both SGML and XML processing (the SP parser and the SAXArch extensions to the SAX API). In addition, architectural mapping is not difficult to implement in an ad-hoc fashion using normal SGML and XML processing techniques. |
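An ad-hoc resolution of the early-to-late mapping might look like the following Python sketch. It assumes the illustrative `expr-type` attribute used in the example above, not the actual architecture declaration machinery of ISO 10744, and the function name `to_late_bound` is hypothetical.

```python
import xml.etree.ElementTree as ET

EARLY_BOUND = """
<person expr-type="entity">
  <name expr-type="attribute">Nigel Shaw</name>
  <sex expr-type="attribute">Male</sex>
</person>
"""


def to_late_bound(element: ET.Element) -> ET.Element:
    """Rewrite one early-bound element (and its subtree) in late-bound form.

    The element structure is preserved. Only the names change, which is
    why no structural reordering is needed.
    """
    role = element.get("expr-type")
    if role == "entity":
        late = ET.Element("entity", {"class": element.tag})
    elif role == "attribute":
        late = ET.Element("attribute", {"name": element.tag})
    else:
        raise ValueError(f"<{element.tag}> has no usable expr-type: {role!r}")
    # Keep real character data, drop whitespace used only for indentation.
    late.text = (element.text or "").strip() or None
    for child in element:
        late.append(to_late_bound(child))
    return late


late_bound = to_late_bound(ET.fromstring(EARLY_BOUND))
print(ET.tostring(late_bound, encoding="unicode"))
```

Because each early-bound element maps to exactly one late-bound element in the same position, the transformation is a simple recursive renaming. An early-bound design that omitted the attribute layer could not be resolved this way.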
The requirement to enable the full range of schema inclusion, from none at all to transmitting the entire schema also complicates things, although it enables a more transparent representation syntax, one in which you can at least know the names for all the attributes even if you don't know the schema (although you still won't know how to interpret the attribute values without a schema). |
The requirement to reflect the result of select types reflects a complication of EXPRESS resulting from the ability to have multiple superclass types for a given entity type. Because an entity can have multiple supertypes, it can get attributes from each type. The attribute name space is only unique within an entity type, so at the instance level, you must maintain knowledge of which superclass provided which attribute. Because superclasses form a hierarchy, a given attribute may come from a parent superclass or an ancestor superclass. This information must be maintained on an instance basis. There are additional complications that are beyond the scope of this paper. Suffice it to say that a simple list of attribute name/value pairs is not sufficient to represent entity instance data in the general case. |
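The loss of information in a flat name/value list can be shown with a short Python sketch. The instance data is hypothetical: it assumes one instance that inherits an attribute named "employer" from two different entity types, "person" and "customer", with made-up reference values.

```python
# Hypothetical attribute values of ONE entity instance, recorded as
# (declaring entity type, attribute name, value) triples.
triples = [
    ("person", "sex", "female"),
    ("person", "employer", "enterprise-A"),
    ("customer", "employer", "enterprise-B"),
]

# A flat name/value list: the second "employer" silently replaces the first.
flat = {name: value for _, name, value in triples}

# Qualifying each name by the type that declared it keeps both values.
qualified = {(etype, name): value for etype, name, value in triples}

print(len(flat), flat["employer"])
print(len(qualified), qualified[("person", "employer")])
```

The flat dictionary ends up with two entries and only one employer. The qualified dictionary keeps all three values, at the price of carrying the declaring type with every attribute.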
The Resulting Design |
Unfortunately, at the time of writing, there is not a solid design, stemming largely from a lack of resources available to work on the project. A demonstration design was developed in 1998 simply to demonstrate the concept. In the spring of 1999 we did a little work on a new design approach that reflected a refinement of the original requirements and some new design principles. This new approach is itself just a draft and there is as yet no consensus about its suitability. |
This latest design is a late-bound representation. It reflects the following design principles:
|
A sample of the markup is shown below: |
<?xml version="1.0"?>
<?IS10744 arch name="express-rep"
public-id="ISO 1030
Representation of EXPRESS-Driven Data//EN"
dtd-system-id="p2x0_0.dtd"
?>
<!DOCTYPE express-driven-data SYSTEM "p2x0_0.dtd">
<express-driven-data>
<schema id="schema-1">
<imports>
</imports>
<constant_block>
</constant_block>
<type_decl>
<type_name>mystring</type_name>
<base_type><string/></base_type>
</type_decl>
<type_decl>
<type_name>myotherstring</type_name>
<base_type><string/></base_type>
</type_decl>
<type_decl>
<type_name>thingy</type_name>
<select>
<type_ref>mystring</type_ref>
<type_ref>myotherstring</type_ref>
</select>
</type_decl>
<entity_decl>
<entity_id>foo</entity_id>
<attribute_def>
<attribute_name>bar</attribute_name>
<attribute_type>
<list><!-- LIST [1:?] of thingy -->
<cardinality><min>1</min><max>?</max></cardinality>
<allowed_types><type_ref>thingy</type_ref></allowed_types>
</list>
</attribute_type>
</attribute_def>
<!-- OPTIONAL LIST [2:10] of UNIQUE thingy -->
<attribute_def>
<attribute_name>baz</attribute_name>
<optional/>
<attribute_type>
<list>
<cardinality><min>2</min><max>10</max></cardinality>
<unique/>
<allowed_types><type_ref>thingy</type_ref></allowed_types>
</list>
</attribute_type>
</attribute_def>
<unique_rule>
<attribute_name_ref>baz</attribute_name_ref>
</unique_rule>
<where_rules>
<where_rule>
<label>WR1</label>
<expression>
<!--
(= (SIZEOF (QUERY (mi <* QUERY (item <*
SELF\\representation.items |
'CONFIG_CONTROL_DESIGN.MAPPED_ITEM' IN TYPEOF (item)) |
(NOT ('CONFIG_CONTROL_DESIGN.' +
'GEOMETRICALLY_BOUNDED_WIREFRAME_REPRESENTATION'
IN TYPEOF
(mi\\mapped_item.mapping_source.mapped_representation))))))
0)
-->
<sizeof>
<arg>
<expression>
<query>
<variable_id>mi</variable_id>
<aggregate_source>
<expression>
<query>
<variable_id>item</variable_id>
<aggregate_source>
<attribute_ref>
<entity_ref><self/></entity_ref>
<id_val>representation</id_val>
<attribute_ref><id_val>
items</id_val></attribute_ref>
</attribute_ref>
</aggregate_source>
<expression>
<in>
<arg>
<string_val>
CONFIG_CONTROL_DESIGN.MAPPED_ITEM</string_val>
</arg>
<arg>
<typeof>
<arg><variable_ref>
item</variable_ref></arg>
</typeof>
</arg>
</in>
</expression>
</query>
</expression>
</aggregate_source>
<expression>
<not>
<arg>
<in>
<arg>
<add>
<arg><string_val>
CONFIG_CONTROL_DESIGN.</string_val></arg>
<arg><string_val>
GEOMETRICALLY_BOUNDED_WIREFRAME_REPRESENTATION
</string_val></arg>
</add>
</arg>
<arg>
<typeof>
<arg>
<attribute_ref>
<entity_ref><variable_ref>
mi</variable_ref></entity_ref>
<id_val>mapped_item</id_val>
<attribute_ref><id_val>
mapping_source</id_val>
<attribute_ref><id_val>
mapped_representation</id_val>
</attribute_ref>
</attribute_ref>
</attribute_ref>
</arg>
</typeof>
</arg>
</in>
</arg>
</not>
</expression>
</query>
</expression>
</arg>
</sizeof>
</expression>
</where_rule>
</where_rules>
</entity_decl>
<type_decl>
<type_name>sexes</type_name>
<base_type>
<enum>
<member>male</member>
<member>female</member>
</enum>
</base_type>
</type_decl>
<entity_decl>
<entity_id>enterprise</entity_id>
<attribute_def>
<attribute_name>name</attribute_name>
<attribute_type>
<type_ref>string</type_ref>
</attribute_type>
</attribute_def>
</entity_decl>
<entity_decl>
<entity_id>person</entity_id>
<attribute_def>
<attribute_name>sex</attribute_name>
<attribute_type>
<type_ref>sexes</type_ref>
</attribute_type>
</attribute_def>
<attribute_def>
<attribute_name>employer</attribute_name>
<attribute_type>
<type_ref>enterprise</type_ref>
</attribute_type>
</attribute_def>
</entity_decl>
</schema>
<data>
<entity_instance id="i0001">
<partial_entity_instance>
<entity_ref><id_val>foo</id_val></entity_ref>
<attribute>
<attribute_name_ref>bar</attribute_name_ref>
<select_type_path>
<type_ref>mystring</type_ref>
<type_ref>binary</type_ref>
</select_type_path>
<simple_value>
<list_val>
<binary_val notation="uuencode">begin 777 $9G)
E9 end</binary_val>
<binary_val notation="uuencode">begin 777
$9G(E9
end</binary_val></list_val>
</simple_value>
</attribute>
</partial_entity_instance>
<partial_entity_instance>
<entity_ref><id_val>person</id_val></entity_ref>
<attribute>
<attribute_name_ref>sex</attribute_name_ref>
<type_ref>sexes</type_ref>
<simple_value>
<enum_val>
<member_ref>female</member_ref>
</enum_val>
</simple_value>
</attribute>
<attribute>
<attribute_name_ref>employer</attribute_name_ref>
<type_ref>enterprise</type_ref>
<entity_relationship>
<entity_instance_ref>i00002</entity_instance_ref>
</entity_relationship>
</attribute>
</partial_entity_instance>
<partial_entity_instance>
<entity_ref><id_val>customer</id_val></entity_ref>
<attribute>
<attribute_name_ref>employer</attribute_name_ref>
<type_ref>organization</type_ref>
<entity_relationship>
<entity_instance_ref>i00002</entity_instance_ref>
</entity_relationship>
</attribute>
</partial_entity_instance>
</entity_instance>
<entity_instance id="i00002">
<partial_entity_instance>
<entity_ref><id_val>enterprise</id_val></entity_ref>
<attribute>
<attribute_name_ref>name</attribute_name_ref>
<type_ref>string</type_ref>
<simple_value>
<string_val>Three Initial Company</string_val>
</simple_value>
</attribute>
</partial_entity_instance>
</entity_instance>
</data>
</express-driven-data>
|
This example includes an expression whose markup, like the comparable MathML markup, is very verbose (we are investigating whether MathML can be used directly, or as a basis, for representing expressions in the schema). The example gives a feel for how the final design might look and for some of the complexities of representing EXPRESS-driven data, complexities that stem from the richness of function in EXPRESS and the generality of the EXPRESS language. |
It should also be clear from this example that the abstract structure of the parsed document (that is, a tree of elements) is still fairly far from the abstract structure of the original EXPRESS entities that it serializes. The processing needed to reconstitute the original entities is non-trivial. |
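To make the gap between the element tree and the entities concrete, the following is a minimal sketch of the reconstitution step, assuming the element names used in the example above (they are not taken from a finished DTD). The functions `load_instances` and `resolve` are hypothetical names introduced only for this illustration. The sketch handles only simple values and entity references, and it ignores the schema, select type paths, and binary values.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <data> section of the example above. Assumption:
# element names follow the example; no finished DTD is implied.
SAMPLE = """
<data>
  <entity_instance id="i0001">
    <partial_entity_instance>
      <entity_ref><id_val>person</id_val></entity_ref>
      <attribute>
        <attribute_name_ref>sex</attribute_name_ref>
        <type_ref>sexes</type_ref>
        <simple_value>
          <enum_val><member_ref>female</member_ref></enum_val>
        </simple_value>
      </attribute>
      <attribute>
        <attribute_name_ref>employer</attribute_name_ref>
        <type_ref>enterprise</type_ref>
        <entity_relationship>
          <entity_instance_ref>i00002</entity_instance_ref>
        </entity_relationship>
      </attribute>
    </partial_entity_instance>
  </entity_instance>
  <entity_instance id="i00002">
    <partial_entity_instance>
      <entity_ref><id_val>enterprise</id_val></entity_ref>
      <attribute>
        <attribute_name_ref>name</attribute_name_ref>
        <type_ref>string</type_ref>
        <simple_value>
          <string_val>Three Initial Company</string_val>
        </simple_value>
      </attribute>
    </partial_entity_instance>
  </entity_instance>
</data>
"""


def load_instances(xml_text):
    """Pass 1: build {instance id: {entity name: {attribute name: value}}}.

    Entity references are kept as ("ref", target_id) placeholders because
    the target instance may appear later in the document.
    """
    instances = {}
    for inst in ET.fromstring(xml_text).iter("entity_instance"):
        partials = {}
        for partial in inst.iter("partial_entity_instance"):
            entity = partial.findtext("entity_ref/id_val")
            attrs = {}
            for attr in partial.iter("attribute"):
                name = attr.findtext("attribute_name_ref")
                ref = attr.findtext("entity_relationship/entity_instance_ref")
                if ref is not None:
                    attrs[name] = ("ref", ref.strip())
                else:
                    value = attr.find("simple_value")
                    # Flatten the value markup to its text content.
                    attrs[name] = (
                        "".join(value.itertext()).strip()
                        if value is not None else None
                    )
            partials[entity] = attrs
        instances[inst.get("id")] = partials
    return instances


def resolve(instances):
    """Pass 2: replace ("ref", id) placeholders with the target instance."""
    for partials in instances.values():
        for attrs in partials.values():
            for name, value in attrs.items():
                if isinstance(value, tuple) and value[0] == "ref":
                    # Raises KeyError if the reference is dangling.
                    attrs[name] = instances[value[1]]
    return instances


objects = resolve(load_instances(SAMPLE))
print(objects["i0001"]["person"]["employer"]["enterprise"]["name"])
```

Even this reduced case needs two passes, because a reference such as `i00002` can point forward in the document. A real implementation would also have to consult the schema to type each value and to combine the partial entity instances of a complex entity.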
Implications for Similar Projects |
There are several similar projects under way. The XML Schema project within the W3C appears to have significant overlap in the area of representing schemas using XML syntax. However, our current understanding of the XML Schema work is that its scope of application is much narrower than that of the New Work Item in that it focuses on the needs of representing schemas for XML documents, not data modeling in general. This suggests that the two efforts, while similar, are fundamentally different and will not result in or benefit from rationalization of markup design. |
The Object Management Group (OMG) is sponsoring a project to define an XML interchange language for UML data and object models, the XMI. This effort appears to have significant overlap with the New Work Item, both in terms of scope of application and details. UML and EXPRESS have roughly the same scope of application and level of descriptive power. More research needs to be done to determine whether or not the XMI effort can be used as is or as a base for specialization for the New Work Item. |
The MathML language may provide a suitable base for representing EXPRESS expressions. The EXPRESS language includes a complete constraint expression language. More research needs to be done to determine whether or not MathML can be used, although initial research suggests that it can. It is probably a good test of the MathML design to try to apply it in this new domain, given that MathML is intended to be as general as possible. |
Other object serialization efforts may provide useful insights and experience, and vice versa. However, the intent of the New Work Item is not to provide a general-purpose object serialization syntax, but merely a syntax for the interchange of EXPRESS-driven objects. |
Relationship of This Effort to STEP and SGML Harmonization Effort |
The XML Representation New Work Item is only tangentially related to the STEP and SGML harmonization effort. This is because the STEP and SGML harmonization effort is working at the abstract level. It is trying to define the mapping between EXPRESS entities as abstract objects and grove nodes (SGML's analogous abstraction). Because it is operating in the abstract domain, issues of syntax are irrelevant. In particular, the existence or non-existence of an XML-based interchange format for EXPRESS-driven data does not affect the harmonization effort in any way. |
However, there is the danger that people will perceive that the existence of an XML interchange format for EXPRESS-driven data removes the need for integration of STEP and SGML at the abstract level. It does not. The goal of the STEP and SGML harmonization effort is to enable the interoperation of STEP and grove-based data in terms of its fundamental object models, not in terms of its serialization. Linking to an element that represents the serialization of a STEP data object is not the same as linking to the data object itself. They are two different things. This distinction is clearer with late-bound representations. With early-bound representations (where element types may reflect entity types), the distinction is not as clear, simply because the XML abstraction looks more like the original STEP data abstraction. However, it is still the case that the serialization of the objects is not the same as the objects themselves. |
|