
 

Common Business Library (CBL)

 Terry   Allen
  Commerce One 
Email: tallen@sonic.net
 
Biographical notice:
 
Terry Allen is a specialist in technical standards that support complex electronic publishing applications, including information discovery and retrieval, metadata, and internationalization. He is a codesigner of the DocBook DTD, the SGML application most commonly used for computer documentation (and good for other things, too!). He has participated in IETF and W3C working and special interest groups on HTML, URLs, URNs, MIMESGML, WEBDAV, XML, and the OCLC Metadata group. He designed and edited the first Web portal site (Global Network Navigator's Whole Internet Catalogue). Since 1997 he has been working on document design and architecture for electronic commerce systems. He is currently chairman of the OASIS Registry and Repository Technical Committee. For further details see http://www.sonic.net/~tallen/
 
ABSTRACT:
 
 
In this paper I'll describe the XML e-commerce language I wrote in 1997 and 1998: its purpose, design goals, architecture, and features. I'll describe the semantics I built upon and discuss lessons learned from the project, including the phase in which the original DTD syntax was transformed into an XML schema syntax. And I'll indicate directions for future development. CBL's development was partly funded by the U.S. Department of Commerce (NIST) Advanced Technology Program award 70NANB7H3048 to Veo Systems, CommerceNet, BusinessBots, and Tesserae Information Systems.
 

Copyright 1999 by Commerce One, Inc.

 
Copyright notice is not to be removed!
 

Why CBL?

 
CBL (Common Business Library) was developed as an experimental prototype e-commerce language in XML (Extensible Markup Language) to describe documents exchanged among components of a componentized e-commerce system. The notion that an e-commerce system should consist of components is the “ECO” strategy enunciated by Marty Tenenbaum, former Chairman of Veo Systems (which was acquired by Commerce One earlier this year). In such a system, services offered by components can be assembled into larger and varied services. The advantage of using XML documents as interfaces instead of CORBA (Common Object Request Broker Architecture) objects (as was envisioned originally) is that they are easier to develop and use for a wider range of users, and that they represent naturally the documents, whether paper or EDI (Electronic Data Interchange), already in use in commerce. It's important to understand that CBL is an artificial language for e-commerce and e-commerce documents, not an exercise in artificial intelligence.
 
CBL includes not only the XML DTDs and SOX (Schema for Object-Oriented XML) schemas, along with their associated documentation, but also a specification for constructing compound XML documents and packing them in MIME (Multipurpose Internet Mail Extensions) Multipart/Related messages and, to a certain degree, a model of document interchange. (SOX is Commerce One's submission to the W3C (World Wide Web Consortium) for consideration by the XML Schema Working Group.)
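
As a rough illustration of the packaging idea, the sketch below uses Python's standard email package to pack a root XML document and a document it refers to into one MIME Multipart/Related message, then unpacks it again. It is not the CBL packaging specification itself: the element names, the Content-ID values, and the cid: reference inside the purchase order are assumptions made for this example.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

# Hypothetical document bodies; element and attribute names are illustrative only.
PURCHASE_ORDER = b"<purchase.order><buyer.pointer href='cid:buyer@example.com'/></purchase.order>"
BUYER_INFO = b"<market.participant.info><name>Henry Morgan</name></market.participant.info>"


def build_package(root_xml: bytes, root_cid: str, related: dict[str, bytes]) -> MIMEMultipart:
    """Pack a root XML document and the documents it refers to into one message."""
    # The "start" parameter names the Content-ID of the root part.
    package = MIMEMultipart("related", type="application/xml", start=f"<{root_cid}>")
    parts = {root_cid: root_xml, **related}
    for cid, body in parts.items():
        part = MIMEApplication(body, "xml")
        part.add_header("Content-ID", f"<{cid}>")
        package.attach(part)
    return package


package = build_package(PURCHASE_ORDER, "order@example.com", {"buyer@example.com": BUYER_INFO})
wire_bytes = package.as_bytes()

# Round trip: parse the message and index its parts by Content-ID.
parsed = message_from_bytes(wire_bytes)
by_cid = {
    part["Content-ID"].strip("<>"): part.get_payload(decode=True)
    for part in parsed.get_payload()
}
print(parsed.get_content_type(), sorted(by_cid))
```

Because every part travels in the same message, a receiver can resolve references among the parts without going back to the network.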
 
I began work on CBL in August 1997. Version 1.1 was released in September 1998, and version 1.2, which is represented in both DTD (Document Type Definition) syntax and SOX, was completed in November 1998. At that point the specification had done its job in providing proof of concept, both to us and to the many who have downloaded the distribution; CBL is currently being employed only as a reference model. Further work is desirable to harmonize CBL semantics with those of EDI (both the X12 and EDIFACT flavors), and with other specifications that have appeared since my CBL work began. In the meantime, we've used a simplified subset of CBL for several successful demo projects.
 

Design Goals

 
For the most part CBL DTDs represent familiar document types, such as catalogue entries and purchase orders. These document types, and the mechanisms used to construct them, originally were chosen in order to meet the following goals:
 
  •  Cover a broad range of common requirements at a reasonably simple level while designing for extensibility.
  •  Use existing implemented and stable functionality insofar as possible, by using international and IETF (Internet Engineering Task Force) standards and proposed standards where practical and to the extent they are usable, while steering clear of contentious, unimplemented, unstable, or unready specifications or standards. XML, to the degree it is used in CBL, is considered a stable application profile of SGML (Standard Generalized Markup Language), an international standard for which there exists a considerable record of successful use. (Later, I created a SOX representation of CBL, as I'll explain shortly.)
  •  Leverage reusability of documents and markup constructs.
  •  Make internationalization and localization of documents easy.
  •  Provide sufficient data typing to enable the construction of programs to process CBL documents.
  •  Maintain independence of transport protocol.
  •  Make it easy for electronic commerce participants to establish trust and exhibit good faith.
 

Starting Points

 
My first step was to scour the Web for e-commerce standards and specifications. There is no lack of them, but a year ago August there was very little in XML or in a format lending itself to XMLification. I found a number of obvious standards to serve as the basis of an e-commerce specification, such as ISO 8601 for date and time, and ISO 4217 for currency codes. I also found the BSR (Basic Semantic Repository), which contains a partial union set of semantics from X12 and EDIFACT. From the BSR I extracted primitives for such things as addresses. (You can find the BSR online at http://www.iso.ch/BSR/.)
 
I then examined specifications for sets of e-commerce documents, such as OBI (Open Buying on the Internet), and in concert with my colleagues at Veo, devised a set of document types that would support both the construction of online trading communities and the scenarios of such specifications as OBI. I aligned relevant document types with semantic primitives defined by Rosetta Net (for catalog content) and the IOTP (Internet Open Trading Protocol) (for payment). Then I worked through as many business semantics models as I could to see if my document types were sufficiently robust to do real work, and tested them by constructing sample documents to support various specifications such as OBI.
 

 CBL Architecture

 
From the standpoint of DTD design, CBL's DTD-syntax representation is traditional: it has an information pool composed of modules, some of which rely upon each other, and a set of modules that define the contents of document types; the document types themselves are meant to be containers that can be discarded when building larger document types.
 
As part of CBL I
 
  •  developed further the MIME Multipart/Related compound document packaging proposal I've been promoting for several years (see my SGML '96 paper, “Package or Perish”),
  •  adopted the general syntax of the original XLink proposal, with some embellishments,
  •  worked out the usage of URNs in CBL such that resolution over the Net would rarely be required. Every CBL document type may contain a metadata section at its outset. The metadata elements provided to date are minimal, but they include, optionally, one or more URNs. A URN (Uniform Resource Name), which is uniquely a name for the document, can be used later in an exchange of e-commerce documents as a proxy for the document itself, without requiring the construction of a URN resolution service, provided its name space does not allow the assignment of URNs for variable resources such as “the latest version of something.” If you have already acquired a copy of a document (such as a taxonomy) identified by URN, when it is pointed to by URN in a new compound document (such as a product description), you can assume safely that you need not refetch and examine it. In general, CBL is constructed so that you can use URLs everywhere you can use URNs, and so that you can use URNs in typed pointers as shorthand for a piece of a compound document.
  •  made extensive use of typed pointers (pointing constructs that must target a document of a particular document type). Large-scale documents are constructed of sets of smaller ones in part by use of these pointers, thus retaining the semantics of document types as they are aggregated and allowing for lazy evaluation and easy semantic integration of disparate document types. A typed pointer that points to a contact information document looks like this:
 
<market.participant.info.pointer>
<urn.reference
urn.string="urn:x-commerceone:identity:henry.morgan">
</urn.reference>
</market.participant.info.pointer>
 
  •  These link elements bear an attribute, fixed in the attribute declaration, indicating whether the documents they point to are properly content of the parent document or lie outside of it. This device allows one to draw a limit around the logical content of a compound document (or, more generally, a hypertext collection) and avoid unwanted recursion in processing. CBL's link element syntax is compatible with the provisional XLink specification.
  •  constructed a taxonomy DTD for description of products along various axes. This taxonomy model describes a recursive hierarchy of taxons, in which taxons contain child taxons. In order to avoid growing a single enormous tree, taxons may point to parents, children, or siblings in other taxonomies. Taxons have names and descriptions, and may have sets of keywords. They may also contain descriptions of queries that may be made against schemas describing instances of the entity classified by the taxon: this device enables a user to know what questions it makes sense to ask about a schema for a particular product.
  •  built a query language for XML documents, including provision for querying across document boundaries within compound documents. This query mechanism is based on the XPointer mechanism in the XLink specification. CBL uses only part of the XPointer semantics in its query mechanism, representing them in instance syntax and augmenting them to support logical AND and OR, and traversal of links. This mechanism is considered a placeholder for an XML query mechanism yet to be specified.
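
The typed pointer and URN conventions described in this list can be illustrated with a short Python sketch. It resolves the typed pointer shown earlier against a local store of documents already acquired, and checks that the target is of the document type the pointer demands. Two things are assumptions of this sketch rather than statements about CBL: that a pointer element is named after its target document type plus the suffix ".pointer", and that the contact document's root element is market.participant.info.

```python
import xml.etree.ElementTree as ET

# Documents already acquired, keyed by URN. Because a URN names one fixed
# document, a copy held locally never needs to be refetched.
LOCAL_DOCUMENTS = {
    "urn:x-commerceone:identity:henry.morgan": (
        "<market.participant.info><name>Henry Morgan</name></market.participant.info>"
    ),
}

# Assumed naming convention for this sketch only.
POINTER_SUFFIX = ".pointer"


def resolve_typed_pointer(pointer: ET.Element, documents: dict[str, str]) -> ET.Element:
    """Return the target document, checking that its type matches the pointer."""
    if not pointer.tag.endswith(POINTER_SUFFIX):
        raise ValueError(f"{pointer.tag} is not a typed pointer")
    expected_type = pointer.tag[: -len(POINTER_SUFFIX)]
    reference = next((child for child in pointer if child.tag == "urn.reference"), None)
    if reference is None:
        raise ValueError("pointer has no urn.reference child")
    urn = reference.get("urn.string")
    if urn not in documents:
        raise LookupError(f"no local copy of {urn}; a fetch would be needed")
    target = ET.fromstring(documents[urn])
    if target.tag != expected_type:
        raise TypeError(f"expected {expected_type}, found {target.tag}")
    return target


pointer = ET.fromstring(
    "<market.participant.info.pointer>"
    '<urn.reference urn.string="urn:x-commerceone:identity:henry.morgan"/>'
    "</market.participant.info.pointer>"
)
contact = resolve_typed_pointer(pointer, LOCAL_DOCUMENTS)
print(contact.tag, contact.findtext("name"))
```

The type check is what lets an aggregate document keep the semantics of the document types it is built from, and the lookup by URN is deferred until the pointer is actually followed.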
 
E-commerce requirements not dealt with in CBL or dealt with only minimally include:
 
  •  workflow and service description
  •  updating and versioning (it is intended to use the IETF WEBDAV (Web Distributed Authoring and Versioning) specification)
  •  security and digital signatures (although there is a placeholder for digital signatures in the MIME packaging specification)
  •  message acknowledgements
  •  legal information including terms of business
  •  payment (work remains to be done on aligning CBL with IOTP)
 

Semantic Domains

 
XML markup, in association with its documentation, gives meaning to a document's content. (Markup is often falsely called “self-describing metadata,” but of course the description is in the markup documentation, not the XML document.) For e-commerce, this meaning covers three basic domains:
 
  •  Business document semantics (“line item”)
  •  Product description semantics (“has four-wheel drive”)
  •  Business logic semantics (“extend no credit to new customers”)
 
Common to all these domains are at least some from among the general datatypes describing such things as time, space, number, and physical properties. Aside from that commonality, these domains are largely disjunct, but software that processes e-commerce information must deal with them jointly:
 
  •  business logic is applied to business document semantics
  •  business documents must handle product semantics
  •  product descriptions must eventually support manufacturing logic, which will be generically similar to business logic
 
In CBL I have dealt with these domains in different degrees:
 
  •  SOX and CBL in SOX provide some datatypes.
  •  CBL provides a set of business document semantics designed to support traditional SGML goals, including information reuse and also object-oriented code generation.
  •  I have left specific product description semantics undescribed except for a demonstration; CBL provides for general product description semantics. There is an ocean of specific product semantics (such as those now being specified by Rosetta Net), which can also be used to support manufacturing specifications, but a general e-commerce language should not include that entire ocean. I've provided a slot for specific product descriptions within the product description and catalogue entry document types, so any specific product description may be used.
  •  I have considered business logic out of scope for CBL; I have provided only a placeholder pointer to it in the MIME compound document specification. I may send you a purchase order so you can fulfill it, notarize it, or archive it, and I have to indicate my intention outside of the document if it is to be used unchanged for all those purposes.
 

Transition to SOX

 
About the time I began to have things worked out, Murray Maloney, Alex Milowski, and Matthew Fuchs began to develop what became SOX, and I spent many hours converting my CBL DTDs into SOX schemas, initially in a mechanistic fashion, later taking into account SOX's mechanisms of inheritance and extension. By version 1.2 of CBL I was writing SOX schemas first and thinking in SOX; the DTDs became secondary products. This development led to considerable discussion within the company about how to use SOX effectively, and to my education in the alternate universe of object-oriented programmers. But it is from SOX source (more properly, source conforming to a rearranged subset of SOX) that our programmers have been working for the past year.
 
Going forward, we intend to use the XML schema language that the W3C will specify.
 

 CBL Lessons Learned

 

The Obvious

 
We can't hand-craft everything. It will be necessary to generate huge wads of SOX from existing specifications. Optimization probably must be done by algorithm or not at all, rather than in Terry's wetware.
 
Naming is contentious. As you all know. Our Java programmers complained unendingly about my naming syntax (lower case with periods as separators). I have since discovered a multitude of naming styles in existing e-commerce specifications that are similarly unlike Java conventions. I'm afraid programmers will have to live with syntaxes they don't like; models of multiple names associated with multiple contexts, such as ISO 11179, may be useful for relieving this tension. For example, in IEEE P1489, the Standard for Data Dictionaries for Intelligent Transportation Systems, one finds a name of the form CONSTRUCTION.ROAD_TargetCompletion_date. In an ISO 11179 schema one might represent this as the name of the data element and add
 
<synonymous.name
context="Java">constructionRoadTargetCompletionDate
</synonymous.name>
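
Synonymous names of this kind need not be written by hand. The Python sketch below derives a Java-style name from a name in another convention and emits the element shown above. The function names and the word-splitting rule are assumptions of this sketch; neither ISO 11179 nor CBL prescribes them.

```python
import re

# A "word" is a run of capitals not followed by a lowercase letter (an acronym
# or an all-caps segment), an optionally capitalized lowercase run, or digits.
# Periods, underscores, and other separators are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def to_java_name(name: str) -> str:
    """Convert a dotted, underscored, or mixed-case name to lowerCamelCase."""
    words = _WORD.findall(name)
    if not words:
        raise ValueError(f"no words found in {name!r}")
    head, *tail = words
    return head.lower() + "".join(word.capitalize() for word in tail)


def synonymous_name_element(name: str, context: str = "Java") -> str:
    """Render the synonymous.name element for a data element name."""
    return f'<synonymous.name context="{context}">{to_java_name(name)}</synonymous.name>'


print(synonymous_name_element("CONSTRUCTION.ROAD_TargetCompletion_date"))
print(to_java_name("simple.line.item"))
```

Generating the synonym keeps the registered data element name authoritative while giving each programming context the spelling it prefers.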
 
I did not try to avoid qualified names for element types and attributes, but I did try to avoid the flattened names of EDI. For example, where CBL has an address element, EDI has compounds such as Goods.DeliveryLocation.Address, which collapse containing context with the names of elements that ought to be reusable. These collapsed constructs are not unlike queries, but they have no place in an XML schema.
 

Heritage

 
EDI transaction sets are useful. At least some of them, anyway. They represent information sets people actually want to exchange, and can be transformed into document types by deleting unneeded information (such as trailer fields) and separating out the semantics of individual documents from those describing batches of documents. Rosetta Net is developing information interchange models that include stubs for what could be document types (Rosetta Net is currently filling in these stubs with EDI transaction sets), and document types developed by the Open Applications Group look like a rationalized version of EDI's. There is little point in reinventing these information sets, although new ones are required for describing markets, their workings, and their participants.
 
Certain EDI basic semantics are useful, but BSR is not the solution because of the collapsed context problem I mentioned earlier. X12 and EDIFACT badly need reengineering and rationalization, so as to sort out reusable primitives (Address) from contextual semantics (CustomerAddress). And above all their archaic syntax has to be junked.
 

OO XML

 
Multiple inheritance is essential. Corky (who exists only for the purpose of this example) is a swine, a sow, a mother, a Sus scrofa domesticus, a Gloucester Old Spot, a pet, a thing licensed by Sonoma County, a patient of Dr. Clive N. Huff, the object of an insurance policy, and an input into the manufacturing process that will produce this fall's supply of pork. There is no way to arrange the matrix of Corky's attributes in a single tree. The best we can do is construct a SOX representation of a swine schema that uses inheritance along the axis most useful to our immediate needs, and represent information along the other axes by XML containment and pointers to taxonomies (taxonomies are very important).
 
SOX-based e-commerce schemas require constructs not present in EDI. For example, in CBL 1.2 I have both a simple.line.item and an unpriced.line.item (for those cases such as a request for bid, in which the price is unknown and should not appear). For SOX I've created quite a few prototypes (38 at last count) to support inheritance, which EDI knows nothing of.
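
The role of a prototype can be pictured in object-oriented terms. The Python sketch below uses a base class to hold the content shared by the two line item types, so that the priced type adds a price and the unpriced type has no place to put one. Only the names simple.line.item and unpriced.line.item come from CBL; the prototype's name, its child elements, and the sample values are hypothetical, and SOX's own inheritance syntax is not shown.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class LineItemPrototype:
    """Content shared by all line items (hypothetical fields)."""

    product_id: str
    quantity: int

    # Plain class attribute (no annotation), so it is not a dataclass field.
    element_name = "line.item.prototype"

    def to_element(self) -> ET.Element:
        item = ET.Element(self.element_name)
        ET.SubElement(item, "product.id").text = self.product_id
        ET.SubElement(item, "quantity").text = str(self.quantity)
        return item


@dataclass
class UnpricedLineItem(LineItemPrototype):
    """For a request for bid: there is no price field, so none can appear."""

    element_name = "unpriced.line.item"


@dataclass
class SimpleLineItem(LineItemPrototype):
    """Extends the prototype with a unit price."""

    unit_price: str

    element_name = "simple.line.item"

    def to_element(self) -> ET.Element:
        item = super().to_element()
        ET.SubElement(item, "unit.price").text = self.unit_price
        return item


unpriced_xml = ET.tostring(UnpricedLineItem("X-100", 2).to_element(), encoding="unicode")
priced_xml = ET.tostring(SimpleLineItem("X-100", 2, "9.50").to_element(), encoding="unicode")
print(unpriced_xml)
print(priced_xml)
```

Code that only needs the shared content can accept the prototype and work with either line item type.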
 
A lot can be done with datatypes. Most enumerations can be reduced to datatypes. The ability to specialize datatypes and provide constraints on acceptable values is very powerful.
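
To make this concrete, here is a minimal Python sketch of a datatype that can be specialized by adding constraints. The Datatype class and the particular types defined with it are assumptions for illustration, not SOX's datatype facility. The month type shows an enumeration of twelve values reduced to an integer with a range.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Datatype:
    """A named lexical pattern with optional integer bounds."""

    name: str
    pattern: Optional[str] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None

    def specialize(self, name: str, **constraints) -> "Datatype":
        """Derive a narrower type; constraints not overridden are inherited."""
        return replace(self, name=name, **constraints)

    def accepts(self, text: str) -> bool:
        if self.pattern is not None and re.fullmatch(self.pattern, text) is None:
            return False
        if self.min_value is not None or self.max_value is not None:
            try:
                value = int(text)
            except ValueError:
                return False
            if self.min_value is not None and value < self.min_value:
                return False
            if self.max_value is not None and value > self.max_value:
                return False
        return True


STRING = Datatype("string")
INTEGER = Datatype("integer", pattern=r"-?[0-9]+")

# Twelve enumerated values reduced to a constrained integer.
MONTH = INTEGER.specialize("month", min_value=1, max_value=12)
# A three-letter code constrained by pattern instead of a list of every value.
CURRENCY_CODE = STRING.specialize("currency.code", pattern=r"[A-Z]{3}")

print(MONTH.accepts("7"), MONTH.accepts("13"), CURRENCY_CODE.accepts("USD"))
```

A pattern such as the one for currency codes checks only the form of a value; where membership in a maintained code list matters, the list itself is still needed.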
 

The Not Obvious

 
You can be too abstract. At one time I modelled a general transaction description document type. It helped me a lot in realizing what a transaction is (the exchange of value between entities, perhaps many such exchanges) but it was too abstract to be useful in the real world. So I broke it down into purchase order, invoice, request for bid, and response to request for bid, which is the level of abstraction found in EDI and the level that people find comfortable.
 
Semantic mapping is essential. To use semantics already defined elsewhere, it is necessary to be able to point to them, a facility added to SOX in its latest revision. It is worth noting that ISO 11179, which deals with the specification of data elements and the organization and operation of a data element registry, has facilities for doing semantic mapping (and has considerable intellectual overlap with SOX). Whether this mapping should be done within the XML schema or from outside, in an independent document, depends, I think, on whether one is trying to reuse well known definitions of semantics in a new schema or construct mappings among existing schemas (which may be read-only).
 
Business logic must be expressed. To construct efficient schemas for business documents it is necessary to know how the information they encode is to be processed, or at least what sets of information will be processed together. I've considered it out of scope for CBL, but I'm beginning to wonder if I'm right about that.
 
Process logic must be expressed. I may send you a purchase order for you to fulfill it, for you to notarize it, or for you to archive it. I can't express that intent within the document, which I may want to digitally sign and use unchanged for all three purposes. I found I couldn't reasonably express processes in CBL; something along the lines of UML (Unified Modeling Language) is needed and I now just point out to a hypothetical process description. I'm pretty sure this is out of scope for CBL, but it's needed for a complete e-commerce system.
 
Both registries and repositories are essential. ISO 11179 defines a registry, which can be seen as an interface to a repository. There is some variability in what these entities are called by different people, but the distinction is between metadata (the information held in the registry) and the data (the DTDs or XML schemas themselves). For XML to succeed on the Web we need a means of serving DTDs on demand. To enable sane development of XML schemas, we need to enable reuse of XML schema source. In both cases a repository is required: for XML on the Web, it is what an XML client would access (perhaps directly, without going through a registry) to resolve a DOCTYPE declaration. For an XML development environment, a registry is essential to permit an overview of what exists already and can be reused, and the repository behind the registry is essential for managing authoritative source (I managed the registry part of this problem in wetware during CBL development, but even inside my head that approach doesn't scale).
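
The division of labor between the two can be sketched in a few lines of Python. The class and method names, the public identifier, and the sample DTD fragment are all hypothetical; ISO 11179 defines a far richer metadata model than the three fields used here.

```python
from dataclasses import dataclass


class Repository:
    """Holds the data: authoritative DTD or schema source, keyed by identifier."""

    def __init__(self) -> None:
        self._sources: dict[str, str] = {}

    def store(self, identifier: str, source: str) -> None:
        self._sources[identifier] = source

    def fetch(self, identifier: str) -> str:
        return self._sources[identifier]


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata about one registered item."""

    identifier: str
    description: str
    keywords: frozenset


class Registry:
    """Holds the metadata and serves as an interface to the repository."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository
        self._entries: dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry, source: str) -> None:
        self._entries[entry.identifier] = entry
        self._repository.store(entry.identifier, source)

    def search(self, keyword: str) -> list:
        return [entry for entry in self._entries.values() if keyword in entry.keywords]

    def retrieve(self, identifier: str) -> str:
        return self._repository.fetch(identifier)


repository = Repository()
registry = Registry(repository)
ADDRESS_ID = "-//Example//DTD Address//EN"  # hypothetical public identifier
registry.register(
    RegistryEntry(ADDRESS_ID, "Reusable postal address module", frozenset({"address", "postal"})),
    "<!ELEMENT address (#PCDATA)>",
)

# A developer looks for something reusable through the registry...
found = registry.search("address")
# ...while a client resolving a DOCTYPE declaration may go straight to the repository.
dtd_source = repository.fetch(ADDRESS_ID)
print([entry.identifier for entry in found], dtd_source)
```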
 

Beyond CBL

 
CBL was an interesting exercise as a prototype, and allowed me to develop solutions to many XML architecture problems. But from a practical point of view, its semantics are insufficiently integrated with those already used (in various and confusing ways) in commerce.
 
We intend to respecify CBL in the ECO Framework Project forum (described at http://www.commerce.net/projects/currentprojects/eco/wg/), embracing as much of the semantics of EDI as possible. To do this we must determine what pieces of EDI semantics are actually used and used consistently, we must create uniform XML representations of EDI's various code sets (and the code sets it relies on, such as the list of all the world's airport codes), and we must rebuild EDI's innards. As I remarked earlier, EDI's document types are useful, so they are a starting point for top-down design. And the atomic data elements are useful, so they are a starting point for bottom-up design. In the middle there's the problem of what comes in the middle. Some of EDI's “segments” are probably useful; larger structures common across document types need to be defined. And, as I discovered in the course of conversion to SOX, one needs prototypes to provide a common basis for structures that share some contents. A respecified CBL will be much richer in both prototypes and large structures than EDI, representing XML's facilities for reusing information.
