Objects and XML for Next Generation Web Applications   Table of contents   Indexes   Electronic Information Commerce

Erlangen
 Germany 
Simon, Lothar
eidon GmbH
 
Lothar T. Simon
 General Manager
eidon GmbH
  Am Weichselgarten 7 Erlangen  Germany (D-91058)
Email: sim@eidon-products.com Web site: www.eidon-products.com
 Biography
 Lothar T. Simon is founder and general manager of eidon, an SGML/XML/CGM service provider and producer of eidonXbase, a content (component) management system based on relational database technology. He had a strong background in expert system technology and hypertext systems before he started to apply SGML to create "intelligent texts" in 1990. In 1991, he invented eidonXbase, which was to become the first content management system for SGML worldwide. He has (co)authored two books and a number of articles on hypertext.
 

Introduction

 Reusing the content of documents saves time and money. Content reuse is therefore a key function that a document content management system has to offer. Typically, such systems implement this requirement by enhancing their internal data model and controlling the reuse "from outside the documents". We argue that, in keeping with the SGML/XML principle that all relevant information about a document has to be inside the document, reuse should instead be implemented using HyTime links, i.e. attributes enclosed inside the documents themselves, together with XQL and additional scripting functions. This approach leads to a simple, powerful, open, and portable solution for reusing document contents.
 In this article we describe the basic technical approach and its implementation in the eidonXbase content management system, discuss the role of the query language, and show in the context of an application how easily the reuse of document contents can be organised. Finally, we give an example of how this content reuse mechanism can be used to automatically build "overview" or "summary" documents from the contents of existing documents.
 
 

Document versus system controlled content reuse

 Explicit content-related structuring of information using SGML/XML enables a number of applications that could otherwise be realised only by a circuitous route, or not at all. In this context it is essential that the documents themselves contain all relevant information. For example, identifying the author of a piece of text is very convenient if the author is named and expressly marked up within the document itself.
 One of the major SGML/XML-based applications is the reuse of existing text with the objective of compiling new documents from portions of old ones. None of the reused parts has to be rewritten, rechecked, or retranslated. They are simply taken over.
 Content reuse is regarded as a central function of a content management system. Many implementations, however, are based on the assumption that the system itself should administer both the contents to be reused and the corresponding locations of such reuse. This violates the hard and fast rule of SGML/XML that all information of relevance over the lifetime of a document is to be stored within the document. And what could be more important than keeping track of the origin of reused contents? If one fails to observe this basic rule, one will inevitably lose a central part of the information on the document with the next system change (which mostly occurs much earlier than one fears). An SGML/XML-conformant solution, however, is as simple as it is obvious: the information as to the place and kind of content being reused should take the form of attributes or HyTime links within the documents themselves. This is exactly the path we pursued with eidonXbase: for the purpose of reusing contents, the contents of the documents themselves are interpreted.
 

Technical approach

 The technical approach is based on a simple idea. One can use special "reuse attributes" to describe which content from one document (the "source document") is to be reused in another document (the "target document"). Reuse attributes used in target documents have to describe nothing but:
 
  •  the source document (so-called "source attributes"),
  •  the part of the source document to be reused ("fetch attributes"),
  •  the place inside the target document for insertion of the reused content.
 Source attributes declare the name of the source document(s) and optionally its (their) version(s) and/or variant(s). Fetch attributes declare which content is to be fetched. The tag where such a fetch attribute is attached defines the place for insertion.
 

Source attributes

 To declare a source of contents to be reused, a mere reference to a document is not enough: a management environment also requires that one can refer to the desired version and variant. This is done using the "DNAME" (document name), "DVER" (document version) and "DVAR" (document variant) attributes. With these attributes it is possible not only to incorporate any number of source documents within one target document, but also to use the current or any fixed version of a source document, or to pick specific variants of a particular source document.
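
 As an illustration of how source attributes might be resolved, the following Python sketch looks up a source document from the DNAME, DVER and DVAR attributes of a target element. It is not the eidonXbase implementation: the in-memory REPOSITORY, the function resolve_source, the default variant "standard" and the rule that a missing DVER selects the highest version number are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository keyed by (name, version, variant); a real system
# would query its database instead.
REPOSITORY = {
    ("pump-manual", "1", "standard"): "<manual><para>v1 text</para></manual>",
    ("pump-manual", "2", "standard"): "<manual><para>v2 text</para></manual>",
    ("pump-manual", "2", "offshore"): "<manual><para>v2 offshore text</para></manual>",
}


def resolve_source(element, default_variant="standard"):
    """Return the parsed source document named by an element's source attributes."""
    name = element.get("DNAME")
    if name is None:
        raise ValueError("element carries no DNAME attribute")
    variant = element.get("DVAR", default_variant)
    version = element.get("DVER")
    if version is None:
        # Assumption: without DVER, "current" means the highest version number.
        versions = [int(ver) for (n, ver, var) in REPOSITORY
                    if n == name and var == variant]
        if not versions:
            raise KeyError((name, variant))
        version = str(max(versions))
    return ET.fromstring(REPOSITORY[(name, version, variant)])
```

 A target element carrying only DNAME and DVAR would thus receive the current version of that variant, while adding DVER pins the reuse to one fixed version.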
 

Two reuse mechanisms

 

Same tag reuse

 We identified two different kinds of practicable reuse mechanisms based on "fetch attributes". The first of these mechanisms - "same tag reuse" - is particularly suitable for reuse in the narrower sense, since it is based on statically assigned attributes and thus enables the "reuse paths" to be traced back and checked using the standard content query functions of the system.
 
 The same tag fetch attribute (named "reuse") provides an extremely simple, yet powerful mechanism for describing the content to be reused. Attaching such an attribute to a tag in the target document simply asks the system to fetch a part of the source document which is tagged in the same way (generic identifier plus all attributes).
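
 The matching rule just described can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the function names are hypothetical, the value "yes" for the "reuse" attribute is invented for the example, and we assume that the reuse and source attributes themselves are ignored when comparing tags.

```python
import xml.etree.ElementTree as ET

# Assumption: these control attributes do not take part in the comparison.
CONTROL_ATTRIBUTES = {"reuse", "DNAME", "DVER", "DVAR"}


def matching_key(element):
    """Generic identifier plus all non-control attributes of an element."""
    attrs = {k: v for k, v in element.attrib.items()
             if k not in CONTROL_ATTRIBUTES}
    return element.tag, frozenset(attrs.items())


def fetch_same_tag(target_element, source_root):
    """Return the first source element tagged like the target, or None."""
    wanted = matching_key(target_element)
    for candidate in source_root.iter():
        if matching_key(candidate) == wanted:
            return candidate
    return None


source = ET.fromstring(
    '<manual><step id="s1">Open valve.</step>'
    '<step id="s2">Close valve.</step></manual>'
)
target_step = ET.fromstring('<step id="s2" reuse="yes" DNAME="manual"/>')
found = fetch_same_tag(target_step, source)
```

 Because the match depends only on markup that is already in the target document, the same comparison can be used to trace a reuse path back to its source.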
 

Query reuse

 The query fetch attribute contains or names an XQL query (which can be enhanced with eidonXbase Xript commands for more advanced functionality) to find the contents to be reused. In this way it is possible to generate almost any contents "into" target documents. This mechanism is especially suited for the dynamic creation of indexes, tables, glossaries, etc. Retraceability, however, is restricted to the determination of the query used to create the contents.
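
 Query reuse can be illustrated in a similar way. XQL is not available in the Python standard library, so the sketch below substitutes the limited XPath subset of xml.etree.ElementTree as a stand-in; the attribute name "query" and the function expand_query are hypothetical and not taken from eidonXbase.

```python
import copy
import xml.etree.ElementTree as ET


def expand_query(target_element, source_root, attribute="query"):
    """Append copies of all source elements matched by the element's query.

    Returns the number of elements that were generated into the target.
    """
    query = target_element.get(attribute)
    if query is None:
        return 0
    hits = source_root.findall(query)  # XPath subset standing in for XQL
    for hit in hits:
        target_element.append(copy.deepcopy(hit))
    return len(hits)


source = ET.fromstring(
    "<manual>"
    "<section><title>Safety</title></section>"
    "<section><title>Maintenance</title></section>"
    "</manual>"
)
# A generated table of contents: the query names what is to be collected.
toc = ET.fromstring('<toc query=".//section/title"/>')
count = expand_query(toc, source)
```

 The generated content carries no trace of where each item came from; only the query stored in the attribute remains, which is the traceability restriction mentioned above.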
 
 

The interpretation of reuse attributes by eidonXbase

 Based on these reuse attributes, eidonXbase inserts the content to be reused into target documents during export and discards it during import. No preparation or additional programming is required. This mechanism works with any SGML/XML editor. It makes sense to adapt the editor in such a way that it does not release reused contents for editing at all, or at least highlights them in colour.
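
 The export/import cycle might be pictured as follows. The Python sketch is a simplified model, not the eidonXbase code: export_document, import_document and the fetch callback are hypothetical names, and any element carrying a "reuse" attribute is assumed to hold reused content only.

```python
import copy
import xml.etree.ElementTree as ET


def export_document(stored_root, fetch):
    """Return a copy of the stored document with reused content filled in.

    fetch is a callable (an assumption of this sketch) that maps a target
    element to the source element whose content is to be reused, or None.
    """
    exported = copy.deepcopy(stored_root)
    for element in list(exported.iter()):
        if "reuse" in element.attrib:
            source = fetch(element)
            if source is not None:
                element.text = source.text
                element.extend([copy.deepcopy(child) for child in source])
    return exported


def import_document(edited_root):
    """Return a copy of the edited document with reused content discarded."""
    stored = copy.deepcopy(edited_root)
    for element in list(stored.iter()):
        if "reuse" in element.attrib:
            element.text = None
            for child in list(element):
                element.remove(child)
    return stored
```

 Only the tag and its attributes survive the import, so the stored target document never holds a stale copy of the source content.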
 

Sample reuse workflow

 A simple sample workflow for creating a new document based on the content of an "old" document may look as follows:
 
  •  Copy the "old" document (and regard the original as the source and the copy as the target document).
  •  Attach reuse attributes to all tags of the target document whose content you want to reuse from the source document.
  •  Change all other parts of the target document at will.
  •  Import the target document. That's it!
 

Sample application

 Content reuse is often required in technical documentation, where manuals for different variants of a machine have to be written. With the document-controlled reuse mechanism implemented in the eidonXbase system one can build up an information pool with content reuse very easily and efficiently. A clear distinction between source and target documents is a good idea, because it prevents the creation of "reuse spaghetti code". Thus, one will create a set of source documents, e.g. one containing maintenance information, another functional descriptions, yet another procedures etc., all of it information to be shared by different target documents. The same documents can exist as translated "shadow documents" in different languages. Organising easy and exact identification of the contents to be reused is an important issue for source documents (content-oriented tags, special "IDs", or "REUSE-IDs"). To create a target document you simply "include" the contents you want to reuse.
 An example which shows the power and ease of this approach comes from an SGML/XML/CGM application using eidonXbase for one of the largest machine builders in the world. Upon encountering an error, these machines generate a message number. The message numbers, including references to the pertinent measures for a stepwise elimination of the respective error, are described in tabular form in message documents. The task was quite simple to state: upon entering a message number, the technician was to obtain a dynamically created document containing precisely the table entry for the desired message number plus the complete description of all remedy steps contained in the corresponding list. The Xript script language of eidonXbase realised the dynamic creation of this document. The referenced contents, however, were not permanently copied into this document, but rather linked in via "same tag reuse" attributes. The entire functionality was realised within just one day.
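
 The idea behind this application can be sketched in Python rather than Xript. Everything specific in the sketch is invented for illustration: the element names entry, stepref, step and messagedoc, the attributes msgno, doc and idref, the value "yes" for "reuse", and the function build_message_document. The point is only that the generated document contains empty elements with reuse attributes instead of copied text.

```python
import xml.etree.ElementTree as ET


def build_message_document(message_table, message_number, table_name):
    """Build a target document that links in one message entry and its remedy steps."""
    entry = next(
        (e for e in message_table.iter("entry")
         if e.get("msgno") == message_number),
        None,
    )
    if entry is None:
        raise KeyError(message_number)
    doc = ET.Element("messagedoc")
    # The table entry itself is linked in by same tag reuse, not copied.
    ET.SubElement(doc, "entry", {
        "msgno": message_number, "reuse": "yes", "DNAME": table_name,
    })
    # One empty, reuse-tagged element per referenced remedy step.
    for ref in entry.iter("stepref"):
        ET.SubElement(doc, "step", {
            "id": ref.get("idref"), "reuse": "yes", "DNAME": ref.get("doc"),
        })
    return doc
```

 On export, each of these elements would be filled from its source document, so the technician always sees the current wording of the remedy steps.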
 

Summary

 Document-controlled content reuse is simple because it is controlled by very few simple attributes and requires virtually no preparation. The most difficult task would be to write an XQL query for a complicated reuse requirement. It is powerful because you can reuse virtually any content of the documents contained in the database. Everything you can do in XQL (plus the 150 commands of the Xript language) is available for reuse. It is open because reuse information is not hidden inside the system, but is instead enclosed inside the document. Thus, reuse management can be done with the same functions one uses to query and analyse the contents of the documents. Finally, documents with reused contents are portable because they contain markup describing the location and manner of content reuse. All the information needed for processing them elsewhere is available. As the need for content reuse is one of the big arguments for content markup and management, and the advantages of the approach presented above are obvious, we encourage a discussion on standardising this kind of reuse mechanism.
