UIML: An XML Language for Building Device-Independent User Interfaces   Table of contents   Indexes   XML in Healthcare: The HL7 Experience

Ann Adams
 Project Manager
Xerox Corporation
 800 Phillips Rd. Bldg. 845-17S Webster (New York)  USA (14580)
Email: Ann.H.Adams@usa.xerox.com Web site: http://www.xerox.com
 Biography
 Ann Adams has over twenty years’ experience in information systems design, development, testing and management. Her areas of specialization include database and SGML publishing applications, as well as machine language translation. Ann has spoken on document management, translation technology and multinational authoring at events such as Documation and at conferences held by the Localisation Industry Standards Association, the Society for Technical Communication and the American Medical Writers Association. She is co-author of a case study of multinational writing at Xerox published in the Journal of the STC. Ann has completed coursework for the MS in Information Technology at Rochester Institute of Technology. She is a member of the Rochester NY SGML/XML Users Group and chair of the local chapter of the IEEE Computer Society.
 

Introduction

 Choosing a content management system can be confusing in an environment where new products and architectures appear in rapid succession. This paper provides a roadmap for those who need to manage data elements of various sizes and types.
 

Background

 
In the recent past, if a systems manager went shopping for a content management solution, the choices were rather limited and the selection process was therefore simple. Quite likely, the problem was one of file management, so the choice consisted of several systems that accomplished that function. Operating system limitations further narrowed the possibilities, so much of the selection process turned on price or some other feature, such as foreign language or specific printer support. If, on the other hand, SGML was to be managed, the choice of product was even more limited, with very few pioneers in that market.
 
Today, the systems manager who wishes to manage content faces a much more bewildering array of choices. According to the June 2, 1999 report from the Delphi Group, Document Management: Into the Mainstream, "the market itself is becoming ever more highly fragmented with many players targeting specific vertical niches and applications." Two recent developments have contributed to this fragmentation. The advent of the Web browser as the user interface of choice removes the prior limitations on client operating system support. In addition, several products that formerly managed only files can now also manage SGML and XML elements, although the granularity of those elements will vary from product to product. Meanwhile, the systems that manage pure SGML have expanded their reach to accept traditional desktop publishing files with some constraints. Other new products, designed with Web architecture from their inception, are primarily XML-centric.
 

Fundamental questions

 Another consideration arises from the question of how to notify customers that updated information is available for their use. An infrastructure to support the delivery of the new data should be in place if the investment in the content management system is to be justified. The desire for and feasibility of reusing units of information are also issues that should be addressed in the original evaluation of various systems. Unless such questions are answered prior to selection of a content management system, the organization may not be able to make maximum use of the technology chosen.
 This paper will describe the capabilities and architectures of some of the content management systems available today, with implementation case studies.
 

A file management system turns toward SGML

 
Throughout most of its existence, Documentum has functioned as a file management system. Versioning, history, check-in and check-out were strictly on a file-by-file basis. The most prevalent use of such a system was for the storage and control of common file formats, with tight integration with the Microsoft Office suite, word processing file types and such archival formats as Adobe's PDF. This emphasis enabled Documentum to successfully target regulated industries, such as companies submitting new drug applications, and manufacturing concerns engaged in a stringent quality drive, such as the pursuit of ISO 9000 certification.
 
Recently, Documentum has extended its management functions to units smaller than a whole file for certain types of documents. Through the efforts of Datalogics, a vendor long known in the SGML arena, Documentum can now control documents created with FrameMaker or FrameMaker+SGML.
 
In the case of the FrameMaker+SGML integration, the integration software (called FrameLink+SGML) allows access to a piece of SGML information that is smaller than a document but considerably larger than a single element. Similarly, Willow accomplishes much the same goal with an Arbortext ADEPT integration to Documentum. In what instance might such a document fragment be of interest?
 

Defining a "procedure"

 
The technical publications department of a manufacturing firm produces maintenance manuals for its field service representatives. These manuals, which are created in SGML, consist of many sets of steps to guide the technician in the diagnosis, isolation and repair of problems and malfunctions in the equipment. Each set of steps zeroes in on a particular cause of the machine problem and is accompanied by at least one graphic illustration. In some cases, a second graphic is necessary to support the text directing the reader to the correct course of action. If the collection of steps becomes too long or a large number of graphics is necessary, the writer of the service manual will usually rethink the decision to include this much material and break the steps into more manageable units. This collection of steps, along with its supporting graphic or two, was christened an "information element" at another manufacturing firm, Caterpillar, whose service manual creators were early pioneers in defining the granularity of their output. At the manufacturing company in question, this discrete collection of information is called a "procedure". The creators and the consumers of the information implicitly understand its intent and "size". The addition of supporting multimedia elements does not alter the basic concept. The construction of the organization's Document Type Definition (DTD) supports this idea of a procedure.
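To make the notion of a procedure concrete, the following minimal sketch splits a small manual into procedure-sized chunks. It is only an illustration: the element names (manual, procedure, title, step, graphic) and the sample content are hypothetical and are not taken from the company's DTD, and XML is used in place of SGML so that the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical manual markup; the element names are illustrative only
# and do not come from any real DTD.
MANUAL = """
<manual>
  <procedure id="p1">
    <title>Clear a paper jam</title>
    <step>Open the front door.</step>
    <step>Remove the jammed sheet.</step>
    <graphic src="jam.tif"/>
  </procedure>
  <procedure id="p2">
    <title>Replace the fuser</title>
    <step>Power off the machine.</step>
    <graphic src="fuser1.tif"/>
    <graphic src="fuser2.tif"/>
  </procedure>
</manual>
"""


def split_into_procedures(manual_xml):
    """Map each procedure id to a summary of the steps and graphics it holds."""
    root = ET.fromstring(manual_xml)
    chunks = {}
    for proc in root.iter("procedure"):
        chunks[proc.get("id")] = {
            "title": proc.findtext("title"),
            "steps": len(proc.findall("step")),
            "graphics": len(proc.findall("graphic")),
        }
    return chunks


procedures = split_into_procedures(MANUAL)
```

Each entry in the result corresponds to one managed unit: a set of steps plus its one or two supporting graphics.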
 

Infrastructure changes make use of the procedure

 In its first generation of electronic service manuals, the manufacturing company distributed the data on CD-ROM, as reported in the article "Creating Electronic Documents that Interact with Diagnostic Software for On-Site Service" in the IEEE Transactions on Professional Communication, Volume 40 Number 2, June 1997.
 For the next release, the situation has changed. The World Wide Web is ubiquitous, as is the ability of the users to access it through a browser. Hard drive space on laptops has gone from precious to commodity status. The 28.8 kbps modem is being replaced by affordable broadband network access. For these reasons, it is no longer necessary to re-issue a CD-ROM to update the service manual information. The update can be downloaded along with the technician’s daily service call instructions. So, it has been determined that the unit of update will be a procedure, with appropriate user notification.
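One way to picture procedure-level updates is sketched below. The paper does not describe how the company detects changes, so the approach here (comparing a content fingerprint for each procedure) is an assumption, and the function names and sample data are hypothetical.

```python
import hashlib


def fingerprint(text):
    """Return a stable digest of a procedure's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def procedures_to_update(installed, published):
    """List the ids of procedures that are new or changed.

    installed: dict of procedure id -> fingerprint held on the laptop
    published: dict of procedure id -> current content on the server
    """
    return sorted(
        pid
        for pid, content in published.items()
        if installed.get(pid) != fingerprint(content)
    )


# Hypothetical data: p2 was revised and p3 is new since the last download.
installed = {
    "p1": fingerprint("Open the front door."),
    "p2": fingerprint("Power off the machine."),
}
published = {
    "p1": "Open the front door.",
    "p2": "Power off the machine, then wait five minutes.",
    "p3": "Clean the scanner glass.",
}

changed = procedures_to_update(installed, published)
```

Only the procedures in the resulting list would need to be downloaded and flagged to the technician; unchanged procedures stay as they are.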
 Another circumstance points to the procedure as the chosen unit of granularity in this application. The integration of the service manual with parts list information requires a customized cross-referencing application. The cross-referencing software applies a unique identifier to each individual SGML element in the entire manual. These unique IDs make the reuse of individual elements infeasible, removing one of the main benefits of defining small units of information.
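The effect of per-element identifiers on reuse can be illustrated with a short sketch. The actual cross-referencing software is not described in this paper, so the numbering scheme, the stamp_ids helper and the element names below are all hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def stamp_ids(root, prefix):
    """Give every element in the tree an id unique within its manual."""
    counter = itertools.count(1)
    for elem in root.iter():
        elem.set("id", f"{prefix}-{next(counter)}")
    return root


# The same step is authored in two different manuals.
source = "<procedure><step>Power off the machine.</step></procedure>"
manual_a = stamp_ids(ET.fromstring(source), "manualA")
manual_b = stamp_ids(ET.fromstring(source), "manualB")

step_a = ET.tostring(manual_a.find("step"), encoding="unicode")
step_b = ET.tostring(manual_b.find("step"), encoding="unicode")
```

Although the text of the step is identical in both manuals, the serialized elements now differ because each carries an identifier tied to its own manual, so a repository comparing stored elements would treat them as two distinct objects.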
 In this case, Documentum is used to manage the procedures, with the FrameMaker+SGML elements mapped to that level in the document. Documentum handles this level of granularity well, and the procedure is used as the unit of update, as mapped through the FrameLink+SGML interface.
 

A pure SGML management system turns toward file fragments

 
In its original release, the Astoria object management software dealt strictly with SGML or XML elements. Today, in a product known as Canterbury, the Astoria product can be modified to deal with templates in FrameMaker. If the manufacturing company cited above had created its service manuals in FrameMaker rather than in SGML, it might wish to manage its procedures with Canterbury, which would give access to the same level of granularity that suits its requirements in that particular instance.
 

Requiring reuse

 
A high-tech software company produces both instructor-led and Web-based training courses. The course material has much commonality, but is presented differently, depending on the delivery medium. Due to increased demand for the company's products, independent production of the multiple types of courses is no longer economical or timely. The goal is to no longer design courses, but to provide "instructional content" which can be used across all the current and future output venues. A goal for the future is to interface to a database to enable new business opportunities for "publish for one", or push delivery of services.
 The reuse requirements are detailed and stringent. For instance, a bulleted list in a student guide would be summarized on an instructor overhead, but presented as a series of HTML screens on the Web. A vehicle currently exists by which individual elements are assembled into a Computer Based Training course. Thus, the delivery mechanism exists and the audience is able to make use of the update of single elements.
 In this scenario, the information chunk at the procedure level cited above does not meet the needs of the content creators. In this case, the ability of Astoria to manage and reuse down to the level of each individual tagged element is called for. Managing at a higher level will not allow this organization to meet its goals.
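The bulleted-list example can be sketched as a single stored element rendered three ways. This is a simplified illustration under stated assumptions: the sample bullets, the function names and the crude "first few words" summary rule are all hypothetical and do not describe the company's actual tooling.

```python
from html import escape

# One stored element: a bulleted list, kept once and reused everywhere.
bullets = [
    "Install the client software",
    "Configure the server connection",
    "Run the verification tests",
]


def student_guide(items):
    """Render the full list for the printed student guide."""
    return "\n".join(f"* {item}" for item in items)


def instructor_overhead(items, max_words=2):
    """Render a summary for an overhead: the first few words of each bullet."""
    return "\n".join("- " + " ".join(item.split()[:max_words]) for item in items)


def web_screens(items):
    """Render one small HTML screen per bullet for Web delivery."""
    return [f"<html><body><p>{escape(item)}</p></body></html>" for item in items]


guide = student_guide(bullets)
overhead = instructor_overhead(bullets)
screens = web_screens(bullets)
```

Because all three outputs are derived from the same list, a correction to one bullet flows to every delivery medium, which is only possible when the repository manages content at the level of the individual element.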
 

SGML managed at the document level

 A publishing company supplies newsletters and current information to subscribers on a daily basis. Their customers pay a premium to be sure to have the most up-to-date and accurate data. This company publishes only in English and isn't much concerned with revisions or reuse, as a change for them usually means a new law or legislative ruling dictates a rewrite.
 
This organization uses SGML to standardize their output, both from a structure standpoint and for a common look to their printed output. They make use of the automated loose-leaf publishing and complex tabular composition of Xyvision Production Publisher and store their documents in Xyvision's Parlance Document Manager. Typical documents produced with XPP include scientific journals, industrial catalogs, directories, legal and financial documents, textbooks, and technical manuals.
 This setup works very efficiently for this company, for whom the management of procedures or individual elements would be overkill and is not applicable to their business needs.
 

Underlying Technology

 The database technology used to power these content management systems seems to have dictated various products' original focus. However, as the vendors have gained experience and sought to expand their potential customer base, many have adapted their technology to broader uses. The vendors who have relied on object technology for their database engines have extended their offerings to include management of larger chunks of data than their underlying infrastructure handles natively. The object-oriented database concept seems to be holding its own in spite of early suspicions in the relational community, and several vendors have survived and appear to be stable.
 
Meanwhile, the largest relational database vendor, Oracle, has added some object attributes to its latest product. The content management vendors have managed to create mapping capabilities into the relational databases while still maintaining acceptable performance. They have accomplished this by keeping their object/relational mapping at a high enough level that impedance mismatch does not become an issue.
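A rough sketch of what a coarse-grained mapping might look like follows. The paper does not describe any vendor's schema, so everything here is an assumption: the chunks table, the check_in and check_out helpers, and the choice of SQLite are purely illustrative. The point is that each managed chunk is stored as one row, so the database never has to reassemble a tree of element-level rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per managed chunk; the markup is stored whole in the body column.
conn.execute(
    "CREATE TABLE chunks ("
    " id TEXT PRIMARY KEY,"
    " version INTEGER NOT NULL,"
    " body TEXT NOT NULL)"
)


def check_in(conn, chunk_id, body):
    """Store a new version of a chunk and return its version number."""
    row = conn.execute(
        "SELECT version FROM chunks WHERE id = ?", (chunk_id,)
    ).fetchone()
    version = 1 if row is None else row[0] + 1
    conn.execute(
        "INSERT OR REPLACE INTO chunks (id, version, body) VALUES (?, ?, ?)",
        (chunk_id, version, body),
    )
    conn.commit()
    return version


def check_out(conn, chunk_id):
    """Return the current body of a chunk, or None if it is not stored."""
    row = conn.execute(
        "SELECT body FROM chunks WHERE id = ?", (chunk_id,)
    ).fetchone()
    return None if row is None else row[0]


first = check_in(conn, "p1", "<procedure><step>Open the door.</step></procedure>")
second = check_in(conn, "p1", "<procedure><step>Open the front door.</step></procedure>")
current = check_out(conn, "p1")
```

Retrieving a chunk is a single-row lookup regardless of how many elements it contains, which is one plausible reason a high-level mapping keeps performance acceptable.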
 

Conclusion

 When shopping for a content management system, it pays to carefully analyze the size of the data object that will best suit the goals that you are trying to reach. An analysis of the uses for your data and the optimum chunk size will start you thinking in the right direction. Once you have determined what that best fit is, be sure that a delivery infrastructure exists (or can be created) so that you can maximize your investment and take full advantage of your information assets.
