
Parsons, Jonathan
 Reading 
 USA 
Xyvision Enterprise Solutions, Inc.
 
Jonathan Parsons
 Director, Product Marketing
Xyvision Enterprise Solutions, Inc.
 30 New Crossing Road, Reading (Massachusetts), USA (01867-3254). Web site: http://www.xyenterprise.com
 Biography
 Jon Parsons has over 18 years of experience in applying computer technology to the creation, management, and delivery of information. At Xyvision Enterprise Solutions, he is Director of Product Marketing for both Xyvision Production Publisher, a sophisticated batch and interactive composition engine, and Parlance Content Manager, a content repository enabling component management in an XML world.
 

Important Principles in Content Management

 With the ever-widening adoption of XML, the question of how best to manage the information contained in XML files and documents becomes crucial both for workgroups in an enterprise and for the enterprise as a whole. Information about products, business transactions concerning those products, and information as a product, all encoded in XML format, become the foundation on which almost any company conducts its business.
 Experience with structured markup in the SGML world has yielded valuable real-world lessons on how best to approach content management in the XML world. These lessons lead to the conclusion that the following characteristics are crucial to almost any enterprise approaching content management:
 
  •  Object-oriented presentation of and access to the information
  •  Workflow support of collaboration and the development of information
  •  Ability to search both content and metadata to locate objects in the repository
  •  Ability to manage large numbers of objects and maintain performance
  •  Ability to use known databases throughout the enterprise
  •  Ability to store and manage unstructured data as well as structured information
 These principles are useful guidelines in formulating and evaluating approaches to content management. Each of these grows out of experience in managing structured markup in publishing applications and each is worth exploring in a bit more detail.
 Object Orientation: An object-oriented approach to the management of XML fragments allows the creation of a repository of objects that can be readily accessed either interactively or programmatically. Graphical interfaces that allow users to assemble, recombine, share, and reuse information readily and easily provide large payback in eliminating the need for duplicated information. Such interactive interfaces allow end users to navigate to the information they need and to point and click on the objects needed to assemble new documents. The object-oriented approach provides a paradigm for reuse of information, reducing the time required to create or repackage information. In addition, by enabling a repository of discrete chunks, the object-oriented approach allows the information to be accessed programmatically and delivered to requesting applications as needed at the precise granularity required.
 Workflow: For many applications, a strong notion of when data is complete, ready, and publishable is a key requirement. Making incomplete or premature information available to wide audiences that should not yet have access to it can have serious business consequences. As enterprises become more dispersed, the ability to collaborate from remote locations becomes more and more important. Short-lived teams that convene to focus expertise on a problem and capture the solution in a document or information set need to be able to work together in an orderly fashion. Users who rely on the accuracy of information require notification when that information is updated in the repository. A good content management system will track the state of information, identify when it is under development, has changed, or has become out of date, and then automatically notify the interested parties. This is all part of managing and supporting workflow.
 Searches: As the number of XML objects increases and as the use of content management moves from a departmental focus to wider audiences across departments or even the full enterprise, the ability to find the relevant object in the repository becomes more and more important. A good content management system will offer the ability to search the repository for the information desired. The search capability ought to be twofold, providing a way to access information objects both by querying on metadata and attribute values and by querying on their textual content. Users can then ask questions like "What objects have changed since midnight last night?" or "Where is the discussion on the new features of Version 2.0?" or "What statutes govern this particular topic?"
 Scalability: As more and more objects are added to the XML repository, the ability to scale the content management system becomes more and more important. Here the question of architecture comes to the fore. While some content management systems are built upon object-oriented databases, the approach recommended here is a hybrid object/relational approach that uses an object-oriented layer on top of a relational database. This approach offers the advantage of the proven reliability and scalability of relational technology while providing the advantages of storing objects at the appropriate level of granularity for optimal reuse.
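 The hybrid object/relational idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the FragmentStore class, the fragment table, and its columns are hypothetical, SQLite stands in for whichever relational database an enterprise already runs, and nothing here reflects the internals of any particular product.

```python
import sqlite3


class FragmentStore:
    """Hypothetical object layer over a relational table of XML fragments."""

    def __init__(self, conn):
        self.conn = conn
        # One row per fragment; the schema is an assumption for this sketch.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS fragment ("
            "id TEXT PRIMARY KEY, element TEXT NOT NULL, xml TEXT NOT NULL)"
        )

    def put(self, frag_id, element, xml):
        # Store or overwrite a fragment under its identifier.
        self.conn.execute(
            "INSERT OR REPLACE INTO fragment (id, element, xml) VALUES (?, ?, ?)",
            (frag_id, element, xml),
        )

    def get(self, frag_id):
        # Return the fragment text, or None if the identifier is unknown.
        row = self.conn.execute(
            "SELECT xml FROM fragment WHERE id = ?", (frag_id,)
        ).fetchone()
        return row[0] if row else None

    def ids_by_element(self, element):
        # The relational engine does the lookup; callers see only objects.
        rows = self.conn.execute(
            "SELECT id FROM fragment WHERE element = ? ORDER BY id", (element,)
        )
        return [r[0] for r in rows]


store = FragmentStore(sqlite3.connect(":memory:"))
store.put("brakes-01", "para", "<para>Check the brake fluid level.</para>")
store.put("brakes-02", "title", "<title>Brake system</title>")
```

 Callers work with fragments as addressable objects at whatever granularity was stored, while persistence and indexing are left to the relational engine underneath.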
 Leveraging IT skills: A key advantage of the hybrid object/relational approach is that the underlying database used for the repository can be a relational database already familiar within the enterprise. IT organizations are already familiar with relational databases, and that expertise can be leveraged in building and maintaining the XML repository. In addition, the general confidence in widely deployed relational databases and their known track record make the management decision to adopt and roll out an XML content management system easier. There is also an economy to be had in the common system administration skills used to maintain the underlying relational database.
 Supporting unstructured data: The use of XML as a standard structured data format is to be encouraged wherever possible. Nevertheless, a vast amount of legacy information remains in unstructured formats, and the current wide deployment of tools that produce unstructured information ensures that structured and unstructured information will coexist for some time to come. A content management system introduced to handle the structured environment of XML must therefore also be able to handle the unstructured information of the current and near-term environment. By doing so, such a content management system will enable the evolution to XML while also providing access to the existing knowledge assets of the enterprise.
 

One Vendor's Approach to Content Management

 Parlance Content Manager is a product offering from Xyvision Enterprise Solutions that meets the criteria outlined above.
 Object-orientation: Available on multiple operating system platforms (Solaris, AIX, and NT), it provides an object-oriented layer on top of standard commercial relational databases that are widely deployed in the IT world. The hybrid object/relational repository stores XML fragments as objects and provides the means by which users can control access to the information. Information objects can be checked out and under development at the same time that others access those objects for other purposes. Those outside the project team who retrieve the object while it is being edited (or have it served to them out of the repository in response to a query) see only the last officially published version. Their view of that object will not change until the person editing it has "posted" it or checked it in after completion of the defined review and development cycle.
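 The check-out and posting behavior described above can be sketched roughly as follows. This is an illustrative model only: the ManagedObject class and its method names are hypothetical, and for brevity the sketch treats the single editor as the only person who sees the working copy, whereas the text speaks of a whole project team.

```python
class ManagedObject:
    """Hypothetical repository object with a published and a working copy."""

    def __init__(self, content):
        self.published = content  # what readers outside the project see
        self.draft = None         # working copy, present only while checked out
        self.editor = None        # user who currently holds the check-out

    def check_out(self, user):
        if self.editor is not None:
            raise RuntimeError(f"already checked out by {self.editor}")
        self.editor = user
        self.draft = self.published

    def edit(self, user, content):
        if self.editor is None or user != self.editor:
            raise PermissionError("object is not checked out by this user")
        self.draft = content

    def read(self, user):
        # Only the editor sees work in progress; everyone else sees the
        # last posted version.
        if self.editor is not None and user == self.editor:
            return self.draft
        return self.published

    def post(self, user):
        # Posting makes the working copy the new published version.
        if self.editor is None or user != self.editor:
            raise PermissionError("object is not checked out by this user")
        self.published = self.draft
        self.draft = None
        self.editor = None


para = ManagedObject("Tire pressure: 30 psi.")
para.check_out("writer")
para.edit("writer", "Tire pressure: 32 psi.")
before_post = para.read("reader")   # still the last posted version
para.post("writer")
after_post = para.read("reader")    # now the revised version
```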
 Another aspect of the object-oriented approach is the ability to make use of the object in multiple contexts. To take an example from the legal world, a paragraph from a statute might appear in a version of the annotated laws of the jurisdiction, again in a handbook with commentary for the practicing lawyer, and in a third use it may be part of a case book for law students. When the statute is modified by the legislature, the amended paragraph need be changed and posted only once in the XML repository and the change will be reflected in each of the contexts in which it appears.
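 A small sketch may make the reuse mechanism concrete. It assumes, purely for illustration, that each context holds references to repository objects rather than copies; the identifiers, the sample text, and the function names are all hypothetical.

```python
# Repository of objects, keyed by a hypothetical identifier.
repository = {
    "statute-12": "A vehicle must stop at a red signal.",
    "note-7": "Commentary for the practicing lawyer.",
}

# Each context lists object identifiers; it never stores the text itself.
contexts = {
    "annotated-laws": ["statute-12"],
    "lawyers-handbook": ["statute-12", "note-7"],
    "student-casebook": ["statute-12"],
}


def assemble(context):
    """Resolve a context's references into the current object text."""
    return [repository[obj_id] for obj_id in contexts[context]]


def post(obj_id, new_text):
    """Change an object once; every context that references it sees the change."""
    repository[obj_id] = new_text


def where_used(obj_id):
    """List the contexts in which an object appears."""
    return sorted(name for name, ids in contexts.items() if obj_id in ids)


post("statute-12", "A vehicle must stop at a red or flashing red signal.")
```

 Because the three publications share one object, the amendment is made in exactly one place, and where_used shows which publications are affected by it.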
 Yet another characteristic of the object-oriented approach taken by Parlance Content Manager is its ability to maintain knowledge of an object at a certain stage in its development. Each object has a full history kept and each individual object can be rolled back to any point in its revision history. In addition, the state of a collection of objects (a document or information set) can be checkpointed at significant times in the life of the information. The state of the entire collection of objects at that checkpoint can then be retrieved on demand from the repository. This provides a very sophisticated ability to manage and control configurations of information objects. Checkpointing can be used for many purposes, including the tracking of engineering changes in product documents, maintaining multiple versions of published documents, and ensuring that Web postings are consistent, "official," and current.
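 Per-object history and collection checkpoints, as described above, might be modeled along these lines. The VersionedRepository class and its methods are hypothetical names for this sketch; in particular, recording a checkpoint as a map from object identifier to revision number is an assumption, not a description of the product's implementation.

```python
class VersionedRepository:
    """Hypothetical store keeping full history per object plus named checkpoints."""

    def __init__(self):
        self.history = {}      # object id -> list of revisions, oldest first
        self.checkpoints = {}  # label -> {object id: revision index}

    def post(self, obj_id, content):
        self.history.setdefault(obj_id, []).append(content)

    def current(self, obj_id):
        return self.history[obj_id][-1]

    def roll_back(self, obj_id, revision):
        # Re-post the old revision as the newest one, so the history
        # itself is never rewritten.
        self.post(obj_id, self.history[obj_id][revision])

    def checkpoint(self, label, obj_ids):
        # Record which revision of each object is current right now.
        self.checkpoints[label] = {
            obj_id: len(self.history[obj_id]) - 1 for obj_id in obj_ids
        }

    def retrieve(self, label):
        # Rebuild the collection exactly as it stood at the checkpoint.
        return {
            obj_id: self.history[obj_id][rev]
            for obj_id, rev in self.checkpoints[label].items()
        }


repo = VersionedRepository()
repo.post("intro", "Intro, first release")
repo.post("specs", "Specs, first release")
repo.checkpoint("release-1", ["intro", "specs"])
repo.post("specs", "Specs, engineering change")
```

 After the later change, retrieve("release-1") still returns both objects as they stood at the checkpoint, while current("specs") returns the engineering change.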
 Workflow: Parlance Content Manager offers built-in workflow management that supports the collaboration of a team of people writing and developing information in a workgroup environment. Users have complete flexibility in defining roles, specifying steps in the process, and triggering automatic behavior at significant points in the process. The integrated workflow manager allows privileges and security to be assigned to individuals and roles, so that at each step in the process a user has access to exactly the right tools and privileges needed to perform the task at that step. The workflow manager enables the tracking of projects and project status so that at any point in time it can be determined whether a project is on schedule or slipping and which components of the project have been completed or are still in process.
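 The combination of defined steps, role-based privileges, automatic triggers, and status tracking can be sketched as a small state machine. The Workflow class, the step names, and the role names below are hypothetical and chosen only for illustration.

```python
class Workflow:
    """Hypothetical linear workflow with per-step roles and triggers."""

    def __init__(self, steps, roles_for_step):
        self.steps = steps                    # ordered step names
        self.roles_for_step = roles_for_step  # step -> roles allowed to complete it
        self.position = {}                    # object id -> index into steps
        self.on_advance = []                  # callbacks fired after each transition

    def start(self, obj_id):
        self.position[obj_id] = 0

    def step_of(self, obj_id):
        return self.steps[self.position[obj_id]]

    def advance(self, obj_id, role):
        step = self.step_of(obj_id)
        if role not in self.roles_for_step[step]:
            raise PermissionError(f"role {role!r} may not complete step {step!r}")
        if self.position[obj_id] == len(self.steps) - 1:
            raise ValueError("object is already at the final step")
        self.position[obj_id] += 1
        new_step = self.step_of(obj_id)
        for callback in self.on_advance:
            callback(obj_id, new_step)  # for example, notify interested parties
        return new_step

    def status(self):
        # Project-wide view: which step each component is in.
        return {obj_id: self.steps[i] for obj_id, i in self.position.items()}


notifications = []
flow = Workflow(
    steps=["write", "review", "publish"],
    roles_for_step={"write": {"writer"}, "review": {"editor"}, "publish": {"editor"}},
)
flow.on_advance.append(lambda obj_id, step: notifications.append((obj_id, step)))
flow.start("chapter-1")
flow.start("chapter-2")
flow.advance("chapter-1", "writer")
```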
 Searching: Parlance Content Manager provides two ways of finding information in its XML repository. One is a query tool that finds objects in the repository based on the metadata kept on each XML object in the system. The second is a full content-based retrieval system based on Verity's search engine. Users can ask for any object containing a certain word or phrase anywhere in the text, or they can make use of their knowledge of the XML structure and ask for a certain phrase within a particular text element. Even more precise queries can be made, asking for objects containing tagged XML text elements with certain attribute values. Users can construct very specific queries using patterns and boolean logic, as well as requesting that the search engine take into account variant word forms in searching for the desired phrase.
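 The two kinds of query can be illustrated with a short sketch built on Python's standard xml.etree.ElementTree module. It is not the Verity engine and makes no claim about how that engine works: the sample objects, the metadata fields, and the function names are all assumptions, and the matching is a plain case-insensitive substring test without word-form variants or boolean operators.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository objects: metadata plus an XML fragment.
objects = [
    {
        "id": "a1",
        "meta": {"author": "lee", "status": "posted"},
        "xml": '<section><title>New features</title>'
               '<para audience="owner">Version 2.0 adds cruise control.</para></section>',
    },
    {
        "id": "b2",
        "meta": {"author": "kim", "status": "draft"},
        "xml": '<section><title>Brakes</title>'
               '<para audience="service">Bleed the brake lines.</para></section>',
    },
]


def by_metadata(objs, **criteria):
    """Return ids of objects whose metadata matches every given field."""
    return [
        o["id"] for o in objs
        if all(o["meta"].get(k) == v for k, v in criteria.items())
    ]


def by_content(objs, phrase, element=None, attrs=None):
    """Return ids of objects containing the phrase, optionally restricted
    to a named element and to required attribute values on that element."""
    phrase = phrase.lower()
    hits = []
    for o in objs:
        root = ET.fromstring(o["xml"])
        # iter(None) visits every element; iter("para") only <para> elements.
        for el in root.iter(element):
            if attrs and any(el.get(k) != v for k, v in attrs.items()):
                continue
            if phrase in "".join(el.itertext()).lower():
                hits.append(o["id"])
                break
    return hits
```

 For example, by_content(objects, "version 2.0", element="para", attrs={"audience": "owner"}) narrows a phrase search to paragraphs tagged for owners, while by_metadata(objects, status="posted") ignores content entirely.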
 Scalability: Parlance Content Manager has met the challenge of real-world implementations where the number of documents, the number of objects, and the number of users have each been very large. The number of documents can be in the tens of thousands, the number of objects in the hundreds of thousands, and the number of users in the several hundreds.
 Leveraging IT skills: Because Parlance Content Manager is built upon commonly deployed relational databases, an IT organization typically has already used such a database, has developed skills and experience with it, and sees value in using it as an XML repository.
 Supporting Unstructured Data: While XML-based tools will likely be rapidly deployed, and XML will be the focus for the creation of new information, the output of existing tools that create unstructured information, graphics files, and other forms of information will, for some time, need to be stored, managed, retrieved, and associated with XML-based information sets. The ability to integrate and manage that unstructured information along with and as part of the XML repository is crucial. Parlance Content Manager can accept such unstructured information, track it within the workflow and repository using metadata just as with XML objects, and provide access to it on demand.
 

A Customer Case Study: Managing Automotive Manuals

 For a successful worldwide launch of a new car, the manufacturer needs a full range of publications. The owner's guide, audio guide, and warranty material must be present in each vehicle. A service manual must be available to the service providers in every country.
 These documents must be translated into as many as 30 different languages. Each document must be modified to conform to the national regulations for each country that imports the new car. Cars shipped to Mexico, for instance, must include jumper cables, and the documentation must reference this.
 Creating, managing, and coordinating the information associated with a single new vehicle launch is a huge undertaking. Managing the information for all of a manufacturer's models is a monumental one.
 Furthermore, the information management challenge compounds over time, because each model changes to a greater or lesser degree each year. For each of the models, in each of the years, in each of the languages, the information must be available in print, on the Web, and on CD-ROM.
 Tweddle Litho Company of Clinton Township, Michigan, is one company that faces this challenge. It uses Parlance Content Manager as an SGML and XML repository to control the dynamic process of creating, changing, and delivering automotive information from start to finish.
 Founded in 1954 as a small printing company, Tweddle Litho pioneered computerized photocomposition and typesetting. As technologies changed, the company seized the opportunity to move further upstream from publishing to encompass the entire data management cycle, including writing, translation, and other related services.
 Today, with customers such as Ford Motor Company, General Motors, Chrysler, Nissan, and Volvo, Tweddle Litho produces approximately 55 percent of all the automotive owner's literature worldwide.
 

Global Markets, Global Information

 As Tweddle's business grew, so did the requirements for internationalizing the information it managed for its customers. It soon became acquainted with the difficulties of the translation process:
 
  •  Passing information to translation houses, communicating about "what was really meant," and answering questions about context
  •  Tracking the status and progress of the translations underway and what had been returned
  •  Retranslating material that had already been translated and had not changed, even when only a small portion of a document was modified
 Managing the process manually was time consuming and prone to error. It also required a large editorial staff.
 In 1997 Tweddle set out to find and apply a content management system that would strategically position the company to solve the large-scale problems of its customers. Tweddle knew it had to support a vehicle release in 30 languages in 60 countries simultaneously, and to support 40 vehicle lines with a total of 6000 books. It concluded that Parlance Content Manager was the content management system that could do it.
 

Managing the Components

 The object-oriented approach used in Parlance Content Manager allowed Tweddle to achieve three goals, each of which contributed to controlling costs:
 
  •  To handle their information as small units or chunks that could be assembled into documents. These small units provide the level of granularity needed to control translation costs.
  •  To reuse information in different contexts. This enables them to write a component only once and when it changes, update it only once, having the changes replicated in each place it is used, thereby reducing writing costs.
  •  To build different views of the information. These views allow a technical writer to focus on a particular subsystem such as air conditioning or braking that is used in more than one model of car. One writer can specialize and focus on creating complete and accurate descriptions for a particular subsystem. Another writer can call in the relevant descriptions in developing a view of the information that is the owner's manual or a repair manual.
 

Improving the Translation Process

 The translation workflow at Tweddle is based upon the use of in-country translators; that is, native speakers of a language living in the country for which the information is destined. In this workflow, there is significant exchange of information at "arm's length" between Tweddle and the translators. A key aim in managing the translation process is to preserve context so that the translator has sufficient information to make sound decisions about which word to use in rendering one language into another.
 When changes are made to a previously translated document, Parlance Content Manager creates a translation package for the translator that includes a proof version that shows how the component looked in the last printed book, the previous source file for the component, the revised source file, and a list of differences. The package provides the context the translator needs.
 The system creates the translation package automatically, triggered by the changes made when the revised component is returned to the repository after revision. It is checked out for translation and the translation package is delivered electronically to the translator. Once the translation is complete, the revised translation component is checked back into the database and its status is automatically noted by the workflow manager.
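 The contents of such a translation package can be sketched as follows. The function name, the dictionary layout, and the sample strings are hypothetical; the list of differences is produced here with Python's standard difflib module, which is an assumption made for illustration and not necessarily how the product computes it.

```python
import difflib


def build_translation_package(component_id, proof_ref, previous_source, revised_source):
    """Bundle what the translator needs: the proof, both source versions,
    and a line-by-line list of differences between them."""
    differences = list(
        difflib.unified_diff(
            previous_source.splitlines(),
            revised_source.splitlines(),
            fromfile="previous",
            tofile="revised",
            lineterm="",
        )
    )
    return {
        "component": component_id,
        "proof": proof_ref,  # reference to the last printed rendering
        "previous_source": previous_source,
        "revised_source": revised_source,
        "differences": differences,
    }


package = build_translation_package(
    "oil-check",
    "proofs/oil-check-last-printing.pdf",  # hypothetical path
    "Check the oil.\nClose the hood.",
    "Check the oil level.\nClose the hood.",
)
```

 Unchanged lines appear in the difference list only as context, so the translator can see at a glance that the second sentence needs no new work.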
 A further advantage of the Parlance approach is the ability to coordinate parallel work. A single component can be in translation for several languages at once, or components that will ultimately be used in the same document can be written in different languages and then put into the translation process to produce the needed versions.
 

Managing Delivery As Well as Development

 When all revisions and translations are complete, Parlance Content Manager assembles a complete version of the book from its components and sends the document to Xyvision Production Publisher, a high-end batch composition system, which automatically composes and paginates the document into publishable print format.
 The improved and automated translation process has shortened cycle times considerably. Where it could previously take as long as 6 months to produce a translated book, the same results can now be achieved in 3 weeks.
 In short, because of its object-oriented approach, its integration of robust and proven relational databases, its support of workflow and collaborative information development, its ability to find and access information in a large repository readily and easily, its ability to handle real-world demands, and its ability to manage all sorts of data, Parlance Content Manager provides an efficient, proven, and cost-effective way to manage XML content.
