
 

The XML Assembly Line: Better Living Through Reuse

 Simon   Nicholson
  Business Development Manager
  Chrystal Software  Key West, 53-61 Windsor Road
Slough   Berkshire  United Kingdom  SL1 2EE
Phone: +44 1753 559522
Fax: +44 1753 511955
Email: simonn@chrystal.co.uk Web: www.chrystal.com
 
Biographical notice:
 
Simon Nicholson is the Business Development Manager for Chrystal Software Inc, a Xerox New Enterprise Company, based in the European Headquarters in the UK.
 
Simon has been with Chrystal Software since its inception, and was with XSoft prior to Chrystal's formation. During this time he has worked in both sales and technical roles, in Europe and the USA, and has been closely involved with the growth of the Astoria Document Component Management System. He has spoken at a number of industry events and conferences, including SGML/XML events in Europe and the US.
 
Prior to this Simon was with Xerox in a number of capacities including four years in sales training and two years in business consultancy. In total Simon has in excess of twelve years' experience in the document production market, including a year spent in St. Helena!
 
Simon is currently the Chairman of the Board of Directors of OASIS, having previously served as Secretary/Treasurer.
 
ABSTRACT:
 
Organisations are becoming overwhelmed by customer demands for the right information delivered at the right time and in the right format. Consider these challenges:
  •  A medical publisher wants to create client-specific editions from a knowledge-base of medical information.
  •  A financial service company needs financial reports tailored for personal investment portfolios.
  •  A help desk wants to improve customer service by locating relevant information with precision and speed.
  •  A manufacturer of luxury automobiles must create owner manuals tailored to individual configurations.
  •  A cellular phone company must distribute current documentation with its products in the language of each country in which it sells. From France to Saudi Arabia to Mongolia, without the documentation, the product can't be sold.
 
The common thread among these companies is the requirement to produce custom documents with mass market efficiency. To solve the problem, one organisation may be effective using a "search and find" methodology where another may need "pull-push" or an agent-driven template approach. Either way, the challenge is how to assemble this information quickly and cost-effectively.
 
With its ability to describe data and content, XML (Extensible Markup Language) has added a new dimension to enable information access, reuse, and dynamic assembly. With illustrations from real-life applications, this presentation will discuss the different methodologies for document assembly, explain XML's impact on these applications, and describe the latest technology trends in this area.
 

The first assembly line

 
Henry Ford realized his dream of producing an automobile that was reasonably priced, reliable, and efficient with the introduction of the Model T in 1908. This vehicle initiated a new era in personal transportation. It was easy to operate, maintain, and handle on rough roads, immediately becoming a huge success. By 1918, half of all cars in America were Model T's. To meet the growing demand for the Model T, the company opened a large factory at Highland Park, Michigan, in 1910. Here, Henry Ford combined precision manufacturing, standardised and interchangeable parts, a division of labour, and, in 1913, a continuous moving assembly line. Workers remained in place, adding one component to each automobile as it moved past them on the line. Delivery of parts by conveyor belt to the workers was carefully timed to keep the assembly line moving smoothly and efficiently. The introduction of the moving assembly line revolutionised automobile production by significantly reducing assembly time per vehicle, thus lowering costs.

Note:


Taken from the brochure "Henry Ford," (c) 1992 Henry Ford Museum Greenfield Village. http://www.hfmgv.org
 
 
The automobile assembly line forever changed manufacturing. Identifying and componentising work processes produced a ten-fold increase in production, quality, and profitability.
 
What if you apply these same principles of a precise, standardised assembly line to the challenge of publishing technical documents? Is it possible to gain similar benefits of quality, speed, and profitability?
 

 XML automates information

 
XML allows information to be automated, producing results as impressive as those of the assembly line. XML (Extensible Markup Language) is the emerging standard for delivery of structured content on the Web. This simplified subset of SGML (Standard Generalized Markup Language) offers the benefits of its parent language's strict separation of form and content, but is much easier to use and thus likely to gain the same widespread acceptance as HTML (Hypertext Markup Language). Because XML is a true subset of SGML, XML files can be parsed and validated in the same way that SGML files can, and they can be processed by existing SGML-capable tools.
 
XML is not a fixed tag set like HTML. As with SGML, developers can define any number of new tags. But unlike SGML, XML does not require a DTD (document type definition), that is, a rigidly enforced definition of the structure and elements, as long as the document conforms to XML's well-formedness rules. Publishers working in large groups, creating highly organized, complex documents, may choose to continue working with SGML.
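
The point about working without a DTD can be illustrated with a minimal sketch. It assumes Python's standard-library parser, which checks well-formedness only and does not validate against a DTD; the tag names (`manual`, `procedure`, `warning`, `step`) are invented for illustration and do not come from any particular application.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: invented tag names, no DTD declared.
doc = """<manual>
  <procedure id="p1">
    <title>Changing a tyre</title>
    <warning>Apply the handbrake first.</warning>
    <step>Loosen the wheel nuts.</step>
  </procedure>
</manual>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (no DTD involved)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


well_formed = is_well_formed(doc)
# A mismatched end tag breaks well-formedness, whatever the tag names are.
broken = is_well_formed("<manual><step>Unclosed step</manual>")
```

The first document is accepted although no definition of its tags exists anywhere; the second is rejected purely because its tags do not nest correctly.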
 
There are many ways to create XML. Some companies will author XML directly in an editor designed for that purpose. Others will convert existing content from proprietary publishing formats into XML. Still others will leverage existing SGML documents and use XML to assemble them for delivery or interchange.
 
 XML makes it natural to work with information as a series of components, assign metadata, and search within an information set. By combining XML with the use of a component management system that brings engineering efficiencies to the publishing process, organisations can better manage the collaborative authoring process, gaining access control, versioning, information reuse, dynamic document assembly, and easy distribution to multiple media.
 

Components Are Everywhere

 
Component management is an application that maps closely to the advanced technologies and methods used in engineering and manufacturing. From new cars to software, components are the way we make things today. In manufacturing industries as much as 80% of products now consist of components drawn from a company's part library or purchased from suppliers. Product designers routinely tap into internal databases and on-line parts warehouse services in the course of drafting and specifying new models. In software, most new code is written as objects: self-contained bundles of information and operations with the ability to send and receive messages in standard ways. Programs can be created by assembling a set of these object components and making them exchange information and services with each other.
 

Components reduce complexity and increase flexibility

 
And now components are becoming the trend in publishing as well. Why? Because in publishing, as in other industries, components simplify complexity and increase flexibility for adapting to change. Consider these general advantages of components and how they come into play in a component management publishing environment:
  •  Components make it possible to break down complex systems into pieces that are easier to understand and work with. For publishing groups this means that teams of writers and editors can work on components of the same document simultaneously. Users can more easily locate specific information since components can be searched explicitly.
  •  When something needs to be revised or customised, changes can be made to just the component(s) affected without having to redesign the whole document. If a single paragraph in a document needs to be revised, the author can check out just that paragraph from the component management system rather than the whole document. Or, if it's important to see the change in context, the author can check out the section the paragraph appears in. After editing, when the section is checked back in, versioning information is applied only to the paragraph that has changed.
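
The check-out and check-in behaviour described in the second point can be sketched as follows. This is a toy model under stated assumptions, not the interface of any real component management system: the `Repository` class, its method names, and the component ids are all hypothetical.

```python
class Repository:
    """Toy component store: each component has its own text and version number."""

    def __init__(self):
        self.text = {}
        self.version = {}

    def add(self, comp_id, text):
        self.text[comp_id] = text
        self.version[comp_id] = 1

    def check_out(self, comp_ids):
        """Hand out a working copy of the requested components."""
        return {cid: self.text[cid] for cid in comp_ids}

    def check_in(self, working_copy):
        """Store the working copy, versioning only components whose text changed."""
        changed = []
        for cid, new_text in working_copy.items():
            if new_text != self.text[cid]:
                self.text[cid] = new_text
                self.version[cid] += 1
                changed.append(cid)
        return changed


repo = Repository()
repo.add("p1", "Fit the cover.")
repo.add("p2", "Tighten the screws.")

# Check out the whole section to see the change in context, edit one paragraph.
section = repo.check_out(["p1", "p2"])
section["p2"] = "Tighten the four screws."
changed = repo.check_in(section)
```

After check-in, only `p2` carries a new version number, even though both paragraphs were checked out together.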
 

 XML Makes Components

 
 XML brings intelligence to data. It breaks up the information into smaller information components. The smaller and more specific the component is, the more addressable and reusable it is.
 
For example, the document in this illustration uses descriptive tag names to identify the components and structure of the document. A component is a piece of information that can be used independently, such as a paragraph, chapter, instructional procedure, warning note, part number, order quantity, graphic, side-bar story, video clip, or one of an infinite variety of additional information types.


Document components described with XML.

 
 
Unlike conventional document management systems, component management can deconstruct document files into their component parts (e.g., sections, paragraphs, footnotes, and part numbers) and manage these independently. Information can be searched for, tracked through revisions, and assembled into new documents at the level of a specific section or even a single paragraph.
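
As a rough sketch of what deconstructing a document into independently addressable parts might look like, the following uses Python's standard library and assumes that every component carries an `id` attribute. The element names and ids are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; every component carries an id attribute.
doc = ET.fromstring("""<manual id="m1">
  <section id="s1">
    <para id="s1-p1">Check the oil level weekly.</para>
    <warning id="s1-w1">Never open the radiator cap while hot.</warning>
  </section>
  <section id="s2">
    <para id="s2-p1">Order part <partnumber id="pn1">A-100</partnumber>.</para>
  </section>
</manual>""")


def decompose(root):
    """Map each element's id to (tag, text content) so it can be fetched on its own."""
    components = {}
    for elem in root.iter():
        comp_id = elem.get("id")
        if comp_id is not None:
            components[comp_id] = (elem.tag, "".join(elem.itertext()).strip())
    return components


store = decompose(doc)
```

A single warning or part number can then be retrieved by id (for example `store["s1-w1"]`) without opening the document it came from.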
 

Metadata

 
Another way XML adds value to information is through attributes, or metadata. By adding "information about information," users can further describe the information for repurposing. For example, a user can assign attributes to a particular component to specify whether to include it when publishing the document for the Web as opposed to print. When the document is published, the component management system makes the proper adjustments for the target medium.
 
Metadata can also be used to identify the intended audience for specific components. For example, a "beginner" requires more information than an "expert." The component management system assembles a document and publishes only the information that matches these criteria.


Metadata can identify the intended audience of specific content
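
A minimal sketch of metadata-driven selection follows. It assumes two hypothetical attributes, `audience` and `media`, and treats a missing attribute or the value `all` as matching everything; a real system would define its own attribute names and matching rules.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical topic whose paragraphs carry audience and media metadata.
source = ET.fromstring("""<topic>
  <para audience="all">Press the power button to switch on.</para>
  <para audience="beginner">The power button is the round key at the top.</para>
  <para audience="expert" media="web">See the online register map.</para>
</topic>""")


def assemble(root, audience, media):
    """Return a copy of root keeping only children that match audience and medium.

    A missing attribute, or the value "all", is treated as matching everything.
    """
    result = copy.deepcopy(root)
    for child in list(result):
        aud = child.get("audience", "all")
        med = child.get("media", "all")
        if aud not in ("all", audience) or med not in ("all", media):
            result.remove(child)
    return result


beginner_print = [p.text for p in assemble(source, "beginner", "print")]
expert_web = [p.text for p in assemble(source, "expert", "web")]
expert_print = [p.text for p in assemble(source, "expert", "print")]
```

The same source yields three different deliverables: the beginner's print edition gains the extra explanation, while the expert's print edition drops the Web-only paragraph.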

 
 

Managing components

 
One of the compelling benefits of managing documents at the level of the information unit is that users can effectively create an information pool from which to draw. Imagine users searching for information meeting unique criteria, organising it as they wish and dynamically creating a new deliverable. For example, a financial portfolio manager could create a series of articles and recommendations which could then be dynamically organised into unique documents based on the profile of each investor.
 
Component-level management assists information sharing across publishing teams and accelerates nearly every aspect of the authoring process, including editing, review and revision, foreign language translation, and distribution via multiple media. With the ability to reuse, modify, and reassemble content components in much the same way engineering organisations do with component code and parts, publishing organisations can achieve similar dramatic improvements in productivity, time to market, and ease of customisation.
 

Business Applications

 
As XML gains in popularity, the value of component management will be defined by individual business needs. Consider these very real scenarios:
  •  A medical publisher wants to create client-specific editions from a knowledge-base of medical information.
  •  A financial service company needs financial reports tailored for personal investment portfolios.
  •  A help desk wants to improve customer service by locating relevant information with precision and speed.
  •  A manufacturer of luxury automobiles must create owner manuals tailored to individual configurations.
  •  A cellular phone company must distribute current documentation with its products in the language of each country in which it sells. From France to Saudi Arabia to Mongolia, without the documentation, the product can't be sold.
 

Custom documents

 
The common thread among these companies is the requirement to produce custom documents with mass market efficiency. To solve the problem, one organisation may be effective using a "search and find" methodology where another may need "pull-push" or an "agent-driven" template approach.
 

Search and Link: The Virtual Document

 
With an ad-hoc search, users search for the appropriate information, review the results, refine the results to include only the necessary information, and rearrange the information hierarchy. A help desk technician may use this type of assembly to quickly find applicable information based on a customer inquiry. When published, this virtual document provides many benefits to help desk and customer. The help desk technician can easily create documents with a short life-cycle without incurring excessive overhead. The customer benefits by getting a document tailored to their exact requirements.
 
Because the search returns results by pointing to components, information must be well described with metadata. The payback for a little extra work on the input side of the process is that documents are easy to build, system overhead is extremely low because of component reuse, and data is kept up to date at all times.
 
The search and link option for document assembly with XML relies on the user to describe the content of each component as precisely as necessary, but the result is greatly increased value to both the technician and the customer.


The search and link method allows users to find and refine results before assembling the virtual document.
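
The search, refine, rearrange, and publish steps can be sketched as below. The repository contents, metadata keys (`product`, `topic`), and function names are all hypothetical; the point is only that the virtual document is a list of links to components, resolved when it is published.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    comp_id: str
    text: str
    metadata: dict = field(default_factory=dict)


# Hypothetical repository keyed by component id.
repository = {
    c.comp_id: c
    for c in [
        Component("c1", "Hold the reset key for five seconds.",
                  {"product": "X200", "topic": "reset"}),
        Component("c2", "Replace the battery with type B-7.",
                  {"product": "X200", "topic": "battery"}),
        Component("c3", "Hold the reset key for ten seconds.",
                  {"product": "X300", "topic": "reset"}),
        Component("c4", "The warranty lasts two years.",
                  {"product": "X200", "topic": "warranty"}),
    ]
}


def search(repo, **criteria):
    """Return ids of components whose metadata matches every criterion."""
    return [
        cid for cid, comp in repo.items()
        if all(comp.metadata.get(k) == v for k, v in criteria.items())
    ]


def publish(repo, links):
    """Resolve the links against the repository at publish time."""
    return "\n".join(repo[cid].text for cid in links)


hits = search(repository, product="X200")                  # search
refined = [cid for cid in hits
           if repository[cid].metadata["topic"] != "warranty"]  # refine
virtual_doc = [refined[1], refined[0]]                     # rearrange
output = publish(repository, virtual_doc)
```

Because `virtual_doc` holds only ids, any later correction to a component appears in the next published copy without rebuilding the document.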

 
 

Push and Pull

 
Nothing is new under the sun, especially not the push/pull method for document assembly. A very common application of push publishing is "looseleaf" publishing. This push method involves distributing change pages to subscribers, who must re-file them into the notebooks that make up, for example, a company's policies and procedures manuals. Not surprisingly, the percentage of pages that are actually inserted into these notebooks is less than 15%. This phenomenon poses a grave risk when the information being pushed around in paper form relates to safety issues.
 
The pull method is more relaxed. Users pull information at will, but there is no way to verify that everyone who needs the information actually pulled it down.
 
Combined, push and pull publishing work together very effectively. A cellular phone manufacturer may create a notification of a change and update the pertinent information and push it out to a customer base. The supplier, in turn, pulls the new information into their document repository.
 
An example of a combination push and pull methodology is channels in Microsoft Internet Explorer. In this case, a channel is a Web site designed to deliver content from the Internet to your browser. The content provider can suggest a schedule for your subscription, or you can customise your own. You can also choose to either be notified that there is new content available or have the updated content automatically downloaded to your hard disk (for example, at night or when your computer is idle) so you can view the pages at your convenience.
 
Coincidentally, these channels are defined using an XML application called CDF (Channel Definition Format).
 

Agent-Driven Templates

 
More formal than an ad-hoc search, agent-driven templates are best suited for output to various media and in formats that are very customer-specific. These documents and the information contained therein are usually long-lived and rely heavily on reuse. A good example can be illustrated by a consumer health information producer. This producer works from a very large "knowledge base" with thousands of topics cross-referenced in an interrelated hierarchy. Component management allows writers to work at the sub-file level and reuse content that can be as focused as a paragraph or a warning statement.
 
When ready to publish, the system dynamically assembles the designated components using a customer-specific XML template. To allow for maximum flexibility, the template inserts standard boilerplate information specific to this customer as well as components found by previously stored queries.
 
The options for such on-the-fly customisation are endless, and the content is consistently up to date thanks to linked reuse across the repository.


An example of agent-driven assembly template described in XML.
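
One possible shape for such a template is sketched below. The `template`, `boilerplate`, and `query` elements, the customer name, and the component pool are assumptions made for illustration; they are not the template format of any particular product.

```python
import xml.etree.ElementTree as ET

# Hypothetical component pool; element and attribute names are illustrative.
pool = ET.fromstring("""<pool>
  <topic subject="asthma" type="overview">Asthma narrows the airways.</topic>
  <topic subject="asthma" type="warning">Seek help if the inhaler does not bring relief.</topic>
  <topic subject="diabetes" type="overview">Diabetes affects blood sugar control.</topic>
</pool>""")

# Hypothetical customer template mixing fixed boilerplate with stored queries.
template = ET.fromstring("""<template customer="Acme Health">
  <boilerplate>Prepared for members of Acme Health.</boilerplate>
  <query subject="asthma" type="overview"/>
  <query subject="asthma" type="warning"/>
</template>""")


def assemble(template, pool):
    """Build a document by copying boilerplate and resolving each stored query."""
    doc = ET.Element("document", customer=template.get("customer", ""))
    for item in template:
        if item.tag == "boilerplate":
            ET.SubElement(doc, "para").text = item.text
        elif item.tag == "query":
            # A topic matches when it carries every attribute named in the query.
            for topic in pool.iter("topic"):
                if all(topic.get(k) == v for k, v in item.attrib.items()):
                    ET.SubElement(doc, "para").text = topic.text
    return doc


document = assemble(template, pool)
paragraphs = [p.text for p in document]
```

Because the queries run at assembly time, a second customer needs only a different template, and both customers pick up any revision made to the shared topics.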

 
 

The Importance of Reuse

 
Reuse is the most compelling feature of the XML assembly line because it saves so much time. Without it, document assembly would be impractical and virtually useless. Manually locating and changing dozens of information elements in hundreds of contexts can consume countless hours. Component management solves that problem by allowing XML documents to reuse content across documents.
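
Linked reuse can be illustrated with a small sketch in which documents hold references to components rather than copies. The component ids, document names, and helper functions are hypothetical.

```python
# Hypothetical shared component store.
components = {
    "warn-battery": "Do not expose the battery to heat.",
    "intro-x200": "Welcome to the X200.",
    "intro-x300": "Welcome to the X300.",
}

# Each document is an ordered list of component references, not copies.
documents = {
    "x200-manual": ["intro-x200", "warn-battery"],
    "x300-manual": ["intro-x300", "warn-battery"],
}


def render(doc_id):
    """Resolve a document's references into text at publish time."""
    return [components[ref] for ref in documents[doc_id]]


def where_used(comp_id):
    """List every document that reuses the given component."""
    return sorted(d for d, refs in documents.items() if comp_id in refs)


before = render("x200-manual")
# One edit to the shared warning...
components["warn-battery"] = "Do not expose the battery to heat or moisture."
# ...is reflected in every document that links to it.
after_x200 = render("x200-manual")
after_x300 = render("x300-manual")
affected = where_used("warn-battery")
```

The warning is changed once and both manuals pick it up, which replaces the manual search for every context in which the text appears.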
 
All of this is made possible by creating standardised and interchangeable parts with XML and employing a component management technique to keep the assembly line moving smoothly and efficiently. For global business processes, document assembly with linked reuse helps organisations get to market faster around the world. Companies save money because they can produce technical publications without redundant labour while increasing customer satisfaction by delivering information assembled to support unique configurations.
