XML in Healthcare: The HL7 Experience   Table of contents   Indexes   The PGC/GCA introduces a new standard in communications

 Ann Arbor 
Arbortext
 Bartlett, PG  
 USA 
 
PG Bartlett
 VP Marketing
Arbortext
 1000 Victors Way Ann Arbor (Michigan)  USA (48108) Web site:http://www.arbortext.com
 Biography
 As vice president of marketing, PG Bartlett has been instrumental in Arbortext's development as the world's leading provider of content creation, management and delivery software for enterprise document automation. A founding member of the XML Working Group, Arbortext has actively contributed to all of the W3C's XML-related specifications efforts.
 Since joining Arbortext, Bartlett co-authored two electronic presentations distributed by SGML Open, the industry consortium, and authored several white papers. He is a frequent presenter at major industry events and has been invited to present and chair sessions at Seybold Seminars, Comdex, XML conferences, AIIM, Networld+Interop, knowledge management conferences, and other major events.
 A graduate of Northwestern University, Bartlett has served 18 years in technical and marketing positions in leading-edge high-technology companies.
 

Introduction

 XML has become the hottest topic in Information Technology (IT) because of XML's potential to be applied to any project where data is exchanged - in other words, it can be used in any IT application at all. And unlike every other IT domain such as programming languages or operating systems, XML has no competition. (SGML does not represent competition to XML because at the data level, SGML and XML are virtually equivalent.)
 This paper focuses specifically on the use of XML in publishing and other applications that involve document information. We show how you can not only use XML to get out from under today's publishing problems, but also how to set a direction to take advantage of eBusiness opportunities. The key is to leverage XML's remarkable dual capability to represent both text and data, which translates into its unique support for both content and transactions.
 The target audience for this paper includes business and technical managers who want to know which critical technical issues surrounding XML's application to document information will have the greatest impact on their organization's long-term ability to realize business benefits and accomplish corporate goals.
 

Dealing with Information Glut

 If content is king, are we its slaves?
 For most of us, the problem is both too little information - and too much. Industry analysts estimate that knowledge workers waste 20% of their time dealing with excessive challenges in finding, incorporating and presenting published information. And that number does not account for the consequential damage of relying on content that's stale, incomplete or just plain wrong.
 Properly applied, XML offers a way out of this mess. Some organizations are already reaping the benefits of solving at least some of these problems through the use of XML. Let's look at the principles they followed:
 

Looking at the big picture: corporate goals, business benefits and the information lifecycle

 In the face of XML hype that's reaching unprecedented levels, organizations are feeling increasing pressure to plunge in and do something, rather than risk being left behind. Doing something can either yield tremendous business benefits and competitive advantage, or merely waste resources doing the same things to achieve the same results in a different way.
 The rational beginning, of course, is to make sure that whatever you do with XML delivers business benefits that align with corporate goals. Corporate goals either fall under cost cutting or revenue growth, with most businesses focusing on the latter. Within the latter, companies undertake a variety of strategies, such as expanding market share domestically or internationally, bringing more new products to market, reducing time to market for new products, and reducing product lifecycles. These strategies in turn lead to a variety of tactics, some of which will apply to the flow of business-critical document information both within and outside the organization.
 Document information fuels the flow of knowledge both within and outside the organization, and in many instances it is crucial to speeding product development and delivering high levels of customer service. Improvements can be achieved at virtually every point of the life cycle of that information:
 
  1.  Create
  2.  Review
  3.  Translate
  4.  Store/retrieve
  5.  Assemble
  6.  Format
  7.  Deliver
  8.  Consume
 Following are some examples of business strategies and how they could connect with the use of XML in publishing:
 
  •  Speed products to market - many documents are created during the product development cycle, some focused on internal information sharing and process definition, and others focused on helping customers understand, purchase, operate and service your products. While many product development organizations already deploy sophisticated tools such as CAD/CAM, SCM and EDA systems, their authoring and publishing systems are typically not consistent, integrated or automated.
  •  Expand market share domestically - as products become increasingly commoditized, companies must increasingly "differentiate on delivery" by focusing on the services around the buying, learning, implementing, using and understanding their products. User experiences on the web are rapidly training your customers to expect fast, personalized, seamless services, much of which revolves around the immediacy, accuracy and relevance of the information you present to them at the time of need.
  •  Expand market share internationally - extending the model above for domestic market expansion into other countries requires that you take into account local information needs and infrastructure while reducing the lag between domestic introduction and launches in other markets. The bulk of the effort revolves around translation, which can be sped up significantly to reduce the time needed to ready products for delivery in other countries.
 

Creating granular information: enabling reuse, concurrency, translation, personalization and automation

 The foundation for every XML document project is to enable the creation of modules of information. These modules are the building blocks for achieving many key benefits that XML can deliver:
 
  •  Reuse - while authors can copy existing information and paste it into other documents, this approach to reuse raises the cost of revisions because each instance of that information must be found and revised separately. The superior approach is to reuse information by reference to a single source so that changes to the source can be automatically and immediately reflected in every document that refers to it.
     Two significant challenges of enabling reuse are: 1) setting up a system that enables reuse at any level of the document that's desired; and 2) ensuring that authors write information modules that may be used in multiple contexts. You can deal with the first challenge by ensuring that your system can handle arbitrary granularity in the reuse of information instead of being restricted to a single level. You can address the second challenge by a combination of training and building DTDs that specifically consider reuse. (A company called Information Mapping has developed an authoring approach to creating modular information and offers training in their approach.)
  •  Concurrency - modular information lets multiple authors work on the same document at the same time, each focusing on a different subject area. This represents a significant change from the practice of assigning each author to develop an entire book. The modular approach lets authors shift from being generalists to being subject matter experts, increasing both their efficiency and their value to the organization. And allowing multiple authors to work on a single document can reduce development time dramatically.
     Organizations that adopt a modular approach to authoring usually create new documentation job categories. One company has eliminated "technical authors" entirely; they employ "subject matter experts," "information integrators," and "delivery specialists," the last of whom focus on systems that automatically format and deliver content in print, on the web, and in other forms.
  •  Translation - the work of translators typically only begins when the original document is finished. Creating information in modules means that translators can also work concurrently, translating modules as the original language is completed, reviewed and approved.
     Reusing information by reference instead of copying and pasting also reduces the cost of translating, since it eliminates the cost of translating information that's already been translated.
  •  Personalization - personalization can contribute significantly to the utility of information by tailoring it to each individual's needs. Personalization involves the exclusion of irrelevant information so that each individual receives everything they need but only what they need.
     Enabling personalization requires both the capability to identify the appropriate audience or audiences for any part of a document and also the mechanism to remove irrelevant information at the time of delivery.
  •  Automatic formatting - one of XML's greatest virtues is its potential for use as a media-independent storage format for any kind of information. In other words, XML lets you store information in a "pure" form, without being constrained to any particular medium.
     Those of us raised on word processing take for granted the granularity needed for formatting because we naturally think about documents as separated into formatting objects such as titles, paragraphs, lists, headers and footers. In fact, the first step to understanding XML may be to realize that traditional document formatting contains implicit information that only a human can understand. For example, only a human can figure out that the phrases cherchez la femme, The Fountainhead, and absolutely not! have very different meanings (foreign phrase, book title and emphasis) even though they're all formatted the same. In contrast, XML can explicitly capture differences in meaning (although it's not always useful to do so).
     XML's media independence also offers the opportunity to liberate subject matter experts from wasting time on formatting, allowing them to focus solely on content. Preferences vary widely on this point, however. Some organizations find value in allowing their authors to see formatting in a WYSIWYG (What You See Is What You Get) view. Others deliberately show their authors a non-WYSIWYG view, either to inhibit authors from assigning XML tags based on desired appearance (often called "tag abuse") or to prevent them from rewriting content to achieve a desired page break.
     Creating content that's tied to a specific medium not only constrains that information to a specific appearance, but also constrains it to a specific sequence and modularity for delivery. For example, a printed book may contain a table of contents followed by chapters, while on the web the table of contents would be presented in one frame while one chapter at a time is displayed in the other frame. The handling of footnotes is another example: in print, footnotes appear at the bottom of the page or the end of the chapter, while on the web footnotes may appear as pop-ups when the mouse pointer hovers over the related reference.
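 The point about capturing meaning rather than appearance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (foreign, booktitle, emph) and the two style maps are hypothetical and do not come from any particular DTD. The same source is rendered once for the web and once as plain text, with each medium deciding how a foreign phrase, a book title and emphasis should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <foreign>, <booktitle> and <emph> are illustrative names.
SOURCE = (
    "<para>As they say, <foreign>cherchez la femme</foreign>. "
    "Read <booktitle>The Fountainhead</booktitle>? "
    "<emph>Absolutely not!</emph></para>"
)

# One style map per medium: element name -> (opening text, closing text).
WEB_STYLE = {
    "foreign": ('<i lang="fr">', "</i>"),
    "booktitle": ("<cite>", "</cite>"),
    "emph": ("<em>", "</em>"),
}
PLAIN_STYLE = {
    "foreign": ("_", "_"),
    "booktitle": ('"', '"'),
    "emph": ("*", "*"),
}


def render(elem, style):
    """Render an element and its descendants using the given style map.

    Text is copied as-is; a real web renderer would also escape it.
    """
    opening, closing = style.get(elem.tag, ("", ""))
    parts = [opening, elem.text or ""]
    for child in elem:
        parts.append(render(child, style))
        parts.append(child.tail or "")
    parts.append(closing)
    return "".join(parts)


para = ET.fromstring(SOURCE)
web = render(para, WEB_STYLE)
plain = render(para, PLAIN_STYLE)
print(web)
print(plain)
```

 In a word processor all three phrases would simply be italic; here the source records what each phrase is, and the decision about appearance is deferred to each medium's style map.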
 One of the most interesting and challenging aspects of modularizing information is that the level of granularity that's ideal for reuse is almost certainly different than the granularity that's ideal for formatting. In fact, the ideal level of granularity differs for each of the eight phases of the information lifecycle listed earlier.
 For example, if you could personalize information only at the same level that you create information for reuse, then you're likely to find that the level of personalization you can achieve is much too coarse for your needs. For instance, you may want to omit rows in a table for a certain audience, but this is much finer granularity than you would want for reuse.
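 As a rough sketch of personalization at row level, the following Python fragment removes table rows that are not meant for the reader at delivery time. The audience attribute, its space-separated values and the table markup are assumptions made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an "audience" attribute lists who should see an element.
# Elements without the attribute are delivered to everyone.
SOURCE = """
<table>
  <row><cell>Power on</cell></row>
  <row audience="technician"><cell>Calibrate sensor</cell></row>
  <row audience="technician operator"><cell>Run self-test</cell></row>
</table>
"""


def personalize(root, audience):
    """Return a copy of root without elements aimed at other audiences."""
    copy = ET.fromstring(ET.tostring(root))  # leave the single source untouched
    for parent in list(copy.iter()):
        for child in list(parent):
            targets = child.get("audience")
            if targets is not None and audience not in targets.split():
                parent.remove(child)
    return copy


table = ET.fromstring(SOURCE)
operator_view = personalize(table, "operator")
steps = [cell.text for cell in operator_view.iter("cell")]
print(steps)
```

 The filter works on any element, not only on reusable modules, which is the distinction the paragraph above draws between granularity for reuse and granularity for personalization.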
 One of the greatest benefits of XML is that it lets you develop document systems that adapt to any arbitrary level of granularity at any point in the process. This flexibility requires attention up front - your DTDs and XML software must be designed with arbitrary granularity in mind.
 There are document systems that only support 2-tier compound documents, where information modules cannot contain other modules. You may need a system that supports n-tier compound documents.
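 To make the difference concrete, here is a minimal sketch of n-tier assembly in Python. It assumes a hypothetical include element with a ref attribute and a simple in-memory module store; real systems might use entities, XInclude or repository links instead. Because modules are resolved recursively, a module may itself refer to other modules to any depth.

```python
import xml.etree.ElementTree as ET

# Hypothetical module store: module id -> XML source.
# <include ref="..."/> is an illustrative reference mechanism.
MODULES = {
    "manual": '<book><title>Manual</title><include ref="safety"/></book>',
    "safety": '<chapter><title>Safety</title><include ref="warning"/></chapter>',
    "warning": "<note>Disconnect power before servicing.</note>",
}


def resolve(module_id, modules, _active=()):
    """Parse a module and replace every include with the module it names."""
    if module_id in _active:
        raise ValueError(f"circular reference: {module_id}")
    root = ET.fromstring(modules[module_id])
    _expand(root, modules, _active + (module_id,))
    return root


def _expand(elem, modules, active):
    for index, child in enumerate(list(elem)):
        if child.tag == "include":
            replacement = resolve(child.get("ref"), modules, active)
            replacement.tail = child.tail  # keep text that followed the include
            elem[index] = replacement
        else:
            _expand(child, modules, active)


book = resolve("manual", MODULES)
print(ET.tostring(book, encoding="unicode"))
```

 Since the book holds only references, a change to the "warning" module shows up in every document that includes it the next time the document is assembled.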
 

Building shared repositories: enabling finding, sharing, collaboration and control

 Another key to successful XML document projects is to use document management systems or content management systems as a common storage area for document content. These systems provide a variety of key benefits that are listed separately here but are ultimately part of a whole system:
 
  1.  Finding - storing all content in a centrally accessible system (regardless of whether it's a single server or a federation of servers) is a prerequisite to make that content available for searching. This is the key to being able to find content to reuse or update.
  2.  Sharing - the most reliable way to make document modules available for others to reuse in their own documents is to store these modules in a centrally accessible yet controlled repository. Without a central repository, active document components could disappear, "breaking" all the documents that refer to them.
  3.  Collaboration - enabling multiple authors to work on shared content requires a mechanism to ensure that only one person at a time can change that content. That's the role of the check-in/check-out functions of a document management or content management system.
  4.  Control - repositories provide several systems to help ensure the quality and integrity of the content stored within. These systems include access control, which prevents unauthorized viewing or changing; version control, which keeps track of multiple versions of the same content and ensures that only the latest version is changed; and compound document control, which ensures that document components in use are not deleted (in other words, compound document control maintains "referential integrity").
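 The collaboration and control functions above can be illustrated with a toy in-memory repository. This is a sketch only: the class and method names are invented for the example, access control is omitted, and a real document management system would of course persist its data.

```python
class Repository:
    """Toy repository showing check-out locks, versions and referential integrity."""

    def __init__(self):
        self.versions = {}    # module id -> list of content versions
        self.locks = {}       # module id -> user holding the check-out
        self.references = {}  # module id -> set of module ids it refers to

    def add(self, module_id, content, refs=()):
        missing = [r for r in refs if r not in self.versions]
        if missing:
            raise KeyError(f"unknown references: {missing}")
        self.versions[module_id] = [content]
        self.references[module_id] = set(refs)

    def check_out(self, module_id, user):
        """Lock the module so that only one person at a time can change it."""
        holder = self.locks.get(module_id)
        if holder is not None and holder != user:
            raise PermissionError(f"{module_id} is checked out by {holder}")
        self.locks[module_id] = user
        return self.versions[module_id][-1]  # always edit the latest version

    def check_in(self, module_id, user, content):
        if self.locks.get(module_id) != user:
            raise PermissionError(f"{user} has not checked out {module_id}")
        self.versions[module_id].append(content)
        del self.locks[module_id]

    def delete(self, module_id):
        """Refuse to delete a module that other documents still refer to."""
        users = [m for m, refs in self.references.items() if module_id in refs]
        if users:
            raise ValueError(f"{module_id} is still used by {users}")
        del self.versions[module_id]
        del self.references[module_id]
        self.locks.pop(module_id, None)


repo = Repository()
repo.add("warning", "Disconnect power.")
repo.add("manual", "Manual text", refs=["warning"])
draft = repo.check_out("warning", "ann")
repo.check_in("warning", "ann", draft + " Wait 30 seconds.")
print(repo.versions["warning"])
```

 Attempting to delete "warning" while "manual" still refers to it raises an error, which is the "referential integrity" guarantee described in item 4.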
 

Automating content delivery: enabling timely content

 The cost of creating document information is often dwarfed by the cost of maintaining it. We've heard estimates that revisions cost from four to twenty times more than creating the content originally. Costs are higher because content proliferates after it's created. Specific areas that contribute to the proliferation of content include:
 
  1.  Copying and pasting information creates multiple copies that must each be located and separately revised every time it changes.
  2.  Each separate instance of the same revised information must also be translated into other languages.
  3.  For each medium where the information appears, the content must be separately revised. This is true for most organizations because the conversion from the original format (e.g., print) to another format (e.g., web) involves significant manual intervention.
 The impact of these three sources of content proliferation is multiplicative, which leads to the surprisingly high cost for maintaining content. Considerable savings can be realized by modularizing content, creating a single source, and reusing content by reference instead of by copying and pasting. This approach deals with the first two sources of content proliferation listed above, but the third source remains and it's considerable.
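 A back-of-the-envelope calculation shows why the effect is multiplicative. The figures below (five pasted copies, four languages, three media) are purely illustrative assumptions, not measurements from any organization.

```python
def revision_touchpoints(copies, languages, media):
    """Number of places that must be revised when one piece of information changes."""
    return copies * languages * media


# Illustrative assumptions: 5 pasted copies, 4 languages, 3 delivery media.
copy_and_paste = revision_touchpoints(copies=5, languages=4, media=3)

# Single-sourcing by reference collapses the copies to one,
# so each language is also translated only once.
single_source = revision_touchpoints(copies=1, languages=4, media=3)

# Automated delivery from a media-independent source removes the per-medium rework.
automated_delivery = revision_touchpoints(copies=1, languages=4, media=1)

print(copy_and_paste, single_source, automated_delivery)
```

 Under these assumptions a single change touches 60 places with copy and paste, 12 with a single source, and 4 once delivery is automated as well.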
 Eliminating the third problem involves storing document content in a media-independent format such as XML (or SGML) and setting up processes that can deliver that content automatically for each medium that's needed. The effort to set up automated processes varies depending on the desired quality of the result. For example, you can quickly set up a print composition process that produces single column output with simple headers and footers, but it's a bigger challenge to handle multiple columns, automatically generated tables of contents and indices, and intelligent page layouts that balance the need for page fullness against the need to keep related elements together on the same page.
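 As one small example of what such an automated process does, the sketch below generates a numbered table of contents directly from structured content. The book, chapter, section and title element names are assumed for the illustration; the harder problems named above, such as multi-column layout and page balancing, are well beyond a few lines of code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document structure, for illustration only.
SOURCE = """
<book>
  <chapter><title>Installation</title>
    <section><title>Unpacking</title></section>
    <section><title>Wiring</title></section>
  </chapter>
  <chapter><title>Operation</title></chapter>
</book>
"""


def table_of_contents(book):
    """Build numbered contents lines from chapter and section titles."""
    lines = []
    for c, chapter in enumerate(book.findall("chapter"), start=1):
        lines.append(f"{c} {chapter.findtext('title')}")
        for s, section in enumerate(chapter.findall("section"), start=1):
            lines.append(f"  {c}.{s} {section.findtext('title')}")
    return lines


toc = table_of_contents(ET.fromstring(SOURCE))
print("\n".join(toc))
```

 Because the contents are derived from the source each time, adding or renaming a chapter never requires anyone to update the table of contents by hand.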
 Still, the reward for automating delivery is to enable nearly instant updates of content. This can reduce the lag to deliver content changes from weeks or months to hours or days.
 

"The Faster I Go, the Longer it Takes": Avoiding the Publishing Trap

 Using XML will help organizations streamline their publishing processes and improve their ability to deliver useful content to their customers. A comprehensive solution that addresses all of the foregoing issues leaves an organization in a strong position to go further and readily take advantage of eBusiness opportunities.
 However, many if not most organizations will fall into the trap of developing an XML publishing application with only near-term goals in mind, under the mistaken belief that using XML is a panacea that will let them do everything. This may be symptomatic of the truism that there's never enough time to do the job right, but there's always time to do it over. Or it may indicate that the organization hasn't gauged the full measure of the opportunity in front of it.
 For example, it's possible to see only a multiple output publishing problem, where the solution is to automate delivery of word processing files on the web. That sort of system may be able to achieve some level of personalization as well, but it not only falls short of what's possible in a publishing system, but also leaves the organization poorly positioned for the next opportunity.
 In particular, some organizations try to leave the existing content creation process in place, tacking on a publishing system to deal with existing content. This leads to a "garbage in/garbage out" problem, where the result is limited by the quality of the data. In particular, conventional content creation leads to the following key issues:
 
  1.  Limited granularity - as we showed earlier, arbitrary granularity is a key requirement for achieving the full measure of benefits that XML document software can deliver. Word processors lack the facilities for supporting arbitrary granularity - in fact, even a native XML editor can fall short in this area.
  2.  Inconsistent data - we've seen countless organizations that have tried to use existing authoring tools to create highly structured content. In every case, this approach exposes a fundamental problem: the most efficient user interface is one that guides users to create valid content and prevents invalid content at the moment of creation. Postponing validation to a batch process creates a motivation to "get it to parse," regardless of the shortcuts involved. Time and time again, authors facing a deadline will commit any measure of "tag abuse" in order to get it to parse, even if the actual tagging is inappropriate.
  3.  Inefficient processes - to avoid tag abuse, some organizations let content creators operate as usual and assign someone else to make it valid. However, this fractures the process because any changes necessary to fix up the content typically fail to make their way back to the original content creator. That means that revisions to the content must be processed as if the content were brand new. Even if only five percent of the content changes, the entire instance must be manually processed again.
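 The difference between guiding authors at the moment of creation and validating in a later batch can be sketched as follows. The content model here is a hypothetical stand-in for a DTD and checks only which children an element may contain; a real DTD also constrains order and repetition.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> names of permitted child elements.
CONTENT_MODEL = {
    "procedure": {"title", "step"},
    "step": {"para", "warning"},
    "title": set(),
    "para": set(),
    "warning": set(),
}


def allowed_children(parent):
    """What an editor could offer the author at the current position."""
    return sorted(CONTENT_MODEL.get(parent.tag, set()))


def insert_child(parent, tag, text=""):
    """Add a child only if the content model permits it at this point."""
    if tag not in CONTENT_MODEL.get(parent.tag, set()):
        raise ValueError(
            f"<{tag}> is not allowed inside <{parent.tag}>; "
            f"choose from {allowed_children(parent)}"
        )
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


procedure = ET.Element("procedure")
insert_child(procedure, "title", "Replace filter")
step = insert_child(procedure, "step")
insert_child(step, "warning", "Allow the unit to cool.")
print(allowed_children(step))
```

 An attempt such as insert_child(procedure, "warning") is rejected immediately, with the valid choices listed, so the author never accumulates invalid content that must later be forced to parse.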
 

Preparing for eBusiness?

 eBusiness lets organizations leverage the web to integrate their business processes with their partners' processes to enable data sharing and applications integration. Because XML is a standard data format that can represent both document data and transaction data, the proper use of XML sets the stage for eBusiness.
 For eBusiness applications to work, the underlying data must be absolutely consistent, highly granular, available in a timely manner and completely secure. As we've shown earlier, the quality of most legacy content falls short of these requirements.
 The requirements for eBusiness applications related to document content become more evident when you consider typical uses:
 
  •  Content sharing - every eBusiness application involves sharing data with multiple partners (suppliers, customers, channels, etc.). Meeting the diverse requirements of multiple partners requires the capability to serve a wide range of data needs, from simple and coarsely granular data to extremely complex and highly granular data. In order to automate the sharing of content, interactions and processes must be repeatable, which requires that the data be absolutely consistent across all transactions.
  •  Collaboration - sharing data can be a two-way street. Imagine being able to share with partners both the creation and delivery of content, so that the final form betrays no trace of its origin. This would enable business partners to present a uniform face to their mutual customers regardless of the source. eBusiness principles can not only allow an individual organization to deliver superior services, but also can allow an entire consortium of vendors to achieve competitive advantage over those outside the consortium.
  •  Personalization (tailored content) - customers and partners attach high value to content that's precisely tailored to their needs and omits anything irrelevant. The eBusiness goal is to make sure that everyone receives all the data they need, only what they need, and at the time of need.
     Meeting this goal enables such benefits as customer self-service, where customers can quickly arrive at exactly the information they need when they need it.
  •  Portals - one of the ultimate expressions of eBusiness is the "portal," which presents a common face to all kinds of business data and applications, not just document data. Portals also provide a platform for interconnecting disparate applications. XML has caught fire as the lingua franca for applications integration, thus enabling document applications to interchange with data applications.
     From a document perspective, portal technology enables documents to merge in content from non-document data sources (such as databases, ERP systems and similar sources). This capability enables organizations to wrap document content around database content, even when that content is interactive.
     
     Examples of real applications follow:
    •  Manufacturing instructions - one manufacturing instruction can be applied to several products, each with minor variations. At the time of need, the production worker enters a specific product number and the appropriate manufacturing instruction is populated with data from that specific product's bill of materials (stored in a product data management system) and delivered to the screen. Manufacturing instructions can also vary based on the experience of the worker and the equipment available at that specific workstation.
    •  Marketing information - one product data sheet contains geographically varying details such as pricing, measurements and specifications. The details come from spreadsheet data. At the time that a salesperson needs a datasheet, the appropriate spreadsheet data is extracted based on that salesperson's region and inserted at the appropriate points in the document.
    •  Service and diagnostic information - service information and troubleshooting instructions are presented at the time of delivery as an interactive application. The user sees only the instructions appropriate to the current diagnostic step; when the user enters the result, the next appropriate step is displayed.
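 The manufacturing instruction example can be sketched in Python as a document template that is filled from product data at the time of need. Everything specific here is an assumption for illustration: the field element, the product numbers and the dictionary standing in for a product data management system.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: <field name="..."/> marks where product data is merged in.
TEMPLATE = """
<instruction>
  <title>Attach housing</title>
  <step>Fasten the housing with <field name="screw_count"/> <field name="screw_type"/> screws.</step>
</instruction>
"""

# Stand-in for bill-of-materials data held in a product data management system.
BOM = {
    "P-100": {"screw_count": "4", "screw_type": "M3"},
    "P-200": {"screw_count": "6", "screw_type": "M4"},
}


def populate(template_xml, values):
    """Replace each field element with the matching value from the product data."""
    root = ET.fromstring(template_xml)
    for parent in list(root.iter()):
        previous = None  # last sibling kept before the current position
        for child in list(parent):
            if child.tag != "field":
                previous = child
                continue
            text = values[child.get("name")] + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + text
            else:
                previous.tail = (previous.tail or "") + text
            parent.remove(child)
    return root


instruction = populate(TEMPLATE, BOM["P-100"])
print(instruction.findtext("step"))
```

 One instruction serves every product; only the data merged in at delivery differs. The same pattern would apply to the regional datasheet example, with spreadsheet rows selected by the salesperson's region.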
