
Structured XML Editing
 

 
PG Bartlett
Vice President, Marketing, Arbortext, Inc.
1000 Victors Way
Ann Arbor, Michigan 48108
Email: pgb@arbortext.com

Biographical notice

As Vice President of Marketing for Arbortext, PG Bartlett has been instrumental in Arbortext's development as the world's leading provider of content creation and management software for enterprise XML applications. Bartlett has spent 18 years in technical and marketing positions at leading-edge high-technology companies. He is a regular presenter at major industry events and has been invited to present and chair sessions at Seybold Seminars, XML conferences, CALS conferences, and other major events. Since joining Arbortext, Bartlett has co-authored two electronic presentations distributed by SGML Open, the industry consortium, and has authored several white papers.

Introduction
 Organizations with large amounts of document information typically require an XML authoring and editing tool that easily integrates with content management tools and content delivery tools. That combination yields a complete automated document system that can be the key to an organization gaining significant competitive advantage through improvements in information quality, time to market, and production costs. The designs of these systems usually emphasize data integrity, data reusability, process automation, and workflow consistency. Data integrity is key to the other design factors because without absolutely consistent data, the rest becomes difficult or impossible to achieve.
 The title of this paper, "The Benefits of Structured XML Authoring for Content Management," contains two key concepts: structure and content management integration. Each plays a pivotal role in the successful deployment and operation of a high-performance automated document publishing system. The following sections explain what these terms mean and why they are important.
 Figure 1. Shown above are two views of the same document. The left view illustrates how the apparently unstructured information on the right can gain structure through the use of XML.
Structure
 The contents of documents are often described as "unstructured" information in contrast to the "structured" information stored in a relational database.
 But what if that could be different? What if you could impose a rigorous structure on a document? What if you could handle documents as if they were data?
 It can be different! And XML turns out to be the key, because XML allows you to impose structure on an apparently unstructured collection of text. With consistent structure, documents can be treated just like other data: processed automatically, reused, protected, classified, and extracted for use in a limitless variety of ways.
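 As a minimal sketch of what "treating documents as data" means, the Python example below marks up a short service bulletin in a hypothetical vocabulary (the element names are invented for illustration) and uses the standard library's xml.etree.ElementTree module to pull out specific fields by their structure rather than by their position on a page.

```python
import xml.etree.ElementTree as ET

# A short document in a hypothetical vocabulary; element names are illustrative only.
BULLETIN = """
<bulletin id="SB-0042">
  <title>Replace hydraulic filter</title>
  <applies-to model="X200"/>
  <applies-to model="X300"/>
  <step>Relieve system pressure.</step>
  <step>Remove the filter housing.</step>
  <step>Install the new filter.</step>
</bulletin>
"""


def extract_fields(xml_text: str) -> dict:
    """Query the document like a database record: fields are located by
    element and attribute names, not by layout."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "models": [e.get("model") for e in root.findall("applies-to")],
        "step_count": len(root.findall("step")),
    }


if __name__ == "__main__":
    print(extract_fields(BULLETIN))
```

 The same query works unchanged on every bulletin that follows the same structure, which is what makes large-scale automation practical.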
 Imposing structure on document information yields several key benefits:
 
  • Multiple Outputs - Structured document data is often described as "presentation independent" because it's stored in a way that's independent of any particular medium. That allows organizations to deliver their information automatically from a single repository to the Web, CD-ROM, print, and other media. This contrasts sharply with word processing and desktop publishing file formats, which are designed specifically for publishing on paper.
  • Reuse - Many organizations re-create existing information far more often than they reuse it. That inefficiency causes inaccuracies, version skew, delivery slips, and inflated costs. One of the primary reasons to build a structured document repository is to eliminate those costs by enabling the maximum possible reuse of existing information. Storing that information in a structured database provides the controls needed to maintain the integrity of the data regardless of when, where, and how often it's used.
  • Interchange - Organizations can interchange their data freely with suppliers, partners, and customers when the data is based on a standard structured data encoding scheme like XML.
  • Automation - Structuring your document data and storing it in a repository enables intensive automation, yielding process improvements similar in kind and degree to those gained by replacing handwritten ledgers with relational databases.
    Content Management Integration
     Every organization that manages large amounts of document information will, sooner or later, seek both to structure that information and to store that information in a content management system.
     The specific method of content management varies. In some applications, document information is stored directly in a database. In many others, it's stored under the control of a document management system.
     Regardless of the specific approach, these systems primarily ensure data integrity through security controls that prevent unauthorized viewing and changing, and revision controls that keep track of changes from one version to the next.
     Figure 2. Compound documents are assembled from a collection of document components. Each component may likewise consist of a collection of even smaller components. In some applications, the top-level compound document may consist of thousands of components nested to thirty levels.
     Content management systems for structured document information invariably must keep track of information at a highly granular level (see Figure 2). For example, instead of storing complete books in a single chunk, "compound documents" are assembled from small components that are stored separately.
     Some components are tiny. For example, individual cells in a table may be stored in various places and appear together only when delivered as a document.
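     The assembly step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular content management system works: the repository is a plain dictionary keyed by component ID, and components refer to one another through a hypothetical <include ref="..."/> element. Components may nest to any depth, and a component that directly or indirectly includes itself is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical component repository: each component is a small XML fragment stored
# separately under its own ID (a stand-in for a database or content management system).
REPOSITORY = {
    "manual": '<manual><title>Pump Manual</title>'
              '<include ref="safety"/><include ref="ch1"/></manual>',
    "safety": '<section><title>Safety</title><include ref="warn-pressure"/></section>',
    "ch1": '<section><title>Installation</title><para>Mount the pump.</para></section>',
    "warn-pressure": '<warning>Relieve pressure before servicing.</warning>',
}


def assemble(component_id: str, repo: dict, _path: frozenset = frozenset()) -> ET.Element:
    """Build a compound document by recursively replacing each hypothetical
    <include ref="..."/> element with the component it names."""
    if component_id in _path:
        raise ValueError(f"circular reference to component {component_id!r}")
    path = _path | {component_id}  # components on the current nesting path
    element = ET.fromstring(repo[component_id])
    # Snapshot the tree first so that substituted subtrees are not walked again.
    for parent in list(element.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                component = assemble(child.get("ref"), repo, path)
                component.tail = child.tail  # keep any text that followed the reference
                parent[index] = component
    return element


if __name__ == "__main__":
    doc = assemble("manual", REPOSITORY)
    print(ET.tostring(doc, encoding="unicode"))
```

     Note that the warning component is stored once and could be referenced from any number of manuals; a revision to it reaches all of them at the next assembly.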
     Supporting this way of working is not easy. Typical document creation tools are designed to create pages, whether printed pages or web pages. But building compound documents out of reusable components requires a structured authoring tool that's designed to handle highly granular documents and that integrates tightly with databases of all kinds, including relational databases, document management systems, and content management systems.
     Such tools can display collections of document components as if they were single documents while preserving the properties of each individual component. That approach allows an author to view every component in the context in which it's used, while ensuring that the author changes only those components that he or she is permitted to change and that are not currently being revised by another author.
    Who Needs These Tools?
     Should your organization approach its document applications through the use of structured XML authoring tools integrated with content management systems?
     The answer depends on the characteristics of the information you create and the processes you use to create it. Organizations that benefit most typically fit the following profile:
     
  • Large Amounts: Unless your organization publishes thousands or even millions of pages, current XML-based technologies may be too expensive to justify the return. If you're a manufacturing organization larger than $100 million or a publishing company larger than $25 million, then you're likely to reap sizable rewards from implementing an automated document system.
  • Multiple Outputs: Most organizations need to publish their information in multiple outputs, the most popular being the Web, CD-ROM, and print. That requirement alone has been sufficient to justify investment in a new automated document system. But if you aim not only to deliver on multiple outputs but also to exploit the capabilities of electronic media, then it's even more important to build a media-independent document repository so that you can use each medium to its full advantage.
  • High Value: The type of information we're talking about represents a large investment in the "intellectual capital" required to create it, because it is either vital to a related product or is the product itself. Examples include operating guides, service manuals, parts catalogs, policy and procedure manuals, and reference manuals (e.g., encyclopedias, legal cases, legislation, regulations, and medical drug information).
  • Long Lived: Closely associated with "high value" is "long life." Most types of information that are worth a significant investment last for years or even decades. Beyond the initial investment, this information often receives further investment throughout its lifetime in the form of revisions.
  • Reusable: Although a few applications involve little or no reuse, much of the information in a typical publication from a large organization either already exists in other documents or will be reused in the future.
  • Consistent: Imposing structure on documents only makes sense if many documents of the same type exist. There is nothing to be gained by adding structure to one-off documents. For example, while it's likely to be worth the investment to impose a formal structure on service bulletins if you publish 30 every year, it's probably too costly to do the same for a single annual report.
  • Formal Processes: This is the clearest differentiator of all. Virtually all information that comes out of a formally defined process can benefit from a formal structure. When applied to document information, a "formal" process has the following characteristics: a defined and repeatable workflow, assigned resources, and mission-critical deliverables.
    Important Characteristics
     The following sections describe the important characteristics to look for in a structured XML editor that integrates with content management systems. These characteristics fall into three main categories:
     
  • Authoring Issues: Affect those who create and revise the information, not only full-time writers but also those who are occasional contributors to the process.
  • Application Development Issues: Affect those who develop and maintain the products, applications, and infrastructure to support the process.
  • Business Issues: Affect those who have to approve the investment in new technologies and who risk the most when an investment goes wrong.
    Authoring Issues
     When evaluating a structured XML editor, first check that it provides all the usual editing features, such as cut, copy, paste, and drag and drop, and convenience features such as a preferences panel and multi-level undo.
     Then look for two capabilities designed specifically for structured authoring:
     
  • Task-Matched Authoring Tools: Creating highly structured documentation involves more than just typing. An editor with "task-matched" authoring tools provides editing tools that are appropriate for the type of data being entered.
  • Enforced Consistency: To maintain the integrity of your data so that it remains processable and reusable, you should look for a tool that prevents your authors from creating data that is inconsistent or invalid.
    Task-Matched Tools
     Writing a user manual involves a lot more than writing paragraphs and headings. Typical technical documentation consists of large amounts of different types of information, and only a portion of that information is relatively "free-form" text such as titles, paragraphs, and lists.
     Other information, especially the information in tables, is better suited to a restricted form of data entry such as the controls you see in the dialog boxes of software programs: pushbuttons, check boxes, radio buttons, drop-down selection lists, sliders (e.g., volume controls), and so on.
     Finally, a structured authoring tool should also provide a way to navigate and edit the structure itself. This capability should be provided through an alternate view of the document that shows its structure.
     An authoring tool for structured XML information should allow you to match the type of information to be entered with the best tool for the job. In some cases, you'll want all three capabilities in the same window for the same document (see Figure 3).
     Figure 3. This view of a structured editor shows how the three types of task-matched authoring (standard editor, structure view, and a form interface) can be applied to automotive service information.
    Structure Consistency
     Data integrity is the single most important factor in building a highly automated system on top of structured data. The integrity of your data is crucial because automated processes rely on the validity and consistency of that data to perform their functions properly.
     One of the most important features of a structured XML editor is its capability to ensure that documents remain consistently structured at all times. This capability is especially important when that structured data is stored in a repository that is accessible to other authors and to automated processing applications.
     Continuous consistency is also vital to ensure efficient workflow and repeatable processes. Authors who are allowed to create invalid and inconsistent data must either clean it up later or turn it over to someone else to clean up. Either way, the organization pays for extra work that adds no value but increases costs and time to market.
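     A structured XML editor normally enforces a DTD or schema as the author works. As a much smaller sketch of the same idea, the Python code below checks a document against a hypothetical, hand-written content model that lists, for each element, which children it may contain and which it must contain. It is an illustration only and is not a DTD or schema validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element -> (allowed child elements, required child elements).
CONTENT_MODEL = {
    "procedure": ({"title", "warning", "step"}, {"title", "step"}),
    "title": (set(), set()),
    "warning": (set(), set()),
    "step": (set(), set()),
}


def check_structure(root: ET.Element, model: dict) -> list:
    """Return human-readable structure violations; an empty list means the document conforms."""
    errors = []
    for element in root.iter():
        if element.tag not in model:
            errors.append(f"unknown element <{element.tag}>")
            continue
        allowed, required = model[element.tag]
        child_tags = [child.tag for child in element]
        for tag in child_tags:
            if tag not in allowed:
                errors.append(f"<{tag}> is not allowed inside <{element.tag}>")
        for tag in sorted(required - set(child_tags)):
            errors.append(f"<{element.tag}> is missing required <{tag}>")
    return errors


if __name__ == "__main__":
    good = ET.fromstring("<procedure><title>Drain oil</title><step>Remove plug.</step></procedure>")
    bad = ET.fromstring("<procedure><step>Remove plug.</step><para>Note</para></procedure>")
    print(check_structure(good, CONTENT_MODEL))  # []
    print(check_structure(bad, CONTENT_MODEL))
```

     For the second document the check reports the stray <para> and the missing <title>. An editor that enforces consistency would apply checks like these before accepting each change, so that invalid structure never reaches the repository, rather than reporting problems after the fact.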
    Development Issues
     Developing a powerful system to handle large amounts of structured XML documents is no different from any other large automation project. Building a system to suit your needs will involve a combination of standard products and additional application development work in the form of configuration, programming, and other customization.
     This section describes the key characteristics of a structured XML editor integrated with content management that primarily affect those who have to develop systems based on that tool.
    Content Management Integration
     Structured XML authoring tools are just one of several pieces that make up an enterprise solution for creating, managing, and delivering document information. One of the key additional tools is a content management system.
     Organizations can integrate structured XML authoring tools with many different tools for content management. Some start out by building their applications on the file system. Others plunge right into document management or component management. (Some component management vendors describe their products as "authoring support" tools because they are designed specifically with information authoring - and not just document management - in mind.)
     Whatever system you choose to manage your content, the approach you take to integrating your authoring tools with your content management tool has an enormous impact on performance, scalability, and ease of use.
     Ideally, you would choose an authoring tool with an API (Application Programming Interface) specifically designed to interface with content management systems. Through that API, the authoring tool can "speak" with the content management system at the component level and not just at the document level.
     This connection provides several key features:
     
  • Seamless User Interface: Instead of switching back and forth between the authoring tool and the user interface of the content management system, it's possible to "build in" to the authoring tool everything the user needs to browse, search, and select documents and document components from the content management system. Figure 4 shows an example of an editor interface that displays the contents of the content management system.
     Figure 4. You can browse documents and components directly within a structured XML authoring tool instead of switching back and forth between the authoring tool and the content management system.
     
  • Compound Document Authoring: Everyone wants to reuse existing information instead of wasting the time and resources to create it again. To achieve that reuse, you must create your information in small, easily reusable components and build "compound documents" that are simply collections of these components.
    When the time comes to edit that information, look for a tool that can load compound documents without first combining all the separate components into a single monolithic document. That feature allows the authoring tool to deliver the following benefits:
      • You can open a compound document and check out only those components you want to change, which leaves the remaining components available for other authors to revise.
      • You can open enormous documents very fast because the authoring tool only loads the components necessary to fill the screen.
      • You can perform "granular updates," in which changed components are reloaded without reloading everything.
  • Collaborative Authoring: Several users may have the same compound document open for viewing, but by enforcing permissions and checkout at the component level, each user is restricted to editing the components he or she has checked out. This means that in a workgroup authoring environment, each subject matter expert can simultaneously edit their portion of the publication while seeing it in the context of the full publication.
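     Component-level checkout can be pictured as a small lock table. The Python class below is a hypothetical, in-memory sketch (its name and methods are invented for this example and do not correspond to any real content management API): anyone may view any component, only users with edit permission may check a component out, and a component held by one author cannot be checked out by another until it is checked back in.

```python
class ComponentStore:
    """Hypothetical in-memory store with per-component checkout (not a real CMS API)."""

    def __init__(self, components: dict, editors: dict):
        self.components = dict(components)  # component ID -> content
        self.editors = editors              # component ID -> set of users allowed to edit
        self.checked_out = {}               # component ID -> user currently holding it

    def view(self, component_id: str) -> str:
        # Viewing never requires a checkout, so every author can see the whole
        # publication in context.
        return self.components[component_id]

    def check_out(self, component_id: str, user: str) -> None:
        if user not in self.editors.get(component_id, set()):
            raise PermissionError(f"{user} may not edit {component_id}")
        holder = self.checked_out.get(component_id)
        if holder is not None and holder != user:
            raise RuntimeError(f"{component_id} is checked out by {holder}")
        self.checked_out[component_id] = user

    def check_in(self, component_id: str, user: str, new_content: str) -> None:
        if self.checked_out.get(component_id) != user:
            raise RuntimeError(f"{user} has not checked out {component_id}")
        self.components[component_id] = new_content
        del self.checked_out[component_id]


if __name__ == "__main__":
    store = ComponentStore(
        {"ch1": "<section>Installation</section>", "ch2": "<section>Maintenance</section>"},
        {"ch1": {"alice"}, "ch2": {"alice", "bob"}},
    )
    store.check_out("ch1", "alice")  # two authors work on the same manual at once,
    store.check_out("ch2", "bob")    # each on a different component
    store.check_in("ch2", "bob", "<section>Maintenance (rev. B)</section>")
    print(store.view("ch2"))
```

     A production system would persist this state in the content management system itself, so that checkouts survive across sessions and are visible to every client.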
    Customization
     Customizing the document system can provide dramatic improvements in productivity, information quality, or performance. Application developers within or outside of the organization must perform those customizations.
     Some of the customization needed for an automated document system is to build tools for authors. For example, forms and dialog boxes may provide a faster and easier user interface to certain types of information.
     Many structured authoring products have tool sets that make customization easier. Look for a rich scripting language, open APIs, and even a visual programming environment that simplifies setting up data input forms.
    Business Issues
     Many of the issues surrounding the selection, implementation, and operation of an automated document system have a significant impact on the business success of the project. Ultimately, you're aiming to go beyond competitive parity to achieve real competitive advantage.
     Organizations that have earned outstanding returns from automated document systems built on structured data include the following examples:
     
  • Heavy equipment manufacturer improves author productivity by 100%, avoiding the need to hire 600 professionals over a five-year period.
  • Publisher of a daily report cuts payroll costs by 30% by streamlining its processes and eliminating regular overtime.
  • Textbook publisher substantially increases revenues by offering customized versions of its textbooks at prices competitive with standard versions.
  • Electronic equipment manufacturer reduces production lags from three weeks to two days.
     The following sections describe the characteristics you should look for in an automated document system to help you achieve business successes like those described above.
    Authoring Productivity
     Have you ever spent ten minutes writing a memo to your boss and another ten minutes formatting it to make it look good? If so, then you know how much time you can waste on tasks that add little value.
     With the advent of WYSIWYG (What You See Is What You Get) word processing and desktop publishing software, authors spend as much as half their time manipulating the appearance of their documents rather than creating new content. For many organizations, this is a tremendous and unnecessary expense.
     In principle, authors are experts in the subject matter of the document while graphics designers are experts in the appearance of a document. When that principle is violated in practice, the productivity of the subject matter experts - the authors - drops by half or more.
     For organizations that publish only on paper, using authors for document design represents a costly inefficiency. But for the many organizations that deliver their information in multiple forms (e.g., in print and on the Web) and that aim to "personalize" documents by automatically assembling document components to suit individual needs, WYSIWYG no longer makes any sense at all, because the information may never be delivered in the same form in which it was created.
     Some tools make it possible to prevent authors from changing the document design while still showing them how the printed page will look. The problem with that approach is that the only way an author can affect the page layout is by rewriting to add or remove words, which can lead to an even greater loss of efficiency.
     Structured XML authoring tools separate content from presentation completely by showing a view of the data that uses formatting only to provide cues about meaning instead of showing the actual outputs. For example, emphasized words can be shown in italic and titles can be shown in large bold letters, but column breaks and page breaks are not displayed.
     Views designed only for authoring can provide additional assistance by displaying, in easy-to-read form, information that may be tiny when printed. For example, copyright information may be printed in tiny letters but displayed in larger letters without enlarging the entire view.
    Batch Composition
     In traditional WYSIWYG environments, authors manually inspect and adjust column breaks and page breaks to keep related elements together and reduce excessive white space. Using a structured XML editor allows you to create a system that automates page layout and relieves authors of this low-value work. "Batch composition" is the technology that makes this possible.
     A number of vendors offer products which provide batch composition for structured document files. By automatically balancing page "fullness" with the need to keep related elements together, these tools can produce attractive pages with no need for manual intervention or inspection. In addition, they can often automatically generate supplemental text, footnotes, endnotes, tables of contents, cross references, indexes, and lists of figures, equations, and tables.
     Some organizations must lay out their documents to conform to legal requirements such as the formatting of safety warnings. For example, it may be a requirement that safety warnings appear in their entirety and on the same page as the text to which they're related. A composition tool can ensure that the document complies with that legal requirement or issue a fatal error if compliance is not possible (for example, if a safety warning is larger than the page). This eliminates manual inspection and the increased liability risk from the errors that will inevitably escape inspection.
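     The keep-together rule can be illustrated with a deliberately simplified greedy page-breaking sketch in Python. It assumes that every block has a known height in lines, that blocks are never split across pages, and that a hypothetical keep_with_next flag marks a safety warning that must share a page with the text that follows it. Real composition engines also balance white space, handle columns, and generate tables of contents and indexes, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    height: int                   # height in lines (a simplifying assumption)
    keep_with_next: bool = False  # hypothetical flag: must share a page with the next block


class CompositionError(Exception):
    """Raised when a keep-together rule cannot be satisfied on any page."""


def paginate(blocks: list, page_height: int) -> list:
    """Greedily fill pages; a keep-together group that does not fit moves to a fresh page."""
    # Chain each block flagged keep_with_next to the block that follows it.
    groups, current = [], []
    for block in blocks:
        current.append(block)
        if not block.keep_with_next:
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    pages, page, used = [], [], 0
    for group in groups:
        size = sum(b.height for b in group)
        if size > page_height:  # compliance is impossible: fail loudly
            names = ", ".join(b.name for b in group)
            raise CompositionError(f"cannot keep [{names}] on one page ({size} > {page_height} lines)")
        if used + size > page_height:  # start a new page rather than split the group
            pages.append(page)
            page, used = [], 0
        page.extend(b.name for b in group)
        used += size
    if page:
        pages.append(page)
    return pages


if __name__ == "__main__":
    blocks = [Block("intro", 30), Block("warning", 10, keep_with_next=True), Block("procedure", 25)]
    print(paginate(blocks, page_height=50))  # [['intro'], ['warning', 'procedure']]
```

     Moving the whole group to a fresh page leaves white space at the bottom of the previous page; a real composition engine weighs that cost against page fullness.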
    Presentation Independence
     By its nature, information stored in XML is independent of any particular way of presenting it. That means that through the application of a stylesheet or other transformation method, XML information can be delivered from a single information base to multiple outputs, usually automatically.
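     A rough sketch of single-source delivery: the Python code below renders one XML source into two outputs, HTML for the Web and numbered plain text for a print-style pipeline. The two rendering functions stand in for stylesheets (in practice this is typically the job of XSL or a composition system's own stylesheet language), and the element names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# One presentation-independent source (hypothetical element names).
SOURCE = ET.fromstring(
    "<doc><title>Oil Change</title>"
    "<para>Drain the oil.</para><para>Replace the filter.</para></doc>"
)


def to_html(doc: ET.Element) -> str:
    """Web output: map structural elements onto HTML tags."""
    parts = [f"<h1>{html.escape(doc.findtext('title'))}</h1>"]
    parts += [f"<p>{html.escape(p.text or '')}</p>" for p in doc.findall("para")]
    return "\n".join(parts)


def to_text(doc: ET.Element) -> str:
    """Print-style output: upper-case, underlined title and numbered paragraphs."""
    title = doc.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [f"{i}. {p.text or ''}" for i, p in enumerate(doc.findall("para"), start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_html(SOURCE))
    print()
    print(to_text(SOURCE))
```

     Because both outputs are generated from the same source, a revision made once in the source appears in every output the next time the outputs are generated.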
     The alternative to this approach, which is common practice today, is to set up a process in which authors create the information with the goal of printing it and then hand it off to another group that handles online delivery. That group converts the information to the online format and manually adjusts the appearance, sequence, and links to adapt it for online delivery. In that process, it's common to improve the information itself, but those improvements are often not reflected back to the original source.
     When the original information is revised, the online group has to make a decision: do they make the same revisions to the online information that were made to the printed information? Or do they convert the printed information to the online format and then make all the manual changes again? No matter which way they go, the result is an expensive and wasteful process.
    Standards-Based
     Structured XML authoring tools are based on open standards that are outside the control of any individual vendor. (Although XML itself is technically a "specification" and not a "standard," there is no practical distinction.) With the right choice of technology, you can protect your organization from dependence on any single vendor.
     The key to vendor independence is to build your automated document system based on open standards such as XML and its related specifications (XSL, XLL, the DOM, and other emerging specifications).
     Making the right decision will also ensure high performance and maximum scalability. Choosing tools with the characteristics described in this paper can help ensure that your data remains standards-compliant throughout the entire process of creating, managing, and processing your information.
    About Arbortext
     Arbortext is the leading supplier of structured XML editing tools that integrate with content management systems. The company's ADEPT product suite includes an editing tool, a page publishing tool, and various application development tools.
     The company focuses on organizations with large amounts of information, primarily manufacturers, publishers, and government. The company's approach is to help its customers create their document information in small, intelligent, easily reusable components, store those components in a database under document management control, extract those components from the database and assemble them into a limitless variety of documents, and deliver those documents automatically in print, on CD-ROM, and on the Web.
     For more information, visit www.arbortext.com.
