| Structured XML Editing |
| PG Bartlett
Vice President, Marketing, Arbortext, Inc.
Ann Arbor, Michigan
Biographical notice: As Vice President of Marketing for Arbortext, PG Bartlett has been instrumental in Arbortext's development as the world's leading provider of content creation and management software for enterprise XML applications. Bartlett has spent 18 years in technical and marketing positions at leading-edge high-technology companies. He is a regular presenter at major industry events and has been invited to present and chair sessions at Seybold Seminars, XML conferences, CALS conferences, and other major events. Since joining Arbortext, Bartlett has co-authored two electronic presentations distributed by SGML Open, the industry consortium, and has authored several white papers. |
| Introduction |
| Organizations with large amounts of document information typically require an XML authoring and editing tool that easily integrates with content management tools and content delivery tools. That combination yields a complete automated document system that can be the key to an organization gaining significant competitive advantage through improvements in information quality, time to market, and production costs. The designs of these systems usually emphasize data integrity, data reusability, process automation, and workflow consistency. Data integrity is key to the other design factors because without absolutely consistent data, the rest becomes difficult or impossible to achieve. |
| The title of this paper, "The Benefits of Structured XML Authoring for Content Management," contains two key concepts: structure and content management integration. Each of these concepts plays a pivotal role in the successful deployment and operation of a high-performance automated document publishing system. The following paragraphs explain more about what these terms mean and why they are important. |
|
| Figure 1. Shown above are two views of the same document. The left view illustrates how the apparently unstructured information on the right can gain structure through the use of XML. |
| Introduction Structure |
| The contents of documents are often described as "unstructured" information in contrast to the "structured" information stored in a relational database. |
| But what if that could be different? What if you could impose a rigorous structure on a document? What if you could handle documents as if they were data? |
| It can be different! And XML turns out to be the key, because XML allows you to impose structure on an apparently unstructured collection of text. With consistent structure, documents can be treated just like other data, which can be automated, processed, reused, protected, classified, and extracted for use in a limitless variety of ways. |
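| As a minimal sketch of what "handling documents as if they were data" can look like, the following Python example parses a small XML fragment and queries it much as one would query a database record. The element names (`procedure`, `title`, `warning`, `step`) and the sample text are hypothetical and chosen only for illustration; they are not taken from any particular document type. |

```python
import xml.etree.ElementTree as ET

# Hypothetical service-manual fragment; the element names are illustrative only.
DOC = """\
<procedure id="brake-pads">
  <title>Replacing the Brake Pads</title>
  <warning>Support the vehicle on jack stands before removing a wheel.</warning>
  <step>Remove the wheel.</step>
  <step>Remove the caliper bolts.</step>
  <step>Slide out the worn pads.</step>
</procedure>
"""


def summarize(xml_text):
    """Extract a few facts from the document as if it were a database record."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "warnings": [w.text for w in root.findall("warning")],
        "step_count": len(root.findall("step")),
    }


if __name__ == "__main__":
    print(summarize(DOC))
```

| Because the markup names what each piece of text is, rather than how it looks, a program can pull out every warning or count the steps without guessing from fonts or layout. |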
| Imposing structure on document information yields several key benefits: |
| Introduction Content Management Integration |
| Every organization that manages large amounts of document information will, sooner or later, seek both to structure that information and to store that information in a content management system. |
| The specific method of content management varies. In some applications, document information is stored directly in a database. In many others, it's stored under the control of a document management system. |
| Regardless of the specific approach, these systems primarily ensure data integrity through security controls that prevent unauthorized viewing and changing, and revision controls that keep track of changes from one version to the next. |
|
| Figure 2. Compound documents are assembled from a collection of document components. Each component may likewise consist of a collection of even smaller components. In some applications, the top-level compound document may consist of thousands of components nested to thirty levels. |
| Content management systems for structured document information invariably must keep track of information at a highly granular level (see Figure 2). For example, instead of storing complete books in a single chunk, "compound documents" are assembled from small components that are stored separately. |
| Some components are tiny. For example, individual cells in a table may be stored in various places and appear together only when delivered as a document. |
| Making this granularity easy for authors to work with is difficult. Typical document creation tools are designed to create pages, whether they're printed pages or web pages. But building compound documents out of reusable components requires a structured authoring tool that's designed to handle highly granular documents and that integrates tightly with databases of all kinds, including relational databases, document management systems, and content management systems. |
| Such systems can display collections of document components as if they were single documents while preserving the properties of each individual component. That approach allows an author to view every document component within the context in which it's used, while at the same time ensuring that the author changes only those components for which the author is permitted to make changes and that are not currently under revision by another author. |
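| The idea of a compound document whose components keep their own properties can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: the `Component` fields, the `assemble` and `can_edit` helpers, and the sample data are all hypothetical. |

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A stored document component; all field names are hypothetical."""
    cid: str
    text: str = ""
    children: list = field(default_factory=list)  # ids of nested components
    editors: set = field(default_factory=set)     # users allowed to change it
    locked_by: str = None                         # user currently revising it


def assemble(store, cid, depth=0):
    """Flatten a compound document into (depth, component id, text) rows."""
    comp = store[cid]
    rows = [(depth, comp.cid, comp.text)]
    for child_id in comp.children:
        rows.extend(assemble(store, child_id, depth + 1))
    return rows


def can_edit(store, cid, user):
    """An author may change a component only if permitted and not locked by another."""
    comp = store[cid]
    return user in comp.editors and comp.locked_by in (None, user)


STORE = {
    "manual": Component("manual", "Service Manual", ["ch1", "ch2"], {"ann"}),
    "ch1": Component("ch1", "Brakes", ["warn1"], {"ann", "bob"}),
    "warn1": Component("warn1", "Use jack stands.", [], {"ann", "bob"}, locked_by="bob"),
    "ch2": Component("ch2", "Steering", [], {"ann"}),
}

if __name__ == "__main__":
    for depth, cid, text in assemble(STORE, "manual"):
        marker = "editable" if can_edit(STORE, cid, "ann") else "read-only"
        print("  " * depth + f"{cid}: {text} [{marker} for ann]")
```

| The author sees one continuous document, while each row still carries the identity of the component it came from, so edit permissions and locks can be enforced per component rather than per file. |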
| Introduction Who Needs These Tools? |
| Should your organization approach its document applications through the use of structured XML authoring tools integrated with content management systems? |
| The answer depends on the characteristics of the information you create and the processes you use to create it. The typical profile follows: |
| Introduction Important Characteristics |
| The following information lists the important characteristics to look for in a structured XML editor that integrates with content management systems. These characteristics are divided into three main categories: |
| Introduction Authoring Issues |
| When you look at a structured XML editor, you should look first to see if it provides all the usual editing features such as cut, copy, paste, and drag and drop, and convenience features such as a preferences panel and multi-level undo. |
| Then you should look for two specific capabilities that are designed specifically for structured authoring: |
| Introduction Task-Matched Tools |
| Writing a user manual involves a lot more than writing paragraphs and heads. Typical technical documentation consists of large amounts of different types of information, and only a portion of that information is relatively "free-form" text such as titles, paragraphs, and lists. |
| Other information, especially the information in tables, is better suited to a restricted form of data entry such as the various controls you see in the dialog boxes of software programs. These controls include pushbuttons, check boxes, radio buttons, drop-down selection lists, sliders (e.g., volume controls), and other controls. |
| Finally, a structured authoring tool should also provide a way to navigate and edit the structure itself. This capability should be provided through an alternate view of the document that shows its structure. |
| An authoring tool for structured XML information should allow you to match the type of information to be entered with the best tool for the job. In some cases, you'll want all three capabilities in the same window for the same document (see Figure 3). |
|
| Figure 3. This view of a structured editor shows how the three types of task-matched authoring (standard editor, structure view, and a form interface) can be applied to automotive service information. |
| Introduction Structure Consistency |
| Data integrity is the single most important factor in building a highly automated system on top of structured data. The integrity of your data is crucial because automated processes must rely on the validity and consistency of your data in order to perform their functions properly. |
| One of the most important features of a structured XML editor is its capability to ensure that documents remain consistently structured at all times. This capability is especially important when that structured data is stored in a repository that is accessible to other authors and to automated processing applications. |
| Continuous consistency is also vital to ensure efficient workflow and repeatable processes. Authors who are allowed to create invalid and inconsistent data must either clean up their data later or turn it over to someone else to clean up. Either way, the organization pays the cost of extra work that adds no value but increases costs and time to market. |
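| To make the notion of a consistency check concrete, here is a small Python sketch that tests a document against a toy content model. It is a stand-in only: a real structured editor would validate against a DTD or schema, and the `ALLOWED_CHILDREN` and `REQUIRED_FIRST_CHILD` tables, the element names, and the `find_structure_errors` function are hypothetical. |

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: for each element, the child elements it may contain.
ALLOWED_CHILDREN = {
    "procedure": {"title", "warning", "step"},
    "title": set(),
    "warning": set(),
    "step": {"warning"},
}
# Hypothetical ordering rule: these elements must begin with the named child.
REQUIRED_FIRST_CHILD = {"procedure": "title"}


def find_structure_errors(xml_text):
    """Return a list of human-readable structure violations (empty if consistent)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            errors.append(f"unknown element <{parent.tag}>")
            continue
        children = list(parent)
        for child in children:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> is not allowed inside <{parent.tag}>")
        first = REQUIRED_FIRST_CHILD.get(parent.tag)
        if first and (not children or children[0].tag != first):
            errors.append(f"<{parent.tag}> must begin with <{first}>")
    return errors


if __name__ == "__main__":
    bad = "<procedure><step>Do it.<title>T</title></step></procedure>"
    for problem in find_structure_errors(bad):
        print(problem)
```

| An editor that runs a check of this kind on every edit, and refuses changes that would produce a non-empty error list, never lets inconsistent data reach the repository in the first place. |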
| Introduction Development Issues |
| Developing a powerful system to handle large amounts of structured XML documents is no different from any other large automation project. Building a system to suit your needs will involve a combination of standard products and additional application development work in the form of configuring, programming, and other customizations. |
| This section describes the key characteristics of a structured XML editor integrated with content management that primarily affect those who have to develop systems based on that tool. |
| Introduction Content Management Integration |
| Structured XML authoring tools are just one of several pieces that make up an enterprise solution for creating, managing, and delivering document information. One of the key additional tools is a content management system. |
| Organizations can integrate structured XML authoring tools with many different tools for content management. Some start out by building their applications on the file system. Others plunge right into document management or component management. (Some component management systems describe their products as "authoring support" tools because they are specifically designed with information authoring, and not just document management, in mind.) |
| Whatever system you choose to manage your content, the approach you take to integrating your authoring tools with your content management tool has an enormous impact on performance, scalability, and ease of use. |
| Ideally, you would choose an authoring tool with an API (Application Program Interface) specifically designed to interface with content management systems. Through that API, the authoring tool can "speak" with the content management system at a component level and not just at a document level. |
| This connection provides several key features: |
|
| Figure 4. You can browse documents and components directly within a structured XML authoring tool instead of switching back and forth between the authoring tool and the content management system. |
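| The difference between talking to a content management system at the component level and at the document level can be illustrated with a toy repository. The sketch below is not any vendor's API: the `InMemoryRepository` class, its `check_out`, `check_in`, and `history` methods, and the `CheckoutError` exception are hypothetical names invented for this example. |

```python
class CheckoutError(Exception):
    """Raised when a component cannot be checked out or checked in."""


class InMemoryRepository:
    """Toy stand-in for a content management system; not any vendor's API."""

    def __init__(self):
        self._versions = {}  # component id -> list of saved texts
        self._locks = {}     # component id -> user holding the lock

    def add(self, cid, text):
        """Register a new component with its first revision."""
        self._versions[cid] = [text]

    def check_out(self, cid, user):
        """Lock one component (not the whole document) and return its latest text."""
        holder = self._locks.get(cid)
        if holder not in (None, user):
            raise CheckoutError(f"{cid} is checked out by {holder}")
        self._locks[cid] = user
        return self._versions[cid][-1]

    def check_in(self, cid, user, text):
        """Store a new revision and release the lock; returns the new version number."""
        if self._locks.get(cid) != user:
            raise CheckoutError(f"{user} does not hold the lock on {cid}")
        self._versions[cid].append(text)
        del self._locks[cid]
        return len(self._versions[cid])

    def history(self, cid):
        """Return every stored revision of a component, oldest first."""
        return list(self._versions[cid])


if __name__ == "__main__":
    repo = InMemoryRepository()
    repo.add("warn1", "Use jack stands.")
    repo.add("ch2", "Steering")
    repo.check_out("warn1", "ann")         # ann revises one warning...
    print(repo.check_out("ch2", "bob"))    # ...while bob edits another component
    print(repo.check_in("warn1", "ann", "Always use jack stands."))
```

| Because locks and revisions attach to individual components, two authors can work inside the same compound document at the same time, which a document-level check-out would prevent. |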
| Introduction Customization |
| Customizing the document system can provide dramatic improvements in productivity, information quality, or performance. Application developers within or outside of the organization must perform those customizations. |
| Some of the customization needed for an automated document system is to build tools for authors. For example, forms and dialog boxes may provide a faster and easier user interface to certain types of information. |
| Many structured authoring products have tool sets to help make customization easier. One should look for a rich scripting language, open APIs, and even a visual programming environment to ease setting up data input forms. |
| Introduction Business Issues |
| Many of the issues surrounding the selection, implementation, and operation of an automated document system have a significant impact on the business success of the project. Ultimately, you're aiming to go beyond competitive parity to achieve real competitive advantage. |
| Organizations that have earned outstanding returns from automated document systems built on structured data include the following examples: |
| The following paragraphs describe the characteristics you should look for in an automated document system to help you achieve the sort of business successes described above. |
| Introduction Authoring Productivity |
| Have you ever spent ten minutes writing a memo to your boss and another ten minutes formatting it to make it look good? If so, then you know how much time you can waste on tasks that add little value. |
| With the advent of WYSIWYG (What You See Is What You Get) word processing and desktop publishing software, authors spend as much as half their time manipulating the appearance of their documents and the other half creating new content. For many organizations, this is a tremendous unnecessary expense. |
| In principle, authors are experts in the subject matter of the document while graphic designers are experts in the appearance of a document. When that principle is violated in practice, the productivity of the subject matter experts - the authors - drops by half or more. |
| For those organizations that publish only on paper, using authors for document design represents a costly inefficiency. But for the many organizations that deliver their information in multiple forms (e.g., in print and on the Web) and that aim to "personalize" documents through automatic assembly of document components to suit individual needs, WYSIWYG no longer makes any sense at all because the information may never be delivered in the same form in which it was created. |
| With some tools, you may find that it's possible to force authors to leave the document design alone but still show them how the printed page will look. The problem with that approach is that the only way an author can affect a page layout is by rewriting to add or remove words. That could lead to an even greater loss of efficiency. |
| Structured XML authoring tools separate content from presentation completely by showing a view of the data that uses formatting only to provide cues about meaning instead of showing the actual outputs. For example, emphasized words can be shown in italic and titles can be shown in large bold letters, but column breaks and page breaks are not displayed. |
| Views designed only for authoring can provide additional assistance by displaying in easy-to-read form information that may be tiny when printed. For example, copyright information may be printed in tiny letters but may be displayed in larger letters without enlarging the entire view. |
| Introduction Batch Composition |
| In traditional WYSIWYG environments, authors manually inspect and adjust column breaks and page breaks to keep related elements together and reduce excessive white space. But using a structured XML editor allows you to create a system that automates page layouts and relieves authors of this low-value work. "Batch composition" is the technology that makes this possible. |
| A number of vendors offer products which provide batch composition for structured document files. By automatically balancing page "fullness" with the need to keep related elements together, these tools can produce attractive pages with no need for manual intervention or inspection. In addition, they can often automatically generate supplemental text, footnotes, endnotes, tables of contents, cross references, indexes, and lists of figures, equations, and tables. |
| Some organizations must lay out their documents to conform to legal requirements such as the formatting of safety warnings. For example, it may be a requirement that safety warnings appear in their entirety and on the same page as the text to which they're related. A composition tool can ensure that the document complies with that legal requirement or issue a fatal error if compliance is not possible (for example, when the safety warning exceeds the size of the page). This eliminates manual inspection and the increased liability risk from the errors that will inevitably escape inspection. |
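| The keep-together rule and the fatal-error case can be sketched with a simple greedy paginator in Python. This is a deliberately simplified illustration, not how any commercial composition engine works: real engines also balance page fullness, handle columns, and generate supplemental text. The `paginate` function, the `CompositionError` exception, and the line counts are hypothetical. |

```python
class CompositionError(Exception):
    """Fatal error: a keep-together group cannot fit on any page."""


def paginate(groups, page_height):
    """Greedy pagination. Each group is (label, height) and is never split.

    Returns a list of pages, each a list of group labels.
    """
    pages, current, used = [], [], 0
    for label, height in groups:
        if height > page_height:
            # The group can never fit, so fail loudly instead of splitting it.
            raise CompositionError(
                f"{label!r} needs {height} lines; page holds {page_height}"
            )
        if used + height > page_height:
            # Start a new page rather than break the group across pages.
            pages.append(current)
            current, used = [], 0
        current.append(label)
        used += height
    if current:
        pages.append(current)
    return pages


if __name__ == "__main__":
    # A warning and the step it belongs to are modeled as one unsplittable group.
    groups = [("intro", 30), ("warning + step 1", 25), ("step 2", 10)]
    print(paginate(groups, page_height=50))
```

| Treating a warning and its related text as a single unsplittable group is what guarantees they land on the same page; the exception covers the case the paper mentions, where the group is larger than the page itself. |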
| Introduction Presentation Independence |
| By its nature, information stored in XML is independent of any particular way of presenting it. That means that through the application of a stylesheet or other transformation method, XML information can be delivered from a single information base to multiple outputs, usually automatically. |
| The alternative to this approach, which is in common practice today, is to set up a process where authors create the information with the goal of printing it and then handing off the information to another group that handles online delivery. That group converts the information to the online format and manually adjusts the appearance, sequence, and links to adapt the information for online delivery. In that process, it's common to improve the information itself, but often those improvements are not reflected back to the original source. |
| When the original information is revised, the online group has to make a decision: do they make the same revisions to the online information that were made to the printed information? Or do they convert the printed information to the online format and then make all the manual changes again? No matter which way they go, the result is an expensive and wasteful process. |
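| By contrast, the single-source approach described at the start of this section can be sketched as two small renderers reading the same XML. In practice a stylesheet would usually do this job; Python stands in here purely for illustration, and the element names, the `to_html` and `to_text` functions, and the sample content are hypothetical. |

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source fragment; the element names are illustrative only.
SOURCE = """\
<procedure>
  <title>Replacing the Brake Pads</title>
  <warning>Support the vehicle on jack stands.</warning>
  <step>Remove the wheel.</step>
  <step>Remove the caliper bolts.</step>
</procedure>
"""


def to_html(root):
    """Render for the Web: semantic elements become HTML tags."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    parts += [f'<p class="warning">{escape(w.text)}</p>' for w in root.findall("warning")]
    steps = "".join(f"<li>{escape(s.text)}</li>" for s in root.findall("step"))
    parts.append(f"<ol>{steps}</ol>")
    return "\n".join(parts)


def to_text(root):
    """Render for plain-text delivery from the same source."""
    title = root.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [f"WARNING: {w.text}" for w in root.findall("warning")]
    lines += [f"{n}. {s.text}" for n, s in enumerate(root.findall("step"), start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    print(to_html(root))
    print()
    print(to_text(root))
```

| When the source is revised, both outputs are regenerated from it, so there is no second copy to reconcile by hand. |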
| Introduction Standards-Based |
| Structured XML authoring tools are based on open standards that are outside the control of any individual vendor. (Although XML itself is technically a "specification" and not a "standard," there is no practical distinction.) With the right choice of technology, you can protect your organization from dependence on any single vendor. |
| The key to vendor independence is to build your automated document system based on open standards such as XML and its related specifications (XSL, XLL, the DOM, and other emerging specifications). |
| Making the right decision will also ensure high performance and maximum scalability. Choosing tools with the characteristics described in this paper can help ensure that your data remains standards-compliant throughout the entire process of creating, managing, and processing your information. |
| Introduction About Arbortext |
| Arbortext is the leading supplier of structured XML editing tools that integrate with content management systems. The company's ADEPT product suite includes an editing tool, a page publishing tool, and various application development tools. |
| The company focuses on organizations with large amounts of information, primarily manufacturers, publishers, and government. The company's approach is to help its customers create their document information in small, intelligent, easily reusable components, store those components in a database under document management control, extract those components from the database and assemble them into a limitless variety of documents, and deliver those documents automatically in print, on CD-ROM, and on the Web. |
| For more information, visit www.arbortext.com. |