
 

Authoring Tools and the Expanding Radius of Deployment

Bruce Sharpe
VP Development
SoftQuad Software Inc.
108 - 10070 King George Highway
Surrey, British Columbia, Canada
Email: bsharpe@softquad.com
Phone: 604-585-8394 x315
Fax: 604-585-1926
 
Biographical notice:
 
Bruce Sharpe holds a Ph.D. in the area of mathematical physics. He is grateful that the advent of the IBM PC in the early 1980's provided an opportunity to leave this absurdly difficult field. He immediately embarked on a series of leadership roles in software development that has been going on for the last 14 years. His areas of interest have included text processing, database programming, algorithms for digital image processing, music applications, the Internet and document authoring. He has been Vice President of Development at SoftQuad since 1996, where he has led the teams that created the most recent versions of HoTMetaL, the HoTMetaL Application Server and XMetaL. When he is not playing at work, he is generally at home in beautiful British Columbia playing with his three children, who many suspect are his true avocation.
 

Introduction: The Promise of XML

 
XML holds out many promises. Let's review some of them.
 

All the advantages of SGML, but less pain

 
SGML has been widely deployed for many years, and is known to be a very effective solution for several document and content management requirements. Its ability to separate content from presentation and have self-describing document structures is very powerful and, properly implemented, can be used to achieve highly streamlined production processes.
 
The problems that SGML solves are very widespread, but industry adoption of SGML has been relatively modest. It has seen most use in large organizations with industrial-strength requirements, because the cost of deployment has been perceived to be prohibitive. Fortunately, XML offers virtually the same degree of power as SGML, but with reduced complexity.
 
Even in those organizations that have deployed SGML, the full benefits of it are often not realized because there has been a lack of author acceptance of authoring tools. For every person in the publications department who is working in SGML there are 10 or maybe 100 contributing authors who are using a word processor. This creates a costly "conversion wall".
 
The promise of XML is that it will achieve broader acceptance than SGML, which in turn should attract more vendors, lead to better tools, broader deployment and lower costs.
 

Beyond HTML

 
HTML has enjoyed a success that any markup language would envy. But HTML tags are oriented to presentation, not content, and they quickly become limiting when you are trying to convey or preserve some intelligence in the information being transmitted over the Web.
 
XML is designed precisely to overcome these kinds of limitations.
 

The future of the Web

 
The hype is that XML is the future of the Web. With press releases appearing every day announcing some new Web application based on XML, and the endorsement of such industry leaders as Microsoft, Netscape, IBM and Oracle, the buzz has some credibility.
 

Promises, promises ...

 
Amazingly enough, many of these fond hopes show definite signs of coming true. New authoring tools designed for broad acceptance are now available. Novel strategies for presenting and exchanging content on the Web through XML are emerging every day. And with support for XML in the major browsers, XML threatens to become mainstream, perhaps sooner than anyone expected.
 
This paper will describe how the advent of XML has breathed new life into traditional solutions for creating, managing and delivering content, and will showcase the new tools that are behind it.
 

Where Will You Find XML?

 
There are several areas of deployment of SGML and XML, and successively broader circles of adoption. A new generation of modern authoring tools has a role to play in each.
 

Where SGML is already established

 
Organizations that have already adopted SGML often have several authors contributing content to a centralized group such as a Technical Publications department. Such organizations often find that the payback from using SGML is diminished because content contributors resist using tools for authoring directly in the chosen format. This resistance arises because the tools present an unfamiliar interface and generally behave in unaccustomed ways. The contributors continue to use their favorite word processor and the TechPubs group is left with the task of converting that content to SGML.
 
There is a heavy cost in this scenario though. The conversion step is time-consuming, expensive and error-prone. Also, although contributors are using a familiar tool, it is typically not particularly suited to the content and productivity suffers. Finally, authors lose control of the content once it goes through the conversion.
 
The newest generation of authoring tools, such as SoftQuad's XMetaL discussed below, provides a new level of ease of use and familiarity that can allow an organization to extend the reach of native authoring beyond the conversion fence around TechPubs.
 

Where SGML could be a solution, but isn't

 
A better authoring tool can reduce the cost of entry for an organization that wants to adopt a markup language as a basis for its documents. This is the second and potentially much larger area of adoption. It is increasingly important because of the broad awareness of XML, which results in it being considered for deployment for many of the same reasons as SGML, but in places where SGML was not chosen for one reason or another.
 

All around the Web

 
While XML is quite suitable for traditional document applications, it is of course also intended for getting content onto the Web. Some of the most interesting new applications are those where SGML, XML and HTML overlap.
 
This represents the broadest circle of deployment and where XML is most strikingly taking us in the future. Here is a brief summary of three major categories of Web XML usage.
 

Website production

 
Many Web site creators are realizing that, as challenging as the initial creation of the Web site might be, it is even more difficult to maintain the site and keep it current as content changes. An increasingly common approach for managing this situation is to separate the architecture of the site (how the information is organized and how users navigate through it) from the presentation (what kind of graphics and layout are used) and the content.
 
While there are high-end systems that will do this for you (such as Vignette's Story Server), there are many home-grown solutions out there too.
 
XML is an ideal technology for this kind of Web site production. The general idea is to write the content in XML and store it in some form of database. To actually create the site, HTML templates are used. The site is generated by pouring the XML content into these templates, which have "hooks" in them indicating where the various bits of XML should go. The result is typically ordinary HTML which can then be published to the Web server and delivered to standard browsers. The site generation can be done off-line or on demand, the latter creating personalized web pages in response to user requests.
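The template approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (article, title, author, body) and the $-style hooks are hypothetical, and a production system would read the content from a database rather than from a string.

```python
import xml.etree.ElementTree as ET
from html import escape
from string import Template

# Hypothetical content record, as an author might write it.
ARTICLE_XML = """<article>
  <title>Widgets &amp; Gadgets</title>
  <author>Pat Lee</author>
  <body>New models ship in May.</body>
</article>"""

# Hypothetical HTML template; $title, $author and $body are the "hooks".
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>"
    '<body><h1>$title</h1><p class="byline">$author</p>'
    "<p>$body</p></body></html>"
)


def render_page(xml_text: str, template: Template) -> str:
    """Pour the child elements of one XML record into the template hooks."""
    root = ET.fromstring(xml_text)
    # Map each child element name to its text, escaped for use in HTML.
    fields = {child.tag: escape(child.text or "") for child in root}
    return template.substitute(fields)


html_page = render_page(ARTICLE_XML, PAGE_TEMPLATE)
print(html_page)
```

Because the template only knows about hook names, the same content can be poured into a different template (a new site design, say) without touching the XML.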
 
In its simplest form, the content database is just a collection of files. For more sophisticated needs, you would use a content management system such as those offered by Inso (DynaWeb), POET (Content Management Suite) or Object Design (eXcelon). Oracle has also announced support for XML in its upcoming Oracle8i product.
 
XML offers major benefits in this scenario. Content contributors (the authors) don't need access to the actual Web pages of the site. Nor do they need to understand HTML. XML tools that can be customized for their particular type of information can make the authoring task painless while ensuring that their content has valid structure so that it will fit smoothly into the downstream processing. The webmasters benefit because they can define a site structure once and not have to tinker with it when new content is added or modified. Updating a site becomes a much more automated process, one that does not require resources as expensive as those required to create it in the first place.
 

On the server

 
XML can be used on Web servers for two basic reasons. The first is to provide a standard way of wrapping database data to create custom content or for exchanging data between servers. The second is to add information to content to make it easier to share and syndicate.
 
Once you start sharing data or content between parties, they have to agree on the tags and what they mean. There are many industry groups that have gotten together to do just that. For example, there is a group working to define an XML/EDI standard for business transactions across the web. The Open Applications Group has recently defined a set of XML-based APIs for financial, manufacturing, and many other kinds of business data.
 
XML provides a useful wrapper for relational data, but it is mostly an "under the hood" technology, in the sense that it is automatically generated by one computer and consumed by another computer without being directly accessed by people. In this usage, XML is a new, more convenient and standard way of doing something that is already being done in other ways: sharing data that is stored in relational databases. Because XML is a standard, it saves everyone the trouble of deciding what format and syntax to use, and eliminates the need to reinvent the basic processing tools with every new application.
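As a rough sketch of what "wrapping" relational data might look like, the following Python example turns the rows of a table into XML, one element per row and one child element per column. The table, its columns and the element names are all hypothetical, and the tag vocabulary is exactly the kind of thing the exchanging parties would have to agree on beforehand.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn, table, row_tag):
    """Wrap every row of `table` in a <row_tag> element, one child per column."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        item = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 250.0), (2, "Globex", 99.5)],
)

orders_xml = rows_to_xml(conn, "orders", "order")
print(orders_xml)
```

The receiving server can parse this output with any standard XML parser, which is the point: neither side has to invent a format or write a custom reader.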
 
XML's novelty is more apparent when we are talking about sharing other kinds of content. The desire to do this is so strong that an industry group has submitted a proposal to the W3C called ICE (Information and Content Exchange) to define a standard (XML-based, of course!) for syndication. The ICE proposal is not restricted to any particular kind of content, but there are several groups working on creating industry-specific definitions, for everything from classified ads and real estate listings to electronic books to be sold and delivered over the Internet. Because XML describes the information and is in a standard format, consumers of the content can reliably extract its information and repurpose it in their own sites.
 
XML is arriving on the scene just in time to enable these kinds of applications. But its benefits can only be realized if the content is valid, that is, it has the correct syntax and structure. Again, the right authoring tools can guarantee this.
 

In the client (Web browser)

 
The big news in this category is that the next versions of the browsers from both Microsoft and Netscape will support the display of XML. This means that XML content can be displayed in those browsers without having to be transformed into HTML first. Moreover, the browsers will support the DOM (Document Object Model), a W3C standard for accessing and manipulating XML. The DOM will let Web site authors manipulate XML content inside pages through scripting.
 
XSL (Extensible Stylesheet Language) can also be used to transform XML to HTML on the fly in the browser, providing a way to deliver data and present it on demand.
 
How will these features be used? You can expect to see XML data delivered to the browsers but only selectively displayed, under user control. For example, your stock portfolio information could be downloaded to your browser, but you could choose to first show only the three biggest changers. Then you could show summary statistics or historical data, all without going back to the server to get more data. The result is a faster, more individualized browsing experience.
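The portfolio example can be illustrated with DOM-style calls. In a browser this would be done through scripting; the sketch below uses Python's xml.dom.minidom, which exposes the same style of W3C DOM methods (getElementsByTagName, getAttribute). The document structure and attribute names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical portfolio data, downloaded once from the server.
PORTFOLIO_XML = """<portfolio>
  <stock symbol="AAA" change="-4.2"/>
  <stock symbol="BBB" change="0.3"/>
  <stock symbol="CCC" change="2.9"/>
  <stock symbol="DDD" change="-0.1"/>
  <stock symbol="EEE" change="5.0"/>
</portfolio>"""


def top_changers(xml_text, count=3):
    """Return (symbol, change) for the stocks with the largest absolute change."""
    doc = parseString(xml_text)
    stocks = doc.getElementsByTagName("stock")
    ranked = sorted(
        stocks,
        key=lambda stock: abs(float(stock.getAttribute("change"))),
        reverse=True,
    )
    return [
        (stock.getAttribute("symbol"), float(stock.getAttribute("change")))
        for stock in ranked[:count]
    ]


top_three = top_changers(PORTFOLIO_XML)
print(top_three)
```

Showing a different view, such as all the losers or a summary total, needs only another pass over the same in-memory document, with no further request to the server.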
 
It is interesting to note that you can "sprinkle" XML tags among HTML tags. Since browsers ignore tags they don't understand, you can display such pages in all browsers. But the pages (with a little care) can also be valid XML documents, and so are available to be processed by XML applications. So, for example, you could display your job ad on your own site as HTML, but have the same ad picked up by a content aggregation site to be stored in a database. The XML tags could let you reliably indicate which numbers in the ad refer to salary, which are part of the address, etc. The W3C working group on XHTML is addressing this application of XML.
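To make the job ad example concrete, here is a minimal sketch of what an aggregation site might do. The extra tags (jobtitle, salary, location, street, city) are invented for this illustration and do not come from any published vocabulary; the page is assumed to be well-formed so that a standard XML parser can read it.

```python
import xml.etree.ElementTree as ET

# A page that a browser would show as ordinary HTML; the non-HTML tags
# are hypothetical and would simply be ignored for display.
JOB_AD = """<html><body>
<h1>Job opening: <jobtitle>Technical Writer</jobtitle></h1>
<p>Salary: $<salary currency="CAD">52000</salary> per year.</p>
<p>Apply at <location><street>100 Main Street</street>,
<city>Surrey</city></location>.</p>
</body></html>"""


def extract_job(ad_text):
    """Pull the tagged facts out of the ad, ignoring the HTML around them."""
    root = ET.fromstring(ad_text)
    salary = root.find(".//salary")
    return {
        "title": root.findtext(".//jobtitle"),
        "salary": int(salary.text),
        "currency": salary.get("currency"),
        "street": root.findtext(".//street"),
        "city": root.findtext(".//city"),
    }


job = extract_job(JOB_AD)
print(job)
```

Note that the ad contains two numbers, 52000 and 100, and the tags are what tell the aggregator that the first is a salary and the second is part of a street address.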
 

Benefits of XML

 
There is a common thread in the XML scenarios described above. Many people are authors, whether casual or professional. However, there are relatively few publishers (whether to print, the Web, CD, etc.). The many provide content to the few. Too often, the people on the receiving end have to spend an inordinate amount of time converting, massaging, and correcting the input they receive.
 
Authors use tools they are comfortable with and publishers have to cope with content that is not in the format that their organization has decided to use. This is the "conversion wall" that organizations would dearly love to break down.
 

Benefits for the organization

 
The conversion step that is all too common today is error-prone, time-consuming and expensive. Conversion costs can easily run to $30-50 per page or more. A large company can spend tens of millions of dollars on conversion every year. Under modest assumptions, the payback period for an XML/SGML authoring tool that can reduce those costs can be six months or less.
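The payback arithmetic is simple enough to sketch. Every input below is a hypothetical figure chosen for illustration (only the per-page cost falls within the range quoted above); substitute your own organization's numbers.

```python
def payback_months(tool_cost, pages_per_month, cost_per_page, fraction_saved):
    """Months needed for conversion savings to cover the cost of the tool."""
    monthly_saving = pages_per_month * cost_per_page * fraction_saved
    if monthly_saving <= 0:
        raise ValueError("no savings, so the tool never pays for itself")
    return tool_cost / monthly_saving


# Hypothetical inputs: a $50,000 deployment, 500 converted pages a month
# at $40 per page, with native authoring removing half of that work.
months = payback_months(
    tool_cost=50_000,
    pages_per_month=500,
    cost_per_page=40,
    fraction_saved=0.5,
)
print(f"Payback period: {months:.1f} months")
```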
 

Benefits for the author

 
Authors suffer in the conversion process too. They lose control of their content and the revision cycle for updates or to correct conversion errors is very awkward. Moreover, even though they are using a tool they are familiar with (typically a word processor), the tool may not be very well suited to the type of document they are working on. An authoring tool that can be readily customized for a particular situation can improve authors' productivity substantially.
 

Requirements for Authoring Tools

 
There is an economic benefit to be realized with XML, by reducing costs all the way through the production stream. This benefit is maximized by starting at the source and giving authors a tool to create XML/SGML content directly. What are the requirements for such a tool?
 

SGML and XML

 
The tool should be capable of creating valid SGML and XML content. This not only makes it suitable for widespread deployment today, but also makes sense for organizations that are in transition from SGML to XML.
 

Flexible

 
In this new generation of authoring tools, ease of use and fit for the content type is achieved through rich customization interfaces that enable everything from toolbars to details of editing behavior to be precisely adapted to the editing task.
 
This customization itself should be based on techniques that are familiar and well-supported from the HTML world: JavaScript, VBScript, COM, CSS, and the W3C DOM. The cost of deployment is correspondingly reduced.
 

Easy to customize

 
The tool should strive for the same goal that Larry Wall described for Perl when he said that "easy things should be easy and hard things should be possible." The tool should have built-in authoring assistance for any DTD so that the customization task is an incremental effort on top of reasonable defaults. And it must have the depth to be extended to cope with more challenging tasks.
 

Easy to author with

 
The customization should at least allow for the creation of an authoring environment that is going to be familiar to authors used to a word processing editing model. Even better is to enable the incorporation of authoring aids that are oriented toward particular document types.
 

Validating

 
The economic returns from deploying XML/SGML only start to occur at the point where the content is valid. The tool must ensure that authors end up with a valid document, without being a nuisance about it.
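To show what "valid" means beyond correct syntax, the following sketch checks a document in two stages: first that it is well-formed, then that each element's children follow a declared content model. It is not a DTD validator and says nothing about how any particular tool validates. The memo vocabulary is hypothetical, and the content models are hand-written regular expressions standing in for DTD declarations such as (to+, from, subject?, para+).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: element name -> regular expression over the
# sequence of child element names, each name followed by a single space.
CONTENT_MODELS = {
    "memo": r"(to )+(from )(subject )?(para )+",
}


def validation_errors(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    for element in root.iter():
        model = CONTENT_MODELS.get(element.tag)
        if model is None:
            continue  # no model declared for this element in the sketch
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(model, children):
            found = children.strip() or "(none)"
            errors.append(f"<{element.tag}> has invalid children: {found}")
    return errors


GOOD_MEMO = "<memo><to>Staff</to><from>Bruce</from><para>Hello.</para></memo>"
BAD_MEMO = "<memo><to>Staff</to><para>Hello.</para></memo>"  # <from> is missing

print(validation_errors(GOOD_MEMO))
print(validation_errors(BAD_MEMO))
```

An authoring tool that applies this kind of check as the author types, and offers only the elements that are allowed at the current position, can keep the document valid without confronting the author with a list of errors after the fact.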
 

XMetaL: Content Creation for XML and SGML

 
XMetaL is a new XML and SGML content creation tool from SoftQuad Software. In the following sections we describe how it addresses each of the requirements discussed above.
 

Goals

 
The overall goal of XMetaL is to provide authors with a familiar, productive editing environment. They should be able to produce valid documents in spite of themselves. If they don't like the whole idea of tags they don't have to deal with them. But if they want to get close to the tags, it is a full-scale structured authoring tool.
 

Architecture

 
XMetaL supports two types of users: customizers and authors. On-screen display is controlled through CSS style sheets. XMetaL uses a COM architecture as the basis for customization and supports the W3C DOM interfaces.
 

First steps

 
Out of the box, any valid XML or SGML DTD can be used to create and modify documents. There is automatic support for valid authoring, through the provision of context-sensitive element and attribute editing. XMetaL makes intelligent guesses about styles and editing behavior for elements.
 

Basic customizations

 
Quick Styles lets a customizer quickly modify common styles for batches of elements. Change lists are the way to specify that groups of elements can be converted into each other, and they will show up in the authoring interface as a drop-down list of styles (in the Microsoft Word sense of style). Replaceable text can be provided at any point so that document templates can guide the author. Templates can also be provided for each element, so that any combination of child elements and replaceable text can be inserted automatically.
 
Other quick customizations let you designate which elements behave like images, lists and paragraphs. Explicit guidance can be given as to what happens when the author presses the Enter key, which is the primary way a new document section is initiated in a word processing model.
 
All toolbars and menus are completely customizable.
 

Advanced customizations

 
XMetaL's COM interfaces, of which there are over 160, can be accessed through scripting and via objects written in languages such as Visual Basic, Java or C++. XMetaL is a Windows scripting host and ships with VBScript and JavaScript. Any scripting engine that supports this interface can be used with XMetaL. Scripting engines for Perl, Python and other languages are readily available.
 
Any language, such as Visual Basic or C++, that can create ActiveX objects can be used to provide highly sophisticated customizations for any DTDs, including full-scale user interfaces and access to databases and document management systems.
 

Summary

 
XML promises to broaden the acceptance of structured document solutions. Organizations are adopting XML when they would not have adopted SGML, or perhaps not even been aware of it. They reap the same benefits though: a streamlined production process and reduced costs.
 
XML also promises to benefit organizations that are already using SGML, because XML is driving the development of friendly, familiar content creation tools that authors will use, thus reducing conversion costs and headaches.
 
Finally, XML promises to change the Web: how sites are made, how content is exchanged and how it is delivered to the end user.
 
In all cases, the need to create XML and SGML content is being met with a new generation of authoring tools that offer unprecedented ease of use. This in turn maximizes the benefits of structured document solutions, by pushing the use of the final document format farther upstream, to the authors.
