The XML Cuisinart; Making Users Happier and Markup Better With XML & SGML Appliances |
| Chet Ensign
Director of Editorial & Electronic Information Technology, Matthew Bender & Company, Inc., New York
Biographical notice: Chet Ensign is the Director of Electronic & Editorial Information Technology for Matthew Bender & Company, Inc., a leading provider of analytic legal information. His department is the technical right hand of the company's editorial and publishing division, developing and supporting editorial systems, SGML and XML applications, and data architecture standards. Mr. Ensign is the author of "SGML: The Billion Dollar Secret," the third volume in the "Charles F. Goldfarb Series on Information Systems," published by Prentice Hall. He is a co-author, along with Steve Pepper and Charles F. Goldfarb, of the "SGML Buyer's Guide", the comprehensive reference work on SGML and XML products and services, also published by Prentice Hall. Mr. Ensign is one of the founders and a board member of the SGML Forum of New York, Inc., the tri-state area SGML users' group. He has spoken on SGML and related topics at events such as GCA SGML/XML conferences, Softbank Seminars, Society for Technical Communication seminars and CD-ROM Expo. |
Appliances: Tools To Change "Master" to "Mass Market" |
| Note: Throughout this paper, I use the terms "XML", "SGML" and "markup" pretty much interchangeably. Although the effect may sometimes be awkward, I have done it to avoid even more awkward grammatical constructions like "SG&X-ML." Readers should feel free to substitute whatever set of initials they prefer, because the ideas apply equally to both. |
| Once upon a time, you had to be a Thomas Alva Edison to do anything useful or practical with the electric motor. The device was novel and powerful and showed great promise for transforming the way people did their work. But it was not widely adopted into common use until bright product designers started wrapping plastic cases around it, sticking a few simple On/Off buttons on the front and giving the resulting gizmos catchy names like "Mix Master." Only then did the electric motor become a mass market, general purpose technology. |
| XML and SGML are novel and powerful and have already shown us how to transform the way people produce information products. But all too often, to the end users (especially those who write the content), they look like something only a whiz kid could love, much less actually use. SGML tools have often faced an uphill battle for acceptance; XML, although it promises less technical complexity, will likely face the same struggle for end user mind share. Many of us who saw the point of the technology from the beginning, who intuitively understand it, grumble at the fact that people resist learning and using it. "It is simply not that hard," we say. "Once you try it, you'll like it." |
| Yet the end users have a point. General purpose markup editors straight out of the box do not make structured writing easier or more intuitive to most writers. Making explicit something that was previously implied by formatting seems, at first, awkward, artificial and unnecessarily complicated. We can help make the process easier, ensure user acceptance, and get better quality data if we create the equivalent of the "XML Cuisinart" -- simple but powerful tools focused on accomplishing specific markup tasks and tailored to the ways that our content creators approach their work. |
| Many of us have been doing this intuitively all along. We have been developing dialog boxes and batch scripts to help users be more productive, and ease the process of tagging data. This paper simply proposes a conceptual category for these sorts of tools: the "markup appliance." It will explain why we need appliances and where they fit in our overall systems. It will describe the advantages and benefits both we and our users will get from their use (not least of which is making XML editing more approachable and palatable). It will list the characteristics that these tools have in common and, along the way, demonstrate several examples that we have applied in our organization to good effect. |
Why Do We Need XML Appliances? |
| But we should start by asking whether there really is a need for tagging appliances. What can an appliance do for a user that a good structure-aware editor can't? How will it make our systems more successful, especially since developing appliances may increase our overall system development costs? |
| To answer these questions, let's look at the goal of structured editing. That goal is not (!) to create valid SGML or XML documents. There may be some technical satisfaction in that (for us anyway), but it's certainly not the point of the exercise. The real goal is to capture expertise in content rich with unambiguously identified information that we can use to drive processing and build useful products. And the people who will create that rich content are not us, but the writers, editors and data technicians who will use the systems we field. |
| Writers et al. are active agents in an information food chain. They have content and information products to produce, refine or turn into a product. Their goal is to get their part of the work done and out the door. Often they view SGML as irrelevant to their task, or even an outright obstacle. In "Structuring XML Documents" (Prentice Hall PTR, 1998, p. 120), one of the books in "The Charles F. Goldfarb Series on Open Information Management", David Megginson hits this nail on the head when he writes: |
| "Your authors - especially if they are new to structured documents - might think of markup not as part of writing but as something that they have to do in addition to writing. In other words, they might think of the time that they spend adding markup as overhead, and if so, they will want to reduce that time as much as possible." |
| We need to keep that thought firmly in mind if we are to build tools that they will use, and that, at the same time, will give us the structured content we need. |
The Focus Of Our Tools Needs To Be Their Expertise |
| Think of it this way. Everybody has their "sphere of complexity", a subject that they know in depth. Whether that subject is law, engineering, mathematics or medicine, organizations are willing to pay them quite well to put that expertise to use. And just as often, the people we want for their expertise could not care less about ours. Our job, as developers, is to make sure they don't need to. The more demands we make on them to learn an additional expertise, the more resistance we face, unless those added requirements are directly relevant to their knowledge. |
| Most of us get one or two complex subjects that we can become skilled at: civil litigation and Olympic one-design sailing; electrical engineering and Civil War reenactment; French cooking and brain surgery. One we get paid for, the other we get to do for fun. After that, we want things to be easy. We could all learn to start a cooking fire with a flint and steel, but few of us actually do. Most of us could learn to do our own auto maintenance, but just look at the success of "Jiffy Lube." Raising domesticated animals for food? Certainly feasible, but a shrink-wrapped Perdue oven-stuffer roaster is faster and doesn't leave all the feathers to clean up. |
| Our clients have at least one of their complex expertise slots already taken up. That's the one they get paid for. In my company, it is law. In others, it will be engineering, physics, software development, and the like. With one complexity slot already taken, we must consider ourselves lucky if they also turn out to be reasonably computer literate without driving them over the "two expertises" limit. And assuming that foundation of computer skills is there, can we really expect them to take to doing a lot of extra work if they see it as "overhead?" |
| No. If we want to achieve our overall goal, then we have to provide our users with tools to do what they need to do (create the content) while giving us the result we need (valid, complete and accurately applied markup). Part of getting there starts with good DTD design. But another part of the job is developing tools that make adding their knowledge to the markup more straightforward - in other words, markup appliances. |
It Takes More Than a Structured Editor |
| Doing that takes more than just giving them an SGML editor, no matter how good the formatting looks on screen. |
| Markup editors have, from their beginnings, chased the paradigm of the WYSIWYG word processor. Out of the box, their approach to structured writing tries to be just like unstructured writing. In one sense, that's not surprising. For years, WYSIWYG has been the prevailing model for how documents are constructed on computers. It is also what customers say they want. And while many of us argued the "content not format" argument in theory, our DTDs also continued to be pretty print-centric. In the beginning, it was hard for any of us to think outside the WYSIWYG box. It took a long time - and the impact of the World Wide Web - to disrupt that paradigm and get a new brand of thinking happening. |
| The problem was - and is - that markup editors are not WYSIWYG tools (nor should they be). WYSIWYG tools provide direct access to look & feel. They offer no access to structure & identity and no way to identify and store expert knowledge. The fit has been wrong from the start, and that has become the crux of user complaints. Here we were, the developers, trying to take their WYSIWYG away, but not providing them with anything new or different or better in its place. |
| As Megginson points out, structured writing makes the two acts of WYSIWYG writing - composing and formatting - explicitly different. One is now the act of typing content and one is the act of applying structure. It also makes the structuring one more demanding, because you have more choices. Now, you have to state precisely what it is about this text that makes it italic instead of just deciding that you want it italicized. Smoothing over the rough edges of that split and making it easier on the end user is what the appliance idea is all about. |
What Does An Appliance Look Like? |
| If we accept the notion that appliances will help make structured information easier to create, then what do these things look like? What are their characteristics? What makes an 'appliance' different from any other general purpose structured editing tool? |
| An SGML or XML appliance may be as simple as a batch script that allows the user to provide a few initial settings then executes on an entire collection of tagged files, or it may be as involved as a series of interactive dialog boxes that walk the user through a specific task. I suggest that the defining characteristic of an appliance is that it is focused on performing one or two small but specific sets of tasks. It does not try to be a general purpose tool. Instead, it is designed to optimize some specific task and help its user to execute that task quickly and efficiently. A good appliance will reduce the task to the fewest necessary steps, minimize distractions that interfere with the user's concentration, limit the choices or options to just those required, and help focus the user's concentration on the key parts of the task at hand. |
| An appliance will also carry out as many of the logical parts of the task as possible itself. There is no point in making our users do something by hand if we can do the tagging automatically. If we can programmatically tag some of the input data (even if the experts still need to double-check the resulting markup), we ought to do it. Leaving our users to carry out actions that they know the computer could do for them makes them justifiably frustrated with us and the tools we give them. The goal of our efforts should be to build tools that let them do those parts of a task that only they can do, and require as little other effort from them as possible. |
| For example, suppose we want our experts to categorize hierarchical subdivisions of our content by assigning values to selected attributes. Certainly this can be done with an SGML editor. However, there are some drawbacks that make this less than ideal. For one thing, they have to scroll through the document, looking for each division-type element. Not necessarily efficient. For another, several steps will be needed to open the attribute value dialog box, select those attributes of interest, and set their values. (One of our users called this a recipe for carpal tunnel carping to the HR department.) Further, those division elements may have a number of attributes, only a few of which do we want our experts to touch. Lastly, we may also want to control the universe of choices available to them, without providing either an unrestricted CDATA input box or hardwiring the valid values directly into the DTDs. (In fact, different teams may require different sets of values. In that case, we definitely want to avoid specifying them in the DTD.) |
| The problem is tailor-made for solving with an appliance. Using a programmable editor or a GUI-development tool, we can build a classification tagging tool to help them perform the task. The tool can be set up so that it jumps directly from one subdivision element to the next and presents only the specific attributes that we want the user to set, leaving all the rest hidden from view. It can also be set up to present lists of valid attribute values, either by including them in its code or drawing them from external configuration files. |
| In fact, we built such a tool for our legal editors. Their productivity jumped from being able to tag 10 or 12 elements in an hour to being able to tag over 100, with a significant increase in accuracy. Their satisfaction jumped too, because the tool was designed in cooperation with them. It reflected their feedback, matched their language and concept of the problem, and gave them only what they needed to do the job. |
An appliance that classifies paragraphs. Fewer than half the applicable attributes are included.
|
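The core logic of a classification appliance of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the tool described above: the element name `div`, the attributes `topic` and `jurisdiction`, their value lists, and the `TEAM_CONFIG` structure are all hypothetical, and a real appliance would wrap this logic in a dialog box and would probably load the configuration from an external file.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-team configuration: which attributes the editor sees and
# which values each may take. Keeping this outside the DTD lets different
# teams use different value lists.
TEAM_CONFIG = {
    "element": "div",
    "attributes": {
        "topic": ["contracts", "torts", "procedure"],
        "jurisdiction": ["federal", "state"],
    },
}


def next_unclassified(root, config):
    """Yield each division element that still lacks a configured attribute."""
    for elem in root.iter(config["element"]):
        missing = [name for name in config["attributes"] if name not in elem.attrib]
        if missing:
            yield elem, missing


def classify(elem, attr, value, config):
    """Set one attribute, accepting only values listed in the configuration."""
    allowed = config["attributes"].get(attr)
    if allowed is None:
        raise KeyError(f"{attr!r} is not an attribute this tool exposes")
    if value not in allowed:
        raise ValueError(f"{value!r} is not a valid value for {attr!r}")
    elem.set(attr, value)


# Illustrative input; the "security" attribute is one the tool never shows.
SAMPLE = """<doc>
  <div id="d1" topic="torts" jurisdiction="state" security="internal"><p>One</p></div>
  <div id="d2" security="internal"><p>Two</p></div>
  <div id="d3" topic="contracts"><p>Three</p></div>
</doc>"""

root = ET.fromstring(SAMPLE)
# The work queue: only elements that need attention, only the attributes of interest.
todo = [(elem.get("id"), missing) for elem, missing in next_unclassified(root, TEAM_CONFIG)]
```

The generator does the scrolling for the user, and `classify` enforces the controlled vocabulary without an unrestricted input box, which are the two behaviors the text asks of such a tool.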
| So what does a markup appliance look like? In general, I would suggest that it will have many of the following characteristics: |
An interface to a batch process appliance. Note how few choices are given to the user.
|
| Again, the goal of an appliance is to isolate and simplify a markup task, eliminating its stigma as clerical "overhead". Technical proficiency, whether in the SGML design itself or in the programming logic, should be hidden behind the tool. Ideally, you have been successful if your users never realize enough about the tool's innards to even think to express their appreciation. |
The Three Types of Markup Appliance |
| We can group appliances into three basic types: batch, interactive and hybrid. |
| Batch appliances will be those that run with little or no human intervention. That does not mean that they eliminate the need for people. Often, some user input will be needed in the beginning to set options. And often, the result produced by the appliance will need human review and double-checking. But the appliance itself can get access to all the rules, logic and data that it needs to run independently. |
| Take, for example, an automatic cross-referencing appliance. If you need to cross-reference large volumes of existing text, you don't want to do it by hand. Instead, you would build a batch appliance with rules for identifying text that looks like a reference, and actions for turning those strings into cross-references. The appliance may have access to external files that define the universe of potential cross-reference targets. It will certainly have logic for handling ambiguous text strings that it cannot resolve itself. And it will have general housekeeping code for logging events, flagging ambiguous text strings, and reporting its completion, status and any errors to the user. The resulting XML output will still need human QC to ensure the accuracy and correctness of the markup. But the batch appliance relieves your production staff of hours of tedious drudge work and puts their efforts to better, more productive, use. |
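A minimal sketch of such a batch pass might look like the following. Everything specific in it is an assumption made for illustration: the citation pattern ("Section 12.03" or "Sec. 12.03"), the `TARGETS` table standing in for an external target file, and the `xref` and `xref-candidate` element names. It also works on a plain text run rather than on tagged files, which a production appliance would have to handle.

```python
import logging
import re
from xml.sax.saxutils import escape, quoteattr

log = logging.getLogger("xref_batch")

# Hypothetical rule: a citation looks like "Section 12.03" or "Sec. 12.03".
REFERENCE = re.compile(r"(?:Section|Sec\.)\s+(\d+\.\d+)")

# Hypothetical target table (in practice read from an external file).
# A section number that maps to more than one ID is ambiguous.
TARGETS = {
    "12.03": ["ch12-s03"],
    "4.01": ["ch4-s01", "app-s401"],
}


def tag_references(text, targets):
    """Return (marked-up text, list of references flagged for human review)."""
    pieces, flagged, pos = [], [], 0
    for match in REFERENCE.finditer(text):
        # Copy the text before the citation, escaped for XML.
        pieces.append(escape(text[pos:match.start()]))
        cited = escape(match.group(0))
        ids = targets.get(match.group(1), [])
        if len(ids) == 1:
            # Exactly one target: safe to tag automatically.
            pieces.append(f"<xref target={quoteattr(ids[0])}>{cited}</xref>")
        else:
            # Zero or several targets: flag and log, leave the decision to a person.
            reason = "ambiguous" if ids else "unknown"
            flagged.append((match.group(0), reason))
            log.warning("%s reference left for review: %s", reason, match.group(0))
            pieces.append(f'<xref-candidate reason="{reason}">{cited}</xref-candidate>')
        pos = match.end()
    pieces.append(escape(text[pos:]))
    return "".join(pieces), flagged


marked_up, flagged = tag_references(
    "See Section 12.03 and Sec. 4.01; compare Section 9.99 & notes.", TARGETS
)
```

The design choice worth noting is that the script never guesses: anything it cannot resolve to exactly one target is marked and reported, so the human QC step knows where to look.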
| An interactive tool will be a dialog box or forms-driven interactive appliance, either built on top of a more general purpose SGML editing tool or built as a stand-alone application. (Indeed, one of the benefits of the appliance concept is that it allows you to break the process of developing a robust XML authoring environment into a series of discrete problems to be solved one-by-one.) |
| For example, let's look again at that cross-referencing problem. After running a batch appliance over the content, your production staff needs to check the results and correct any erroneous tagging. Assuming that speed is of the essence, and the production staff wants to attack this problem head-on, you could provide them with an interactive appliance customized to hammer through the references. Instead of their having to scroll through the content, looking for each cross-reference tag, it could scroll for them. It could provide them with a point-and-click way to check the reference, and it could provide them with a simple interface for correcting the reference if they find a mistake. |
| We built a pair of appliances like this to help our production staff quickly perform an upgrade to a specific set of elements. The combination of a batch appliance, drawing information from a configuration file and from our document management system, and an interactive tool to let them rapidly QC the resulting SGML made feasible a project that everyone was dreading. The prospect of going through hundreds of gigabytes of data, finding and changing markup, was not making our production staff happy. The development of a simple set of appliances to batch through the tedious parts of the task and quickly verify the results was what made it feasible to tackle the project at all. |
| Hybrid appliances combine aspects of both batch and interactive tools. They may first execute a batch process then invoke an interactive component for cleanup, verification, additional tagging, etc. They will very likely have interfaces that are dramatically different from the 'page paradigm' and they may well draw input from other applications, such as your document management system. In this, they provide us with a double benefit. They help us solve practical markup problems and, at the same time, they help evolve everyone's thinking beyond the WYSIWYG document bias. |
| To take the cross-reference problem again, envision an appliance that would first run in batch mode, identifying and tagging references as best it could. But, once finished, it would extract from the document strings of text that contained cross-reference tags and display them, line upon line, in a database table or a spreadsheet format. If it were really slick, it would present the markup as icons to make reading easier. The user could then verify or correct tagging in a highly productive environment, where all the markup of interest was gathered together in a very terse display. Once the user had finished all the checking and fixing, he would click the "Apply Changes" button and a batch process would write the corrected markup back into the source file. |
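The extract-review-apply cycle of this envisioned appliance can be sketched as follows. This is only an outline under assumptions: the `xref` element and its `target` attribute are hypothetical names, rows are matched back to the source by document order, and the spreadsheet-style display is reduced to a list of dictionaries that a real interface would render and let the user edit.

```python
import xml.etree.ElementTree as ET


def extract_rows(root, tag="xref"):
    """Gather every cross-reference into a flat, spreadsheet-like list of rows."""
    return [
        {"row": index, "text": "".join(elem.itertext()), "target": elem.get("target", "")}
        for index, elem in enumerate(root.iter(tag))
    ]


def apply_changes(root, rows, tag="xref"):
    """Write reviewed targets back into the tree; return how many were changed."""
    elems = list(root.iter(tag))  # same document order used by extract_rows
    changed = 0
    for row in rows:
        elem = elems[row["row"]]
        if elem.get("target", "") != row["target"]:
            elem.set("target", row["target"])
            changed += 1
    return changed


# Illustrative source, as it might look after a batch tagging pass.
SOURCE = (
    '<doc><p>See <xref target="ch12-s03">Section 12.03</xref>.</p>'
    '<p>Compare <xref target="ch4-s01">Sec. 4.01</xref>.</p></doc>'
)

root = ET.fromstring(SOURCE)
rows = extract_rows(root)

# Stand-in for the interactive step: the reviewer corrects the second row.
rows[1]["target"] = "app-s401"

# Stand-in for the "Apply Changes" button.
changed = apply_changes(root, rows)
updated = ET.tostring(root, encoding="unicode")
```

Matching rows by position assumes the source is not edited between extraction and write-back; a more robust version would key each row on a stable element ID.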
| This is one appliance we have not yet built. But we have several systems to build and problems to solve where a hybrid appliance will be the perfect solution. In fact, for some of those challenges, the hybrid approach will be the only practical way to get the problem solved. |
Conclusions |
| Several years ago, the SGML community watched with amazement as HTML and the World Wide Web took the world by storm. It was terrible SGML (or so we thought), yet it was more widely adopted in a few short years than SGML was before or since. But we've all come to accept that simplicity and simple yet far-ranging hypertext capability were its two most powerful and appealing qualities. The Web was, in a sense, the first SGML appliance. |
| An appliance is a situation where the parts, taken individually, may well be greater than the whole. By taking advantage of some, but not all, of the capabilities of different software products we create "obvious" tools for our end users. And often -- not always, but often -- those are the best kinds of tools we could give them. Appliances, by explicitly not trying to be general application tools, create configurations that are intuitively obvious to people with expertise that we value, but without the technical sophistication in an expertise that is ours. If we focus on giving these users "obvious" tools, we ultimately serve our common purpose - to make the information world richer, more useful and accessible to all. |