
 
 

Designing Microdocument Architecture™ systems


 
Mark Baker
  Manager
  Omnimark Technologies
  Corporate Communications
  1400 Blair Place
  Ottawa, Canada K0A-2E0
Phone: 1 613 489-0999
Fax: 1 613 489-0995
Email: mbaker@omnimark.com
 
 

The role of microdocuments

 
In every industry there is a danger of falling so much in love with a solution that we fail to notice that the problem has moved on. When SGML was invented, the linear document was king. Hypertext was more an idea than a reality and was looked on mostly as a way of connecting linear documents. The linear document had dominated our civilization for centuries; why expect it to change?
 
But today it has changed. Dynamic web sites create custom pages on the fly to suit the needs of individual users. Even as we struggle to perfect linking technology, hypertext is giving way to individualization as the means to provide the best navigation and integration of information. Why change buses three times in hypertext when the taxi of dynamic individualization will take you door to door?
 
Even where document creation and delivery is still necessary and appropriate, writing, storing, and maintaining linear documents is becoming untenable. Products with the life cycle of mayflies require faster updates. Time to market implodes, requiring instantaneous updates. Product customization proliferates versions, requiring custom documentation. News and information providers are challenged to deliver information with the immediacy of the evening news and the focus of a special interest magazine. Information must be single sourced, constantly updated, and reused in a dazzling variety of combinations. Only databases can handle the volume and the transaction speed this kind of delivery demands. Linear document-based approaches can't keep up.
 
SGML was not designed to be a final format. Nor is XML. Their virtue lies in being a neutral middle form that can preserve the integrity of information which is to be used in many different ways. Information is what is valuable. Formats are incidental. XML/SGML enables the separation of content from format. When SGML was invented, no one could have anticipated that that would not be enough, that being linear would compromise being neutral. But today, when we synthesize web pages on the fly to meet the needs of individual readers, we need to separate content from synthesis and synthesis from presentation.
 
Linear document forms encode a particular synthesis, a particular selection and serialization of information. To store content in a neutral form which allows many different syntheses to be created, we need to store information components with a rich set of relationships that allow dynamic and programmatic synthesis. This is a job for a database, but neither relational nor object databases alone are enough. We also need the text representational power of XML/SGML. We need a combination of databases and XML/SGML, which we call a Microdocument Architecture™ or MDA™.
 
 

Designing Microdocument Architectures

 
Part of MDA's power lies in the fact that it is simply a new way of combining known techniques and technologies from the database and SGML/XML worlds. There are people in every organization who know how to do all the activities involved in designing and building an MDA system, greatly reducing the cost and time required to implement robust content management. However, designing MDA systems does involve new ways of thinking about information structures and relationships, and the systems that manage them.
 
Of all these new ways of thinking, the one that presents the greatest challenge is that of getting into our heads that we are no longer designing documents, except perhaps as an eventual output of the system. When we learned SGML and XML we had to struggle to free ourselves from thinking about documents in terms of their formatting or appearance. Similarly, in learning MDA we have to struggle to free ourselves from thinking about information in terms of its synthesis for a particular document. We are no longer designing documents: we are designing information sets. The first step to designing an MDA system is designing the information set it is to contain.
 
Information set design involves the following steps:
 
  • Determine the extent of the information set
  • Understand its general structure and how it changes
  • Create an information set design specification
 
 

Determine the extent of the information set

 
An information set is defined as the information on a defined topic that allows a defined set of audiences to perform a defined set of tasks. Begin your definition with the topic. The definition needs to be precise and clear. You don't necessarily have to include everything in every current document. When writing free form, authors tend to use asides and anecdotes. Most documents contain huge gaps and inconsistencies. Eccentric information sets are hard to manage. You are not trying to encompass all the chaos in your current information, but to eliminate it. To make your content manageable, you must define exactly what is and what is not the topic of the information set.
 
Next you must define the users you intend to serve and the specific tasks or interests you intend to support. No matter how limited the topic, an information set that serves the needs of every imaginable task of every possible user is infinitely large. Define precisely which users and which of their tasks are to be supported.
 
In defining users of an information set, don't forget to include authors as users and authoring as a task. Authors rely on the information set as they develop new information, and users are major sources of information. While traditional document techniques clearly divide the roles of author and reader, an information set may have many contributors and many users, and many users may be both contributors and readers.
 
Once you have defined the topic and the users and their tasks you know how large your information set is. Now apply the test of reasonableness: Can this set of information in fact be collected and maintained with the resources you have available? If not, narrow your definition now until you have defined an information set that you have a reasonable chance of managing.
 
Now that you know the extent of your information set, it is time to analyze the current state of the information set. Much of the information in your information set probably exists already in documents, databases, mailing list archives, and the heads of developers, salespeople, trainers, tech support staff, and users. Find out where the information is and what form it is in.
 
 

Understand its general structure and how it changes

 
Once the information set is defined, its structure must be laid out using a combination of relational techniques (entity relationship diagrams) and XML/SGML modeling techniques. MDA is an extension of the relational database model designed to allow the management of information relationships existing in descriptive text. As such, relational techniques are used to map the main structures of the information set and markup is used when the limits of relational structures are reached.
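The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `command` table, its columns, the `command-ref` element, and the `related_commands` helper are all hypothetical names invented for the example, and SQLite stands in for whatever relational database a real system would use. The relational table carries the main structure, while a relationship buried in descriptive text is expressed in markup stored in a field.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: relational columns map the main structure, and the
# "description" column holds a small XML microdocument.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE command (name TEXT PRIMARY KEY, product TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO command VALUES (?, ?, ?)",
    [
        (
            "export",
            "WidgetPro",
            "<description>Writes the current set to a file. "
            "See also <command-ref name='import'/>.</description>",
        ),
        (
            "import",
            "WidgetPro",
            "<description>Reads a set from a file.</description>",
        ),
    ],
)


def related_commands(conn, name):
    """Return the commands referenced inside one command's microdocument."""
    row = conn.execute(
        "SELECT description FROM command WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return []
    # The relationship lives in the markup, not in a relational column.
    root = ET.fromstring(row[0])
    return [ref.get("name") for ref in root.iter("command-ref")]
```

Calling `related_commands(conn, "export")` looks up the row relationally and then reads the cross-reference out of the markup, which is the point at which the relational structure alone would have run out.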
 
At the information set design phase, the key thing is to identify the information types in your information set and the interesting relationships between them. Techniques for doing this include entity-relationship diagramming, object-hierarchy mapping, and XML/SGML modeling. However, it is important to bear in mind that this is not database or repository design per se, nor is it document analysis. You are simply striving for a general statement of the key information types and interesting relationships, without worrying too much at this stage about the specifics of implementation.
 
Which relationships do you describe? Document analysis tends towards linear and hierarchical relationships, both of which are relationships based on adjacency, and heavily weighted towards a specific selection and ordering of information. Database modeling, on the other hand, is rigorously non-linear. Neither one holds the whole truth. Draw freely from both, and from any other techniques you know, to describe those relationships that are likely to be of interest to your defined users for their defined tasks. Don't worry about what kind of links they are so much as whether they sound interesting. This will result in a complex overlapping web of relationships. This may seem intimidating if you are only used to document analysis. Resist the urge to reduce these relationships to a single hierarchy. Just write them all down.
 
How granular should you be in distinguishing one type of information from another? Here again, bear your users' tasks in mind. There is no universal algorithm for determining granularity, only common sense. It is costly to define and maintain structures so minute that they serve no user need. On the other hand, defining structures at too high a level may prevent you from serving a particular user's needs later on.
 
Once you have mapped out the types of information in your information set you must carefully map the patterns of change for each type. Changes in documents tend to be managed at the document level, and managing those changes is expensive. An MDA system must handle change at a much lower level. Since MDA drives dynamic systems it must be able to support changes at any time, and the patterns of change in the information set must be well understood to ensure the system supports the changes that occur.
 
For each information type you have defined, ask:
 
  • How and when does each type of information change?
  • Who changes it?
  • Who and what is affected by the change?
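One possible way to record the answers to these questions is a simple structured record per information type. The sketch below is an assumption made for illustration, not part of MDA itself: the `ChangePattern` class, its field names, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChangePattern:
    """Answers to the three questions for one information type."""
    info_type: str          # the information type being described
    how_and_when: str       # how and when this type of information changes
    changed_by: List[str]   # who changes it
    affects: List[str]      # who and what is affected by the change


# Hypothetical sample entries, for illustration only.
PATTERNS = [
    ChangePattern(
        "command reference",
        "with each product release",
        ["developers"],
        ["users", "trainers"],
    ),
    ChangePattern(
        "usage tip",
        "continuously, after release",
        ["users", "tech support staff"],
        ["users"],
    ),
]


def types_changed_by(patterns, role):
    """List the information types that a given role changes."""
    return [p.info_type for p in patterns if role in p.changed_by]
```

Keeping the answers in a uniform shape like this makes it easy to ask, for example, which information types a given group of contributors changes.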
 
Remember that for most complex products, most of the information about what the product can do is generated by users after the product is released. Record these changes in detail. Later you can figure out how to capture them and reflect them back to the user community. Record them now. You will be surprised how much valuable information is being generated in your user community.
 
 

Create an information set design specification

 
Write down all the information you have gathered in the previous stages and create an information set design specification. Conclude the specification with a set of recommendations. These recommendations should cover the kinds of information products you want to produce and the strategy for authoring and data acquisition.
 
 

MDA design

 
The actual MDA system design phase will involve several specialists from different fields, including XML/SGML design and programming, database design and programming, user interface design and programming, authoring, editing, and media specific design for each of your target media. The team you assemble will use your information set design specification as the starting point for designing their individual parts of the system. In implementing an actual MDA system several important topics must be considered:
 
What is the current state of the information set? This should be laid out in the information set design specification, but now you have to consider some practical implications. Much of the information in your information set may already exist in documents, databases, mailing list archives, and other sources. It is essential to know where it is, what form it is in, and how much control you may have over how it is created and stored.
 
How complex are the tasks or interests the system is to support? Users' tasks may be simple or highly parameterized. Simple tasks are just data, but parameterized tasks require structure to handle the parameters. A well-designed MDA system will be able to resolve all or many of the parameters of a task, based on available data or interaction with the user, creating individualized instructions. If the information set contains many simple tasks, structure may be necessary to provide for adequate selection and grouping of tasks. Where tasks are both numerous and complex, structures that provide for parameterization, selection, and grouping provide the basis for radically improved information products that shield users from complexity and improve productivity.
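To make the idea of resolving task parameters concrete, here is a minimal sketch, assuming a task is stored as instruction text plus a list of named parameters. The `individualize` function, the dictionary layout, and the `ask` callback are hypothetical names chosen for the example. Each parameter is resolved from data already available and, failing that, by asking the user.

```python
from string import Template


def individualize(task, known, ask):
    """Resolve a task's parameters and return individualized instructions.

    task  -- dict with "text" (a string.Template pattern) and "params"
    known -- parameter values already available to the system
    ask   -- callable used to obtain a missing value from the user
    """
    values = {}
    for name in task["params"]:
        if name in known:
            values[name] = known[name]   # resolved from available data
        else:
            values[name] = ask(name)     # resolved by interaction
    return Template(task["text"]).substitute(values)
```

A simple task would have an empty parameter list and come back unchanged; a parameterized one is filled in for the individual user.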
 
How closely are authors involved in the operation of your system? The kind and amount of structure that can be achieved and managed in an MDA system is directly related to the nature of authors' participation in the system. We distinguish three categories of author involvement: absent authors (material is created outside the system, with no control by the system owner), innocent authors (authors use tools provided by the system but know nothing about the system or what it's for), and active authors (authors understand the system and work actively to make it function).
 
How will DTDs be used to extend the modeling capability of the relational database schema? MDA supports two methods of integrating markup with relational data: "DTD as model" and "DTD as data". Strict MDA systems treat microdocuments as data types in the database, meaning all microdocuments in a given field of a given table have the same DTD. Treating the DTD as model supports the creation of a single unified data model that makes possible sophisticated output in a wide variety of formats. Less strict MDA systems use "DTD as data", which treats the DTD as part of the data content of the system. This allows MDA to work effectively with legacy data sources and material from a variety of sources.
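The difference between the two methods can be sketched as follows. This is a deliberately simplified illustration: the Python standard library does not validate against DTDs, so the root element name stands in for the document type, and `FIELD_DOCTYPES`, `conforms_as_model`, and `conforms_as_data` are hypothetical names. A real system would validate each microdocument against its actual DTD.

```python
import xml.etree.ElementTree as ET

# "DTD as model": the schema fixes one document type per field of a table.
# (Hypothetical mapping; the root element name stands in for a DTD.)
FIELD_DOCTYPES = {("command", "description"): "description"}


def conforms_as_model(table, field, xml_text):
    """Every microdocument in a given field must share that field's type."""
    expected = FIELD_DOCTYPES[(table, field)]
    return ET.fromstring(xml_text).tag == expected


def conforms_as_data(row):
    """"DTD as data": each row names its own document type next to its content."""
    return ET.fromstring(row["content"]).tag == row["doctype"]
```

Under the first function the document type is a property of the schema, so a legacy memo stored in the `description` field would be rejected; under the second, the same memo is acceptable as long as the row declares what it is.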
 
 

Conclusion

 
Microdocument Architecture, like XML/SGML, is a highly general means of expressing the structure of information. As such it lends itself to a wide variety of design and implementation approaches and adapts to solve a wide variety of problems. Rather than specifying narrow technical approaches, MDA design techniques focus principally on the core ideas of MDA: moving away from document structures and document management techniques to information components, flexible structures and content management techniques. It is not necessary to master every technical skill from the database and XML/SGML worlds in order to design, build, and manage a sound MDA system. It is necessary to begin to think of information as information, and to appreciate the distinction between content and synthesis as well as the distinction between synthesis and presentation.
