
 
 

Enabling Everyday Business Applications to Work with Structured Information by using the Associative Model


 
David Jones
Senior Technology Consultant
Infrastructures for Information Inc.
116 Spadina Ave.
5th Floor
Toronto, Ontario, Canada M5V 2K6
Phone: (416) 504-0141
Fax: (416) 504-1785
Web: www.i-4-i.com
Email: djones@i-4-i.com
 
Biographical notice:
 
David Jones
 
David Jones has worked within the SGML community for the past several years. Mr. Jones was previously SGML Product Manager at SoftQuad Inc., and now works in the sales group at Infrastructures for Information Inc. as Senior Technology Consultant. David has participated in key product and technology demonstrations to major clients and investors, including Nortel, Cisco, Hitachi, Oracle, BellSouth, Procter & Gamble, Honeywell, the Financial Post and Ziff Brothers Investments. He has also presented at or participated in major industry events such as WWW6, Seybold Seminars, Microsoft Site Builder, Internet World and Internet Expo. David is committed to working with companies that are bringing structured information to real-world applications.
 
ABSTRACT:
 

Structured markup has been in use for many years, yet only a few applications work with structured information. This paper describes how a radically new software development philosophy known as Associative Modeling makes it possible to enable everyday applications (such as Microsoft Office and Adobe Acrobat) to work with structured information, whether SGML or XML. By using the Associative Model, developers can create applications that separate content from its residing structure while still maintaining the relationship between that content and its structure. This makes it easier and faster to incorporate structured information into new classes of information technology applications, and the same techniques bring structured information to everyday applications as well.
 
 

Introduction

 
Structured information has been in use for well over ten years, yet in that time only a few applications have been built to work with it. While XML promises to broaden the effectiveness of complex structured information, it nevertheless fails to solve a growing problem: structured information is not useful in mainstream applications. Only recently have tools and technologies emerged that give existing applications access to structured information services, and let them benefit from that access.
 
Today, there are about 5 dedicated, generic, commercial SGML/XML authoring tools, about 3 true SGML/XML database tools, 3 SGML browsers and 2 SGML conversion tools. Most of these tools are based on SGML parsing technology and techniques that are more than 5 years old (some older still). Given the amount and the kind of work being done with structured information, this is a relatively small number of applications.
 
The problem with structured information such as SGML and XML is that the structure and the content are inseparable in the data stream and, unlike formats such as RTF, the structure describes what the content means rather than how it should be formatted. This is why enabling existing applications to work with structured information is so problematic.
 
In this paper I illustrate how, through the use of modern parsing technology and programming techniques, developers can enable existing everyday applications to work with structured information.
 

This paper describes how to enable everyday applications such as Microsoft Word, Adobe Acrobat and many others to produce useful output when their input is structured information, that is, SGML or XML instances. A radical technique that we call Associative Modeling allows these applications to process content in their native formats while at the same time obeying the SGML or XML document models that define the content's structure. The purpose of the Associative Model is twofold: to bring the effectiveness of SGML and XML into the mainstream immediately, and to allow an enterprise-wide architecture in which mainstream applications routinely interact according to shared data models.
 

The mechanism behind this involves two major components: a new class of parser technology, and the Associative Model. The new class of parser can treat an input stream in two separate ways, as the content of the stream and as the structure of the stream. The Associative Model is the ability to manage the relationship between these two separate streams, the content and the structure. Together, these components are the core of enabling existing applications to work with structured information.
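To make the two-stream idea concrete, here is a minimal sketch in Python, assuming the instance is well-formed XML (Python's standard library has no SGML parser). It uses the standard `xml.sax` module to record character data as a content stream, and element start and end events, each tagged with the content offset at which they occur, as a structure stream. The names `StreamSplitter` and `split_instance` are hypothetical and are not part of any product described in this paper.

```python
import xml.sax


class StreamSplitter(xml.sax.ContentHandler):
    """Hypothetical sketch: read one XML instance as two separate streams.

    content   -- the character data only, in document order
    structure -- (event, element name, offset into the content stream)
    """

    def __init__(self):
        super().__init__()
        self.content = []
        self.structure = []
        self._offset = 0  # running length of the content stream so far

    def startElement(self, name, attrs):
        # Markup (including attributes) goes to the structure stream only.
        self.structure.append(("start", name, self._offset))

    def endElement(self, name):
        self.structure.append(("end", name, self._offset))

    def characters(self, data):
        # The parser may deliver text in several chunks; the offset stays correct.
        self.content.append(data)
        self._offset += len(data)


def split_instance(xml_text):
    """Return (content_string, structure_events) for an XML instance."""
    handler = StreamSplitter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.content), handler.structure


if __name__ == "__main__":
    content, structure = split_instance(
        "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
        "<PARA>In this section we will examine what Mary's dog knows.</PARA></CHAPTER>"
    )
    print(content)
    for event in structure:
        print(event)
```

In the content stream the title and paragraph text run together; the element boundaries that separate them exist only in the structure stream, and the shared offsets are what allow the two streams to be related again.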
 

To better describe how these technologies and techniques enable existing applications to work with structured information, let's set some parameters. The goal is to enable everyday applications to work with structured information, and to that end we select Microsoft Word and Adobe Acrobat as our test cases. Both applications are common in the business environment: the first is a word processor and the second a document delivery tool. Each has an API rich enough to give a developer access to the content stream.
 
 

The Components

 
 

What Kind of Parser Technology is Required?

 

Enabling everyday applications to work with structured information requires a new type of parser. Architecturally, this new type of parser forms what we will call the middleware layer. The technology in the middleware layer can be thought of as a structured information server for applications: it provides services to an application through a set of predefined function calls, and it communicates with the application through the application's API.
 

Now that we have established what the middleware layer is, how does it work? The primary function of this layer is to take content from an instance and give the application access to it. The layer also gives the application access to the structure of the instance. What makes this layer unique is that it treats the content quite separately from its residing structure.
 
 

What is the Associative Model?

 

What distinguishes the middleware layer from common SGML/XML parsers is that it must be able to separate the content of an instance from the markup yet still maintain the relationship between the two. This is what the Associative Model does. The Associative Model is a methodology by which the middleware layer models the relationship between the content of an instance and its residing structure separately. This separate model of the relationship between the content and its structure is at the heart of the Associative Model, and it allows the content of an instance to exist outside the confines of its original structure.
 

The separate model is a generic model of the content and of where that content lies within its structure. It is essentially a map of all of the components that make up a structured document. Since this map is not specific to any particular instance, the rules associated with it can be applied to multiple applications. Put simply, within the Associative Model we always know what is content, what is structure and how the two relate to each other, and these rules can be re-applied as necessary.
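One way to picture this map is as a table of spans: each run of character data is recorded with its start and end offsets in the content and the path of elements enclosing it. The Python sketch below builds such a table with the standard `xml.etree.ElementTree` module and answers the basic associative question, "which element does this piece of content belong to?" It assumes well-formed XML, and `build_map` and `element_at` are illustrative names only, not an API from this paper.

```python
import xml.etree.ElementTree as ET


def build_map(xml_text):
    """Return (content, spans), where each span is (start, end, element_path)."""
    pieces = []
    spans = []
    offset = 0

    def add(text, path):
        nonlocal offset
        if text:
            pieces.append(text)
            spans.append((offset, offset + len(text), path))
            offset += len(text)

    def walk(elem, parent_path):
        path = parent_path + (elem.tag,)
        add(elem.text, path)          # text directly inside this element
        for child in elem:
            walk(child, path)
            add(child.tail, path)     # text after a child still belongs to this element

    walk(ET.fromstring(xml_text), ())
    return "".join(pieces), spans


def element_at(spans, offset):
    """Return the element path that owns the character at offset, or None."""
    for start, end, path in spans:
        if start <= offset < end:
            return path
    return None


content, spans = build_map(
    "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
    "<PARA>In this section we will examine what Mary's dog knows.</PARA></CHAPTER>"
)
print(element_at(spans, 0))    # ('CHAPTER', 'TITLE')
print(element_at(spans, 20))   # ('CHAPTER', 'PARA')
```

Because the table records only offsets and element paths, the same lookup can serve any application that can report positions in the same content, which is one reading of how such rules could be re-applied.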
 

Once content is separated from its residing structure, it can populate other applications, while the Associative Model maintains the relationship between the content and the structure it was associated with. The Associative Model works equally well with SGML and XML. The only issue that arises is that it treats content as character data (CDATA) only: all markup, including attributes and attribute values, is considered structure. So someone constructing a DTD, who has the option of creating empty elements that store information as attribute values or elements that hold the same information as character data, should choose the latter for any information that other applications need to see as content.
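The consequence for DTD design can be seen in a few lines of Python. The sketch assumes, as described above, that only character data counts as content; it uses `ElementTree`'s `itertext()` to collect that data, and the `WARNING` element and `TEXT` attribute are hypothetical markup chosen for illustration.

```python
import xml.etree.ElementTree as ET


def content_of(xml_text):
    """Collect character data only; attribute values are treated as structure."""
    return "".join(ET.fromstring(xml_text).itertext())


as_attribute = '<WARNING TEXT="Disconnect power first."/>'
as_element = "<WARNING>Disconnect power first.</WARNING>"

print(repr(content_of(as_attribute)))  # '' : the text never reaches the content stream
print(repr(content_of(as_element)))    # 'Disconnect power first.'
```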
 
 

Example Cases

 
 

Microsoft Word

 

Our first example of how this approach can enable an existing application to work with structured information uses Microsoft Word. Although this example discusses Microsoft Word, the techniques described here can be applied to any other authoring package with an API rich enough to access the content stream.
 

Microsoft Word is the premier desktop word processing tool available today and is likely to remain so for the foreseeable future. MS Word is pervasive throughout the industry: it is the most commonly used authoring tool, with the largest installed base.
 

MS Word cannot work with SGML/XML natively and requires plug-in tools such as SGML Author for Word (Microsoft) and Near & Far Author for Word to work with structured information. These tools have failings that make them inadequate for general use. Both products are post-processing tools: they create an instance when a document is saved, and all structural elements are inferred from MS Word styles. In effect, these tools use styles to apply structure to a document. This causes problems when the tools must deal with recursive or mixed content models (e.g., how do you express a section within a section using Microsoft Word styles?).
 
By using the Associative Model and new parsing technology, these limitations can be overcome.
 

The key to this technique is the ability to separate content from its residing structure while still maintaining its associative relationship. Once the content is free from its residing structure, it can be used to populate any application. In this example, we take the content from an instance and use it to populate a Microsoft Word document. The Associative Model maintains the relationship between this content and its original structure. This means that the application (by way of a middleware layer) knows that the content, now in a Word document, is related to a particular structure. For example, we may have the following instance fragment:
 
<CHAPTER>
<TITLE> Mary's dog knows</TITLE>
<PARA>In this section we will examine what Mary's dog knows.</PARA>
</CHAPTER>
 

In this example, the content of the <TITLE> element would appear in the Word document as a string of text. The middleware layer, which sits on top of Word, knows that this string of text is actually content from the <TITLE> element. It is the middleware layer that allows us to query the structure in relation to the content. For example, wherever the cursor is in MS Word, the structure can be queried to display the element the user is currently in. In addition, the middleware layer can query the structure to apply MS Word style information to the content based on its context in the structure.
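The two middleware queries described above, "which element is the cursor in?" and "which style fits this content?", can be sketched in Python as lookups against a span table of the kind an Associative Model maintains. This is a sketch under several assumptions: the instance is well-formed XML, cursor positions are character offsets into the concatenated content, and the style rules in `style_for` (including the nested `SECTION` markup) are hypothetical. The sketch does not call Word itself; applying the chosen style would go through Word's own programming interface. Because the rule looks at the whole element path, a section inside a section simply receives a deeper heading level, which is the case that style-driven conversion tools find hard to express.

```python
import xml.etree.ElementTree as ET


def spans_of(xml_text):
    """List (start, end, element_path) for each run of character data."""
    spans = []
    offset = 0

    def add(text, path):
        nonlocal offset
        if text:
            spans.append((offset, offset + len(text), path))
            offset += len(text)

    def walk(elem, parent_path):
        path = parent_path + (elem.tag,)
        add(elem.text, path)
        for child in elem:
            walk(child, path)
            add(child.tail, path)

    walk(ET.fromstring(xml_text), ())
    return spans


def element_at(spans, cursor):
    """Report which element the cursor (a content offset) is in."""
    for start, end, path in spans:
        if start <= cursor < end:
            return path
    return None


def style_for(path):
    """Hypothetical rules: choose a Word style name from the element's context."""
    if path[-1] == "TITLE":
        depth = path.count("SECTION")  # how many SECTIONs enclose this TITLE
        return "Title" if depth == 0 else f"Heading {depth}"
    return "Normal"


doc = (
    "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
    "<SECTION><TITLE>Dogs</TITLE>"
    "<SECTION><TITLE>Puppies</TITLE></SECTION></SECTION></CHAPTER>"
)
spans = spans_of(doc)
for start, end, path in spans:
    print(start, end, "/".join(path), "->", style_for(path))
# 0 16 CHAPTER/TITLE -> Title
# 16 20 CHAPTER/SECTION/TITLE -> Heading 1
# 20 27 CHAPTER/SECTION/SECTION/TITLE -> Heading 2
print(element_at(spans, 22))  # ('CHAPTER', 'SECTION', 'SECTION', 'TITLE')
```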
 

The following illustration shows the relation between the Middleware Layer and MS Word.
 
 
 

Adobe Acrobat

 

Adobe Acrobat is the de facto standard for document delivery today and is likely to remain so for the foreseeable future. Adobe Acrobat is pervasive throughout the IT industry: aside from HTML browsers, it is the most commonly used document delivery tool, with the largest installed base.
 

Adobe Acrobat is essentially a viewer for PDF (Portable Document Format) files, a format derived from PostScript, and it cannot natively work with SGML/XML. Like PostScript, PDF is entirely about the presentation of content; it has no inherent understanding of structure in the way that a structured document does. By serving out SGML/XML instances as PDF documents, a publisher loses the intelligence stored in the structure of an SGML/XML instance. Despite this, PDF is still the most commonly used means of distributing content that originated from a structured document.
 
Most publishers have several means of distilling a PDF document from a structured document, including Frame+SGML, Adept Publisher, in-house proprietary scripts, etc. The publisher therefore ends up with the original structured document (the one that is edited, archived, etc.) and a separate PDF document that is delivered.
 

In this case, the Associative Model can be used in reverse. That is, rather than separating content from structure to populate another type of document, we search for content and match it back to the structure it originally came from. This allows Adobe Acrobat to do some interesting things with a structured document.
 

As with the Microsoft Word example, what is required is a middleware layer that fits between Adobe Acrobat and a structured document. This layer, combined with the Associative Model, allows us to build an application that can query an instance for certain content and, based on that content's context (the element it resides in), apply certain functions to it. For example, a publisher can use this technique to build a table of contents (based on element structure) for a PDF document by matching the content in the PDF document back to the structure it originally resided in. The technique could also be used to build a searchable index of elements for the PDF document (e.g., find all <warning> elements in the document). It can even make the ID/IDREF links of the original instance "hot" in the PDF document.
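The reverse direction can be sketched in Python as well. Here the page texts are assumed to have been extracted from the PDF already (how that is done depends on the PDF tool and is not shown), the instance is assumed to be well-formed XML, and the element names and the `ID`/`IDREF` attribute names are hypothetical. Matching is a naive substring search, so a real implementation would also need to cope with text that repeats, is hyphenated, or is split across pages.

```python
import xml.etree.ElementTree as ET


def text_of(elem):
    """All character data inside an element, trimmed."""
    return "".join(elem.itertext()).strip()


def locate(page_texts, needle):
    """Return the index of the first page containing needle, else None."""
    for number, text in enumerate(page_texts):
        if needle in text:
            return number
    return None


def index_elements(xml_text, page_texts, tag):
    """Match every <tag> element's content back to the PDF page it appears on."""
    root = ET.fromstring(xml_text)
    return [(text_of(e), locate(page_texts, text_of(e))) for e in root.iter(tag)]


def resolve_links(xml_text, page_texts):
    """Turn ID/IDREF pairs into (idref, target page) link destinations."""
    root = ET.fromstring(xml_text)
    targets = {e.get("ID"): e for e in root.iter() if e.get("ID")}
    return [
        (e.get("IDREF"), locate(page_texts, text_of(targets[e.get("IDREF")])))
        for e in root.iter()
        if e.get("IDREF") in targets
    ]


doc = (
    "<MANUAL><CHAPTER><TITLE>Installation</TITLE>"
    "<WARNING>Disconnect power first.</WARNING>"
    '<PARA>See <XREF IDREF="maint"/>.</PARA></CHAPTER>'
    '<CHAPTER><TITLE ID="maint">Maintenance</TITLE>'
    "<WARNING>Wear gloves.</WARNING></CHAPTER></MANUAL>"
)
pages = ["Installation Disconnect power first. See chapter 2.",
         "Maintenance Wear gloves."]

print(index_elements(doc, pages, "TITLE"))    # table of contents entries with pages
print(index_elements(doc, pages, "WARNING"))  # index of warning elements
print(resolve_links(doc, pages))              # destinations for "hot" cross-references
```

The same `index_elements` call serves both the table of contents and the warning index; only the element name changes, which reflects how the context of content, rather than its formatting, drives each function.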
 
The result is a PDF document that can be navigated, searched, queried and traversed in much the same way as a structured document. This type of enhanced PDF retains much more of the value of the original instance.
 
The following illustration shows the relationships between the various components.
 
 
 

Conclusion

 
These technologies and techniques can be applied to many types of applications, in effect creating a whole new class of structured information products. When applied to existing applications, they allow structured information to become the format-neutral means of storing information that it has always aspired to be. Used in this way, structured information takes on more of the role of a schema for content, not unlike the relationship between SQL and relational data. The models described above can be manipulated at various levels to control which part of an instance you want to access.
 
I believe that I have shown a unique means by which developers can create a whole new class of applications. It is no longer beyond the realms of reason and technology to use structured information in all types of applications. I believe that these technologies and techniques will be what finally brings structured information to everyday business applications.
 
 

Acknowledgments

 
Special thanks to John Turnbull, Documentation Manager at Infrastructures for Information Inc. for reviewing, editing and contributing to this submission.
