Enabling Everyday Business Applications to Work with Structured Information by using the Associative Model |
|
David Jones |
| Senior Technology Consultant |
| Infrastructures for Information Inc. 116 Spadina Ave. 5th. Floor Toronto Ontario Canada M5V 2K6 Phone: (416) 504-0141 Fax: (416) 504-1785 Web: www.i-4-i.com Email: djones@i-4-i.com |
Biographical notice: |
David Jones |
ABSTRACT: |
Structured markup has been in use for many years, yet only a few applications work with structured information. This paper describes how a radically new software development philosophy known as Associative Modeling makes it possible to enable everyday applications (such as Microsoft Office and Adobe Acrobat) to work with structured information, be it SGML or XML. By using the Associative Model, developers can create applications that separate content from its residing structure while still maintaining the relationship between that content and its structure. This should make it easier and faster to incorporate structured information into new classes of information technology applications, and the same techniques bring structured information to everyday applications as well. |
Introduction |
|
This paper describes how to enable everyday applications, such as Microsoft Word, Adobe Acrobat and many others, to produce useful output when their input is structured information, either SGML or XML instances. A radical technique that we call Associative Modeling allows these applications to process content in their native formats while at the same time obeying the SGML or XML document models that define the content's structure. The purpose of the Associative Model is twofold: to bring the effectiveness of SGML and XML into the mainstream immediately, and to allow an enterprise-wide architecture in which mainstream applications routinely interact according to shared data models. |
The mechanism behind this involves two major components: a new class of parser technology and the Associative Model. The new class of parser treats an input stream in two separate ways, as the content of the stream and as the structure of the stream. The Associative Model manages the relationship between these two separate streams, the content and the structure. Together, these components are at the core of enabling existing applications to work with structured information. |
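The two views of one input stream can be illustrated with a minimal Python sketch. It is not the parser technology described in this paper: it assumes well-formed XML input, relies on the standard library's `xml.etree.ElementTree`, and the name `split_streams` is hypothetical.

```python
import xml.etree.ElementTree as ET

def split_streams(xml_text):
    """Return (content, structure) for an XML instance.

    content   : non-empty character-data chunks, in document order
    structure : markup events (kind, element name, attributes), in document order
    """
    content, structure = [], []

    def walk(elem):
        structure.append(("start", elem.tag, dict(elem.attrib)))
        if elem.text and elem.text.strip():
            content.append(elem.text.strip())
        for child in elem:
            walk(child)
            # Text that follows a child element still belongs to the parent.
            if child.tail and child.tail.strip():
                content.append(child.tail.strip())
        structure.append(("end", elem.tag, {}))

    walk(ET.fromstring(xml_text))
    return content, structure

content, structure = split_streams(
    "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
    "<PARA>In this section we will examine what Mary's dog knows.</PARA></CHAPTER>"
)
```

Here `content` holds only the two text chunks and `structure` holds only the six start and end events; neither list alone records which text belongs to which element, which is the gap the Associative Model is meant to fill.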
To better describe how these technologies and techniques enable existing applications to work with structured information, let's set some parameters. The goal is to enable everyday applications to work with structured information. To that end we select Microsoft Word and Adobe Acrobat as our test cases. Both applications are common in the business environment: one is a word processor and the other a document delivery tool. In each case, the tool has a rich enough API to give a developer access to the content stream. |
The Components |
|
What Kind of Parser Technology is Required? |
|
To enable everyday applications to work with structured information, a new type of parser is required. In an architectural sense, this new type of parser forms what we will call the middleware layer. The technology used in the middleware layer is essentially a structured information server for applications: it provides services to an application by means of a set of predefined function calls, and it communicates with the application through the application's API. |
Now that we've established what the middleware layer is, how does it work? The primary function of this layer is to take content from an instance and give the application access to it. The layer also gives the application access to the structure of the instance. What makes this layer unique is that it treats the content quite separately from the residing structure. |
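As a rough illustration of "services by means of predefined function calls", the sketch below shows a small facade that an application could call without ever touching markup itself. The class name `StructuredInfoServer` and its two methods are hypothetical, chosen only for this example, and the sketch assumes XML input parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

class StructuredInfoServer:
    """Hypothetical middleware facade: the application never parses markup,
    it only calls these functions."""

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def element_names(self):
        """Structure service: element names in document order."""
        return [elem.tag for elem in self._root.iter()]

    def content_of(self, name):
        """Content service: character data of every element with this name."""
        return [(elem.text or "").strip() for elem in self._root.iter(name)]

server = StructuredInfoServer(
    "<CHAPTER><TITLE>Mary's dog knows</TITLE><PARA>Some text.</PARA></CHAPTER>"
)
names = server.element_names()
titles = server.content_of("TITLE")
```

A real middleware layer would sit behind the host application's own API; the point of the sketch is only that content and structure are exposed through separate calls.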
What is the Associative Model? |
What distinguishes the middleware layer from common SGML/XML parsers is that it must separate the content of an instance from the markup yet still maintain the relationship between the two. This is what the Associative Model does. The Associative Model is a methodology by which the middleware layer models the relationship between the content of an instance and its residing structure separately from both. This separate model of the relationship between content and structure is the heart of the Associative Model, and it allows the content of an instance to exist outside the confines of its original structure. |
The separate model that is created is a generic model of the content and where it lies within the confines of its structure. The model is essentially a map of all of the components that make up a structured document. Since this map is not specific to any particular instance, the rules associated with it can be applied to multiple applications and re-applied as necessary. Put simply, within the confines of the Associative Model we always know what is content, what is structure and how they relate to each other. |
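One way to picture such a map is as a table that links each piece of content to the place in the structure it came from. The following Python sketch is an assumption about how that could look, not the representation used by the Associative Model itself: `build_associations` is a hypothetical name, element paths stand in for "where the content lies", and text that follows a child element (mixed content) is ignored for brevity.

```python
import xml.etree.ElementTree as ET

def build_associations(xml_text):
    """Return (content, associations).

    content      : {content_id: text}, held apart from any markup
    associations : {content_id: element path}, the modeled relationship
    """
    content, associations = {}, {}

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            content_id = len(content)
            content[content_id] = text
            associations[content_id] = here
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return content, associations

content, associations = build_associations(
    "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
    "<PARA>In this section we will examine what Mary's dog knows.</PARA></CHAPTER>"
)
```

The `content` dictionary can be handed to any application on its own, while `associations` records that item 0 came from `/CHAPTER/TITLE` and item 1 from `/CHAPTER/PARA`. A fuller version would also need sibling positions to tell repeated elements apart.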
By separating content from its residing structure, that content can populate other applications, while the Associative Model maintains the relationship between the content and the structure it was associated with. The Associative Model works equally well with SGML and XML. The only issue that arises is that it treats only character data (CDATA) as content; all markup, including attributes and attribute values, is considered structure. This matters to someone constructing a DTD who has the option of either creating empty elements with information stored as attribute values or creating an element that holds the information as CDATA: only in the second case is the information treated as content. |
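The consequence for DTD design can be seen in a small Python sketch. The element and attribute names (`ITEM`, `PRICE`, `VALUE`) are invented for the illustration, and `content_stream` is a hypothetical helper that applies the rule stated above: character data is content, attributes are structure.

```python
import xml.etree.ElementTree as ET

def content_stream(xml_text):
    """Character data only; attributes are deliberately left out as structure."""
    return [t.strip() for t in ET.fromstring(xml_text).itertext() if t.strip()]

# The same information modeled two ways (hypothetical element names).
as_attribute = '<ITEM><PRICE VALUE="10"/></ITEM>'
as_element = "<ITEM><PRICE>10</PRICE></ITEM>"

from_attribute = content_stream(as_attribute)
from_element = content_stream(as_element)
```

With the attribute-based design the content stream is empty, so the value never reaches the target application as content; with the element-based design the value `10` appears in the content stream.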
Example Cases |
|
Microsoft Word |
Our first example of how this can enable an existing application to work with structured information is Microsoft Word. Although Microsoft Word is discussed in this example case, the techniques described here can be applied to any other authoring package that has a rich enough API to access the content stream. |
Microsoft Word is the premier desktop word processing tool available today and for the foreseeable future. MS Word is pervasive throughout the industry: it is the most commonly used authoring tool, with the largest installed base. |
MS Word cannot work with SGML/XML natively and requires plug-in tools such as SGML Author for Word (Microsoft) and Near & Far Author for Word to work with structured information. These tools have certain failings that make them inadequate for general use. Both products are post-processing tools: they create an instance when a document is saved, and all structural elements are inferred from MS Word styles. Essentially, these tools use styles to apply structure to a document. This causes problems when the tools need to deal with recursive or mixed content models (e.g., how do you format a section within a section using Microsoft Word styles?). |
By using the Associative Model and new parsing technology, these limitations can be overcome. |
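The difficulty with recursive content models can be made concrete with a deliberately simplified Python sketch. It does not reproduce how the products named above actually work; it only assumes that each paragraph carries a single style named after its element, and `to_styled_paragraphs` and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_styled_paragraphs(xml_text):
    """Flatten an instance to (style, text) pairs, one style per element name."""
    return [(elem.tag, elem.text.strip())
            for elem in ET.fromstring(xml_text).iter()
            if elem.text and elem.text.strip()]

# A section nested inside a section ...
nested = ("<DOC><SECTION><TITLE>A</TITLE>"
          "<SECTION><TITLE>B</TITLE></SECTION></SECTION></DOC>")
# ... versus two sections side by side.
siblings = ("<DOC><SECTION><TITLE>A</TITLE></SECTION>"
            "<SECTION><TITLE>B</TITLE></SECTION></DOC>")

styled_nested = to_styled_paragraphs(nested)
styled_siblings = to_styled_paragraphs(siblings)
```

Both instances flatten to the same list of styled paragraphs, so the nesting cannot be recovered from styles alone. Keeping a separate record of where each paragraph sat in the structure avoids this loss.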
The key to this technique is the ability to separate content from its residing structure while still maintaining the associative relationship between them. If the content is free from its residing structure, it can be used to populate any application. In this example we take the content from an instance and use it to populate a Microsoft Word document. The Associative Model allows us to maintain the relationship between this content and its original structure. This means that the application (by way of a middleware layer) knows that the content, now in a Word document, is related to a particular structure. For example, we may have the following instance fragment... |
<CHAPTER>
  <TITLE>Mary's dog knows</TITLE>
  <PARA>In this section we will examine what Mary's dog knows.</PARA>
</CHAPTER>
In this example, the content of the element |
The following illustration shows the relation between the Middleware Layer and MS Word. |
[Illustration: relation between the Middleware Layer and MS Word] |
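The round trip described in this section can be sketched in Python under some loud assumptions: a plain list of paragraphs stands in for the Word document (no Word API is called), the functions `populate` and `write_back` are hypothetical, and the association is kept as a simple mapping from paragraph index to source element.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    "<CHAPTER><TITLE>Mary's dog knows</TITLE>"
    "<PARA>In this section we will examine what Mary's dog knows.</PARA></CHAPTER>"
)

def populate(root):
    """Copy content into a plain paragraph list (the stand-in for a Word
    document) and remember which element each paragraph came from."""
    paragraphs, owner = [], {}
    for elem in root.iter():
        if elem.text and elem.text.strip():
            owner[len(paragraphs)] = elem  # association: paragraph index -> element
            paragraphs.append(elem.text.strip())
    return paragraphs, owner

def write_back(paragraphs, owner):
    """Push edited paragraph text back into the elements it came from."""
    for index, elem in owner.items():
        elem.text = paragraphs[index]

root = ET.fromstring(FRAGMENT)
paragraphs, owner = populate(root)
paragraphs[0] = "What Mary's dog knows"  # an edit made in the word processor
write_back(paragraphs, owner)
result = ET.tostring(root, encoding="unicode")
```

The author edits ordinary paragraphs, yet the edited title still lands inside the `TITLE` element because the association was kept outside the document. In a real integration the paragraph list would be replaced by calls into the word processor's API.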
Adobe Acrobat |
Adobe Acrobat is the de facto standard for document delivery today and for the foreseeable future. Adobe Acrobat is pervasive throughout the IT industry: aside from HTML browsers, it is the most commonly used document delivery tool, with the largest installed base. |
Adobe Acrobat is essentially a viewer for PostScript-derived files and cannot natively work with SGML/XML. The files that Acrobat uses are known as PDF (Portable Document Format) files. PostScript is entirely about the presentation of content; it has no inherent understanding of structure in the way that a structured document does. By serving out SGML/XML instances as PDF documents, a publisher loses the inherent intelligence stored in the structure of an SGML/XML instance. Despite this, PDF is still the most commonly used means of distributing content that originated from a structured document. |
In this particular case, the Associative Model can be used in reverse. That is, rather than separating content from structure to populate another type of document, we can search for content and match it back to the structure it originally came from. This allows Adobe Acrobat to do some interesting things with a structured document. |
As with the Microsoft Word example, what is required is a middleware layer that fits between Adobe Acrobat and a structured document. This layer, combined with the Associative Model, allows us to build an application that can query an instance for certain content and, based on its context (the element it resides in), apply certain functions to it. For example, a publisher can build a table of contents (based on element structure) for a PDF document by matching the content in that PDF document back to the structure in which it originally resided. The technique could also be used to build a searchable index of elements for the PDF document (e.g., find all <warning> elements in the document). It can even be used to make ID/IDREF links from the original instance become "hot" in the PDF document. |
The following illustration shows the relationships between the various components. |
[Illustration: relationships between the various components] |
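The reverse use of the model can be sketched in Python as follows. Everything here is an assumption made for illustration: the sample instance and its element names are invented, the `PAGES` list stands in for text already extracted from a PDF (no Acrobat API is called), and matching is done by naive substring search, which would be ambiguous wherever the same text occurs more than once.

```python
import xml.etree.ElementTree as ET

INSTANCE = """<MANUAL>
  <CHAPTER ID="c1"><TITLE>Setup</TITLE><WARNING>Unplug the unit first.</WARNING></CHAPTER>
  <CHAPTER ID="c2"><TITLE>Operation</TITLE><PARA>See <XREF IDREF="c1"/> before starting.</PARA></CHAPTER>
</MANUAL>"""

# Stand-in for text extracted from the PDF, one string per page.
PAGES = ["Setup Unplug the unit first.", "Operation See before starting."]

def page_of(text, pages):
    """Return the 1-based number of the first page containing `text`, else None."""
    for number, page in enumerate(pages, start=1):
        if text in page:
            return number
    return None

def build_outline(root, pages, element_name):
    """List (text, page) for every element with the given name."""
    return [(e.text.strip(), page_of(e.text.strip(), pages))
            for e in root.iter(element_name) if e.text and e.text.strip()]

def link_targets(root, pages):
    """Resolve each IDREF to the page holding the title of the element with that ID."""
    by_id = {e.get("ID"): e for e in root.iter() if e.get("ID")}
    targets = {}
    for ref in root.iter():
        idref = ref.get("IDREF")
        if idref in by_id:
            title = by_id[idref].findtext("TITLE", default="").strip()
            targets[idref] = page_of(title, pages)
    return targets

root = ET.fromstring(INSTANCE)
toc = build_outline(root, PAGES, "TITLE")         # table of contents
warnings = build_outline(root, PAGES, "WARNING")  # index of one element type
links = link_targets(root, PAGES)                 # IDREF -> target page
```

The same `build_outline` call serves both the table of contents and the element index because the structure, not the PDF, decides what each piece of text is. A production version would need a more robust matching strategy than substring search.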
Conclusion |
|
Acknowledgments |
|
Special thanks to John Turnbull, Documentation Manager at Infrastructures for Information Inc., for reviewing, editing and contributing to this submission. |