| Using XML in a Teleeducational Tool | Table of contents | Indexes | XML in the BMW Group: Sharing information components across the enterprise | |||
A new metaphor for editing structured documents |
| Christian Wallgren |
| Product Manager |
| PharmaSoft AB
P.O. Box 1237 S-751 42 Uppsala Sweden Phone: +46 18 185452 Fax: +46 18 109200 Email: Christian.Wallgren@pharmasoft.com Web: http://www.pharmasoft.com |
Biographical notice: |
ABSTRACT: |
The Case study |
Background |
This presentation will show how a new metaphor for editing may be used for certain types of highly structured and formally regulated documents. |
It is based on our experiences from the design and development of an SGML/XML-based tool for the pharmaceutical industry. |
I hope it will be of interest even to people outside our industry, because our findings clearly have some general applicability, from both a practical and a theoretical point of view. |
Formal documents |
"Formal document" is a concept that has no theoretical definition. Such documents resemble legal documents like contracts, insurance policies etc. |
In the interaction between the Pharmaceutical Industry, the Regulatory Authorities and Public Health, documents of this type are widely used. Examples are the "Summary of Product Characteristics" and the "Package Insert". These formal descriptions of the product are of vital importance for both physicians and patients. They are bound to be as regulated as the product itself, which means that every sentence and every word is weighed carefully by the authorities before approval. In some cases the text has to be translated into each of the EC languages, and all versions have to be approved at the same time. |
From an SGML/XML point of view, these documents have the following characteristics: |
The typical formal document SPC (A Summary of Product Characteristics)
Today these documents are mostly prepared in word-processing environments, which means that the structure is supported only by templates and guidelines and is not enforced by any mechanism. In some cases the authorities have issued templates for the companies to use, e.g. the Swedish and Danish MPA (Medicinal Products Agency). |
The traditional editing environment |
The MS Word output format (.doc) has until now been recommended by the EMEA (the European Agency for the Evaluation of Medicinal Products). The reason is that MS Word is so widely used that its storage format serves as a de facto standard. The metaphor is the traditional word-processing document. |
Discussing this with the pharmacists and other experts, we pointed out that the word-processing format now being used |
This seemed to bother them, especially considering the longevity of the formal documents. Also, after we had shown them the functionality of an SGML editor, they thought that this kind of tool would serve the need to enforce the formal structure of the document. |
The professionals emphasized the need for better control and better management of information. Although they took for granted that the necessary word-processing features would still be available, they seemed willing to trade off some of the "vanilla" in exchange for faster processing and simplicity. This is easy to understand, since each day of delay in getting an approval is extremely costly for the companies, and any facility that unburdens the authors from dealing with styles and layout is welcome. |
The task of the authorities in this process is to audit and comment on the document, sometimes even amending it. The document may then oscillate between the company and the authority, each round generating a new working version. Finally the document is approved by the authority. |
After a Summary of Product Characteristics or a Package Insert has been approved, the authority would like to have the document stored in a way that enables context-directed searches for phrases across all products in the database, regardless of manufacturer. Thus, it must be possible to run analyses like: Give me all the sentences within the chapter "Undesirable effects" between 1980 and 1999 containing the phrase "antiretroviral agents" in EMEA SPCs. |
Storing the documents with regular relational DBMS methods would make it easy to integrate them with the existing and planned systems. |
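A context-directed search of this kind could be sketched as follows. This is a minimal illustration only, assuming one table row per sentence; the table layout, the column names and the function find_sentences are hypothetical and do not describe the actual PS Author storage model, and the rows are invented placeholder data.

```python
import sqlite3

# Hypothetical layout: one row per sentence, tagged with its structural context.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sentence (
           product   TEXT,
           authority TEXT,     -- e.g. 'EMEA'
           doc_type  TEXT,     -- e.g. 'SPC'
           approved  INTEGER,  -- year of approval
           chapter   TEXT,     -- heading of the enclosing chapter
           body      TEXT      -- the sentence itself
       )"""
)

# Invented placeholder rows, only there to exercise the query.
rows = [
    ("Product A", "EMEA", "SPC", 1997, "Undesirable effects",
     "Placeholder sentence mentioning antiretroviral agents."),
    ("Product A", "EMEA", "SPC", 1997, "Posology",
     "Another placeholder about antiretroviral agents."),
    ("Product B", "EMEA", "SPC", 1975, "Undesirable effects",
     "Old placeholder about antiretroviral agents."),
    ("Product C", "MPA", "SPC", 1998, "Undesirable effects",
     "National placeholder about antiretroviral agents."),
    ("Product D", "EMEA", "SPC", 1998, "Undesirable effects",
     "Placeholder sentence without the phrase."),
]
conn.executemany("INSERT INTO sentence VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_sentences(conn, chapter, phrase, first_year, last_year,
                   authority, doc_type):
    """Return (product, sentence) pairs matching the structural context."""
    query = """SELECT product, body FROM sentence
               WHERE chapter = ? AND authority = ? AND doc_type = ?
                 AND approved BETWEEN ? AND ?
                 AND body LIKE ?
               ORDER BY product"""
    params = (chapter, authority, doc_type, first_year, last_year,
              f"%{phrase}%")
    return conn.execute(query, params).fetchall()


hits = find_sentences(conn, "Undesirable effects", "antiretroviral agents",
                      1980, 1999, "EMEA", "SPC")
for product, body in hits:
    print(product, "|", body)
```

Because the structural context (chapter, document type, authority) is stored alongside the text, the search needs nothing beyond ordinary SQL.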
The hunt for a functional environment |
Given this information, we began to realize that replacing MS Word would be a real challenge from the point of view of easy authoring, but our chances were still not too bad. The main obstacles were the traditionalism and conservatism of some users. |
We realized that we would have to avoid technicalities and functionally delimit the environment to gain acceptance. General SGML/XML concepts like elements, entities, attribute values, notations etc. must be hidden. |
To achieve this goal we could customize a commercial SGML or XML editor. There are excellent products with built-in customization tools, APIs and macros, with which we could hide the technicalities and enhance user-friendliness. |
However, our impression is that, for the time being and in this environment, users are reluctant to replace the mainstream word processor with a general all-purpose XML/SGML editor. Since the editor would be used only for certain applications, highly competent but general editors will be hard to justify financially. |
The metaphor of a general business application was chosen instead. |
An application specific editor |
We decided to build an application-specific editor. In SGML terms, that meant an editor to be used for a predefined set of DTDs. As said before, the demand was to use generally available storage techniques, such as a relational DBMS. The second design decision was to fully integrate the database with the editor and merge them into one product. |
After two years of development the first beta of our product was shipped to customers in December 1998. |
This product, which is named PS Author, currently has the following features: |
|
After the solution had been tested in production, the customers' reaction was largely positive, with one exception: they missed a function for importing the existing stock of MS Word documents into the product. This will be possible in the next release of the product. |
The theory part |
The choice of schema language |
In PS Author the traditional SGML DTD is used as a schema carrier. It also has a grammar role in SGML, but not in XML. The advantage of the DTD syntax is |
The weakness lies in its limited data typing capacity. |
We are looking at using the Document Content Description for XML proposal as an alternative schema language in a forthcoming release of the product. |
The use of an enabling architecture |
An interesting question is: "How can we formally specify the range of possible schemas, which can be handled by the editor?" |
A specification of the SGML Extended Facilities, formally contained in the HyTime standard, gives us the Architectural Forms. By using an enabling architecture it would be possible to require each client DTD to conform to the meta-DTD. This is described by Steven Newcomb, who says that "DTDs can be permitted to change in any way that does not violate the constraints imposed by the SGML architecture". |
Before we try to answer this question, it must be clarified that although testing conformance of schemas against an architecture may be done programmatically, this is not the task of an architecture engine. The architectural validation performed by this engine is done on the instance and not the DTD. |
But is it really possible to verify whether a DTD produces only instances that are architecturally valid with respect to one or more specified architectures? I will answer this by showing an example. |
By using the Architectural Forms standard we have made PS Author an engine of its own enabling architecture. Today only one architecture is used. The forms of this architecture are itemized in WAL-002. |
To illustrate how this works, let us select the element "5.1 Pharmacodynamic properties", which is a subelement of "5. Pharmacological properties". "5.1 Pharmacodynamic properties" must be followed by "5.2 Pharmacokinetic properties" (see WAL-002). Both are derived from the "Full-text" form, and both are children of the "Body-heading" form. |
However, there is no guarantee that the element type generating this instance has the form of |
<!ELEMENT pharmpro - - (pharmady, pharmaki)>
|
which is the declaration in the SPC-DTD. It could well have been generated by an element type like |
<!ELEMENT pharmpro - - (pharmaki & pharmady)>
|
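The point can be illustrated with a small sketch. It models only flat model groups in which every member is mandatory and non-repeatable; the tuple representation and the function accepts are assumptions made for this illustration and are not part of PS Author or of any SGML parser.

```python
def accepts(model, children):
    """Check a flat list of child element names against a flat model group."""
    kind, names = model
    if kind == "seq":   # SGML ',' connector: all members, in the declared order
        return children == names
    if kind == "and":   # SGML '&' connector: all members, in any order
        return sorted(children) == sorted(names)
    raise ValueError(f"unsupported connector: {kind}")


SEQ_MODEL = ("seq", ["pharmady", "pharmaki"])  # (pharmady, pharmaki)
AND_MODEL = ("and", ["pharmaki", "pharmady"])  # (pharmaki & pharmady)

ordered = ["pharmady", "pharmaki"]  # 5.1 followed by 5.2
swapped = ["pharmaki", "pharmady"]  # 5.2 followed by 5.1

# The ordered instance is valid under both declarations, so the instance
# alone does not reveal which declaration produced it.
both_accept_ordered = accepts(SEQ_MODEL, ordered) and accepts(AND_MODEL, ordered)

# Only the '&' declaration also admits the swapped order.
only_and_accepts_swapped = (
    accepts(AND_MODEL, swapped) and not accepts(SEQ_MODEL, swapped)
)

print(both_accept_ordered, only_and_accepts_swapped)
```

Validating an instance therefore cannot, on its own, rule out a DTD that would also allow instances in the wrong order.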
In the current release of the product, only model groups of the sequential, mandatory, non-repeatable type (A, B, C, ..., N) are permitted for certain forms. Thus, there is a need for a second architecture, which is currently not implemented but is used implicitly. The problem lies in the way this restriction has to be expressed. For example, an ordering architecture for model groups containing 1 to 3 elements would look like: |
As can be seen, this is a rather clumsy solution to the problem when the number of elements grows. |
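As noted earlier, conformance of a schema can also be tested programmatically rather than through an architecture. The following minimal sketch checks the model-group restriction directly on the text of a declaration, for any number of elements. It is not the mechanism used in PS Author: the function name is_pure_sequence is hypothetical, and the pattern assumes a flat group with no parameter entity references, nested groups or inclusions and exclusions.

```python
import re

# An SGML name in the reference concrete syntax: a letter followed by
# letters, digits, '.' or '-'.
NAME = r"[A-Za-z][A-Za-z0-9.\-]*"

# A parenthesized, comma-separated list of names with no occurrence
# indicators (?, *, +) and no other connectors (&, |).
PURE_SEQUENCE = re.compile(rf"\(\s*{NAME}(?:\s*,\s*{NAME})*\s*\)")


def is_pure_sequence(content_model: str) -> bool:
    """True if the model group is sequential, mandatory and non-repeatable."""
    return PURE_SEQUENCE.fullmatch(content_model.strip()) is not None


for model in ["(pharmady, pharmaki)", "(pharmaki & pharmady)", "(a, b?, c)"]:
    print(model, is_pure_sequence(model))
```

A check like this is applied to the DTD itself, whereas an architecture engine validates instances, so the two complement each other.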
The conclusion is that Architectural Forms are a powerful concept for modeling different types of formal documents. For some purposes, like conversion using style sheets, the main architecture is sufficient. Other purposes require more complex architectures, which are hard to express in SGML DTD terms. |
Whether Architectural Forms can serve as a formal model for describing the possible schemas that an editor can support remains to be proven. |
An interesting use of Architectural Forms could be to let the product accept instances with no DTD, i.e. no explicit schema, and derive the schemas by using the architectures of the product. However, this is not an option in our case. We have to be able to formally declare what kinds of schemas, and hence what types of document structures, we can support, not only what we can accept. |
Design aspects |
The idea of assigning an architecture to the editing tool may seem a little backward. However, there are some specific merits in this approach, which I would like to discuss: |
PS Author main window screen shot
Summary |
I have tried to show that in some cases it may prove appropriate to use an editing metaphor that is related to the specific tasks of the business, rather than to general word-processing conventions. PS Author is an example of such an approach. |
It has also been my ambition to explore the possibilities of delimiting the range of schemas and to show that such a limitation has merits: an extremely simple implementation of storage and rendition mechanisms, simple adoption of schemas and a simple user interface. |
|
Bibliography