| DOCSTEP - Technical Documentation Creation and Management using STEP | Table of contents | Indexes | Configuration and version management in an SGML-based document management system | |||
| Espert Christophe |
| François Patricia |
| Futtersack Philippe |
Hypermedia Database |
Abstract: |
| In the context of Document Management Systems, the notion of document is becoming less and less central. A document corresponds to an assembly of information objects (SGML or non-SGML objects) that may be shared by several documents. Moreover, these information objects are interconnected by various kinds of links. |
| Conventional SGML databases offer good support for storing and manipulating collections of independent SGML documents. They have to evolve to manage a network of SGML and non-SGML documents, i.e. hypermedia documents. SGML allows inter-document links to be defined by using ID/IDREF attributes and entity sharing. HyTime goes beyond the SGML limits concerning hyperlinking features by offering the semantics to model complex links, such as a link from a document to a very precise location inside another one. In order to offer all the functionalities necessary for managing hypermedia documents, SGML databases must take all the above constructs into account. The schema of these SGML databases consists of a tree structure representing the mapping of the SGML meta-model, but it has to evolve towards a graph structure to represent the HyTime hyperlinking model. This paper presents the principles for extending an SGML database to a HyTime database and the functionalities of a web interface for accessing the documents stored in the database. |
Introduction |
| This paper is the result of a current collaboration between Aérospatiale Aircraft Business and the Research and Development Division of Electricité De France. This collaboration concerns a study and research project in the structured electronic document database field. Although the specific industrial contexts are different, numerous common requirements may be identified in this particular field and a large benefit may be expected from a common study. |
| Aérospatiale and Electricité De France are two large French companies which produce, respectively, aircraft (Aérospatiale Aircraft Business) and electricity. Both need to manage a large amount of documentation in their own industrial context. As a consequence, a significant benefit is expected from extensively computerized documents. |
| After presenting this study's industrial contexts, we succinctly present our approach for specifying an SGML database. Then, we focus on our strategy for evolving towards a HyTime hypermedia database. We then show how we have chosen to implement this SGML/HyTime database. Finally, we conclude by giving the progress status of our work and the main issues which remain to be studied in depth. |
Aérospatiale industrial context |
| Aérospatiale Aircraft Business is responsible for producing all the technical documentation delivered with aircraft. This aircraft technical documentation is subject to severe constraints, particularly in terms of volume (more than 300,000 pages for one kind of plane), content format (textual, technical data, illustrations), content customization (airline customization), authoring (performed by various industrial partners), update frequency and longevity requirements. |
| Since the 1980s, paper media has been giving way to electronic media in the aerospace community. In this context, the SGML format has been adopted by the aerospace regulatory organization (ATA - Air Transport Association of America) as the documentation exchange standard between aircraft manufacturers and airlines. As far as SGML structured documentation is concerned, Aérospatiale Aircraft Business is involved in three main domains: standardization, documentation production, and documentation utilization software development. |
| Aérospatiale participates in various military and civil standardization committee groups which design DTDs for aircraft documentation delivered to airlines. |
| In terms of documentation production, SGML technical publications have been available since 1993 for new aircraft: Airbus A330/A340. These SGML publications are produced by different means (native SGML production, proprietary Airbus format conversion, etc.) depending on the kind of manual produced. However, a large documentation system re-engineering project is under way; it aims at integrating new documentation technologies in the documentary product as well as in the production process. |
| In terms of documentation utilization, Aérospatiale offers a software package, called ADOC/ADIC, which allows aircraft documentation delivered to airlines to be integrated into their own information system. This software consists of both an editorial workbench and a consultation system. |
| Aérospatiale Aircraft Business is also involved in research and prospective activities in the electronic documentation field. These activities mainly consist of studying new standards and technologies related to electronic documentation, in order to evaluate their suitability for aerospace needs and specificities, i.e. their applicability in the aerospace field. This research aims at preparing future aircraft documentation and related technological evolutions in the three above-mentioned domains: standardization, documentation production and documentation utilization software development. One of these research activities concerns the storage and management of hypermedia aircraft documentation. |
Electricité De France industrial context |
| Electricité De France is the public company providing electricity for 30 million French customers. Production, transportation and distribution are handled by the same company. Energy is also exported to numerous European countries. Moreover, its know-how is exported for the building of nuclear power stations and electric networks. |
| Concerning R&D, the main goal of the R&D Division is to do research on electricity topics. From the information system viewpoint, the research activity tends to generate technical documents. In addition, as a complement to electricity research, we carry out work on scientific topics such as computer science and documentation engineering in particular. In this context, we are studying new systems for managing electronic documents. The Electronic Library Project aims at managing the documents concerning the general activity of the Division. |
| In the context of general activity, we keep track of the division flowchart, of the people employed by the division, and of accounting information. Moreover, the employees produce sets of electronic documents to describe what they intend to do, and later, the corresponding reports. As examples, we can mention activity descriptions (about 2000 documents of 2 pages each year), activity reports (twice a year) and internal technical reports describing general results of research actions (5000 documents of 10 to 100 pages each year). All these documents are generally written with a very popular word processing tool. |
| This information is collected through the office automation network and stored in relational databases. Only the bibliographic information (the title, the abstract, and the authors of the internal technical reports) is stored in a coded format. Internal technical report bodies are digitized (because of the heterogeneity of the collected documents) and stored by a specific application on WORM optical disks. |
| Many applications manipulate these data, to print or fax a report, to retrieve a selected set of information, or to compute synthetic results by using natural language and statistical techniques. |
| SGML/HyTime is also used for nuclear station documentation. This type of documentation has many points in common with aircraft documentation. An SGML/HyTime database is a means to manage rich and durable information, independently of the content of the information, and even independently of the structure of the information if the SGML/HyTime database is generic enough. |
An SGML database: reminder |
Main choices for defining the database model |
| Our strategy for defining an SGML database model has been described and analyzed elsewhere, in comparison with related work. We just sum it up here, and concentrate in this paper on our strategy for evolving towards a hypermedia database. |
| We have chosen to propose a fully generic database model, that is, a completely DTD-independent model. In order to be fully SGML compliant, this model is derived from the SGML abstract syntax. The ESIS mechanism, specified in an SGML annex, defines a set of information on the element structure; we applied the same philosophy to get information from the complete abstract syntax. The ESIS consists of a flow of information generated as the document is being parsed. It is defined to be the relevant set of information for recreating the source document as well as for implementing any structure-based application. |
The database model principles |
| The following figure shows a very simplified subset of this database model, which corresponds to a tree of SGML components decorated with SGML attributes. |
![]() |
| We give no further details about the SGML database model in this paper, in order to focus on the link features. |
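As an illustration of the DTD-independent tree model summarized above, the following minimal sketch builds a generic element tree from a flat stream of parser events, in the spirit of an ESIS flow. The `Element` class, the `build_tree` function and the event tuples are hypothetical names chosen for this example; they are not the schema of the prototype.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """One node of the generic tree: any SGML element, whatever its DTD."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", str]] = field(default_factory=list)


def build_tree(events):
    """Build the element tree from a flat stream of parser events.

    Each event is a tuple (assumed format, for illustration only):
    ("start", name, attrs), ("data", text) or ("end", name).
    """
    root = None
    stack = []  # currently open elements, innermost last
    for event in events:
        kind = event[0]
        if kind == "start":
            node = Element(event[1], dict(event[2]))
            if stack:
                stack[-1].children.append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "data":
            stack[-1].children.append(event[1])
        elif kind == "end":
            closed = stack.pop()
            if closed.name != event[1]:
                raise ValueError(f"mismatched end tag: {event[1]}")
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return root


events = [
    ("start", "MANUAL", {"ID": "m1"}),
    ("start", "TITLE", {}),
    ("data", "Landing gear"),
    ("end", "TITLE"),
    ("end", "MANUAL"),
]
tree = build_tree(events)
```

Because the node type carries only a name, attributes and children, the same code handles documents of any DTD, which is the point of a generic model.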
Extensions for managing hypermedia documents |
Why evolve towards hypermedia documentation |
| Whether in a technical publication production or utilization context, the document concept is greatly evolving. On the one hand, documents are becoming largely inter-dependent: they reference each other, share components, and so on. On the other hand, documents tend to appear only as end-products which result from an assembly of documentary data. |
| Managing a web of documentary data, i.e. SGML and non-SGML data (illustrations, technical data) connected by various kinds of links, rather than a collection of independent textual documents, is therefore an increasingly pressing requirement. |
| The database tree model has to evolve towards a graph model. This graph model is close to the one defined for hypertext documents in that they both allow non-linear and interactive functionalities to be provided. But unlike most hypertext-related work, we are constrained to deal with standards for exchange, longevity, and regulation reasons. |
Our approach for evolving towards a hypermedia database |
Evolving towards the HyTime exchange standard |
| The SGML exchange model offers limited hypermedia features which are not sufficient for satisfying all our requirements. So we have chosen to evolve towards the HyTime exchange model, which is an ISO exchange standard, fully upward-compatible with the SGML standard. Based on the SGML syntax, HyTime introduces a new concept: the Architectural Form. An Architectural Form is a set of rules whose semantics are used to specify DTDs. Architectural Forms allow hypermedia features (hyperlinks, etc.) to be modeled as well as time-based features (scheduling, etc.). A HyTime document therefore consists of an SGML document, some elements of which have standardized semantics. |
Our strategy for evolving towards a hypermedia database model |
| The SGML database model briefly recalled above only represents the SGML tree structuring constructs, which may be qualified as syntactical constructs. This model consists of a direct mapping of the SGML tree meta-model; exchange and database models are then isomorphic. As a consequence, for each SGML document to be imported into the database, this model may be instantiated as the document is being parsed, i.e. read sequentially. In the same way, pre-order traversal of the tree model allows any marked-up source document to be recreated. |
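The recreation of a marked-up document by tree traversal can be sketched as follows. This is a simplified illustration under stated assumptions: the `Node` class and the `serialize` function are hypothetical, every element is written with explicit start and end tags, and markup minimization, entities and declarations are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """Hypothetical tree node: element name, attributes, ordered children."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Node", str]] = field(default_factory=list)


def serialize(node):
    """Recreate markup by a depth-first traversal in document order.

    The start tag is emitted when the node is first visited, then each
    child is visited in order, then the end tag is emitted.
    """
    attrs = "".join(f' {key}="{value}"' for key, value in node.attributes.items())
    parts = [f"<{node.name}{attrs}>"]
    for child in node.children:
        parts.append(child if isinstance(child, str) else serialize(child))
    parts.append(f"</{node.name}>")
    return "".join(parts)


doc = Node("SECTION", {"ID": "s1"}, [
    Node("TITLE", {}, ["Overview"]),
    Node("PARA", {}, ["Text."]),
])
markup = serialize(doc)
```

Since the tree stores children in document order, the traversal yields the elements in the same order in which a parser would have read them.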
| But, as far as hypermedia features (whether in SGML or HyTime standard) are concerned, semantic concepts are added to some syntactical constructs. And these semantic concepts have to be largely known and used within the database for providing rich hypermedia capabilities (browsing, ...). The database model therefore not only consists of a mapping of the exchange model syntactical constructs but has to be semantic-aware. |
| Our strategy for evolving towards a hypermedia database model therefore consists of: |
| partitioning the database model into two layers: |
|
| defining the database hypermedia model in two steps: |
|
|
The Hypermedia database model principles |
First step: taking into account SGML hypermedia features |
![]() |
![]() |
![]() |
|
Second step: HyTime hyperlink features modeling |
![]() |
![]() |
|
|
An SGML/HyTime Document Management System prototype |
| Here, we present the architecture of the Database Management System we are prototyping and show how the applications are connected to the SGML/HyTime database layer we suggested. This database layer is based on the O2 ODBMS. |
Document Loading |
| The SGML schema layer is populated at parsing time. When an SGML document is loaded, an SGML parser returns to the ODBMS a sequence of information corresponding to the structure and content of the document. Numerous objects are instantiated for the declaration, DTD and the instance. |
| SGML hypermedia constructs are managed too. External entity sharing is managed. However, special attention must be paid to cases in which SGML entities contain partial tagging that is completed by the document referencing them. Each useful SGML reference is converted into an OID (Object IDentifier). Thus, the ODBMS provides the necessary functionalities such as object sharing, object locking or deep copies of objects in case of inconsistent modifications of shared entities. An SGML document manager, based on a catalog manager, manages system and public identifiers. Then, the ID/IDREF links can be translated into OIDs by running a specific method. |
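The translation of ID/IDREF links into OIDs can be pictured with the following minimal sketch. It is not the method of the prototype: `StoredElement`, `resolve_idrefs` and the integer OIDs are hypothetical, and the attributes carrying ID and IDREF values are assumed to be literally named `ID` and `IDREF`, whereas in SGML these are declared value types attached to arbitrary attribute names in the DTD.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional

_next_oid = count(1)  # stand-in for the identifiers an ODBMS would allocate


@dataclass
class StoredElement:
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["StoredElement"] = field(default_factory=list)
    oid: int = field(default_factory=lambda: next(_next_oid))
    target_oid: Optional[int] = None  # filled in when an IDREF is resolved


def walk(node):
    """Yield the node and all its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def resolve_idrefs(root, id_attr="ID", idref_attr="IDREF"):
    """Replace symbolic IDREF values by the OID of the referenced element.

    Returns the list of IDREF values for which no matching ID was found.
    """
    id_table = {
        node.attributes[id_attr]: node.oid
        for node in walk(root)
        if id_attr in node.attributes
    }
    unresolved = []
    for node in walk(root):
        ref = node.attributes.get(idref_attr)
        if ref is None:
            continue
        if ref in id_table:
            node.target_oid = id_table[ref]
        else:
            unresolved.append(ref)
    return unresolved


figure = StoredElement("FIGURE", {"ID": "f1"})
good_ref = StoredElement("XREF", {"IDREF": "f1"})
bad_ref = StoredElement("XREF", {"IDREF": "missing"})
root = StoredElement("DOC", {}, [figure, good_ref, bad_ref])
unresolved = resolve_idrefs(root)
```

Running the resolution as a separate pass after loading means that forward references, where the IDREF appears before the element carrying the ID, need no special treatment.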
| The HyTime processing is run in a specific pass. It is in charge of resolving the locators and the links. The HyTime processing methods populate the BOS (Bounded Object Set) according to a BOS level associated with the documents. |
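The effect of a BOS level can be sketched as a depth-limited traversal of entity references starting from the hub document. In this illustration, the level is taken as the maximum number of reference hops from the hub; this reading, the `bounded_object_set` function and the dictionary of references are assumptions made for the example, and the exact counting convention of the HyTime standard is not reproduced here.

```python
from collections import deque


def bounded_object_set(hub, references, max_hops):
    """Collect the entities reachable from the hub within max_hops references.

    `references` maps an entity name to the list of entity names it references
    (hypothetical representation). The result is listed in breadth-first order,
    hub first, with each entity appearing once.
    """
    bos = [hub]
    seen = {hub}
    queue = deque([(hub, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_hops:
            continue  # entities at the limit are kept but not expanded
        for target in references.get(entity, []):
            if target not in seen:
                seen.add(target)
                bos.append(target)
                queue.append((target, depth + 1))
    return bos


refs = {"hub": ["a", "b"], "a": ["c"], "c": ["d"]}
level_one = bounded_object_set("hub", refs, 1)
level_two = bounded_object_set("hub", refs, 2)
```

Bounding the set in this way keeps link and locator resolution confined to a known group of documents instead of following references indefinitely.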
Information Access Interface |
| We have developed a first level of applications. Quick developments were possible because of the object modularity and the integrated tools offered with the ODBMS. |
| As far as information access is concerned, we have developed two kinds of navigation and query interfaces. Both use Internet/intranet technologies, but in two different ways. The functional architecture of the HyO2 prototype is described in the corresponding figure. |
First kind of interface |
| This interface consists of two combined applications. |
| The first application is a navigation application. It enables navigation through the database object composition. Today, this navigation starts from the persistent document object set. Then, the tree structure of each chosen document is interactively built and displayed. Inter-document links (corresponding to IDREF attributes) are traversed as well as anchoring links (corresponding to entity references). |
| This navigation interface is developed using HTML and JAVA technologies. Database object composition is mapped into HTML documents which are interactively displayed using the Netscape HTML browser. Non-SGML objects referenced or anchored in an SGML document are displayed using Netscape plug-in facilities. |
| We are now enhancing this interface in order to enable navigation through the hypermedia network associated to HyTime hyperdocuments stored within the database. This interface will be based on navigating through the database objects related to the schema semantic layer. |
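The mapping of an object composition into HTML can be illustrated by the following minimal sketch, which renders a tree as nested lists. The `to_html` function, the tuple representation of nodes and the `sgml-element` class name are hypothetical; the actual pages generated by the prototype are not described in this paper.

```python
from html import escape


def to_html(node):
    """Render a (name, children) tree as nested HTML lists.

    Children are either text strings or further (name, children) tuples.
    Text and element names are escaped so they display literally.
    """
    name, children = node
    items = []
    for child in children:
        if isinstance(child, str):
            items.append(f"<li>{escape(child)}</li>")
        else:
            items.append(f"<li>{to_html(child)}</li>")
    body = "".join(items)
    return f'<span class="sgml-element">{escape(name)}</span><ul>{body}</ul>'


page = to_html(("SECTION", [("TITLE", ["Overview"]), ("PARA", ["a < b"])]))
```

Generating the page from the stored tree on each request, rather than storing HTML, keeps the displayed structure in step with the database content.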
| The second application is an SGML/HyTime query interface. It offers query facilities based on SDQL, partially derived from DSSSLQuery and HyQ. This query interface runs on top of OQL, the database query language. It enables filters based on SGML/HyTime structure and text content to be applied to the database. This access is rather reserved for specialists who know the SGML/HyTime document structure. |
| The navigation and query interfaces are combined so that it is possible to navigate from a query result. |
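The kind of filter combining structure and text content can be pictured with the short sketch below. It is written directly in Python rather than in SDQL or OQL, whose syntax is not reproduced here; `Elem`, `text_of` and `select` are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Elem:
    name: str
    children: List[Union["Elem", str]] = field(default_factory=list)


def text_of(node):
    """Concatenate all character data found below the node."""
    return "".join(
        child if isinstance(child, str) else text_of(child)
        for child in node.children
    )


def select(root, name, contains=""):
    """Return, in document order, the elements called `name` whose text
    contains the given string (case-insensitive)."""
    hits = []

    def visit(node):
        if node.name == name and contains.lower() in text_of(node).lower():
            hits.append(node)
        for child in node.children:
            if isinstance(child, Elem):
                visit(child)

    visit(root)
    return hits


doc = Elem("CHAPTER", [
    Elem("PARA", ["Check the ", Elem("EMPH", ["hydraulic"]), " pressure."]),
    Elem("PARA", ["Close the access panel."]),
])
hits = select(doc, "PARA", "Hydraulic")
```

The elements returned by such a filter can then serve as starting points for navigation, which is how the two applications are combined.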
Second kind of interface |
| Lastly, an HTML interface giving access to the SGML/HyTime documents, assisted by a full-text language, has been developed. |
| This interface is designed for end users who are not familiar at all with SGML. The structure is almost completely hidden behind the HTML/Java interface. Users access the documents by using the Topic full-text query language. The application was developed on top of the Topic API in order to customize the query interface and to map the SGML tree structure to the nested text boxes managed by the full-text engine. The users can choose between 5 ways to build queries: |
|
| The result of a query is presented as a table of documents, and the user can click on a document in the table to display the document body with its highlights. |
| Moreover, the HyTime links anchored to the displayed document are active in a frame on the left side of the displayed document. So the user can navigate through the web of HyTime links hidden behind a graphic representation looking like a table of contents. |
![]() |
![]() |
![]() |
![]() |
Conclusion |
| At this stage, we have also extended the specification with the HyTime domain concerning hyperlink management. |
| Concerning the prototype, we have already developed a generic object schema of the SGML standard, an SGML document loader and some graphical tools to navigate through and visualize SGML documents stored in an O2 database. |
| We have good performance at loading time and excellent access results even on a large number of documents (many tens of thousands for EDF). However, more tests must be performed on large documents (many tens of megabytes for Aérospatiale). |
| We also wrote an object schema of HyTime (hyperlink and location address modules) and the first methods to perform HyTime processing. We are extending our applications for the end users. The first step was the dynamic generation of HTML/Java presentations from the HyTime documents stored in the database. |
| We obtained a very good foundation for an SGML/HyTime database. Concerning HyTime, we are convinced that the HyTime concepts fulfil our hypermedia needs, but we are still waiting for the HyTime tools to come on to the market. |
| In accordance with the HyTime functionalities, we must validate the complete HyTime specification on our prototype. We will use the HyTime concepts for versioning management and study how to map a "HyTime versioning DTD" to the ODBMS's versioning management module. We have yet to implement variant management. We aim at using or even extending this versioning management module for this implementation. |
| On the interface side, we offer hypertext-like navigation access, but we must work on an interface for end users which hides the SGML/HyTime syntax and offers a query language based on SDQL, partially derived from DSSSLQuery and HyQ. This new language will run on top of OQL, the database query language. |
BIBLIOGRAPHY |