
 Biezunski  Michel 
 

A Topic Map for SGML 97 Proceedings, A new SGML animal

 

Abstract:

 This paper explains what a Topic Map is and describes how we built one for the current CD-ROM.
 

Introduction

  Information can be retrieved within individual printed documents thanks to a number of devices invented for that purpose, such as tables of contents, indexes, glossaries, and cross-references. Catalogs, thesauri, and bibliographies are the tools used for browsing among collections of documents. Topic Maps are the standardized electronic solution ... and they are based on SGML.
 

Topic Maps: a new SGML animal

 Topic Maps are a standard representation of navigational information, intended for interchanging devices such as indexes, thesauri, and glossaries over sets of heterogeneous documents (structured or not). They can be thought of as the equivalent of a neutral database schema that allows users to preserve the value they have added to their information repositories through semantic navigation.
  Typical users of Topic Maps include SGML users who need to maintain links across living documents while avoiding the overhead of maintaining huge amounts of data as systems evolve. Note that Topic Maps can also be used if the source documents are not in SGML.
  The Topic Map architecture rests on the possibility, standardized by HyTime, of separating the semantic information of a link from the addresses of its (possibly multiple) anchors. The architecture will be updated to take into account the new standard formalisms being defined for links. An XML representation is planned as well.
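
 The separation described above can be pictured with a small Python sketch. It is only an illustration and does not reproduce HyTime syntax or the Topic Map architecture itself: the class names `Anchor` and `TopicLink`, the function `readdress`, and the document identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Anchor:
    """Where an occurrence lives: a document plus a location inside it."""
    document: str   # hypothetical document identifier
    location: str   # hypothetical address, e.g. an element ID


@dataclass
class TopicLink:
    """What the link means, kept apart from where its anchors are."""
    topic_type: str
    title: str
    anchors: List[Anchor] = field(default_factory=list)


def readdress(link: TopicLink, old: Anchor, new: Anchor) -> None:
    """Update one anchor address without touching the link's semantics."""
    link.anchors = [new if a == old else a for a in link.anchors]


# One link, two anchors in two (invented) documents.
link = TopicLink("standard", "HyTime", [
    Anchor("paper-a.sgm", "sec2"),
    Anchor("paper-b.sgm", "intro"),
])

# A section moves: only the address changes, the meaning stays the same.
readdress(link, Anchor("paper-a.sgm", "sec2"), Anchor("paper-a.sgm", "sec3"))
print(link.title, [f"{a.document}#{a.location}" for a in link.anchors])
```

 Keeping the title and type in one place and the addresses in another is what allows a document to evolve without the semantic part of the link being rewritten.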
  The Topic Navigation Maps Project is carried out under the auspices of ISO WG8, the group responsible for SGML and related standards (Convenor: James Mason). The co-editors of this project are Martin Bryan (UK) and Michel Biezunski (France).
  This work continues a project initiated in 1992, first within the Davenport Group, where it was known as SOFABED (Standard Open Formal Architecture for Browsable Electronic Documents), and since 1993 within CApH (Conventions for the Application of HyTime), an activity sponsored by the GCA and chaired by Steven R. Newcomb (TechnoTeacher, Inc., USA).
 

An experimental Topic Map for the Proceedings of the SGML Europe 97 Conference

 Conference Proceedings are a good candidate for showing the value of Topic Maps. They are made up of a series of papers written by a number of different people. Readers may wish to get quick access to subjects of interest without having to go through the whole content. It is also possible to derive interesting navigational strategies from the very content of the DTD, such as a "geographic kind of navigation". By navigating the enclosed Topic Map, it is possible, for example, to find immediately where the companies to which the authors belong are located.
 This Topic Map has been built to show some navigational possibilities that can be applied to a set of SGML documents by exploiting the existing DTD. The potential of Topic Maps is greater, as they aim to organize navigation within sets of information objects, some of which might not be structured at all, because Topic Maps are superimposed on a set of existing documents. Furthermore, multilingual navigation is now under study as part of the Topic Navigation Maps Standard, to enable navigation not only by choosing a given language, but also by expressing in different languages the constructs themselves, such as the SGML generic identifiers or attribute values used for creating and navigating Topic Maps.
 For the current project, we have decided to create topics that are our guess at the terms one might use to navigate through each paper. Furthermore, we have created a "network" of interconnected information among people, companies, and geographic locations.
 

Topic Map Tool under construction

  This Topic Map has been created using the EnLIGHTeN application, currently being developed at High Text. The EnLIGHTeN project started in 1995 and was first designed as an illustration of what Topic Map navigation might look like. After a while it became clear that it was becoming a tool that could be useful in a variety of situations, and we decided to focus on a user interface to a link database that now enables the creation and maintenance of Topic Maps without requiring any previous knowledge of SGML or HyTime. EnLIGHTeN shows that users can focus on the semantics of the information while leaving the tedious addressing tasks to machine processing. It is most useful where many cross-references have to be maintained in an environment where documents are constantly evolving.
 EnLIGHTeN is used today for modeling topic map applications. It has been designed to be modular and extensible, and will become part of a variety of existing applications or applications under development.
 

How we have created this topic map

 To create a Topic Map, it is necessary to identify topic types, topic titles, and the role that each topic plays at a given occurrence. Topics can then be related to other topics by means of various relations that can be created at will.
  In technical terms, a Topic is defined as an SGML element conforming to the HyTime architectural form for links. It is itself an SGML architectural form (as defined in ISO/IEC 10744).
 EnLIGHTeN is a Topic Map creation tool that does not require a Topic Map DTD to be present before one starts defining the topics and the relationships. On the contrary, the DTD is a document that is produced automatically along with the instance by collecting the topic annotations created by the authors of the Topic Map.
 The basis of creating a Topic Map is identifying topics throughout the documents. Once topics have been identified, they are automatically grouped together, and each occurrence is linked to the others. This approach ultimately fulfills the same task as cross-references, but instead of saying "see also somewhere else", we say: "this passage is about a given topic". The software finds the addresses, not the user. If an address changes, the whole topic map is recalculated automatically.
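
 As a rough illustration of this grouping step, the Python sketch below collects annotations of the form "this place is about this topic" and derives the cross-references from them. It is not EnLIGHTeN's implementation: the tuple layout (topic, role, address), the function names, and the sample addresses are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each annotation says "this place is about this topic, in this role".
# The tuple layout (topic, role, address) is an assumption for illustration.
Annotation = Tuple[str, str, str]


def build_occurrence_index(
        annotations: List[Annotation]) -> Dict[str, List[Tuple[str, str]]]:
    """Group every annotated occurrence under its topic."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for topic, role, address in annotations:
        index[topic].append((role, address))
    return dict(index)


def see_also(index: Dict[str, List[Tuple[str, str]]],
             topic: str, here: str) -> List[str]:
    """Other addresses of the same topic: the implicit cross-references."""
    return [addr for _, addr in index.get(topic, []) if addr != here]


annotations = [
    ("Topic Map", "definition", "paper1#abstract"),
    ("Topic Map", "mention", "paper2#sec3"),
    ("HyTime", "mention", "paper1#sec2"),
]
index = build_occurrence_index(annotations)
print(see_also(index, "Topic Map", "paper1#abstract"))

# If an address changes, only the annotation changes; the index is rebuilt.
annotations[1] = ("Topic Map", "mention", "paper2#sec4")
index = build_occurrence_index(annotations)
print(see_also(index, "Topic Map", "paper1#abstract"))
```

 The user never writes a link from one occurrence to another; the links fall out of the grouping, which is why rebuilding the index after a change is enough.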
  The second task is to create a set of relations that are independent of the documents and that are applied to a given document set. For example, the fact that Spain is a country belonging to Europe is independent of the occurrences where it applies. Creating relationships between topics is therefore like creating an independent knowledge base that is maintained and updated separately. Being able to reuse, without any effort, the work a company has already done to add value to its information set is a major reason to commit to Topic Maps.
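
 The independence of relations from documents can be sketched as follows. This is a hypothetical illustration in Python, not the Topic Map model itself: the relation label "located-in", the function names, and the sample occurrence addresses are invented.

```python
from typing import Dict, List, Set, Tuple

# Relations between topics, stored apart from any document.
# The relation name "located-in" is a hypothetical label.
RELATIONS: List[Tuple[str, str, str]] = [
    ("Spain", "located-in", "Europe"),
    ("Barcelona", "located-in", "Spain"),
]


def containers(topic: str, relations=RELATIONS) -> Set[str]:
    """All topics that contain `topic`, following located-in transitively."""
    found: Set[str] = set()
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for subject, name, obj in relations:
            if subject == current and name == "located-in" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def occurrences_within(region: str,
                       occurrences: Dict[str, List[str]]) -> List[str]:
    """Apply the relations to one document set: addresses about `region`."""
    hits: List[str] = []
    for topic, addresses in occurrences.items():
        if topic == region or region in containers(topic):
            hits.extend(addresses)
    return sorted(hits)


# One document set; another set could reuse RELATIONS unchanged.
proceedings = {"Barcelona": ["paper4#address"], "Spain": ["paper7#sec1"]}
print(occurrences_within("Europe", proceedings))
```

 The relation list knows nothing about papers or addresses, so the same knowledge base can be applied to a different document set simply by passing a different occurrence table.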
 Here are the steps we have followed:
 Deriving information from the DTD
 
 
  • Identify elements that can serve as topic types. For example, "City" is such an element: it is part of the address of an author.
  • Get the information provided by the authors. A special element, called keyword, has been provided in the DTD for this purpose.
  • When necessary, group elements to create a topic type. The name of an author is made of two elements, fname and surname, and the topic "author" used in the Topic Map results from the concatenation of the two elements "surname" and "fname".
  • Derive topics from the fact that terms are defined or acronyms are expanded. Once this has been done, the other occurrences of a given term can be related to its definition. In that case, "definition" is a specific role played by the occurrence in which the term is mentioned. Roles played by occurrences are part of the descriptive possibilities offered by the Topic Map Model.
  • Create relationships between topics. Several elements in the DTD can be interpreted without ambiguity as a relation. The fact that a company is located in a city can be expressed as a relation between the company (here described with the "affil" element) and the city. The benefit of doing so is that it becomes possible to find at a glance who is working in a given company.
  • Some "hand work" might be necessary to enhance navigational possibilities. This work resembles the indexing done by the author of a book or a paper with a word processing application.
  • Collect the topics provided by each author and run an indexing engine to find other occurrences of the same topics in the other documents.
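
 Some of the steps above can be sketched in Python. The sketch makes several assumptions: it uses a well-formed XML stand-in parsed with the standard library rather than the SGML source used for the proceedings, the wrapper elements `paper`, `author`, and `address` and all values are invented, the element name is written in lowercase as `city`, and the relation labels `works-for` and `located-in` are hypothetical. Only the element names fname, surname, affil, and keyword are taken from the text.

```python
import xml.etree.ElementTree as ET

# A well-formed stand-in for one paper's front matter. The element names
# fname, surname, affil, city and keyword come from the text; the wrapper
# elements and all values are invented for illustration.
SAMPLE = """
<paper id="p1">
  <author>
    <fname>Jane</fname><surname>Doe</surname>
    <affil>Example Corp</affil>
    <address><city>Paris</city></address>
  </author>
  <keyword>topic maps</keyword>
  <keyword>navigation</keyword>
</paper>
"""


def derive_topics(xml_text):
    """Return (topics, relations) derived from the element structure."""
    root = ET.fromstring(xml_text)
    topics, relations = [], []
    for author in root.iter("author"):
        # Group two elements into one "author" topic: surname, then fname.
        name = f"{author.findtext('surname')} {author.findtext('fname')}"
        company = author.findtext("affil")
        city = author.findtext(".//city")
        topics += [("author", name), ("company", company), ("city", city)]
        # Elements that can be read unambiguously as relations.
        relations += [(name, "works-for", company),
                      (company, "located-in", city)]
    # Keywords supplied by the authors become topics directly.
    topics += [("keyword", k.text) for k in root.iter("keyword")]
    return topics, relations


topics, relations = derive_topics(SAMPLE)
for entry in topics + relations:
    print(entry)
```

 The hand work and the indexing pass over the other documents are left out; the point is only that topic types and relations can be read off the element structure without extra markup.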

    Output Options

     As EnLIGHTeN is an SGML application, it requires documents to be in SGML at some stage. If, as in this application, the source documents are provided in SGML, they must be fully compliant, i.e. parsable SGML. A planned extension of EnLIGHTeN will also work with XML documents.
     HTML has been chosen as the output format for displaying documents in this application. EnLIGHTeN has built-in features for creating a set of HTML documents that group the information in the Topic Map: tables of contents, indexes, lists of relations, dictionaries, etc.
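
     To give an idea of what such a generated page involves, here is a minimal Python sketch that renders a topic index as plain HTML links. It does not describe EnLIGHTeN's actual output: the function `render_index`, the page layout, and the file names are assumptions.

```python
from html import escape
from typing import Dict, List


def render_index(title: str, index: Dict[str, List[str]]) -> str:
    """Render a topic index as a small HTML page of plain links."""
    lines = [f"<html><head><title>{escape(title)}</title></head><body>",
             f"<h1>{escape(title)}</h1>", "<ul>"]
    for topic in sorted(index):
        # One numbered link per occurrence of the topic.
        links = ", ".join(
            f'<a href="{escape(href, quote=True)}">{n}</a>'
            for n, href in enumerate(index[topic], start=1))
        lines.append(f"<li>{escape(topic)}: {links}</li>")
    lines += ["</ul>", "</body></html>"]
    return "\n".join(lines)


# Hypothetical topics and target files.
page = render_index("Index of topics", {
    "SGML": ["paper1.html#sec1", "paper3.html#intro"],
    "DSSSL & Jade": ["paper2.html#sec2"],
})
print(page)
```

     Because the page is produced from the topic data, it can be regenerated whenever the addresses change rather than edited by hand.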
      The transformation of the source SGML documents to HTML is performed using James Clark's DSSSL Engine (Jade). A DSSSL specification has been written to format the source SGML documents into HTML. The links needed to navigate from the documents to the topic map screens and back are added by EnLIGHTeN.
     Other output formats are available or under development. Any future changes to the HTML specification will be taken into account by changing the DSSSL spec that is used for outputting the documents.
     

    When several standards work together

     EnLIGHTeN illustrates what can be done with the Topic Navigation Maps standard and shows how SGML, DSSSL, HyTime, HTML (and soon XML) can be made to work together, drawing on the respective advantages of each standard in the following way:
     
  • SGML is used independently at two levels: horizontal and vertical. The horizontal use of SGML is the traditional one, for structuring documents in the base repository. Here it is only an optional feature: the same kind of processing is possible even for documents that are not structured in SGML. The vertical use of SGML means that SGML can be used for describing, from outside, the addresses of the anchored topics and the links that are superimposed on these anchors.
  • DTDs are not mandatory, as Topic Maps are processed externally to the documents and do not require knowledge of their actual structure. This means, for example, that an anchor can be an element such as a paragraph, or a bigger one such as a section, or even a whole document, and nothing prevents an element at any level of granularity from being considered the anchor of a topic link. In other words, whether an element is an anchor has nothing to do with its hierarchical position within an instance. The XML approach, which does not require a DTD in all cases, is therefore perfectly suited to this context.
  • HyTime contains architectural forms that describe independent links and that are well suited to this approach. The current version of EnLIGHTeN outputs a HyTime document containing the Topic Map, compatible with the 1992 version of HyTime. The changes that have occurred in the HyTime specification will result in syntactical changes in the HyTime output that will be used for future interchange. They will not change the way the application behaves. The future Topic Navigation Maps standard DTD (or rather meta-DTD) will incorporate the changes, to keep pace with the evolution of the other standards.
  • The independence between formatted output and the documents themselves is a consequence of DSSSL/SGML and represents an important advantage for documents that have to be maintained over the long term. The availability of the DSSSL standard and of James Clark's DSSSL Engine has proven to be an important asset for this type of application. Being able to isolate the formatted output, even if it appears as a specific SGML-like output (here, HTML), solves the problem of browser independence. The same kind of application could work just as well with other browsers' specifications.
  • HTML is useful here because it is equipped for triggering link navigation, which makes it a good output even for complex links, provided the links are written by an automatic process such as the one provided by EnLIGHTeN. The choice has been made here to output all links as HTML links. Other solutions are possible, including Java applets, various dialog boxes, etc., but the simplest one (HTML only) seemed the best suited to Web output.

    Current opportunities

      "ISO 13250 -- Topic Navigation Maps" is planned to be published as a standard at the end of 1998. This is an open initiative, under the auspices of ISO. Any contribution, addendum, user requirement, is welcome. The architecture will be extended to support multilingual information objects.
     Specific Topic Maps are already being designed to enhance the maintainability and long-term evolution of document management. A methodology exists for the creation of Topic Maps. They have already been successfully tested for the energy industry, legal publishing, and the financial industry. New applications are planned. Other sectors where heavy document management is an issue are welcome: libraries, reference documents, and navigation in archives, among others.
     Industrial-strength software based on Topic Maps will open the way to a new generation of applications, in which relational or object databases with complex queries will be integrated with document creation software. In some implementations, SGML will become a resource for machine processing, while in others a fully developed user interface based on SGML tools will give users full control over their structured documents.
     Prototype applications are worth building now, because they may provide new requirements for the generic standard architecture, as well as a list of user requirements for the applications to be built. If you are interested in this approach, this is the best moment to join.
