Bottoms-Up, A Paradigm Shift   Table of contents   Indexes   SGML Template Driven Database Extraction: A New Approach to Report Generation

 Bryan  Martin 
 

CD 13250: SGML Applications - Topic Navigation Maps

 

ISO/IEC JTC1/SC18/WG8

 

Document Processing and Related Communication --

 

Document Description and Processing Languages

 

Information Processing -- SGML Applications -- Topic Navigation Maps

 

Scope

 This standard provides a mechanism, based on techniques defined in ISO/IEC 10744:1992, for identifying information objects that share a common topic. It can also be used to define the relationships between sets of related topics. This standard can, for example, be used to define:
 
  • tables of contents and subject indexes for individual documents, or related sets of documents
  • glossaries that can be shared by more than one document
  • the relationship between topics within a thesaurus
  • the relationships between multilingual thesauri, glossaries, etc.

    Purpose

     This standard provides facilities for creating, maintaining and interchanging topic-based navigational aids to large corpora of documents containing interrelated information. The standard distinguishes between two things: the highly concentrated and independent topic navigation maps (sets of relations between the topics covered in a given corpus), which are defined within this standard, and the addresses of relevant information within the corpora themselves, which are defined using facilities provided by ISO/IEC 10744, the Hypermedia/Time-based Structuring Language known as HyTime.
     Topic navigation maps can improve the accessibility of information by facilitating, and to some extent automating, the task of providing, and imposing editorial consistency and maintainability on, navigational resources. Topic navigation maps are designed to simplify groupware-supported production of data for which navigational aids such as indexes, glossaries, tables of contents, lists and catalogs need to be generated. Topic navigation maps can also be used to enhance the navigability of very large information bases.
     This standard provides an SGML architecture, defined according to the rules specified in the SGML Extended Facilities annex of ISO/IEC 10744, for creating and maintaining data that classifies information in documents according to topic, and classifies topics with respect to each other. The standard will help to increase consistency, and decrease redundancy, not only in navigational aids within documents, but also in navigational aids used with multiple documents, such as master indexes. The discipline that can be imposed by using the facilities provided in this standard will assist those who create and/or collect libraries of documents, and who wish to provide a given collection with a unified, consistent, and minimally redundant topic index.
     The Standard Generalized Markup Language (SGML) defined in ISO 8879:1986 allows all kinds of documents to become databases. By providing ways to navigate data stores so that parts of documents that are relevant to a particular topic can be found easily and organized rapidly by machine, this standard augments the suitability of SGML for electronic document interchange. The number and complexity of indexable topics, and of the relationships between them, greatly exceed the number and complexity of relations normally represented in traditional databases or, for that matter, in the kinds of indexes normally found in books. The number of topic relationships that might usefully be represented with respect to any reasonably large collection of documents is, in fact, for all practical purposes limitless. Moreover, even in archived documents, new kinds of topic relationships can be expected to appear from time to time. This standard, therefore, is specifically designed to allow multiple topic maps to be created over a period of time for any collection of data, and to allow different topic maps to be inter-related.
     Creating and maintaining indexes can be a difficult and expensive proposition. Many indexes are indexes in name only. All too often, even when an index is well thought out, well constructed, and useful, little thought is given to its maintainability. When the time comes to create an updated or corrected index, the original documentation for the topic architecture of the index is no longer available. Indeed, it may never have existed or have been consciously expressed in any abstract way. Even an index on which enormous maintenance effort has been expended can quite easily become self-inconsistent, especially when the size of the indexing task dictates that it must be a cooperative effort, or when there have been changes in the responsible personnel.
     An application-neutral, internationally understandable, rigorous, and yet flexible and open way to represent topical indexes, such as the one set forth in this standard, can help to make indexes easier to make, easier to maintain, and easier to use. Creating a topic navigation map is a complex task, similar to planning and constructing a building, involving myriad assumptions and artistic decisions. As new relationships are discovered and included as part of a topic architecture, the architecture changes. Many specialists may have to collaborate and contribute, over a number of years, to an evolving topic navigation map, which at any given time must unambiguously and comprehensibly govern all maintenance activities. Unless those who are adding and/or maintaining anchors have clear guidance, the instantiation of that topic navigation map -- the index itself -- may become unsound and unsafe.
     A topic navigation map defines both topics and the relations that they bear to one another. It must, therefore, permit:
     
  • any number of topics, which should be formally defined by those with knowledge of the subject matter,
  • any number of categories of topics, with subcategorization to any level,
  • any number of relations between topics, and
  • any number of categories of relations between topics, with subcategorization to any level
     
     to be represented, universally interchanged, processed, merged, and used for data navigation.
     An international standard for representing (among many other things) arbitrary relationships between arbitrary pieces of information, wherever they are in situ, exists in ISO/IEC 10744. This standard uses a HyTime-based approach for linking topics with information, and an SGML architecture is defined that can support applications that provide:
     
  • the ability for many experts in a given field of knowledge to share in, and jointly contribute to, the evolution of a common map of topic relationships in each given field of knowledge;
  • the ability to merge such maps, whenever multiple fields of knowledge must be used simultaneously, in such a way as to maximize the meaningful cross-connections between them; and
  • the ability to use such maps in a variety of ways for a variety of purposes, such as extracting printed and online indexes and glossaries for particular documents. Extracted indexes are able to reflect the relationships between topics and subtopics represented by maps of topic relationships, and are extractable automatically or semi-automatically from the map of topic relationships as part of a formatting, pre-formatting, and/or authoring process.
     
     Topic navigation maps are defined using TNM.SemanticAssignment-form elements whose roles are defined by the user, and TNM.TopicRelation-form elements that identify specific relations between topics. Categories of topics may be iteratively identified and described by linking suitable topics to other topics belonging to the category.
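     The requirements listed above (topics, categories of topics, relations, categories of relations, and merging) can be pictured with a small in-memory model. The following Python sketch is only an illustration and is not part of the standard: the class TopicMap, its method names, the rule that the first map's definition wins when two maps define the same topic, and the sample data are all assumptions made for this example. The standard itself represents this information as SGML elements, not as program objects.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicMap:
    """Hypothetical in-memory model of a topic navigation map (not the SGML form)."""
    topics: dict = field(default_factory=dict)          # topic id -> definition text
    topic_category: dict = field(default_factory=dict)  # topic id -> category name
    parents: dict = field(default_factory=dict)         # category -> parent category or None
    relations: set = field(default_factory=set)         # (relation category, source, target)

    def add_category(self, name: str, parent: Optional[str] = None) -> None:
        # Categories of topics and of relations share one hierarchy here, for brevity.
        self.parents[name] = parent

    def add_topic(self, topic_id: str, definition: str, category: str) -> None:
        self.topics[topic_id] = definition
        self.topic_category[topic_id] = category

    def relate(self, category: str, source: str, target: str) -> None:
        self.relations.add((category, source, target))

    def ancestors(self, category: str) -> list:
        """Walk up the category hierarchy, nearest parent first."""
        chain = []
        current = self.parents.get(category)
        while current is not None:
            chain.append(current)
            current = self.parents.get(current)
        return chain

    def merge(self, other: "TopicMap") -> "TopicMap":
        """Union of two maps. On a topic id clash, self's entry wins (an assumption)."""
        return TopicMap(
            topics={**other.topics, **self.topics},
            topic_category={**other.topic_category, **self.topic_category},
            parents={**other.parents, **self.parents},
            relations=self.relations | other.relations,
        )


# Invented sample data: two maps maintained by different groups of experts.
music = TopicMap()
music.add_category("genre")
music.add_category("opera", parent="genre")
music.add_category("associative")                        # a category of relations
music.add_category("composed-by", parent="associative")  # a subcategory of relations
music.add_topic("tosca", "An opera in three acts.", "opera")
music.relate("composed-by", "tosca", "puccini")

people = TopicMap()
people.add_category("person")
people.add_topic("puccini", "An Italian composer.", "person")

merged = music.merge(people)
print(sorted(merged.topics))
print(merged.ancestors("composed-by"))
```

     In this sketch the merged map carries the cross-connection between the two fields: the "composed-by" relation recorded in one map now points at a topic defined in the other.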
     A topic map is created by linking, using HyTime hyperlinks, several pieces of information about a topic through a semantic assignment. Each semantic assignment has an anchor role (anchrole) attribute that defines the relationship between a topic and the references that are made to it. The first anchor role identified by the anchrole attribute provides a formal definition of the topic. This notion of definition is very general: a definition can be any portion of information (no specific internal structure needed) that describes the information being pointed to.
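     The role of the anchrole attribute described above can be sketched as follows. This Python fragment is a loose illustration under stated assumptions, not the element definition from the standard: the class SemanticAssignment is a hypothetical stand-in for a TNM.SemanticAssignment-form element, the role names ("definition", "mention", "example") are invented user-defined roles, and the address strings are placeholders for HyTime locations. The only rule taken from the text is that the first anchor role supplies the definition of the topic.

```python
from dataclasses import dataclass


@dataclass
class SemanticAssignment:
    """Hypothetical stand-in for one TNM.SemanticAssignment-form element."""
    anchrole: list  # ordered, user-defined anchor role names
    anchors: list   # anchors[i] lists the addresses that play anchrole[i]

    def __post_init__(self) -> None:
        # Each anchor role needs exactly one group of anchors.
        if len(self.anchrole) != len(self.anchors):
            raise ValueError("anchrole and anchors must have the same length")
        if not self.anchrole:
            raise ValueError("at least one anchor role (the definition) is required")

    def definition(self) -> list:
        """Addresses playing the first anchor role, which defines the topic."""
        return self.anchors[0]

    def references(self) -> dict:
        """Map each remaining anchor role to the addresses that play it."""
        return dict(zip(self.anchrole[1:], self.anchors[1:]))


# Invented sample data: placeholder strings stand in for HyTime addresses.
sgml_topic = SemanticAssignment(
    anchrole=["definition", "mention", "example"],
    anchors=[
        ["glossary.sgm#sgml"],
        ["intro.sgm#p3", "intro.sgm#p9"],
        ["tutorial.sgm#ex1"],
    ],
)

print(sgml_topic.definition())
print(sgml_topic.references())
```

     Keeping the definition as an ordinary anchor, rather than as a specially structured field, mirrors the point made in the text that a definition can be any portion of information.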
