
 
 

Managing information networks with Topic Maps


 
Michel   Biezunski
  High Text
5 rue d'Alsace
Paris   75003  France
Phone: +33 1 42 05 93 15
Fax: +33 1 42 05 92 48
Email: michel@hightext.com Web: http://www.hightext.com
 
 

Abstract

 

Topic Navigation Maps is an international standard project (ISO 13250). It answers the need for improved retrieval of online information. Topic Maps enable users to define their own navigation strategies in a set of electronic documents, and can be used for maintaining living document repositories, including Web sites and Intranet applications. This presentation gives the major design principles and will include a concrete example of a Topic Map-based set of documents.
 
XLL (Extensible Linking Language)
CD (Committee Draft)
DIS (Draft International Standard)
WG (Working Group)
JTC (Joint Technical Committee)
SC (Subcommittee)
TNM (Topic Navigation Maps)
IEC (International Electrotechnical Commission)
ISO (International Organization for Standardization)
 
 

Why Topic Maps?

 
 

Answering the requirement of efficient information retrieval

 
The more information is produced, the more urgent it becomes to take advantage of powerful methods for retrieving it.
 
Traditional tools used for navigating printed material, such as indexes, cross-references, glossaries, catalogs, etc., do not fit well in the world of online information.
 
 

Structuring unstructured information sources

 

On the one hand, databases and XML/SGML provide an efficient way of structuring information sources. They require pre-existing models (DTDs, schemas), and tools exist that help navigation once the structure is known.
 
On the other hand, tools exist that help throw some light on the infoglut of unstructured information such as the World Wide Web: powerful indexing engines, and technologies that cluster content according to semantics derived from language.
 
Combining object/relational database technologies with powerful linguistic tools might be "the" answer. But the two rely on opposite types of user interaction. In a structured information environment, the user is so much in control that practically nothing can be done without first knowing what the structure is. With unstructured information, the user relies on the results provided by the tools and usually has little or no power to change the outcome of this automatic processing. Such power would be useless when browsing other people's documents, but it can be extremely useful when preparing documents for publication or for internal use (intranets, for example).
 
Topic Maps are a hybrid solution that can help take advantage of both worlds: they are views superimposed from outside on information repositories that may or may not have been previously structured. These views provide a user-structured model for navigating the information. The two approaches described above can therefore be seen as a front-end step in building topic maps. Instead of being a final result, they now appear as just the first step in a longer process of making information easily accessible.
 
Topic Maps are user-editable views on heterogeneous information repositories. If the information repositories have been previously structured, and the existing structure is acceptable, then building a Topic Map becomes a nearly 100% automatic process. If they have not been previously structured, then building a useful topic map can take more time. But in the end, the result is the same.
 
Because Topic Maps do not belong to the information repositories, there can be as many topic maps as desired on the same set of information. Topic maps can therefore be used to realize the objective that SGML was primarily designed for: "one source, multiple outputs". The difference is that Topic Maps do not require documents to have been produced in SGML from the start.
 
 

Topic Maps are an application of SGML, HyTime, XML and XLL

 
 

SGML

 

The interchange form of Topic Maps is formally expressed in SGML. SGML is considered here as a syntax, which can be understood, parsed, validated and interchanged on many systems. However, this differs from classical SGML in that, even though a topic map can be edited by hand in an SGML editor, doing so is not very comfortable: it is almost like working on assembly code with a text editor.
 
Philosophically speaking, Topic Maps are an SGML application, because they describe in a structural form the semantics of the information to which they apply. They are actually a tagging mechanism.
 
 

HyTime

 
Topic Maps are an application of HyTime because they use its powerful model of universal addressing and independent linking. Any information chunk can be addressed, whatever format it is encoded in, and links are used to describe the semantics of the relation. The fact that addressing and linking are considered two separate issues is at the core of the Topic Map model. Topic Maps were designed over three years in a GCA-sponsored committee called CApH (Conventions for the Application of HyTime), chaired by Steven R. Newcomb, a major player in the HyTime community.
 
Formally, Topic Maps are expressed as a set of architectural forms. The standard does not require a particular DTD to be used. It contains a template that serves as a recipe for adding attributes to elements in any DTD, so that the forms can fit a specific environment.
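
The idea of recognizing architectural forms through attributes, whatever the element names of the local DTD, can be illustrated with a minimal sketch. It is not taken from the standard: the attribute name "tmform", the form names and the two sample documents are all hypothetical, and Python's standard ElementTree parser stands in for an SGML/HyTime engine.

```python
import xml.etree.ElementTree as ET

# Two documents that use different element names. The hypothetical attribute
# "tmform" states which architectural form each element conforms to.
DOC_A = '<index><entry tmform="topic">art museum</entry></index>'
DOC_B = ('<glossary><term tmform="topic">musée d\'art</term>'
         '<see tmform="topic-relationship"/></glossary>')


def elements_by_form(text, form, attr="tmform"):
    """Return the tag names of all elements declared to conform to `form`."""
    root = ET.fromstring(text)
    return [el.tag for el in root.iter() if el.get(attr) == form]


topics_in_a = elements_by_form(DOC_A, "topic")
topics_in_b = elements_by_form(DOC_B, "topic")
```

In this sketch an "entry" element in one document and a "term" element in the other are both treated as topics, because processing keys on the attribute rather than on the element name.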
 
 

XML and XLL

 
Topic Maps fit naturally into the XML paradigm. Because they apply to SGML, they also apply to XML, which is a simplified form of SGML. The constraints that XML places on the representation of SGML documents make Topic Map software easier to build. In that regard, Topic Maps benefit from the advantages of XML over full SGML.
 
But there is more: when applied to SGML sources, Topic Maps do not rely on DTDs, because they address any element, or any part of an element, directly in the instances. The fact that XML documents will not necessarily contain DTDs is perfectly appropriate in this context.
 
The linking part of XML, called XLL (Extensible Linking Language), is made of two parts: an addressing language (X-Pointers) and link constructs. There are two link constructs in XLL: simple links and extended links. Extended links apply the same concept of independent linking as HyTime does. They contain the attributes necessary to fully describe topic map information. XLL extended links are therefore a good candidate for representing topic map interchange information.
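
As an illustration of addressing directly in the instance, here is a minimal sketch that selects elements in a well-formed document carrying no DTD. The sample document and the helper name `address` are hypothetical, and the path expressions are the limited subset supported by Python's standard ElementTree module, used here only as a stand-in for HyTime or X-Pointer addressing.

```python
import xml.etree.ElementTree as ET

# A well-formed XML instance with no DOCTYPE declaration (hypothetical content).
SOURCE = """<guide>
  <chapter id="c1"><title>Art museums</title><para>The Louvre is in Paris.</para></chapter>
  <chapter id="c2"><title>Parks</title><para>The Tuileries garden.</para></chapter>
</guide>"""


def address(root, path):
    """Return the elements selected by an ElementTree path expression."""
    return root.findall(path)


root = ET.fromstring(SOURCE)

# Address by position in the instance: the paragraph of the first chapter.
by_position = [el.text for el in address(root, "./chapter[1]/para")]

# Address by an ID-like attribute value; no DTD is needed to resolve it here.
by_id = [el.text for el in address(root, "./chapter[@id='c2']/title")]
```

Both selections work on the instance alone, which is the property the paragraph above relies on.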
 
 

The ISO 13250 (Topic Navigation Maps) standard

 
ISO 13250 is a project conducted under the auspices of the working group ISO/IEC JTC 1/WG 4 (formerly ISO/IEC JTC 1/SC 18/WG 4), which is also in charge of SGML, DSSSL, HyTime and related standards. The co-editors of this standard are Martin Bryan and Michel Biezunski.
 
The first version of the Committee Draft, in 1996, came from the then-current specification from CApH. Work is planned at the ISO meeting in Paris in May, preceding the SGML/XML conference, and the results of this work will be presented during the conference.
 
Final CD and DIS are planned for the fall and winter of 1998, and final publication should occur in February 1999.
 
 

The Topic Map Architecture

 
Three architectural forms are used, all made of links:
  1. Topic
  2. Topic Relationship
  3. Filter
 
 
 

Topics

 
Topics are links that point to portions of information that are about a given subject. A set of anchors is the "title" of the topic. A topic may have zero, one or several titles. Topics with one title are easy to understand. Several titles can be used because some topics can be described equally well by equivalent titles: for example, "art museum" and "museum of art" can be two titles representing the same topic. A topic can also be represented by titles in different languages: "art museum" and "musée d'art". (In that case, filters can point to each title to qualify the language.) Topics with no title may seem odd, but one category of them is frequently used: cross-references, which are links between two anchors (source and target), can be seen as topics without titles. Adding a title to a cross-reference, and thereby upgrading it to a topic, can make the maintenance of complex document repositories much simpler.
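
The description above can be made concrete with a small data-structure sketch. It is only an illustration under simplifying assumptions: the class and field names (`Anchor`, `Topic`, `titles`, `occurrences`), the helper `upgrade` and the sample addresses are hypothetical, and titles are held as plain strings rather than as anchors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Anchor:
    """An address of a chunk of information: a resource plus a locator."""
    resource: str  # e.g. a file name or URL (hypothetical)
    locator: str   # e.g. an element path or ID (hypothetical)


@dataclass
class Topic:
    """A link gathering the titles and the occurrences of one subject."""
    titles: list = field(default_factory=list)       # zero, one or several
    occurrences: list = field(default_factory=list)  # anchors about the subject


# One topic with equivalent titles, including a title in another language.
museum = Topic(
    titles=["art museum", "museum of art", "musée d'art"],
    occurrences=[Anchor("guide.xml", "chapter[1]/para")],
)

# A cross-reference modelled as a topic without a title: source and target.
xref = Topic(occurrences=[Anchor("guide.xml", "chapter[2]/para"),
                          Anchor("guide.xml", "chapter[1]")])


def upgrade(topic, title):
    """Add a title to a title-less topic, turning a cross-reference into a named topic."""
    topic.titles.append(title)
    return topic
```

Calling `upgrade(xref, "see also: museums")` gives the cross-reference a name under which it can be listed and maintained like any other topic.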
 
 
 

Topic Relationships

 
Topic relationships relate topics together. They can be seen as the representation of a knowledge base, and can serve as an equivalent layer for information usually described in a relational database.
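
The comparison with a relational database can be sketched as follows. This is a hedged illustration, not the standard's model: the tuple layout, the relationship type names and the sample data are hypothetical.

```python
from collections import namedtuple

# A relationship links two topics under a relationship type (names hypothetical).
Relationship = namedtuple("Relationship", ["rel_type", "source", "target"])

relationships = [
    Relationship("located-in", "Louvre", "Paris"),
    Relationship("located-in", "Paris", "France"),
    Relationship("is-a", "Louvre", "art museum"),
]


def related(rels, rel_type, source):
    """Return the targets related to `source` by `rel_type`, like a table lookup."""
    return [r.target for r in rels if r.rel_type == rel_type and r.source == source]


where_is_louvre = related(relationships, "located-in", "Louvre")
```

Each relationship behaves like a row of a table, and each relationship type like a table name, which is why the set of relationships can be queried much as a relational table would be.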
 
 
 

Filters

 
Filters are a third category of links that can be used either to include or to exclude information. Filters apply to topic types, topic relationships and individual anchors. Uses of filters include:
  1. Language
  2. Semantic universes
  3. User profiles
  4. Security levels
  5. Validity of the information
 
 
As filters are user-defined and not fixed in the standard, any other filter type can be used. Filters are additive, i.e. it is possible to apply several filters at the same time. The resulting navigation will display the intersection of the information network that is applicable.
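
The additive behaviour can be sketched in a few lines. This is a minimal illustration under stated assumptions: anchors are represented as dictionaries, and the two filters shown (language and security level) as well as all names and values are hypothetical.

```python
def apply_filters(anchors, filters):
    """Keep only the anchors accepted by every filter (the intersection)."""
    return [a for a in anchors if all(f(a) for f in filters)]


# Hypothetical anchors, each qualified by a language and a security level.
anchors = [
    {"id": "a1", "lang": "en", "security": 0},
    {"id": "a2", "lang": "fr", "security": 0},
    {"id": "a3", "lang": "en", "security": 2},
]


def english(anchor):
    """Language filter: include English-language anchors only."""
    return anchor["lang"] == "en"


def public(anchor):
    """Security filter: include anchors with security level 0 only."""
    return anchor["security"] == 0


visible = [a["id"] for a in apply_filters(anchors, [english, public])]
```

Because each filter is an independent predicate, user-defined filters can be added to the list without changing `apply_filters`.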
 
 
 

Topic Map Applications

 
Topic Map Applications are document/database applications. They can be implemented with full SGML/HyTime features, or within an XML environment. They can be seen primarily either as document-driven, or database-driven, or a mixture of both.
 
The conformance requirement of ISO 13250 is that Topic Map application software be able to read the formal definition of a Topic Map as expressed in the standard and to export it in the same formalism. ISO 13250 does not require specific features to be present in the application software. This leaves the field wide open for competitive products.
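
The read-and-export requirement amounts to a round trip, which can be sketched as follows. The element names ("topicmap", "topic", "title") and the functions `read_map` and `write_map` are hypothetical stand-ins chosen for illustration; they are not the interchange syntax defined by the standard.

```python
import xml.etree.ElementTree as ET


def read_map(text):
    """Parse the (hypothetical) interchange form into {topic id: [titles]}."""
    root = ET.fromstring(text)
    return {t.get("id"): [n.text for n in t.findall("title")]
            for t in root.findall("topic")}


def write_map(topics):
    """Export {topic id: [titles]} back to the same (hypothetical) form."""
    root = ET.Element("topicmap")
    for topic_id, titles in topics.items():
        topic = ET.SubElement(root, "topic", id=topic_id)
        for title in titles:
            ET.SubElement(topic, "title").text = title
    return ET.tostring(root, encoding="unicode")


INTERCHANGE = ('<topicmap><topic id="t1"><title>art museum</title>'
               '<title>museum of art</title></topic></topicmap>')

topics = read_map(INTERCHANGE)
round_trip = read_map(write_map(topics))
```

What an application does between reading and exporting (indexing, display, editing) is left open, in line with the paragraph above.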
 
 

Topic Map Demo

 
The conference proceedings, which are in SGML source form, have been processed with EnLIGHTeN, a Topic Map organizer developed at High Text. The resulting HTML pages show the information available on topic types and instances, and on topic relationship types and instances, in the form of online indexes, dictionaries and relational tables.
