Michel Biezunski works as an independent consultant. He specializes in SGML applications, and has worked specifically on document architectures based on links within information objects. He has been actively involved in the activities of the ISO committee in charge of SGML and related standards, and is a co-editor of ISO/IEC 13250 Topic Maps, the standard presented in this paper. He created and chaired the SGML Users' Group France until 1997. He created, designed and is developing Topic Map Loom, a Topic Map-based technology. He gives workshops on Topic Maps as well as on FrameMaker application development.
Topic Maps are a tool to organize information in a way that is optimized for navigation. They address the problem of infoglut that we are facing. Too much information eventually amounts to no information, unless there are ways to filter and efficiently extract the kind of information that is really needed. This problem has already been solved for printed material: book indexes perform essentially the same function, allowing readers to go directly to the portion of the document that is relevant to their information need. Topic Maps are the online equivalent of printed indexes, and it happens that they can do more. They are a powerful way to manage link information such as glossaries, cross-references, thesauri and catalogs, and they enable the merging of structured and unstructured information. The fact that topic maps are now becoming an international standard is also an incentive for software vendors, who are now able to offer standardized tools to manage link databases.
Topic maps represent a structured view over a set of information resources. The base resources themselves do not need to be structured. If they are already structured, then building a topic map can be partially or even totally automated. If they are not structured, then the information needed to support navigation has to be added. This process of post-structuring can be as precise and costly as classical retrofitting applications, or it can be made lighter, because the only information that needs to be structured is semantic. Information used only for formatting plays no role in a topic map, and therefore the application is somewhat lighter than applications that require everything to be retrofitted before any processing starts.
Topic Maps provide a better way to manage information by defining semantic categories used to instantiate links as first class objects.
Topic Maps are "connection hubs" in the information space. A topic map reconfigures itself when something evolves. As topic maps are made of multi-headed links, when a single occurrence is added to or deleted from a hub, all other existing connections still apply. The connectivity of topic maps is their major advantage in terms of maintainability.
Topic Maps provide a model for creating descriptions of knowledge at variable levels of complexity. This model is better described as a "meta-model", since it is used for creating individual models. In that respect, the Topic Map architecture is like SGML or XML: it contains a common syntax used to define any possible variant.
Topic Maps are user-editable views on heterogeneous information repositories. On the one hand, they can be used to design a comprehensive model that represents the essence of the knowledge base of a whole company or even groups of subsidiaries, and on the other hand they can be used by individual users to define their own view of an information set (for example a hard disk) with their own meaningful terms. And every intermediate position in this spectrum is also possible.
A topic is the building block of a topic map. It is a multi-headed link that points to all its occurrences. A topic link aggregates everything (i.e. every portion of information) that is about a given subject. The "subject" of a topic is the thing that it is about.
Topics are instantiated outside the information sources and they collectively comprise a topic map.
Among all occurrences of a given topic, a distinction can be made among subgroups. Each subgroup is defined by a common role. For example, simple mentions of a topic can be called "mentions", while occurrences that play the role of being the definition(s) can be called "definitions". Occurrence roles can be used to distinguish graphic from text, main occurrences from ordinary occurrences, etc. The occurrence roles are user-definable and therefore can vary for each topic map.
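To make the idea of a multi-headed link with occurrence roles more concrete, here is a minimal sketch in Python. It is not the syntax defined by the standard; the class name `Topic`, its methods and the resource addresses are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic modelled as a multi-headed link to its occurrences (illustrative only)."""
    name: str
    # Maps a user-defined occurrence role to the addresses of the occurrences.
    occurrences: dict = field(default_factory=lambda: defaultdict(list))

    def add_occurrence(self, role: str, address: str) -> None:
        self.occurrences[role].append(address)

    def by_role(self, role: str) -> list:
        # .get avoids creating an empty entry when an unknown role is requested.
        return list(self.occurrences.get(role, []))


# Hypothetical addresses; the topic lives outside the resources it points to.
sgml = Topic("SGML")
sgml.add_occurrence("definition", "glossary.txt#sgml")
sgml.add_occurrence("mention", "intro.txt#para3")
sgml.add_occurrence("mention", "history.txt#para12")

print(sgml.by_role("definition"))
print(sgml.by_role("mention"))
```

Because the roles are plain strings supplied by the topic map designer, a different topic map could just as well use "graphic" and "text", or "main" and "ordinary".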
Topics can be grouped in classes called "topic types". A topic type is a category to which a given topic instance belongs. Examples of topic types include "person", "city", "product", "part", "equipment", "work", "company", etc. The topic types are defined by the designers of each topic map.
Topic types can be used for example to build specialized indexes, and therefore improve search facilities.
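A specialized index of this kind could be sketched as follows. The sample topics and the helper `build_type_index` are assumptions made for the example, not part of the standard.

```python
from collections import defaultdict

# Hypothetical (topic name, topic type) pairs defined by a topic map designer.
topics = [
    ("Paris", "city"),
    ("FrameMaker", "product"),
    ("Lyon", "city"),
    ("Acme", "company"),
]


def build_type_index(pairs):
    """Group topic names by topic type, each group sorted alphabetically."""
    index = defaultdict(list)
    for name, topic_type in pairs:
        index[topic_type].append(name)
    return {topic_type: sorted(names) for topic_type, names in index.items()}


type_index = build_type_index(topics)
print(type_index["city"])  # an index restricted to topics of type "city"
```

A search restricted to one topic type then only has to consult the corresponding entry instead of scanning every topic.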
The name is an important property of a topic. A name appears in three different forms:
The base name, the one which is used to actually designate the topic.
A name used to display the topic, which may be different from the base name. The topic map standard does not anticipate any specific use for display names versus base names; the distinction exists to help applications find the relevant information for displaying topics. There might be cases where two topics have different base names but the same display name (although this is not necessarily a good idea). Display names can be something other than characters: they can be graphics, so a topic may be represented by an image. It can be convenient on certain systems to represent non-Latin characters as graphics.
Sort names are the names used as sort keys. In some cases, the name itself will not work. Such cases include numeric data, or names including a roman numeral. There are also languages in which the ordering is different from the one provided by computers. For example, in Spanish, 'll' is considered a letter in its own right, and is alphabetically sorted after 'l' and before 'm'. Therefore, words starting with 'lo' will be sorted BEFORE words starting with 'll'. The user can customize this behavior by using sortname to tell the computer what the sorting sequence is.
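The effect of sort names can be illustrated with a short Python sketch. The dictionary keys `base` and `sort`, the sample names, and the convention of writing 'll' as 'l~' in the sort name (so that it collates after every other word starting with 'l') are assumptions made for this example only.

```python
def sort_key(topic):
    # Fall back on the lowercased base name when no sort name is supplied.
    return topic.get("sort") or topic["base"].lower()


# Names containing roman numerals: the sort name spells out the number.
kings = [
    {"base": "Louis XI", "sort": "louis 11"},
    {"base": "Louis V", "sort": "louis 05"},
    {"base": "Louis IX", "sort": "louis 09"},
]

# Spanish words: 'll' is written 'l~' in the sort name, because '~' sorts
# after all lowercase ASCII letters in code point order.
words = [
    {"base": "llama", "sort": "l~ama"},
    {"base": "luz"},
    {"base": "loma"},
]

naive_kings = sorted(t["base"] for t in kings)
sorted_kings = [t["base"] for t in sorted(kings, key=sort_key)]
sorted_words = [t["base"] for t in sorted(words, key=sort_key)]

print(naive_kings)   # ['Louis IX', 'Louis V', 'Louis XI']
print(sorted_kings)  # ['Louis V', 'Louis IX', 'Louis XI']
print(sorted_words)  # ['loma', 'luz', 'llama']
```

The base names are left untouched; only the key handed to the sorting routine changes, which is the point of keeping the sort name as a separate form of the name.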
A topic may have zero, one, or several names. The case of a topic having one name is the most common. A topic without a name is a link between occurrences which are presumably on the same subject, without any explicit indication of what this subject is. Links of this kind are actually commonly used (at least with two occurrences): they are cross-references.
There are various reasons why a topic might have several names. It might be convenient to access a topic under several access keys, i.e. topic names; "art museum" and "museum of art" is an example of such a case. In a printed index, this is a case where a cross-reference is used between entries. We would then find "Museum of art: see Art museum". If these two phrases designate exactly the same topic, there is no reason to create an indirection that increases the time necessary to access the information resources, and therefore it is more efficient to consider the two phrases as alternate names for the same topic.
Another case of use of multiple names is a multilingual topic map, where the same topic is intended to be used by multiple language speakers.
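Treating alternate names as direct access keys can be sketched like this. The topic identifiers, the French name and the helper `build_name_index` are hypothetical; the sketch also ignores the case where the same name designates two different topics.

```python
# Hypothetical topic identifiers, each with all the names it is known by.
topic_names = {
    "t-art-museum": ["art museum", "museum of art", "musée d'art"],
    "t-thesaurus": ["thesaurus"],
}


def build_name_index(names_by_topic):
    """Map every name directly to its topic, so no 'see ...' indirection is needed."""
    index = {}
    for topic_id, names in names_by_topic.items():
        for name in names:
            index[name.lower()] = topic_id
    return index


name_index = build_name_index(topic_names)
print(name_index["museum of art"])
print(name_index["musée d'art"])
```

Both the English variants and the French name resolve in a single lookup to the same topic, and therefore to the same set of occurrences.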
Topics can be related to one another through associations expressing given semantics. For example, one topic can be a container for other topics, which makes it possible to describe topic trees. This feature can help build "virtual tables of contents", i.e. tables describing contents that do not reflect a sequential order within a specific document, but instead organize chunks of information as if they were presented in a classical, sequentially ordered document. In other words, the containment semantics of a topic association can serve to dynamically assemble fragments of information.
Any kind of semantics can be defined by topic map designers for topic associations. For example, an "employment" association can be used to describe the relationship between a person (employee) and a company (employer).
Topic associations are almost ordinary links, except that they are constrained to relate only topics to one another. Because they are independent of the source documents in which topic occurrences are to be found, they represent a knowledge base which contains the essence of the information a company or organization is creating, and actually represents its essential value.
Topic associations can also be used to capture the semantics of the relations used to build thesauri. An unlimited number of topics can be associated within "topic associations".
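As a rough illustration, an association can be modelled as a typed link whose members are topics playing named roles. The class `Association`, the helper `players`, the role names and the sample topics below are all hypothetical and not taken from the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    kind: str       # semantics chosen by the designer, e.g. "employment"
    members: tuple  # (role, topic name) pairs; any number of topics is allowed


associations = [
    Association("employment", (("employee", "Alice"), ("employer", "Acme"))),
    Association("employment", (("employee", "Bob"), ("employer", "Acme"))),
    Association("containment", (("container", "User guide"),
                                ("containee", "Installation"),
                                ("containee", "Troubleshooting"))),
]


def players(assocs, kind, anchor_role, anchor_topic, wanted_role):
    """Topics playing wanted_role wherever anchor_topic plays anchor_role."""
    found = []
    for assoc in assocs:
        if assoc.kind == kind and (anchor_role, anchor_topic) in assoc.members:
            found.extend(t for role, t in assoc.members if role == wanted_role)
    return found


print(players(associations, "employment", "employer", "Acme", "employee"))
print(players(associations, "containment", "container", "User guide", "containee"))
```

The second query returns the children of a container topic, which is the basic step needed to assemble a virtual table of contents; none of the source documents has to be consulted to answer it.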
Topic names, topic occurrences and roles played in associations with other topics comprise the "topic characteristics". A characteristic is said to be assigned to a topic. This assignment can be qualified by its scope, to make the assignment more precise and increase its relevance. In other words, the scope is what delimits the validity of an assignment. A scope is made of a set of components called "themes". Scopes can be used to distinguish between various names for the same topic (such as, for example, nickname, formal name, usual name, casual name, etc.), or to characterize the domain of knowledge in which an assertion is valid. This feature can be used to disambiguate between names; scopes also qualify the roles played by occurrences.
The occurrence role is itself already a theme. But other themes can be used to specify the domain of validity of the assignment. For example, we can find the case where a given paragraph plays the role of a "definition" of a topic in the theme of "beginners" while it plays the role of a "mention" of the same topic in the theme of "experts" (because there is elsewhere a definition of the same topic that is interesting mainly for expert readers).
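The beginners/experts example can be sketched as follows. The matching rule used here (an assignment is valid when its scope shares at least one theme with the reader's themes, and an empty scope is valid everywhere) is an assumption made for this sketch, as are the class `Occurrence`, the helper `in_scope` and the addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    address: str
    role: str
    scope: frozenset  # the themes within which this assignment is valid


occurrences = [
    Occurrence("guide.txt#p4", "definition", frozenset({"beginners"})),
    Occurrence("guide.txt#p4", "mention", frozenset({"experts"})),
    Occurrence("spec.txt#s2", "definition", frozenset({"experts"})),
]


def in_scope(items, role, themes):
    """Addresses playing `role` whose scope overlaps `themes` (assumed rule)."""
    wanted = set(themes)
    return [o.address for o in items
            if o.role == role and (not o.scope or o.scope & wanted)]


print(in_scope(occurrences, "definition", {"beginners"}))  # ['guide.txt#p4']
print(in_scope(occurrences, "definition", {"experts"}))    # ['spec.txt#s2']
print(in_scope(occurrences, "mention", {"experts"}))       # ['guide.txt#p4']
```

The same paragraph is returned as a definition to beginners and as a mere mention to experts, while experts are sent to a different definition.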
Scopes can be used for a variety of applications, and can range from implicit (no scopes expressed as such) to very sophisticated. Topic map software creators will want to offer a wide range of scope-enabling applications, from 'no scope' to complex querying/editing capabilities based on scope values.
Facet is a term which has a double meaning. On the one hand, a chunk of information may exhibit various facets. On the other hand, multiple facets can be applied to view the topic in different ways.
Facets are a mechanism to apply a property having a range of predefined values to any information object. Facets apply from outside the information resources, as topic maps do, but facets are independent of topic maps. They can be used to add information to topic maps if they exist, or also to qualify some information objects even when no topic map is applied. To draw a parallel with SGML/XML markup, applying facets would be like inserting attributes from outside.
Facets are used for example to filter (in or out) portions of information that exhibit a given value for a given property. They can be used for example to extract the portions which are in a given language in a multilingual environment. They can also be used to extract only portions of information appropriate to a given security level. Another use of facets is to extract from an information base what is relevant to a targeted audience.
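Filtering by facet values might look like the following sketch. The facet names, their values, the resource names and the helper `select` are hypothetical; the table is kept outside the resources themselves, in the spirit of "inserting attributes from outside".

```python
# Facet assignments held outside the resources: facet -> value -> resources.
facets = {
    "language": {"en": {"doc1.txt", "doc2.txt"}, "fr": {"doc3.txt"}},
    "security": {"public": {"doc1.txt", "doc3.txt"}, "restricted": {"doc2.txt"}},
}


def select(facet_table, **wanted):
    """Return the resources that exhibit every requested facet value."""
    result = None
    for facet, value in wanted.items():
        matching = facet_table.get(facet, {}).get(value, set())
        result = set(matching) if result is None else result & matching
    return sorted(result or [])


print(select(facets, language="en", security="public"))  # ['doc1.txt']
print(select(facets, language="fr"))                     # ['doc3.txt']
```

Nothing in this sketch refers to topics, which reflects the fact that facets can be applied to information objects whether or not a topic map exists.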
There is a certain overlap between facets and scopes. Facets represent the simple way to implement a mechanism to sort out information. Scopes are more specific, as they are used to enrich the value of topic maps by enhancing the relevance of the characteristics that a topic map can return. Deciding to use one rather than the other will depend both on the topic map design decisions and on the features offered by the topic map aware software applications.