Trying not to get lost with a topic map

Rafal Ksiezyk

Information Architect
STEP Poland Ltd
Warsaw, Poland
Phone: +48 22 695 43 08
Email: ksiezyk@fuw.edu.pl
Email: step@step.pl
Web: www.step.pl

Biographical notice

Rafal Ksiezyk is the Information Architect at STEP Poland Ltd, a division of STEP Stürtz Electronic Publishing GmbH.

His main fields of interest are the application of structured information technologies to reference and law publishing, and the issues arising at the border between documents and databases.

Rafal was responsible for implementing the SGML-based editorial system at Polish Scientific Publishers, the largest reference publisher in Poland.

He is the author of several newspaper articles popularising XML/SGML technology in Poland, and he maintains the Web page "XML/SGML in Poland" at http://www.fuw.edu.pl/~ksiezyk/sgml.html.

Rafal graduated from the Department of Physics, Warsaw University.


Abstract

Topic Navigation Maps, the international standard ISO/IEC 13250, offer promising aids for classifying and navigating large corpora of documents.

The generality and complexity of the paradigm may, however, hinder correct communication of information between the data model, its editors and its end users.

A discussion of hierarchical and distributed methods of information modelling leads to the definition of a canonical view of TNM data and a proposal for standard GUI [Graphical User Interface] controls.

Intro

Let's assume we have a personal collection of millions, or even thousands, of documents (hundreds are already enough): texts, images, sounds and so on. And let's assume we have to give somebody an idea of what the documents are about and how they are interrelated. Wouldn't that be a topic map?

Basics

TNM [Topic Navigation Maps] is the international standard ISO/IEC 13250, established in March this year. It provides a language (expressed with SGML [Standard Generalized Markup Language] and HyTime [Hypermedia/Time-based Structuring Language]) for building a layer of abstract topics and relations between them, which helps to organise collections of documents.

The model allows our information objects to be classified as occurrences of particular topics. In one sense a topic represents a metacategory for a group of documents; in another, it is completely defined by the set of its occurrences.

Topics may be associated with other topics. An association is expressed by a link. Topics and associations may be assigned topic types and association types, respectively.

Fig. 1

Association between topics Granada and Spain

So we may, for instance, set up a topic "Granada" of type "city" that categorises an encyclopaedia article and some photographs, and relate it to another topic, "Spain", with a "located in" association.
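To make the vocabulary concrete, here is a minimal Python sketch of the Granada example as in-memory data. It is not the SGML/HyTime interchange syntax of ISO/IEC 13250; the class names, the occurrence file names, the type "country" for Spain and the reduction of association roles to an ordered pair are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the Granada example; class and field
# names are illustrative, not taken from ISO/IEC 13250.

@dataclass
class Topic:
    name: str
    topic_type: str
    occurrences: list[str] = field(default_factory=list)  # hypothetical document paths

@dataclass
class Association:
    association_type: str
    members: tuple[str, str]  # simplification: roles reduced to (from, to) by topic name

granada = Topic("Granada", "city",
                occurrences=["encyclopaedia/granada.xml", "photos/granada.jpg"])
spain = Topic("Spain", "country")  # the type "country" is an assumption
located_in = Association("located in", ("Granada", "Spain"))

topics = {t.name: t for t in (granada, spain)}
associations = [located_in]

def related(name: str, association_type: str) -> list[str]:
    """Names of topics reached from `name` via associations of the given type."""
    return [a.members[1] for a in associations
            if a.association_type == association_type and a.members[0] == name]

print(related("Granada", "located in"))  # ['Spain']
```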

Do we like hierarchy? Do we like trees?

Hierarchical classification means that we assign properties to objects in a tree-like structure. One root category is divided into more specific subcategories (branches).

Humans have grown used to tree-like hierarchies. Our experience with trees began in Eden, but, seriously, Aristotle is recognised as the father of hierarchical classification. Nowadays most information classification and presentation methods are based on the tree paradigm. The basic methods of science, analysis and synthesis, reflect the tree-like decomposition of a problem into subproblems and the recursive search for partial solutions.

The largest benefit of tree-like decomposition is that children inherit the properties of their parents. This lets us deal with abstractions instead of an unordered variety of individual cases.
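Inheritance along a tree can be sketched in a few lines of Python: the properties of a category are collected from the root down to the category itself, so deeper levels may refine what their ancestors declare. The category tree and the property names below are hypothetical and serve only to show the mechanism.

```python
# Hypothetical category tree (child -> parent) and per-category properties.
parent = {
    "city": "settlement",
    "settlement": "place",
    "country": "place",
}

own_properties = {
    "place": {"has_coordinates": True},
    "settlement": {"has_population": True},
    "city": {"has_mayor": True},
}

def inherited_properties(category: str) -> dict:
    """Collect properties from the root down to `category`; children override parents."""
    chain = []
    node = category
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    props = {}
    for node in reversed(chain):  # root first, so deeper levels win on conflicts
        props.update(own_properties.get(node, {}))
    return props

print(inherited_properties("city"))
# {'has_coordinates': True, 'has_population': True, 'has_mayor': True}
```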

Of course, when you address well-defined fields of knowledge, a hierarchy fits nicely, but when you want to cover the whole picture, or work at the boundaries between fields, it doesn't fit at all. One of the main driving forces behind the development of TNM was the need for a categorisation scheme more flexible than purely hierarchical ones.

If we want to preserve the benefits of hierarchy in TNM, we have to build rules for a tree-like parametrisation of the map.
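One possible rule for such a parametrisation, offered here only as an assumption and not as part of the standard, is to pick a single association type and read it as a child-to-parent relation. The map then yields a tree (or a forest) only if no topic gets two parents and no chain of that type loops back on itself. A minimal Python check, with hypothetical data:

```python
from collections import defaultdict

# Hypothetical associations: (association type, child topic, parent topic).
associations = [
    ("located in", "Granada", "Spain"),
    ("located in", "Seville", "Spain"),
    ("located in", "Spain", "Europe"),
    ("about", "paper.xml", "Granada"),
]

def tree_view(associations, parent_type):
    """Read one association type as child -> parent; raise ValueError if it is not a forest."""
    parent = {}
    for a_type, child, par in associations:
        if a_type != parent_type:
            continue
        if child in parent and parent[child] != par:
            raise ValueError(f"{child} has two parents: {parent[child]}, {par}")
        parent[child] = par
    # Walk up from every topic; meeting a topic twice on one walk means a cycle.
    for start in parent:
        seen = set()
        node = start
        while node in parent:
            if node in seen:
                raise ValueError(f"cycle through {node}")
            seen.add(node)
            node = parent[node]
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    return dict(children)

print(tree_view(associations, "located in"))
# {'Spain': ['Granada', 'Seville'], 'Europe': ['Spain']}
```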

Do we like maps?

Leaving the simple but limited paradigm of a tree, we run into trouble presenting the new map-like or net-like structure to users (strictly speaking, such structures are described by mathematical graph theory; see the paper "Euler, Topic Maps, and Revolution" by Steve Pepper).

There are lots of maps in the world, and people travel with them. So what's wrong? The maps we know have only one type of association between topics, a physical connection by road or rail, and their topics are placed on a sphere with one simple topographical measure: distance. An average TNM will have lots of association types and no single distance measure between topics. It will look like a collection of Web pages with their links. Has anybody seen a map of the Internet?
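The missing single distance measure can be illustrated with a breadth-first search that counts association hops but may follow only selected association types. The topics and association types below are hypothetical; the point is that the distance between two topics depends on which associations we agree to travel along.

```python
from collections import deque

# Hypothetical associations: (association type, topic, topic), read in both directions.
edges = [
    ("located in", "Granada", "Spain"),
    ("held in", "XML Europe", "Granada"),
    ("about", "XML Europe", "XML"),
]

def distance(start, goal, allowed_types):
    """Hop count from start to goal using only the allowed association types (None if unreachable)."""
    adjacency = {}
    for a_type, x, y in edges:
        if a_type in allowed_types:
            adjacency.setdefault(x, []).append(y)
            adjacency.setdefault(y, []).append(x)
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops + 1))
    return None

print(distance("Granada", "XML", {"located in", "held in", "about"}))  # 2
print(distance("Granada", "XML", {"located in"}))                      # None
```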

Everything is a topic

If you take TNM as presented above, it looks heterogeneous. We have various kinds of beings: topics, associations, topic types, association types and others not mentioned here. But in fact every such instance is also a topic. This is needed because, e.g., someone may want to classify topic types into geographical (city, country) and social (politician, scientist). In this case each topic type, being a topic itself, is assigned one of two meta topic types: geographical or social.

Such a mechanism makes TNM very powerful, since everything can be modelled from scratch using just the topic and link constructs, without necessarily relying on the beings predefined in the standard. The second benefit is simplicity of software design, because every construct in TNM may be processed in the same way as a topic.
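The homogeneity argument can be sketched in Python with a single class: topic types, association types, roles and associations are all instances of the same `Topic`, so one routine handles every construct. The class, its fields and the role names "place" and "container" are assumptions for illustration, not the model defined in ISO/IEC 13250.

```python
from dataclasses import dataclass, field

# Hypothetical uniform model: every construct is a Topic.
@dataclass
class Topic:
    name: str
    types: list["Topic"] = field(default_factory=list)
    # Only association topics use this: (role topic, member topic) pairs.
    members: list[tuple["Topic", "Topic"]] = field(default_factory=list)

def describe(t: Topic) -> str:
    """One routine serves plain topics, types and associations alike."""
    kinds = ", ".join(x.name for x in t.types) or "untyped"
    return f"{t.name} [{kinds}]"

geographical = Topic("geographical")           # a meta topic type
city = Topic("city", types=[geographical])     # a topic type typed by a topic
country = Topic("country", types=[geographical])
located_in = Topic("located in")               # an association type
granada = Topic("Granada", types=[city])
spain = Topic("Spain", types=[country])
place_role, container_role = Topic("place"), Topic("container")  # hypothetical roles
assoc = Topic("Granada located in Spain", types=[located_in],
              members=[(place_role, granada), (container_role, spain)])

for t in (granada, city, geographical, assoc):
    print(describe(t))
# Granada [city]
# city [geographical]
# geographical [untyped]
# Granada located in Spain [located in]
```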

Such homogeneity seems very useful in user interface design. Let's redraw our example from this new perspective (Fig. 2).

Fig. 2

Pure topic & association view of Granada and Spain

Now we have something much closer to a tree. Parts cut out of this map may be treated as hierarchies. We are going the right way.

On the other hand, this kind of freedom may be dangerous. One of the lessons from ERM [Entity Relationship Modeling] is: don't make models too generic. A frequently used example is a single entity related to itself, said to be the model of the World. This famous picture also describes our topic & association model (Fig. 3).

Fig. 3

Generic model of the World

Do we care about users?

TNM seems to be a flexible repository for storing knowledge. For a successful application, as always, we have to make a design specifying the substantive and technical scope of the material covered by the TNM. But the crucial point here is communicating the design, goals and meaning of the work:

At the level of topics everybody understands everything. The problem is that everybody's understanding is different, so it's hard to control the consistency of the work being done. If our editors are not sure what to do, what it all means and what their colleagues are doing, we'd better give up and go back to the trees.

If we take TNM in the topics & associations view, homogeneity lets us apply a simple interface. This can be illustrated with the metaphor of a maze: from each room you have a simple choice of a few corridors (as in a tree), but you may finally come back to the same place (not possible in a tree). Rooms play the role of topics and corridors the role of associations.

In other words, the crucial point is that if you start exploring the TNM in Fig. 2 from a single topic, the rest of the map becomes a tree. The only exception that may spoil this rule is loops, which are not present in the picture. With loops, the journey to the leaves of the tree can be infinite, but it may still make sense as a pursuit of information.
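The observation that the map unfolds into a tree can be sketched as a recursive walk in which the children of a topic are all its neighbours except the one we arrived from. The adjacency data below is a hypothetical rendering of the Granada and Spain example, and `max_depth` is an assumed safeguard that stops the descent which loops would otherwise make endless.

```python
# Hypothetical adjacency: the association is itself a topic placed
# between the two topics it connects.
neighbours = {
    "Spain": ["located in (Granada, Spain)"],
    "located in (Granada, Spain)": ["Granada", "Spain"],
    "Granada": ["city", "located in (Granada, Spain)"],
    "city": ["Granada"],
}

def unfold(graph, node, came_from=None, depth=0, max_depth=4):
    """Yield (depth, topic) pairs: the map seen as a tree rooted at `node`.

    Children are all neighbours except the one we arrived from. On a map
    with loops the same topic can reappear deeper down, so `max_depth`
    cuts the otherwise endless descent.
    """
    yield depth, node
    if depth == max_depth:
        return
    for nxt in graph.get(node, []):
        if nxt != came_from:
            yield from unfold(graph, nxt, node, depth + 1, max_depth)

for depth, topic in unfold(neighbours, "Spain"):
    print("    " * depth + topic)
# Spain
#     located in (Granada, Spain)
#         Granada
#             city
```

On a looped map, say three topics all linked to each other, the same starting topic shows up again at the depth limit, which is the maze effect of returning to the same room.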

This observation lets us use a simple folder-explorer control for TNM purposes (Fig. 4).

Fig. 4

Simple TNM GUI at the level of topics & associations

After stepping down to a GUI based on the lowest level of naked topics, we can start adding more symbols for standard and custom constructs. In Fig. 4 the standard topic roles (association, topic type, anchor role) have the symbol @ in their names for clarity. A cooler GUI would go further and, e.g., use an icon of a wall for topics of type city; then the containers city and @ topic type are no longer needed under the topic Granada. Lots of such modifications can be made depending on the application, but the canonical form must always be remembered and kept available.
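The separation between canonical form and presentation might look like this in a minimal Python sketch: the canonical data always lists the type containers, and a decorated view may fold them into an icon. The dictionary layout and the placeholder icon name are assumptions, not features of any particular TNM tool.

```python
# Hypothetical canonical data: each topic mapped to the topics that type it.
canonical = {
    "Granada": ["city"],
    "city": ["@ topic type"],
}

# Hypothetical presentation rule: topics of type "city" get a wall icon.
icons = {"city": "wall"}

def render(topic: str, decorated: bool) -> list[str]:
    """Explorer lines for `topic`: canonical containers, or an icon when decorated."""
    types = canonical.get(topic, [])
    icon = next((icons[t] for t in types if t in icons), None)
    if decorated and icon is not None:
        # Type containers are folded into the icon; the canonical data is unchanged.
        return [f"[{icon}] {topic}"]
    lines = [topic]
    for t in types:
        lines.append("  " + t)
        for meta in canonical.get(t, []):
            lines.append("    " + meta)
    return lines

print("\n".join(render("Granada", decorated=False)))
# Granada
#   city
#     @ topic type
print("\n".join(render("Granada", decorated=True)))
# [wall] Granada
```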

The canonical form will also help in proposing a language for querying and filtering the information contained in a TNM.

Do we like summaries?

I haven't seen any commercial TNM yet, but I bet it will look like a Gordian knot. You pick it up, and unless you hide 90% of the information, you can't make sense of anything.

You try asking: give me all topics that are associated through the relation "located in" with the topic "Granada" and through the relation "about the subject" with the topic "XML". And you find nothing!
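Over a canonical form reduced to association triples, such a question becomes an intersection of two sets. In this minimal Python sketch the association data, including the topic "this paper", is hypothetical, and associations are read in both directions; it only shows how the query can be answered and why it can come back empty.

```python
# Hypothetical canonical associations: (association type, topic, topic).
associations = [
    ("located in", "Granada", "Spain"),
    ("about the subject", "this paper", "XML"),
]

def associated(topic, association_type):
    """Topics linked to `topic` by an association of the given type, in either direction."""
    result = set()
    for a_type, x, y in associations:
        if a_type == association_type:
            if x == topic:
                result.add(y)
            elif y == topic:
                result.add(x)
    return result

answer = associated("Granada", "located in") & associated("XML", "about the subject")
print(answer)  # set()
```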

Don't worry. This is not so important, as long as we are at "XML Europe" in Granada.

Acknowledgments

I would like to acknowledge the support of my co-workers Ryszard Burek and Jacek Staszelis.