Topic Map cartography   Table of contents   Indexes   Registries &, repositories

 

The "GPS of the information universe"

 Topic Maps in an encyclopedic online information platform
 Heinz Wittenbrink
 Product Development, www.wissen.de
 Leuchtenbergring 20, D-81677 Munich, Germany
 Phone: +49 89 748515-31 Fax: +49 89 748515 89 Email: heinz.wittenbrink@bertelsmann.de Web site: www.wissen.de
 Biography
 Heinz Wittenbrink studied Literature, Linguistics and Philosophy in Münster, Cologne and Paris. As editor and executive editor, he was responsible for numerous encyclopedic works of the Bertelsmann Lexikon Verlag, published in print as well as on CD-ROM. He is now in charge of the product development of the encyclopedic website www.wissen.de.
 Abstract
 The Bertelsmann Reference Group launched a knowledge web site, www.wissen.de, in spring 2000. The encyclopedic content owned by the publishing house and external content from newspapers, magazines, other online services, scientific institutions, and museums are connected by a topic map structure that controls the display of information, the navigation and the retrieval. This paper presents examples of the topic map structure, shows the navigation in the information pool, explains how the map was extracted from existing material, and discusses the collaborative enlargement of the information pool by the users.
 

A topic map architecture for an encyclopedic website

 The purpose of this presentation is to show how topic maps are used for the information architecture of an online encyclopedia. The German encyclopedic website www.wissen.de serves as an example of a large-scale implementation of the standard. In this paper I describe a model of the use of topic maps in an encyclopedic application; this model is not yet completely implemented in our existing application and will certainly undergo some revisions during the remaining implementation process. I will not discuss desiderata of the standard but simply show that it works and that the basic concepts of the standard are highly useful for the organization of a very complex information network. Topic maps in www.wissen.de allow or will allow:
 
  1. thematic searches leading in a straightforward manner to the core information
  2. thematically restricted and complex searches
  3. the linking of related information about the same subjects
  4. the display of information about related subjects
  5. an important part of the layout control
  6. an intuitive navigation through the information and the personalization of the displayed content
 

Knowledge versus information - topic maps and the functionality of online-encyclopedias

 
From the beginning, the topic map standard seems to have been developed with the special requirements of encyclopedia publishing in mind. Encyclopedias are commonly referred to as one of the major application fields of topic maps. Encyclopedic publishing has to organize an extremely large number of information resources comprehensively - articles, statistics, tables and illustrations -, to keep track of tens of thousands or more cross references and to update multivolume works on a regular basis. Many encyclopedia publishers work on several different A-Z encyclopedias in parallel, e.g. a one-volume, a ten-volume and a twenty-volume version. Experience shows that it is very difficult to synchronize the updating process and that a lot of work is usually done twice or more, as long as there is no "abstraction layer" such as the one provided by topic maps.
 Usually A-Z encyclopedias have no index since they are organized as an index themselves.
But only in online encyclopedias will topic maps show their real power.
 Today's encyclopedias on CD-ROM and DVD-ROM have become economically as important as print encyclopedias, and it is only a matter of time until the internet becomes the main field of encyclopedic publishing. The online version of the Encyclopedia Britannica has been available for free since October 1999, and in the German-speaking countries two major encyclopedias have been published online since spring 2000, one of which is "www.wissen.de" ("wissen" means "knowledge" and "to know" in German).
 What is the main difference between an encyclopedia on the WWW and a traditional encyclopedia? Of course it is always possible to publish the print text of an encyclopedia "as is" on the net, but the relation between the information contained in an encyclopedia and the information on the same subjects to be found in other sources is different. The online encyclopedia shares the same virtual space and the same virtual time with all the other information resources on the net. All this information is "just a mouseclick away". In the world of atoms, encyclopedias used to be a sort of extract of information resources that were difficult and time-consuming to access. In the world of bits it is possible to link the encyclopedia directly to all the information that it summarizes.
 Nevertheless, online encyclopedias have to solve a problem similar to that of their printed predecessors. Today access to information is not difficult because reaching it costs time and space; it is difficult because of the ubiquity of information. The selections and results presented by web catalogues and search engines are neither sufficiently structured nor validated.
 The main purpose of an online encyclopedia will therefore be to give access to validated, updated and customized information, wherever this information can be found on the net. In many cases it is not necessary for the encyclopedia to repeat this information. Linking to the information or integrating it into the platform will be enough, especially when standards such as XLink and new XML-based tools allow the seamless integration of information from different external sources into virtual and "self-updating" documents. Online encyclopedias will in some respects resemble portals and web directories; they may become intelligent information agents within the next decade.
 For these reasons www.wissen.de is not simply an encyclopedia on the net but what we call an "open encyclopedic information platform". The proprietary content of the publishing house - in our case dictionaries, encyclopedias and chronicles - is combined with content from commercial and non commercial partners and with editorially checked resources on the net. The information platform has to organize this content. If encyclopedias on the net are a tool which gives access to the continuously growing mass of information resources of the internet, a comprehensive linking engine as provided by topic maps is necessary. Encyclopedias themselves have to fulfill the functions of topic maps as defined in the standard:
 In general, the structural information conveyed by topic maps includes: groupings of addressable information objects around topics (occurrences), and relationships between topics (associations).
 In the future topic maps will probably be much more than a tool for a platform like www.wissen.de. They will be used to reformulate a lot of the information traditionally contained in A-Z encyclopedias and to connect it with resources accessible online.
 

The implementation of topic maps in wissen.de

 In www.wissen.de topic maps are implemented in the data model of the core application. The topic map content structure was translated into a relational database system with tables for topics, associations and occurrences. For its proprietary content - encyclopedias, dictionaries, chronicles - www.wissen.de works with an XML-based architecture; the XML documents, which are held in the same database system as the topic information, are transformed by an XSLT processor into HTML documents delivered to the client. The search for information and the display of information are implemented with Java servlets on the basis of the topic characteristics of the information objects in the database.
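 The mapping of topics, associations and occurrences onto relational tables can be illustrated with a minimal sketch. The table names, column names and sample rows below are assumptions made for illustration only; the paper does not describe the actual schema, and SQLite merely stands in for whatever database system is used.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not the actual www.wissen.de data model.
SCHEMA = """
CREATE TABLE topic (
    topic_id   TEXT PRIMARY KEY,
    topic_type TEXT NOT NULL,
    basename   TEXT NOT NULL
);
CREATE TABLE association (
    assoc_type TEXT NOT NULL,
    from_topic TEXT NOT NULL REFERENCES topic(topic_id),
    to_topic   TEXT NOT NULL REFERENCES topic(topic_id)
);
CREATE TABLE occurrence (
    topic_id TEXT NOT NULL REFERENCES topic(topic_id),
    role     TEXT NOT NULL,
    resource TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding a few made-up example rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO topic VALUES (?, ?, ?)", [
        ("t-mars", "planet", "Mars"),
        ("t-solar-system", "category", "Solar system"),
    ])
    db.execute("INSERT INTO association VALUES (?, ?, ?)",
               ("belongs_to_category", "t-mars", "t-solar-system"))
    db.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", [
        ("t-mars", "article", "az/mars.xml"),
        ("t-mars", "photo", "media/mars.jpg"),
    ])
    db.commit()
    return db


def occurrences_of(db, topic_id):
    """Return (role, resource) pairs for one topic, ordered by role."""
    rows = db.execute(
        "SELECT role, resource FROM occurrence "
        "WHERE topic_id = ? ORDER BY role",
        (topic_id,))
    return rows.fetchall()
```

 Keeping occurrences in their own table means that new content is attached to a topic by adding a row, without touching the topic itself.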
topic-ID
 
The starting point of the repository of topics in www.wissen.de was an A-Z encyclopedic dictionary with about 120,000 entries. Roughly speaking, each of these entries corresponds to a topic. This topic serves as an anchor around which information can be organized - "vertically" by adding information resources about this topic and "horizontally" by connecting it to related topics via "associations". The topic map standard requires a unique identifier attribute for every topic. We used keys from our legacy editorial system to get these IDs. We will probably have to replace these keys in the future with strings based on the international basename of each topic.
basename
displayname
sortname
 
In order to allow an internationalization of www.wissen.de, the basenames of the topics will be replaced by international, that is English, names. The sort and display names are so far only in German. The scope attribute of the topic names contains a reference to the language. This means that other language versions can be integrated as additional layers by adding display and sort names of the topics and occurrences in the respective language. In the future, foreign partners can also extend the www.wissen.de topic database with their national content. An Italian encyclopedic site may add many Italian subjects, which will then also be accessible to the German user.
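 The idea of language layers selected by scope can be sketched as follows. The class, its field names and the fallback to the basename are assumptions for illustration; the paper does not say what is shown when a name in the requested language is missing.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical topic record; field names are illustrative only."""
    topic_id: str
    basename: str  # international (English) name
    # Display names keyed by language scope, e.g. {"de": "Deutschland"}.
    display_names: dict = field(default_factory=dict)

    def display_name(self, language):
        """Return the name scoped to `language`, else the basename."""
        return self.display_names.get(language, self.basename)


germany = Topic("t-germany", "Germany", {"de": "Deutschland"})
# A partner site adds an Italian layer without changing existing names.
germany.display_names["it"] = "Germania"
```

 Adding a language is then a matter of adding entries to the scoped names; the topic identifier and its associations stay untouched.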
topic-type
 
For the attribution of topic types to the subjects covered by the A-Z encyclopedia a list of about 40 different entry types was used. Examples of topic types are "person", "ruler", "state", "animal" etc. Up to now we have not defined "superclass/subclass"-relations between these types.
 We consider that the topic map templates and consistency constraints desired by and others are vital for developing a topic map powered application to a knowledge base.
A second repository of topics was provided by a hierarchical index of categories. In the case of www.wissen.de it has up to five levels and more than 1000 categories. Examples are: "nature" - "astronomy" - "solar system" - "planets". From different chronicles we added events with the general topic type "event" and additional types such as "discovery", "contract", "foundation" etc. Dates are regarded as topics of the type "date" because that makes it easier to integrate them into the overall structure.
 I will not tackle here the special problems related to the monolingual and bilingual dictionaries integrated in www.wissen.de. Up to now the dictionary entries are regarded as treating topics of their own without any other relation to the topics of the A-Z encyclopedia than the identity of the search name.
 topic-associations 
 
The real power of topic maps resides in the associations between topics - typed links showing which relations exist between the subjects of an information resource. In a first phase we had to rely on information already existing in our databases in order to establish a network of associations between the topics of www.wissen.de. Such information is contained in the category index, in a geographical and a chronological index and in the existing cross references. The relation between a topic and a category is an association of the type "belongs_to_category". The discipline topics are connected by "is_part_of" associations. "Particle physics" is for instance a part of "physics". From the existing data it was also possible to take the cross references between articles and to use them for an "is_referenced_by" association. This association will have to be specified in the future because it is not really typed. A geographical and a chronological index allow some additional basic associations such as "is_situated_in", "belongs_to_epoch" and "happened_at_date". We are now in the process of adding further associations editorially. In the end most of the information contained in the A-Z encyclopedia will be transformed into topic associations. We use regular expressions to extract the information, e.g. to transform typical recurrent verbs in the articles such as "wrote" or "consists of" into typed associations. Combined with a relational database containing all the "hard facts" of statistical information, this topic map database will be an electronic equivalent of a traditional encyclopedia. It will allow presentation of the content with graphics and tables that are easier to understand than traditional textual descriptions of usually rather boring facts.
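 The extraction of typed associations from recurrent verbs can be sketched with two deliberately naive patterns. The patterns, the association type names "wrote" and "consists_of", and the sentence splitting on full stops are assumptions for illustration; real article text would need far more robust rules and editorial review.

```python
import re

# Hypothetical patterns: each pairs a recurrent verb phrase with an
# association type name chosen for this example.
PATTERNS = [
    (re.compile(r"(?P<subject>.+?) wrote (?P<object>.+)"), "wrote"),
    (re.compile(r"(?P<subject>.+?) consists of (?P<object>.+)"), "consists_of"),
]


def extract_associations(text):
    """Return (subject, association_type, object) triples found in `text`.

    Sentences are split naively on full stops; a sentence yields a triple
    only if it matches one of the patterns as a whole.
    """
    found = []
    for sentence in text.split("."):
        sentence = sentence.strip()
        for pattern, assoc_type in PATTERNS:
            match = pattern.fullmatch(sentence)
            if match:
                found.append(
                    (match.group("subject"), assoc_type, match.group("object")))
    return found


SAMPLE = "Goethe wrote Faust. Water consists of hydrogen and oxygen. Mars is a planet."
TRIPLES = extract_associations(SAMPLE)
```

 For the sample text the sketch yields one "wrote" triple and one "consists_of" triple and ignores the third sentence, which contains neither verb.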
 The content presented to the user of www.wissen.de is dynamically generated by display rules related to the topic characteristics in the database. These display rules use the topic type, the occurrence type and the associations in order to determine which information related to the topic is presented in what way and in which context. They take the user input into account - the immediate input by mouseclicks or typing, the supposed preferences of a user and the explicit preferences of registered users.
 

Search of a single topic

searchname
 
Topics are used in order to provide precise answers to the questions of the user. The general principle of the search in www.wissen.de will be "one query - one answer". The default search is done via the search names of the topics of the encyclopedia. Only if this search returns no result is the full-text index used. Normally the user gets one and only one structured result set, provided by a topic link to the different occurrences of the topic.
 Example: When the user types the searchstring "Mars" this input is interpreted as a query for the topics with the searchname "Mars". The query can have three possible results:
 
  1. one topic with the searchname is found,
  2. more than one topic with this searchname is found,
  3. no topic with this searchname is found.
In case 1 the application displays information related to Mars, i.e. occurrences of this topic (see below). The default is to display the display name of the topic, an A-Z encyclopedia entry, and links to other occurrences and to related information. This corresponds to the headword search in a traditional electronic encyclopedia. But there are some important differences: a) It is possible to replace the entry by another entry, e.g. a longer encyclopedic article for another audience. The different media and related information are not linked to the article but to the topic, so they will appear in the same way with the longer article. b) The topic search also works for subjects without encyclopedic articles, either when such articles make no sense, e.g. in the case of historical dates, or when they do not yet exist. If new content is integrated, e.g. a travel guide, the editors will identify many subjects not covered by articles of the A-Z encyclopedia and integrate them into the topic database. If there is no encyclopedia article, other occurrence types are defined as "default occurrences" for the display on the screen. In the case of tourist destinations the default may be a photo and the beginning of a travel guide text. In the case of dates the default is to display the chronicle entries related to this date and a timeline.
 If two or more topics with the same search name are found, the user has to choose from a list the subject he is interested in. The list also shows the topic type. In the case of Mars the user would have to decide whether he wants information about the planet or the Roman god.
 Only if no topic with the search name "Mars" is found (or if the user decides to start it himself) is a full-text search triggered. But the results of this full-text search are structured as topic occurrences.
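 The three cases of the search-name lookup can be summarized in a short sketch. The function name, the result labels "topic", "choose" and "fulltext", and the sample data are assumptions for illustration; the substring match merely stands in for a real full-text index.

```python
# Hypothetical sample data: search names map to (display name, topic type).
TOPICS_BY_SEARCHNAME = {
    "mars": [("Mars", "planet"), ("Mars", "god")],
    "phobos": [("Phobos", "moon")],
}
FULLTEXT_INDEX = {
    "az/comet.xml": "Comets are small bodies that orbit the Sun.",
    "az/pepper.xml": "Pepper is a spice.",
}


def search(query, topics_by_searchname, fulltext_index):
    """Resolve a query following the 'one query - one answer' principle.

    Returns ("topic", topic) for a unique match, ("choose", topics) when
    the user must pick among several topics, and ("fulltext", documents)
    when no topic carries the search name.
    """
    key = query.strip().lower()
    matches = topics_by_searchname.get(key, [])
    if len(matches) == 1:
        return ("topic", matches[0])
    if len(matches) > 1:
        return ("choose", sorted(matches))
    # Fallback: naive substring match standing in for a full-text index.
    hits = sorted(doc for doc, text in fulltext_index.items()
                  if key in text.lower())
    return ("fulltext", hits)
```

 A query for "Mars" falls into the second case here, because both the planet and the god carry that search name.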
 

Occurrence roles and Xlink

topic-occurrences
 
All information in www.wissen.de is handled as topic occurrences. This means that
 
  1. they can be retrieved via the topic
  2. links between pieces of information about the same subject are topic links
  3. the role and the type of the occurrences are explicitly specified
  4. internal and external links are handled in the same manner
 Topic maps constitute the model for all kinds of links used in www.wissen.de. All HTML links visible to the user are generated on the basis of topic and association links stored as Xlinks in the database. Topic maps allow the administration of the links; new content is integrated into the site by links to existing - or new - topics. When new topics are introduced, they are connected by topic associations with the existing topics. So it is basically via topic maps that www.wissen.de constitutes an information network.
 Wissen.de integrates different types of information. The goal is to give access to different levels of information about every topic. The user can decide on the depth of information he is interested in. Examples of occurrence types and roles are:
 
  1. treatment in a short encyclopedic article,
  2. mention in a short encyclopedic article,
  3. dictionary entries,
  4. chronicle entries,
  5. treatment in an overview article,
  6. mention in an overview article,
  7. special information provided by partners, e.g. scientific institutes, museums and magazines,
  8. photo with caption,
  9. animation with voiceover,
  10. statistical data,
  11. news and information about special events (TV shows, webcasts, congresses etc.),
  12. information provided by the members of the www.wissen.de community, e.g. in forums, chats and FAQs,
  13. editorially reviewed information on other internet sites.
 About the planet Mars, for example, www.wissen.de will contain: a short article on the planet with basic information; an explanation of the word "mars" and its etymology; several chronicle entries about the exploration of Mars; a long article about the solar system where the planet Mars is treated on several pages; another longer article about space probes; pages from the ESA, the German Museum in Munich and a Max-Planck-institute; pointers to TV coverage of the solar system; dates for possible observations of Mars; an expert chat about the possibility of life on Mars; and links to the best internet sites with information about Mars. These are different occurrences of the topic "Mars". The display of this information is controlled by the layout the user has chosen, by the topic type and by the role of the occurrence.
 To the user the different layouts appear as sections, e.g. "A-Z", "how-to", "school". Within these sections queries look for topics about which information exists in occurrences with a certain role. If the user is in a section with recipes and types in "pepper" he gets the occurrences of pepper in the recipes as result. When one occurrence is displayed on the screen, the others are shown as typed links.
www.wissen.de uses the occurrence type attribute to specify whether an information resource belongs to www.wissen.de, to a partner whose content is displayed in the look-and-feel of www.wissen.de, to a partner whose content is displayed in a new browser window or whether it is third-party-web-content. The occurrence roles are the different text- and mediatypes used in www.wissen.de.
occurrence-role
occurrence-type
 
The occurrence roles determine how an occurrence or a link to an occurrence is displayed on the screen. The HTML templates contain a reserved default space with a headline for all occurrence roles: to the right of the main content display there is space for longer encyclopedic features, news, internet links etc. These fields are filled with occurrences of the topic if such occurrences exist. If they do not exist, the fields are filled with occurrences belonging to the next higher node in the category tree. If the user types in "Mars", the application looks for occurrences of the topic Mars with the role "news". If it does not find any news about Mars, it displays all existing news about the solar system, because "solar system" is the category Mars belongs to. If there is no news about the solar system, news about astronomy is displayed; if there is no astronomical news, news about natural sciences, and so on.
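 The fallback along the category tree can be sketched as a simple walk from the topic towards the root. The dictionaries, the function name and the sample headline are assumptions for illustration, and the sketch assumes that the parent relation really is a tree without cycles.

```python
# Hypothetical category tree (child -> parent) following the Mars example.
PARENT = {
    "Mars": "solar system",
    "solar system": "astronomy",
    "astronomy": "natural sciences",
}
# Hypothetical occurrences with the role "news", keyed by topic or category.
NEWS = {"astronomy": ["New telescope opened"]}


def occurrences_with_fallback(topic, role_index, parent):
    """Return (node, occurrences) for the nearest node that has any.

    Starts at `topic` and climbs to the next higher category until
    occurrences are found or the root has been passed; in the latter
    case (None, []) is returned.
    """
    node = topic
    while node is not None:
        found = role_index.get(node, [])
        if found:
            return node, found
        node = parent.get(node)
    return None, []
```

 With the sample data a request for news about Mars finds nothing for Mars or the solar system and ends up with the astronomy news.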
 Xlink 
 
What appears to the user as a traditional HTML link from one wissen.de resource to another internal resource is, behind the scenes in the database, an extended Xlink connecting two occurrences of one and the same topic. Inline links are either links to topics or to topic occurrences. A link from the article about Mars to the astronomer Schiaparelli who discovered the Mars channels is generated by treating Schiaparelli in the article as an occurrence of the astronomer. The role of this occurrence is "mention". The default behaviour of links of this type is to open a small window with the short encyclopedia article about the mentioned topic. Links from a section in a document to a section in another document will be realized by treating both sections as occurrences of the same topic. The Mars channels for instance are a topic on their own. When this topic is also treated in a passage about earthbound astronomical observations in another text, a link between both passages can be generated.
 Thus the topic map architecture allows a semantic treatment of all the links contained in www.wissen.de. The links are bound to topics and have a precisely defined functionality, either in presenting the relations of one topic to another topic or in specifying the function of an information resource for a topic. Whereas the latter could also be done via Xlink attributes, without the topic structure links could only connect the information resources without specifying the semantic basis of the link. The topic map structure will comprise all kinds of content on www.wissen.de (at the moment a lot of content, especially from partners, is integrated before the appropriate topic map structuring is done). Each piece of content will have a defined occurrence role.
 The default display of content is the A-Z encyclopedia article with links to related occurrences. But in many cases the users will not be interested primarily in an encyclopedia article. School students will look for learning material; people interested in cooking will look for recipes; other users will want to get information directly from members of the www.wissen.de community about e.g. certain kinds of products. When travel guides are added to www.wissen.de, many users will want to get the travel information directly without first having to pass through encyclopedia articles. Therefore the user can directly choose a content region such as - at the moment - "school", "how-to" or "opinions". If he does so, content of the appropriate occurrence type is displayed in the center and in a template with a special layout. The search is restricted to topics of specified categories and/or occurrence roles. Nevertheless the user has access via links to all other occurrences of the topic.
 

Associations and specified search

 topic-associations 
 
The advanced or specified search makes use of topic types and associations. Theoretically every possible combination of types and associations can be used as search input. It is possible to search for all "persons born in France in the 18th century" or for all "mammals living in India". At the moment this type of search is implemented via selection from lists. In a list of topic types the user selects the type, in a list of content categories he can select the discipline etc. He can also select associations, e.g. "born in", and type in associated topics. The result is a list of all the topics possessing the selected properties. One of the main targets of our software development is to allow this type of search via natural language input, because input fields and lists are not flexible enough for complex queries. The input string should simply be something like "Which philosophers were born in France in the 18th century and travelled to Britain?"
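 A search that combines a topic type with required associations can be sketched as a filter over typed triples. The data structures, the function name and the association names "born_in" and "belongs_to_epoch" as used here are assumptions for illustration; the sample rows are a tiny hand-picked set.

```python
# Hypothetical sample data: topic -> topic type.
TOPIC_TYPES = {
    "Diderot": "philosopher",
    "Kant": "philosopher",
    "Lavoisier": "chemist",
}
# Typed associations as (topic, association type, associated topic).
ASSOCIATIONS = {
    ("Diderot", "born_in", "France"),
    ("Diderot", "belongs_to_epoch", "18th century"),
    ("Kant", "born_in", "Prussia"),
    ("Kant", "belongs_to_epoch", "18th century"),
    ("Lavoisier", "born_in", "France"),
    ("Lavoisier", "belongs_to_epoch", "18th century"),
}


def specified_search(topic_type, required):
    """Return topics of `topic_type` having every required association.

    `required` is a list of (association type, associated topic) pairs,
    corresponding to the selections a user makes in the search lists.
    """
    return sorted(
        topic for topic, ttype in TOPIC_TYPES.items()
        if ttype == topic_type
        and all((topic, assoc, target) in ASSOCIATIONS
                for assoc, target in required)
    )


FRENCH_PHILOSOPHERS = specified_search(
    "philosopher",
    [("born_in", "France"), ("belongs_to_epoch", "18th century")])
```

 A natural language front end, as envisaged above, would only have to translate the question into the same type and association pairs.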
 

Semantic networks and visual navigation

association-types
 
Typed associations are the most visible and spectacular feature of topic maps. Topic links with multiple, typed occurrences, an id, search names and display names, a basename and types are extremely useful for the establishment of a complex index, but for the user they act in the background and do little more than facilitate his search. Associations make it possible to construct information spaces, to show which subjects are connected with each other and what the connection consists of. Topic map supported information applications therefore allow new ways to navigate in information universes.
 The traditional encyclopedias in electronic formats allowed only a keyword or full-text search. The full-text search can be refined by Boolean operators. The topic search as implemented in www.wissen.de makes it possible to find information that is not retrievable by keywords. A search for "solar system" in a traditional encyclopedia yields the article about the solar system and all the scattered occurrences of the string "solar system" in the texts of the encyclopedia. It will not find, for instance, an article about comets if this article does not contain the words "solar system". The topic search finds the different information objects about the solar system as such as well as the different "components" of the solar system, e.g. planets, asteroids, comets etc.
 Associations from one topic to other topics are represented as a list of related subjects, classified by the association type, or graphically by a java applet called "visual index" that shows the associations as labeled arrows and the topics as nodes. In our example these links would point from "mars" to the other planets (topics of the same topic type), to "planet(s)" (the topic that is the topic type of "mars"), to the moons of Mars Phobos and Deimos etc.
 The applet gives the user the freedom to decide which associations he wants to see. If he looks for a town, he can for instance look only for the people who were born in this town. In the topic map standard associations are themselves topics that have types. In combination with the types of the associated topics this makes it possible to switch whole groups of associations on and off.
 The association "lies in" is geographical. The association "fostered" cannot be specified by a discipline, but the persons that are associated with a geographical place can be classified. The controller of the topic applet has to query for the types of the associations and of the associated topics in order to present only one category of information. The categories of the different topic types are in a hierarchical order that is represented in the database. Writers, painters and composers belong to the category "artists", which is itself associated with the category "culture".
In the case of a town www.wissen.de can present e.g. only the geographical, historical or economic associations. All the associations point to topics that are themselves treated in www.wissen.de, because they have their own occurrences.
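 Switching whole groups of associations on and off can be sketched by checking the type of each associated topic against the category hierarchy. The dictionaries, the function names and the association names in the sample are assumptions for illustration and do not describe the actual applet controller.

```python
# Hypothetical hierarchy: topic type or category -> parent category.
CATEGORY_PARENT = {
    "writer": "artists",
    "painter": "artists",
    "composer": "artists",
    "artists": "culture",
    "state": "geography",
}
# Hypothetical sample data for one town.
TOPIC_TYPES = {"Mozart": "composer", "Austria": "state"}
SALZBURG_ASSOCIATIONS = [
    ("is_birthplace_of", "Mozart"),
    ("is_situated_in", "Austria"),
]


def in_category(topic_type, category):
    """Return True if `topic_type` equals `category` or lies below it."""
    node = topic_type
    while node is not None:
        if node == category:
            return True
        node = CATEGORY_PARENT.get(node)
    return False


def visible_associations(associations, topic_types, category):
    """Keep the associations whose target topic falls under `category`."""
    return [(assoc, target) for assoc, target in associations
            if in_category(topic_types.get(target), category)]
```

 Selecting "culture" for the town then shows only the composer born there, while selecting "geography" shows only the state in which it lies.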
 

Complex topics, scope and the organization of information

 Usually examples of topic maps use topics that have proper names and very simple associations, e.g. "born in". A very large part of a normal A-Z encyclopedia with short articles can be represented by this kind of topics and associations. In www.wissen.de they constitute the basic network of the overall topic map structure. But as soon as topic maps are used as metadata for longer encyclopedic and scientific texts, other types of topics and associations have to be introduced. The existence of life on Mars for instance is an important topic, whose type may be "scientific problem".
 scope 
 
The attribution of a scope to the associations of such complex types can be used for a structured display of information resources. If the topic is "Classical Greece" it is possible to use scopes as "basic", "school knowledge" and "scientific archeology" to show which topics and which topic occurrences are related in which context to the subject.
 

Personalization and collaborative features

 personalization  
 
Personalization of information is one of the important aims of www.wissen.de. The user should get exactly the information he needs in a specified time. It should normally not be the task of the user to search for the interesting pieces of information in an enormous amount of useless matter. Registered users of www.wissen.de can decide for themselves about important aspects of the information that is displayed. They can choose which categories of information they want to see on the start page, about which subjects they want to be informed in newsletters etc. But even users who are not registered can receive a large amount of personalized information if the application interprets the actions of a user as information about his interests and preferences. That means that the display of information is controlled by rules which take the interaction of the user with the application into account. A user who selects a feature about the Sun on the start page is most likely also interested in other astronomical items. So he will get the feature about the Sun in the central panel, surrounded by links to the other occurrences of the topic Sun, but also by links to other astronomical news, features etc. This content must be generated dynamically. In www.wissen.de the personalization of content is not yet completed. In the near future the application will look for topics that are as similar as possible to the topic selected by the user, and it will at the same time look for occurrences of the same category if there are no directly related occurrences. To a user who looks for Florence the application will also offer links to Venice and Milan. The quality of this personalized access to information depends on the density of the semantic network of the application. The associations, types and occurrence roles in the topic map model make it possible to determine very precisely where information objects are similar and where they are not. Therefore they provide an ideal basis for the personalized display of information.
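 How topic characteristics could be used to find similar topics can be sketched with a simple set comparison. The Jaccard measure is chosen here only as a convenient illustration; the paper does not say which similarity measure the application will use, and the characteristics listed for each town are made-up sample data.

```python
# Hypothetical characteristics per topic: types, categories, associations.
CHARACTERISTICS = {
    "Florence": {("type", "town"), ("is_situated_in", "Italy"),
                 ("category", "art history")},
    "Venice": {("type", "town"), ("is_situated_in", "Italy"),
               ("category", "art history")},
    "Milan": {("type", "town"), ("is_situated_in", "Italy")},
    "Mars": {("type", "planet"), ("category", "solar system")},
}


def similarity(a, b):
    """Jaccard similarity of two sets of topic characteristics."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(topic, limit=2):
    """Return up to `limit` other topics sharing characteristics with `topic`.

    Topics are ranked by descending similarity, ties broken by name;
    topics with no shared characteristic are left out.
    """
    scored = [(similarity(CHARACTERISTICS[topic], chars), name)
              for name, chars in CHARACTERISTICS.items() if name != topic]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked[:limit] if score > 0]
```

 The denser the network of types and associations, the more characteristics two topics can share, which is why the quality of such suggestions depends on the density of the semantic network.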
 A lot of basic topics and topic associations will be common to many different applications. It is an open question whether it makes sense to maintain topic maps as the property of a company or whether they should be developed in an open source model comparable to Mozilla and the Open Directory Project.
 Acknowledgements
 Many thanks to Hans Holger Rath (STEP GmbH, Rimpar) and Holger Hvelplund (TEXTware AS, Copenhagen) for the introduction to the subject and for patiently answering numerous newbie questions.
 Bibliography
 
ISO 13250:1999 International Organization for Standardization, ISO/IEC 13250, Information technology - SGML Applications - Topic Maps, Geneva: ISO 1999 (available at http://www.topicmaps.com )
 
Ksiezyk1999 Rafal Ksiezyk, Trying not to get lost with a topic map, Granada: XML Europe 1999 (available at http://www.topicmaps.com )
 
Rath1999 Hans Holger Rath, Technical Issues on Topic Maps, Markup Technologies 1999 (PDF at: http://www.topicmaps.com )
 
Pepper1999 Steve Pepper, Euler, Topic Maps, and Revolution, Granada: XML Europe 1999 (PDF at: http://www.topicmaps.com )
 
Rath/Pepper1999 Hans Holger Rath, Steve Pepper, Topic Maps: Introduction and Allegro, Granada: XML Europe 1999 (PDF at: http://www.topicmaps.com )
