Topic Map technology - the state of the art | Table of contents | Indexes | Creating semantically valid topic maps | ![]() |
| Rath, Hans Holger |
| Hans Holger Rath |
| Director Consulting |
| STEP Electronic Publishing Solutions GmbH,
Technologiepark Würzburg-Rimpar Pavillon 7 D-97222 Rimpar Germany Phone: +49.9365.8062.0 Fax: +49.9365.8062.66 email: consulting@step.de web site: www.topicmaps.com/ |
| Biography |
| Abstract |
Introduction |
| The ISO committee JTC 1/SC 34/WG 3 (Information Technology – Document Description and Processing Languages – Information Association) standardized ISO/IEC 13250 Topic Maps in the autumn of 1999. Formally speaking, the ISO standard defines a model and interchange syntax for Topic Maps. The initial ideas, which date back to the early 1990s, related to the desire to model intelligent electronic indexes in order to be able to merge them automatically. But during several years of gestation, the topic map model has developed into something much more powerful that is no longer restricted to simply modelling indexes. |
| A topic map annotates and provides organising principles for large sets of information resources. It builds a structured semantic link network above those resources. The network allows easy and selective navigation to the requested information. Topic maps are the “GPS of the information universe”. Searching in a topic map can be compared to searching in knowledge structures. In fact, topic maps are a base technology for knowledge representation and knowledge management. |
| The basic concepts of the standard are topics, occurrences of topics, and relationships (“associations”) between topics. The section “Topic maps in a nutshell” gives a short overview. |
| The section “Conclusions” summarizes the paper and gives an outlook on further topic map developments. |
Topic maps in a nutshell |
| The standard defines an interchange representation of topic maps in terms of an SGML architecture. A topic map is basically an SGML (or XML) document in which different element types, derived from a basic set of architectural forms, are used to represent topics, occurrences of topics, and relationships (associations) between topics. The key concepts are the topic (and topic type), the topic occurrence (and occurrence role type), and the topic association (and association type as well as association role type). Other concepts which extend the expressive power of the topic map model are those of scope, theme, public subject, and facet. |
| Note: |
| This short overview about topic map concepts provides the basics only. Application examples can be found in . |
Topics |
| A topic, in its most generic sense, can be any “thing” whatsoever (a person, an entity, a concept, really anything) regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. |
| In the words of the standard, the term “topic” refers to the element in the topic map instance (the topic link) that represents the subject being referred to. |
| Examples of topics are: USA, Pennsylvania, Philadelphia, William Penn. |
| A topic should have one or more topic types. Topic types express a typical class-instance relation, and the standard defines them as topics themselves. Because topic types are topics, the expressive power of topic maps can be used to say more about the type. |
| Examples of topic types are: country, state, city, person. |
Topic characteristics |
| Every topic has two characteristics (or at least one of them): a topic name and an occurrence. |
| The topic name consists of three parts: the base name, the display name, and the sort name. Only the base name is required. |
| Examples of topic names (base / display / sort) are: U.S.A. / USA / United States of America. |
| An occurrence is a link to an information resource that is somehow relevant to the topic. The linked resource is typically an information object outside the topic map. |
| Examples of occurrences are: chart of the USA, article about Pennsylvania, video about Philadelphia, portrait of William Penn. |
| Every occurrence belongs to one occurrence role type. Occurrence role types are, like topic types, themselves topics. |
| Examples of occurrence role types are: chart, article, video, portrait. |
Associations |
| The real power of topic maps results from associations between topics. |
| Examples of associations are: Pennsylvania is in USA, Philadelphia is in Pennsylvania, Philadelphia was founded by William Penn. |
| Each association has one association type. |
| Examples of association types are: is in, was founded by. |
| Each topic that participates in an association plays a role. The role is described by an association role type. |
| Examples of association role types are: state / country, city / state, city / person. |
| Both association types and role types are again topics. |
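| The concepts introduced so far can be pictured as a small data model. The following Python sketch is only an illustration and not the interchange syntax of the standard: the class names, field names, and the resource address are assumptions made for this example, while the topics and associations are the examples used above. |

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    base_name: str                                    # only the base name is required
    types: list[Topic] = field(default_factory=list)  # topic types are topics too
    occurrences: list[Occurrence] = field(default_factory=list)


@dataclass
class Occurrence:
    role_type: Topic  # occurrence role type, itself a topic
    resource: str     # address of an information resource outside the map


@dataclass
class Association:
    assoc_type: Topic                   # association type, itself a topic
    members: list[tuple[Topic, Topic]]  # (association role type, participating topic)


# Topic types and topics taken from the examples in the text.
country, state, city, person = (Topic(n) for n in ("country", "state", "city", "person"))
usa = Topic("USA", types=[country])
pennsylvania = Topic("Pennsylvania", types=[state])
philadelphia = Topic("Philadelphia", types=[city])
william_penn = Topic("William Penn", types=[person])

# An occurrence; the resource address is a hypothetical placeholder.
video = Topic("video")
philadelphia.occurrences.append(Occurrence(video, "videos/philadelphia.mpg"))

# Associations; the type topics double as association role types here.
is_in = Topic("is in")
was_founded_by = Topic("was founded by")
associations = [
    Association(is_in, [(state, pennsylvania), (country, usa)]),
    Association(is_in, [(city, philadelphia), (state, pennsylvania)]),
    Association(was_founded_by, [(city, philadelphia), (person, william_penn)]),
]
```

| Because types, roles, and association types are all `Topic` objects, anything that can be said about a regular topic can also be said about a type. |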
Scopes |
| The concept of scope is important to avoid ambiguities between topics and their characteristics. Any assignment of a characteristic to a topic is considered to be valid within certain limits, which may or may not be specified explicitly. The limit of validity of such an assignment is called its scope. A scope is defined in terms of themes, and themes are topics. |
| Examples of scopes are: to distinguish between “Paris” in France and “Paris” in Texas, assign the scopes “France” and “USA” to the two topics. |
Identity |
| Merging of topic maps requires a way of establishing the identity between seemingly disparate topics from different maps. The specification of identity attributes on the topic elements that address the same public subject is the explicit solution the standard offers. The other, implicit solution is the topic naming constraint, which states that any topics that have the same name in the same scope refer to the same subject. |
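| The implicit route, the topic naming constraint, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: a topic is reduced to a set of scoped names plus a set of occurrence addresses, a scope is a set of theme names, and the class, function, and occurrence names are hypothetical. |

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    # Each name is a (base name, scope) pair; a scope is a frozenset of theme names.
    names: set[tuple[str, frozenset[str]]] = field(default_factory=set)
    occurrences: set[str] = field(default_factory=set)


def merge_maps(*maps: list[Topic]) -> list[Topic]:
    """Merge topics that share a name within the same scope."""
    merged: list[Topic] = []
    for topic in (t for m in maps for t in m):
        # Split the topics merged so far into those sharing a scoped name and the rest.
        hits = [m for m in merged if m.names & topic.names]
        keep = [m for m in merged if not (m.names & topic.names)]
        target = Topic(set(topic.names), set(topic.occurrences))
        for hit in hits:
            target.names |= hit.names
            target.occurrences |= hit.occurrences
        merged = keep + [target]
    return merged


france, usa = frozenset({"France"}), frozenset({"USA"})
map_a = [Topic({("Paris", france)}, {"article-about-paris-france"})]
map_b = [
    Topic({("Paris", france)}, {"map-of-paris-france"}),
    Topic({("Paris", usa)}, {"article-about-paris-texas"}),
]
result = merge_maps(map_a, map_b)
```

| The two “Paris” topics in the scope “France” collapse into one topic carrying both occurrences, while “Paris” in the scope “USA” stays separate. |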
Facets |
| Facets provide a mechanism for assigning property-value pairs to information resources without modifying them. A facet is a property; its values are called facet values. |
The missing pieces: an overview |
| During the years of its gestation the topic map model changed many times, from an extremely high level of generality to much more specific models designed to be used solely for navigation. The final result is, like most standards, a compromise. The working group believes that it offers an optimal balance between extreme power and flexibility on the one hand, and sufficiently well-defined semantics on the other. |
| The members of the working group always had in mind that the standard has to be implementable, and they tended towards a more general model for reasons of both implementability and applicability. They knew that first practical applications might uncover concepts which are not explicitly described in the standard, but they felt it was more important to have a base standard approved and published than to delay publication any longer merely to add further refinements. Adapting the standard to the XPointer (or XPath) addressing format, as soon as it becomes a W3C recommendation, is already on the agenda of the working group. |
The STEP Group
|
Separating the declarative part |
| Topic maps are a well-designed standard for modelling semantic information networks. It defines the basic concepts and almost everything in the map is itself a topic. Even the “objects” declaring a topic map are topics, namely themes, topic types, occurrence role types, association types, and association role types. Having such recursive declarations makes perfect sense when the goals are to limit the concepts to a sensible minimum and make topic maps self-contained and self-documenting. |
| But the standard does not provide a name or definition for the list of declarative “objects” of a map, and this can lead to some confusion: users often mix up “declarative” topics and “regular” topics during discussions. In addition, the different tasks of topic map design, creation, and maintenance are hard to distinguish and to separate. The same is true for user access rights: as long as there is no distinction, different rights cannot be assigned to the map. |
| The separate declarative part could also be used for defining classes of topic maps that share a common set of topics for types with predefined semantics. |
| The standard therefore stands in need of a formally defined construct that covers the declarative part of a topic map. |
Applying theoretical background |
| The most interesting constructs in topic maps as far as representing knowledge structures is concerned are associations. Because these are in fact relations it makes sense to take a look at mathematics and apply some of the theoretical background of relations. Furthermore the scientific fields of linguistics and philosophy may provide additional taxonomies. |
| The concepts that we find could lead to predefined basic association types and association properties. Neither of these is covered by the standard today, but they could offer much more precise semantics in the maps. The topic map template will be the ideal place to define them. |
Class-instance relation is not enough |
All topics, occurrences, and associations can be seen as instances of classes (types). The classes themselves are expressed as topics.
|
| This class-instance relationship is in fact merely a syntactically privileged association type, as the standard makes clear: |
|
| This means that the class-instance relation is an association type predefined by the standard. Any topic map software has to support it as a built-in function, e.g. by displaying the name of the referenced topic as the name of the type. |
| If we are looking at the class-instance relation from an object oriented view, then there is a justifiable demand for a superclass-subclass relationship as well. However, the standard explicitly declares that such a relationship has to be user-defined. Here are the relevant quotes: |
|
| STEP's experience with encyclopedia applications shows that the superclass-subclass relationship is a very powerful mechanism for performing inferencing, i.e. deriving implicit information about the current “object”. The implicit information can be used when querying the map or when declaring and/or checking consistency constraints. And because these features should be an integral part of topic map software, a user-defined and therefore application-specific solution is too weak. |
Questions of consistency |
| The standard has almost nothing to say on the subject of validation and consistency. The “Conformance” section of the standard focuses on the understanding of the defined constructs, the interchange syntax, and import/export of topic maps. But nothing more, as this excerpt from the standard shows: |
|
| A topic map author (or authoring team) needs system support when developing a map with millions of topics and associations. The question of the consistency of the map becomes a key issue, because it is nearly impossible to check a map of that size manually. |
| For that reason we need concepts to declare consistency constraints and to validate that those constraints have been obeyed. |
Topic map templates |
| The ISO working group has already responded to the need to be able to separate the declarative part of a topic map. It coined the term topic map template for a topic map that only consists of topics that are declared in order to be used as types in a class of topic maps. At the present time this term is only “semi-official”, since the concept has not yet been refined and added to the standard. |
|
What is a topic map template? |
| A topic map template consists of all constructs which have a declarative meaning for the map. These are all the topics used as themes and as types for topics, occurrence roles, associations, and association roles. |
| As we will see later, the class hierarchy information and consistency constraints will also become part of a topic map template. |
| The topic map designer should mark each topic in the template with the kind of type it may be used as in the "real" map. This can be done either by grouping the topics (see below) or by assigning attribute values. The latter approach provides more flexibility for marking topics for more than one kind of type. |
| In any case it is clearly important that the topics of the template can be distinguished somehow from the topics of the topic map instance(s) belonging to the class of topic maps defined by the template, and that the template becomes a "manageable" object with its own (public) identifier, owner, version number, etc. |
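| Marking by attribute values could look like the following Python sketch. It is an assumption made for illustration, not a construct of the standard: the `Kind` flags, the `TEMPLATE` table, and the helper `usable_as` are hypothetical names, and the template entries reuse examples from earlier sections. |

```python
from enum import Flag, auto


class Kind(Flag):
    """Kinds of declarative use a template topic may be marked for."""
    THEME = auto()
    TOPIC_TYPE = auto()
    OCCURRENCE_ROLE_TYPE = auto()
    ASSOCIATION_TYPE = auto()
    ASSOCIATION_ROLE_TYPE = auto()


# Template topics with their markings; flags combine, so one topic
# can be marked for more than one kind of type.
TEMPLATE = {
    "city": Kind.TOPIC_TYPE | Kind.ASSOCIATION_ROLE_TYPE,
    "is in": Kind.ASSOCIATION_TYPE,
    "video": Kind.OCCURRENCE_ROLE_TYPE,
    "France": Kind.THEME,
}


def usable_as(topic_name: str, kind: Kind) -> bool:
    """True if the template declares the topic for the given kind of use."""
    return bool(TEMPLATE.get(topic_name, Kind(0)) & kind)
```

| A topic that is absent from the template is not usable for any declarative purpose, which keeps template topics distinguishable from the topics of the map instance. |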
Using templates in topic maps |
| The topic map template – which is a topic map – can be copied into or referenced by another topic map. |
| The copied template acts as a starting point for a new map containing all the themes and types which will be extended during the further development of the map. |
| The referenced template provides the basic themes and types which are used by the referencing map. A referenced template makes use of the merging features of topic maps defined by the standard. Thus more than one template could be referenced. The precondition for merging, however, is the existence of carefully worded subject identities. |
Template modules |
| It might be meaningful that a template consists of sub-templates to modularize the design. Candidates for template modules are |
| But this is only one possibility. How the declarations are clustered in modules depends to a large degree on the application-specific requirements. The only important thing is that the template can easily be identified and separated from the real map. |
Distributing the design and creation tasks |
| The design and creation of topic maps can now be split up into subtasks because of the availability of templates and template modules. Furthermore, user access rights of user groups as well as roles can be assigned. |
| The tasks of the designer might be: |
| The tasks for the editor might be: |
| The assignment of facets can be seen as a completely separate task. |
Role of topic map templates for ISO/IEC 13250 |
| The concept of templates offers the ISO working group the possibility of defining various templates which are specific for different application areas. These templates would contain built-in types (i.e. topics) and association types with predefined semantics which could be supported by “template-conformant” applications. |
| Such templates could be published as annexes to the standard or as separate standards, as has already been done with SGML DTDs (e.g. ISO 12083). |
Association taxonomy |
| The investigation of the theoretical backgrounds of relations leads us to the domains of mathematics, linguistics, artificial intelligence, and philosophy. All these scientific fields deal with knowledge representation and knowledge structures in one way or another. |
| We will concentrate on two issues from this broad research area: relations in mathematics (i.e. the abstract properties of associations) and relationship types in artificial intelligence and linguistics (i.e. specific classes of associations). |
Association properties |
The most important relations, in the mathematical sense, are the binary relations.
|
| Definition: A binary relation between the sets A and B is any subset R of A × B (R ⊆ A × B). |
| The properties which are of interest for topic maps apply only to a restricted kind of relation. |
| Definition: A binary relation in M is a binary relation R with A = B = M, thus R ⊆ M × M. |
| A binary relation is also a binary predicate. |
| Definition: A predicate (relation) R is fulfilled (true) for x ∊ A and y ∊ B if and only if (x, y) ∊ R. |
| (x, y) ∊ R can be abbreviated as xRy. |
| Now we can define the properties for relations in M. |
|
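| For reference, the textbook definitions of the properties that the examples in this section rely on can be written as follows, for a relation R in M. The selection of properties is an assumption based on the ones named later in this section. |

```latex
\begin{align*}
\text{reflexive:}      \quad & \forall x \in M:\; xRx \\
\text{anti-reflexive:} \quad & \forall x \in M:\; \lnot\,(xRx) \\
\text{symmetric:}      \quad & \forall x, y \in M:\; xRy \Rightarrow yRx \\
\text{anti-symmetric:} \quad & \forall x, y \in M:\; (xRy \land yRx) \Rightarrow x = y \\
\text{transitive:}     \quad & \forall x, y, z \in M:\; (xRy \land yRz) \Rightarrow xRz
\end{align*}
```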
| Certain combinations of these properties define special classes of relations, of which there are four: |
| Definitions:
|
| Some examples of specific relations will serve to illustrate the various properties and classes of relations (M = {0, 1, 2, 3, ...}). |
|
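| The properties can also be checked mechanically on a finite set. The following Python sketch is an illustration only: the function name `properties`, the finite stand-in for M, and the choice of the relations <, ≤, and = as test cases are assumptions made here. |

```python
from itertools import product


def properties(relation: set[tuple[int, int]], universe: set[int]) -> set[str]:
    """Return the names of the basic properties that hold for a relation in `universe`."""
    found = set()
    if all((x, x) in relation for x in universe):
        found.add("reflexive")
    if all((x, x) not in relation for x in universe):
        found.add("anti-reflexive")
    if all((y, x) in relation for x, y in relation):
        found.add("symmetric")
    if all(x == y or (y, x) not in relation for x, y in relation):
        found.add("anti-symmetric")
    if all((x, z) in relation
           for (x, y), (y2, z) in product(relation, relation) if y == y2):
        found.add("transitive")
    return found


M = set(range(6))  # finite stand-in for {0, 1, 2, 3, ...}
less_than = {(x, y) for x, y in product(M, M) if x < y}
less_equal = {(x, y) for x, y in product(M, M) if x <= y}
equal = {(x, x) for x in M}

for name, rel in [("<", less_than), ("<=", less_equal), ("=", equal)]:
    print(name, sorted(properties(rel, M)))
```

| On this finite set, < comes out as anti-reflexive, anti-symmetric, and transitive, while = is reflexive, symmetric, anti-symmetric, and transitive. |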
| Why is all this theory relevant for topic maps? Let us analyze the association type “geographical_object is in geographical_object”. It is transitive, anti-reflexive, and anti-symmetric; thus it is a strong order relation. Topic map software that is aware of these facts (i.e. the properties of this particular association type) is capable of automatically deriving implicit knowledge from the map. |
| An example: From the given associations |
| It is obvious that the most informative statements of this example derive from the property of transitivity. |
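| How software might exploit transitivity can be sketched as a transitive closure over the explicit “is in” associations. The Python below is a minimal illustration, with a hypothetical function name, using the “is in” examples from the section “Associations”. |

```python
def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Repeatedly chain (a, b) and (b, d) into (a, d) until nothing new appears."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


# Explicitly coded associations of type "is in".
explicit = {("Philadelphia", "Pennsylvania"), ("Pennsylvania", "USA")}

# Everything the closure adds is implicit knowledge derived from transitivity.
implicit = transitive_closure(explicit) - explicit
print(implicit)
```

| The only derived statement here is that Philadelphia is in USA, which never had to be coded in the map. |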
| Another example: Let us analyze the association type “street is parallel to street”. It is reflexive, symmetric, and transitive; thus it is an equivalence relation. |
| If we have the associations |
| The examples show that a simple set of association properties, i.e. the relation properties introduced above, lets software derive more “knowledge” from a topic map than is explicitly coded in it. This means that the map becomes smaller, that the effort of creating a map is reduced, that fewer coding errors occur, and that the inferencing capabilities of the topic map's query engine are greatly enhanced. Furthermore, consistency checking can make use of the property information, which again improves the quality of the map. |
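| For an equivalence relation such as “is parallel to”, the implicit knowledge amounts to equivalence classes. The following Python sketch groups streets with a small union-find structure; the street names and the function name are hypothetical and serve only as an illustration. |

```python
def equivalence_classes(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group elements linked by a reflexive, symmetric, transitive relation."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two classes

    classes: dict[str, set[str]] = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


# Three explicit "is parallel to" associations between hypothetical streets.
explicit = [("A Street", "B Street"), ("B Street", "C Street"), ("X Avenue", "Y Avenue")]
classes = equivalence_classes(explicit)


def is_parallel(s1: str, s2: str) -> bool:
    """Two streets are parallel if they fall into the same equivalence class."""
    return any(s1 in c and s2 in c for c in classes)
```

| Here `is_parallel("A Street", "C Street")` holds although that association was never coded explicitly, whereas `is_parallel("A Street", "X Avenue")` does not. |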
Basic association types |
The previous section introduced the basic association properties. This section investigates whether basic association types would also make sense.
|
A lot of research has been done in the area of knowledge structures
|
| A large class comprises the part-whole or holonymy/meronymy relations. The literature lists six or seven subclasses of holonymy, depending on the source: |
| Iris et al. reduce this to four basic subclasses: |
| Only segmented-part and subset are reported to exhibit transitivity. Individual functional-part or collection-member relations could be transitive, but the property does not apply to these classes as a whole. |
| We can conclude that the part-whole class with its subclasses functional-part, segmented-part, collection-member, and subset should be predefined association types, declared in a template. |
Some other relevant relationship types are
|
| Synonymy, order, and strict implication are transitive relations. Synonymy and similarity are symmetric. For every result-agent and tool-agent relation there exists an inverse one (“agent” causes “object”, “agent” uses “tool”). Strict implication is non-symmetric: you can sleep without snoring, but you cannot snore without sleeping! All these relations are candidates for predefined association types declared in a template. |
| The contributions from linguistics introduce further subclasses for synonymy relations (thesauri). Both hyponymy and troponymy represent the “is a” or “is a kind of” relation, which is already covered by the topic type construct. The synonymy subclasses seem to be quite specific, so there is no need to have them as predefined association types. They are in any case more appropriately handled through the use of multiple topic names. |
Class hierarchies |
| The realisation of the need for class hierarchies stems from STEP's encyclopedia projects. A topic map for a lexicon contains a very large number of topics (typical orders of magnitude are hundreds of thousands or millions) and associations (even more). But most of the topic, association, and occurrence role types can be reduced to a small number of “super-types” – as we have already seen in the previous section. |
Superclass-subclass |
| The superclass-subclass relationships of topic types, association types, and occurrence role types go hand in hand, as the following examples show: |
Class hierarchy and association type properties |
| The class hierarchies become even more important when the end-user navigates or queries the map. If someone would like to know “Which pieces of music were composed by Germans who were influenced by W.A. Mozart?”, it is very likely that this information is not explicitly part of the map. But with just a few topics, transitive associations, and a class hierarchy the answer can be found very easily. |
| The facts of the map: |
The topic map software could find the answer from these facts with an algorithm that works as follows:
|
| This very simple example shows the power of combining class hierarchies with properties of association types (here transitivity). As already stated above, both class hierarchies and association type properties are the basis for compact topic maps, minimized creation and maintenance efforts, and a reduction of coding errors. |
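| The combination of a class hierarchy with a transitive association type can be sketched in Python. Everything in the sketch is a hypothetical stand-in: the facts use placeholder names such as `composer_b` and `piece_1` rather than real data, and the helper names are assumptions made for this illustration. |

```python
def closure(pairs):
    """Transitive closure of a set of (a, b) pairs."""
    result = set(pairs)
    while True:
        new = {(a, d) for a, b in result for c, d in result if b == c}
        if new <= result:
            return result
        result |= new


# Hypothetical facts of the map (placeholders, not real data).
SUBCLASS_OF = {("symphony", "piece of music"), ("sonata", "piece of music")}
INSTANCE_OF = {("piece_1", "symphony"), ("piece_2", "sonata"), ("piece_3", "symphony")}
INFLUENCED_BY = {("composer_b", "W.A. Mozart"), ("composer_c", "composer_b")}
NATIONALITY = {"composer_b": "Austrian", "composer_c": "German", "composer_d": "German"}
COMPOSED_BY = {("piece_1", "composer_c"), ("piece_2", "composer_b"), ("piece_3", "composer_d")}


def is_a(instance, cls):
    """Class-instance test that also follows the superclass-subclass hierarchy."""
    supers = closure(SUBCLASS_OF)
    return any(t == cls or (t, cls) in supers for i, t in INSTANCE_OF if i == instance)


def pieces_by_germans_influenced_by(person):
    # "influenced by" is treated as transitive, so indirect influence counts too.
    influenced = {a for a, b in closure(INFLUENCED_BY) if b == person}
    return sorted(
        work
        for work, composer in COMPOSED_BY
        if is_a(work, "piece of music")
        and composer in influenced
        and NATIONALITY.get(composer) == "German"
    )


answer = pieces_by_germans_influenced_by("W.A. Mozart")
```

| With these placeholder facts only `piece_1` qualifies: it is declared merely as a symphony, and its composer is linked to Mozart only indirectly, so both the class hierarchy and transitivity are needed to find it. |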
| This supports our contention that the class hierarchy should be a predefined association type of the topic map template, ensuring the correct built-in interpretation by the topic map software. |
Validation of consistency |
| All the previously introduced concepts extend topic maps in ways that increase their expressive power and ease creation and maintenance efforts. In addition to this, the topic map developer wants to have something at hand to help ensure the quality of the map. The information provided by a topic map based on the standard architecture is not enough – the developer asks for validation concepts. |
| Real life topic maps will consist of millions of topics and associations. Checking a map of such a size manually is clearly impossible, and yet checking is absolutely necessary for both proof-reading and quality assurance. It is obvious that both the designer and the editor need access to an automatic process that can validate a topic map against a set of consistency rules. |
| The validation is the task of the topic map development environment (e.g. an editorial system). It should be performed continuously or on demand, like structure validation against the DTD in an SGML/XML editor. |
| The standard has almost nothing to say on the subject of validation and consistency. The “Conformance” section of the standard focuses on the understanding of the defined constructs, the interchange syntax, and import/export of topic maps. But nothing more, as this excerpt from the standard shows: |
|
| This shows that we have to develop a schema language for the definition of the consistency constraints. |
Consistency constraints |
| The topic map standard provides the architectural element types which can be used in a derived DTD. However, the degree to which semantics can be modelled in a DTD and through content models is rather limited. A topic map will consist of a large number of “independent” elements which are connected by links and not by element structures. |
| Consequently a separate schema is needed which contains all the information necessary for the validation process. We call this construct consistency constraints or just constraints. The constraints are a set of predefined association types declared in the template. |
What should be validated? |
| Constraints may be assigned to three potential layers: |
| Here, we focus on the topic map modeling layer. |
Associations |
| The most important candidates for validation are the associations. This is obvious because they are the key concept and carry a large number of parameters which might be "misused". |
| The starting point is the association type. This controls which association role types can be combined. Besides the possible combination(s), the number of the various roles within these combinations might be of interest. |
| The association role type in turn governs the set of topic types which may be referenced. |
| It is necessary that the constraint schema brings the association type, the role type, and the topic type into a meaningful combination. |
| An example: |
|
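| One way to bring association type, role type, and topic type together is a small lookup schema. The Python sketch below is an assumption made for illustration and not a proposed syntax: the role names `founded`, `founder`, `containee`, and `container` as well as the function `validate` are hypothetical, and the sketch allows only one player per role, so it does not cover constraints on the number of roles. |

```python
# association type -> association role type -> topic types allowed to play the role
CONSTRAINTS = {
    "was founded by": {"founded": {"city"}, "founder": {"person"}},
    "is in": {"containee": {"city", "state"}, "container": {"state", "country"}},
}

# topic -> topic type, taken from the examples used throughout the paper
TOPIC_TYPES = {
    "USA": "country",
    "Pennsylvania": "state",
    "Philadelphia": "city",
    "William Penn": "person",
}


def validate(assoc_type, members, topic_types=TOPIC_TYPES):
    """Return a list of constraint violations for one association.

    `members` maps each association role type to the topic playing that role.
    """
    roles = CONSTRAINTS.get(assoc_type)
    if roles is None:
        return [f"unknown association type: {assoc_type}"]
    errors = []
    if set(members) != set(roles):
        errors.append(f"{assoc_type}: expected roles {sorted(roles)}, got {sorted(members)}")
    for role, topic in members.items():
        allowed = roles.get(role)
        if allowed is not None and topic_types.get(topic) not in allowed:
            errors.append(
                f"{assoc_type}: {topic} ({topic_types.get(topic)}) may not play role {role}"
            )
    return errors


ok = validate("was founded by", {"founded": "Philadelphia", "founder": "William Penn"})
swapped = validate("was founded by", {"founded": "William Penn", "founder": "Philadelphia"})
```

| The first association passes, while the one with swapped players yields one violation per role. |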
Occurrences |
| Also of interest are the assignment of the proper information resource types to the occurrence role types (if the editorial system provides type information) and the meaningful combination of topic types and occurrence role types. |
| An example: |
|
Scopes |
| Furthermore the correct use of scopes and especially the combination of different scopes might be checked. |
The topic type could restrict the possible scopes for the topics, their topic names, base name, display name, sort name, and their occurrences.
|
| The association types might also restrict the meaningful scopes for the associations. The combination of the meaningful scopes of the association and the referenced topics should be checked as well, because the association type is closely related to the possible types of the referenced topics. |
| An example: |
|
Topic names |
| For reasons of completeness, checking of the topic names should also be possible. Topic names might be checked against text patterns or against database entries. The constraints will be governed by the topic type in question. |
| An example: |
|
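| Checking topic names against text patterns could be sketched with regular expressions keyed by topic type. The patterns and names in the Python below are hypothetical and deliberately simplistic; a real application would define its own patterns or look names up in a database. |

```python
import re

# topic type -> pattern that every base name of that type must match in full
NAME_PATTERNS = {
    # at least two capitalized words, e.g. a first name and a family name
    "person": re.compile(r"[A-Z][a-z]+( [A-Z][a-z]+)+"),
    # starts with a capital letter; letters, dots, and spaces may follow
    "country": re.compile(r"[A-Z][A-Za-z. ]*"),
}


def check_name(topic_type: str, base_name: str) -> bool:
    """True if the name satisfies the pattern of its topic type (or none is declared)."""
    pattern = NAME_PATTERNS.get(topic_type)
    return pattern is None or pattern.fullmatch(base_name) is not None
```

| Under these assumed patterns, “William Penn” is accepted for the topic type person and “william penn” is rejected; topic types without a declared pattern are not restricted. |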
| All type combination constraints might limit the number of superclasses and/or subclasses of the affected types. |
Conclusions |
| The new topic map standard ISO/IEC 13250 defines a model and architecture for the semantic structuring of link networks. It can be seen as a base technology for modeling knowledge structures. The standards working group defined topic maps in such a way that a limited but implementable set of core concepts express the necessary semantics. |
| The STEP Group investigated how topic maps can be applied to reference works and uncovered some concepts which are not made explicit in the standard: |
| The paper explained these concepts and presented meaningful solutions. |
| First experiences have shown that the part of a topic map made up by all topics used as themes and types by other "objects" in the map should be clustered somehow. For this purpose the term "topic map template" was coined by the ISO working group. Templates can be used as starting points for new maps or can be used by reference in order to provide all the themes and types the map needs. Standardizing topic map templates will offer base topic maps for specific application areas and could form the basis of semantic application profiles. |
| We looked at related academic fields like mathematics, linguistics, and philosophy to get some substantial input about relations. The results are a list of association type properties which give important hints to the topic map software and a list of basic association types which could act as built-in superclasses. |
| The introduction of the superclass-subclass relationship was the logical consequence. |
| Another technical issue covered by the paper is the validation problem. Topic maps might become rather big with millions of topics, occurrences, and associations. Manual consistency checking will be impossible. All the previously defined concepts open the possibility for sophisticated rule-based validation of topic maps. The proposed consistency constraints are those rules which declare the semantics not expressible with DTDs and which control the validation process. |
| A couple of examples showed that standardizing the missing concepts as predefined topic map templates will help both the topic map developer and the topic map user. The improvements were presented at a level of detail that allows them to be used as input to the ISO working group for further discussions. |
| Acknowledgements |
| The author would like to thank his colleagues Geir Ove Grønmo (STEP Infotek, Norway), Rafal Ksiezyk (STEP Poland), Graham Moore (STEP UK), and Steve Pepper (STEP Infotek, Norway), as well as all the members of STEP's Reference Works Module Club, leading European reference works publishers, for their input and open discussions about topic maps. |
| Bibliography |