Towards knowledge organization with Topic Maps   Table of contents   Indexes   Topic Map technology - the state of the art

 

Answer is just a question [of matching Topic Maps]

 Ksiezyk, Rafal  
 
 Rafal  Ksiezyk
 Managing Director
  Poland  
STEP Poland Ltd.
 Warsaw  
STEP Poland Ltd.,  Plocka 5a
Warsaw   01-231 Poland
Phone: (+48 22) 535 88 11 Fax: (+48 22) 535 88 14 email: rafal@step.pl web site: www.step.pl
 Biography
 Rafal Ksiezyk - Managing Director and Chief Consultant of STEP Poland Ltd. His main fields of interest are application of structured information technologies to solving of reference, legal and STM publishing problems.
 Prior to joining STEP Rafal worked for Polish Scientific Publishers designing and implementing document and knowledge management solutions for encyclopaediae.
 Rafal is a frequent conference and seminar speaker, including SGML/XML Europe 98 (Paris) and XML Europe 99 (Granada). He is also author of number of papers popularising XML/SGML technology, as well as, since 1996, maintains Web page "XML/SGML in Poland" under www.xml.pl.
 Rafal graduated from Department of Physics, Warsaw University.
 Abstract
 Expressive power of Topic Maps, commonly perceived as method for indexing of information resources, places the standard very close to artificial intelligence and knowledge modelling. Topic Maps resemble semantic networks and conceptual graphs, but offer more - a unique, standards-based way of encoding and exchange of the knowledge.
 Starting from description of similarities and differences between artificial intelligence and Topic Maps, paper demonstrates artificial intelligence features worth incorporating into Topic Maps domain along with practical benefits they provide.
 Further on it focuses on applications in information filtering and retrieval using proposed topic-map-defined query language and user profile definition.
 

Introduction

 TM 
 
TM , ISO/IEC 13250:1999 standard, describe what an information set is about, by formally declaring topics and associations, and by linking the relevant parts of the information set to the appropriate topics. Topic Maps help organise and retrieve online information in a way that it can be mastered by information owners and information users.TM play the same role as indexes play in books, and that thesauri play in editorial environment.
 AI, artificial intelligence 
 TM 
 
Due to its generality and expressive power they go far beyond meta-information modelling. Topic Maps can express facts, procedures and fairly complex relations between concepts. As such they come close to the knowledge representation, branch ofAI .
 TM 
 
Current paper presents the AI knowledge representation foundations as well as points out which of them may be useful forTM field of applications. In the following chapters information retrieval and filtering model is introduced and applied toTM environment.
 TMQL 
 
Last chapter presents proposal forTMQL syntax and semantic. It contains discussion ofTMQL usage scenarios. In particular it discussesTMQL application to user profile definition as well asad hoc query formulation on basis of already defined profile.
 TMQL 
 

Knowledge representation

 AI, artificial intelligence 
 
AI makes the claim that computers can be made to "think" on a level equal to humans, or at least states that some "thinking-like" features can be added to computers to make them more useful tools. "Think" and "thinking-like" means to use knowledge representation data structures and apply to it thinking representation processing. Knowledge representation techniques like semantic networks or conceptual graphs turn what we know about a particular domain into a form in which a computer can understand it.
 

Semantic networks

 SN 
 
SN create models involving nodes and links (arcs or arrows) between nodes. The nodes represent objects or concepts and the links represent relations between nodes. The links are directed and labelled, thus, a semantic network is a labeled, directed graph. The network defines a set of binary relations on a set of nodes. For instance, suppose that one wanted to represent the fact that Frederic Chopin was born in Poland. One might represent Chopin as one node, Poland as another node, and the relationwas born in as a link between the two nodes. The fact that Chopin also composed mazurka might be represented by adding a node "mazurka" and connecting it to the "Chopin" node with instance of acomposed link (see ).
 
 In a semantic net nodes represent objects and links represent relations among these objects. One can think of the objects as concepts or meanings and the relations among them conceptual relations (an intensional account) or think of the objects as entities in the world and relations among them (an extensional account). Notice that while the relations allowed are only binary this does not restrict the expressiveness of the semantic net. Any n-ary relation can be decomposed into a binary relation. The commitment to binary relations does has some effect on the fluency of the representation as this decomposition may do some damage to original conception underlying the construction of the network. Another restriction on semantic network representations as opposed to the more powerful frame based representations is a difficulty in representing kinds of things and natural classes. It is hard to represent what we mean by sayingParis is a city in a pure semantic net since a notion of inheritance of properties in a type hierarchy is lacking.
 

Conceptual Graphs

 CG, conceptual graphs 
 
The idea ofCG is a combination of logic with the semantic networks used in AI and computational linguistics. A concept graph is finite contacted bipartite graph. There are two kinds of nodes; concept nodes (displayed as a box in graph notation) which represent entities, attributes, states, and events, and relation nodes (displayed as a circle in graph notation) which represent the relationship among concept nodes. A single concept by itself ( -a) may form a conceptual graph but it is not the case with the relation ( -b), since every relation node should have one or more arcs each of which must be linked to some concept ( -c).
 
 A conceptual graph can be constructed by assembling percepts. In the process of assembly, concept relations specify the role that each percept plays and concepts represent percepts themselves. A concept can be an individual or generic. The functionreferent maps concepts into a generic markers denoted with names starting from * or individual markers usually denoted by numbers. The functiontype maps concepts into a set of type labels. A concept c with type(c)=t and reference(c)=f is displayed as [t:f] in the linear form. The function type can be applied to relation also. A relation r with type(r)=t is displayed as (t) in the linear form.
 Types of concepts (type labels) are organized into a hierarchy called type hierarchy. Type hierarchy forming operations includes conjunction and disjunction operations, so the type hierarchy will be a formal lattice.
 For example, a sentenceChopin composed mazurka can be represented in conceptual graphs as:
 
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA]
 or if we introduce individual concept - mazurka Opus 23:
 
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA:23]
 For example, a sentence "Chopin composed mazurka in his beloved Paris" can be represented by the following linear form:
 
[COMPOSE]-
->(AGNT)->[CHOPIN:*x]
->(OBJ)->[MAZURKA]
->(PLACE)->[PARIS]<-(THME)<-[LIKE]<-(AGNT)<-[CHOPIN:*x]
 This linear form finds the following graph representation ( ).
 
 

Lesson to be learned from AI knowledge representation techniques

 AI, artificial intelligence 
 
TMs are also to be used for knowledge representation purposes. Therefore they will face with typicalAI problems, including completeness of the language, effectivity and nonambiguity. It's worth to make a comparison of theAI languages presented above with topic maps.
 

Similarities and differences

 SN 
 
SN use nodes and labeled, directed arcs to express knowledge. It is not clearly defined whether used entities represent classes or particular instances of classes. Lack of clear specification of roles, which can be played by entities representing concepts and relations, was found to cause misinterpretations and ambiguity of the semantic in the model .
 CG, conceptual graphs 
 SN 
 TM 
 
Conceptual graphs are more precise and mature formulation ofSN . They relay on more atomic constructs, which used in defined roles of concept and relation are able to build higher order assertions.CG have expressive power equivalent to predicate logic and thus may represent natural language statements as well as any other data or knowledge expressible by natural language. This includes ability to expressTM data using conceptual graphs.
 CG, conceptual graphs 
 SN 
 TM 
 
Topic maps offer larger building blocks thanCG : topic types, associations with types and association roles with association role types.TM identify scope, although it could be expressed with special association type. The same applies to occurrences and facets. Generally,TM offer constructs with more predefined semantic thatCG andSN
 CG, conceptual graphs 
 
This is good on one hand but bad on the other, since we have a lot of ready made instruments to use, on contrary if we need to build a custom one - we have no basic materials. This would be the case with implementingCG concept ofcontext - used for instance to express assertions within the content of somebodies belief.
 AI, artificial intelligence 
 TM 
 
One of the lessons to be learned fromAI is thatTM should treat all of its constructs (including associations, association roles) as first class objects (topics), which is not allowed at the moment by the standard. This heterogenity may cause problems in building elegant and compact data models leveraging work done inAI .
 Lesson of semantic networks shows that we have to keep clear distinction between class and its instance levels for association and its association type (e.g. association type:composed , association:composed#23182 ), association role and its type (e.g. association role type:composer ; association role:composer#98723 ) and for other such constructs.
 CG, conceptual graphs 
 TM 
 
As shown in the previous sections,CG research has developed linear and graph notations that may be taken over byTM users.
 TM 
 

Linear notation forTM

 AI, artificial intelligence 
 SGML 
 TM 
 
Based on achievements in textual representation ofAI knowledge structures we may propose analogous language for coding ofTM s. Such language will enableTM interchange using more compact language and without in-depth knowledge ofSGML orXML syntax.
 TM 
 XML 
 
The proposed language is based conceptually on linear notation for conceptual graphs, though becauseTM use number of semantic constructs we are forced to introduce more verbose syntax. summarises basic constructs of the proposed language.
 CG, conceptual graphs 
 TM 
 
Square and round brackets used inCG for concept and relation are substituted by explicit name of construct with its identifier in square brackets. Directed arcs are replaced by dot, since we no longer need to mark directionality of relations which is expressed inTM s by association roles.
 TM 
 
Presented language will be used later as a syntax foundation for proposedTM query language.
 
topic  topic[]
topic with id "chopin"  topic["chopin"]
topic with declared topic type "person"  topic[].topic-type["person"]
association with id "as243"  assoc["as243"]
association with declared association type "born-in"  assoc["as243"].assoc-type["born-in"]
topic "chopin" plays a unspecified role in association which type is "born-in"  topic["chopin"].assoc-role[].assoc[].assoc-type["born-in"]
topic "chopin" plays a role of type "who" in association which type is "born-in"  topic["chopin"].assoc-role[ar1].assoc[].assoc-type["born-in"], assoc-role[ar1].assoc-role-type["who"]
topic "chopin" plays a role of type "who" in association which type is "born-in" where the other role of type "where" is played by "poland", topic "poland" is of type "country"  topic["chopin"].assoc-role[ar1].assoc[a].assoc-role[ar2].topic["poland"],assoc[a].assoc-type["born-in"], assoc-role[ar1].assoc-role-type["who"],assoc-role[ar2].assoc-role-type["where"]topic["poland"].topic-type["country"]
 TM 
 
In fact whether we use " assoc[a] " instead of " (a) " and " . " instead of " -> " is a matter of markup minimization and configuration of delimiters used. This issue is well known to SGML community. So one could propose isomorphic syntax but with more compact delimiters as a useful language e.g. for rapid coding of prototypeTM s.
 

Graphical notation

 TM 
 
As useful as textual notation is graphical representation ofTM structures. It may serve as one of possible graphical user interfaces forTM display on screen, as well as for editing and browsing purposes. Current proposal was inspired by graphical representation of conceptual graphs. provides graphical equivalents for the previously introduced TM statements:
 
topic
topic with id "chopin"
topic with declared topic type "person"
association with id "as243"
association with declared association type "born-in"
topic "chopin" plays a unspecified role in association which type is "born-in"
topic "chopin" plays a role of type "who" in association which type is "born-in"
topic "chopin" plays a role of type "who" in association which type is "born-in" while the other role of type "where" is played by topic "poland", which is of type "country"
 

Information retrieval and filtering for Topic Maps

 IR 
 TM 
 
After having defined linear and graphical notations forTM s we may try to approachIR forTM -encoded data. Information retrieval is usually described as way of leading the user to those documents that will best enable him/her to satisfy his/her need for information.IR is typically concerned with single uses of the system, by a person with one-time goal and one-time query, while information filtering is concerned with repeated uses by a person or persons with long-term goals or interests. InIR ad hoc goals of user are expressed by him/her in a form of a query. In information filtering stable interests of user are expressed in the form of user profile, which within a filtering process plays the role similar to the role of a query.
 IR 
 TM 
 
In case ofTM -based application retrievable resources may be considered as consisting of:
 
  • information resources (actual data, documents, illustrations, audio, also non-TM metadata)
  •  TM 
     
  • topic map domain data (metadata expressed in form ofTM )
  •  TM 
     
    Thus potentially retrieval may be twofold: at the level of information resources and at the level of topic map. In current research we focus for simplicity on retrieval in the topic map domain. General model will require combining of standard retrieval and filtering techniques for resources domain with the presented one forTM s.
     TM 
     
    In typical applications content is embered in a fixed and known data structure. In TM applications user is responsible for creating content, its structure and data model for information. This results in structural diversity and complexity ofTM data undergoing retrieval. Because of lack of data model (in the sense of e.g. relational databases) for theTM domain, retrieval has to incorporate meta-queries about data structure along with queries for data.
     SQL 
     TM 
     TMQL 
     
    ThusTM querying capabilities may be perceived as generalisation of relational queries. (It's good design principle to extend existing functionality instead of proposing completely new one.) So one of the principles while designingTMQL was to leverage existing and popular querying solutions, namelySQL . It was also inspired by research done in the area of information retrieval in conceptual graphs and graphical query languages developed for hypertext structures .
     

    RDB as a Topic Map

     RDB 
     
    The most frequently used information representation and retrieval modelRDB andSQL is worth being investigated for similarities withTM retrieval.
     RDB 
     SQL 
     TM 
     
    TypicalRDB structures may be expressed usingTM model. Sample representation is shown on .
     
     TM 
     
    PresentedTM model for relational database shows simple two level hierarchical entity structure. It consists of tables (employee andaddress ) grouped bycontacts association and individual field values (e.g.John ) bound in records to the table by association of typerecord in table .
     SQL 
     
    The designed query language should extendSQL in a way that it handles the above but also more liberal data structures.
     

    TMQL proposal

     SQL 
     
    InSQL pair of names connected with a dot: employee.name denotes all values of field name in the table employee . Because of fixed data architecture, roles played by the two parts of the expression are set totable andfield , and we evaluatefield part. But it is frequently of interest what are the tables that contain field called name . Such reverse querries or querries about data structure are not easily expressible inSQL
     SQL 
     TM 
     
    It is possible inSQL engines by querying dedicated MASTER database.
    . In ourTM model of relational database we would just use the same association which is used to answer the question for field value to answer the question of table names where the field belongs to. It would be enough to pick the other end of topic association.
     TM 
     
    The role entities play in ourTM data model may be defined by linear notation introduced in section . If we preserve SQL division of select query into traditional parts:select ,from andwhere we may start to build out first query:
     
    select topic[x]
    from
    topic[x]
    where
    x="Chopin"
    
     or the other version of it, which returns all topics of type "composer":
     
    select
    topic[x]
    from
    topic-type["composer"].topic[x]
    
     On the other hand we could ask for all the types of topic "Chopin", expecting that we will learn that besides the fact that he was "composer" he was also "pianist":
     
    select
    topic[t]
    from
    topic-type[t].topic["Chopin"]
    
     If one would like to know all the topics that are subtype of type "musician", he would need to use join:
     
    select
    topic[x]
    from
    topic-type["musician"].topic[t1], topic-type[t2].topic[x]
    where
    t1 = t2
    
     or equivalently:
     
    select
    topic[x]
    from
    topic-type["musician"].topic[t], topic-type[t].topic[x]
    
     RDB 
     SQL 
     TM 
     
    If we come back to theTM model forRDB we could reformulate the basicSQL query:
     
    select
    name
    from
    employee.name
    
     writing the following TMQL equivalent:
     
    select
    topic[x]
    from
    topic-type["table"].topic["employee"].asoc-role[].assoc[a]
    .assoc-role[ar].topic[x],
    assoc[a].assoc-type["record in table"],
    assoc-role[ar].assoc-role-type["name"]
    
     TM 
     TMQL 
     
    TMQL select query returns a topic map. In the examples above, it was just a list of topics, but select clause may contain more complex structures which enable user to create useful views overTM infopool. The following query represents a view that for every topic having supertype "person" makes "person" its explicit topic type.
     
    select
    topic-type["person"].topic[x]
    from
    topic-type["person"].topic[t], topic-type[t].topic[x]
    
     

    User profile as a TMQL query

     User profile may be considered as a query that is stable over time and returns wide range of objects related to the interest of the user. After specifying set of such queries they may be used for filtering of incoming data or may serve as a default filter switched on when number of hits fromad hoc queries is to large.
     Let's assume that some student is interested in piano music. We could assume that everything that is connected with both concepts: "piano" and "music" is relevant for him/her. Thus student may encode personal interest profile with the following query. Double dots ".." mean omission of association role, which improves reading in cases where association roles are free.
     
    select topic[x]
    from
    topic[x]..assoc[]..topic[prof-item1],
    topic[x]..assoc[]..topic[prof-item2]
    where
    prof-item1 = "piano" and
    prof-item2 like "music%"
    
     The above query may be stored as "my-piano-music-profile" by the user. To benefit from the profile any query executed by user should for instance include extendedwhere clause invoking condition contained in the profile query. If we use as an example query asking for all information related to Paris, the following query is expected to return great number of hits:
     
    select topic[x]
    from
    topic["Paris"]..assoc[a]..topic[x],
    asoc-type["related to"].assoc[a]
    
     And here is its version filtered by the user's profile, which is likely to return information about piano concerts in Paris and hopefully also topic of Chopin:
     
    select topic[x]
    from
    topic["Paris"]..assoc[a]..topic[x],
    asoc-type["related to"].assoc[a]
    where
    topic[x] in (my-piano-music-profile)
    
     TM 
     
    For useful exploration of information resources in the presented way we need some processing of topic map structures, which enable to interpret the association "lived in" as a specialisation of association "related to" so that the above query is able to find "Chopin" even in case where there is no direct association of the type "related to" between "Chopin" and "Paris" in the map. To achieve this intelligent inference of information contained in the map is needed. We may use the support from thesauri, which are in ideal case to be found in the supplementaryTM . One of theTM properties of great importance for information inference applications istransitivity property of associations, unfortunately missing in the standard at the moment.
     

    TMQL query expressed as Topic Map

     AI, artificial intelligence 
     TM 
     TMQL 
     
    We argued thatTM s seem to take over duties ofAI information representation techniques. If so it should be no problem to encodeTMQL queries as topic maps also. Let's have a look at the last query based on user profile definition displayed with help of graphical notation:
     
     TM 
     
    What has to be expressed additionally is: what of results of this query has to be returned back to the user. This could be easily done with a help ofTM construct - scope, which shuld be set as a flag (to a predefined theme) for entities to be returned.
     TM 
     TMQL 
     
    Interesting aspect of presented approach is that users may formulate queries either by usingTMQL syntax or by drawing the query using graphical tool similar toTM editor. Query in this representation looks very much like pattern of the requested answer with some unknown parameters. This fact should enable wide range of users to formulate advanced queries without knowing the syntax ofTMQL . If we would add natural processing unit at the front of query engine, then system would be capable of turning natural language queries into their topic map representation (which could be again refined by user) and use this form as high value input forTM query engine. presents overview ofTM retrieval application model.
     
     CG, conceptual graphs 
     TM 
     
    This was one of the goals ofCG research. For instance question "What is related to Paris?" can be directly mapped to theTM representation demonstrated on and .
     

    Summary

     AI, artificial intelligence 
     SQL 
     TM 
     TMQL 
     
    Based on previous research inAI we have introduced linear and graphical notations forTM knowledge structures. Those notations are used further on in the definition of proposed topic map query language.TMQL is positioned as a generalisation ofSQL , such that it covers alsoTM data having more complex structure than just relational. The paper reports work in progress rather than a result of finite research. Significant number of questions is still to be soved.
     Presented graphical notation for the queries shall enable users to express their information needs in a form of topic pattern. Such pattern is later on matched with information resources in order to find unknown parameters and present them to the user in a form of view over a topic map.
     Therefore answer to information request is a question of matching right topic maps: one for a query, one for user's profile and finally one for information resources.
     Bibliography
     
    Aman92 Amann B., Scholl M. (1992)Gram: A Graph Data Model and Query Language ,
     
    Belk92 Belkin N.J., Croft W.B. (1992)Information Filtering And Information Retrieval: Two Sides Of The Same Coin? , Communications of the ACM.
     
    Cruz87 Cruz I.F., Mendelzon A.O., Wood P.T. (1987)A Graphical Query Language Supporting Recursion , International Conference on Management of Data and Symposium on Principles of Database Systems, ACM.
     
    Grif82 Griffith R.L. (1982)Three Principles of Representation for Semantic Networks , ACM Transactions on Database Systems.
     
    Ksie99 Ksiezyk R. (1999)Trying not to get lost with a topic map , XML Europe '99, GCA.
     
    Lehm92 Lehmann F., Rodin E.Y. (1992)Semantic Networks in Artificial Intelligence , Pergamon Press.
     
    Sowa00 Sowa J.F. (2000)Knowledge Representation: logical, philosophical, and computational foundations , Brooks/Cole.
     
    Yang93 Yang G., Oh J. (1993)Knowledge Acquisition and Retrieval Based on Conceptual Graphs , Symposium on Applied Computing, ACM.

    Towards knowledge organization with Topic Maps   Table of contents   Indexes   Topic Map technology - the state of the art