Towards knowledge organization with Topic Maps | Table of contents | Indexes | Topic Map technology - the state of the art | ![]() |
Answer is just a question [of matching Topic Maps]
Rafal Ksiezyk
Managing Director
STEP Poland Ltd., Plocka 5a, Warsaw 01-231, Poland
Phone: (+48 22) 535 88 11; Fax: (+48 22) 535 88 14
email: rafal@step.pl; web site: www.step.pl
Biography
Prior to joining STEP, Rafal worked for Polish Scientific Publishers, designing and implementing document and knowledge management solutions for encyclopaediae.
Rafal graduated from the Department of Physics, Warsaw University.
Abstract
Further on, it focuses on applications in information filtering and retrieval, using the proposed topic-map-defined query language and user profile definition.
Introduction
Topic Maps (TM), the ISO/IEC 13250:1999 standard, describe what an information set is about by formally declaring topics and associations and by linking the relevant parts of the information set to the appropriate topics. Topic Maps help organise and retrieve online information in a way that can be mastered by both information owners and information users. TM play the same role that indexes play in books and that thesauri play in editorial environments.
Due to their generality and expressive power, Topic Maps go far beyond meta-information modelling. They can express facts, procedures and fairly complex relations between concepts. As such, they come close to knowledge representation, a branch of artificial intelligence (AI).
This paper presents the foundations of AI knowledge representation and points out which of them may be useful in TM applications. The following chapters introduce a model of information retrieval and filtering and apply it to the TM environment.
The last chapter presents a proposal for the syntax and semantics of a topic map query language (TMQL) and discusses TMQL usage scenarios, in particular the application of TMQL to user profile definition and the formulation of ad hoc queries on the basis of an already defined profile.
Knowledge representation
AI makes the claim that computers can be made to "think" on a level equal to humans, or at least that some "thinking-like" features can be added to computers to make them more useful tools. Here, "thinking" means applying reasoning processes to data structures that represent knowledge. Knowledge representation techniques such as semantic networks or conceptual graphs turn what we know about a particular domain into a form that a computer can process.
Semantic networks
Semantic networks (SN) create models involving nodes and links (arcs or arrows) between nodes. The nodes represent objects or concepts, and the links represent relations between nodes. The links are directed and labelled; thus, a semantic network is a labelled, directed graph. The network defines a set of binary relations on a set of nodes. For instance, suppose that one wanted to represent the fact that Frederic Chopin was born in Poland. One might represent Chopin as one node, Poland as another node, and the relation was born in as a link between the two nodes. The fact that Chopin also composed mazurkas might be represented by adding a node "mazurka" and connecting it to the "Chopin" node with an instance of a composed link (see figure).
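A semantic network of this kind can be sketched in a few lines of Python. This is only an illustration under our own assumptions: each arc is stored as a labelled edge from a source node to a target node, and the class and method names are hypothetical.

```python
from collections import defaultdict


class SemanticNetwork:
    """A labelled, directed graph: every arc is a binary relation
    (source node, link label, target node)."""

    def __init__(self):
        # outgoing arcs per node: node -> list of (label, target)
        self._out = defaultdict(list)

    def add(self, source, label, target):
        """Add the directed, labelled arc source --label--> target."""
        self._out[source].append((label, target))

    def related(self, source, label):
        """Return all targets reachable from source over arcs labelled label."""
        return [t for (l, t) in self._out[source] if l == label]


net = SemanticNetwork()
net.add("Chopin", "was born in", "Poland")
net.add("Chopin", "composed", "mazurka")

print(net.related("Chopin", "was born in"))  # ['Poland']
print(net.related("Chopin", "composed"))     # ['mazurka']
```

Because arcs are directed, asking Poland for its "was born in" links returns nothing; the relation only holds in one direction.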
Conceptual Graphs
The idea of conceptual graphs (CG) is a combination of logic with the semantic networks used in AI and computational linguistics. A conceptual graph is a finite, connected, bipartite graph. There are two kinds of nodes: concept nodes (displayed as boxes in graph notation), which represent entities, attributes, states and events, and relation nodes (displayed as circles in graph notation), which represent the relationships among concept nodes. A single concept by itself (figure, part a) may form a conceptual graph, but a single relation may not (part b), since every relation node must have one or more arcs, each of which must be linked to some concept (part c).
For example, the sentence "Chopin composed a mazurka" can be represented in conceptual graphs as:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA] |
or, if we introduce the individual concept mazurka Opus 23, as:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA:23] |
Similarly, the sentence "Chopin composed a mazurka in his beloved Paris" can be represented by the following linear form, in which the label *x marks both CHOPIN concepts as referring to the same individual:
[COMPOSE]-
    ->(AGNT)->[CHOPIN:*x]
    ->(OBJ)->[MAZURKA]
    ->(PLACE)->[PARIS]<-(THME)<-[LIKE]->(AGNT)->[CHOPIN:*x]
The corresponding graph representation is shown in the accompanying figure.
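The structural rules stated for conceptual graphs (two disjoint kinds of nodes, every relation linked to one or more concepts, the whole graph connected) can be checked mechanically. The following Python sketch is our own illustration, not part of any CG tool; relation nodes get hypothetical identifiers such as agnt1 so that two AGNT relations can coexist, and arc direction is ignored because it does not affect these rules.

```python
def is_valid_cg(concepts, relations):
    """Check the structural rules of a conceptual graph.

    concepts  -- set of concept-node labels
    relations -- dict: relation-node id -> concept labels its arcs link to
    """
    concepts, rel_nodes = set(concepts), set(relations)
    if concepts & rel_nodes:          # the two node kinds must be disjoint
        return False
    nodes = concepts | rel_nodes
    if not nodes:
        return False
    for args in relations.values():
        # every relation needs one or more arcs, each ending in a concept
        if not args or any(a not in concepts for a in args):
            return False
    # connectivity: undirected walk over the relation-concept arcs
    adjacent = {n: set() for n in nodes}
    for rel, args in relations.items():
        for a in args:
            adjacent[rel].add(a)
            adjacent[a].add(rel)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacent[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


# [CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA]
simple = is_valid_cg({"CHOPIN", "COMPOSE", "MAZURKA"},
                     {"agnt": ["COMPOSE", "CHOPIN"], "obj": ["COMPOSE", "MAZURKA"]})
# The Paris sentence: both CHOPIN:*x concepts collapse into one node.
paris = is_valid_cg({"COMPOSE", "CHOPIN:*x", "MAZURKA", "PARIS", "LIKE"},
                    {"agnt1": ["COMPOSE", "CHOPIN:*x"], "obj": ["COMPOSE", "MAZURKA"],
                     "place": ["COMPOSE", "PARIS"], "thme": ["LIKE", "PARIS"],
                     "agnt2": ["LIKE", "CHOPIN:*x"]})
print(simple, paris)                     # True True
print(is_valid_cg({"CHOPIN"}, {}))       # True: a single concept is a graph
print(is_valid_cg(set(), {"agnt": []}))  # False: a relation needs arcs
```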
Lesson to be learned from AI knowledge representation techniques |
TMs are also to be used for knowledge representation. Therefore they will face typical AI problems, including completeness of the language, effectiveness and non-ambiguity. It is worth comparing the AI languages presented above with topic maps.
Similarities and differences |
SN use nodes and labelled, directed arcs to express knowledge. It is not clearly defined whether the entities used represent classes or particular instances of classes. The lack of a clear specification of the roles that can be played by entities representing concepts and relations was found to cause misinterpretations and semantic ambiguity in the model.
Conceptual graphs are a more precise and mature formulation of SN. They rely on more atomic constructs which, used in the defined roles of concept and relation, are able to build higher-order assertions. CG have expressive power equivalent to predicate logic and thus may represent natural language statements as well as any other data or knowledge expressible in natural language. This includes the ability to express TM data using conceptual graphs.
Topic maps offer larger building blocks than CG: topic types, associations with types, and association roles with association role types. TM have an explicit notion of scope, although scope could also be expressed with a special association type. The same applies to occurrences and facets. Generally, TM offer constructs with more predefined semantics than CG and SN.
This is good on the one hand, since we have many ready-made instruments to use, but bad on the other: if we need to build a custom instrument, we have no basic materials. This would be the case when implementing the CG concept of context, used for instance to express assertions within the content of somebody's belief.
One of the lessons to be learned from AI is that TM should treat all of their constructs (including associations and association roles) as first-class objects (topics), which the standard does not allow at the moment. This heterogeneity may cause problems in building elegant and compact data models that leverage work done in AI.
As shown in the previous sections, CG research has developed linear and graph notations that may be taken over by TM users.
Linear notation for TM
Based on the achievements in textual representation of AI knowledge structures, we may propose an analogous language for coding TMs. Such a language will enable TM interchange using a more compact notation and without in-depth knowledge of SGML or XML syntax.
The proposed language is conceptually based on the linear notation for conceptual graphs, though because TM use a number of semantic constructs, we are forced to introduce a more verbose syntax. The accompanying table summarises the basic constructs of the proposed language.
The square and round brackets used in CG for concepts and relations are replaced by the explicit name of the construct, followed by its identifier in square brackets. Directed arcs are replaced by a dot, since we no longer need to mark the directionality of relations, which in TMs is expressed by association roles.
The presented language will be used later as the syntactic foundation of the proposed TM query language.
In fact, whether we write "assoc[a]" instead of "(a)" and "." instead of "->" is a matter of markup minimisation and of the configuration of the delimiters used, an issue well known to the SGML community. One could therefore propose an isomorphic syntax with more compact delimiters as a useful language, e.g. for rapid coding of prototype TMs.
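To show how regular the proposed path syntax is, here is a minimal Python tokenizer for a single dot-separated path such as topic-type["composer"].topic[x]. It is a sketch under stated assumptions: construct names are lowercase words with hyphens, an identifier is a quoted literal, a bare variable name or empty, and comma-separated path lists and where clauses are out of scope.

```python
import re

# One step: a construct name, then an identifier in square brackets that is a
# quoted literal ("Chopin"), a variable (x, prof-item1) or empty ([]).
STEP = re.compile(r'([a-z][a-z-]*)\[\s*("([^"]*)"|[A-Za-z0-9_-]*)\s*\]')


def parse_path(path):
    """Split a path into (construct, identifier, is_literal) tuples."""
    steps, pos = [], 0
    path = path.strip()
    while True:
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"bad step at position {pos}: {path[pos:]!r}")
        construct, raw, literal = m.group(1), m.group(2), m.group(3)
        if literal is not None:
            steps.append((construct, literal, True))
        else:
            steps.append((construct, raw or None, False))  # None = anonymous
        pos = m.end()
        if pos == len(path):
            return steps
        if path[pos] != ".":   # the dot replaces the directed CG arc
            raise ValueError(f"expected '.' at position {pos}")
        pos += 1


print(parse_path('topic-type["composer"].topic[x]'))
# [('topic-type', 'composer', True), ('topic', 'x', False)]
```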
Graphical notation |
Graphical representation of TM structures is as useful as textual notation. It may serve as one possible graphical user interface for displaying TMs on screen, as well as for editing and browsing. The current proposal was inspired by the graphical representation of conceptual graphs. The accompanying figure provides graphical equivalents for the previously introduced TM statements.
Information retrieval and filtering for Topic Maps |
Having defined linear and graphical notations for TMs, we may now approach information retrieval (IR) for TM-encoded data. Information retrieval is usually described as a way of leading users to the documents that will best enable them to satisfy their need for information. IR is typically concerned with single uses of the system by a person with a one-time goal and a one-time query, while information filtering is concerned with repeated uses by a person or persons with long-term goals or interests. In IR, the ad hoc goals of the user are expressed in the form of a query. In information filtering, the stable interests of the user are expressed in the form of a user profile, which plays a role in the filtering process similar to that of a query.
In the case of a TM-based application, retrievable resources may be considered as consisting of two parts: the information resources themselves and the topic map describing them.
Thus retrieval may potentially be twofold: at the level of information resources and at the level of the topic map. For simplicity, the current research focuses on retrieval in the topic map domain. A general model will require combining standard retrieval and filtering techniques for the resource domain with the one presented here for TMs.
In typical applications, content is embedded in a fixed and known data structure. In TM applications, the user is responsible for creating the content, its structure and the data model for the information. This results in structural diversity and complexity of the TM data undergoing retrieval. Because the TM domain lacks a data model (in the sense of, e.g., relational databases), retrieval has to incorporate meta-queries about the data structure along with queries for data.
Thus TM querying capabilities may be perceived as a generalisation of relational queries. (It is a good design principle to extend existing functionality instead of proposing a completely new one.) So one of the principles in designing TMQL was to leverage an existing and popular query solution, namely SQL. TMQL was also inspired by research on information retrieval in conceptual graphs and on graphical query languages developed for hypertext structures.
RDB as a Topic Map |
The most frequently used information representation and retrieval model, relational databases (RDB) queried with SQL, is worth investigating for similarities with TM retrieval.
Typical RDB structures may be expressed using the TM model. A sample representation is shown in the accompanying figure.
The presented TM model of a relational database shows a simple two-level hierarchical entity structure. It consists of tables (employee and address), grouped by a contacts association, and individual field values (e.g. John), bound in records to the table by an association of type record in table.
The designed query language should extend SQL so that it handles the above structure and also more liberal data structures.
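The mapping of tables and records into a topic map can be sketched in Python as follows. The record layout, the role type "table" for the table's own role and the "member" role type in the contacts association are our assumptions, since the model above does not name these roles; the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMap:
    types: dict = field(default_factory=dict)    # topic -> set of topic types
    assocs: list = field(default_factory=list)   # (assoc type, [(role type, topic)])

    def add_topic(self, topic, topic_type=None):
        self.types.setdefault(topic, set())
        if topic_type is not None:
            self.types[topic].add(topic_type)


def add_table(tm, table, rows):
    """Map one relational table into the topic map: the table becomes a topic
    of type "table"; every row becomes a "record in table" association in
    which the table plays the (hypothetical) role "table" and each field value
    plays a role typed by its column name."""
    tm.add_topic(table, "table")
    for row in rows:
        roles = [("table", table)]
        for column, value in row.items():
            tm.add_topic(value)           # equal values share one topic
            roles.append((column, value))
        tm.assocs.append(("record in table", roles))


tm = TopicMap()
add_table(tm, "employee", [{"name": "John"}])
add_table(tm, "address", [{"city": "Warsaw"}])
# The "contacts" association groups the two tables (role type hypothetical).
tm.assocs.append(("contacts", [("member", "employee"), ("member", "address")]))
print(tm.assocs[0])  # ('record in table', [('table', 'employee'), ('name', 'John')])
```

Representing a field value as a topic means that identical values in different records become one shared topic, which is what allows a topic map to be queried across tables.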
TMQL proposal |
In SQL, a pair of names connected with a dot, employee.name, denotes all values of the field name in the table employee. Because of the fixed data architecture, the roles played by the two parts of the expression are set to table and field, and we evaluate the field part. But it is frequently of interest which tables contain a field called name. Such reverse queries, or queries about the data structure, are not easily expressible in SQL.
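The asymmetry can be seen in a small SQLite session driven from Python (SQLite and the sample data are our choice for illustration). The data query is plain SQL, whereas the reverse question has to go through the engine's own catalog, here sqlite_master and PRAGMA table_info; other engines expose similar metadata through INFORMATION_SCHEMA views.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT)")
con.execute("CREATE TABLE address (city TEXT)")
con.execute("INSERT INTO employee VALUES ('John')")

# Data query: the two parts of employee.name have fixed roles (table, field).
names = [row[0] for row in con.execute("SELECT employee.name FROM employee")]

# Reverse query: which tables contain a field called "name"? A FROM clause
# cannot range over table names, so we consult SQLite's own catalog.
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
with_name = [t for t in tables
             if any(col[1] == "name"  # col = (cid, name, type, notnull, dflt_value, pk)
                    for col in con.execute(f"PRAGMA table_info({t})"))]

print(names)      # ['John']
print(with_name)  # ['employee']
```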
The roles that entities play in our TM data model may be defined with the linear notation introduced in the section on linear notation for TM. If we preserve the SQL division of a select query into its traditional parts (select, from and where), we may start to build our first query:
select topic[x] from topic[x] where x="Chopin" |
or a variant that returns all topics of type "composer":
select topic[x] from topic-type["composer"].topic[x] |
On the other hand, we could ask for all the types of the topic "Chopin", expecting to learn that besides being a "composer" he was also a "pianist":
select topic[t] from topic-type[t].topic["Chopin"] |
To find all the topics that are subtypes of type "musician", one needs a join:
select topic[x] from topic-type["musician"].topic[t1], topic-type[t2].topic[x] where t1 = t2 |
| or equivalently: |
Returning to the TM model of an RDB, we can reformulate the basic SQL query:
select name from employee
by writing the following TMQL equivalent:
select topic[x] from topic-type["table"].topic["employee"].assoc-role[].assoc[a].assoc-role[ar].topic[x], assoc[a].assoc-type["record in table"], assoc-role[ar].assoc-role-type["name"]
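To make the semantics of this path concrete, the following Python sketch evaluates it step by step over a hand-built list of "record in table" associations. The data, the "table" role type and the function name are hypothetical, and the sketch hard-codes this one pattern rather than implementing TMQL.

```python
# Hypothetical data: topic types and "record in table" associations, each
# association given as (association type, [(role type, player topic), ...]).
TOPIC_TYPES = {"employee": {"table"}, "address": {"table"}}
ASSOCS = [
    ("record in table", [("table", "employee"), ("name", "John")]),
    ("record in table", [("table", "employee"), ("name", "Anna")]),
    ("record in table", [("table", "address"), ("city", "Warsaw")]),
]


def select_field(table, field_name):
    """Evaluate, for one fixed pattern,
    topic-type["table"].topic[table].assoc-role[].assoc[a].assoc-role[ar].topic[x]
    with assoc a of type "record in table" and role ar of type field_name."""
    if "table" not in TOPIC_TYPES.get(table, set()):
        return []                      # topic-type["table"].topic[table] fails
    result = []
    for assoc_type, roles in ASSOCS:
        if assoc_type != "record in table":
            continue
        if all(player != table for _, player in roles):
            continue                   # the table topic plays no role in a
        result.extend(x for role_type, x in roles if role_type == field_name)
    return result


print(select_field("employee", "name"))  # ['John', 'Anna']
```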
A TMQL select query returns a topic map. In the examples above it was just a list of topics, but the select clause may contain more complex structures, which enable the user to create useful views over the TM infopool. The following query represents a view that, for every topic having the supertype "person", makes "person" its explicit topic type.
select topic-type["person"].topic[x] from topic-type["person"].topic[t], topic-type[t].topic[x] |
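The topic-type queries in this section reduce to set comprehensions once the map is flattened into (topic type, topic) pairs, which is our simplifying assumption here. As the join query presupposes, "composer" and "pianist" are themselves recorded as topics of type "musician"; the data and function names are illustrative only.

```python
# Hypothetical instance-of facts as (topic type, topic) pairs, each one a
# statement topic-type[t].topic[x].
INSTANCE_OF = {
    ("composer", "Chopin"),
    ("pianist", "Chopin"),
    ("musician", "composer"),
    ("musician", "pianist"),
    ("person", "composer"),
}


def topics_of_type(t):
    """select topic[x] from topic-type[t].topic[x]"""
    return {x for (ty, x) in INSTANCE_OF if ty == t}


def types_of(x):
    """select topic[t] from topic-type[t].topic[x]"""
    return {ty for (ty, topic) in INSTANCE_OF if topic == x}


def instances_of_subtypes(t):
    """select topic[x] from topic-type[t].topic[t1], topic-type[t2].topic[x]
    where t1 = t2  (a join on the shared topic)"""
    return {x for t1 in topics_of_type(t) for x in topics_of_type(t1)}


def type_view(supertype):
    """select topic-type[supertype].topic[x]
    from topic-type[supertype].topic[t], topic-type[t].topic[x]
    Returns new topic-type statements rather than a list of topics."""
    return {(supertype, x) for x in instances_of_subtypes(supertype)}


print(sorted(types_of("Chopin")))                 # ['composer', 'pianist']
print(sorted(instances_of_subtypes("musician")))  # ['Chopin']
print(sorted(type_view("person")))                # [('person', 'Chopin')]
```

Because type_view yields statements rather than bare topics, its result can be merged back into the map (for example INSTANCE_OF | type_view("person")), which is the sense in which a select query returns a topic map.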
User profile as a TMQL query |
A user profile may be considered as a query that is stable over time and returns a wide range of objects related to the interests of the user. Once a set of such queries has been specified, they may be used for filtering incoming data, or may serve as a default filter that is switched on when the number of hits from ad hoc queries is too large.
Let us assume that a student is interested in piano music. We could assume that everything connected with both concepts, "piano" and "music", is relevant for him/her. The student may thus encode a personal interest profile with the following query. Double dots ".." denote omission of the association role, which improves readability in cases where association roles are free.
select topic[x] from topic[x]..assoc[]..topic[prof-item1], topic[x]..assoc[]..topic[prof-item2] where prof-item1 = "piano" and prof-item2 like "music%" |
The above query may be stored by the user as "my-piano-music-profile". To benefit from the profile, any query executed by the user should, for instance, include an extended where clause invoking the condition contained in the profile query. If we take as an example a query asking for all information related to Paris, the following query is expected to return a great number of hits:
select topic[x] from topic["Paris"]..assoc[a]..topic[x], assoc-type["related to"].assoc[a]
Here is its version filtered by the user's profile, which is likely to return information about piano concerts in Paris and, hopefully, also the topic "Chopin":
select topic[x] from topic["Paris"]..assoc[a]..topic[x], assoc-type["related to"].assoc[a] where topic[x] in (my-piano-music-profile)
For useful exploration of information resources in the presented way, we need some processing of topic map structures that makes it possible to interpret the association "lived in" as a specialisation of the association "related to", so that the above query is able to find "Chopin" even where there is no direct association of type "related to" between "Chopin" and "Paris" in the map. This requires intelligent inference over the information contained in the map. We may use support from thesauri, which ideally are found in a supplementary TM. One of the TM properties of great importance for inference applications is the transitivity of associations, which unfortunately is missing from the standard at the moment.
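The inference step and the profile filter can be sketched together in Python. This is a minimal illustration under our own assumptions: association roles are ignored (as ".." allows), the specialisation hierarchy is a hand-written dictionary whose transitive closure is computed explicitly, LIKE "music%" is approximated by a prefix test, and all data and names are hypothetical.

```python
# Hypothetical association-type hierarchy: subtype -> direct supertypes.
SUPERTYPES = {
    "lived in": {"related to"},
    "took place in": {"located in"},
    "located in": {"related to"},
}

# Hypothetical map; each association is (type, set of topics it links).
ASSOCS = [
    ("lived in", {"Chopin", "Paris"}),
    ("took place in", {"piano recital", "Paris"}),
    ("located in", {"Eiffel Tower", "Paris"}),
    ("composed for", {"Chopin", "piano"}),
    ("belongs to", {"Chopin", "music of the Romantic era"}),
    ("features", {"piano recital", "piano"}),
    ("belongs to", {"piano recital", "music performance"}),
]


def supertypes(assoc_type):
    """All direct and indirect supertypes (transitive closure)."""
    seen, stack = set(), [assoc_type]
    while stack:
        for sup in SUPERTYPES.get(stack.pop(), ()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def neighbours(topic, assoc_type=None):
    """topic[topic]..assoc[a]..topic[x]; if assoc_type is given, a must be of
    that type or of one of its (indirect) subtypes."""
    return {x for (t, linked) in ASSOCS
            if topic in linked
            and (assoc_type is None or t == assoc_type
                 or assoc_type in supertypes(t))
            for x in linked if x != topic}


def piano_music_profile():
    """Topics linked to "piano" and to some topic matching 'music%'."""
    topics = set().union(*(linked for _, linked in ASSOCS))
    return {x for x in topics
            if "piano" in neighbours(x)
            and any(y.startswith("music") for y in neighbours(x))}


ad_hoc = neighbours("Paris", "related to")
print(sorted(ad_hoc))                          # ['Chopin', 'Eiffel Tower', 'piano recital']
print(sorted(ad_hoc & piano_music_profile()))  # ['Chopin', 'piano recital']
```

Here "piano recital" is found only through the two-step chain "took place in", "located in", "related to", which is exactly where transitivity matters; the profile then removes "Eiffel Tower".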
TMQL query expressed as Topic Map |
We have argued that TMs seem to take over the duties of AI information representation techniques. If so, it should be no problem to encode TMQL queries as topic maps as well. Let us look at the last query, based on the user profile definition, displayed in graphical notation (see figure).
What has to be expressed additionally is which results of this query are to be returned to the user. This can easily be done with the TM scope construct, set as a flag (to a predefined theme) on the entities to be returned.
An interesting aspect of the presented approach is that users may formulate queries either by using TMQL syntax or by drawing the query with a graphical tool similar to a TM editor. A query in this representation looks very much like a pattern of the requested answer with some unknown parameters. This should enable a wide range of users to formulate advanced queries without knowing the syntax of TMQL. If we added a natural language processing unit at the front of the query engine, the system would be capable of turning natural language queries into their topic map representation (which could again be refined by the user) and of using this form as high-value input for the TM query engine. The accompanying figure presents an overview of the TM retrieval application model.
This was one of the goals of CG research. For instance, the question "What is related to Paris?" can be directly mapped to the TM representation demonstrated in the preceding figures.
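The idea of answering a query by matching one topic map against another can be sketched as a small backtracking matcher in Python. The pattern below is a simplified, hypothetical one rather than the profile query from the figure; names starting with "?" are the unknown parameters, associations keep their players in a fixed positional order instead of explicit roles, and the set RETURN_SCOPE stands in for the predefined scope theme that flags which entities are returned.

```python
# Hypothetical data and query pattern; an association is (type, players).
DATA = [
    ("related to", ("Paris", "Chopin")),
    ("related to", ("Paris", "Eiffel Tower")),
    ("composed", ("Chopin", "mazurka")),
]
PATTERN = [
    ("related to", ("Paris", "?x")),
    ("composed", ("?x", "?work")),
]
RETURN_SCOPE = {"?x"}   # entities flagged for return


def unify(pat, fact, binding):
    """Extend binding so that one pattern association equals one fact,
    or return None if they cannot match."""
    (ptype, pargs), (ftype, fargs) = pat, fact
    if ptype != ftype or len(pargs) != len(fargs):
        return None
    new = dict(binding)
    for p, f in zip(pargs, fargs):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:   # unknown already bound elsewhere
                return None
        elif p != f:
            return None
    return new


def match(pattern, data, binding=None):
    """Yield every binding of the unknowns that makes all pattern
    associations present in the data (simple backtracking)."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    for fact in data:
        b = unify(pattern[0], fact, binding)
        if b is not None:
            yield from match(pattern[1:], data, b)


answers = [{v: b[v] for v in RETURN_SCOPE} for b in match(PATTERN, DATA)]
print(answers)  # [{'?x': 'Chopin'}]
```

"Eiffel Tower" satisfies the first association of the pattern but not the second, so only the binding that makes the whole pattern present in the data survives, and only the scoped unknown is reported.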
Summary |
Based on previous research in AI, we have introduced linear and graphical notations for TM knowledge structures. These notations are then used in the definition of the proposed topic map query language. TMQL is positioned as a generalisation of SQL, such that it also covers TM data with structures more complex than relational ones. The paper reports work in progress rather than the result of completed research; a significant number of questions are still to be solved.
The presented graphical notation for queries should enable users to express their information needs in the form of a topic pattern. Such a pattern is then matched against the information resources in order to find the unknown parameters and present them to the user as a view over a topic map.
Therefore, the answer to an information request is a question of matching the right topic maps: one for the query, one for the user's profile and, finally, one for the information resources.