[topicmapmail] natural language or not?

Alexander Mikhailian ami at spaceapplications.com
Mon Dec 10 08:33:55 EST 2007


> My point here is that a Topic Map should not be considered 
> more precise than the use of natural language that forms its
> constructs

I guess you presume that topic names contain natural language text. It
would be interesting to see whether the topic maps community agrees
with this assumption.

Here is just one way to examine this assumption from a linguistic
point of view.

Let me first recall three concepts that I think will be useful for
further discussion as well.

* A Language is the set of all possible combinations of tokens
  that satisfy a given Grammar.

* A Grammar is a set of rules that constrains the way tokens
  can be combined.

* Grammaticality [1] is a discrete (yes/no) quality of an utterance
  which states whether the utterance belongs to a Language or not.

One way to test the Grammaticality of an utterance is to construct a
parser for a given Grammar and then to try and parse the utterance. If
the utterance is parsed successfully, it is grammatical and thus belongs
to the Language.
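
A minimal sketch of such a test, assuming a toy Grammar in Chomsky
normal form and a CYK-style recognizer. The rules, the lexicon and the
name is_grammatical are all invented for illustration; this is not an
English parser, only a demonstration of Grammaticality as a yes/no
result.

```python
# Toy Grammar in Chomsky normal form. Every rule and word below is
# invented for illustration and is not taken from any real grammar.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {
    "the": {"Det"},
    "a": {"Det"},
    "map": {"N"},
    "topic": {"N"},
    "names": {"V"},
}


def is_grammatical(utterance, start="S"):
    """Return True if the utterance belongs to the toy Language (CYK)."""
    tokens = utterance.lower().split()
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive tokens[i..j].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(LEXICON.get(tok, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left in table[i][k]:
                    for right in table[k + 1][j]:
                        table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(is_grammatical("the map names a topic"))   # True
print(is_grammatical("map the names topic a"))   # False
```

The same tokens appear in both utterances; only the first ordering
satisfies the rules, so only the first belongs to the toy Language.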

Now, let us get back to Topic Maps and assume that the natural language
in question is English. English by itself is a fuzzy concept. However,
it has a well-formalized grammar and a multitude of parsers available.

To test the Grammaticality of any given topic map, we could simply
extract the topic names and feed them to an existing English language
parser. I will not suggest which parser to use, as that is not the
purpose of this email.

However, I would intuitively expect the results to be significantly
worse than for a free English text like a newspaper article or even a
patent application.
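
The experiment described above could be sketched roughly as follows.
This assumes a simplified XTM 2.0-style serialization (the sample
fragment is made up), and toy_accepts is only a crude stand-in for a
real English parser: it accepts an utterance when it has at least
three tokens and contains a word from a small, invented list of finite
verbs. The names topic_names, acceptance_rate and toy_accepts are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for an XTM 2.0-style document.
XTM_NS = {"xtm": "http://www.topicmaps.org/xtm/"}

# Invented sample fragment, simplified for illustration.
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/" version="2.0">
  <topic id="puccini"><name><value>Puccini, Giacomo</value></name></topic>
  <topic id="tosca"><name><value>Tosca</value></name></topic>
  <topic id="composed-by"><name><value>composed by</value></name></topic>
</topicMap>"""

# Stand-in vocabulary; a real parser would not work like this.
FINITE_VERBS = {"is", "are", "was", "composed", "wrote"}


def topic_names(xtm_text):
    """Extract the text of every topic name value in the document."""
    root = ET.fromstring(xtm_text)
    values = root.iterfind(".//xtm:name/xtm:value", XTM_NS)
    return [v.text.strip() for v in values if v.text]


def toy_accepts(text):
    """Crude placeholder for 'the parser accepted this utterance'."""
    tokens = text.lower().replace(",", " ").split()
    return len(tokens) >= 3 and any(t in FINITE_VERBS for t in tokens)


def acceptance_rate(utterances, accepts):
    """Fraction of utterances for which the parser says yes."""
    if not utterances:
        return 0.0
    return sum(1 for u in utterances if accepts(u)) / len(utterances)


names = topic_names(SAMPLE)
print(names)
print(acceptance_rate(names, toy_accepts))
print(acceptance_rate(["Puccini composed Tosca"], toy_accepts))
```

To run the comparison for real, one would replace toy_accepts with a
call to an actual parser and compute the same rate over sentences
taken from free English text.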

One could explain this by the fact that, although the tokens used in
Topic Maps are very close to the English vocabulary, the Grammar of
Topic Maps resembles English grammar far less.

My guess is that we can speak of a separate Topic Maps grammar, which
would exist at two levels:

1) the relations between the tokens present in topic names 
2) the Topic Maps syntax

[1] http://en.wikipedia.org/wiki/Grammaticality

-- 
Alexander Mikhailian

P.S. The above reasoning is based on the idea of Grammaticality being a
binary property. This definition of Grammaticality is still widely
disputed, though.



