[topicmapmail] Fwd: CPAN release of WordNet::Similarity

Murray Altheim m.altheim@open.ac.uk
Thu, 10 Apr 2003 16:05:07 +0100


Guy Murphy wrote:
> Hiyas.
> 
> Off the top of my head and not at all well considered....
> 
> Are the concepts "dog" and "wolf" similar to each other?.... rate on a scale
> of 1 to 10, with 1 being "the same" and 10 being "not at all similar".
> 
> Are the concepts "house" and "river" similar to each other... rate on a
> scale of 1 to 10.
> 
> Now we're applying metrics to semantic proximity... can be sliced other
> ways, but it's no harder or easier than any aspect of a taxonomy, and is
> incredibly useful in any automated system.

No, no. I think you're missing my primary point. (but yes, such assignments
would be arbitrary -- I did understand that point).

I wrote the following as a problem in contexts:
 >>
 >>      "dog"  -->  "mammal"
 >>      "dog"  -->  "mammalia"
 >>      "dog"  -->  "canine"
 >>      "dog"  -->  "Canis familiaris"
 >>      "dog"  -->  "Canis domesticus"
 >>      "dog"  -->  "Canis lupus"
 >>      "Canis domesticus"  -->  "Canis lupus"
 >>      "puppy" --> "dog"
 >>      "Poodle" --> "dog"
 >>      "Tony Blair" --> "poodle"
 >>      "Fido" -->  "dog"
 >>      "Dog"  -->  "dog"  (lexically)

"dog" and "wolf" are both common names, but absent a context you don't know
*anything* about their relationship, e.g., "dog" could be in the context
of a pet store, a child's understanding, a woman in a bar at 2am, or
the name of somebody's goldfish. "poodle" could be a descriptor for a
specific dog, the name of a breed, or a moniker for a politician. When
we talk about semantics we must *always* talk about interpretation, and
this is *always* in a context. Applying a metric is impossible outside
of a specific context. This is a common mistake in modeling knowledge;
there are no universals.
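
To make the point about context concrete, here is a minimal sketch in
Python. Everything in it is hypothetical: the context names, the scores
and the similarity() function are invented for illustration and are not
taken from WordNet::Similarity or any other library. It only shows that
a score has to be looked up per context, and that without a context
there is nothing to return.

```python
from typing import Optional

# Hypothetical scores, keyed by (context, term, term) with the two
# terms in sorted order. The numbers are invented for illustration.
SCORES = {
    ("zoology", "dog", "wolf"): 0.9,
    ("pet store", "dog", "wolf"): 0.2,
}

def similarity(a: str, b: str, context: Optional[str] = None) -> Optional[float]:
    """Return a similarity score for a and b within a context, or None."""
    if context is None:
        # Absent a context, no metric is defined at all.
        return None
    first, second = sorted((a, b))
    return SCORES.get((context, first, second))
```

Here similarity("dog", "wolf", "zoology") and
similarity("dog", "wolf", "pet store") give different answers, and
similarity("dog", "wolf") gives none.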

> You create a taxonomy for a body of data... it's arbitrary. Either you'll do
> it well and people will find it useful, or you'll do it badly and they
> won't... same for semantic proximity.

Perhaps we're speaking different languages here. I don't consider *any*
of this arbitrary. The assigning of a 1-10 metric might be considered
arbitrary, but the essence of the taxonomy, or of any relation within
a taxonomy or ontology, is itself not arbitrary. And while I'm skeptical
about most assignment of metrics, there are instances where metrics
are mathematically possible. Physical distances, for one.

> For papers on the matter, run a search on "semantic spatial indexing".
> 
> Humans find spatial relationships useful as they're used to dealing with
> them... is not really different than any other class of relationship,
> especially weighted ones... is just a weighted relationship.... there's not a
> fully functional adult that one can't immediately start asking how near and
> far concepts are from each other, we think that way... not everybody thinks
> in terms of taxonomies as well.... hell you can ask children to arrange
> animal dolls on the floor of a room in relation to how similar they are to
> each other far easier than you can ask them to build a taxonomy.

Yes, I did quite a lot of research on this kind of thing. My take
on it is that we as humans "understand" this intuitively, but it's
exceedingly difficult to put metrics on that understanding. The
work of Frank Shipman and others comes to mind. Fridge magnet poetry
and the like...

> Conflicting scopes are no easier to resolve in a taxonomy than a spatial
> index.

Yes, kinda "orthogonal" to the problem at hand, hence unsolved.

> If there is a difference it's that one can apply a logical non-arbitrary
> consistency of distribution of distances within a spatial framework for the
> whole body of data.... hence my comments about spatial indexing.... noting
> that the indexing can be in more than 3 dimensions. Advantages of this being
> that rather than track proximity relation to all other concepts, you decide
> where to place a concept and the proximity is then a given.

But as with the girl with the dolls on the floor, I think this is one case
where the spatial proximity *is* quite arbitrary, or at least
unspecified.
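
The idea of placing concepts in a space and reading proximity off the
placement can be sketched as follows. This is only an illustration under
stated assumptions: the four-dimensional coordinates are invented, and
the positions table and proximity() function are hypothetical names, not
part of any indexing library. Once a concept is placed, its distance to
every other concept follows mechanically, but the placement itself is
still somebody's choice.

```python
import math

# Hypothetical placements in a 4-dimensional space. The coordinates
# are invented; nothing here says how they should be chosen.
positions = {
    "dog": (0.0, 1.0, 0.0, 2.0),
    "wolf": (0.0, 1.0, 1.0, 2.0),
    "river": (3.0, 5.0, 0.0, 2.0),
}

def proximity(a: str, b: str) -> float:
    """Euclidean distance between two placed concepts."""
    return math.dist(positions[a], positions[b])
```

With these made-up coordinates, "dog" ends up nearer to "wolf" than to
"river", but only because the coordinates were picked that way.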

> It's not my ballpark so I'm reluctant to comment further as I'd be talking
> about something for which I have no real background other than having kept
> an eye out for papers and material on the matter that pass in front of my
> nose.... at one point I did take a look at R-trees and the seemingly 101
> related tree types as a prospect for a multi-dimensional index my team was
> working on, but frankly it went over my head... I can't jump that high.
> 
> The rewards for the end-user of conceptual proximity (yep, I switched words)
> are simply too great to dismiss the matter.

The rewards for a great deal of endeavour in these areas are very
great, but as I mentioned previously, my understanding of the state
of the art in computational linguistics leads me to believe that
most claims of "semantics" are either bogus or overenthusiastic.
I'm in the middle of a tome called "Ontological Semantics" (by
Sergei Nirenburg and Victor Raskin) that is quite interesting,
though I hardly feel ready to comment on it. I may also have
mentioned before my contact with Torkild Thellefsen (google on
his last name and "Peirce"), which is more from an epistemological
or semiological approach, but interesting.

Murray

......................................................................
Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK

    Hunt the Boeing! And test your perceptions!
    http://www.asile.org/citoyens/numero13/pentagone/erreurs_en.htm