[topicmapmail] starting with topic maps: resources <-> topics relationship?

Murray Altheim m.altheim@open.ac.uk
Tue, 14 Oct 2003 19:48:52 +0100


Josema Alonso wrote:
> Hi, all.
> 
> Alert! I'm a newbie. Well, a topic maps newbie around here. I've been doing
> web development for some years.
> I have even browsed the archives of the mailing list and found some known
> names from some Apache mailing lists like Xindice :-)

Welcome! We don't have a problem with newbies around here, so long as
they're willing to do a bit of homework now and then. Your questions
seem quite pertinent and well-informed.

> Well, let's get into it.
> 
> I've been designated the IA for a complex web site. It's a Spanish
> university web site (http://web.uniovi.es). And I want to make something
> very different and nice. I have just started researching Semantic Web and
> found Topic Maps. And I'm very interested in them.
> 
> I'm pretty sure I would use topic maps but I have a big problem. We have
> thousands of static HTML pages. We also have web designers who make
> dozens more every day.
> 
> I'm thinking about building the IA for our site using Topic Maps. But I
> can't add any of our HTML pages as resources or topics or whatever in there.
> So I need a linking method from the HTML resources to the topics in the map.
> I've been thinking about adding meta information to the HTML pages related
> to one or more of the topics in the map. I would say adding a meta name
> pointing to one topic id could be the way. Does it sound stupid?

So long as each web page has a canonical URL, it can be brought in as an
occurrence in a Topic Map. You'd probably want a sniffer to grab the
document's <title> and perhaps other metadata. For example, you could
deliberately put Dublin Core content in <meta> elements and harvest that
same content when you do your mining.
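
A minimal sketch of such a sniffer, in Python with only the standard
library. The class name MetaSniffer, the helper sniff(), and the "DC."
prefix convention for <meta> names are assumptions made for illustration,
not part of any existing tool:

```python
from html.parser import HTMLParser


class MetaSniffer(HTMLParser):
    """Collect the <title> text and <meta name="DC.*"> values from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.dublin_core = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = attr.get("name") or ""
            content = attr.get("content")
            # Assumption: Dublin Core terms are written as name="DC.something".
            if name.lower().startswith("dc.") and content is not None:
                self.dublin_core[name] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears inside <title> is accumulated.
        if self._in_title:
            self.title += data


def sniff(html_text):
    """Return (title, {meta name: content}) for one HTML document."""
    sniffer = MetaSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    return sniffer.title.strip(), sniffer.dublin_core
```

Calling sniff() on the text of each page gives you the title and a
dictionary of Dublin Core values, which can then be attached to whatever
occurrence you create for that page's URL.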

But I'm not sure why you'd want things in the territory to necessarily
point out at the map. Typically, the map points at the territory. And
given that there's no browser support for <meta> usage such as you describe,
it's a bit of a wasted enterprise.

Note also that maps exist for different purposes. You see maps of North
America showing political boundaries, geographic features, weather
zones, agricultural harvests, etc. The territory itself is mined for
the information specific to each map.

> I understand this is not absolutely semantic, but could be a first approach.
> Our web designers will be developing pages as they already did for years, so
> I can't make big changes in there. But requiring them to add a meta name,
> for example, could be possible. I'll think about how to do it later.
> 
> Any ideas or pointers would be greatly appreciated.

Jack Park and I have been discussing similar ideas. Currently, the
discussion centers around using Lucene as a search tool to create
indices, which are converted into XTM for use within the Topic Map.
You'd need tools to dig into various file formats such as MS Word
or PDF if you plan to mine those types, or just a <title> and <meta>
sniffer if not. If you use well-formed or valid XHTML rather than
HTML as your content, you will have an easier time processing the
files.
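
As a rough sketch of only the last step (not the Lucene pipeline itself),
here is one way harvested data could be written out as an XTM 1.0 topic
with the page as an occurrence. The function name page_to_xtm, the topic
id, and the choice to use the page title as the base name are all
assumptions for illustration:

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Serialize XTM elements unprefixed and XLink attributes as xlink:*.
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def page_to_xtm(topic_id, title, url):
    """Build a one-topic XTM 1.0 map whose occurrence points at `url`."""
    topic_map = ET.Element(f"{{{XTM_NS}}}topicMap")
    topic = ET.SubElement(topic_map, f"{{{XTM_NS}}}topic", {"id": topic_id})

    # Assumption: the harvested <title> is good enough as a base name.
    base_name = ET.SubElement(topic, f"{{{XTM_NS}}}baseName")
    ET.SubElement(base_name, f"{{{XTM_NS}}}baseNameString").text = title

    # The map points at the territory: the page is a resource reference.
    occurrence = ET.SubElement(topic, f"{{{XTM_NS}}}occurrence")
    ET.SubElement(
        occurrence,
        f"{{{XTM_NS}}}resourceRef",
        {f"{{{XLINK_NS}}}href": url},
    )
    return ET.tostring(topic_map, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical page; the URL is a placeholder.
    print(page_to_xtm("faculty-of-science", "Faculty of Science",
                      "http://www.example.org/science.html"))
```

A real map would merge many pages into shared topics rather than emit
one topic per page, but the element structure would be the same.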

I looked into a project called DocSearcher, which seems to do a
great deal of the above, but it would need to be completely
reengineered, since it's not very well designed. But you could
check SourceForge to see what it does for ideas. Sean Palmer and
I published a "spec" on using Dublin Core metadata in XHTML at

    http://www.altheim.com/specs/meta/NOTE-xhtml-augmeta.html

and there are also a number of good docs on the subject at the DC
site itself. I'd use Dublin Core for your metadata as much as
possible. It's a well-understood and widely accepted schema from a
very successful project.

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .
