[topicmapmail] starting with topic maps: resources <-> topics relationship?

Josema Alonso josema@josema.net
Wed, 15 Oct 2003 00:43:03 +0200


Hello.

> Welcome! We don't have a problem with Newbies around here, so long as
> they're willing to do a bit of homework now and then. Your questions
> seem quite pertinent and well-informed.
Sounds great! I'm still learning the basics, but I really don't mind putting
a lot of effort into this area :-)
After years of doing nothing but programming, I'm so tired of it...;-)

> So long as each web page has a canonical URL, it can be brought in as an
> occurrence in a Topic Map. You'd probably want a sniffer to grab the
> document's <title> and maybe other metadata info (like deliberately
> creating Dublin Core content within <meta> elements and harvesting that
> same content when you do your mining).
I see. Some of our documents already have DC meta tags.
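
A minimal sketch of the kind of <title> and <meta> sniffer described above,
using only the Python standard library. The class and function names
(MetaSniffer, sniff), the sample page, and the rule that only <meta> names
starting with "DC." are kept are assumptions made for illustration, not
something specified in this thread.

```python
from html.parser import HTMLParser


class MetaSniffer(HTMLParser):
    """Collects the <title> text and any <meta name="DC.*"> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.dc = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or ""
            # Assumption: Dublin Core elements are named "DC.<element>".
            if name.lower().startswith("dc."):
                self.dc[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def sniff(html_text):
    """Return (title, dublin_core_dict) for one page's HTML source."""
    parser = MetaSniffer()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.dc


# Hypothetical page used only to exercise the sniffer.
SAMPLE = """<html><head>
<title>Example page</title>
<meta name="DC.creator" content="Web Team">
<meta name="DC.subject" content="admissions">
<meta name="keywords" content="ignored">
</head><body></body></html>"""

title, dc = sniff(SAMPLE)
print(title, dc)
```

The page URL itself is not in the markup, so a harvester built this way
would have to carry it alongside the sniffed title and metadata.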

> But I'm not sure why you'd want things in the territory to necessarily
> point out at the map. Typically, the map points at the territory. And
> given that there's no browser support for <meta> usage such as you
> describe, it's a bit of a wasted enterprise.
My problem is with the designers. As things stand, I would have to add every
resource to the map manually after them: they design a page, they finish it,
and then I have to go after them and add the resource to the map. That won't
work when they create dozens of pages per day, so I need to find another way.

Also, I'm very worried about the size of the map. Including thousands of
pages as resources (sorry if this is not the right name in the spec, maybe I
should say topic or occurrence or whatever; I promise I'm learning these
concepts, but it takes time) could make the map very large...

> Also to be noted, is that maps exist for different purposes. You see maps
> of North America for political boundaries, geographic features, weather
> zones, agricultural harvests, etc.  The territory itself is mined for
> information specific for each instance of a map.
Good point. For example, we have different profiles defined, and a page
may need to be linked to more than one category.

> Jack Park and I have been discussing similar ideas. Currently, the
> discussion centers around using Lucene as a search tool to create
> indices, which are converted into XTM for use within the Topic Map.
Hmmm...
We built a search engine using Lucene. It indexes all of our '*.uniovi.es'
web sites, and indexing the whole domain usually takes almost a day,
sometimes even more. So, believe me, that is certainly a large number of
pages and servers. I'm still worried about the size of the XTM file, and
about having to update it manually.
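
To make the "manual update" worry concrete: once page records have been
harvested (by Lucene or anything else), the XTM itself could be generated
rather than edited by hand. The sketch below leaves Lucene out entirely and
only shows that last step, writing XTM 1.0 topics with one occurrence per
page URL. The function name pages_to_xtm, the record layout, and the
example.org URLs are hypothetical, and this is not the design Murray and
Jack Park are discussing.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)


def pages_to_xtm(topics):
    """Serialize (topic_id, name, [page_url, ...]) records as XTM 1.0.

    Assumption: each record becomes one <topic>, and each page URL
    becomes one <occurrence> pointing at that page.
    """
    root = ET.Element(f"{{{XTM}}}topicMap")
    for topic_id, name, urls in topics:
        topic = ET.SubElement(root, f"{{{XTM}}}topic", {"id": topic_id})
        base = ET.SubElement(topic, f"{{{XTM}}}baseName")
        ET.SubElement(base, f"{{{XTM}}}baseNameString").text = name
        for url in urls:
            occ = ET.SubElement(topic, f"{{{XTM}}}occurrence")
            ET.SubElement(occ, f"{{{XTM}}}resourceRef",
                          {f"{{{XLINK}}}href": url})
    return ET.tostring(root, encoding="unicode")


# Hypothetical harvested records: one category with two pages.
records = [
    ("admissions", "Admissions",
     ["http://www.example.org/a.html", "http://www.example.org/b.html"]),
]
print(pages_to_xtm(records))
```

A page that belongs to several categories would simply appear as an
occurrence under several topics. This does nothing about the size of the
resulting file; it only removes the hand editing.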

> You'd need tools to dig into various file formats such as MS Word
> or PDF if you plan to mine those types, or just a <title> and <meta>
> sniffer if not. If you use well-formed or valid XHTML rather than
> HTML as your content, you will have an easier time processing the
> files.
Ok, I see the point. At least I'll try to get them to use XHTML from now on.
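
As a small illustration of why well-formed XHTML is easier to process: it
can be read with any ordinary XML parser, with no HTML-specific tag-soup
handling. The sketch below uses Python's xml.etree.ElementTree; the function
name xhtml_title_and_meta and the sample document are hypothetical. One
caveat, as far as I know: ElementTree does not load the XHTML DTD, so pages
that use named entities such as &nbsp; would need extra handling.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def xhtml_title_and_meta(xhtml_text):
    """Return (title, {meta name: content}) from a well-formed XHTML page."""
    root = ET.fromstring(xhtml_text)
    title = root.findtext(f"{XHTML}head/{XHTML}title", default="")
    meta = {
        m.get("name"): m.get("content", "")
        for m in root.iter(f"{XHTML}meta")
        if m.get("name")
    }
    return title.strip(), meta


# Hypothetical page used only to exercise the function.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Example page</title>
  <meta name="DC.creator" content="Web Team" />
</head>
<body><p>Hello</p></body>
</html>"""

print(xhtml_title_and_meta(PAGE))
```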

> I looked into a project called DocSearcher, which seems to do a
> great deal of the above, but it would need to be completely
> reengineered, since it's not very well designed. But you could
>...
I'll take a look. Thanks.

>...
> I published a "spec" on using Dublin Core metadata in XHTML at
>
>     http://www.altheim.com/specs/meta/NOTE-xhtml-augmeta.html
>
> and there's also a number of good docs on the subject at the DC
> site itself. I'd use Dublin Core for your metadata as much as
> possible. It's a solidly-understood and accepted schema from a
> very successful project.
I'm printing the doc and I'll start reading it asap. I wasn't absolutely
sure about DC metadata, but you're confirming what I thought about it, which
is why we started using it a while ago. So, I think we'll keep using it.

Wow, long message, full of ideas. Thanks a lot, Murray.

And now, before the end of this one, another random thought. I have also
been thinking about developing some kind of plug-in for the designers. They
use Dreamweaver.
This plug-in would allow them to assign a page, once designed, to some of
the topic maps I have already created manually. What do you think?
I'm just worried about the XTM file size again, after thousands of pages
have been created... I would certainly need some intermediate layer here.

Well, that's all for now. Very, very interesting discussion for me. This is
something I have tried to get right for years, and as of today I still
haven't found a good solution. Maybe this time :-)

Best,
    Josema.