[topicmapmail] starting with topic maps: resources <-> topics relationship?

Josema Alonso josema@josema.net
Fri, 17 Oct 2003 09:49:09 +0200


> Great -- so at least some of your people are familiar with and acceptant
> of DC content.
Hehe... more or less... but they will be :-)

> It shouldn't be too much work to write a Java tool that could import an
> XHTML document, grab whatever DC metadata content was already there, the
> content of <title>, reveal it in a GUI for review, and then rewrite it
> to the document. The author and revision information could be added at
> that time, including revision timestamp. I implemented something a bit
> similar to this in my Ceryle tool, and the coding is not difficult.
Yes, it does seem quite easy. I should think about it. Thanks.
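
A minimal sketch of the "grab the existing DC metadata and <title>" step described above. It is written in Python rather than Java only for brevity, and it assumes the page is well-formed XHTML with no DOCTYPE entities such as &nbsp;. The function name read_dc_metadata and the sample page are hypothetical, not part of Ceryle or any other tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace (ElementTree uses {uri}tag notation).
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def read_dc_metadata(xhtml_text):
    """Return (title, {meta name: content}) for <meta> names starting with 'DC.'."""
    root = ET.fromstring(xhtml_text)
    title_el = root.find(f"{XHTML_NS}head/{XHTML_NS}title")
    title = (title_el.text or "").strip() if title_el is not None else ""
    dc = {}
    for meta in root.iter(f"{XHTML_NS}meta"):
        name = meta.get("name", "")
        if name.lower().startswith("dc."):
            dc[name] = meta.get("content", "")
    return title, dc

# Hypothetical sample page, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Topic Maps Intro</title>
    <meta name="DC.creator" content="Example Author" />
    <meta name="DC.date" content="2003-10-17" />
    <meta name="keywords" content="ignored" />
  </head>
  <body><p>Hello</p></body>
</html>"""

title, dc = read_dc_metadata(SAMPLE)
print(title)
print(dc)
```

The review GUI and the step that writes the metadata back to the document are left out; a real tool would also need a more tolerant parser for pages that are not strictly well-formed.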

> There'd only be one linking element for each page, unless you want there
> to be a Topic for each page. But if you've got a computer that's in the
> 2GHz range, you're not going to see much of a problem on parsing. You'll
> only have one copy of the current map to deal with, so even if it gets
> big, it shouldn't be a problem. If you use Kal Ahmed's TM4J topic map
> engine, you can use a persistent store backend like Ozone so that the
> whole thing won't have to live in memory.
The computer is not a problem. It is quite powerful, and it is a dedicated
server from Siemens.
I should think about the tools, too, of course. The more I think about it,
the more I'm sure I should make very big changes around here and move people
from a static background to a more dynamic one. But that would be very
difficult with some of them. That's why I thought about plugins for their
common tools and building an intermediate layer myself.

> Okay, good point. If you're dealing with that volume, there will be
> substantial file sizes no matter what method you use. You might be
> able to come up with some way of not dealing with it always as a file,
> such as keeping the topic map in a persistent store as I mentioned
> above. You could always export it to XTM for archiving, but the "live"
> topic map would live in Ozone.
Yep.

> That's okay so long as you have control, but once someone posts a
> PDF, they'll either have to supply the metadata and a means of
> linking that info with the PDF file (either external to or within
> the topic map), or you'll need a way to read the metadata from the
> PDF. I'm currently dealing with this same issue.
I see. I was thinking the same. I should go and ask our search engine
expert.
I don't think reading PDF or Word metadata is very difficult. The problem is
that it usually does not exist. Authors don't care about it. For example, we
have thousands of HTML pages with blank titles, and that's too bad.
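
As a rough illustration of how pages with blank titles could be found before building a topic map from them, here is a small Python sketch using only the standard library. The names TitleGrabber and has_blank_title are hypothetical, and a page with no <title> element at all is treated as blank by assumption.

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Collects the text of the first <title> element in an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.seen_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.seen_title:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.seen_title = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def has_blank_title(html_text):
    """True if the page has no <title> or its title is empty/whitespace."""
    parser = TitleGrabber()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip() == ""

print(has_blank_title("<html><head><title></title></head></html>"))
print(has_blank_title("<html><head><title>Home</title></head></html>"))
```

Running has_blank_title over every file on the server would give a list of pages whose authors still need to supply a title.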

> Well, I'd look at another library-based technology called Faceted
> Classification. An old technology from the 1930s, where subjects
> don't exist as atomic classes but are composites built from an
>...
Wow, thanks.
Maybe too much information for me at the moment, but I'm on the right track,
learning the basics.

> It's always good to be learning new things, and it always helps
> me too to talk through ideas. Glad to be of help.
Great. Thank you very much. I'm really learning a lot and I love it :-)

Best,
    Josema.