[topicmapmail] Conceptual Graphs are Step 6
Dan Corwin
dan@lexikos.com
Thu, 13 May 2004 11:07:00 -0400
Thomas B. Passin wrote:
> I really had [it] .. in mind to take
> representative instances of NL expressions, turn them (by hand) into
> CGs, and with the CG in hand, with its thematic roles, etc., construct
> associations as isomorphic as possible to those CGs.
We are thinking along very similar lines - that TM structures can be
built which are equivalent in meaning to CGs. First, though, we need to
find and formalize the rules of that equivalence.
> Having done that, you have templates for populating with the
> results of your processing steps.
I think we agree here. My working theory is verb-centric CG templates
for clause models, plus a few extra templates to handle the "logic".
> It looks like you have been doing this already.
Not exactly. I have instead been trying to devise a web framework that
can generalize the process, and boost its throughput to scale it up.
Instead of a few (verb) templates, done independently, I'd advocate you
write yours down in expandable template lexicons, which over time could
each grow to define a useful, domain-specific verb vocabulary.
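To make "expandable template lexicon" a little more concrete, here is a
minimal Python sketch of one possible data structure. Everything in it is
an assumption for illustration only: the class names (RoleSlot,
VerbTemplate, TemplateLexicon), the sample verb "go", and the choice of
role labels (Agnt, Dest, Inst) are hypothetical, not part of MODELER or of
any agreed ontology.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RoleSlot:
    """One thematic role in a verb template (hypothetical structure)."""
    relation: str          # role label, e.g. "Agnt"
    concept_type: str      # type constraint on the filler, e.g. "Person"
    required: bool = True  # optional roles may stay unfilled


@dataclass(frozen=True)
class VerbTemplate:
    """A verb-centric CG template for one clause model."""
    verb_type: str                  # concept type of the verb, e.g. "Go"
    surface_forms: Tuple[str, ...]  # word forms that may evoke it
    slots: Tuple[RoleSlot, ...]


class TemplateLexicon:
    """A growable, domain-specific collection of verb templates."""

    def __init__(self) -> None:
        self._by_form: Dict[str, List[VerbTemplate]] = {}

    def add(self, template: VerbTemplate) -> None:
        # Index the template under every surface form, case-insensitively.
        for form in template.surface_forms:
            self._by_form.setdefault(form.lower(), []).append(template)

    def candidates(self, verb_form: str) -> List[VerbTemplate]:
        # All templates that the given word form could select.
        return list(self._by_form.get(verb_form.lower(), []))


# A tiny "travel" lexicon with a single entry; more verbs get added over time.
travel = TemplateLexicon()
travel.add(VerbTemplate(
    verb_type="Go",
    surface_forms=("go", "goes", "went", "going"),
    slots=(
        RoleSlot("Agnt", "Person"),
        RoleSlot("Dest", "City"),
        RoleSlot("Inst", "Vehicle", required=False),
    ),
))
```

Growing the vocabulary then just means calling add() with more templates,
and several templates may share a surface form when a verb is ambiguous.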
And while working purely "by hand" is okay for self-education, for doing
anything real, even Q/A, I'd advocate building a simple web app to help:
Its code would read an input paragraph, then use constraints to select
and fill in the CG template(s) with each matching clause and topic. It
could work partly from rules, but on complex decisions it would ask for
your advice - exactly the same judgements you'd be making "by hand".
I'd guess if its UI were well designed, it should let you map text onto
the CGs in your lexicon at 10-100 words per minute.
With minor extensions for CGs like those above, as I said earlier...
>> My MODELER design can do this. In fact, if you configure it
>> properly, and interact with it about noticed problems (like
>> misspellings), it will cull ambiguities from your paragraphs with
>> *virtually no* errors, and dump an XTM chart which says in *your*
>> ontology what they asserted
To which you replied:
> Well, you are so far ahead of me - I have never done any machine NL work
> - so there's probably not much I can offer here.
You could help in several ways. At the least, you could help test the
appropriate CG extension for MODELER, whose UI would resemble that of a
spelling corrector. You don't need any NLP experience to run it - just
name its input and lexicon files, then answer its questions about what
parts of which templates in the CG lexicon match which input topics.
Before you could run it, however, we'd first have to design and build
it, and that would require your CG expertise, which is way past mine.
So I go back to my original suggestion for a team effort:
>> here's a partial proposal for a 2-part [design] project ...:
>>
>> 1. On this public thread, we debate XTM models that would be ideal for
>> a chart, and craft a few prototypes by hand in LTM. Iff we get a
>> consensus, we will then have a CG-compatible base ontology in TM
>> terms, which MODELER will adapt to use for step 5 output.
Part 1 is important in simplifying the design of part 2. I'll focus on
ontology more directly later in my reply to Murray.
>> 2. To guide such work in part 1, we can also imagine a file-to-file
>> converter that takes that XTM chart and produces CGIF. We should
>> limit it to use *only* the XTM input syntax, then notice and feed back
>> requirements on what data that part 1 output must contain.
This "imagining" is essentially what I'm starting to do here. In this
post, for clarity, I've expanded the scope of the imagined converter to
include the raw text file that MODELER's five steps convert to an XTM chart.
In practice, the step 6 logic for answering questions on CGs would thus
get only unambiguous XTM models of pre-parsed English clauses and their
topics, not raw English text. That should make the step 6 rules simpler
to write, in TM4JScript or whatever.
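As a rough idea of what "getting only XTM models" could look like on the
consuming side, the sketch below reads associations out of an XTM 1.0
fragment using Python's standard library. It is in Python rather than
TM4JScript purely for illustration. The element names and namespaces are
those of XTM 1.0, but the sample chart and its topic ids (go, agnt, dest,
john, boston) are invented here; the real ones would depend on whatever
base ontology part 1 settles on.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"


def _ref(elem):
    """Return the topic id referenced by a direct <topicRef> child, or None."""
    if elem is None:
        return None
    ref = elem.find(XTM + "topicRef")
    if ref is None:
        return None
    return ref.get(HREF, "").lstrip("#")


def read_associations(xtm_text):
    """Return a list of (association_type, {role: player}) pairs."""
    root = ET.fromstring(xtm_text)
    result = []
    for assoc in root.iter(XTM + "association"):
        assoc_type = _ref(assoc.find(XTM + "instanceOf"))
        roles = {}
        for member in assoc.findall(XTM + "member"):
            role = _ref(member.find(XTM + "roleSpec"))
            roles[role] = _ref(member)  # the member's own topicRef is the player
        result.append((assoc_type, roles))
    return result


# Hypothetical chart fragment for a clause like "John went to Boston".
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <association>
    <instanceOf><topicRef xlink:href="#go"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#agnt"/></roleSpec>
      <topicRef xlink:href="#john"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#dest"/></roleSpec>
      <topicRef xlink:href="#boston"/>
    </member>
  </association>
</topicMap>"""

for assoc_type, roles in read_associations(SAMPLE):
    print(assoc_type, roles)
```

This sketch assumes each role appears at most once per association and
that every member has a single player, which a real chart might not
guarantee.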
But its core goal would remain: fill the best CG template in the lexicon
for each given clause; ask the operator for "advice" to make any tough
decisions; and (hardest) ensure that this interactive Q&A process stays
fast enough to handle 10-100 words/minute of input text.
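The fill-and-ask loop just described might be sketched as follows. This
is only a toy under stated assumptions: templates are plain dicts, topics
arrive already typed, matching is by exact concept type, and the names
fill_template, map_clause and ask are all hypothetical. The point is
simply that the operator is consulted only when the rules leave more than
one candidate.

```python
def fill_template(template, topics, ask):
    """Try to fill every slot of one template from the clause's topics.

    template: {"verb": "Give", "slots": {"Agnt": "Person", ...}}
    topics:   list of (name, concept_type) pairs found in the clause
    ask:      callable(relation, candidates) -> chosen name (the operator)
    Returns {relation: name}, or None if some slot has no possible filler.
    """
    filled = {}
    for relation, wanted_type in template["slots"].items():
        # Rule-based step: keep unused topics whose type fits the slot.
        matches = [name for name, ctype in topics
                   if ctype == wanted_type and name not in filled.values()]
        if not matches:
            return None
        if len(matches) == 1:
            filled[relation] = matches[0]            # unambiguous, no question
        else:
            filled[relation] = ask(relation, matches)  # tough call: ask operator
    return filled


def map_clause(verb_form, topics, lexicon, ask):
    """Return (verb_type, fillers) for the first template that fits, else None."""
    for template in lexicon.get(verb_form, []):
        filled = fill_template(template, topics, ask)
        if filled is not None:
            return template["verb"], filled
    return None


# Hypothetical one-entry lexicon and a clause like "Mary gave John a book".
LEXICON = {
    "gave": [{"verb": "Give",
              "slots": {"Agnt": "Person", "Rcpt": "Person", "Thme": "Book"}}],
}
TOPICS = [("Mary", "Person"), ("John", "Person"), ("Book1", "Book")]

questions = []


def scripted_operator(relation, candidates):
    # Stand-in for the interactive UI: log the question, pick the first option.
    questions.append((relation, list(candidates)))
    return candidates[0]


result = map_clause("gave", TOPICS, LEXICON, scripted_operator)
```

In this example only the agent is ambiguous (two Persons), so the
operator gets one question for the whole clause. Keeping that question
count low is presumably what would decide whether the words-per-minute
target is reachable.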
Such a step 6 process would help map English text inputs of 5 to 50
*kilobytes* per hour into equivalent bulk CGs, without errors unless
the operator screwed up or some bug existed in its rules or lexicons.
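For readers who want to check how words per minute relate to kilobytes
per hour, the arithmetic can be sketched as below. The bytes-per-word
figure is an assumption, since the post does not state one: at 6 bytes
per word (five letters plus a space) the range works out to roughly 3.6
to 36 KB per hour, while the quoted 5 to 50 KB corresponds to a little
over 8 bytes per word.

```python
def kilobytes_per_hour(words_per_minute, bytes_per_word):
    """Convert a typing-style rate into KB/hour (1 KB taken as 1000 bytes)."""
    return words_per_minute * 60 * bytes_per_word / 1000


# bytes_per_word = 6 is an assumed rule of thumb, not a figure from the post.
for wpm in (10, 100):
    print(wpm, "words/min ->", kilobytes_per_hour(wpm, 6), "KB/hour")
```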
Depending on how you wrote those lexicons, the output could emerge in
either CGIF or linear CG notation, which may be easier to use for debugging.
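To illustrate the two output styles, here is a small sketch that renders
one filled template both ways. The strings it produces are simplified,
CGIF-style and linear-form-style approximations written from memory of
the usual textbook examples; exact syntax (quoting of names, coreference
labels, and so on) would need checking against the CGIF specification.
The function names and the sample fillers are hypothetical.

```python
def to_linear(verb_type, fillers):
    """Render in a linear-form style, one relation per indented line.

    fillers: {relation: (concept_type, referent)}
    """
    lines = ["[%s]-" % verb_type]
    for relation, (ctype, referent) in fillers.items():
        lines.append("   (%s)->[%s: %s]" % (relation, ctype, referent))
    return "\n".join(lines) + "."


def to_cgif(verb_type, fillers):
    """Render in a simplified CGIF style: concepts first, then relations."""
    parts = ["[%s: *x]" % verb_type]  # *x labels the verb concept
    parts += ["[%s: %s]" % (ctype, ref) for ctype, ref in fillers.values()]
    parts += ["(%s ?x %s)" % (rel, ref) for rel, (_, ref) in fillers.items()]
    return " ".join(parts)


# Hypothetical filled template for "John went to Boston".
FILLERS = {"Agnt": ("Person", "John"), "Dest": ("City", "Boston")}

print(to_linear("Go", FILLERS))
print(to_cgif("Go", FILLERS))
```

A lexicon entry could carry a flag naming which renderer to use, so the
same filled template can be dumped in the compact form for interchange
or the indented form while debugging.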
Is this interesting to you? Does it interest any other readers?
If so, contribute to this thread, and let's see where it goes. :-)
Cheers,
Dan Corwin