[topicmapmail] Conceptual Graphs are Step 6

Jack Park jackpark@thinkalong.com
Thu, 06 May 2004 18:50:28 -0700


Useful thread, this.

I suspect that the new RM is much closer to the right modeling tool for 
doing CGs and having topic maps fall out simultaneously, than is XTM.

Jack

Dan Corwin wrote:

> * Tom Passin wrote May 1 on another thread:
>
>> The best approach, IMO, is to first convert the natural language 
>> constructs into Conceptual Graphs.
>
>
> I agree Conceptual Graphs seem useful, but building them is NOT the
> first step.  Classical NLP theory says five preprocessing steps would
> have to be run first on that language, even to extract its SITUATIONS.
> These steps are summarized in this diagram:
>
> [1] http://www.lexikos.com/charts/index_files/image003.jpg
>
> Each prior step is *required* to cull out one well-known kind of
> ambiguity from a paragraph - lexical, structural, semantic, referential,
> etc.  Unless all 5 are present and accurate, errors accumulate so fast
> that related software will just be a digital garbage churn.
>
>> Conceptual Graphs are ... easy to comprehend and were specifically 
>> designed to translate natural language statements into a formal logic.
>
>
> I agree, except CGs really do not *translate* NL.  They *express*
> assertions in predicate calculus about the role players that fill
> verb-specific association templates based on case frames.
>
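A minimal sketch of that point, assuming a toy case frame for the verb "go" with two roles. The names CaseFrame, assert_frame, and GO are hypothetical and are not part of MODELER or of any CG tool; the output is ordinary predicate-calculus text, not CGIF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseFrame:
    """A verb-specific association template (hypothetical structure)."""
    verb: str
    roles: tuple  # role names the verb expects, in order


def assert_frame(frame, event_var, players):
    """Fill the frame's roles and return one predicate-calculus assertion."""
    missing = [role for role in frame.roles if role not in players]
    if missing:
        raise ValueError(f"unfilled roles: {missing}")
    # One atom for the event itself, then one atom per role player.
    atoms = [f"{frame.verb}({event_var})"]
    atoms += [f"{role}({event_var}, {players[role]})" for role in frame.roles]
    return f"exists {event_var}. " + " & ".join(atoms)


# Assumed example frame: "John goes to Boston".
GO = CaseFrame("go", ("agent", "destination"))
print(assert_frame(GO, "e", {"agent": "john", "destination": "boston"}))
```

Nothing here is translated from English: the sentence has already been reduced to a verb and its role players before the assertion is built, which is the distinction drawn above.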
> There is hope such assertions may prove useful.  A persistent theory
> holds that they will let agents in an FOL engine *react to* English
> statements with enhanced cleverness.  Time will tell.
>
> Even if that theory is true, in practice it cannot help until agent
> software can also - and first - *understand* those English statements,
> by removing ambiguities about what their speaker intended to say.
>
> At minimum, this means selecting which (dictionary) sense of every
> word the speaker had in mind, and which (contextual) topic best models
> the intended subject of every pronoun and definite noun phrase.
>
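To make the word-sense half of this concrete, here is a rough sketch using a simplified Lesk-style gloss overlap. This is a swapped-in textbook heuristic, not MODELER's method, and the SENSES table with its sense labels and glosses is invented toy data.

```python
# Hypothetical mini-lexicon: sense label -> dictionary-style gloss.
SENSES = {
    "bank": {
        "bank/finance": "institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a river or stream",
    }
}


def pick_sense(word, context, senses=SENSES):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())

    def overlap(item):
        _label, gloss = item
        return len(context_words & set(gloss.split()))

    return max(senses[word].items(), key=overlap)[0]


print(pick_sense("bank", "she sat on the bank of the river and watched the stream"))
print(pick_sense("bank", "the bank lends money to small firms"))
```

A heuristic this crude will guess wrong often, which is why the steps described here need to combine several sources of evidence. Resolving pronouns and definite noun phrases is a separate problem that this sketch does not touch.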
> Good news:  NLP software is getting better at resolving such
> ambiguities.  With heuristics, public lexical data, and sneaky tricks,
> software encapsulating steps 1-5 can now guess about such things and
> in some circumstances generate low error rates.
>
> My MODELER design can do this.  In fact, if you configure it properly,
> and interact with it about noticed problems (like misspellings), it will
> cull ambiguities from your paragraphs with *virtually no* errors, and
> dump an XTM chart which says in *your* ontology what they asserted:
>
> [2] http://www.lexikos.com/charts/
>
> MODELER uses tricks, one of which is WORDS scripts.  They resemble CGs,
> and let you make similar assertions.  They can also substitute for a
> parser's level 3 code by forcing *you*, not grammar rules, to map each
> paragraph's English syntax into legal WORDS syntax.
>
> Regardless of how they may be built, WORDS scripts can expand like
> association templates into TM structures that resemble CGs, but use TM
> paradigms and *your* TM ontology to express the intended meaning of the
> original paragraph.  This happens in subgraphs of MODELER's internal
> topic chart, which for each input it returns (by default) in XTM.
>
>
>> I consider Topic Maps to be essentially a subset of Conceptual Graphs 
>> (with a few additional wrinkles).  Some CGs can be expressed as TMs 
>> and some cannot.  The ones that can be so expressed are nearly 
>> identical except for some syntax details.
>
>
> I believe TMs can hold graph structures fully equivalent to those of
> any CG, but TMs have no standard inferencing model.  CGs do: some
> FOL engine that can infer things by using predicate calculus.
>
> I suspect that any parts of CGs which a TM cannot express are related
> to their missing FOL engine.  But to me, normal conversion direction
> would go from TMs toward logic processing - not the other way around -
> so these lacks should present no real problems in any case.  The TM
> application software would simply have to take up the slack if a chart
> becomes fodder for something besides FOL.
>
>
>> The RDF folks are wrestling right now with how to make statements 
>> about subgraphs.  In a CG, you can draw a box around a collection of 
>> conceptual relations and their topics.  The box is an assertion 
>> (anything placed on the page in a CG is asserted by definition), and it 
>> is called a "context box".  We need something equivalent for topic 
>> maps, and you will want it for the kind of things you seem to be 
>> getting into.
>
>
> I can easily believe RDF folks want to annotate subgraphs, because
> to learn what was said in any English paragraph, you need only query
> the subgraphs in its chart of topics - what associations were stated
> for each topic present.  Adding new statements about similar subgraphs
> would be a first natural step to "reacting to them" in the RDF world.
>
> Independent of details on *reacting*, the business impact of the
> software that charts topics should be non-trivial, as it will let
> people write new kinds of IR software that avoids lexical ambiguity:
>
> 1) chart any English paragraph about *your* domain in an XTM file
> 2) find the speaker's intended meaning in *your* TM-based ontology
> 3) query the merged version of such charts in something like TMQL
>
> And, if you really want it to, a chart can also serve as the input
> to a step 6 process, which reformats all its Topic Map subgraphs
> into proper CG notation, so predicate calculus engines can crunch it.
>
> So, Tom, okay - if you really do think CGs in volume are a good idea,
> here's a partial proposal for a 2-part project able to produce them:
>
>     1. On this public thread, we debate XTM models that would be ideal
>        for a chart, and craft a few prototypes by hand in LTM.  Iff we
>        get a consensus, we will then have a CG-compatible base ontology
>        in TM terms, which MODELER will adapt to use for step 5 output.
>
>     2. To guide such work in part 1, we can also imagine a file-to-file
>        converter that takes that XTM chart and produces CGIF.  We should
>        limit it to use *only* the XTM input syntax, then notice and feed
>        back requirements on what data that part 1 output must contain.
>
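A rough sketch of what the part 2 converter might look like, assuming the chart uses plain XTM 1.0 associations. The SAMPLE document, the function names, and the mapping itself (association type to relation, each member's player to a concept node) are assumptions made for illustration, and the output is only CGIF-style text that has not been checked against the CGIF standard.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical chart fragment: one association saying a cat is on a mat.
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <association>
    <instanceOf><topicRef xlink:href="#on"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#figure"/></roleSpec>
      <topicRef xlink:href="#cat"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#ground"/></roleSpec>
      <topicRef xlink:href="#mat"/>
    </member>
  </association>
</topicMap>"""


def ref(elem):
    """Return the topic id that a topicRef points at, without the '#'."""
    return elem.get(HREF).lstrip("#")


def associations_to_cgif(xtm_text):
    """Emit one CGIF-style line per association, using only the XTM input."""
    root = ET.fromstring(xtm_text)
    lines = []
    for assoc in root.iter(XTM + "association"):
        relation = ref(assoc.find(f"{XTM}instanceOf/{XTM}topicRef"))
        concepts, args = [], []
        for i, member in enumerate(assoc.findall(XTM + "member"), start=1):
            # Direct-child topicRef is the role player; roleSpec is ignored here.
            player = ref(member.find(XTM + "topicRef"))
            concepts.append(f"[{player}: *x{i}]")
            args.append(f"?x{i}")
        lines.append(" ".join(concepts) + f" ({relation} {' '.join(args)})")
    return lines


for line in associations_to_cgif(SAMPLE):
    print(line)
```

The sketch throws away the roleSpec of each member and relies on member order for the relation's arguments. That is exactly the kind of requirement part 2 would feed back to part 1: the chart has to say which role fills which argument position.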
> Seriously, I have no concrete plans yet in MODELER for a CGIF output
> module, because I do not know CGs or FOL in depth; because I sense they
> are more a tool for teaching logic than a standard; and because it
> is not yet clear to me how many people actually care about CGIF.
>
> I hope this thread might yield interesting comments on such issues.
>
> Regards,
> Dan Corwin
>
>