[topicmapmail] Conceptual Graphs are Step 6
Jack Park
jackpark@thinkalong.com
Thu, 06 May 2004 18:50:28 -0700
Useful thread, this.
I suspect that the new RM is much closer to the right modeling tool for
doing CGs and having topic maps fall out simultaneously, than is XTM.
Jack
Dan Corwin wrote:
> * Tom Passin wrote May 1 on another thread:
>
>> The best approach, IMO, is to first convert the natural language
>> constructs into Conceptual Graphs.
>
>
> I agree Conceptual Graphs seem useful, but building them is NOT the
> first step. Classical NLP theory says five preprocessing steps would
> have to be run first on that language, even to extract its SITUATIONS.
> These steps are summarized in this diagram:
>
> [1] http://www.lexikos.com/charts/index_files/image003.jpg
>
> Each prior step is *required* to cull out one well-known kind of
> ambiguity from a paragraph - lexical, structural, semantic, referential,
> etc. Unless all 5 are present and accurate, errors accumulate so fast
> that related software will just be a digital garbage churn.
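
A minimal sketch of the compounding-error point above. It assumes, purely for illustration, that each of the five steps errs independently and that each has the same hypothetical per-sentence accuracy; neither assumption comes from the thread or the diagram.

```python
def pipeline_accuracy(step_accuracies):
    """Chance that every step is right, assuming the steps err independently."""
    result = 1.0
    for p in step_accuracies:
        result *= p
    return result


# Hypothetical figure: five preprocessing steps, each 90% accurate.
five_steps = [0.90] * 5
print(round(pipeline_accuracy(five_steps), 3))  # about 0.59
```

Under those assumptions, roughly four sentences in ten would reach step 6 carrying at least one unresolved or wrongly resolved ambiguity.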
>
>> Conceptual Graphs are ... easy to comprehend and were specifically
>> designed to translate natural language statements into a formal logic.
>
>
> I agree, except CGs really do not *translate* NL. They *express*
> assertions in predicate calculus about the role players that fill
> verb-specific association templates based on case frames.
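
To illustrate what "expressing assertions about role players in a case frame" can look like, here is a rough sketch. The `CaseFrame` class, the role names `agent` and `location`, and the ASCII formula layout are all assumptions made for this example; they are not taken from MODELER or from any CG standard.

```python
from dataclasses import dataclass


@dataclass
class CaseFrame:
    verb: str    # type of the situation, e.g. "Sit"
    roles: dict  # role name -> concept type of the role player


def to_predicate_calculus(frame):
    """Render a filled case frame as an existentially quantified conjunction."""
    variables = ["e"]
    atoms = [f"{frame.verb}(e)"]
    for i, (role, concept) in enumerate(frame.roles.items(), start=1):
        var = f"x{i}"
        variables.append(var)
        atoms.append(f"{concept}({var})")   # the role player's type
        atoms.append(f"{role}(e, {var})")   # its link to the situation
    return f"exists {', '.join(variables)}: " + " & ".join(atoms)


# "A cat sits on a mat", already disambiguated by the earlier steps.
frame = CaseFrame("Sit", {"agent": "Cat", "location": "Mat"})
print(to_predicate_calculus(frame))
```

The sketch only formats an assertion; choosing the right frame and the right role players is the work of the preceding steps.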
>
> There is hope such assertions may prove useful. A persistent theory
> holds that they will let agents in an FOL engine *react to* English
> statements with enhanced cleverness. Time will tell.
>
> Even if that theory is true, in practice it cannot help until agent
> software can also - and first - *understand* those English statements,
> by removing ambiguities about what their speaker intended to say.
>
> At minimum, this means selecting which (dictionary) sense of every
> word the speaker had in mind, and which (contextual) topic best models
> the intended subject of every pronoun and definite noun phrase.
>
> Good news: NLP software is getting better at resolving such
> ambiguities. With heuristics, public lexical data, and sneaky tricks,
> software encapsulating steps 1-5 can now guess about such things and
> in some circumstances generate low error rates.
>
> My MODELER design can do this. In fact, if you configure it properly,
> and interact with it about noticed problems (like misspellings), it will
> cull ambiguities from your paragraphs with *virtually no* errors, and
> dump an XTM chart which says in *your* ontology what they asserted:
>
> [2] http://www.lexikos.com/charts/
>
> MODELER uses tricks, one of which is WORDS scripts. They resemble CGs,
> and let you make similar assertions. They can also substitute for a
> parser's level 3 code by forcing *you*, not grammar rules, to map each
> paragraph's English syntax into legal WORDS syntax.
>
> Regardless of how they may be built, WORDS scripts can expand like
> association templates into TM structures that resemble CGs, but use TM
> paradigms and *your* TM ontology to express the intended meaning of the
> original paragraph. This happens in subgraphs of MODELER's internal
> topic chart, which for each input it returns (by default) in XTM.
>
>
>> I consider Topic Maps to be essentially a subset of Conceptual Graphs
>> (with a few additional wrinkles). Some CGs can be expressed as TMs
>> and some cannot. The ones that can be so expressed are nearly
>> identical except for some syntax details.
>
>
> I believe TMs can hold graph structures fully equivalent to those of
> any CG, but TMs have no standard inferencing model. CGs do: some
> FOL engine that can infer things by using predicate calculus.
>
> I suspect that any parts of CGs which a TM cannot express are related
> to their missing FOL engine. But to me, normal conversion direction
> would go from TMs toward logic processing - not the other way around -
> so these lacks should present no real problems in any case. The TM
> application software would simply have to take up the slack if a chart
> becomes fodder for something besides FOL.
>
>
>> The RDF folks are wrestling right now with how to make statements
>> about subgraphs. In a CG, you can draw a box around a collection of
>> conceptual relations and their topics. The box is an assertion
>> (anything placed on the page in a CG is asserted by definition), and it
>> is called a "context box". We need something equivalent for topic
>> maps, and you will want it for the kind of things you seem to be
>> getting into.
>
>
> I can easily believe RDF folks want to annotate subgraphs, because
> to learn what was said in any English paragraph, you need only query
> the subgraphs in its chart of topics - what associations were stated
> for each topic present. Adding new statements about similar subgraphs
> would be a first natural step to "reacting to them" in the RDF world.
>
> Independent of details on *reacting*, the business impact of the
> software that charts topics should be non-trivial, as it will let
> people write new kinds of IR software that avoids lexical ambiguity:
>
> 1) chart any English paragraph about *your* domain in an XTM file
> 2) find the speaker's intended meaning in *your* TM-based ontology
> 3) query the merged version of such charts in something like TMQL
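
The three steps above can be sketched with a toy data model. It assumes a chart is just a set of (topic, association type, topic) triples whose topic identifiers already encode the chosen word sense; the identifiers and the helper names `merge_charts` and `associations_for` are hypothetical, and plain Python stands in for a real query language such as TMQL.

```python
def merge_charts(*charts):
    """Merge charts; triples naming the same topics collapse into one."""
    merged = set()
    for chart in charts:
        merged.update(chart)
    return merged


def associations_for(merged, topic):
    """Return every association in which the given topic plays a part."""
    return sorted(t for t in merged if topic in (t[0], t[2]))


# Two charted paragraphs. "bank" was resolved to different senses upstream,
# so the two senses are different topics and never get confused in a query.
chart_a = {("bank-finance", "lends-to", "acme-corp")}
chart_b = {
    ("bank-river", "borders", "mill-pond"),
    ("bank-finance", "lends-to", "acme-corp"),
}

merged = merge_charts(chart_a, chart_b)
print(associations_for(merged, "bank-finance"))
```

Because the query runs over disambiguated topics rather than strings, a search for the financial sense does not return the river bank.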
>
> And, if you really want it to, a chart can also serve as the input
> to a step 6 process, which reformats all its Topic Map subgraphs
> into proper CG notation, so predicate calculus engines can crunch it.
>
> So, Tom, okay - if you really do think CGs in volume are a good idea,
> here's a partial proposal for a 2-part project able to produce them:
>
> 1. On this public thread, we debate XTM models that would be ideal
> for a chart, and craft a few prototypes by hand in LTM. Iff we
> get a consensus, we will then have a CG-compatible base ontology
> in TM terms, which MODELER will adapt to use for step 5 output.
>
> 2. To guide such work in part 1, we can also imagine a file-to-file
> converter that takes that XTM chart and produces CGIF. We should
> limit it to use *only* the XTM input syntax, then notice and feed
> back requirements on what data that part 1 output must contain.
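
As a starting point for discussing part 2, here is a rough sketch of such a converter. It assumes XTM 1.0 associations whose members each carry a `roleSpec`, treats the association type as a concept for the situation, and turns each role into a dyadic relation. The topic ids (`Sit`, `Agnt`, `Loc`, `Cat`, `Mat`), the variable naming, and the use of topic ids as concept type labels are illustrative choices, and the output is only CGIF-like; it has not been checked against the CGIF grammar.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
HREF = "{http://www.w3.org/1999/xlink}href"


def ref_id(element):
    """Topic id named by the element's direct <topicRef> child."""
    return element.find(XTM + "topicRef").get(HREF).lstrip("#")


def xtm_to_cgif(xtm_text):
    """Emit one CGIF-like line per <association> in the XTM input."""
    root = ET.fromstring(xtm_text)
    lines = []
    for n, assoc in enumerate(root.iter(XTM + "association"), start=1):
        event = f"e{n}"
        assoc_type = ref_id(assoc.find(XTM + "instanceOf"))
        parts = [f"[{assoc_type}: *{event}]"]
        for i, member in enumerate(assoc.findall(XTM + "member"), start=1):
            role = ref_id(member.find(XTM + "roleSpec"))
            player = ref_id(member)  # direct child only, not the roleSpec's
            var = f"{event}x{i}"
            parts.append(f"[{player}: *{var}]")
            parts.append(f"({role} ?{event} ?{var})")
        lines.append(" ".join(parts))
    return "\n".join(lines)


SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <association>
    <instanceOf><topicRef xlink:href="#Sit"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#Agnt"/></roleSpec>
      <topicRef xlink:href="#Cat"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#Loc"/></roleSpec>
      <topicRef xlink:href="#Mat"/>
    </member>
  </association>
</topicMap>"""

print(xtm_to_cgif(SAMPLE))
```

Even this toy version surfaces requirements for part 1: every association needs a type, every member needs a role, and the chart must say whether a player topic is an individual or a type.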
>
> Seriously, I have no concrete plans yet in MODELER for a CGIF output
> module, because I do not know CGs or FOL in depth; because I sense they
> are more a tool for teaching logic than a standard; and because it
> is not yet clear to me how many people actually care about CGIF.
>
> I hope this thread might yield interesting comments on such issues.
>
> Regards,
> Dan Corwin
>
>