[topicmapmail] [Fwd: examples of how it works]
Dan Corwin
dan@lexikos.com
Sat, 15 Nov 2003 12:03:02 -0500
Asle Pedersen wrote:
> Hi,
>
> You mention paid betatesters for the Modeler toolkit on your website.
> do not think that I am up for paid betatesting, however I would like
> to learn more about your product.
Hi Asle. Thanks. I'd also like to explain it better.
> Can you supply me with some examples, screenshots or other documentation
> of the features. Myself, I have been working with Topic Maps for 3 years now
> since I first used it in my master-thesis and do have basic knowledge of NLP.
>
> Best regards,
> Asle
Modeler is basically middleware which maps an input file into an
output file. So screen shots are limited to its debugging shell.
Its final version is months off, but these links can explain it:
[1] http://www.lexikos.com/words/beta/index_files/image003.jpg
Page [1] shows the main processes being run. The shell lets you
trace any of them selectively, and control the trace output format.
[2] http://www.lexikos.com/nlptools.jsp
This better explains the processing modules and holds links [3]
and [4], leading to examples of their related tracings.
[3] http://www.lexikos.com/words/loops/scanner.txt
Page [3] traces the results of looking up terms in our lexicon, and
heuristically filtering the results to reduce lexical ambiguity.
Its unfiltered equivalent was 3-4 times longer, with additional
"lexemes" for each term - mostly in other parts of speech.
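
As a rough illustration of the lookup-then-filter step traced on [3],
here is a minimal Python sketch. The tiny lexicon, the part-of-speech
labels, and the "what may follow what" heuristic are all assumptions
made up for this example; they are not Modeler's actual data or rules.

```python
# Hypothetical mini-lexicon: each term maps to its candidate lexemes,
# represented here only by a part-of-speech label.
LEXICON = {
    "the": ["det"],
    "old": ["adj", "noun"],
    "man": ["noun", "verb"],
    "boats": ["noun", "verb"],
}

# Assumed heuristic: parts of speech allowed after a given part of speech.
ALLOWED_AFTER = {
    "det": {"adj", "noun"},
    "adj": {"adj", "noun"},
}


def lookup(terms):
    """Return the unfiltered candidate lexemes for each term."""
    return [(t, list(LEXICON.get(t.lower(), []))) for t in terms]


def filter_lexemes(candidates):
    """Drop candidates that cannot follow an unambiguous previous term."""
    result = []
    prev = None  # part of speech of the previous term, if unambiguous
    for term, lexemes in candidates:
        kept = lexemes
        if prev in ALLOWED_AFTER:
            narrowed = [p for p in lexemes if p in ALLOWED_AFTER[prev]]
            if narrowed:  # never filter a term down to nothing
                kept = narrowed
        result.append((term, kept))
        prev = kept[0] if len(kept) == 1 else None
    return result
```

Calling filter_lexemes(lookup(["the", "man"])) keeps only the noun
reading of "man", since a verb is not allowed right after a determiner
under the assumed heuristic.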
[4] http://www.lexikos.com/N.txt
This shows one "raw" lexicon file for syntactic data on root terms.
Modeling each term by hand takes non-trivial time, but it is the only
way to encode enough factual knowledge on English to analyze real
syntax. Our ontology recognizes 54 noun/adj/adv features, 59 for
verbs, 22 for pronouns, and 34 more used in miscellaneous ways.
Many deal with expected complements; all are binary; the meaning
of most can be easily guessed from their names.
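
Since all of the features are binary, one way to picture a lexicon
entry is as a set of on/off flags. The sketch below uses Python's
enum.Flag for that. The feature names and the two sample entries are
invented for illustration; the real names are the ones listed in [4].

```python
from enum import Flag, auto


class NounFeature(Flag):
    # Hypothetical feature names, not taken from the actual lexicon.
    COUNT = auto()
    ANIMATE = auto()
    TAKES_THAT_CLAUSE = auto()   # expects a "that ..." complement
    TAKES_INFINITIVE = auto()    # expects a "to ..." complement


# Hypothetical entries: each root term carries the features that are "on".
LEXEMES = {
    "fact": NounFeature.COUNT | NounFeature.TAKES_THAT_CLAUSE,
    "dog": NounFeature.COUNT | NounFeature.ANIMATE,
}


def expects(term, feature):
    """Return True if the term's entry has the given binary feature set."""
    return feature in LEXEMES.get(term, NounFeature(0))
```

Here expects("fact", NounFeature.TAKES_THAT_CLAUSE) is true, while the
same test on "dog" is false.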
[5] http://www.lexikos.com/words/loops/parser.txt
This shows two parsings of one sentence. Our parser is a rule-
based expert system, with about 250 grammar rules managed within a
modified "Marcus" engine. A full debugging trace of all the rule
firings needed to analyze a sentence typically takes 1 page per
input word.
Page [5] also shows (half-way down) how operator interaction works.
The shell asks a question that is easily answered with one keystroke.
This answer rejects the first parsing as not "semantically" valid,
even though it was syntactically okay.
[6] http://www.lexikos.com/words/context.htm
Page [6] explains our new semantic module. As [1] shows, its input
is the kind of output structure shown on [5]. For each S (clause)
node, it will create an LTM association among the topics for the NP
nodes that relate to it. As you know from NLP theory, these "cyclic
nodes"
nest to form a parse tree, and have referents in the real/imaginary
world. The topics we build will have those referents as subjects.
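
The walk over the parse tree can be sketched roughly as follows, in
Python. The Node class, the labels, and the record format are
assumptions for this example only; the sketch returns plain tuples
rather than real LTM, and it simply treats every NP found inside a
clause (without crossing into a nested S or NP) as one that relates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # e.g. "S", "NP", "VP", "V"
    text: str = ""
    children: list = field(default_factory=list)


def nps_of_clause(node):
    """Collect the NPs of this clause, not descending into nested S or NP."""
    found = []
    for child in node.children:
        if child.label == "NP":
            found.append(child)
        elif child.label != "S":
            found.extend(nps_of_clause(child))
    return found


def associations(root):
    """Return one (clause text, [NP texts]) record per S node, in preorder."""
    records = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.label == "S":
            records.append((node.text, [np.text for np in nps_of_clause(node)]))
        stack.extend(reversed(node.children))
    return records
```

For a sentence like "John said Mary left", with the inner clause
nested under the VP, this yields one record for the outer clause
(linking "John") and one for the inner clause (linking "Mary").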
- - - To put the above pieces back together ....
Modeler 2.0 will automate the entire process, using a big base of
semantic tags (see 'MEANS' field in [3]). They denote association
templates, to be selected and filled in by using another kind of
rule for semantics (embedded WORDS scripts).
The net result, driven by [5]'s parse tree structure and a lexicon
of templates (senses), will be a semantic model of the NPs (topics)
linked by various kinds of S associations (what was said of them).
The unit of work will be an entire paragraph of input text, so the
output trace will look like [3], but with form-by-form LTM (from
templates) replacing its term-by-term list of syntactic models.
Modeler 1.0 lets us debug this process on simplified English forms
identified by markup, not our parser. The generated LTM for them
can be scrutinized for bugs, then fed back into manual editing to
help build up scripted association templates in sufficient bulk to
cover all common word senses (using Roget's as the goal set).
But one could use it for other goals. [6] will blindly turn
forms into LTM using *any* set of association templates it gets,
including those you might create for another project. Hence my
invitation to beta test it using your own custom lexicons.
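
To make the "any set of templates" point concrete, here is a small
Python sketch of filling an association template chosen by a semantic
tag. The tags, role names, and dictionary layout are hypothetical and
stand in for whatever a custom lexicon would supply; the real
templates are scripted in WORDS and emit LTM.

```python
# Hypothetical default templates keyed by semantic tag.
TEMPLATES = {
    "OWN": {"type": "ownership", "roles": ["owner", "owned"]},
    "GIVE": {"type": "transfer", "roles": ["giver", "gift", "receiver"]},
}


def fill(tag, noun_phrases, templates=TEMPLATES):
    """Fill the template selected by `tag` with the given NPs, in role order."""
    template = templates[tag]
    roles = template["roles"]
    if len(noun_phrases) != len(roles):
        raise ValueError(
            f"template {tag!r} needs {len(roles)} NPs, got {len(noun_phrases)}"
        )
    return {"type": template["type"], "players": dict(zip(roles, noun_phrases))}
```

Passing a different dictionary as the templates argument swaps in
another project's templates without changing the fill step itself.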
Dan