[topicmapmail] A missing key for next generation applications using TMs
MARK DRAGAN
mithrndir@msn.com
Wed, 06 Nov 2002 12:00:42 +0000
Greetings,
I had been sitting on this for five years before I encountered TMs at the
XML 2000 conference in Washington, DC. That spurred me to write a book on my
research, which I recently completed, and I would now like to share my ideas
on an application that is a good match for TM technologies.
You can read my paper below, or you can find it, and related information,
including how to integrate humanistic attributes into business and technical
processes, components, and information systems that span cultures and
disciplines, at http://www.humancontexttechnologies.com
Mark Dragan
mithrndir@msn.com
A Missing Key For Next Generation Applications
This paper introduces a next generation information processing application
capable of shifting XML, Topic Map, Semantic Web, and Knowledge Management
technologies into overdrive, turning e-commerce upside-down, and helping to
combat cyber-terrorism. This application seems neutral to the technology
upon which it runs, so it may not matter whether it uses web services
or not. Web services may enhance its usefulness, but that remains to be
seen.
There have been various attempts to find common ground among XML, Topic Map,
RDF, Semantic Web, Knowledge Management, and Knowledge Organization
technologies. Though some research has focused on the use of dictionaries
and thesauri, a key application of that data has been overlooked.
In the spirit of openness, I’d like to share an application that, in its use
of such data, holds the potential for achieving a sought-after convergence
of these technologies and the means by which they can begin to realize their
full potential.
Over the past seven to eight years, I’ve been exploring the usefulness of
what seems, on the surface, a ridiculous notion, that is, making a
concordance of the dictionary. Though thesauri and other references would
also likely play a part, this application is easier to envision by limiting
the initial discussion to the dictionary. Likewise, for convenience, I’ll
use the term Geodesic Database or Geodesic Dictionary (GD, for short), to
represent a neutral, non-aligned vision of the convergence of these
technologies. Notice the similarities in the images conjured by:
· A world-wide semantic network of interconnecting nodes spanning the globe,
· A world-wide 3D lattice of interconnecting nodes communicating with
geostationary satellites, an image used in Topic Map circles, and
· A geodesic dome like the one at Epcot Center in Walt Disney World in
Florida, USA.
Still, feel free, if you’re familiar with Semantic Webs, to think Semantic
Web instead of Geodesic Database. If you’re familiar with Topic Maps, think
Topic Maps instead of GDs, and so forth.
Because the dictionary is fully self-referential, that is, all of its words
are defined by other words, a 3D representation of their interconnections
would resemble a complex geodesic dome (as it would resemble both of the
other models). To be more explicit, a database would result from a parser
traversing each word in the dictionary and its definitions.
As it encountered each word, it would create a node for the word, and a
two-way link to every word in its definition. Of course, various levels of
intelligence could be used here to define additional link structures based
on parts of speech, whether there are synonyms, word roots, etc., but for
now, let’s keep it simple. The words and links would become “weighted” in
the AI sense, and would naturally form nested relationships between
lower-level concepts and higher-level concepts.
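The simple case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-entry `toy` dictionary, the function names `build_gd` and `initial_weights`, and the choice of "number of links" as a node's starting weight are all hypothetical, since the paper does not say how the initial weights would be computed.

```python
import re
from collections import defaultdict


def build_gd(dictionary):
    """Build a word graph from {headword: definition text}.

    Every headword becomes a node, and each word in its definition
    gets a two-way link to that headword.
    """
    links = defaultdict(set)
    for headword, definition in dictionary.items():
        head = headword.lower()
        links[head]  # touch the key so the node exists even with no links
        for token in re.findall(r"[a-z]+", definition.lower()):
            if token != head:
                links[head].add(token)
                links[token].add(head)
    return dict(links)


def initial_weights(gd):
    """Assumed weighting: a node's weight is its number of links."""
    return {word: len(neighbours) for word, neighbours in gd.items()}


# Hypothetical toy dictionary, for illustration only.
toy = {
    "cat": "a small animal",
    "dog": "a loyal animal",
    "animal": "a living thing",
}
gd = build_gd(toy)
weights = initial_weights(gd)
```

Even in this toy, "animal" ends up with more links than "cat" or "dog", which hints at how higher-level concepts might gather more weight than lower-level ones.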
Now, suppose (watch out, here come the assumptions) that:
1. There was a level of agreement as to what words would be included in some
base Geodesic Dictionary (a.k.a. GD) (which would of course define how each
node was initially weighted), which could include translations.
2. Systems could process this information efficiently.
3. This GD and its initial weightings define a starting point, a “tabula
rasa”; that is, a GD whose content and weights have not been affected by
interactions with other GDs, documents, databases, search engines, and so
forth. This provides a standardized starting point from which additional
“experience-based” weightings can be added and compared with other GDs.
4. There is software capable of comparing the weights in one GD to another
and identifying the deltas through time.
5. There is software capable of crawling through various forms of
information, extracting the words there, and using them to add weight to the
words in the “tabula rasa” GD (remember that more complex approaches would
include adding new words, and dealing with other forms of data, such as
video and audio, but let’s keep it simple for now).
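Assumptions 4 and 5 can be sketched roughly as follows. The names `add_experience` and `weight_deltas`, the sample weights in `base_gd`, and the rule "add one unit of weight per occurrence of a known word" are assumptions made for illustration; the paper does not prescribe a particular update rule.

```python
import re
from collections import Counter


def add_experience(weights, text):
    """Return a copy of weights, raising each known word by its count in text.

    Words that are not already in the GD are ignored (the simple case).
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {word: value + counts.get(word, 0) for word, value in weights.items()}


def weight_deltas(base, other):
    """Map each word whose weight differs to (other weight - base weight)."""
    words = set(base) | set(other)
    deltas = {w: other.get(w, 0) - base.get(w, 0) for w in words}
    return {w: d for w, d in deltas.items() if d != 0}


# Hypothetical starting-point weights, for illustration only.
base_gd = {"cat": 3, "dog": 3, "animal": 5}
experienced = add_experience(base_gd, "The dog chased another dog and a cat.")
deltas = weight_deltas(base_gd, experienced)
```

Because every experienced GD starts from the same base, the deltas alone describe what a given GD has "learned", and two experienced GDs can be compared with the same function.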
Okay, nothing too complex here. These assumptions are easy enough to
swallow. Now what? What do you do with a tabula rasa Geodesic Database of
dictionary items?
Perhaps, like all businesses, your company’s accounting department wants to
access and interact with the company data from its point of view, as do the
executives, the marketing department, the IT department, and so forth. You
might, again, create a tabula rasa GD for each department that records the
types of information and functions that members of each department access,
then customizes their interfaces to bring that information and those
functions to them. With experience (or with a few preset weightings
garnered from the experiences of similar departments in similar businesses;
see the possibilities for remarketing weighted templates), the information
and functions that a particular department is likely to want can be
pre-served for them. This has many implications for Web Services.
Perhaps you’d like to evolve a personal search agent that automatically
“learns” your shopping interests. You could activate software on your
computer that creates a new GD that begins adding weight to those words that
it encounters as you interact with web pages. This results in a personalized
GD that is weighted to your shopping interests. After developing this
profile, you send it out on the web to crawl around and weed through various
sites and search engines to determine if those sites have similarly weighted
content. After finding those sites, it brings them to your attention.
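One way to picture "similarly weighted content" is to score each site's text against the personal profile. The sketch below uses cosine similarity over word counts, which is an assumed measure chosen for illustration and not one the paper specifies; the site names, sample text, and function names are likewise hypothetical.

```python
import math
import re
from collections import Counter


def word_weights(text):
    """Weight each word by how often it occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-weight vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sites(profile, sites):
    """Return site names ordered from most to least similar to the profile."""
    scored = [(cosine_similarity(profile, word_weights(text)), name)
              for name, text in sites.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical profile and sites, for illustration only.
profile = word_weights("hiking boots tent hiking trail")
sites = {
    "outdoor-shop": "tent and hiking boots for every trail",
    "kitchen-shop": "pots pans and knives",
}
ranked = rank_sites(profile, sites)
```

Here the outdoor site ranks first because it shares the heavily weighted words of the profile, while the kitchen site shares none.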
Such agents could turn the e-commerce paradigm upside-down and address many
security concerns. Suppose that your shopping agent GD doesn’t exist on your
computer, but resides on a separate server. Further suppose that it has your
shopping and interest preferences, and your contact lists. Since you’ve
blocked all ads and email to your computer and have a very secure channel to
your agent, marketers will target your agent rather than you. This empowers
your agent to automatically filter through the marketing information. It
also allows it to proactively shop for goods and information for you. This
paradigm will serve the marketers, because the GD will automatically, and
more efficiently, hook them up with targeted customers. It creates a win-win
experience for businesses and consumers. It also may make security more
achievable.
Take this concept into the Web Services arena. A scenario emerges where GD
weights could evolve to help load balancing among servers and applications.
Suppose a user’s agent begins to use the services, perhaps locally on the
same server (the agent software can itself be mobile). Suppose further that
the Web Services server determines, by the nature of the agent’s weighting,
what applications it is likely to use during its interactions. Such
foreknowledge would enable the Web Services to work more effectively.
In the software development industry, one could use such technology to
determine what code components are useful to what industries. It could also
determine what functions are obsolete, and which ones to reuse. It might
even be able to “learn” what parts of a function are more valuable than
others.
For software testing, GDs could define various test scenarios and user
profiles that would form useful test cases.
Hardware and chip manufacturers could finally introduce large-scale parallel
processing systems.
Business executives could create GDs capable of discovering and “learning”
about the major factors confronting their business, just by letting such a
system monitor its interactions with customers, including documents,
contracts, etc., thus building a picture of its major activities, in either
a short- or long-term capacity. It could report on the percentages of time
used for various activities in its daily business, either by its systems or
by its employees.
User interfaces could be freed from the file-drawer paradigm. More natural
interfaces could be developed, so that when a client calls, the system serves
up links to that client’s contracts, related correspondence, personal
information, accounts payable, accounts receivable, and so forth.
In national security, GDs could help to monitor massive amounts of message traffic
(perhaps even encoded) to determine variations from the normal types and
amounts of messages.
A GD approach could also make cheap parallel processing possible.
In the financial industry, a GD could help determine variations in the
marketplace and help illuminate trends.
On the commercial side, a GD could help identify buying patterns, both over
time, and across market segments.
Supply-chain management could benefit from a series of GDs that were
weighted to trigger when ordering more supplies was necessary.
GDs could help process management if their weights were “tuned” to the
various stages in the process, and serve the appropriate set of functions,
documents, etc. as the product passed through each process stage.
In the research arenas, GDs could make it easier for researchers to find
existing research by tuning the GD’s weights to focus on the target
information.
GDs could be used in satellite imaging and communication applications such
as detecting environmental changes, and spotting military targets. Sending
only the changes to images or other data in the stream may reduce the
communication bandwidth required.
A GD approach could be used in pattern and image recognition systems by
being able to detect variations in an automated, continuous scan mode. This
has obvious military applications.
GDs may make it easier to combat terrorism and cyber-terrorism by being able
to detect deviations from the normal types of messages sent, without needing to
investigate the content of every message.
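As a rough sketch of this kind of monitoring, the code below compares the overall word distribution of a batch of messages with a baseline distribution and flags the batch when the two drift too far apart. The drift measure (total variation distance), the `threshold` value of 0.5, and all names and sample messages are assumptions for illustration; the paper does not name a specific measure.

```python
import re
from collections import Counter


def distribution(messages):
    """Relative word frequencies across a whole batch of messages."""
    counts = Counter()
    for message in messages:
        counts.update(re.findall(r"[a-z]+", message.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def drift(baseline, current):
    """Total variation distance: 0.0 for identical, 1.0 for disjoint vocabularies."""
    words = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(w, 0.0) - current.get(w, 0.0)) for w in words)


def is_anomalous(baseline, current, threshold=0.5):
    """Flag a batch whose drift from the baseline exceeds an arbitrary threshold."""
    return drift(baseline, current) > threshold


# Hypothetical traffic samples, for illustration only.
normal = distribution(["meeting at noon", "lunch at noon"])
similar = distribution(["meeting at noon"])
odd = distribution(["wire funds offshore tonight"])
```

Only aggregate weights are compared, so no individual message has to be inspected to notice that a batch looks unusual.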
As you can see, the applications for this approach are endless.
This model is both flexible and scalable. It is flexible in the sense that
comparisons can be made on any part of the GD, and it can be translated
between cultures and across disciplines. It is also flexible in the sense
that its approach is not limited to words. GDs could be composed of
pictorial nodes and their definitions, or sound frequencies and their
definitions, and so forth. It is scalable in the sense that the starting
point, the tabula rasa GD, is not set in stone. It can evolve to include any
number of terms, with the differences measurable. It is also scalable in the
sense that the quality of the linkages between nodes can be improved by
adding additional intelligence into the generation of the GD, making it
increasingly rich in complexity. This also enables you to directly measure
the differences among versions.
You can find this white paper, and related information including how to
integrate humanistic attributes into business and technical processes,
components, and information systems that span cultures and disciplines, at
http://www.humancontexttechnologies.com
Be well,
Mark Dragan
mithrndir@msn.com