[topicmapmail] A missing key for next generation applications using TMs

MARK DRAGAN mithrndir@msn.com
Wed, 06 Nov 2002 12:00:42 +0000


Greetings,

I had been sitting on this for five years before I encountered TMs at the 
Wash. DC XML 2000 conference. That spurred me to write a book on my 
research, which I recently completed. I would now like to share my ideas 
on an application that is a good match for TM technologies.

You can read my paper below, or you can find it, and related information, 
including how to integrate humanistic attributes into business and technical 
processes, components, and information systems that span cultures and 
disciplines, at http://www.humancontexttechnologies.com

Mark Dragan
mithrndir@msn.com

A Missing Key For Next Generation Applications

This paper introduces a next generation information processing application 
capable of shifting XML, Topic Map, Semantic Web, and Knowledge Management 
technologies into overdrive, turning e-commerce upside-down, and helping to 
combat cyber-terrorism. This application seems neutral to the technology 
upon which it runs, so it may not matter whether it uses web services 
or not. Web services may enhance its usefulness, but that remains to be 
seen.

There have been various attempts to find common ground among XML, Topic Map, 
RDF, Semantic Web, Knowledge Management, and Knowledge Organization 
technologies. Though some research has focused on the use of dictionaries 
and thesauri, a key application of that data has been overlooked.
In the spirit of openness, I’d like to share an application that, in its use 
of such data, holds the potential for achieving a sought-after convergence 
of these technologies and the means by which they can begin to realize their 
full potential.

Over the past seven to eight years, I’ve been exploring the usefulness of 
what seems, on the surface, a ridiculous notion, that is, making a 
concordance of the dictionary. Though thesauri and other references would 
also likely play a part, this application is easier to envision by limiting 
the initial discussion to the dictionary. Likewise, for convenience, I’ll 
use the term Geodesic Database or Geodesic Dictionary (GD, for short), to 
represent a neutral, non-aligned vision of the convergence of these 
technologies. Notice the similarities in the images conjured by:

·	A world-wide semantic network of interconnecting nodes spanning the globe,
·	A world-wide 3D lattice of interconnecting nodes communicating to 
Geo-stationary satellites used in Topic Map circles, and
·	A Geodesic dome like the one in Epcot center in Disney World in Florida, 
USA.

Still, feel free, if you’re familiar with Semantic Webs, to think Semantic 
Web instead of Geodesic Database. If you’re familiar with Topic Maps, think 
Topic Maps instead of GDs, and so forth.

Because the dictionary is fully self-referential (that is, all of its words 
are defined by other words), a 3D representation of their interconnections 
would resemble a complex geodesic dome (as it would resemble both of the 
other models). To be more explicit, a database would result from a parser 
traversing each word in the dictionary and its definitions.

As it encountered each word, it would create a node for the word, and a 
two-way link to every word in its definition. Of course, various levels of 
intelligence could be used here to define additional link structures based 
on parts of speech, whether there are synonyms, word roots, etc., but for 
now, let’s keep it simple. The words and links would become “weighted” in 
the AI sense, and would naturally form nested relationships between 
lower-level concepts and higher-level concepts.
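
A minimal sketch of the parser described above, in Python. The toy 
dictionary, the function names, and the choice of link count as a node's 
initial weight are all assumptions made for illustration; the paper does not 
prescribe a weighting formula.

```python
import re
from collections import defaultdict


def build_gd(dictionary):
    """Build a word graph from a {headword: definition text} mapping."""
    links = defaultdict(set)
    for headword, definition in dictionary.items():
        head = headword.lower()
        links[head]  # create the node even if the definition is empty
        for token in re.findall(r"[a-z]+", definition.lower()):
            if token != head:
                links[head].add(token)
                links[token].add(head)  # two-way link
    return links


def initial_weights(links):
    """Hypothetical initial weighting: a node's weight is its link count."""
    return {word: len(neighbors) for word, neighbors in links.items()}


# Toy dictionary, invented for this example.
toy = {
    "cat": "a small animal",
    "dog": "a loyal animal",
    "animal": "a living creature",
}
gd = build_gd(toy)
weights = initial_weights(gd)
```

In this toy case "animal" ends up with the most links, which hints at how 
higher-level concepts might surface from the link structure alone.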

Now, suppose (watch out, here come the assumptions) that:

1.	There was a level of agreement as to what words would be included in some 
base Geodesic Dictionary (a.k.a. GD) (which would of course define how each 
node was initially weighted), which could include translations.
2.	Systems could process this information efficiently.
3.	This GD and its initial weightings define a starting point, a “tabula 
rasa”; that is, a GD whose content and weights have not been affected by 
interactions with other GDs, documents, databases, search engines, and so 
forth. This provides a standardized starting point from which additional 
“experience-based” weightings can be added and compared with other GDs.
4.	There is software capable of comparing the weights in one GD to another 
and identifying the deltas through time.
5.	There is software capable of crawling through various forms of 
information, extracting the words there, and using them to add weight to the 
words in the “tabula rasa” GD. (Remember that more complex approaches would 
include adding new words, and dealing with other forms of data, such as 
video and audio, but let’s keep it simple for now.)
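
Assumptions 4 and 5 can be sketched as follows. This is only an 
illustration: the function names, the sample words, and the rule of adding 
one unit of weight per occurrence are hypothetical choices, and words that 
are not in the base GD are simply ignored, in keeping with the simple case.

```python
import re
from collections import Counter


def add_experience(weights, text):
    """Return a copy of `weights` with known words incremented per occurrence."""
    updated = dict(weights)
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    for word, n in counts.items():
        if word in updated:  # simple case: skip words missing from the base GD
            updated[word] += n
    return updated


def weight_deltas(base, other):
    """Return {word: other - base} for every word whose weight differs."""
    words = set(base) | set(other)
    return {
        w: other.get(w, 0) - base.get(w, 0)
        for w in words
        if other.get(w, 0) != base.get(w, 0)
    }


# Hypothetical starting weights and a hypothetical accounting document.
base = {"invoice": 1, "ledger": 1, "campaign": 1}
accounting = add_experience(base, "Invoice due. Post the invoice to the ledger.")
deltas = weight_deltas(base, accounting)
```

Taking the deltas against the shared starting point is what would make two 
GDs with different "experience" comparable.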

Okay, nothing too complex here. These assumptions are easy enough to 
swallow. Now what? What do you do with a tabula rasa Geodesic Database of 
dictionary items?

Perhaps, as in all businesses, your company’s accounting department wants to 
access and interact with the company data from its point of view, as do the 
executives, the marketing department, the IT department, and so forth. You 
might, again, create a tabula rasa GD for each department that records the 
types of information, and functions, that members in each department access, 
then customizes their interfaces to bring that information and those 
functions to them. With experience (or with a bit of pre-set weightings 
garnered from the experiences of similar departments in similar businesses – 
see the possibilities for remarketing weighted templates), the information, 
and functions that a particular department is likely to want, can be 
pre-served for them. This has many implications for Web Services.

Perhaps you’d like to evolve a personal search agent that automatically 
“learns” your shopping interests. You could activate software on your 
computer that creates a new GD that begins adding weight to those words that 
it encounters as you interact with web pages. This results in a personalized 
GD that is weighted to your shopping interests. After developing this 
profile, you send it out on the web to crawl around and weed through various 
sites and search engines to determine if those sites have similarly weighted 
content. After finding those sites, it brings them to your attention.
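
One way an agent might judge whether a site has "similarly weighted 
content" is cosine similarity between two weight profiles. That measure is 
a substitution made here for illustration and is not named in the paper; the 
profile, the site names, and their weights are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {word: weight} profiles."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def rank_sites(profile, sites):
    """Return site names ordered from most to least similar to the profile."""
    scored = [(cosine_similarity(profile, w), name) for name, w in sites.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical shopper profile and site profiles.
profile = {"guitar": 5, "amplifier": 3, "strings": 2}
sites = {
    "music-shop": {"guitar": 4, "strings": 3, "piano": 1},
    "garden-shop": {"soil": 6, "seeds": 4},
}
ranking = rank_sites(profile, sites)
```

A real agent would likely need to normalize for document length and 
vocabulary size, which this sketch leaves out.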

Such agents could turn the e-commerce paradigm upside-down and address many 
security concerns. Suppose that your shopping agent GD doesn’t exist on your 
computer, but resides on a separate server. Further suppose that it has your 
shopping and interest preferences, and your contact lists. Since you’ve 
blocked all ads and email to your computer and have a very secure channel to 
your agent, marketers will target your agent rather than you. This empowers 
your agent to automatically filter through the marketing information. It 
also allows it to proactively shop for goods and information for you. This 
paradigm will serve the marketers, because the GD will automatically, and 
more efficiently, hook them up with targeted customers. It creates a win-win 
experience for businesses and consumers. It also may make security more 
achievable.

Take this concept into the Web Services arena. A scenario emerges where GD 
weights could evolve to help load balancing among servers and applications. 
Suppose a user’s agent begins to use the services, perhaps locally on the 
same server (the agent software can itself be mobile). Suppose further that 
the Web Services server determines, by the nature of the agent’s weighting, 
what applications it is likely to use during its interactions. Such 
foreknowledge would enable the Web Services to work more effectively.

In the software development industry, one could use such technology to 
determine what code components are useful to what industries. It could also 
determine what functions are obsolete, and which ones to reuse. It might 
even be able to “learn” what parts of a function are more valuable than 
others.

For software testing, GDs could define various test scenarios and user 
profiles that would form useful test cases.

Hardware and chip manufacturers could finally introduce large-scale parallel 
processing systems.

Business executives could create GDs capable of discovering and “learning” 
about the major factors confronting their business, just by letting such a 
system monitor its interactions with customers, including documents, 
contracts, etc., thus building a picture of its major activities, in either 
a short- or long-term capacity. It could report on the percentages of time 
used for various activities in its daily business, either by its systems or 
by its employees.

User interfaces could be freed from the file-drawer paradigm. More natural 
interfaces could be developed so that, when a client calls, the system serves 
up links to that client’s contracts, related correspondences, personal 
information, accounts payable, accounts receivable, and so forth.

National security agencies could use GDs to help monitor massive amounts of 
message traffic (perhaps even encoded) to determine variations from the 
normal types and amounts of messages.

It could make cheap parallel processing possible.

In the financial industry, a GD could help determine variations in the 
marketplace and help illuminate trends.

On the commercial side a GD could help identify buying patterns, both over 
time, and across market segments.

Supply-chain management could benefit from a series of GDs that were 
weighted to trigger when ordering more supplies was necessary.

GDs could help process management if their weights were “tuned” to the 
various stages in the process, and serve the appropriate set of functions, 
documents, etc. as the product passed through each process stage.

In the research arena, GDs could make it easier for researchers to find 
existing research by tuning the GD’s weights to focus on the target 
information.

GDs could be used in satellite imaging and communication applications such 
as detecting environmental changes, and spotting military targets. Only 
needing to send changes to images or other data in the stream may reduce 
communication bandwidth.

A GD approach could be used in pattern and image recognition systems by 
being able to detect variations in an automated, continuous scan mode. This 
has obvious military applications.

GDs may make it easier to combat terrorism and cyber-terrorism by being able 
to detect deviations from the normal types of messages sent without needing 
to investigate the content of every message.
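
A rough sketch of this kind of anomaly detection, assuming each message has 
already been tagged with a coarse type label so that only the mix of types 
is examined. The labels, the threshold, and the use of total variation 
distance are assumptions made for illustration, not part of the original 
proposal.

```python
from collections import Counter


def distribution(labels):
    """Convert a list of message-type labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def drift(baseline, current):
    """Total variation distance: 0 for identical mixes, 1 for disjoint ones."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


def is_anomalous(baseline_labels, window_labels, threshold=0.3):
    """Flag a window whose type mix drifts past a (hypothetical) threshold."""
    score = drift(distribution(baseline_labels), distribution(window_labels))
    return score > threshold


# Hypothetical traffic: a baseline, a typical window, and an unusual window.
baseline = ["mail"] * 8 + ["chat"] * 2
typical = ["mail"] * 4 + ["chat"] * 1
unusual = ["mail"] * 1 + ["transfer"] * 4
```

Here the typical window has the same mix as the baseline and is not flagged, 
while the unusual window, dominated by a type the baseline never saw, is.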

As you can see, the applications for this approach are endless.

This model is both flexible and scalable. It is flexible in the sense that 
comparisons can be made on any part of the GD, and it can be translated 
between cultures and across disciplines. It is also flexible in the sense 
that its approach is not limited to words. GDs could be composed of 
pictorial nodes and their definitions, or sound frequencies and their 
definitions, and so forth. It is scalable in the sense that the starting 
point, the tabula rasa GD, is not set in stone. It can evolve to include any 
number of terms, with the difference remaining measurable. It is also scalable in the 
sense that the quality of the linkages between nodes can be improved by 
adding additional intelligence into the generation of the GD, making it 
increasingly rich in complexity. This also enables you to directly measure 
the differences among versions.

You can find this white paper, and related information including how to 
integrate humanistic attributes into business and technical processes, 
components, and information systems that span cultures and disciplines, at 
http://www.humancontexttechnologies.com

Be well,

Mark Dragan
mithrndir@msn.com



