Bridging Information and Knowledge

XML 2000, Washington DC,

6 December 2000

Michel Biezunski, mb@infoloom.com

Definitions

Information [Computers]

  • How data are represented and manipulated. Format, Structure, Platform.

Knowledge [Humans]

  • What it means, how it can be understood. Domain, Context, Connections.

Building Knowledge

    We need to build our knowledge from:

  • Structured information, carefully organized and prepared to be accessed in a (limited) number of ways.
  • A mass of unstructured, heterogeneous information, not prepared to be accessed (other than by full-text search).

The Bridges

    How Bridges Happen

  • By providing standards
  • Standards act as attractors that draw everyone toward a common interchange platform.
  • Standards need to be used.

1. Information Management

    Structured Documents and Databases

  • Data is organized into fields/element types.
  • Schemas are applied. They need to be modeled before they are used.
  • Information is easier to retrieve.

Inline Markup for Elements and Links

    Markup, when embedded into the data, is considered inline.

  • Difficulties when merging with other markup schemes.
  • The same applies to links ("simple links"): the anchor at the origin of the link is the link itself.

  • Easy to create, difficult to manage on a large scale.

Structure needs to be known at browsing time

  • Querying exploits existing structure
  • Special training is sometimes necessary.
  • OK for closed environments
  • Not appropriate for the Web.
  • You can't ask users to learn every underlying structure when they go from one page to another.

    Structure: once and for all

  • Documenting structure helps.
  • But what if specific structure is irrelevant to the task?
  • A structure may be good when it is created and later become obsolete.
  • Or too rigid (see tree-based thesauri).
  • How does it merge with a different structure?

Information about information

  • Metadata is usually understood as information added about information.
  • Example: library catalog
  • In a book, the title page is inside.
  • There is no fundamental difference between data and metadata.
  • However, some information is provided originally, and some is added to it subsequently, possibly by parties other than the authors.

Getting meaning out of structure?

  • Markup facilitates interchange of information
  • Markup doesn't necessarily carry semantics: compare <i>, <p> with <book>, <house>, <u8474yr>.
  • Markup is not always meaningful (especially when not properly documented).
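
A minimal sketch of the point above, assuming Python's standard xml.etree.ElementTree parser. The sample document and the helper name tag_names are made up for illustration. The parser handles <book> and <u8474yr> in exactly the same way, so any meaning a tag name has comes from documentation and shared convention, not from the markup itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the same content under a readable tag and an opaque one.
SAMPLE = "<doc><book>Moby Dick</book><u8474yr>Moby Dick</u8474yr></doc>"


def tag_names(xml_text):
    """Return the tag names of the root element's children, in document order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def texts(xml_text):
    """Return the text content of the root element's children, in document order."""
    root = ET.fromstring(xml_text)
    return [child.text for child in root]


if __name__ == "__main__":
    # Both elements parse equally well; only a human reader sees a difference.
    print(tag_names(SAMPLE))
    print(texts(SAMPLE))
```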

2. Knowledge Engineering

    Getting meaning out of anything

  • Applying computer-driven algorithms helps make sense of something that was originally unstructured, …
  • But not always a lot of sense.
  • However sometimes enough sense to be truly useful.

Knowledge technologies are proprietary

  • Knowledge-base products usually implement proprietary solutions.
  • Customers' investment in these technologies is therefore limited.
  • Systems usually cannot be upgraded or converted to another system.
  • This is an important limitation in interchanging knowledge.
  • Interchanging knowledge is becoming a necessity.

Structure is standardized, but often too expensive

  • Retrofitting masses of information is practically unachievable. What about the Web, for example?
  • The Web is properly marked up, but not structured.
  • There is no point in structuring it: it's too big and too spread out.

Putting things together

    Taking advantage of both:

  • Standardized, interchangeable structured information
  • Rich repositories of unstructured information

    Solution:

  • Superimposing a semantic layer above existing information resources, regardless of whether the information is already structured or not.
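
One way to picture a superimposed layer is the sketch below, in plain Python. It is only an illustration under stated assumptions: the resource names, their contents, and the function resources_about are hypothetical. The resources are left untouched, whether structured or not, and the layer only points at them from the outside.

```python
# Existing resources, left exactly as they are (hypothetical examples).
# One is structured markup, the other is unstructured text.
RESOURCES = {
    "catalog.xml": "<book><title>Bridges</title></book>",
    "notes.txt": "loose notes mentioning bridges and standards",
}

# The semantic layer lives outside the resources: subject -> resource addresses.
SEMANTIC_LAYER = {
    "bridges": ["catalog.xml", "notes.txt"],
    "standards": ["notes.txt"],
}


def resources_about(subject, layer=SEMANTIC_LAYER, resources=RESOURCES):
    """Return the resources the layer associates with a subject, unmodified."""
    return {address: resources[address] for address in layer.get(subject, [])}


if __name__ == "__main__":
    for address in resources_about("bridges"):
        print(address)
```

Because the layer is a separate object, it can be replaced or extended without retrofitting any of the resources it describes.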

Sharing knowledge among user communities

    Knowledge does NOT mean:

  • Formats
  • Structures
  • Platforms

    It means:

  • Content
  • Subjects
  • Understanding
  • Common understanding is what we are looking for here.

    RDF (Resource Description Framework)

    A generic, neutral, and powerful approach to:

  • Assign properties to information objects.
  • Connect information nodes together in a vast network.
  • Create computer-driven processes to exploit this network.
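
The three points above can be sketched with plain Python tuples standing in for statements of the form (resource, property, value). This is not RDF syntax and uses no RDF library. The identifiers and property names are hypothetical, and the functions properties_of and reachable are illustrative helpers.

```python
# Statements as (resource, property, value) tuples. Identifiers are made up.
TRIPLES = [
    ("doc:slides", "creator", "person:mb"),
    ("doc:slides", "presentedAt", "event:xml2000"),
    ("event:xml2000", "location", "place:washington-dc"),
    ("person:mb", "name", "Michel Biezunski"),
]


def properties_of(subject, triples):
    """Return the (property, value) pairs assigned to one information object."""
    return [(p, o) for s, p, o in triples if s == subject]


def reachable(start, triples):
    """Follow statements outward from `start` and collect every node reached."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        # Every value of a statement about `node` is a neighbour in the network.
        frontier.extend(o for s, _, o in triples if s == node)
    return seen


if __name__ == "__main__":
    print(properties_of("doc:slides", TRIPLES))
    print(sorted(reachable("doc:slides", TRIPLES)))
```

The traversal is the "computer-driven process" in miniature: it knows nothing about what the properties mean and only follows connections.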

Topic Maps

    A generic, neutral and powerful approach to:

  • Group relevant information about subjects of interest.
  • Connect these subjects together.
  • Describe the context in which the connections between subjects are valid.
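
A rough sketch of these three ideas in plain Python follows. It does not use the Topic Maps XML syntax or any topic map engine. The class names, field names, resource addresses, and the connections helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A subject of interest, with the resources grouped under it."""
    name: str
    occurrences: list = field(default_factory=list)  # hypothetical resource addresses


@dataclass(frozen=True)
class Association:
    """A connection between subjects, valid within a stated context."""
    kind: str
    members: tuple
    scope: str


atom = Topic("atom", ["chem-textbook.xml#ch2", "physics-notes.txt"])

ASSOCIATIONS = [
    Association("building-block-of", ("atom", "compound"), scope="chemistry"),
    Association("made-of", ("atom", "proton"), scope="physics"),
    Association("made-of", ("atom", "electron"), scope="physics"),
]


def connections(topic_name, scope, associations):
    """Return the associations involving a topic that are valid in one context."""
    return [a for a in associations
            if topic_name in a.members and a.scope == scope]


if __name__ == "__main__":
    for assoc in connections("atom", "physics", ASSOCIATIONS):
        print(assoc.kind, assoc.members)
```

Filtering by scope is what lets one subject carry different connections for different audiences without the two views contradicting each other.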

Why two overlapping standards?

  • In a way that's not good.
  • In a way it's good.

A parallel in science

  • In chemistry, atoms are elementary building blocks used to create compounds and elements. The structure of matter is based on atoms.
  • In physics, atoms are complex objects made of neutrons, protons, electrons, and plenty of other stuff.
  • Do we need both?
  • Yes.

Why two approaches?

    We need to describe:

  • Knowledge in terms of what users need to model in order to improve navigation.
  • Connectivity in terms of how computers get from one node to another in a graph.

    Simple proposal:

  • Topic Maps used for Chemistry, RDF used for Physics. Discussion in progress between authors of RDF and of Topic Maps.

And the bridge?

  • Topic Maps and RDF are both XML-based standards that come from the Information Technologies side.
  • Convergence is being pursued.
  • Since they are used to actually represent knowledge, the next step is to build a series of schemas, for queries, inference rules, template construction rules, etc., all things that provide a standard answer to knowledge engineering issues.

Improving Global Knowledge Interchange

  • The key issue is: how do we merge our information?
  • How can we improve the likelihood that we get a common understanding about the knowledge we want to express?
  • The answer is: by using meaningful subjects, described in a reliable way, to merge information sources.
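
As a closing illustration, merging on subjects can be sketched in a few lines of Python. The sketch assumes each source maps a shared, reliably described subject identifier to a set of resource addresses. The identifiers, the addresses, and the function merge_by_subject are hypothetical.

```python
def merge_by_subject(*sources):
    """Merge sources keyed by subject identifier, pooling what each one knows."""
    merged = {}
    for source in sources:
        for subject_id, resources in source.items():
            # Entries about the same subject are combined, not duplicated.
            merged.setdefault(subject_id, set()).update(resources)
    return merged


# Two independent sources that happen to describe one common subject.
SOURCE_A = {"subject:atom": {"chem-textbook.xml#ch2"}}
SOURCE_B = {
    "subject:atom": {"physics-notes.txt"},
    "subject:electron": {"physics-notes.txt"},
}

MERGED = merge_by_subject(SOURCE_A, SOURCE_B)

if __name__ == "__main__":
    for subject_id in sorted(MERGED):
        print(subject_id, sorted(MERGED[subject_id]))
```

The merge works only because both sources use the same identifier for the same subject, which is why reliable subject descriptions matter more than any shared format.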