XML'99: the dreams and the reality   Table of contents   Indexes   The Marriage of XML and Databases

 

The Interchange of Mathematics in XML: MathML, OpenMath and their Application

 Stephen   Buswell
  Director, Research and Development
  Stilo Technology Ltd  Empire House, Mount Stuart Square,
Cardiff   CF1 6DN  United Kingdom
Phone: +44 (0) 1222 483 530
Fax: +44 (0) 1222 488 498
Email: sb@stilo.com Web: http://www.stilo.com
 
Biographical notice:
 
Stephen Buswell is a mathematician who has been active in the SGML and XML software arena since the early 1990s. He is a member of the W3C Math Working Group, a principal writer of the MathML recommendation, a member of the OpenMath project and the ISO 12083 working group. His other interests include modern languages and linguistics. Prior to Stilo, Stephen's previous experience included the design of a language for the control of Life Science experiment equipment on board Spacelab flight D2, and being Technical Advisor to the Polish Ministry of Finance. Stephen was a founder member of Stilo and is now Director of Research and Development.
 
ABSTRACT:
 
This paper discusses two XML-based encodings for mathematics: MathML, the W3C recommendation for mathematics on the Web, and the ESPRIT OpenMath project developing support for the exchange of 'semantically rich' mathematical objects. It looks at the scope of each approach, the relationship between them, and at how and where these technologies might be used for the interchange of mathematical expressions.
XML Maths
 

Introduction

 
 

Introduction - Why Write Maths in XML?

 
Recently, XML encodings of mathematical expressions have emerged. They have been developed in response to a number of different but related stimuli. Existing image-based mechanisms are costly in bandwidth and clumsy in terms of their display and alignment properties. As a very simple example, consider an HTML page in which the expression 5 + 3 = is embedded as an image file. Experienced HTML users will know the problems of aligning the maths in the image with the surrounding text, particularly under changes of font size. In addition, a GIF file, even for such a simple expression, can easily exceed 1KB. Furthermore, the mathematical meaning is effectively unrecoverable from the image.
 
The meaning of a mathematical expression may depend on the wider context of the document in which it is embedded. For example, the semantics of the "5 + 3" example above is completely altered if, on a previous page, we have the statement "All arithmetic here is modulo 7". In a mathematical paper, the expression 'c-squared' could be a reference to the square of the speed of light, or to the space of all twice continuously differentiable functions. The expression sin⁻¹ x could be the reciprocal of sin x, or the inverse function of sin x, depending on context. Worse still, what is sin⁻² x?
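The dependence on context can be made concrete with a small sketch. The function names below are illustrative only and are not part of MathML or OpenMath; the point is simply that the same written symbols yield different values once a reading is chosen.

```python
import math


def five_plus_three(modulus=None):
    """Evaluate 5 + 3, optionally under a document-wide 'modulo n' convention."""
    total = 5 + 3
    return total if modulus is None else total % modulus


def sin_minus_one(x, reading):
    """Evaluate the written form 'sin to the power -1 of x' under a chosen reading."""
    if reading == "reciprocal":
        return 1.0 / math.sin(x)   # read as the reciprocal of sin x
    if reading == "inverse":
        return math.asin(x)        # read as the inverse function, arcsin x
    raise ValueError(f"unknown reading: {reading!r}")


plain = five_plus_three()              # ordinary integer arithmetic
wrapped = five_plus_three(modulus=7)   # the same symbols read modulo 7
```

Nothing in the written expression itself selects the value of `modulus` or `reading`; that information has to be carried by the encoding or by its surrounding document.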
 
More generally, we can say that the semantics of mathematical text is more complex than simply the semantics of the embedded expressions, and we need a means of encoding such information, including the 'non-local' semantics, together with the expressions themselves.
 
In a wider and less technical context, the information industry is increasingly driven to maximise the value of its information by means of search, retrieval and re-use capabilities. XML encodings can support all these facilities. The same XML encoding of mathematics could be used for display on the Web, in an interactive textbook on CD, and as input to a print publishing system.
 
With these issues in mind, we will now consider two XML encodings for mathematical data: Mathematical Markup Language (MathML) and OpenMath.
 

Mathematical Markup Language (MathML)

 

Introduction to MathML

 
The problem of encoding mathematics for computer processing or electronic communication is much older than the Web. The common practice among scientists before the Web was to write papers in some encoded form based on the ASCII character set, and e-mail them to each other. The World Wide Web Consortium (W3C) has long recognised that support for scientific communication is an urgent requirement and, after a number of earlier initiatives, the W3C Math working group was formally constituted in March 1997. The working group brings together representatives from diverse organisations including research institutes, learned societies, and various branches of industry including computer algebra and electronic publishing.
 
MathML is not a standalone development but has inherited a great deal from earlier work. In particular, TeX is a de facto standard in the mathematical research community, with a large authoring community. This brings with it a need for legacy conversion tools. The maths fragment of the ISO 12083 DTD describes the visual presentation of mathematical notation. The Semantic Maths DTD fragment developed under the aegis of the 12083 working group has provided a base for the content elements. MathML also inherits from computer algebra systems such as Mathematica and Maple, and has connections with the OpenMath community, which are discussed in more detail below.
 
In April 1998, the working group issued its recommendation for the encoding of mathematical expressions within Web pages, Mathematical Markup Language (MathML). The goal of MathML is to enable mathematics to be served, received and processed on the Web. MathML is an XML application supporting the encoding of mathematics in terms of both presentation and the underlying mathematical meaning or content. MathML has an XML DTD, a grammar, a Usage Guide and default semantics.
 
Maths has a two-dimensional symbolic notation and is simultaneously both notation and content. The Web is interconnected, interactive and dynamic. MathML aims at bringing this web interactivity to mathematical expressions for automatic processing, searching, indexing, and reuse, and working and learning at a distance.
 

MathML Features

    MathML has:
  •  XML Syntax - XML is the core syntax of almost all emerging Web applications. This has significant general advantages in terms of tool commonality and data format compatibility. Another consequence is verbosity of the MathML instances, which brings in turn a requirement for tools for the easy creation and manipulation of MathML fragments.
  •  Presentation Elements - these support the encoding of mathematics from a (mainly visual) rendering standpoint.
  •  Content Elements - these support the encoding of maths from a semantic standpoint. MathML aims to cover maths for 'K-12', that is up to high school or the first year of university.
  •  Interface Elements - these allow the embedding of the MathML fragment in an HTML page, and in the future will support the required interface negotiation (eg. for baseline alignment with the surrounding text).
    A very important point to note is that the uses of Presentation and Content element subsets are not mutually exclusive: a Content construct can have embedded presentation information (overriding the default presentation specified for that element). Similarly presentation constructs can have embedded content structures.
     

    MathML Presentation Elements

     
    The MathML presentation model offers a hierarchical, "boxes within boxes" mechanism allowing the user to control the size, positioning and spacing of all the components of the expression to be displayed. It aims to achieve the same expressive power as TeX.
    mrow: horizontal group of subexpressions
    mfrac: fraction formed of subexpressions
    msqrt, mroot: radical formed of subexpressions
    mstyle: style settings
    mphantom: used for size calculations
    merror: encloses a syntax error
    msub, msup, msubsup, munder, mover, munderover, mmultiscripts: attach subscripts, superscripts, underscripts and overscripts to a base
    mtable, mtr, mtd: table or matrix, row and element
    maction: creates 'live' text in the expression
    mn; mi; ms: number; identifier; string literal
    mf: fences (eg. parentheses)
    mo: operator
    mspace: adjustable space
    mtext: arbitrary text

    Table of MathML Presentation Elements
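As an illustration of the "boxes within boxes" model, the following sketch builds two small presentation fragments with Python's standard `xml.etree.ElementTree` module. The helper names `token` and `layout` are hypothetical conveniences, not part of MathML; only the element names come from the table above.

```python
import xml.etree.ElementTree as ET


def token(tag, text):
    """Create a token element such as mn, mi or mo holding character data."""
    node = ET.Element(tag)
    node.text = text
    return node


def layout(tag, *children):
    """Create a layout element such as mrow, mfrac or msup around child boxes."""
    node = ET.Element(tag)
    node.extend(children)
    return node


# 5 + 3 as a horizontal row: number, operator, number.
simple_sum = layout("mrow", token("mn", "5"), token("mo", "+"), token("mn", "3"))

# (x squared plus 1) over 2: a fraction whose numerator is itself a row
# containing a superscript box.
numerator = layout(
    "mrow",
    layout("msup", token("mi", "x"), token("mn", "2")),
    token("mo", "+"),
    token("mn", "1"),
)
fraction = layout("mfrac", numerator, token("mn", "2"))

simple_sum_xml = ET.tostring(simple_sum, encoding="unicode")
fraction_xml = ET.tostring(fraction, encoding="unicode")
```

Here `simple_sum_xml` is the string `<mrow><mn>5</mn><mo>+</mo><mn>3</mn></mrow>`. The nesting of elements mirrors the nesting of layout boxes, and says nothing about whether the `+` denotes ordinary or modular addition.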
     

    MathML Content elements

     
    Clearly, providing a semantic model for the whole of mathematics would be a huge if not unachievable task. Not only this, but the day it was finished, it would be rendered out of date by the development of some new branch of mathematics. The MathML WG therefore set itself the task of providing basic support for "K-12" mathematics, that is for mathematics from kindergarten up to the end of high school or the first year of university education. The areas covered in the current MathML recommendation (1.0) are listed below. This list is under revision in the ongoing MathML review process.
     
    MathML Content Areas:
    •  Arithmetic, Algebra and Logic
    •  Relations
    •  Calculus
    •  Set Theory
    •  Sequences and Series
    •  Trigonometry
    •  Statistics
    •  Linear Algebra
    •  Semantic Mapping
     
    MathML Content elements have well-defined default semantics. That is to say that, for example, the symbol "sin" is not only the name of an element in the DTD, but also has the mathematical semantics which a user (or a mathematical application) would expect. Two communicating applications can therefore rely on a consistent interpretation of the symbol. These default semantics can be overridden by the user if desired.
     
    MathML has an extension mechanism which allows a user to create a new symbol, for example to represent a function not in the predefined K-12 element set. This mechanism can also provide a reference to an external definition of the semantics of such a user-defined symbol (effectively extending the semantic scope of MathML). Such an external definition could take many forms: a reference to a standard text or a function in a well-known computer algebra system, for example. One possible method for the formal, machine-readable, definition of these semantics is supplied by OpenMath.
    apply: explicit application of a function to its argument
    inverse: generic inverse operator
    fn: user-defined function
    plus, minus, times, min, max: mathematical operators
    eq, neq, gt, lt: relations
    ln, log, int, diff: logarithms, integral and differentials
    set, union, intersect, in, notin: set manipulation
    sum, product, limit: sum, product, limit of sequences and series
    sin, cos, tan etc.: trigonometric functions
    mean, median, mode: statistics
    matrix, determinant: linear algebra
    semantics, annotation: semantic mapping elements

    Content Element Examples
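Because content elements carry default semantics, an application can act on an expression rather than merely display it. The sketch below parses a content fragment for 5 + 3 and evaluates it. It assumes the `cn` token element for numbers, which belongs to the MathML recommendation but is not listed in the table above, and the evaluator itself (`evaluate`, `OPERATORS`) is a hypothetical toy covering only two operators.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

# Content markup: apply the 'plus' operator to the numbers 5 and 3.
CONTENT_SUM = "<apply><plus/><cn>5</cn><cn>3</cn></apply>"

# Default semantics for the two operators handled by this toy evaluator.
OPERATORS = {"plus": operator.add, "times": operator.mul}


def evaluate(node):
    """Evaluate a content fragment built only from apply, plus, times and cn."""
    if node.tag == "cn":
        return int(node.text.strip())
    if node.tag == "apply":
        # The first child names the operator, the rest are its arguments.
        op, *args = list(node)
        values = [evaluate(arg) for arg in args]
        return reduce(OPERATORS[op.tag], values)
    raise ValueError(f"unsupported element: {node.tag}")


result = evaluate(ET.fromstring(CONTENT_SUM))
```

The same fragment could equally be handed to a renderer, which would fall back on the default presentation of `plus`.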
     

    Tools and Products Supporting MathML

     
    One of the most interesting and encouraging aspects of the definition process of MathML has been the parallel emergence of tools and products supporting the emerging recommendation. Indeed, the experience of the implementors of these tools has provided significant input to the development and refinement of MathML itself. Some MathML tools, available or under development, are listed here:
    Viewers: WebEQ (Geometry Center), Techexplorer (IBM), Amaya (W3C)
    Editors: EZMath (HP), STARS/JOME (Stilo/OVE/OpenMath)
    Conversion tools: AMS LaTeX - MathML (AMS/SIAM/Geometry Center), TeX4ht (Rahtz/Gurari (Elsevier/Ohio U.))
    Generators: Mathematica (WRI), Maple (WMI), MathType (Design Science)

    MathML Tools and Products
     

    MathML and Other Standards and Recommendations

     
    As one of the first application-specific recommendations issued by W3C, MathML has become a motivating example for many of the W3C working groups, for example those concerned with schemas, flow objects and formats, and query languages. The MathML working group has therefore been able to provide input to the various groups occupied with these areas. The working group is also in discussion with the two major browser manufacturers concerning the implementation of native MathML support within the browsers.
     

    The ESPRIT OpenMath Project

     
     

    Overview of the OpenMath Project

     
    Closely related to MathML is the European ESPRIT project OpenMath. OpenMath came into existence as an informal group in the academic and industrial Computer Algebra community interested in inter-application communication of mathematical objects. In 1997 ESPRIT, the Information Technology programme of the European Community, commenced support for a project involving nine European members of this group to define the standard in detail and develop supporting technology. The wider grouping continues in the global OpenMath Society which co-ordinates the efforts of various OpenMath-related projects and activities around the world. In particular, there is close co-operation with the North American OpenMath Initiative (NAOMI).
     
    OpenMath aims to develop a standard for the interchange of semantically-rich mathematical objects between communicating applications.
     
     
     
    Fields of investigation include mathematical databases, specialist processing systems, interactive and distance learning and distributed mathematical software. The OpenMath project is also developing tools and prototype industrial applications in areas of particular interest. Public seminars are held to disseminate information and generate user feedback.
     

    The OpenMath Technical Approach

     
    OpenMath takes a slightly different approach from MathML to the encoding of a mathematical expression. The core OpenMath language is very small, but closely associated (although outside the formal syntax) is a vocabulary of symbols which are defined in a (freely extensible) set of Content Dictionaries. OpenMath 'symbols' represent functions, operators, variables and so on.
     
    An OpenMath object itself has an abstract syntax which is a recursively extensible tree of symbols and objects. An object can have a representation in one or more formats: XML or binary for example. This differs from the view in MathML where an object is defined by its XML representation. Nodes of the tree can be attributed. The attribute is itself an OpenMath object and encodes contextual information not derivable from the data.
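The separation between abstract object and concrete encoding can be sketched as follows. The Python classes are hypothetical and cover only symbols, integers, applications and attribution. The element names `OMOBJ`, `OMS`, `OMI`, `OMA`, `OMATTR` and `OMATP` are those of the OpenMath XML encoding as later standardised and should be read as an assumption about the project drafts. The CD name `arith1` is likewise assumed, `example_context` is an invented CD used purely for illustration, and `to_text` is a toy notation that is not the OpenMath binary format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Symbol:
    cd: str      # name of the Content Dictionary that defines the symbol
    name: str


@dataclass(frozen=True)
class Integer:
    value: int


@dataclass(frozen=True)
class Application:
    head: object
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Attributed:
    pairs: Tuple[Tuple[Symbol, object], ...]   # (attribute name, attribute value)
    body: object


def to_xml(obj):
    """Encode the abstract tree as XML elements."""
    if isinstance(obj, Symbol):
        return ET.Element("OMS", {"cd": obj.cd, "name": obj.name})
    if isinstance(obj, Integer):
        node = ET.Element("OMI")
        node.text = str(obj.value)
        return node
    if isinstance(obj, Application):
        node = ET.Element("OMA")
        node.append(to_xml(obj.head))
        node.extend([to_xml(arg) for arg in obj.args])
        return node
    if isinstance(obj, Attributed):
        node = ET.Element("OMATTR")
        pairs = ET.SubElement(node, "OMATP")
        for key, value in obj.pairs:
            pairs.append(to_xml(key))
            pairs.append(to_xml(value))
        node.append(to_xml(obj.body))
        return node
    raise TypeError(f"not an object of this sketch: {obj!r}")


def to_text(obj):
    """Encode the same tree in a toy prefix notation."""
    if isinstance(obj, Symbol):
        return f"{obj.cd}.{obj.name}"
    if isinstance(obj, Integer):
        return str(obj.value)
    if isinstance(obj, Application):
        parts = [to_text(obj.head)] + [to_text(arg) for arg in obj.args]
        return "(" + " ".join(parts) + ")"
    if isinstance(obj, Attributed):
        notes = " ".join(f"{to_text(k)}={to_text(v)}" for k, v in obj.pairs)
        return f"{to_text(obj.body)} [{notes}]"
    raise TypeError(f"not an object of this sketch: {obj!r}")


# 5 + 3, then the same object attributed with the context "modulo 7".
five_plus_three = Application(Symbol("arith1", "plus"), (Integer(5), Integer(3)))
modulo_seven = Attributed(
    ((Symbol("example_context", "modulus"), Integer(7)),), five_plus_three
)

xml_root = ET.Element("OMOBJ")
xml_root.append(to_xml(modulo_seven))
```

Both `to_xml` and `to_text` walk the same tree, so neither encoding defines the object. The attribution wraps the unchanged application node and supplies the non-local context as a further object.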
     
    A Content Dictionary (often referred to as a CD) specifies the mathematical semantics of an OpenMath symbol: type signatures; formal mathematical properties (eg. associativity) and so on. OpenMath objects can have more rigorous semantics than MathML constructs. For example, there is support for formal type systems and type-inference mechanisms.
     
    A given CD contains symbols from a particular area of mathematics such as Linear Algebra: two OpenMath-aware applications can communicate if they process the same CD. This allows specialist systems to be OpenMath-compliant by implementing a minimum interface, possibly even only for one CD.
     
    The project is developing a core set of CDs: others can be defined at will by OpenMath users. Developing a new CD does not require any change to the core OpenMath language, or affect the compatibility of previously developed applications. It is this clean separation between the core language and the semantics of symbols in newly-created CDs which gives OpenMath its extensibility.
     
    The core set of CDs numbers (at the time of writing) some 28. The reason that there are so many small CDs in the core is to allow easy selection of tightly-defined application-specific subsets, called "CD Groups". For example, a subset can be chosen so that the semantic scope covered by the symbols in these CDs is equivalent to the scope of Content MathML. This equivalence allows a precise definition of OpenMath-MathML compatibility and gives a firm basis for the development of conversion tools.
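A conversion tool built on such a CD Group reduces, in essence, to a lookup from (CD, symbol) pairs to Content MathML element names. The sketch below is a minimal illustration only: the CD names `arith1`, `relation1` and `transc1`, the XML element names and the function `openmath_to_content_mathml` are assumptions made for the example, and the mapping shown is a tiny hand-picked fragment rather than the project's actual group.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a "MathML equivalence" CD Group:
# (content dictionary, symbol name) -> Content MathML element name.
MATHML_GROUP = {
    ("arith1", "plus"): "plus",
    ("arith1", "times"): "times",
    ("relation1", "eq"): "eq",
    ("transc1", "sin"): "sin",
}


def openmath_to_content_mathml(node):
    """Translate an OpenMath XML tree into Content MathML elements."""
    if node.tag == "OMOBJ":
        return openmath_to_content_mathml(node[0])
    if node.tag == "OMI":
        out = ET.Element("cn")
        out.text = node.text.strip()
        return out
    if node.tag == "OMV":
        out = ET.Element("ci")
        out.text = node.get("name")
        return out
    if node.tag == "OMS":
        key = (node.get("cd"), node.get("name"))
        if key not in MATHML_GROUP:
            # The symbol lies outside the semantic scope shared with MathML.
            raise KeyError(f"symbol {key} is not in the MathML CD Group")
        return ET.Element(MATHML_GROUP[key])
    if node.tag == "OMA":
        out = ET.Element("apply")
        out.extend([openmath_to_content_mathml(child) for child in node])
        return out
    raise ValueError(f"unsupported element: {node.tag}")


SOURCE = (
    '<OMOBJ><OMA><OMS cd="arith1" name="plus"/>'
    "<OMI>5</OMI><OMI>3</OMI></OMA></OMOBJ>"
)
converted = openmath_to_content_mathml(ET.fromstring(SOURCE))
```

A symbol from a CD outside the group raises an error instead of being silently dropped, which is the practical meaning of compatibility being defined by the group's scope.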
     
    Alongside the 20 CDs in the "MathML equivalence" CD Group, there are another 8 providing some basic symbols not in MathML, and a basis for a formal type system. In addition to the core, the project is developing further some areas of special interest, including Algebra, Polynomials, Group Theory and Theorem Proving Systems.
     

    The Synergy between MathML and OpenMath

     
    Although OpenMath started out as an activity completely independent of MathML, by the time the ESPRIT project was running it was clear that MathML had become an important part of the mathematical universe. The project has therefore chosen to avoid duplicating the MathML work, or creating any possibility of 'competing' standards, and to work instead on the synergy between the two and on the development of interchange technology.
     
    MathML and OpenMath are in many respects complementary. OpenMath can provide an extension and formalisation of the mathematical semantics in MathML. In turn MathML offers a visualisation and publication mechanism for OpenMath. In addition, many tools are likely to provide interoperability. (This is particularly related to the fact that both notations have XML encodings). The relationship between the various parts of MathML and OpenMath can be visualised as below:
     
     
     
    We can expect to see OpenMath in use in areas where the formal mathematical properties of an object are paramount - in specialist engineering applications, research and so on. MathML will become widespread for example in education and publishing where the visual aspects of a mathematical expression may be as important as (or more important than) the underlying semantic content. The balance between MathML presentation and content will depend on this relative importance.
     
    As there is a clear common interest between the scopes of OpenMath and MathML, it is also inevitable that many expressions will start life in one form and at some point be re-represented in the other. One interesting example of this occurs in a prototype being developed by the ESPRIT project. Here MathML mechanisms are used to provide a browser interface with a visual rendering of an expression - this is then transformed into OpenMath for processing by a commercial numerical library before re-transformation of the processing results for display. The intention is to develop a model for an intelligent interface to the user documentation of the numerical library.
     

    Conclusions

     
    Both the MathML and OpenMath encodings preserve the internal substructure of an expression, both from the semantic and, in presentation MathML, layout viewpoints. The information encoded is therefore recoverable and reusable, in whole or in parts. This offers the prospect of interchange, both vertically and horizontally. The same encoding used for a piece of mathematics in a print-publishing system can be re-used in a web page or electronic journal. The mathematical semantics of an expression can be passed between web pages, interactive textbook CDs, and specialist mathematics servers for processing, display and user interaction. This approach should find many useful applications in education and research, publishing, engineering and allied fields.
     
    Acknowledgments
     
    The author explicitly acknowledges the contributions of all members of the W3C Math Working Group to the MathML recommendation. Parts of this paper have derived significantly from this work. The work of the OpenMath community, and in particular the valuable contributions of the team members of the ESPRIT OpenMath project are also acknowledged with thanks.
     
    Bibliography
    MathML
    Mathematical Markup Language, http://www.w3.org/TR/REC-MathML
    Knuth 1986
    Knuth, D.E.; The TeXbook, American Mathematical Society, Providence, RI and Addison-Wesley Publ. Co., Reading, MA, 1986, ix + 483 pp. ISBN: 0-201-13448-9
    ISO 12083
    ISO 12083:1993//DTD Article//EN Document Type Definition for an Article. See also Poppelier, N.A.F.M., E. van Herwijnen, and C.A. Rowley; "Standard DTD's and Scientific Publishing", EPSIG News 5 (1992) #3, September 1992, 10-19.
    Bus 1996
    Buswell, S., Healey, S., Pike, E.R., and Pike, M.; "SGML and the Semantic Representation of Mathematics", UIUC Digital Library Initiative SGML Mathematics Workshop, May 1996 and SGML Europe 96, Munich 1996.
    OpenMath
    The ESPRIT OpenMath project - www.nag.co.uk/projects/Openmath. See also the OpenMath Society (www.openmath.org) and the North American OpenMath Initiative, NAOMI (www.naomi.math.ca).
    WRI
    Wolfram Research Inc. www.wri.com
    Maple
    Waterloo Maple Inc. www.maplesoft.com
