
Advanced Information Systems
 Chahuneau, François 
 France 
 Paris 
 
François Chahuneau
 General Manager
Advanced Information Systems
  15-17 rue Rémy Dumoncel Paris  France (75014)
Email: fcha@ais.berger-levrault.fr
 Biography
 François Chahuneau is General Manager of AIS S.A., a French consulting and systems integration company with more than 10 years of experience in SGML/XML based applications, in a variety of business fields ranging from Publishing to Industry and Finance.
Angleraud, Christophe
 France 
Spot Image
 Toulouse 
 
Christophe Angleraud
 Spacemap System Manager
Spot Image
  5 rue des Satellites BP 4359 Cedex 4 Toulouse  France (31030)
Email: christophe.angleraud@spotimage.fr Web site: www.spotimage.fr
 Biography
 Christophe Angleraud is a Project Manager with Spot Image, the CNES subsidiary in charge of SPOT satellite digital data acquisition and marketing. He is in charge of designing digital exchange formats for the new SPOT 5 program.
 

Abstract

 The Spot program was developed by the French Space Agency CNES, in cooperation with Belgium and Sweden. Spot Image, created in 1982, is the first commercial company established to distribute geographic information derived from Earth Observation Satellites on a worldwide basis. SPOT archives currently store more than 5.6 million images, each made of 1, 3 or 4 spectral band rasters associated with explanatory and interpretative metadata.
 For the new SPOT 5 program, to be launched in 2001, an XML-based format (Dimap/XML) is being designed to replace the record-oriented metadata encoding format (CEOS) currently in use within the SPOT 4 program.
 One of the advantages of using XML-coded metadata is to facilitate their formatting and display process for catalog applications, using XSL stylesheets and standard Web browsers.
 A schema, based on the W3C XML Schema draft specification, was developed to formally specify and describe the information content and format, and to express the set of applicable constraints. This application draws intensively on the new data-typing facilities provided by XML Schemas, so that the schema could be used as the basis for validity checking and quality assurance in the production processes.
 Extensions to XML Schemas are introduced to embed structured documentation items in the schema itself. This allows for automatic derivation of the published technical specification from the schema itself, both in print and on-line forms, through XSL processing.
 

Introduction

 

Spot Image business

Spot Image is a French company in charge of the worldwide commercial distribution of SPOT satellite imagery. It has existed for about 20 years and currently operates three satellites (SPOT1, 2 and 4). SPOT5 will be launched by the end of 2001.
 Satellite images can be used as digital maps for GIS (Geographic Information System) users. They provide accurate, objective and recent views of an area of interest. The business of Spot Image is a quickly evolving one :
 
  •  past : remote sensing data provider (mainly for scientific applications)
  •  current : image Geographic Information data provider
  •  future : Comprehensive Geographic Information data provider (mixing several sources of information and providing customized products) with on-line order/delivery services.
 Currently, Spot Image holds more than 60% of the worldwide satellite image distribution business. Its revenue reached about $45M in 1998.
 

From simple raster to image and towards global geographic information

Multiple applications have been using satellite imagery for many years. While the core of these applications remains raster data, the diversity of uses has driven them to rely on complementary data sets as well as auxiliary metadata, such as the date of acquisition, sensor configuration, geo-localization (ground localization), etc. This set of metadata delivers information which is essential for an end-user to understand the real meaning of the imagery.
 The first step for using satellite imagery is usually to search an image catalog in order to find out what is available. Such a capability is already available as an on-line service from Spot Image: the Sirius catalog. Sirius has been designed to select raw satellite scenes as well as advanced products. E-commerce functionalities are currently under design. The next versions will offer the possibility to directly extract portions of on-line image coverage and to apply geometric transforms and standard radiometric processing options.
 
 Example of a Sirius catalog on-line search
 From raw satellite scenes (images corresponding to an original satellite shot), Spot Image evolved its range of products in order to provide satellite image coverage regardless of original scene acquisitions. Such products, designed to fit the exact requirements of customers in terms of geographic coverage and other acquisition parameters such as dates, viewing angles, etc., are called spacemaps (commercialized under the SPOTView® products brand name). Spacemaps require merging (assembly) of several original satellite scenes; this process is often called mosaicking.
 
 Typical Spacemap Product
 
 Spacemap mosaicking
 In summary, it is important to understand that a Spot image is more than simple raster data: it is associated with metadata which are critical for users to understand what they are looking at.
 This is where satellite imagery leaves the raster world to enter the GIS (Geographic Information System) world. All the GIS technology relies on the capability of systems to mix graphic data (images, vectors) and attributes (non-graphical data). It is this combination which allows so-called geospatial queries, which add localization queries to basic SQL selections.
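 The idea of a geospatial query, an attribute selection combined with a localization test, can be sketched as follows. This is only an illustration using an in-memory SQLite table: the table layout, the column names and the sample rows are assumptions made for the example, and a simple bounding-box test stands in for the richer spatial operators of a real GIS.

```python
import sqlite3

# Hypothetical catalog table: one row per scene, with a bounding box
# (degrees) and two ordinary attributes. All values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scenes (id TEXT, satellite TEXT, acquired TEXT,"
    " min_lon REAL, min_lat REAL, max_lon REAL, max_lat REAL)"
)
conn.executemany(
    "INSERT INTO scenes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "SPOT4", "1999-03-12", 1.0, 43.0, 1.8, 43.6),
        ("B", "SPOT2", "1998-07-01", 1.2, 43.4, 2.0, 44.0),
        ("C", "SPOT4", "1999-05-20", 5.0, 45.0, 5.8, 45.6),
    ],
)


def find_scenes(conn, satellite, lon, lat):
    """Attribute filter (satellite) plus spatial filter (point inside box)."""
    rows = conn.execute(
        "SELECT id FROM scenes"
        " WHERE satellite = ?"
        " AND min_lon <= ? AND ? <= max_lon"
        " AND min_lat <= ? AND ? <= max_lat"
        " ORDER BY id",
        (satellite, lon, lon, lat, lat),
    ).fetchall()
    return [row[0] for row in rows]


print(find_scenes(conn, "SPOT4", 1.5, 43.5))
```

 Scene B also covers the requested point but is rejected by the attribute filter, which shows the two kinds of condition working together.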
 Future applications will require fully interoperable data, self-descriptive, and standardized interchange structures. Moreover, ISO-9000 requirements imply full traceability of product transformation processes.
 

About the image format jungle and entropic standardization processes

 Defining a universal raster format has turned into the quest for the Holy Grail. The main difficulty lies in the large number of different applications. Currently, image manipulation software commonly includes drivers for 50 different raster formats. Coping with this "format jungle" strongly reduces application interoperability and requires a lot of technical expertise from data producers and their customers.
 The end of this decade has seen considerable effort devoted to the standardization of electronic data interchange. Among these efforts, some are of direct interest to us: OpenGIS, GeoTiff, JPEG-2000, XML and XML Schemas.
 

Image metadata and product design requirements

 

Product requirements

 General requirements for interoperability imply several features :
 
  •  Provide catalog search and discovery through browser-like GUIs.
  •  Preserve human readability for remote customer support.
  •  Provide self-descriptive metadata.
  •  Provide strict document validity check for quality assurance and interoperability.
     
 
 Sample SPOTView product discovery
 
 

Metadata contents

 Experience in providing satellite data, gathered across hundreds of projects and applications, allowed Spot Image to design a set of metadata tailored to earth imagery description. The process involved in making a spacemap is complex. For example, it involves geometric transformations in order to make the image directly superimposable on other geographic information data sources. Such transformations require external information such as ground control points and DTMs (Digital Terrain Models).
 The Dimap (for Digital Image Map) metadata include : raster description, terrestrial coordinate system, satellite ephemeris trajectory, satellite attitude (rotation around its axis), quality information, sensor configuration, statistics, localization, legend, data sources description, etc.
 
 Extract of Dimap metadata
 

Encapsulation of other data, adaptive products

 Effective geographic information often involves several complementary data sources such as DTM, road networks, census data, built up areas, political boundaries, scanned maps, satellite imagery, aerial photography...
 The combination of all these data layers is dependent upon user applications. Moreover, individual datasets can be shared among several users for different applications.
 
 Sample combination of Spot data with road networks
 The near future (the next 2 years) will see an incredible increase in pixel resolution of satellite shots: SPOT5 will allow a ground resolution of 2.5 m, and Spot Image will distribute data from US Orbview satellites with a top ground resolution of 1 m.
 As a consequence, the market for satellite imagery will broaden considerably. Individual customers and local council managers will have access to up-to-date imagery through e-commerce applications. The Spot Image business will change dramatically in terms of distribution model and volume handled. It will be necessary to hold image datastores with comprehensive metadata sets, from which customers will place on-line extraction orders. Sophisticated extraction processes will allow tailoring end products to the exact needs of users and/or according to specified pricing ranges. Flexibility in product making and packaging will be the key to on-line distribution success.
 

Historical background for Dimap

 The design of Dimap has been a long process. Dimap is based on multiple pre-existing raster data exchange formats:
 
  •  CEOS: a fixed record, mixed ASCII/binary encoding format tailored to satellite imagery. Its main shortcomings are: total lack of extensibility, need for specific software import utilities, a very problematic revision process involving upgrade of all customer software tools, and mixing of metadata and raster data, which makes metadata extraction a complex process.
  •  GIS-Geospot, GIS-Image: ASCII keyword/value pairs formats designed by Spot Image and SSC-Satellitbild (Sweden). These two formats introduce the concept of separate metadata, and make use of de-facto standards for raster encoding. Although the use of keywords brings some flexibility for product design, most software vendors have implemented specific hard-coded readers that still make it difficult to get things moving.
  •  Geotiff: an extension to the TIFF raster format which embeds raster geo-localization parameters.
  •  Dimap (Digital Image Map): a joint effort by Spot Image and SSC-Satellitbild. Originally designed as a pure ASCII metadata standard, its object architecture makes it possible to derive an XML implementation.
 All these previous efforts have been combined in the design of the SPOT5 Dimap/XML specification.
 

Using XML for Dimap

 

Reasons for adopting XML

 Our decision to adopt XML for implementing the Dimap specification was motivated by the very good match between XML properties and major Dimap design requirements.
 The main design requirements for Dimap can be summarized as follows:
 
  •  Data separation: provide clean separation between metadata and raster data, so as to allow direct display of raster data (encoded in standard raster formats) in standard browsers or raster manipulation tools, without the need for proprietary viewing software.
  •  Genericity and universality: encode metadata as a structured grammar, for which generic processing layers can be easily developed and shared between specific software applications. Allow multiple uses of metadata information (anticipated or not), including screen display, processing, database loading, etc.
  •  Robustness and extensibility: make it possible to develop Dimap applications in such a way that evolutions in the Dimap information set are non-disruptive to existing applications which do not make use of newly available information, while minimizing work required to upgrade applications which could take advantage of it. It should also be possible to design self-adaptive applications able to reconfigure themselves when this information set is evolved.
  •  Reliability: support validity checking.
  •  Documentation precision and quality: guarantee that the published technical specification precisely reflects the Dimap information set and all associated constraints, and will faithfully track future evolutions.
 XML naturally fulfills those requirements:
 
  •  XML is a worldwide standard, well adapted to defining industry-level application standards.
  •  The Dimap information can be naturally modeled as XML data structures (nested, attribute-bearing information objects), using a much more direct mapping than with traditional, record-oriented approaches. The resulting format is directly understandable and easier to maintain.
  •  XML makes it possible to re-use existing technology bricks (parsers, browsers) in software development, hence reducing proprietary development costs.
  •  XML allows clear separation of content & structure from presentation, which does not restrict the range of applications (while implying the use of stylesheets for visual rendering).
  •  XML supports self-descriptive data structures through schemas.
  •  Automatic validation techniques are available (DTDs and schemas).
  •  XML Schemas, with adequate documentation extensions, can be formatted into published technical specifications.
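 The robustness requirement listed above (evolutions of the information set should not disrupt existing applications) follows quite directly from the way generic XML parsers are used. The sketch below is an assumption-laden illustration, not part of the Dimap specification: the element names `Dataset`, `NAME`, `NROWS` and `CLOUD_COVER` are hypothetical, and the standard Python parser stands in for whatever parser an application would embed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical metadata documents. The second one carries an extra
# element that did not exist when the application was written.
OLD_DOC = "<Dataset><NAME>Scene 1</NAME><NROWS>3000</NROWS></Dataset>"
NEW_DOC = (
    "<Dataset><NAME>Scene 1</NAME><NROWS>3000</NROWS>"
    "<CLOUD_COVER>12</CLOUD_COVER></Dataset>"
)


def read_raster_height(xml_text):
    """Pick out only the element this application needs; ignore the rest."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("NROWS"))


print(read_raster_height(OLD_DOC), read_raster_height(NEW_DOC))
```

 Because the reader addresses elements by name rather than by position, the added element is simply skipped and the existing application keeps working unchanged.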
 

Dimap XML Schema design

 

Schema requirements

 The purpose of a schema is to precisely state a set of constraints about a structured data set. If the schema is itself expressed in machine-processable form, then it is possible to develop software to check how a data set (instance) fulfills this set of constraints, a process known as validation.
 This automated validity checking capability is a major requirement in our case, since this is the only way to ensure interoperability. It is a key quality assurance aspect of the project, which contributes to fulfillment of ISO-9000 criteria.
 In the context of Dimap design, constraints on data types are as important as constraints on data structures. Therefore, XML DTDs (although they are currently the only stabilized and standardized formalism available to express structural constraints over XML datasets) do not match our requirements, since they do not allow expression of constraints on data content (data typing, etc.).
 It was decided to examine the on-going work on XML schemas, and to select one of the existing proposals as a basis for the Dimap project.
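 To make the distinction between structural and data-type constraints concrete: a DTD can require that a `FRAME_LAT` element be present, but it cannot require that its content be a number in a sensible range. The following sketch hand-codes such a content check in Python purely for illustration. The range of -90 to 90 degrees is our assumption about what a latitude constraint might look like; it is not quoted from the Dimap schema, where such rules would be declared as data types rather than programmed.

```python
import xml.etree.ElementTree as ET


def check_latitude(xml_text):
    """Return a list of data-type errors found in FRAME_LAT elements."""
    errors = []
    for elem in ET.fromstring(xml_text).iter("FRAME_LAT"):
        text = (elem.text or "").strip()
        try:
            value = float(text)
        except ValueError:
            # Structurally valid element, but the content is not numeric.
            errors.append(f"FRAME_LAT is not a number: {text!r}")
            continue
        # Assumed range for a latitude expressed in degrees.
        if not -90.0 <= value <= 90.0:
            errors.append(f"FRAME_LAT out of range: {value}")
    return errors


print(check_latitude("<Vertice><FRAME_LAT>north</FRAME_LAT></Vertice>"))
```

 All three kinds of document (correct, non-numeric, out of range) have the same element structure, so only a content-aware check can tell them apart.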
 

Selecting XML Schemas as a schema formalism

 On-going work on XML schemas is inspired by three distinct and complementary trends, aimed at overcoming current limitations of XML DTDs. Various existing proposals (XML-Data, DCD, SOX, DDML, XML Schema) are not equally concerned by all three aspects.
 
  1.  Increase schema modeling power, by extending the range of constraints which can be expressed (introduction of constraints on data types, etc.). The XML-Data proposal was the first to explore this direction by introducing several new ideas, most of which made their way into the current XML Schema proposal.
  2.  Take advantage of modern object-oriented concepts in information modeling (modularity, classes, specialization, inheritance) to facilitate schema engineering, maintenance and documentation. This objective is shared by the XML-Data, SOX and XML Schema proposals, and was obviously the main objective behind SOX.
  3.  Adopt XML syntax as the representation for the schema itself, so as to facilitate schema processing, display, and storage using XML tools. All current proposals follow this objective, which was the only purpose of the DDML proposal.
 After careful examination of the various existing proposals, XML Schema appeared as the most comprehensive synthesis of these trends, and also as the most likely candidate to reach W3C recommendation status within a few months. The May 1999 W3C Working Drafts were used as the basis for the Dimap schema design work.
 Given the relative simplicity of the Dimap information set, it turned out that no use of the object-oriented design capabilities (archetypes, etc.) was required. By contrast, data typing capabilities were heavily used. Distinct advantages were also obtained from the XML representation of the Schema (see below).
 

Reuse of Dimap 1.0 and CEOS formats

 The existing Dimap 1.0 Beta work was used to design the architecture of the schema elements. In the same way, documentation requirements were derived from Dimap 1.0 documentation. A comparative analysis between Dimap and CEOS was then undertaken to select the portions of CEOS that needed to be integrated into Dimap/SPOT5. This work allowed us to specify the traceability requirements (historical information). Traceability of information origin (CEOS or Dimap 1.0) was implemented through the ext:reference XML schema extension.
 Beyond this merging process, an interesting "intellectual filtering" process was applied to eliminate information present in legacy formats and which was there only as a side-effect of the format design itself. As an example, in a record-oriented format, some records contain information about the field lengths and location of data items in the following records, or the number of records to be read to complete a data block: there is no need for such data in an XML-based format in which information object boundaries and nesting relationships are fully explicit.
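 The difference between the two styles of encoding can be illustrated with a toy example. The record layout and the element names below are invented for the purpose and are not taken from CEOS or Dimap: the point is only that the record reader must carry offsets and lengths in its own code, whereas the XML reader finds the boundaries in the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-width record: a 4-digit row count followed by a
# 4-digit column count. Nothing in the data says where one field ends.
RECORD = "30002500"

# The same information with explicit element boundaries.
XML_DOC = "<Raster><NROWS>3000</NROWS><NCOLS>2500</NCOLS></Raster>"


def read_record(record):
    # Offsets and lengths are knowledge that lives outside the data.
    return int(record[0:4]), int(record[4:8])


def read_xml(xml_text):
    # Element names are the only knowledge the reader needs.
    root = ET.fromstring(xml_text)
    return int(root.findtext("NROWS")), int(root.findtext("NCOLS"))


print(read_record(RECORD), read_xml(XML_DOC))
```

 Any change to a field width breaks `read_record`, while `read_xml` is unaffected by the length of the values it reads.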
 

Single self-contained XML Schema

 The resulting XML Schema, over 3000 lines long, represents the complete specification for the new Dimap/XML format which merges the CEOS and Dimap 1.0 information sets. This is the only specification available, and it is intended to be used both as a reference document and as a set of rules for automated validity checking (either as a standalone QA process or as an embedded process in Dimap applications).
 Of course, the schema itself is not meant to be read by human readers in XML format: a formatting process, through an XSL stylesheet, is required to turn it into a readable specification document. This can be done either in a static way in print, or in a more dynamic way on-line (interactive, hyperlinked document).
 This self-contained nature of the Dimap XML schema, which is at the same time the Dimap specification and its documentation, is what makes it reliable: any change in the specification is automatically reflected when it is browsed as a document.
 

Current XML Schema limitations

 Some limitations of the XML Schema formalism in its current draft form were identified during the design phase of Dimap/XML.
 The most fundamental limitation is certainly XML Schema's inability to express referential integrity constraints. For instance, it should be possible to state that the existence of an XML element A depends on the existence of another element B located elsewhere in the data structure, or that the value (content) or the value range of A depends on the value or existence of B. This limitation of XML DTDs remains in the current XML Schema draft proposal. Another problem is the current lack of documentation guidelines for XML Schemas, although these will probably be introduced as the specification is refined. Both problems were circumvented by introducing proprietary extensions to XML Schemas (see below).
 Finally, the current lack of direct validation tools for XML Schemas (other than SGML parsers using the XML Schema DTD) makes practical work more difficult. This annoyance, however, was fully anticipated and accepted in our decision to adopt the XML Schema formalism in its early stage of development.
 

XML-Schema extensions

 Several extensions to the current draft specification for XML Schemas were introduced, in the form of additional elements or attributes. All of them make use of an ext: namespace to separate them from the native XML Schema vocabulary. Three additional first level elements were defined, which themselves contain several sub-elements (not detailed here):
 
  •  ext:purpose contains a structured definition of the purpose of the associated data item (XML element), available in short and long forms, with an optional illustration and comment.
  •  ext:reference is used for data item traceability from legacy formats. Sub-elements precisely describe the origin of the associated data item, in what original format/information set it can be found and under what name.
  •  ext:comment allows interspersing free comments throughout the schema.
 Finally, the additional ext:constraint attribute, which can be associated with the native XML Schema elements elementTypeRef and modelGroup, is used to express integrity constraints in literal form.
 Since the proposed draft DTD for XML Schemas is used as a bootstrap to validate the Dimap schema, this DTD had to be extended to reflect these additional elements and attributes.
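 Since ext:constraint values are literal text, checking them is left to application code. The sketch below shows what a checker for one constraint of the form "CM if METADATA_PROFILE = 2A" might look like. Several things here are assumptions: we read "CM" as "conditionally mandatory", the `Dimap` root element and the position of `METADATA_PROFILE` in the instance are invented, and the constraint is hard-wired into function arguments rather than parsed from the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the second Vertice lacks FRAME_X.
DOC = """<Dimap>
  <METADATA_PROFILE>2A</METADATA_PROFILE>
  <Vertice><FRAME_LON>1.0</FRAME_LON><FRAME_LAT>43.0</FRAME_LAT><FRAME_X>10</FRAME_X></Vertice>
  <Vertice><FRAME_LON>1.8</FRAME_LON><FRAME_LAT>43.0</FRAME_LAT></Vertice>
</Dimap>"""


def check_conditional(xml_text, element, profile_value):
    """Flag each Vertice lacking `element` when the profile matches."""
    root = ET.fromstring(xml_text)
    profile = (root.findtext(".//METADATA_PROFILE") or "").strip()
    if profile != profile_value:
        return []  # The constraint does not apply to this document.
    return [
        f"Vertice {i} is missing {element}"
        for i, vertice in enumerate(root.iter("Vertice"), start=1)
        if vertice.find(element) is None
    ]


print(check_conditional(DOC, "FRAME_X", "2A"))
```

 The check depends on a value found in a different part of the document, which is exactly the kind of cross-element dependency that the schema formalism itself cannot state.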
 

XML schema sample

 The following schema excerpt (for the "Vertice" data item) is typical of our usage of the XML Schema formalism, and illustrates the use of schema extensions.
 
<elementType name="Vertice">
  <sequence>
    <elementTypeRef name="FRAME_LON"/>
    <elementTypeRef name="FRAME_LAT"/>
    <elementTypeRef name="FRAME_ROW"/>
    <elementTypeRef name="FRAME_COL"/>
    <elementTypeRef name="FRAME_X"
        ext:constraint="CM if METADATA_PROFILE = 2A"
        minOccur="0" maxOccur="1"/>
    <elementTypeRef name="FRAME_Y"
        ext:constraint="CM if METADATA_PROFILE = 2A"
        minOccur="0" maxOccur="1"/>
  </sequence>
  <attrDecl name="unit">
    <datatypeRef name="unit4"/>
  </attrDecl>
  <attrDecl name="index">
    <datatypeRef name="integer"/>
  </attrDecl>
  <ext:purpose>
    <ext:short>Dataset frame vertice</ext:short>
    <ext:detail>Dataset frame vertice is repeatable. A Vertice is cited
    for each vertice of the framing polygon of the dataset.</ext:detail>
    <ext:illustration>Illustrations\\Dataset_Frame.gif</ext:illustration>
    <ext:comment>Either group FRAME_LON/LAT or FRAME_X/Y must be cited
    </ext:comment>
  </ext:purpose>
  <ext:reference>
    <ext:doc_id>cap</ext:doc_id>
    <ext:file>LEADER</ext:file>
    <ext:record Number="2">HEADER</ext:record>
    <ext:offset>149</ext:offset>
    <ext:version>1.0</ext:version>
  </ext:reference>
</elementType>
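 As a rough illustration of how documentation can be derived from such a schema, the sketch below extracts the short purpose of each documented element type. It uses the standard Python XML parser as a stand-in for the XSL processing actually used in the project. The namespace URI bound to the ext: prefix and the enclosing `schema` wrapper element are assumptions added so that the fragment parses on its own; the excerpt is also shortened.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI: the text only gives the "ext:" prefix.
EXT = "urn:example:dimap-ext"

# Shortened schema fragment inside a hypothetical wrapper element.
SCHEMA = f"""<schema xmlns:ext="{EXT}">
  <elementType name="Vertice">
    <ext:purpose>
      <ext:short>Dataset frame vertice</ext:short>
      <ext:detail>A Vertice is cited for each vertice of the framing
      polygon of the dataset.</ext:detail>
    </ext:purpose>
  </elementType>
</schema>"""


def document(schema_text):
    """Return 'name: short purpose' lines, one per documented elementType."""
    ns = {"ext": EXT}
    lines = []
    for element_type in ET.fromstring(schema_text).iter("elementType"):
        short = element_type.findtext("ext:purpose/ext:short", namespaces=ns)
        if short is not None:
            lines.append(f"{element_type.get('name')}: {short.strip()}")
    return lines


print(document(SCHEMA))
```

 Because the descriptive text lives next to the declarations it describes, a listing produced this way cannot drift out of step with the schema.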
 

Applications

 

Data browsing

 Spot Image products for which metadata are stored in Dimap/XML can be browsed using standard browsers supporting XSL. A prototype XSL stylesheet was developed which shows how displayed information provides users with an overview of the main product characteristics (see screen snapshot below). Similar techniques will be used to present the products stored in the catalog.
 
 Sample product browsing using Dimap/XML with XSL stylesheets
 

Schema display for technical documentation

 The extensions to the XML Schema specification described previously allow inclusion of structured description of Dimap elements in the schema. Using an appropriate XSL stylesheet, the schema itself can be used as on-line Dimap reference documentation including technical diagrams. Adequate XSL stylesheets allow schema browsing through hypertext functionalities. The screen snapshot below shows such a prototype implementation, where all navigation and hypertext linking mechanisms are implemented using XSL stylesheets along with a few embedded scripts (using IE5 XSL scripting extensions).
 
 Sample view of the Dimap/SPOT5 Schema using an XSL stylesheet
 

Current status and future plans

 The development of Dimap/XML as well as its SPOT5 extensions is now complete. Demonstration products have been built to showcase the technology. The Dimap documentation is available as static HTML and is currently being integrated as extensions into the Dimap/SPOT5 XML Schema. An XSL stylesheet allowing browsing through the schema has been developed; it will be used by the software developers of the SPOT5 ground processing system.
 In the future, Spot Image plans to use Dimap and XML at different stages of the production process.
 

Basic uses of Dimap/XML metadata: data interchange and catalog applications

 Spot Image will use Dimap not only for the publication of its spacemap products, but also for raw satellite scenes (time frame: SPOT5 launch). The required specific Dimap extensions have been integrated in the XML Schema. These evolutions will significantly reduce the burden of image import and metadata gathering.
 Dimap/XML will be used internally both as a data interchange format between systems and as a metadata storage format for cataloging purposes.
 Adopting Dimap/XML as a data interchange format between systems will reduce interface maintenance costs, since standard parsers can be integrated into these systems. In addition, incremental evolution of metadata documentation is easy to implement, which provides high-level production process traceability matching ISO-9000 requirements. Spot Image suppliers will also be required to deliver Dimap/XML for quality control purposes.
 The catalog application will allow customers to extract the displayable part of a finished product before ordering it. XML allows easy development of self-contained product presentation sheets, using standard XSL processing. This approach will considerably reduce the burden of catalog system maintenance by replacing the current generation of specific, hard-coded dynamic HTML generation engines.
 

Advanced uses of Dimap/XML: on-the-fly product customization

 In a further step, we plan to store all technical metadata in XML, in order to support on-the-fly product generation for products requiring late customization. For example, a data store could hold all the raw satellite multi-temporal scenes over a given area in compressed format, along with the geometric models and the optimized mosaic paths as Dimap/XML metadata. When a customer orders a small extract, in a specific map projection, of a specific date range, with specific radiometric processing (e.g. enhanced linear networks by specialized filtering), then the on-line data server would use the stored metadata to apply all the necessary image processing steps to the selected portion, and ship the result over the Internet.
 

Using the Dimap XML schema for self-documentation of Dimap products

 We have already shown that the Dimap XML Schema, using an appropriate XSL environment, could be used effectively as standalone on-line documentation for Dimap itself.
 This idea can be taken one step further to develop on-line help for Dimap applications. Data items such as those displayed in our "data browsing" example would behave as hypertext links to the corresponding part of the XML schema. Clicking on a link would display a nicely formatted excerpt of the XML schema in a popup window, providing detailed information about the meaning, specific constraints and origin of the data item.
 

Conclusion

 There are several levels at which the Dimap design project takes advantage of XML technology:
 
  •  XML provides a natural framework for a serialized representation of Dimap metadata, well-adapted to data interchange. Existing XML technology allows sharing of generic, low-level data processing software layers through all specific Dimap applications.
  •  Specifying the Dimap format itself through an XML schema provides precise specification of the Dimap information set, of its exchange format, and of the set of accompanying constraints (or most of them), all in a machine-processable form which allows automatic validity checking.
  •  Using XML to express this schema itself greatly facilitates its use as the reference documentation for Dimap, and as a building block for on-line documentation components of future Dimap applications.
 Despite the not-yet-stabilized normative and technical environment, it is obvious from our experience that XML, as a universal standard for structured data representation, and XML Schemas, as a method to express sets of constraints about XML data structures in machine-processable form, provide good solutions to existing problems in technical data interchange and represent significant advances in this area.
 

Glossary

 
DTM: Digital Terrain Model. A raster that contains elevation values as grey levels for each gridded ground location.
DEM: Digital Elevation Model, similar to DTM.
Spacemap: An image which has been geometrically corrected and which results from the mosaic of several input satellite scenes.
Sirius: Commercial name of the Spot Image worldwide on-line catalog. It references not only raw satellite scenes but also off-the-shelf products.
Mosaic: Specific process applied to a set of images in order to join them seamlessly into a user's area of interest. This process requires that all the input images have been geometrically corrected.
Geometric correction: Raw satellite images suffer from several geometric distortions resulting from the motion of the satellite, perspective distortion, terrain relief, earth surface curvature, etc. All these distortions can be modeled and corrected. This process involves at least two inputs, a DTM and GCPs, in order to get good location accuracy (the on-board angular position measurements are not accurate enough to reproject each pixel to its exact ground location). An image which has been geometrically corrected is often called an ortho-image.
GCP: Ground Control Point. A well-known point on the earth surface for which high precision ground coordinates are provided and which is easily identifiable on a satellite image (road crossings, bridges over rivers, piers, etc.). GCPs observed in a satellite image make it possible to build a geometric model of the raw satellite image.
Raster: A regular grid cell array containing measurements. In the case of earth imagery, these measurements correspond to soil color or light intensity. Satellite images can be acquired in several spectral bands, each of them responding to a specific range of colors. For Spot satellites: Band 1 is green, 2 is red, 3 is near-infrared, 4 is middle infrared. A raster can be seen as a painting as opposed to a drawing (see Vector).
Image: A raster with corresponding metadata describing the raster data (date of acquisition, image processing, geometric processing, etc.). We believe that a good part of the value of an image lies in its metadata.
Vector: A set of points well located on the ground, which, when linked to each other, can model some features on the ground (roads, rivers, parcels, forest boundaries, etc.). These graphical drawings can be associated with tabular data, such as id, length, surface, owner's name, etc. These tabular data are called vector attributes. GIS users can query vectors on both their geometry and their attributes.
Metadata: Data about data. An information set designed to deliver additional information about some other data (also called real data). This information set must be standardized and well understood by users to be useful. Another use of metadata is to store it in catalogs, allowing queries by potential users.
GIS: Geographic Information System. Software which allows display of multiple types of geographic information (images, vectors, DTMs, etc.) and which lets the user select and update part of this information by building geospatial queries.
 
