| William C. Burkett |
| Senior Information Engineer |
| Product Data Integration Technologies, Inc. |
| 100 W Broadway, Suite 540, Long Beach, California 90802, USA |
| Email: wburkett@pdit.com |
| Biography |
Introduction |
| "XML will add meaning to the web." |
| "XML will enable application interoperability over the web." |
| "Vocabulary" this "Ontology" that "Metadata metadata metadata " |
| What does it all mean? Indeed, does it - or can it! - mean anything at all? |
Notes on the Terminology |
Problem Statement |
| · The fact that hundreds - if not thousands - of organizations are doing it. |
| · Standardized such that it can be effectively, uniformly, and consistently applied to the degree that it can be called out in contracts. |
| Unfortunately, these objectives fail to account for a very human and all-too-real phenomenon: the meaning of terms tends to drift with repeated use. |
| 1) be slightly different than that specified by the vocabulary standard in a given use - there are many small semantic variations that are unique to the use; |
| 2) to drift and vary across multiple uses as business processes change. |
Integrating Vocabularies |
System Integration and Application Interoperability |
| (Note: These approaches are presented roughly in order of historical appearance.) |
| · Point-to-point translators that convert data bound to one application system into the data format of a target application system; |
| · A shared database that is used by multiple applications; |
| · Product Data Exchange (PDE) standards that specify a neutral, application-independent data structure used to convey data between applications (and translators written to/from the PDE standard); |
| · Database federations in which each repository makes (some part of) its data visible and accessible via an API to other databases/applications in the federation; |
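| The practical difference between point-to-point translators and a neutral exchange structure can be illustrated with a simple count. The sketch below assumes that every application must exchange data with every other application in both directions; the function names are illustrative only. |

```python
def point_to_point_translators(n_apps: int) -> int:
    """One translator for each ordered pair of distinct applications."""
    return n_apps * (n_apps - 1)


def neutral_format_translators(n_apps: int) -> int:
    """One exporter and one importer per application, to/from the neutral format."""
    return 2 * n_apps


# Compare how the two approaches scale as applications are added.
for n in (2, 3, 5, 10):
    print(n, point_to_point_translators(n), neutral_format_translators(n))
```

| Under these assumptions the neutral format needs fewer translators once more than three applications are involved, and adding an application never requires touching the translators of the others. |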
Integrating Information Resources |
| There are added benefits of the integration-through-abstraction paradigm that directly address several of the requirements above: |
| · The abstract integration vocabulary is more stable and less likely to change over time because of the built-in semantic fuzziness - thus, it is standardizable; |
| · The domain-specific vocabularies are decoupled from each other and thus free to evolve as requirements change with a minimum "ripple effect" on other vocabularies; |
| The structure of PDML and the nature of the vocabularies of which it is composed are described below. |
Ramifications on the development and use of XML Vocabularies |
| The most pertinent lessons are: |
| · An XML Vocabulary is a model of data. |
| · An XML Vocabulary should be small in scope and specific in application if the meaning of the terms is to be clear and unambiguous to the user of the vocabulary. |
| · The context, experience, and requirements of a particular use of a particular vocabulary cannot be accurately predicted, nor can the evolution of requirements be predicted. |
Product Data Markup Language - PDML |
Structural Overview of PDML |
| The structural architecture of PDML is analogous to the "star-satellite" structure of the client-server model. PDML is composed of the following components: |
| · A collection of Application Transaction Sets; |
| · The Integration Schema; |
| · Mapping specifications between the Application Transaction Sets and the Integration Schema. |
| The relationship between these components is illustrated in Figure 1. As PDML grows, additional transaction sets will be added to the specification. |
|
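| The star-shaped arrangement of these components can be sketched in a few lines. This is only an illustration under stated assumptions: the `Hub` class, the transaction set names ("drawings", "parts_list"), the `doc_ref` field, and the `document.id` key are all hypothetical and are not taken from the PDML specification. |

```python
from typing import Callable, Dict

Record = Dict[str, str]
Converter = Callable[[Record], Record]


class Hub:
    """Star-shaped exchange: each transaction set maps only to and from the hub."""

    def __init__(self) -> None:
        self._to_hub: Dict[str, Converter] = {}
        self._from_hub: Dict[str, Converter] = {}

    def register(self, name: str, to_hub: Converter, from_hub: Converter) -> None:
        """Add a transaction set together with its two mappings."""
        self._to_hub[name] = to_hub
        self._from_hub[name] = from_hub

    def convert(self, record: Record, source: str, target: str) -> Record:
        """Convert via the hub form; no direct source-to-target mapping exists."""
        return self._from_hub[target](self._to_hub[source](record))


hub = Hub()
# Hypothetical transaction sets; field and key names are invented for illustration.
hub.register(
    "drawings",
    to_hub=lambda r: {"document.id": r["drawing_number"]},
    from_hub=lambda h: {"drawing_number": h["document.id"]},
)
hub.register(
    "parts_list",
    to_hub=lambda r: {"document.id": r["doc_ref"]},
    from_hub=lambda h: {"doc_ref": h["document.id"]},
)

converted = hub.convert({"drawing_number": "1234567"}, "drawings", "parts_list")
print(converted)
```

| Adding a transaction set in this sketch means registering one more pair of mappings; the existing sets are left untouched. |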
Meeting the requirements for XML Vocabularies |
| The development of PDML sought to meet the functional requirements defined above. To reiterate, the requirements are: |
| 1) Complete and unambiguous data semantics to facilitate the import and export of data; |
| 2) Integrated data semantics; |
| 3) Standardizable data semantics; |
| 4) Adaptable data semantics; |
| An additional PDML requirement was to leverage existing technologies. Toward this end, PDML is simply a new application of technologies and standards that already exist. |
Complete and Unambiguous Data Semantics |
| For example, the Joint Engineering Data Management Information and Control System (JEDMICS) is a very large - and very old - defense data system. It consists of data fields like: |
| · Drawing_number |
| · Drawing_title |
| · cage_code |
| · doc_type |
| · drawing_revision |
| · sheet |
| · sheet_revision |
| · frame |
| · number_of_frames |
| · control_code |
| · security |
| · foreign_secure |
| · nuclear |
| · wsc |
| · safety |
| · dist |
| · master_location |
| Some of these fields, like "drawing_number" or "sheet", might mean something to non-JEDMICS users. Who but a JEDMICS user, however, would know what a "control_code" was, or what "wsc" meant? |
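| The point can be made concrete with a small sketch. The element names below are taken from the field list above, but the wrapper element `drawing_record` and all of the values are invented for illustration; this is not an actual JEDMICS record format. |

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag names come from the field list, values are invented.
RECORD = """
<drawing_record>
  <drawing_number>1234567</drawing_number>
  <sheet>2</sheet>
  <control_code>A</control_code>
  <wsc>07</wsc>
</drawing_record>
"""


def field_values(xml_text: str) -> dict:
    """Return a tag -> text mapping for the direct children of the root."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: child.text for child in root}


fields = field_values(RECORD)
for name, value in fields.items():
    print(f"{name} = {value}")
```

| Any XML parser can read such a record without difficulty, but nothing in the markup tells the receiving application what "control_code" or "wsc" stands for. Well-formed tags alone do not supply the data semantics. |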
| Complete and Unambiguous in Large Meaning Communities |
Integrated Data Semantics |
| b) The size of the merged vocabulary will quickly become too large for anyone to understand and manage. |
| Integration Schema |
| Mapping Specifications |
| The PDML Toolkit "internalizes" and uses the Mapping Specification to drive the conversion of XML data to/from the Integration Schema format. |
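| A mapping-driven conversion of this kind might look roughly like the following sketch. It covers only one direction (application data to an integration form), and everything in it is an assumption made for illustration: the `MAPPING` table, the entity and attribute names, and the `to_integration` function are hypothetical and do not reproduce the PDML Mapping Specification format or the PDML Toolkit interface. |

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: application tag -> (integration entity, attribute).
MAPPING = {
    "drawing_number": ("document", "id"),
    "drawing_title": ("document", "name"),
    "drawing_revision": ("document_version", "id"),
}


def to_integration(app_xml: str, mapping: dict) -> ET.Element:
    """Rebuild application data under the entity/attribute names of the mapping."""
    source = ET.fromstring(app_xml)
    target = ET.Element("integration_data")
    entities = {}
    for field in source:
        if field.tag not in mapping:
            continue  # fields without a mapping entry are skipped in this sketch
        entity_name, attribute = mapping[field.tag]
        entity = entities.get(entity_name)
        if entity is None:
            entity = ET.SubElement(target, entity_name)
            entities[entity_name] = entity
        ET.SubElement(entity, attribute).text = field.text
    return target


APP_XML = (
    "<drawing_record>"
    "<drawing_number>1234567</drawing_number>"
    "<drawing_title>Bracket</drawing_title>"
    "<drawing_revision>B</drawing_revision>"
    "</drawing_record>"
)

result = to_integration(APP_XML, MAPPING)
print(ET.tostring(result, encoding="unicode"))
```

| Because the conversion logic reads the mapping as data, a change to a transaction set would, in this sketch, require editing only the mapping table and not the conversion code. |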
Standardizable and Adaptable Data Semantics |
| Formal specification of semantic content |
Leverage existing technologies |
| · The "integration through abstraction" and interpretation architecture of STEP; |
| · The STEP Integrated Resource data structures in the construction of the Integration Schema. |
| PDML actually developed no new technology. Rather, it simply combined bits of existing technology to deploy integrated, web-based data resources. |
XML Vocabulary Design Issues |
| Local autonomy versus global applicability |
| Impediment of SGML "mindset" |
| Complexity versus acceptance |
| That this issue is important to the widespread adoption and use of XML vocabularies is evidenced by the facts that: |
| 1) SGML was standardized in 1986 and gained primary acceptance within government applications, but never caught on publicly. (SGML is a complex standard.) |
| 2) HTML caught on with the general public very quickly. (HTML is - relatively - simple.) |
| 3) XML caught on with the general public very quickly. (XML is simple with the background provided by HTML experience.) |
| 4) All those "xxx for Dummies" books are so popular. |
| Users of internet technology want simple solutions - they want all that complex stuff hidden away under the hood of the application. They want appliances. |
| This is not a technical issue, but a business issue. Who is the prospective target audience of an XML vocabulary? |
| Keys, identifiers, and cross-platform uniqueness |
Summary and Conclusions |