
 education 
system development
 

Acquirement of XML skills in industry

 van der Steen, Gert 
 
 Gert  van der Steen
 Senior Consultant
  Palstar bv 
 The Netherlands 
 Uffelte 
Palstar bv,  Winkelsteeg 5a
Uffelte   7975 PV The Netherlands
Phone: +31 521 351077 Fax: +31 521 351078 email: palstar@xs4all.nl web site: www.palstar.nl
 Biography
 Gert van der Steen is an independent consultant in Document Management and Language Technology, with a focus on the introduction of SGML and XML in industry and the automatic translation of controlled natural languages.
 Gert studied Mathematics and Computer Science and wrote a dissertation on the design of a program generator for syntactic pattern recognition and transduction. He worked as a researcher and lecturer at the Universities of Leiden (Medicine), Rotterdam (Economics) and Amsterdam (Arts). In 1988 he moved from academia to industry, where he has been involved in many SGML projects as a consultant, trainer, developer and project manager. For Cap Gemini he developed a system for the treatment of controlled natural languages.
 In 1996 Gert van der Steen founded his own consultancy, Palstar bv. For Palstar he is currently developing tools for SGML/XML and Natural Language Processing, such as syntactic analysis of documents for subsequent up-conversion, revision tracking (Palstar sells the program SGDIFF), and transformation and querying of SGML documents and groves. In addition, Gert van der Steen is a part-time Professor in Information Science at the University of Utrecht.
 Abstract
 Whether occasionally or full time, workers in industry are increasingly confronted with applications of XML in one or more of its many aspects. People are frequently unsure about the background knowledge they have to acquire and the resources that are available. To arrive at a more general solution, we observe that many tasks in automation may be described in terms of Information System Methodologies. For each layer in these methodologies, the corresponding XML aspects may be identified, together with the background knowledge required for their proper application.
 The next steps will be to identify the resources for training and the ways to set up training effectively. In that respect, typical working habits in industry have to be taken into account.
 

Introduction

 The widespread introduction of XML in industry, government and other organizations calls for an inventory of the skills that are required for the proper use and implementation of XML and its related standards (“XML+”). This paper investigates which skills are needed.
 Many kinds of people in organizations come into contact with XML. Among them are:
 
  • managers who have to assess the consequences of the introduction of XML
  • the actual end users who have to benefit from the use of XML
  • authors who have to write documents in XML
  • business consultants who have to know about the applicability and usability of standards and tools
  • members of standardization committees who are developing new languages
  • computer scientists who are developing data structures and efficient algorithms and who study the intrinsic properties of the standards
  • developers of tools and systems for XML+.

     All these people need different types of skills and background knowledge. They may be heavily involved in large projects, or involved only marginally. Sometimes they use different terminology.
     Where do people acquire new skills? For the new generation it may be in technical schools or universities, where XML is gaining popularity in regular courses. However, for most people in industry XML is new. They have to follow training courses or rely on occasional advice.
     The author has given a number of training courses in industry, from crash courses taking half an hour to regular courses taking a week or more, and is preparing courses at the university level. There is a variety of demands and a variety of personalities and backgrounds involved. Some topics are basic, but some demand more research at an academic level.
     This paper traces only the topics involved, and not the depth of training that is required for different target groups. It addresses the use of XML as a language for the exchange of messages as well as for the structuring of documents, as a partial replacement of SGML.
     In order to locate the required skills we follow two approaches.
     The first approach is a technical one. It takes the point of view of a developer who wants to construct a tool or a complete system, be it from scratch or by adding XML functionality to an existing one. This approach mainly uncovers skills offered by Computer Science.
     The second approach takes the view of an application that has to benefit from the use of XML+. Following the steps of a System Development Methodology we encounter the required XML skills, independent of the size of the system to be developed. This approach is more or less covered by Information Science.
     Besides the skills covered by Computer and Information Science, the introduction of XML+ may require other skills.
     This paper has three main divisions.
     
  • Part one outlines the components of a complete document processing system, to be referenced in the other parts.
  • Part two identifies the depth of XML integration in a document processing system, seen as a gradual integration of XML with tools, systems and infrastructure, in a bottom-up fashion.
  • Part three traces the sequential steps of system development, to be taken in time and from top to bottom.

    Components of a document processing system

     Every system that processes documents, small or large, has one or more of the following components. The (partial) markup of the documents calls for additional XML functionality.
     Input:
  • Author environment: for the creation and maintenance of XML tagged data
  • Conversion: for the translation of data and documents into XML
  • Parsing: for the validation of an XML document against a DTD.

     Data storage:
  • Repositories, be it (distributed) databases (relational, OO, relational-OO) or simple flat files
  • Data storage manager for XML data, connected to the repositories, for the access to document objects, hyperlinks, entities and text.

     Retrieval:
  • Browsing
  • Querying.

     Output:
  • Transformation: within and from XML data and documents
  • Composition: mapping XML structure to format (paper and electronic)
  • Electronic Delivery.

     Document management:
  • On the XML component level: workflow, authorization, version control and content management.

     Workbenches:
  • For development and maintenance of specifications: document and database schemas, stylesheets, transformations.

    Layers of XML system integration

     This section takes the point of view of a gradual integration of XML+ into a Document Management System. The following steps are covered:
     
  • The XML+ Standards themselves
  • The building of Engines based upon the standards
  • The integration of engines within XML+ Tools and Systems
  • The integration of tools and systems within the Infrastructure: Databases, Document Management, Operating Systems and Networks.

    XML+ standards

     The XML+ standards are based upon formal languages and data structures. Therefore, members of the standardization committees should have a command of Theoretical Computer Science.
     (The writing of specifications, like schemata and stylesheets, according to the standards is covered in the section on System Development Methodologies.)
     
    Design of XML+ Syntax Structures Skills
     
  • In general
  •  
  • Design considerations for formal languages and grammars
  •  
  • DTDs, schemas
  •  
  • Ambiguity
  •  
  • Transformations with XSLT
  •  
  • Other transformation languages
  •  
  • Rewriting systems; aspects of reversibility
  •  
  • Query languages on documents and trees
  •  
  • Database theory
  •  
  • Context-sensitivity within transformations, queries and stylesheets
  •  
  • Expression of context-sensitivity
  •  
    Design of XML+ Information Structures Skills
     
  • Document tree, DOM
  •  
  • Tree walking languages
  •  
  • Operations on trees
  •  
  • External links
  •  
  • Hypertext, hypermedia
  •  
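     The notions of a document tree and of walking that tree can be illustrated with a minimal sketch. It assumes Python and its standard `xml.dom.minidom` module; the sample document and the helper name `walk` are invented for illustration and do not come from any particular tool.

```python
from xml.dom import minidom

# Hypothetical sample document, invented for this sketch.
SAMPLE = "<manual><title>Pump</title><step>Open valve</step><step>Start motor</step></manual>"


def walk(node, depth=0):
    """Yield (depth, tag name) for every element, in document order."""
    if node.nodeType == node.ELEMENT_NODE:
        yield depth, node.tagName
    # Text nodes have an empty child list, so the recursion stops there.
    for child in node.childNodes:
        yield from walk(child, depth + 1)


document = minidom.parseString(SAMPLE)        # parse the text and build the DOM tree
outline = list(walk(document.documentElement))
for depth, name in outline:
    print("  " * depth + name)                # indented outline of the element tree
```

     The same depth-first traversal underlies many operations on trees, such as collecting, counting or rewriting nodes.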

    XML+ engines and APIs for processes and data structures

     The processes described in the standards may be implemented within dedicated Engines. Most of the required expertise stems from the theory of Formal Automata and Program Generation.
     
    XML Skills
     
  • XML parser + DOM constructor
  •  
  • Techniques for parser generation
  •  
    Conversions and transformations: 100% automatic and correct Skills
     
  • XSLT engine: add, delete, change components in trees
  •  
  • Other tree transformation techniques
  •  
  • Techniques for transducer generators
  •  
  • Programming languages: imperative, functional, logical, ..., event-based, pattern-based, rule-based
  •  
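     The three basic tree operations named above (add, delete, change) can be sketched without an XSLT engine. The example below is not XSLT: it assumes Python's standard `xml.etree.ElementTree` module and performs the operations directly on the tree. The element names and the function `transform` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, invented for this sketch.
SOURCE = "<memo><to>Sales</to><draft>internal note</draft><body>Prices change in May.</body></memo>"


def transform(xml_text):
    """Apply one delete, one change and one add operation to the tree."""
    root = ET.fromstring(xml_text)
    # Delete: drop every <draft> child of the root element.
    for draft in root.findall("draft"):
        root.remove(draft)
    # Change: rename every <to> element to <recipient>.
    for element in list(root.iter("to")):
        element.tag = "recipient"
    # Add: append a <status> element to the root.
    status = ET.SubElement(root, "status")
    status.text = "final"
    return ET.tostring(root, encoding="unicode")


result = transform(SOURCE)
print(result)
```

     An XSLT stylesheet would express the same changes declaratively as template rules, whereas this imperative version states them as explicit steps on the tree.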
    XML+ Skills
     
  • In general
  • Knowledge of:
     
  • Responsiveness: off-line, on-line, real-time
  •  
  • Execution: compiled, interpreted
  •  
  • Time and space complexity: exponential-NP, polynomial, (sub)linear
  •  
  • Decidability and correctness
  •  
  • Handling of ambiguity
  •  
  • Handling of ill-formed input
  •  
  • Software engineering
  •  
  • OO-aspects: (multiple) inheritance, sending messages to objects
  •  
  • Sequence (in)dependency of rules
  •  
  • (Visual) developer workbenches
  •  
  • Browser engines on trees
  •  
  • Data storage manager
  •  
  • Query engine on documents and document trees (also DOMs)
  •  
  • Database techniques
  •  
  • Techniques for pattern recognition: string and tree matching and comparison
  •  
  • Revision tracking and storage
  •  
  • Theory of sequence comparison for strings and trees
  •  
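     Sequence comparison as a basis for revision tracking can be sketched as follows. The example assumes Python's standard `difflib` module and compares two versions of a fragment token by token. It works on strings only and does not compare trees, and it is not meant to reflect how any particular revision tracking product works. The sample texts and the function `revisions` are invented.

```python
import difflib

# Two hypothetical versions of the same fragment.
old = "<p>The pump must be switched off before service.</p>"
new = "<p>The pump must be switched off before any service.</p>"


def revisions(old_text, new_text):
    """Return the non-equal edit operations between two token sequences."""
    old_tokens, new_tokens = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    return [
        (tag, old_tokens[i1:i2], new_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


changes = revisions(old, new)
print(changes)
```

     Each reported operation is an insertion, deletion or replacement of a run of tokens. Comparing trees instead of flat sequences additionally requires a matching between nodes.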

    Integration of engines within XML+ tools and document processing systems

     Tools for Document Processing may obtain XML functionality by integration of XML+ Engines.
     This may concern existing tools, which have to be extended with XML+ functionality, or new tools, built from scratch. It may concern shrink-wrapped tools or home-made ones that have grown out of dedicated systems.
     The required skills stem, on the one hand, from engineering disciplines and, on the other hand, from the field of application for which a tool is constructed.
     
    XML+ Skills
     
  • In general
  •  
  • Access to requirements analyses for the desired functionality
  •  
  • Working with APIs on internal data structures
  •  
    XML Editors Skills
     
  • In general
  •  
  • Psychology-Ergonomics
  •  
  • Techniques for technical authoring
  •  
  • Language Technology for Controlled Languages
  •  
  • If built into existing word processors (like MS Word)
  •  
  • How to operate on text buffers without hierarchical structure
  •  
  • If built from scratch
  •  
  • Rendering techniques
  •  
    XSL, XSLT Editors Skills
     
  • Creation of specifications by example
  •  
  • CASE systems, knowledge-based systems
  •  
    Data storage and retrieval of XML objects: database techniques Skills
     
  • Relational, OO, relational-OO, full-text
  •  
  • Meta-data
  •  
  • Storage of document trees in databases
  •  
  • OO-paradigm
  •  
  • Tradeoffs between relational, OO, relational-OO for XML components
  •  
  • Database technology
  •  
  • Information Retrieval models, e.g., Boolean, Vector, Probabilistic, Fuzzy Set, Bayesian
  •  
  • Pattern matching
  •  
  • Indexing and search
  •  
  • Web query languages
  •  
  • Connection to Data Storage Manager for XML+
  •  
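     A query on the logical structure of stored documents can be sketched with a path expression. The example assumes Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The catalog content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository content, invented for this sketch.
CATALOG = """
<catalog>
  <manual lang="en"><title>Pump guide</title></manual>
  <manual lang="nl"><title>Pomphandleiding</title></manual>
  <manual lang="en"><title>Valve guide</title></manual>
</catalog>
"""

root = ET.fromstring(CATALOG)
# Select the titles of all manuals whose lang attribute equals "en".
english_titles = [title.text for title in root.findall("./manual[@lang='en']/title")]
print(english_titles)
```

     The query relies on markup (element names and an attribute value) rather than on full-text matching, which is the kind of trade-off between structured and full-text retrieval listed above.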
    Document management for XML components: Skills
     
  • Workflow, authorization, version control, content management and collaborative authoring
  •  
  • Workflow principles
  •  
  • Connection to Data Storage Manager for XML+
  •  
    Browsers Skills
     
  • Rendering XML
  •  
  • Hyperlinks
  •  
  • Java and Internet technologies
  •  
  • Multitier solutions, XSL-HTML
  •  
  • XSL(T) engines
  •  
    Composers Skills
     
  • Mapping XML structure to format (paper and electronic)
  •  
  • Graphic design
  •  
  • Design principles of the graphic arts industry
  •  
  • XSL
  •  
    Electronic Delivery Skills
    In XML, HTML, PDF, LaTeX, etc.
     
  • Transformations
  •  
  • Server Technology
  •  

    Integration within infrastructure: databases, doc. management, network, OS

     The integration of tools and subsystems (with or without XML+ functionality) in one overall system can be simplified by the exchange of messages which are marked up with XML. Also (for instance in the B2B paradigm), systems of different organizations can become more or less integrated.
     
    XML+ Skills
     
  • XML as glue for exchange/interface for composite systems
  •  
  • DTD design
  •  
  • Parsers
  •  
  • SAX
  •  
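     The use of XML as glue between subsystems can be sketched with an event-based (SAX) reader that extracts the fields of an incoming message without building a tree. The example assumes Python's standard `xml.sax` module. The message format (`order`, `item`, `sku`, `qty`) and the class `OrderHandler` are hypothetical and not taken from any standard.

```python
import xml.sax

# Hypothetical message exchanged between two subsystems.
MESSAGE = b"<order id='42'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"


class OrderHandler(xml.sax.ContentHandler):
    """Collect the order id and the (sku, quantity) pairs from SAX events."""

    def __init__(self):
        super().__init__()
        self.order_id = None
        self.items = []

    def startElement(self, name, attrs):
        # Called once for every start tag encountered by the parser.
        if name == "order":
            self.order_id = attrs.get("id")
        elif name == "item":
            self.items.append((attrs["sku"], int(attrs["qty"])))


handler = OrderHandler()
xml.sax.parseString(MESSAGE, handler)
print(handler.order_id, handler.items)
```

     Because the handler only reacts to events, the receiving system needs no knowledge of the sender's internal data structures. The agreed message structure, for example a DTD, is the only shared interface.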

    System development methodology; XML aspects

     The previous sections took a bottom-up approach to the creation of a Document Processing System. Now we take the opposite view: constructing a system top-down, starting with a specific application in mind.
     There exist several System Development Methodologies (SDM), some competing and some specializing in different areas of application. We may abstract from the differences between these methodologies because our goal is to recover the specific skills needed when XML+ is introduced. Therefore, we will follow the main steps of an SDM and will use general terminology.
     It may be that XML+ functionality has to be added in the course of the evolution of an installed Information System. In that case methodologies for reverse engineering may apply, which may call for a mix of top-down and bottom-up strategies.
     
    XML+ Skills
     
  • In general
  •  
  • System development methodologies, also for OO systems
  •  

    Definition study

     This is the phase of the definition of goals and the formulation of constraints.
     
    XML+ Skills
     
  • Business objectives
  •  
  • Strategic planning
  •  
  • Bottlenecks
  •  
  • Design lines
  •  
  • Costs / benefits
  •  
  • Critical success factors
  •  
  • Awareness at management level of the benefits of XML
  •  
  • Business Economics
  •  
  • Costs / benefits studies
  •  
  • Case studies
  •  
  • Design patterns
  •  
  • Creation of working group with authors, database specialists, stylists, administrators
  •  
  • Initial training
  •  

    Information (data) analysis

     In this phase the flow of information is analyzed and defined. It is an important phase for the definition and planning of XML+ activities.
     
    Data Skills
    input
     
  • Who, where, quantities
  •  
  • Types of documents, known logical structure, coherence between documents, versions
  •  
  • Input media; from which platform(s), ways of delivery
  •  
  • Existing layout, styles, figures, formulas, tables
  •  
  • Constraints
  •  
  • Quality of documents, known inconsistencies and errors
  •  
  • Data analysis
  • storage
     
  • Document components
  •  
  • Entities to be shared by processes
  •  
  • Extended links
  •  
  • Specifications (DTDs, stylesheets, etc.)
  •  
  • Indexes
  •  
  • Trade-offs for the level of granularity
  • output
     
  • Types of products to be delivered
  •  
  • Requirements for electronic exchange
  •  
  • Required types of queries; reliance on logical structure; requirements for precision and recall
  •  
  • Required logical structures
  •  
  • Additional information to be produced
  •  
  • Live links, shared data with other systems
  •  
  • Sharing of information between documents and remote document repositories
  •  
  • Fault tolerance
  •  
  • Concepts of information retrieval
  •  
  • Concepts of document structuring and transformation
  •  
  • Principles of XML, XSL, XLL, XMI
  •  
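     The requirements for precision and recall listed among the output items can be made concrete with the usual set-based definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The sketch below assumes Python. The document identifiers and the function name are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                      # relevant documents found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four documents returned, three are actually relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

     Stating target values for both measures during information analysis helps to decide how much the queries should rely on logical structure.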

    Global design

     In this phase the techniques are defined by analyzing the requirements, making selections and applying restrictions.
     
    Document management Skills
     
  • On the XML component level: requirements for version control, approvals, production flow, working procedures, archiving
  •  
  • Trade-offs: storage of process data within XML markup, in metadata or elsewhere
  •  
    XML+ Skills
     
  • Choice of XML+ standards
  •  
  • Which standards, and when they are required
  •  
  • Choices, (dis)advantages, availability of tools
  •  
    Document analysis Skills
     
  • Determine logical structure of documents and granularity for retrieval
  •  
  • Inference of specifications
  •  
  • Diagram techniques
  •  
  • Design of data models
  •  
    Required conversions Skills
    Converters from non-XML, once only (legacy) or regular
     
  • Converters from XML
  •  
  • Machine aided human conversion, degree
  •  
  • Human aided machine conversion, e.g., reformatting existing documents with stylesheets
  •  
  • Volumes
  •  
  • (Needed:) knowledge about the effectiveness of conversion strategies in different situations
  •  
    XML functionality of system (behavior) Skills
     
  • Requirements for user interfaces
  •  
  • Input- and output structure
  •  
  • Interface design
  •  
  • Ergonomics
  •  
    Selection of tools Skills
     
  • In general
  •  
  • Needed: studies on the effectiveness of different XML functionality and strategies within tools
  •  
  • Criteria for selection
  •  
  • Quantification of usability, as in ISO 9126
  •  
  • Evaluation of tools on the market
  •  
  • Types of tools
     
  • common characteristics
  •  
  • requirements
  •  
  • usability analyses

  •  
  • Evaluation strategies
  •  
  • In-house building of tools / combining tools
  •  
  • (See Integration)
  •  
  • Building visual workbenches for the development of specifications
  •  
  • (See Integration)
  •  

    Detail design

     In this phase the techniques to be used are further detailed and specifications are written, prior to realization.
     
    XML+ Skills
     
  • In general
  •  
  • Knowledge of standards for XML+
  •  
  • Know how and when to apply them
  •  
  • Aspects of usability
     
  • understandable
  •  
  • correct
  •  
  • customizable
  •  
  • effective/efficient

  •  
  • Applicability of alternatives
  •  
  • Analyses of user experiences
  •  
  • Design of Control Information: DTDs for input, storage, delivery
  •  
  • Ontologies for names
  •  
  • Initiatives of OASIS, IDEAlliance, BizTalk: Registry, Repository, Conformance
  •  
  • Meta-data
  •  
  • Creation of authoring instructions
  •  
  • Explaining the DTDs; keyboard shortcuts
  •  
  • Readability instructions linked with structure, as in Information Mapping
  •  
  • Controlled Language guidelines
  •  
  • Stylesheets for word processors with subsequent conversion to XML
  •  
  • Design of XSL stylesheets
  •  
  • Knowledge of standard; experience
  •  
  • Design of XSLT transformations
  •  
  • Knowledge of standard; experience
  •  
  • Design of standard queries
  •  
  • Trade-off between different query languages for XML
  •  
  • Design/adaptation of database structures
  •  
  • Design of user interfaces
  •  
  • XML messages for dynamic adaptation of user interfaces
  •  
  • Design of XML specifications for the exchange of messages between processes
  •  
  • Schema (DTD) design
  •  
    Design of conversions Skills
     
  • Determine methods and phases of conversion
  •  
  • The symbols in the document on which the conversion will be based (also dependent upon consistency and errors)
  •  
  • Degree of pre-, inter- and post-editing
  •  
  • Defining the operator environment, the functionality, the user interface
  •  
  • Quantity and qualification of operators
  •  
  • Selection of tools for conversion
  •  
  • Parsing on correctness of input document
  •  
  • Trade-offs between the pattern matching / replacement approach and the pattern grammar approach
  •  
  • Programming systems for pattern matching and replacement (like Omnimark and Perl)
  •  
  • Systems for pattern grammars and transduction
  •  
  • Tree transformations
  •  
  • Creation of working procedures
  •  
  • Experience in writing instructions for XML lay people
  •  
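     The pattern matching and replacement approach to conversion can be sketched as a small up-converter from line-oriented legacy text to XML. The example assumes Python with its standard `re` module. The legacy conventions (numbered steps, a `Note:` prefix) and the target element names are hypothetical. A real conversion would also have to deal with the inconsistencies and errors mentioned above.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical legacy text, invented for this sketch.
LEGACY = """1. Open the valve
2. Start the motor & wait
Note: wear gloves"""

STEP = re.compile(r"^(\d+)\.\s+(.*)$")    # e.g. "1. Open the valve"
NOTE = re.compile(r"^Note:\s*(.*)$")      # e.g. "Note: wear gloves"


def up_convert(text):
    """Map each recognized line pattern to an XML element."""
    out = ["<procedure>"]
    for line in text.splitlines():
        step = STEP.match(line)
        note = NOTE.match(line)
        if step:
            out.append(f'<step n="{step.group(1)}">{escape(step.group(2))}</step>')
        elif note:
            out.append(f"<note>{escape(note.group(1))}</note>")
        elif line.strip():
            # Unrecognized lines are kept as plain paragraphs for post-editing.
            out.append(f"<para>{escape(line.strip())}</para>")
    out.append("</procedure>")
    return "\n".join(out)


converted = up_convert(LEGACY)
print(converted)
```

     Each rule looks at one line in isolation. A pattern grammar approach would instead describe the whole document structure, which makes nesting and context-dependent symbols easier to handle at the cost of a more elaborate specification.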

    Realization

     In this phase, the detailed design materializes into a concrete working system that can be tested.
     
    XML+ Skills
     
  • Building of tools and systems, also for conversions
  •  
  • (See Integration)
  •  
  • Writing of documentation and user manuals
  •  
  • Training in Operations
  •  
  • Stylesheets for input
  •  
  • Training of all personnel
  •  
  • Psychology: motivation of authors in using XML
  •  
  • Introduction of XML in technical writing
  •  

    Implementation in organization, working procedures

     The last phase we mention here concerns the introduction of the system within the organization.
     
    XML+ Skills
     
  • Arrangement of user organization
  •  
  • Arrangement of central organization for the control of all XML+ specifications
  •  
  • Training in Operations
  •  

    Conclusion

     It is possible to define the XML skills that are required in industry by following the steps of bottom-up and top-down approaches to System Development.
     The acquisition of these skills has to be dealt with elsewhere. Different kinds of people and different working habits call for different approaches to setting up training courses.
     The checklists in this paper are of a rather general nature. They can be extended and refined further. The author welcomes additions and other comments.
