
 

Quality management considerations for implementing SGML

Paul Tyson
 Section Supervisor, Technical Publications Processes
Cessna Aircraft Co., 1 Cessna Blvd.
Wichita, Kansas 67215 USA
Phone: 316-517-7749 Fax: 316-517-7089 email: ptyson@cessna.textron.com
 Biography
 Paul Tyson has worked in general aviation technical publications since 1981. He developed Cessna's CD-ROM Technical Library, one of the first CD-ROM technical publications products in general aviation. He was a founding member of the Electronic Publications Subcommittee of the Technical Policy Committee of GAMA (General Aviation Manufacturers Association). Currently he oversees the continuing transition of Cessna Technical Publications to full SGML authoring and publishing.
Dan McPartland
 Supervisor, Engineering Technical Publications
Cessna Aircraft Co., 1 Cessna Blvd.
Wichita, Kansas 67215 USA
Phone: 316-517-6151 Fax: 316-517-7089 email: dmcpartland@cessna.textron.com
 Biography
 Dan McPartland has worked in general aviation safety investigation, CADAM/CATIA graphics, and flight-related technical publications since 1989. During the past 5 years, he has supervised the cockpit typography and flight document writing group within Technical Publications. On January 1, 2000, Dan assumed responsibility for the Engineering Technical Publications Department.
 Abstract
 Quality management principles of customer focus, quality measurement, and continuous improvement can be used to bridge the gap between the theoretical promise of SGML as a tool for information owners and the practical compromises required to accommodate application limitations.
 

Introduction

 
 SGML was originally developed to meet the requirements of publishers--that is, people who own information and need to process it in multiple formats and disseminate it in a variety of media. SGML was NOT developed to meet the requirements of the developers of the processing systems... nor of the owners of the distribution media....
 Charles F. Goldfarb, "Entity Management in SGML", 1993, http://www.oasis-open.org/cover/goldenti.html
 In this paper we want to describe what happens when quality management principles meet markup technology in the area of technical publications. We will introduce some of the important principles of quality management and discuss their application to many aspects of technical publications.
 Our management goal is to improve the quality of technical publications at all levels and phases--production, storage, access, and delivery. Ultimately, the value of any information management system is determined by the degree to which it enhances or impedes communication among humans. We believe quality management principles, applied to standardized markup technologies, will help us meet our goal.
 

Principles of quality management

 In this section we will introduce some of the principles of quality management and discuss how we have applied them in the area of technical publications.
 

Communication is the product

 At the outset we must be clear about what is the real product (the ultimate goal) of technical publications. We don't believe that it's just a stack of manuals, or a CD-ROM, or a website. Those are means to an end. The goal is communication.
 In humans, the task of communicating is dynamic. The elements of sight and sound, independently or collectively, are what modify behavior as a result of communication. The same stimulus delivered to, say, five people could result in five different behaviors. In technical publications, Simplified English is one quality card in the deck. Can SGML implementation ease the burden of technical writing in Simplified English? What changes in authoring and behavior must be in place to ensure complete, efficient implementation? Should Simplified English be viewed as a natural derivative of an SGML and quality culture, or just as an additional feature? Using quality management considerations, the determinant would be our goal of worldwide understanding among our customers, not just from person to person but from country to country, across languages.
 Standards of communication in our industry include subjects such as Simplified English, standard abbreviations, standard dictionaries, units of measure, and military specifications. These are all important but often fail to address the real standard, that of consistent behavior. The human at the end of the information assembly line needs a change to take place in the mind. Technical publications deliverables of the future must consistently and effectively manipulate the matter between our ears.
 

Quality product

 A quality product or service, in any field of commerce, is characterized by an abundance of useful features and an absence of defects. In technical publications, what features and defects are important? Your lists may be different from ours, but the important thing is to identify the quality characteristics that determine the success or failure of your endeavor.
 Features
 
  • well-organized
  • complete
  • easy to find
  • engaging presentation
  • up-to-date
  • etc.

 Defects
 
  • incorrect data
  • unintelligible content
  • incorrect references
  • out-of-date material
  • etc.

    Quality culture

     Humans are at the beginning and end of all communication processes. It stands to reason that better quality products would come out of organizations that have better "quality cultures". A quality culture is characterized by specific attitudes, abilities, and behavior of each member of the organization.
     Quality culture attitudes and abilities can be summarized (and measured) as the degree of self-control employees have in their work. Specifically, there are three requirements for employee self-control:
     
    1. Know what the quality goals are.
    2. Be able to recognize quality shortcomings.
    3. Have the means and authority to correct defects.
    We believe that SGML can enhance the quality culture of an organization by enabling and empowering employees to monitor and correct many more dimensions of data quality.
     J. M. Juran is one of the pioneers of quality management. In Quality planning and analysis, he says that to be superior in quality we must pursue two courses of action: 1) develop technologies to create products and processes that meet customers' needs; and 2) stimulate a culture of quality throughout the organization--one that continually views quality as the primary goal. Juran goes further to say "technology touches the head, but a quality culture touches the heart". Our goal is to develop pride and ownership of mind-altering communication. SGML promises to unlock what we call "the shackles of intense development time" and allow our imaginations the time to create and innovate. We feel that once dedication to the process becomes habit, confidence in our abilities will follow, and with that, policies designed to constantly listen to and understand customers will take us to the top of the marketplace of the future.
     Another aspect of quality culture is personnel requirements, which we feel are critical. The technical publications employee profile itself is changing. The capacity for abstract thought and reasoning, coupled with technical imagination, is a significant human quality that we need in order to serve all our data recipients, both internal and external.
     

    Quality process

     Quality management encompasses three distinct processes: quality planning, quality control, and quality improvement. Any of several different methods can be used to actually implement "quality management". Each of them favors a certain sequence of steps, uses slightly different terminology, and employs different tools. What they have in common, though, is more important than the differences. The common elements include: focusing on customers and their needs; identifying and designing products with features that meet those needs; and always measuring performance to enable continuous improvement of products and processes.
     

    Needs

     As we have mentioned, our customers have specific needs related to their goal of getting information for the purpose of maintaining or operating their airplanes. The technical publications organization itself also has needs. What are the needs of a quality process? They may be a set of tools such as authoring guidelines which are ISO 9000 compliant, application skills acquired through training, or a document instance. These needs originate within our department. Customer needs, on the other hand, are best defined through marketing surveys, technical committee meetings with operators, and personal contact through our support groups. External and internal needs must be carefully analyzed and prioritized to allow effective allocation of limited resources toward the goal of complete customer satisfaction.
     

    Features

     Features, such as delivery buttons, hotspot links, printing, and search routines are a few of the vehicles we use to meet customer needs. To provide our customers with information, we develop these features to deliver information in a user-friendly, efficient form. In our organization, we are still formulating our “features” list. The capabilities of SGML within Tech Pubs have certainly not developed completely. We still don’t know or fully understand all of the possibilities. We do, however, have high hopes of surpassing all customer needs through feature development, conscientious planning, and data ownership.
     

    Measurement

     According to Juran, things that get measured are things that will be prioritized and accomplished. In our business we anticipate SGML could allow quality measurements to be automated. This could be represented, for example, by dividing the man-hours expended by the number of revised figures, paragraphs, or words to develop some type of efficiency scale. Revision start and stop times are another manpower-based measure. Hits on a web page, or a deeper analysis of keystrokes or mouse clicks, are future possibilities. Our hope is that SGML implementation will provide these management tools to serve as building blocks for information reuse, repository development, and technical manual customization. A rough count of customer comments presently represents one of our measurement tools. SGML will, we predict, enable a much more scientific measurement of perceived or real defects in the future.
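The efficiency scale mentioned above (man-hours divided by the number of revised units) can be sketched in a few lines of Python. This is only an illustration: the `Revision` record, its field names, and the sample figures are hypothetical and do not come from any actual production data.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    """One revision cycle. All field names are hypothetical."""
    man_hours: float
    figures: int
    paragraphs: int


def hours_per_unit(revisions, unit="paragraphs"):
    """Total man-hours divided by the total number of revised units."""
    total_hours = sum(r.man_hours for r in revisions)
    total_units = sum(getattr(r, unit) for r in revisions)
    if total_units == 0:
        return None  # nothing was revised, so the ratio is undefined
    return total_hours / total_units


# Invented sample data, for illustration only.
revisions = [
    Revision(man_hours=12.0, figures=3, paragraphs=40),
    Revision(man_hours=8.0, figures=1, paragraphs=10),
]
print(hours_per_unit(revisions, "paragraphs"))
print(hours_per_unit(revisions, "figures"))
```

Tracking the same ratio across successive revision cycles would give a trend line rather than a single number, which is probably more useful for continuous improvement than any one value.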
     

    Data quality, SGML, and technical publications

     Note:
     A word of clarification: In this paper when we say "data" we generally mean some collection of detectable differences in a physical medium (e.g., bits or bytes in a computer storage device), as well as the patterns or organizing schemes by which we make use of those detectable differences. We don't pretend to say what "information" is, except that it seems to happen behind the eyeballs and between the ears. We aren't overly precise in our usage, however, and may sometimes talk about "finding information", as if it's something available out in the physical world; and we may also say things like "the data tells us", as if the data has some inherent meaning apart from the conventions we assign to it. There are, of course, fields of application where the distinction would have to be much more finely drawn, but we ask the reader to grant that this is not one of them.
     

    Data is not the product

     We don't believe that data is the final product of technical publications. Data is the means to the end. The goal is communication, which only happens when someone's mind is changed or improved. The direct agent of communication in our business is a "presentation instance"--that is, a formatted, printed or displayed instance of a document.
     Twenty years ago our product was a stack of manuals, and our "data" was our product--that is, the data and the presentation instances were one and the same. As technology has evolved, we have taken advantage of opportunities to separate the data from the presentation instances. While we have become comfortable with this paradigm shift on the production side of technical publications, there is still a predominant tendency to speak of "delivering data" to the end users. Actually, what we deliver to users is a series of presentation instances derived from the data. In many cases, we also deliver (or require) a specific environment for viewing the presentation instances.
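The distinction between data and presentation instances can be illustrated with a minimal Python sketch in which one unformatted record is rendered two ways. The record layout, its content, and both rendering functions are hypothetical, chosen only to show that the user receives a rendering and never the record itself.

```python
from html import escape

# Hypothetical data: a maintenance step stored without any formatting.
step = {
    "title": "Check tire pressure",
    "warning": "Use a calibrated gauge.",
    "actions": ["Remove valve cap.", "Measure pressure."],
}


def render_text(data):
    """Presentation instance 1: plain text, e.g. for a printed page."""
    lines = [data["title"].upper(), "WARNING: " + data["warning"]]
    lines += [f"{n}. {a}" for n, a in enumerate(data["actions"], start=1)]
    return "\n".join(lines)


def render_html(data):
    """Presentation instance 2: HTML, e.g. for an on-screen viewer."""
    items = "".join("<li>" + escape(a) + "</li>" for a in data["actions"])
    return (
        "<h2>" + escape(data["title"]) + "</h2>"
        + '<p class="warning">' + escape(data["warning"]) + "</p>"
        + "<ol>" + items + "</ol>"
    )


print(render_text(step))
print(render_html(step))
```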
     Many factors determine the quality and effectiveness of the presentation instances that we deliver. (Refer to the presentation instance quality cause-and-effect diagram.) This paper is based on the assumption that there is a very close correlation between data quality and the quality of information delivery. Many people might think this is too obvious even to bother noting the correlation. But what, exactly, is the nature of this correlation? High quality data may be a necessary, but not a sufficient, cause of good information delivery. Even this connection is tenuous, because there are many ways that poor quality data can be dressed up and used, especially when expectations or requirements are low. Conversely, you can have exquisitely formed data that fails to meet users' information needs.
     
    Presentation instance quality cause-and-effect diagram
     We take the position that data quality is an important contributing factor to the success of technical publications delivery, and that SGML can be used to enhance the quality of technical publications data. We definitely are not saying that high quality data (in SGML or any other notation) will ensure successful delivery of technical publications. There are opportunities for further research in this area.
     In the following sections we will discuss some specific approaches to measuring data quality, and how SGML can be used to improve data quality in the area of technical publications, as well as improve the processes of producing data.
     

    Dimensions of data quality

     Thomas C. Redman, in Data quality for the information age, discusses several "dimensions" of data quality that apply to three distinct aspects of data processing. His analysis provides a starting point for applying quality measurements to SGML data. Although Redman focuses on record-based relational data structures, many of the quality dimensions are applicable to data structures that are used to represent documents. We'll review Redman's dimensions of data quality, and discuss how they can be used to measure and improve the quality of SGML data.
     Dimensions of data quality (from Data quality for the information age)
     Note:
     The dimensions apply to various aspects of data processing, grouped under "Conceptual view", "Values", and "Representation". As applied to SGML, these aspects correspond roughly to DTDs, document instances, and notation of storage objects. We have used Redman's original terms for the dimensions, but have supplied our own definitions to suggest how they might be applied to SGML data.
     
    1. Conceptual view (DTDs)
       
      1. Content
         
        relevance The degree to which a component is related to the primary needs defined for the scope of application of the document instances.
         
        obtainability The ease with which content values can be obtained for inclusion in a document instance.
         
        clarity of definition How well is the component defined?

      2. Scope
         
        comprehensiveness The proportion of defined needs met by this view.
         
        essentialness In the strictest sense, components are either essential or irrelevant to the application needs. In reality, there may be a scale of "essentialness", perhaps correlated to the seriousness of the consequences of omitting a particular component.

      3. Level of detail
         
        granularity The fineness of component definition. (E.g. how many elements does it take to describe a postal address?)
         
        precision of domains The precision of specification supported by the view.

      4. Composition
         
        naturalness The degree to which each component corresponds to a recognizable idea or thing within the scope of application.
         
        identifiability The ease of identifying components and distinguishing among components.
         
        homogeneity Similarity of like structures.
         
        minimum redundancy The degree to which the view requires or allows data redundancy.

      5. View consistency
         
        semantic consistency The degree to which similar relationships are expressed by similar markup constructs.
         
        structural consistency The degree to which similar components are similarly structured.

      6. Reaction to change
         
        robustness The number and type of application extensions which this view can accommodate.
         
        flexibility The ease with which the view itself can be changed.


    2. Values (data content, document instances)
       
      accuracy How closely the element content or attribute values correspond to the "correct" values.
       
      completeness Proportion of the "required" content actually included in the document instance.
       
      consistency Constraints on a collection of values (content) that limit the range of acceptable values.
       
      currency/cycle time Is the content currently applicable?

    3. Representation (notation, storage)
       
      1. Formats
         
        appropriateness How well suited is the format to the functional characteristics of the working environment?
         
        format precision Can the notation represent all data content without ambiguity?
         
        efficient storage Compactness, compressibility.
         
        interpretability The set of rules for interpreting data in a specified notation.
         
        format flexibility Adaptability to different user needs, applications, and recording media.
         
        portability How easily can the data be used in different computing environments?
         
        ability to represent null values The difference between "no content" and "missing content".

      2. Physical instances
         
        representation consistency Degree to which all instances are represented in similar notation.


     As mentioned previously, these dimensions are derived primarily from data structures of a relational database model. We have not attempted to extend this list by adding dimensions that apply specifically to document data structures. But there appear to be some promising opportunities in this direction. Document instances will have a greater range of quality dimensions in the "Values" area, since they are composed of semantically complex structures instead of simple typed data fields. Considerations such as simplified language, spelling, and composition would apply. Linking and addressing, which are such challenging areas of document management, should also have some applicable dimensions of data quality.
     A fundamental principle of quality management is that you can't control what you don't measure. Only by identifying specific dimensions of data quality can you begin the process of measuring data quality, comparing actual quality with intended quality, and analyzing the process of creating the data to improve the overall data quality. There is no single measurement or set of measurements that is appropriate for all applications. The important thing is to select some appropriate measurements along one or more identifiable dimensions of data quality.
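As one possible measurement along the "completeness" dimension, the following Python sketch computes the proportion of required elements that are present with non-empty content. It rests on several assumptions: the document instance is well-formed XML (used here as a stand-in for SGML, which the Python standard library does not parse), and the element names in `REQUIRED` and the sample instance are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names that a task is "required" to contain.
REQUIRED = ["title", "applicability", "procedure", "revdate"]


def completeness(xml_text, required=REQUIRED):
    """Proportion of required elements present with non-empty content."""
    root = ET.fromstring(xml_text.strip())
    present = 0
    for name in required:
        element = root.find(f".//{name}")
        # An element counts only if it, or a descendant, carries text.
        if element is not None and "".join(element.itertext()).strip():
            present += 1
    return present / len(required)


# Invented sample instance: one empty element, one missing element.
sample = """
<task>
  <title>Replace nav light</title>
  <applicability></applicability>
  <procedure><step>Remove lens.</step></procedure>
</task>
"""
print(completeness(sample))
```

Here two of the four required elements carry content, so the score is 0.5. What counts as "required" is itself a quality planning decision and would have to come from the defined application needs.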
     The remaining subsections will discuss some ways in which these dimensions of data quality can be applied in technical publications to the areas of DTD development, authoring, and processing and presentation of SGML documents.
     

    DTD development

     DTDs correspond to the "conceptual view" aspect of data quality, as discussed by Redman. The importance of good DTDs to a successful SGML implementation has long been recognized. But what is a "good" DTD? And what is the relationship between the "goodness" of a DTD, and the effectiveness of information delivery? As shown in the presentation instance quality cause-and-effect diagram, DTDs are not a direct factor affecting the final presentation instance. It is often possible to create bad document instances from good DTDs, and good instances from bad DTDs.
     Furthermore, as markup theory has evolved it has recognized that, ultimately, the structure of the document instance is more important than the schema(s) to which it conforms. This concept is embodied in HyTime's notion of "enabling architectures", and, less robustly, in XML's notion of "well-formed" documents.
     Nevertheless, as a practical matter, you want DTDs; and, if you aim to produce quality document instances, quality DTDs can help. Notice the chain of cause-and-effect assumptions here (refer to the cause-and-effect diagram): the end user's experience depends (in part) on a quality presentation instance; the quality of the presentation instance depends (in part) on the quality of the document instance; the quality of the document instance depends (in part) on the quality of the governing DTD. The level of effort you devote to DTD development should be determined by the degree to which your final product is affected by the DTDs. We do not know of any general rule for determining this--many other parts of the process can compensate for a poor DTD, or dilute the benefits of a good DTD. For example, enforcing specific writing procedures can assure the production of good document instances even if a DTD allows nonsensical structures. On the other hand, limitations of your processing applications can obviate the benefits of clever markup constructs.
     These caveats aside, we still want to be able to determine whether our DTDs are "good", and if not, how to make them better. If we are just starting out with SGML, we want to know whether an off-the-shelf DTD will work, or if we should buy one from a consultant, or make one ourselves. DTD development is a challenging effort, even in the best of circumstances. Fortunately, there is excellent practical advice available in books such as Developing SGML DTDs and Structuring XML documents. These books describe how to create a high-quality DTD, and we know, based on our own experience, that these processes can indeed help you produce good DTDs.
     But if we want to apply quality management principles in this area, we must identify the specific features that contribute to the quality of the product (i.e., the DTD), and we must be able to measure, directly or indirectly, the "performance" of those features. Consider the following statements that explain why a DTD is "good":
     
     This DTD is good because it was created using a proven DTD development process.
     
     This DTD is good because it allows the creation of quality document instances, and enables efficient document processing that meets all our application needs.
     
     This DTD is good because all markup constructs exhibit "semantic" and "structural consistency"; all elements satisfy our design goals of "naturalness" and "identifiability"; all components are "relevant" to our document content needs, have "obtainable" values, and are "clearly defined".
     (In the last statement, the quoted terms correspond to data quality dimensions of the conceptual view, as listed under "Dimensions of data quality".)
     The first statement implies that the right process will always produce a good DTD. The second implies that the quality of a DTD is determined by seeing it in operation. Only the third statement recognizes explicitly that the quality of a DTD depends on specific identifiable, measurable features of the DTD itself. In practice, if you have created a DTD by following Maler's guide, you will almost certainly be able to make a statement like the third about your DTD. But if you need to decide whether to adopt an industry-standard DTD, or must maintain a hand-me-down DTD, it helps to have a list of specific, measurable features and feature goals.
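A list of measurable features and feature goals could be kept as a simple scorecard. The Python sketch below is one hypothetical way to do it: the dimension names are taken from the conceptual view, but the 1-to-5 scale, the goal values, and the candidate's scores are invented for illustration, and how each score is actually obtained is left open.

```python
# Hypothetical goals on a 1-5 scale, one per conceptual-view dimension.
GOALS = {
    "semantic consistency": 4,
    "structural consistency": 4,
    "naturalness": 3,
    "identifiability": 3,
    "relevance": 4,
    "obtainability": 3,
    "clarity of definition": 4,
}


def shortfalls(scores, goals=GOALS):
    """Map each dimension that scores below its goal to the size of the gap.

    Dimensions that have not been scored are treated as scoring 0.
    """
    return {
        dim: goal - scores.get(dim, 0)
        for dim, goal in goals.items()
        if scores.get(dim, 0) < goal
    }


# Invented assessment of a candidate industry-standard DTD.
candidate = {
    "semantic consistency": 4,
    "structural consistency": 2,
    "naturalness": 3,
    "identifiability": 3,
    "relevance": 3,
    "obtainability": 4,
    "clarity of definition": 2,
}
print(shortfalls(candidate))
```

Comparing the shortfall lists for an off-the-shelf DTD and a custom DTD gives a concrete basis for the adopt-or-build decision, provided the scores themselves are assigned consistently.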
     In our department, we've tried two different approaches to developing DTDs. First we made some minor customizations to a couple of industry standard DTDs. These DTDs currently govern the bulk of our SGML document instances. Although we have been able to meet our production commitments while migrating our data and processes to these DTDs, we recognize that there are several deficiencies. We have been able to work around these deficiencies with creative data processing. Since we do not have any requirements to deliver data conforming to these industry-standard DTDs, we will probably evolve away from them, toward more specifically customized DTDs.
     We have also created one DTD from the ground up, following the procedure outlined in Developing SGML DTDs. This effort was more time-consuming than the minimal customizations we did on the industry-standard DTDs. But we were far more satisfied with the result, and believe we have the basis of a robust DTD to govern the creation of another large portion of our document instances.
     

    Authoring

     It is during authoring, perhaps more than any other phase of document production, that SGML enables dramatic quality improvements. We take it for granted that competent writers and editors can organize words, paragraphs, and pictures to make good-looking, understandable documents. When all is said and done, the final presentation instance, in paper or on screen, shouldn't look any different whether it was encoded in SGML or Elbonian. So, why is SGML better for creating quality documents than some other format?
     It should not be news to anyone that writers, when authoring in SGML, will concentrate primarily on the data content and the structural relationships amongst document components, instead of on "how it looks". Dimensions of data content ("Values") quality include accuracy, completeness, consistency, and currency (or timeliness). It stands to reason that, by focusing writers' attention on these values, the quality should improve.
     In our operation, we have also improved the quality culture by deploying SGML editing applications. (Recall that the three preconditions of a quality culture are employee awareness of quality goals, employee ability to detect quality defects, and employee ability to correct quality defects.) Our previous structured editor did not provide the means to detect many of the quality defects that only showed up during batch processing of the data for CD-ROM preparation. Even the most careful writers with the best intentions could not possibly find and correct errors on their own, using the tools at their disposal. (Many such errors involved cross-reference pointers to other documents.) As a result, we had to rely on a feedback loop of returning error lists to the writers, waiting for the writers to correct the documents, then reprocessing the data, returning a list of remaining errors, etc. Not only was this time-consuming, it was annoying to the writers who, to the best of their knowledge, thought they were done working on the documents. Now we can enable the writers to check for many types of errors within their authoring environment, which has reduced (but still hasn't eliminated) the error-list feedback loop.
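The kind of check that moves from batch processing into the authoring environment can be sketched as follows. This Python example looks for cross-reference pointers that do not resolve to any identifier in a set of documents. It is not a description of any particular editing application: it assumes well-formed XML as a stand-in for SGML, and the attribute names `id` and `refid`, the document names, and the sample content are all hypothetical.

```python
import xml.etree.ElementTree as ET


def unresolved_references(documents, id_attr="id", ref_attr="refid"):
    """Find references whose target id exists in none of the documents.

    documents maps a document name to its XML text. The result is a
    sorted list of (document name, missing target) pairs.
    """
    roots = {name: ET.fromstring(text) for name, text in documents.items()}
    # Collect every identifier declared anywhere in the document set.
    known_ids = {
        el.get(id_attr)
        for root in roots.values()
        for el in root.iter()
        if el.get(id_attr) is not None
    }
    problems = []
    for name, root in roots.items():
        for el in root.iter():
            target = el.get(ref_attr)
            if target is not None and target not in known_ids:
                problems.append((name, target))
    return sorted(problems)


# Invented sample: chapter32 points at a task that chapter05 lacks.
docs = {
    "chapter32": '<chapter id="ch32"><para>See <xref refid="ch05-task2"/>.</para></chapter>',
    "chapter05": '<chapter id="ch05"><task id="ch05-task1"/></chapter>',
}
print(unresolved_references(docs))
```

Running a check like this before a document is released gives the writer the means to detect and correct the defect directly, which is the third requirement for employee self-control.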
     We would be thought naive if we didn't comment on the supposed resistance of authors to working in an SGML environment. In our department, we had completely different reactions from two groups of writers. The group that had already been using a structured editor did not have much difficulty, for two main reasons. First, they were already familiar with the concept of tagging document components. Secondly, our previous editing application suffered from terribly slow performance, so most writers were pleased with the responsiveness of a native SGML editing application. The group that migrated from a traditional word processing/desktop publishing application experienced more difficulty, for a variety of reasons, including unfamiliarity with tagging concepts, lack of control over formatting, and insufficient training.
     Any organization that faces this problem should try to emphasize the data quality improvements that are possible with SGML authoring. But many important aspects of SGML data quality are somewhat abstract, and therefore less apparent to writers than other, traditional indicators of document quality. If quality principles are already well-understood and supported in an organization, then SGML should be an "easy sell" to authors.
     

    Processing and presentation

     During processing and presentation of SGML data, Redman's "Representation" dimensions of data quality come into play. As anyone who has worked with computers for very long knows, you can almost always program a solution (or buy an application) for any particular data processing need. The main question is not whether you can process the data, but how reliably and efficiently you can process it, what knowledge and skill sets are required of developers, and how many different processing paradigms you want to support. These are all functions of the "Representation" dimensions of data quality, particularly of interpretability, portability, flexibility, and representation consistency. We do not need to argue in this forum that SGML data outscores all other notations on these scales.
     We will, however, point out that these very features of SGML present other opportunities for applying quality management principles to improve processes and products. The portability and flexibility of SGML allow you to implement modular solutions to different operations in the technical publications production process. For example, your editing application and your storage system can be chosen primarily based on how well they meet the needs in each respective area. In our operation, we began with file-system storage of text and graphic entities. While the security and versioning capabilities provided by a database storage system are attractive, we concluded that these were not essential to support initial migration and production.
     As with any process or product controlled by quality management principles, you will look at actual customer needs, then evaluate the features of proposed products as they pertain to those needs, and implement a process, or obtain the product, that best meets your needs. Largely because of its representational data quality, SGML gives you the freedom to apply quality planning principles precisely and appropriately to each segment of your production process.
     

    Conclusions

     Through a deliberate and well-executed endeavor, our technical publications department will be an example of a completely customer-focused group for others to emulate. The processes we've outlined in this paper illustrate one approach to bridging the gap between theory and application of SGML. We have shown how quality management principles and concepts can be applied to the problems of SGML implementation in the area of technical publications. Quality considerations affect many levels and aspects of this operation--data, processes, and culture. Regardless of the specific process or techniques used for quality management, the goals are always the same: improve product features and reduce product defects.
     Acknowledgements
     The authors would like to thank Jim Laney, Director of Engineering Services and Product Safety at Cessna Aircraft Company, for his long-standing support of quality management principles and markup technologies in Cessna Technical Publications.
     Bibliography
     
    Juran 1993 Juran, J. M., and Frank M. Gryna. 1993. Quality planning and analysis: from product development through use. 3rd ed. New York: McGraw-Hill.
     
    Maler 1996 Maler, Eve, and Jeanne El Andaloussi. 1996. Developing SGML DTDs: from text to model to markup. New Jersey: Prentice Hall PTR.
     
    Megginson 1998 Megginson, David. 1998. Structuring XML documents. Prentice Hall.
     
    Redman 1996 Redman, Thomas C. 1996. Data quality for the information age. Boston: Artech House.
