Introducing SGML into the RAF Flight Manuals World or Throttle to Bottle in Two Extraordinary Years   Table of contents   Indexes   Opening 750 million envelopes without an instrument

  Nicholson  Simon 
 

Authoring and Translation for the International Market

 

Abstract:

 Only a few markets around the globe can mandate the universal use of a single language for documentation. Further, it was once the case that the author had some sight of the user of the information. With global markets this luxury has all but vanished. Today authors have little sight of the intended user of the information. It is more than likely the user will work in a different culture, a different language, using different media. Organisations must be cognisant of these conditions of entry into the market, and in most cases the requirement to provide localised, translated information must be absorbed as part of the cost of entry. Such costs can rapidly exceed the original startup costs for production of the source language version. Today the pressure is on to find ways to reduce startup and ongoing costs and time frames whilst maintaining or improving quality.
  The presentation discusses such initiatives. The key argument presented will be that translation activity and management of information encoded in SGML (Standard Generalized Markup Language) can provide reductions in cost and timescales whilst offering real opportunity to improve the quality and consistency of the content. The advantages offered by the SGML context when applied against technologies and concepts such as Translation Memory, SGML Element-level management and Controlled Terminology will be presented and discussed. The application of these capabilities will then be presented as part of a Component Based Document Management System providing a concurrent translation processing environment.
  Within the presentation references will be made to ongoing initiatives to implement such systems, providing an analysis of the issues to be addressed and of how the savings generated have contributed towards the justification for the use of SGML.
 

Introduction

 When Parker Pen marketed a ball-point pen in Mexico, the advertisements were supposed to say “It won’t leak in your pocket and embarrass you.” However, the company mistakenly thought the Spanish word “embarazar” meant embarrass. Instead the advertisement said that “It won’t leak in your pocket and make you pregnant.” In Taiwan, the translation of the Pepsi slogan “Come alive with the Pepsi Generation” came out as “Pepsi will bring your ancestors back from the dead.”
  These examples, amongst others, circulate freely and may be urban myths, but they illustrate the importance of language translation in today’s global markets. These examples, one would hope, cost no more than short-term embarrassment and some marketing dollars, but in the areas of technical support and service information such mistakes could prove far more costly. As a result companies are spending vast amounts of money on translation. IBM (International Business Machines) state that the worldwide spend on translation in 1995 was approximately $50 billion, with Europe alone spending $9 billion. Those figures, they claim, are set to increase by 15% per annum. The global spend on products and services for one segment of that market, the localisation of software, was worth $600 million in 1995, claims Rose Lockwood in an Ovum report (Globalisation: Creating New Markets with Translation Technology), with a growth to around $2.4 billion by the year 2000.
 That rate of change escalates the demands on the documentation providers who have to maintain the information in not just one language, but in multiple languages, and enable the use of that information by a wide range of users (technicians, auditors, service users etc.). The ability of these providers to meet the demand is being stretched, and often results in documentation being supplied after the product, or in a non-native tongue, or, indeed, the product release date slipping.
 Much of this growth is fuelled by the opening up of new markets such as the tiger economies of the Pacific Rim, the Middle East and the break up of the Eastern Bloc. Further, the rate of change and growth of complex technical systems in markets such as aerospace, computers, telecommunications and automotive has spiralled. It used to be the case that new product introductions were phased around the globe, but increasingly, the requirement is to provide a simultaneous global launch of a product.
  This paper discusses this growth, and examines how organisations can meet the challenge of the increased demand whilst maintaining control over costs and the quality of the output. A number of research projects and industrial implementations are examining the role of SGML as a foundation for a common multi-lingual information base enabling increased automation of translation and document production systems. The development of these systems has particular significance for the technical documentation marketplace, whose output is typically highly structured and repetitive across the range of document products, such as Service Publications, User Manuals, Training Material and Product Catalogs. Within the information base, document components encoded in SGML are built and managed, and subsequently used in structured document configurations. SGML is a foundation technology providing management of both structure and context, not offered by proprietary, file based unstructured document formats. The technologies and concepts behind these initiatives will be presented along with an indication of the business rationale for their development.
 

Trends/influences

  The growth in multi-lingual documentation as highlighted in the introduction is due not just to the opening of new global markets. Other factors are at play:
 
  • creation of international organisations through partnership and merger such as British Airways, Rover and BMW, Ford and Mazda, Airbus Industries, European Union
  • increasing employee mobility across geographic boundaries
  • source document content written in non-native language such as English
  • legal pressure for local versions (France, Spain)

  The growth in demand for electronic documentation is also influencing the way in which information is captured and subsequently presented. The increased use of CD-ROM, the Internet and Intranets for delivery of information is changing the model from that where the documentation provider pushes the information out to the market to one where the market pulls the information from the supplier. In the ‘push’ model the information provider had clear line of sight of the user in terms of format or media, language and culture. The pull model changes this and increasingly the requirement is to create information in as generic a way as possible. A simple example of this is the way in which illustration placements are described within the content. “Picture on the right” may work in cultures where the language is read left to right, but would this still apply in languages reading right to left?
     Organisations are reacting to these trends and recent initiatives have resulted in an increased outsourcing of translation to bureaux, and/or the relocation of the organisation to a country offering reduced costs. Dublin, for instance, is rapidly becoming a centre for software localisation. However, it is not clear that such initiatives address the core issues of language translation.
     

    Issues

     This paper cannot capture and discuss all issues relating to the translation of information, but will attempt to cover the following topics:
     
  • can translation be done concurrently with the creation of the source material
  • can more be done during the creation of the source to impact onward translation load and costs
  • how are version changes translated
  • ambiguities - can they be prevented
  • what is translation memory and who owns it

    How does it happen today?

     A simplified view of the typical process for the production of multi-lingual documentation is shown in the figure. The authoring process is complex, involving various people, potentially geographically dispersed, with differing expertise (photographers, authors, technicians, marketing and legal etc.). It is an adaptive process that can vary with the document type (different documents have different production processes). It requires some basic domain expertise from the authors, who are generally assigned, from time to time, to the same parts of the documentation. It sometimes starts with a product prototype, which implies constant adjustment to the various engineering changes during the development phase. The figure shows the creation of documentation for an automobile involving both authors and illustrators. A source language version in English serves as the input for multiple translators who create language variants, including German and French.
     
    Current Authoring/Translation Process
     The document prototype is produced, reviewed, cross-checked and validated for errors, inconsistencies and ambiguities. The translation process adds synchronisation, distribution and linguistic problems to the authors' already complex work. Translation therefore has to start before authoring is complete: parts of the document that are presumed stable are sent for translation before the end of authoring. When the documents are targeted at several languages and translated by native speakers, the translators may also be spread all over the world, which greatly increases the communication and synchronisation difficulties. Finally, linguistic problems may arise and require interaction between the translators and the authors of the original texts, for example to resolve misunderstandings or ambiguities.
     The costs and nature of the translation process result in source authors not releasing their work until the last minute, as subsequent changes can prove costly to implement. This results in the situation shown in the figure. The figure tracks the development of pages of the source language version over time. In this scenario the translation organisation has identified an ideal delivery date of the source information to the translators, which will ensure the cut off date is achieved. However, actual delivery occurs later than scheduled.
     
    Graph of Development of Source Information
     Late delivery of the source has one of two effects. If the planned cut off date is still to be met, the translation resources will have to be increased. The alternative is to extend the cut off date, which may result in product slippage or in product being shipped without appropriate documentation.
     On completion of the initial version, attention turns to the revision process. Changes to the source are made, and at an appropriate point released for translation. The translator will then apply tools against the revised source to identify what has changed and then translate those changes. Identification of changes can be achieved through differencing engines, and in combination with other tools the translator will provide a degree of automation to the process. One such tool is translation memory.
     

    Translation memory

      The vast majority of translation done in the world today involves a human being. The throughput of a typical translator will be between 1500 and 2000 words per day. The rapidly increasing demand for translation has resulted in the development and use of technology to increase that throughput.
      Machine based translation was a by-product of the Cold War, and has rapidly developed since that point. The technology enables the automatic translation of source text into a chosen target language according to grammatical rules. Such systems can be used successfully for conveying the general meaning of content, but cannot guarantee the accuracy and quality required for the translation of complex technical information. The more widely used technology is translation memory.
     Translation memory is a database containing previously translated terms and phrases. The translator uses the memory to automate the translation of the source, and this can prove very effective when translating new versions of previously translated material. The memory enables both direct and fuzzy matching, and in the case of a fuzzy match the translator applies their expertise and other tools such as on-line dictionaries and glossaries (which may contain terminology specific to the organisation or industry) to complete the task.
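The direct and fuzzy matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the memory contents, the `lookup` function and the 0.75 threshold are hypothetical, and the similarity score comes from the standard library's `difflib.SequenceMatcher` rather than from any commercial translation memory product.

```python
from difflib import SequenceMatcher

# Hypothetical memory: source segment -> previously approved translation.
MEMORY = {
    "Remove the oil filter.": "Entfernen Sie den Ölfilter.",
    "Check the oil level.": "Prüfen Sie den Ölstand.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (kind, source, target, score) for the best match, or None.

    A direct match is returned immediately; otherwise the most similar
    stored segment is offered as a fuzzy match if it reaches the threshold.
    """
    if segment in memory:
        return ("exact", segment, memory[segment], 1.0)
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[3]):
            best = ("fuzzy", source, target, score)
    return best


print(lookup("Remove the oil filter.", MEMORY))
print(lookup("Remove the air filter.", MEMORY))
print(lookup("Tighten the wheel nuts.", MEMORY))
```

In this sketch the second call yields a fuzzy match against "Remove the oil filter.", which the translator would then adapt by hand; the third finds nothing close enough and returns `None`, so the segment goes for full translation.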
     The use of these tools has a number of benefits to the translator including:
     
     
  • ensures consistency across document and target languages even when translation is collaborative
  • more than 15% reduction in total translation costs
  • more than 50% reduction in total translation time

    That's good for the translator but....

     In order to further enhance the multi-lingual document production process, what can be done to improve the creation of the source, and what benefits will that generate? Can source information be translated in-house? Probably, but that would require employing a team of translators. However, it might be possible to provide the translators with partially translated material. The translator then proofs and amends the content, rather than conducting a full translation from scratch. A review of translation rates by one company revealed that the rates, per word, for proof and amend were one-fifth of the rates for full translation from source!
     
     

    SGML

     Standard Generalized Markup Language enables documents to be broken down and stored as information units. Storage of these units within a suitable repository enables collaborative authoring, life-cycle tracking, and automatic configuration of the units into document deliverables.
     
     

    Translating changes

      The version history of the information unit also has a potential benefit for the translation process. Most SGML repositories enable version changes to be made and stored at the element level. As a result it is easy to identify and extract what has changed over a period of time, and at the appropriate point in the process dispatch those units for translation. The automation of this removes the requirement to send the whole document for retranslation, removes the responsibility of identifying the changes from the translation bureau and reduces the volume of data being sent out. Further, units can be translated as they become available rather than waiting for the completion of the entire document, so translation can be concurrent with the document production process as a whole.
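Element-level change detection of this kind can be illustrated with a small Python sketch. It rests on several assumptions: well-formed XML stands in for SGML because the Python standard library has no SGML parser, every information unit carries an `id` attribute, and the function names and sample content are invented for the example. A real repository would track versions itself rather than comparing two snapshots.

```python
import xml.etree.ElementTree as ET


def units_by_id(xml_text):
    """Map each element's id attribute to its whitespace-normalised text."""
    root = ET.fromstring(xml_text)
    return {
        el.get("id"): " ".join("".join(el.itertext()).split())
        for el in root.iter()
        if el.get("id") is not None
    }


def changed_ids(old_xml, new_xml):
    """Return the ids of units that are new or whose text has changed."""
    old, new = units_by_id(old_xml), units_by_id(new_xml)
    return sorted(uid for uid, text in new.items() if old.get(uid) != text)


# Hypothetical versions of one section of a service manual.
V1 = (
    '<section>'
    '<para id="p1">Remove the oil filter.</para>'
    '<para id="p2">Check the oil level.</para>'
    '</section>'
)
V2 = (
    '<section>'
    '<para id="p1">Remove the oil filter.</para>'
    '<para id="p2">Check the oil level with the dipstick.</para>'
    '<para id="p3">Refit the cover.</para>'
    '</section>'
)

print(changed_ids(V1, V2))
```

Only the units reported by `changed_ids` would need to be dispatched for translation; the unchanged paragraph keeps its existing translations.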
     
     

    Alignment

      Recent experience with database implementations has indicated the intention of the customer to store both source and target language versions within the SGML repository. Both source and target variants have identical structures, enabling alignment of content down to the finest level of granularity. Therefore, if the source contains a <PARA> element with ID (unique identifier) Attribute of 123, the language variants will also contain this element with this ID.
     This approach has two immediate advantages. Firstly, when a source language information unit is identified as a candidate for translation, its language variants are also identified. As a result they can be locked and shipped to the translation bureaux, with those elements requiring translation clearly identified. On completion of the process they are correctly versioned enabling full life cycle audit of all language variants. The second advantage concerns the identification and distribution of documentation changes to end-users in all languages, as those changes in source will also be valid in the variant languages.
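A minimal sketch of this alignment, again using XML as a stand-in for SGML: the `align` function, the sample documents and the German text are illustrative assumptions, not part of any repository product. Units are paired purely by their shared `id` attribute, as in the <PARA> example above.

```python
import xml.etree.ElementTree as ET


def align(source_xml, target_xml):
    """Pair source and target text for every id found in the source.

    The target text is None when the language variant has no element
    with that id yet, i.e. the unit still awaits translation.
    """
    def index(xml_text):
        return {
            el.get("id"): (el.text or "").strip()
            for el in ET.fromstring(xml_text).iter()
            if el.get("id") is not None
        }

    source, target = index(source_xml), index(target_xml)
    return {uid: (text, target.get(uid)) for uid, text in source.items()}


# Hypothetical English source and German variant with matching ids.
SOURCE_EN = (
    '<section>'
    '<para id="123">Remove the oil filter.</para>'
    '<para id="124">Refit the cover.</para>'
    '</section>'
)
TARGET_DE = (
    '<section>'
    '<para id="123">Entfernen Sie den Ölfilter.</para>'
    '</section>'
)

for uid, (src, tgt) in align(SOURCE_EN, TARGET_DE).items():
    print(uid, "|", src, "|", tgt if tgt is not None else "<untranslated>")
```

Because the pairing depends only on the identifier, the same lookup serves both purposes described above: selecting the variant units to lock and ship, and identifying which variant units are affected by a change in the source.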
     
     

    Reuse

      The ability to reuse information units within documents, or within other information units, can have an important effect upon the translation process. A study documented on the World Wide Web (www.ida.liu.se/labs/nlplab/projects/translation.htm) showed that up to 52% of the content of a technical manual was repetitious either internally or externally. Some sources claim that figure can be as high as 70%.
      Giving the technical writer easy access to standard lexicons and terminology databases can yield high returns. Through this mechanism the writer can be prompted within the authoring environment with existing material such as terms or information elements and units. The writer can reuse the appropriate option or create new material, which will be clearly identified within audit procedures. Greater control can be exercised over document content, style, ambiguities and terminology, and the amount of “reinventing the wheel” can be reduced. A European Union funded project, DocStep (guagua.echo.lu/language/en/indexes/projects.htm), suggests that such an approach will act as a kind of self-training environment that will help the writer understand the standard terminology and writing rules. This will then have an onward impact on the amount of validation and review required in the overall process.
     The reuse of source language material from a standard lexicon can also enable the automated reuse across the language variants. Agreed and pre-translated terms in target languages are held within the environment and are used as appropriate, without requiring human intervention. This has a positive benefit in that the volume of translation work is further reduced to new material.
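As a rough illustration of lexicon-driven reuse, the sketch below looks up agreed, pre-translated terms for a source sentence. The lexicon entries, the `approved_terms` function and the language codes are hypothetical; a production terminology database would also need to handle inflected forms, which this simple whole-word match does not.

```python
import re

# Hypothetical multilingual lexicon: source term -> approved translations.
LEXICON = {
    "oil filter": {"de": "Ölfilter", "fr": "filtre à huile"},
    "dipstick": {"de": "Ölmessstab", "fr": "jauge d'huile"},
}


def approved_terms(text, lang, lexicon):
    """Return the approved target-language term for each lexicon term in text."""
    found = {}
    for term, variants in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        if lang in variants and re.search(pattern, text, re.IGNORECASE):
            found[term] = variants[lang]
    return found


sentence = "Remove the oil filter and check the dipstick."
print(approved_terms(sentence, "de", LEXICON))
print(approved_terms("Refit the cover.", "fr", LEXICON))
```

Terms found this way can be carried into each language variant without human intervention, leaving the translator to deal only with the material that is new.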
     

    Content and Context

     An SGML information unit contains not only the content (words and spaces), but also a high degree of semantic and contextual data about the information. A simple example is the identification of elements. This data can prove a useful aid to the translation process, as it can identify the context of a term to be translated. Within other unstructured file formats context can only be derived by inspection of the surrounding content.
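The way markup supplies context can be shown with a short Python sketch that pairs each piece of text with the path of elements enclosing it. XML again stands in for SGML, and the element names and the `text_with_context` function are assumptions made for the example. Text that follows a child element (tail text) is ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET


def text_with_context(xml_text):
    """Return (element path, text) pairs for every element with leading text."""
    results = []

    def walk(element, path):
        here = path + [element.tag]
        text = (element.text or "").strip()
        if text:
            results.append(("/".join(here), text))
        for child in element:
            walk(child, here)

    walk(ET.fromstring(xml_text), [])
    return results


# Hypothetical fragment in which "Switch" appears in two different contexts.
DOC = (
    '<manual>'
    '<title>Switch</title>'
    '<procedure><step>Switch the ignition off.</step></procedure>'
    '</manual>'
)

for path, text in text_with_context(DOC):
    print(path, "->", text)
```

Here the same English word is a noun in `manual/title` and a verb in `manual/procedure/step`; the element path gives a translation tool, or the translator, a hint that plain text alone would not provide.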
     
     

     It is important to note, however, that for all the technology applied to the translation process, the resolution of ambiguities requires human intervention. Did the author mean this or that? The example in the figure demonstrates how ambiguity can be introduced with the translation into German.
     
    Example of Ambiguity
      Enabling reuse of existing material can start the process of controlling the language and style used within the documentation. Initiatives such as Controlled Language place focus on ensuring that source material is written in a manner that guarantees consistent and unambiguous translations. An example of a Controlled Language is Simplified English, an application of which is to be found in the Aerospace industry. Further information on Controlled Languages can be found at wwwots.let.ruu.nl/Controlled-languages.
     

    Who Owns Translation Memory?

     The reuse of elements from a multilingual lexicon reduces the volume of translation, and starts to provide a type of translation aid within the authoring environment. The need for a full translation memory system remains, however, to provide automated translation of new text. As previously discussed translation memory will prompt the user with a direct match or, through fuzzy searching, similar translations that can be adapted.
     The development of translation memory is in some regards a by-product of the translation process, tracking the work of the translator, and capturing new translations for future use. Translation memory therefore increases in value as translation projects pass through it. Often translation memory systems capture and store the information in proprietary formats, and the question then arises: if we move our translation work from our current supplier, what happens to the translation memory?
     This is not solely a question of whether the content of the memory can be moved from the existing system to the format of the new supplier, but is also a question of ownership of that content. It is beyond the scope of this paper to find an answer to that question save to point out that some translation customers are now exploring the viability of bringing the translation memory in-house.
     A number of organisations are investigating opportunities to develop standards for translation memory. One example of this is OpenTag.
     
     

    OpenTag

      OpenTag is an initiative aiming to address the problem of format specific translation memory. It is a proposal for a standard, based on SGML, XML (Extensible Markup Language) and Unicode, from International Language Engineering (ILE) of Boulder, CO in the USA. The proposal is to extract the content from the original proprietary source and build an OpenTag file using the OpenTag DTD (Document Type Definition). The resulting file is stored within the translation memory so that it is independent of both tool and supplier. The OpenTag initiative is still at a very early stage. Further information about OpenTag is available at www.ile.com/opentag/otwhatis.htm.
     

    Communication with the translator

     How is material for translation delivered to the translator? In some cases where volume is high and timeliness critical, dedicated, leased lines are in place. In other cases email is used. Another option is the exchange of magnetic media such as disc and tape.
     The translator is a key player in the document production process, and yet all too often they work off-line due to the expense of dedicated lines. However, as with many other parts of document exchange and delivery, the internet and intranet offer an opportunity to change. Intranets increasingly offer the opportunity for secure access, potentially allowing the translation team to work in real-time as part of the whole documentation team. As source data changes it is a realistic proposition that the translator receives the new version in real-time and can access both source data and translated variant as stored within the repository. This capability is becoming more realistic with the development of web based workflow products (an example at www.websoft.com) to meet the process management requirements of a virtual organisation. Is this as futuristic as it sounds?
     
     

    Translation on the internet

      The internet, with its massive volume of English text, is generating new types of translation requirement. One new service can be explored at www.systransoft.com/translate.html. This Web site, run by SysTranSoft, offers the translation of any user-specified web page, for free! Alpnet, a translation and software localisation organisation, forecast that within 5 years 30% of the translator’s work will be generated by the Internet. The Rank Xerox Research Centre (RXRC) in Grenoble, France offers experimentation with linguistic tools at www.rxrc.xerox.com.
     

    Do We Need to Translate? - Transman

      RXRC are also conducting research into the delivery of technical documentation in a foreign language with a linguistic support tool, where full translation would prove too expensive and the documentation is being delivered electronically on CD-ROM or via the Web. The end user is a technical reader with a basic knowledge of the foreign language, to whom some words and expressions are nevertheless unknown. The user therefore needs support in understanding the text, and through the interface can select words that are not understood and request a translation. A key enabler to this development is that the source material is held in SGML, enabling the context of the word to be used when generating the translation. A pilot project is currently being run in Rank Xerox France with the service personnel.
     

    Summary

     In the technical documentation market, companies spend millions of dollars annually on translation. The spend is growing, driven up by the corporate objective to achieve a simultaneous worldwide launch of new products, by the customer requirement for electronic delivery of information specific to their product configuration and by the need to support new languages and markets.
     In some cases the cost of developing each language variant can be as high as 10% of the cost of developing the source. With global translation spend forecast to grow at 15% per annum, cost justification of new systems can be rapidly achieved.
      As a result, research is underway to examine how the translation process can be improved, and in most cases that research is built upon the management of SGML based information units. The privately funded initiatives within the transport, telecommunications, information technology and high-tech manufacturing sectors, along with public projects funded by the European Union, have a common set of base objectives.
     These can be summarised thus:
     
     
  • reduction of ambiguities and inconsistencies
  • enablement of collaborative work
  • shorten production delays
  • simplify the production process
  • enable the re-use of document components
  • facilitate the exchange of documents across languages and markets
  • control costs

     As an enabling technology, SGML has a key role. The ability to enable automatic reuse both in source and translated documents, the inherent identification of the context in which the content occurs and the ability to identify and translate only the changes presents opportunities for real cost and time savings. However, SGML is only part of the solution, which comes through the use of component based management systems, translation and terminology memory and electronic exchange and delivery.
     Not all users will require translation. Some may feel comfortable working with a non-native language with support provided by embedded linguistic tools.
     Finally, these technologies and capabilities are not for all documents. They are focused on document types such as support information, user information, educational material. I doubt that they would have been able to prevent the following urban myth.
     A chicken farmer’s slogan “It takes a tough man to make a tender chicken” got terribly mangled in a Spanish translation. A photo of the farm owner with one of his birds appeared on billboards all over Mexico with a caption that explained “It takes a hard man to make a chicken aroused!”
