XML - Practice finally makes perfect!   Table of contents   Indexes   XML, Everywhere


Managing the Peer-Review Cycle of SGML-based Bilingual Documents

Gary   Palmer
  SGML Knowledge Engineer
  ActiveSystems Inc.
Suite 602
11 Holland Ave.
Ottawa   Ontario  Canada  K1Y 4S1
Phone: +1 613 729-2043
Fax: +1 613 729-2874
Email: gpalmer@activesystems.ca Web: www.activesystems.ca
Biographical notice:
Gary Palmer
Mr. Palmer has studied, developed and managed solutions for a variety of information management environments for the past 23 years. Since 1984, Mr. Palmer has focused on text, records, and document management environments, most recently in the area of SGML integration. Mr. Palmer is experienced in SGML integration planning and DTD architecture. Mr. Palmer is the intellectual support for all SGML-related activities within ActiveSystems.
The NMS (National Master Specification) is an automated resource tool that provides an easy-to-use framework for writing construction project specifications. This bilingual (English and French) product was published semi-annually by the NMSS (National Master Specification Secretariat) of the Public Works department of the Federal Government of Canada. The Specification contains over 640 sections of construction specification frameworks.
This case study will explain how the two separate English and French documents for each of the sections were integrated into one SGML bilingual document, and how each document was managed, electronically, through the different stations of the semi-annual review (selection, editorial review, peer review, revision control, translation, approval). The up and down translations needed to interface the SGML and non-SGML authoring environments will also be explained.
This report will highlight the benefits of merging the two documents into one bilingual document, and the positive impact of the merge on revision control and translation. One pleasant surprise during the integration of the system came when the external English-to-French translators decided to use the SGML editor within WordPerfect 7, thus eliminating the cost of up and down translation and guaranteeing the quality and integrity of the document structure and content.
The NMS is a comprehensive master construction specification resource tool used to simplify specification writing for building construction contractors in Canada. The NMS is a group of specification packages for preparing construction proposals, including: architectural, air transport, building services, electrical, heavy civil, interior design, landscape architectural, mechanical, restoration-conservation, and structural.
Jointly produced by the private and public sectors, each of the more than 640 sections serves as an easy-to-use framework for writing building project specifications. The content is clear, concise and precise, reflecting the expertise of many of Canada’s foremost authorities on building specifications, contract documents, and construction technologies. The content is reviewed by construction specialists from both government and industry on an on-going basis to ensure that the content continues to represent current building practices and the latest in construction technology.
Contract managers use it to select the portions of the standard related to their functional responsibilities for a construction project, edit the content, and deliver it to print. The NMS provides contractors with maximum protection against duplication and errors, while minimizing risk, misunderstanding and liability in the delivery of construction contracts.
The NMS is made available in both of Canada’s official languages, English and French.

Former Process

Data was edited with a proprietary editor, created especially for the strictly formatted look of the standard. Publishing to any other format required a hard-coded conversion process. The two sets of 640 electronic documents were managed by one person in a hierarchical directory structure.
Semi-annually, a selected group of sections was flagged for review and revision. The English version of each section was printed to the "look" of the formatting standard and sent to the appropriate peer-review committee. The committee made the revisions, in writing, to the paper copy. The NMS editors would review the changes for content integrity, amend where required, then forward the changes to the Administrator, who would enter the changes via the proprietary editor. Revision markers were not available in the editing software to mark the changed text. The revised copy would be printed and returned to the committee for further review. The cycle would continue until the committee signed off on the revisions.
The revised sections were bundled and sent out for translation to French. Since revision markers were not included in the revised English text, the modified text could not be easily located without doing a line-by-line comparison of the English and French. The NMSS was being charged per word of translation, but they were never sure of what the costs would be until after the translation was completed. The translations would have to be keyed into the French version of the document via the proprietary editor. Another contracted translator proofed the translation and provided corrections, where required.
For publishing, the NMSS would print the properly formatted revised sections for their own in-house paper copy. Distributors, who sold the standards to building contractors, accepted the electronic version of the files and converted them to a variety of popular formats, such as WordPerfect and RTF, as required by their clients.


The NMSS recognized that they needed to upgrade the functionality of their editor, automate their manual processes, and make use of the latest technologies for information delivery. They developed a pilot of an integrated editor and viewer to better understand what they wanted from an automated process. The editor was to be much more structured and to check document integrity automatically. The viewer needed to support the functionality of a good on-line viewer, with good search capabilities and cross-referencing to and from anything that could be cross-referenced. The NMSS also required an automated workflow management process to replace their manual logging of completed stages. They also needed to know how the proposed solution, once in place, would reduce their overall costs and improve workflow.
Because of the very structured nature of the NMS data, and because of the on-line functionality required, ActiveSystems proposed an SGML solution which included DTD development, data conversion, an SGML document repository (ActiveServer), a customized workflow management function, and transforms to a variety of non-SGML edit forms. Of all the proposals received by the NMSS, this was the only solution based on a non-proprietary standard that had the functionality they required and that could adapt to any publishing technologies they would need in the future.

DTD Design

The ActiveSystems SGML analyst facilitated the DTD development with the NMSS editors. They were taught the basic SGML DTD syntax, and how and why DTDs are assembled.
When it came to the issue of how to handle the bilingual data, we had four choices:
  1. Language attribute on the root element of each document.
  2. Language attribute on every text object.
  3. English and French elements within each text object.
  4. Marked sections.
With the following basic DTD structures:
<!ENTITY % lang        "eng , fr" > 
<!ENTITY % text.ele    "Title , para , li , … ">
let’s detail the four options.
Language attribute on the root element of each section would require a DTD something like:
<!ELEMENT NMS - - (NMS-section+) >
<!ELEMENT NMS-section - - (front, body, back) >
<!ATTLIST NMS-section
    language (%lang) #REQUIRED
    DocId ID #REQUIRED
    link IDREF #REQUIRED -- Link to the same document in the other language -- >
The text of each document would be in one language only. Although the principle is the same as in their former environment, the NMSS administrator would be responsible for ensuring that all three attributes are properly encoded and that the cross-linking of the two language versions of the document always stays in sync.
Language attribute on every text element would require a DTD like:
<!ELEMENT (%text.ele) - - (#PCDATA | et.al)* > 
<!ATTLIST (%text.ele)    
    language (%lang) #REQUIRED  >
Each section would be a single bilingual document. Every text object would occur twice, once for each language. But without some automated verification tool or some manual intervention, there would be no controls to ensure that every text object has two occurrences, one for each language.
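The kind of automated verification tool this option would call for can be sketched as follows. This is a minimal illustration only, not part of the NMSS system. It assumes the section is available as well-formed XML-style markup (Python's standard library has no SGML parser), that lower-case `title`, `para` and `li` stand in for the text elements, and that the two language versions of a text object sit under the same parent in the same order.

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the %text.ele entity; the real list is longer.
TEXT_ELEMENTS = {"title", "para", "li"}


def unpaired_parents(root):
    """Return the tag of every parent whose English and French text objects do not line up."""
    problems = []
    for parent in root.iter():
        eng = [c.tag for c in parent
               if c.tag in TEXT_ELEMENTS and c.get("language") == "eng"]
        fr = [c.tag for c in parent
              if c.tag in TEXT_ELEMENTS and c.get("language") == "fr"]
        if eng != fr:
            problems.append(parent.tag)
    return problems


# Hypothetical sample: the list item has no French counterpart.
SAMPLE = """
<body>
  <para language="eng">Clean the surface.</para>
  <para language="fr">Nettoyer la surface.</para>
  <list>
    <li language="eng">Apply two coats of paint.</li>
  </list>
</body>
"""

print(unpaired_parents(ET.fromstring(SAMPLE)))  # ['list']
```

A check like this would have to be run outside the editor after every change, which is exactly the extra step this option imposes.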
English and French elements within each text object would require a DTD like:
<!ELEMENT (%text.ele) - - (%lang) >
Each section would be a single bilingual document. As every new text object is introduced to the document, both the English and French components will be part of the structure. The English text is added to the document. The French tag remains empty until translated. The NMSS administrator does not have to deal with language attributes — the DTD enforces, or directs, the proper structure.
Marked sections would work with a DTD like:
<!ELEMENT (%text.ele) - - (#PCDATA | et.al)* >
The NMSS administrator would have to clearly understand the function and power (?) of marked sections and maintain the following structure in every text object.
<!ENTITY % english "Include" >  
<!ENTITY % french "Ignore" >  
<![ %english [English text goes here.]]>  
<![ %french [Le texte Français est ici.]]>
It would not be realistic for the NMSS administrator to manage and control a single bilingual document using this method except with an external process.
Without knowing a lot about SGML, it was obvious to the NMSS editors that placing English and French elements within each text object was the best way to go. It was the most rigid in structure and automatically ensured that all the bilingual structures would be in place in every document. Any SGML editor could be used, as is, to control this structure. The administrator would not have to learn any special features of the DTD, and no extra verification software would be required to verify the bilingual integrity of every document. Also, managing one bilingual document rather than two unilingual documents ensured that the unilingual documents delivered at publishing time were identical in content and structure, which was not always true when trying to manage two unilingual documents in parallel.
One more feature added to the DTD was a translate attribute on every text object. It was decided that whenever a text object was modified, the value of this attribute would be set to "on" to denote that translation must be performed on this text object. As well, both the English and French elements contained revision markup so that revised text could be easily identified for the translators, and so that the publishing software could, if required, show revision markers next to the line(s) of modified text.
Anyone who has facilitated DTD development, slogging through the arm-twisting battle of directing the author/editors away from their ever-so-familiar formats, will recognize the moment of joy when the members of the group look at each other and nod in agreement with their decision, realizing how simple and sensible it is. This happened on more than one occasion when the editors realized what their effort was going to produce, and how it would greatly improve their workflow and end product.
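To make the translate attribute and the revision markup concrete, here is a small hedged sketch. The paper does not give the names of the revision elements or the inactive value of the attribute, so `ins`, `del` and `translate="off"` are assumptions made for illustration, and XML-style markup is used as a stand-in for the SGML instance.

```python
import xml.etree.ElementTree as ET

# Hypothetical bilingual section: every text object holds an <eng> and a <fr> element.
SECTION = """
<body>
  <para translate="on">
    <eng>Apply <del>two</del><ins>three</ins> coats of paint.</eng>
    <fr>Appliquer deux couches de peinture.</fr>
  </para>
  <para translate="off">
    <eng>Clean the surface.</eng>
    <fr>Nettoyer la surface.</fr>
  </para>
</body>
"""


def pending_translation(root):
    """List (element name, inserted text, deleted text) for objects flagged for translation."""
    pending = []
    for obj in root.iter():
        if obj.get("translate") != "on":
            continue
        eng = obj.find("eng")
        if eng is None:
            continue
        inserted = [e.text for e in eng.iter("ins")]
        deleted = [e.text for e in eng.iter("del")]
        pending.append((obj.tag, inserted, deleted))
    return pending


print(pending_translation(ET.fromstring(SECTION)))
# [('para', ['three'], ['two'])]
```

The point of the design is visible here: the flag says which text object needs attention, and the revision markup says which words within it changed.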


Now that the DTD had been defined, it was time to get the current data from the proprietary editor into SGML. The first problem was that more English documents than French were provided for conversion; there should have been the same number of each. The second was that, as we blended each pair of English and French documents into one SGML document, it became clear to our bilingual conversion staff that the structural content of the pairs was not in sync. When we informed the NMSS editors of these two problems, they simply looked at each other and smiled. They had suspected that the quality assurance of the translated work was not as good as they had hoped and needed. They realized that the conversion would put the documents in sync. Content that was out of sync was flagged for re-translation.
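The blending step can be pictured with the following sketch. It is an illustration under stated assumptions, not the conversion software that was used: each unilingual document is reduced to a list of (element name, text) pairs, objects are paired strictly by position, and the `in_sync` field is a hypothetical flag. In practice a single missing paragraph shifts every later pair, which is why bilingual staff were needed to review the results.

```python
from itertools import zip_longest


def blend(english, french):
    """Pair English and French text objects by position.

    A pair is flagged as out of sync when either side is missing or the
    element names differ; its French text is then left empty for re-translation.
    """
    merged = []
    for eng, fr in zip_longest(english, french):
        if eng is None:
            # French object with no English source: keep it, flagged for review.
            merged.append({"element": fr[0], "eng": "", "fr": fr[1], "in_sync": False})
            continue
        in_sync = fr is not None and fr[0] == eng[0]
        merged.append({
            "element": eng[0],
            "eng": eng[1],
            "fr": fr[1] if in_sync else "",
            "in_sync": in_sync,
        })
    return merged


# Hypothetical pair of documents: the French version lacks the last paragraph.
english = [("title", "Summary"), ("para", "Clean the surface."), ("para", "Apply primer.")]
french = [("title", "Sommaire"), ("para", "Nettoyer la surface.")]

for item in blend(english, french):
    print(item["element"], item["in_sync"])
```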
An added benefit of this conversion process was to give the NMS a new starting point, one the NMSS editors knew would be correct.

Peer Review

Transforms were created to format the English content of the documents selected for peer review to MS-Word and WordPerfect. The documents were released to the peer-review committees, who simply modified the electronic documents as required. A special conversion filter was developed to recognize the structures of each returned document and electronically compare its contents to those of the SGML document from which it was originally generated. Differences in the English text were marked with the DTD revision marker, and the translate attribute of the parent text object was set to "on". The associated French text was kept in place.
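The comparison step can be approximated with a word-level diff. The sketch below uses Python's standard `difflib` module as a stand-in for the conversion filter, whose internals the paper does not describe. The `ins` and `del` tag names are hypothetical, and a real filter would also need to escape markup characters in the text.

```python
import difflib


def mark_revisions(original, modified):
    """Compare two versions of a text object word by word.

    Returns (marked_text, changed). When changed is True, the caller would
    set the translate attribute of the parent text object to "on".
    """
    old_words = original.split()
    new_words = modified.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    changed = False
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        changed = True
        if op in ("delete", "replace"):
            parts.append("<del>" + " ".join(old_words[i1:i2]) + "</del>")
        if op in ("insert", "replace"):
            parts.append("<ins>" + " ".join(new_words[j1:j2]) + "</ins>")
    return " ".join(parts), changed


print(mark_revisions("Apply two coats of primer.", "Apply three coats of primer."))
# ('Apply <del>two</del> <ins>three</ins> coats of primer.', True)
```

Because only the English element is rewritten, the French element is untouched and remains available to the translator as a starting point.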


Since the contracted translators worked in WordPerfect 7 (WP7), a transform was planned to convert the complete bilingual document to WP7 in a simple style that allowed the translators to easily locate modified English text and make the appropriate changes to the associated French text. Once the translated documents were returned to the NMSS, a conversion filter would turn the WP7 documents back into SGML documents.
But wait, there’s more …
We knew that WP7 supported SGML. Although some may say that it is not dependable, it does work for simple documents, and our DTD and markup were certainly not complex. When the translators were asked if they worked in SGML, they responded that they did not but were aware of it. When informed that WP7 contained an SGML editor, they agreed to give it a try. We created a WordPerfect layout sheet that formatted the SGML document to show the English in one colour and font, and the French in a different colour and font. As well, we styled the insert revision marker as red underline and the deletes as red strikethrough. The translators could quickly locate the modified English by looking for the revised red text, then make the appropriate changes in the French text.


The initial objective of this project was to improve the editing and document management environment of the review process. With the data in the controlled and structured SGML environment, the client is now positioned to provide the NMS in any media format, including controlled Intranet and Extranet access with HTML or XML.
The DTD provided the editors with a method of enforcing the structural integrity of their documents. The DTD also allowed them to manage each section as one bilingual document instead of two unilingual documents. This meant that the daunting task of keeping two sets of 640 documents in sync was gone, reducing the stress on the administrator and freeing up time.
The benefit gained during conversion was that they were finally assured that all the English documents were translated, and that their bilingual documents were completely in sync. Every text object contained the proper text in both languages.
Once a modified document was returned from the peer-review committee, the modified English was auto-magically blended back into its original SGML markup and included revision markers around the inserted and/or deleted text. All text objects needing translation were automatically marked.
An automated process selected all documents needing translation. The documents were sent, as SGML documents, to the translators, who applied the modifications directly to the SGML files. This meant that no down and up translation filters were required to prepare the data for the translators and to convert their work back to SGML: one less conversion process that could interfere with document integrity. This also reduced the cost of translation by clearly identifying the English text that had been modified, so the translators could focus on only what had changed rather than reading every text object to see what might have changed.

Future Opportunities

The distributors of the NMS fully embraced SGML as the format in which they received the NMS documents. They were still responsible for distributing this document to their buyers in whatever format their buyers required. Transforming from an SGML source rather than a proprietary format made the task much more consistent.
The NMSS is now looking at making the NMS data available to other Canadian Government departments over the government internet. They have the possibility of allowing access to the data directly from the database, or of selecting the required sections from a menu and having the documents delivered in any one of a variety of formats: SGML, HTML, XML, RTF. CD-ROM delivery is now a real possibility. With some simple style sheets, the data can be viewed bilingually, English-only, or French-only, all from one SGML source.
One of their future marketplaces may be the United States. To make it an official publication, they would require each section to be written in Spanish. To support this, only one line of the DTD would have to be changed to get it to support three languages. The workflow processes would remain the same or be only slightly modified. Spanish translation of the SGML documents would mean creating a layout sheet that displayed the English, hid the French, and allowed modification of the Spanish elements.
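The single-source idea can be sketched in a few lines. This is only an illustration of selecting one language from the bilingual structure, not the style sheets themselves. It assumes XML-style markup as a stand-in for the SGML source, `eng` and `fr` child elements inside every text object, and no revision markup left in the published text.

```python
import xml.etree.ElementTree as ET

LANGUAGES = ("eng", "fr")  # element names taken from the %lang entity

# Hypothetical bilingual section.
SOURCE = """
<section>
  <title><eng>Painting</eng><fr>Peinture</fr></title>
  <para><eng>Clean the surface.</eng><fr>Nettoyer la surface.</fr></para>
</section>
"""


def unilingual(root, keep):
    """Return (element name, text) pairs for one language of a bilingual section."""
    if keep not in LANGUAGES:
        raise ValueError(f"unknown language: {keep}")
    view = []
    for obj in root.iter():
        chosen = obj.find(keep)  # direct child only, so containers are skipped
        if chosen is not None:
            view.append((obj.tag, "".join(chosen.itertext()).strip()))
    return view


root = ET.fromstring(SOURCE)
print(unilingual(root, "eng"))  # [('title', 'Painting'), ('para', 'Clean the surface.')]
print(unilingual(root, "fr"))   # [('title', 'Peinture'), ('para', 'Nettoyer la surface.')]
```

Both views come from the same objects, so they cannot drift apart in structure the way two separately maintained unilingual documents could.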
