| Building an SGML-based Publishing Environment | Table of contents | Indexes | SGML &, schemas: from SGML DTDs to XML-DATA. | |||
The Addition of a Multilingual Component to An Existing Document Processing System |
|
Tom Catteau |
| software engineer |
| SGML Technologies Group 29 Boulevard General Wahis, 29 B-1030 Brussels Belgium Email: tct@sgmltech.com Web: http://www.sgmltech.com Phone: +32 2 705 70 21 Fax: +32 2 705 81 01 |
Biographical notice: |
Tom Catteau |
ABSTRACT: |
This paper discusses the addition of a multilingual component to an already existing document processing system, where a trade-off has to be chosen between innovation in terms of new functionality for multilingual processing and the stability of the system. |
Introduction |
|
Many organizations, both public and private, deal with multilingual documents. A major issue with such documents is the equivalence between the different linguistic versions of one document: the concern for synoptism. The check for synoptism takes place at two levels: at the structural level of the document, and at the content level, where certain types of language-independent information may be present even though they are not explicitly expressed in a DTD. |
General architecture |
|
Multilingual repository |
|
The LI repository and the LS repository, together with these modules, give rise to a multilingual repository. |
Language-independent versus language-specific content |
|
Language-independent features are of interest only insofar as they serve at least one of the following two purposes: |
DTDs for the LI and the LS-repositories |
|
A document will be stored partly in the LI-repository and partly in the LS-repository. Naturally, each repository will use DTDs derived from the document's DTD. |
For the sake of clarity, the example in this section uses the scheme for concurrent DTDs, even if in the implementation another scheme might be used. |
The LI-DTD |
|
<(LI)SECTION ID=AAFGH>
<(LI)CHAPTER ID=AAFGI LEAF=Y>
</(LI)CHAPTER>
<(LI)CHAPTER ID=AAFHA LEAF=Y>
</(LI)CHAPTER>
</(LI)SECTION> |
<(LI)SECTION ID=AAFGH>
<(LI)CHAPTER ID=AAFGI>
<(LI)P ID=AAFGJ LEAF=Y>
</(LI)P>
<(LI)TBL ID=AAFGK>
...
</(LI)TBL>
<(LI)P ID=AAFGL LEAF=Y>
</(LI)P>
</(LI)CHAPTER>
<(LI)CHAPTER ID=AAFHA>
</(LI)CHAPTER>
</(LI)SECTION> |
Also note that the LI DTD will be a copy of the document's DTD up to the level that is the same in all languages. |
<(LI)SECTION ID=AAFGH>
<(LI)CHAPTER ID=AAFGI LEAF=Y>
<(LI)TBL ID=AAFGK>
...
</(LI)TBL>
</(LI)CHAPTER>
<(LI)CHAPTER ID=AAFHA>
</(LI)CHAPTER>
</(LI)SECTION> |
<(LI)SECTION ID=AAFGH>
<(LI)CHAPTER ID=AAFGI LEAF=Y>
<(LI)TBL ID=AAFGK>
...
</(LI)TBL>
<(LI)REF ...>
</(LI)REF>
</(LI)CHAPTER>
<(LI)CHAPTER ID=AAFHA>
</(LI)CHAPTER>
</(LI)SECTION> |
This example tells us that in the first chapter, a table and a reference should be present in every language, but no order of occurrence is prescribed. |
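The order-independent requirement described here can be illustrated with a small sketch. It is not taken from the system discussed in this paper: the function names, the `LI_REQUIRED` table and the sample language versions are all hypothetical. It only assumes that each language version of a chapter can be reduced to a list of element names.

```python
from collections import Counter

# Hypothetical LI requirement for chapter AAFGI, mirroring the example:
# one TBL and one REF must occur, in any order.
LI_REQUIRED = {"AAFGI": Counter({"TBL": 1, "REF": 1})}


def missing_li_elements(chapter_id, element_names, required=LI_REQUIRED):
    """Return the LI elements that one language version lacks in a chapter.

    element_names is the sequence of element names found in that chapter
    for one language; their order is deliberately ignored.
    """
    found = Counter(element_names)
    needed = required.get(chapter_id, Counter())
    # Counter subtraction keeps only positive counts, i.e. what is missing.
    return dict(needed - found)


def check_synoptism(chapter_id, versions, required=LI_REQUIRED):
    """Map each language to its missing LI elements (empty dict = synoptic)."""
    return {lang: missing_li_elements(chapter_id, names, required)
            for lang, names in versions.items()}


# Hypothetical language versions of the first chapter.
versions = {
    "en": ["P", "TBL", "P", "REF"],
    "fr": ["REF", "P", "TBL"],   # different order, still synoptic
    "nl": ["P", "TBL"],          # REF is missing
}
report = check_synoptism("AAFGI", versions)
```

Counting elements rather than comparing sequences is what leaves the order free. A check that also had to enforce order would compare the sequences instead.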
The LS-DTD |
|
<(LS)SECTION ID=AAFGH>
<(LS)CHAPTER ID=AAFGI>
<(LS)TBL ID=AAFGK>
...
</(LS)TBL>
<(LS)P ID=AAFGL >
</(LS)P>
</(LS)CHAPTER>
</(LS)SECTION> |
In this case, the first paragraph of the first chapter, as well as the second chapter, will be added at extraction time. |
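One way to picture this step is as a merge of the LI skeleton with the LS content, keyed on the ID attribute. The sketch below is an assumption about how such a merge could work, not the extraction module described in this paper. The `Node` class, the two helper functions and the placeholder content strings are hypothetical. Elements that exist only in the LI skeleton are emitted as empty elements.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    id: str
    children: list = field(default_factory=list)
    content: str = ""   # language-specific text; empty in the LI skeleton


def index_by_id(node, table=None):
    """Collect every node of a tree in a dict keyed on its ID."""
    table = {} if table is None else table
    table[node.id] = node
    for child in node.children:
        index_by_id(child, table)
    return table


def extract(li_node, ls_index):
    """Rebuild one language version by walking the LI skeleton.

    Content comes from the LS node with the same ID when there is one;
    otherwise the element is added empty (assumed behaviour).
    """
    ls_node = ls_index.get(li_node.id)
    content = ls_node.content if ls_node else ""
    children = [extract(child, ls_index) for child in li_node.children]
    return Node(li_node.name, li_node.id, children, content)


# LI skeleton and LS instance following the examples in this section.
li = Node("SECTION", "AAFGH", [
    Node("CHAPTER", "AAFGI", [
        Node("P", "AAFGJ"), Node("TBL", "AAFGK"), Node("P", "AAFGL")]),
    Node("CHAPTER", "AAFHA"),
])
ls = Node("SECTION", "AAFGH", [
    Node("CHAPTER", "AAFGI", [
        Node("TBL", "AAFGK", content="(table)"),         # placeholder text
        Node("P", "AAFGL", content="(paragraph text)"),  # placeholder text
    ]),
])

document = extract(li, index_by_id(ls))
```

After the merge, paragraph AAFGJ and chapter AAFHA are present in `document` even though the LS instance does not contain them.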
A general scheme for extraction and update |
|
Extraction |
|
Storage |
|
Master language |
|
No master language: locking mechanism and differential updates |
|
The checking of synoptism |
|
Global check |
|
Language-based approach |
|
Non-coercive versus coercive implementation |
|
An incremental approach to implementation |
|
Feature by feature implementation |
|
In a feature by feature implementation, one language-independent feature at a time is added to the system. Only after the upwards conversion has succeeded is it possible to go to the next feature. |
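The gating described above can be sketched as a simple loop. This is an illustration only: `roll_out`, the `convert` callback and the feature names are hypothetical, and the callback stands in for whatever upward conversion the real system performs.

```python
def roll_out(features, convert):
    """Enable features one at a time; stop at the first failed conversion.

    convert receives the list of features that would be active and returns
    True when the upward conversion succeeds with that set.
    """
    enabled = []
    for feature in features:
        if not convert(enabled + [feature]):
            break                      # do not move on to the next feature
        enabled.append(feature)
    return enabled


# Hypothetical features and a stub conversion that fails on "footnote".
features = ["table", "reference", "footnote", "figure"]
enabled = roll_out(features, lambda active: "footnote" not in active)
```

The loop stops at the first failure instead of skipping the failing feature, so every later feature waits until that conversion succeeds.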
Language by language implementation |
|
Here, the new feature is first tested for a set of languages. After that, it can be extended to the others. |
The conversion of legacy documents |
|
Versioning of multilingual documents |
|
Versioning |
|
The goal of versioning is to be able to retrieve different versions in the course of the life-cycle of a document. |
Versioning for multilingual documents |
|
For multilingual documents, the notion of versioning becomes less self-evident, and the question of consistency is more difficult to answer. |
Versioning |
|
Consistency |
|
Differences among versions |
|
Conclusion |
|
The quality of a document may be hard to quantify, but as a tool for improving that quality, synoptism clearly adds value. |