
 
 

Building an SGML-based Publishing Environment


 
Eoin Campbell
  Technical Director
  SGML Workshop Ltd
4 Greenmount Office Park
Harolds Cross
Dublin 6W, Ireland
Phone: +353 1 4547811
Fax: +353 1 4547810
Email: ecampbell@sgmlw.ie
 
Biographical notice:
 
Eoin Campbell
 

Eoin is the Technical Consultant to the Foundation on both the EIRO project and the Foundation's project to move all publications to an SGML-based production environment. He established the initial technical infrastructure, provides ongoing technical support to the EIRO editorial team and contributors, and liaises closely with the web site developers and typesetters. Since graduating from Trinity College, Dublin, in 1984 with a B.A. and an MSc. in Computer Science, Eoin has worked as a software developer and technical writer in the telecommunications, electronics, software localisation and publishing industries, among others.
 
Barbara Schmidt
  Information Systems Officer
  European Foundation for the Improvement of Living and Working Conditions
Wyattville Road
Shankill
Co. Dublin
 Ireland
Phone: +353 1 2043213
Fax: +353 1 4547810
Email: barbara.schmidt@eurofound.ie Web: www.eiro.eurofound.ie
 
Biographical notice:
 
Barbara Schmidt
 
Barbara has been the Information Systems Officer at the European Foundation for the Improvement of Living and Working Conditions in Dublin, Ireland, since 1991. She has been involved in both projects described in this paper from the outset, and has promoted the use of SGML within the Foundation since 1991. As part of the planning team, she was responsible for the systems specification of the EIRO database, and she is currently involved in the contract management of this application. She is also part of the team responsible for building up the Foundation's SGML full-text repository of all future Foundation publications.
 
Previously, Barbara worked in a number of specialised library and documentation establishments (both public and private) in Germany, following her graduation as "Diplom-Dokumentarin" from the FHBD in Cologne in 1986. In 1997, Barbara started work on a part-time Masters thesis by research at the University of Central England in Birmingham, on the subject of "establishing an electronic fulltext repository 'knowledge resource database' based on SGML".
 
ABSTRACT:
 
This paper presents two case studies of multilingual projects that created integrated electronic publishing environments for publications delivered both on paper and online.
 
 

Introduction

 
The European Foundation for the Improvement of Living and Working Conditions, located in Co. Dublin, Ireland, was set up in 1975 "to contribute to the planning and establishment of better living and working conditions through action designed to increase and disseminate knowledge" (Council Regulation (EEC) No. 1365/75, Art. 2). It is governed by a quadripartite Administrative Board drawn from employer and trade union organisations at European and national level, the national governments of the Member States, and the Commission of the European Communities.
 
 

Goals

 
The Foundation's "raison d'être" is to help the policy makers and decision takers of Europe and the EU Member States to work towards the continuous improvement of the living and working conditions of the people of the European Union. The Foundation's recommendations are absorbed into the thinking of policy makers at European, national and local levels. Its programmes to date have spanned themes from social cohesion to work-related issues to the environment. The Foundation's continuous research and information exchange provide the Community with a scientific basis for developing medium- and long-term policy for the improvement of social and work-related matters. It also offers other interested parties an impartial reference and a forum for discussion on these subjects.
 
 

Activities

 
The production, discussion and dissemination of information are the key elements around which the Foundation's activities are built. As a result, the Foundation has become a leading authority on matters relating to living and working conditions. Its international staff direct the work of a variety of research teams across Europe and publish the results, both for the use of Community bodies and for general use by those active in the field of living and working conditions. Further data collection is made possible by the Foundation's library and information services, which take part in programmes of information exchange with bodies across Europe and the rest of the world. Additional contributions come from the development of computerised databases and from continuing co-operation with EU bodies such as the Commission, the Parliament and the Economic and Social Committee.
 
 

Research

 
The generation of knowledge is particularly evident in the research projects, the aim of which is to provide information to assist the development of medium and long term EU policy. The information gathered in these projects is made available to anyone with an interest in the field. The research is carried out independently by experts and networks throughout Europe under the direction of the Foundation's staff.
 
The aims of the Foundation's research are to identify and analyse new and existing problems and their causes, to quantify their scale and impact, and to search for solutions. Methods may involve surveys, case studies, action research projects, conferences, seminars and workshops, networks, databanks or combinations of these. Closely related to such projects is the need to monitor and evaluate their progress and outcomes on an ongoing basis. New methods and approaches are developed so that the Foundation's contribution to confronting the key challenges for European society can be assessed in a way that closely involves the key audiences for its work.
 
 

Publications

 
The Foundation publishes nearly 400 titles per annum in all 11 EU languages. In 1997, 139 English-language titles were published, many of which were translated into one or more other languages. These print publications broadly consist of the following types:
  • Research reports (national and EU-consolidated);
  • Periodicals on specific topics;
  • Newsletters, brochures, summaries, etc;
    Since 1991 the Foundation has been involved in the development of electronic products. Its electronic publishing efforts include both the "hybrid" publishing of research outputs in print and electronic formats, where appropriate, and the development of dedicated information systems. The following specialised online and offline databases and data collections have been developed over recent years:
     
    EMIRE  (Employment and Industrial Relations in Europe). Electronic version of the series "European Employment and Industrial Relations Glossaries", published in English and in the respective national language of each EU Member State.
     
    HASTE  (Health and Safety in Europe). A "meta database" on health and safety monitoring systems in the EU Member States, developed in 1993/1994 and published once as an offline database on floppy disk.
     
    ELCID  (European Living Conditions Information Directory). A collection of information sources (institutions, organisations, publications, electronic sources) on living conditions in the EU Member States, complementary to the EURES project of the European Commission. It was available online on ECHO from 1995 to 1997 and will be re-launched as a web-based database on the Foundation's website during 1998. ELCID was the first Foundation database for which a dedicated DTD was developed.
     
    EIRO  (European Industrial Relations Observatory). A genuinely integrated SGML publishing project, described in more detail below.
     
    Since 1996 the Foundation has also been developing Internet-based electronic information services, such as EIRO and the Foundation's general website at http://www.eurofound.ie/. The Foundation follows these electronic publishing principles:
  • open systems, open access
  • platform independence
  • multiple delivery formats
  • repackaging of information
    SGML was therefore the obvious choice to fulfil these objectives, and the Foundation has tried to apply SGML principles since its first steps in database publishing with EMIRE in 1991. Two case studies from the Foundation's electronic publishing experience have been selected for this paper. They were chosen because they share the following features:
  • both rely on EU-wide networks of contributors, which presents a number of inherent management problems and challenges;
  • both have paper and electronic outputs;
  • both are based on SGML technology;
  • both deal with the topic of industrial relations.
     

    Case Study 1: European Employment and Industrial Relations Glossaries and Database (EMIRE)

     
    The Foundation initiated this project in 1990 in response to a demand for more information on employment and industrial relations issues throughout the European Union. It comprises a series of some 15 English-language glossaries, one for each Member State of the EU, each containing several hundred entries that clearly explain the key employment and labour relations issues, terms and concepts of the Member State in question. The aim is to improve knowledge of the industrial relations infrastructures and procedures specific to each country, especially among readers outside that country. The glossaries are designed for anyone seeking a comprehensive overview of employment and industrial relations practices throughout the European Union, and in particular for briefings on any one country. Each glossary in the series is published in English and in the respective national language(s). In addition to the printed versions, the contents are also intended for online delivery in the form of the EMIRE database ("Employment and Industrial Relations in Europe"). (EMIRE was available online from 1991 to 1997 via the ECHO database host, and will be re-launched on the Internet during 1998 as part of the Foundation's general website development.)
    Country          Number of languages
    United Kingdom   1 (EN)
    Italy            2 (EN, IT)
    Spain            1 (EN)
    Germany          2 (EN, DE)
    Greece           2 (EN, GR)
    Portugal         1 (EN)
    Belgium          3 (EN, FR, NL)
    France           1 (EN)
    Netherlands      1 (EN)
    Ireland          1 (EN)
    Luxembourg       in preparation
    Denmark          in preparation
    Austria          not yet available
    Finland          not yet available
    Sweden           not yet available
     
     

    Project Organisation

     
    The authors form an EU-wide network of national research teams, one for each country, led by prominent industrial relations experts and/or labour lawyers. Each team is responsible for submitting a glossary for its own country, containing several hundred definitions of terms, concepts, key organisations and provisions in that country's industrial relations. Each glossary entry should consist of an identification number, the glossary term, a definition of variable length (typically between 200 and 500 words), and cross-references where appropriate. Authors were required to deliver their manuscripts in electronic format in their original language (minimum requirement: DOS, 8-bit ASCII character set), and to submit the files on floppy disk.
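To make the entry structure described above concrete, the following Python sketch models one glossary entry and reports departures from that structure. The `GlossaryEntry` class, its field names, the `check_entry` function and the sample identifiers are all hypothetical illustrations, not part of the EMIRE system. Because the text says definitions are "typically" 200 to 500 words, the sketch reports length only as a warning rather than an error.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One glossary entry (hypothetical field names mirroring the text)."""
    entry_id: str
    term: str
    definition: str
    cross_refs: list[str] = field(default_factory=list)


def check_entry(entry: GlossaryEntry, known_ids: set[str]) -> list[str]:
    """Return human-readable problems found in one entry; empty means none."""
    problems = []
    if not entry.entry_id.strip():
        problems.append("missing identification number")
    if not entry.term.strip():
        problems.append("missing glossary term")
    words = len(entry.definition.split())
    # 200-500 words is only the typical length, so this is reported as a warning.
    if not 200 <= words <= 500:
        problems.append(f"warning: definition has {words} words (typical range 200-500)")
    # Every cross-reference must point at an entry that exists in the glossary.
    for ref in entry.cross_refs:
        if ref not in known_ids:
            problems.append(f"cross-reference to unknown entry {ref!r}")
    return problems


if __name__ == "__main__":
    entry = GlossaryEntry("IE-042", "Labour Court", "word " * 250, ["IE-007", "IE-999"])
    for problem in check_entry(entry, known_ids={"IE-007", "IE-042"}):
        print(problem)  # prints: cross-reference to unknown entry 'IE-999'
```

Checks of this kind only become enforceable once entries arrive in a predictable structure, which is exactly what the EMIRE manuscripts lacked.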
     
    A "scientific editor" was appointed to take responsibility for the accuracy and coherence of the glossary series as a whole. As regards content, the delivered manuscripts are subjected to a standard evaluation process carried out by nominated representatives of the national Social Partner organisations. The original-language manuscripts undergo a thorough translation and revision process, in which the most suitable English translations of national concepts are found in close consultation with the authors. The "international series" of glossaries is co-published with a leading international publishing house, and this edition simultaneously becomes part of the electronic version of the glossaries, the EMIRE database.
     
     

    Problems encountered and lessons learnt

     
    In practice, a number of management factors in the Glossary project led to problems further downstream in the production process. The problems arose in three areas: the network of authors, the manuscripts themselves, and editorial control. The lessons learnt from the EMIRE experience resulted in an improved methodological approach for subsequent projects, especially the EIRO project (Case Study 2).
     
     

    Training of authors and enforcement of guidelines

     
    EMIRE situation: Detailed written guidelines had been drawn up and circulated to the network of authors, describing how entries should be structured and keyed in; typographic features such as bold, italics and capitalisation were to indicate the structural elements of an entry (e.g. glossary entry, cross-reference, organisation). However, the guidelines were not sufficiently enforced through the contracts, so the authors regarded them as "voluntary" and in many cases ignored them. Apart from issuing the guidelines, no further training or active support was offered to help the authors comply. Some attempts to exert more central control over formatting met with reluctance or resistance from the authors.
     
    Lesson: To make it easier for authors to meet their contractual obligations, more proactive measures were recognised to be necessary. Simply issuing guidelines and hoping for the best was not enough. It has since proven more successful to bring authors together for training sessions, backed up by a help desk (via e-mail and telephone) and guidance at various levels.
     
     

    Manuscripts: DTDs and templates

     
    EMIRE situation: No common style or structure was applied to manuscripts across the network. In practice, each manuscript used its own style, and there was no recognition of the importance of structure, despite the guidelines, which had made this quite clear. At the level of content, there was no overall consistency of concepts and terms, because there was no core list of concepts to be covered by all countries. This made cross-country comparisons difficult, although they could have been a worthwhile feature of such a project. Despite the contractual stipulation that 8-bit ASCII files were to be submitted, the Foundation in reality received a variety of formats, depending on whatever the authors happened to have available: a mixture of Word, WordPerfect, DOS and Macintosh files, in different versions. In one extreme case no electronic version was delivered at all, only a typed paper manuscript. Because of this non-compliance, character set problems, among others, were encountered further downstream in the production process, with particular consequences for the electronic version.
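The character set problems are easy to reproduce. An 8-bit file carries no label saying which code page it was written in, so the same byte can stand for different characters on DOS, Windows and Macintosh machines. The Python sketch below decodes one byte string under several code pages provided by Python's standard codecs; the choice of candidate code pages and the function names `decode_candidates` and `is_ambiguous` are assumptions made for illustration.

```python
from __future__ import annotations

# Assumed candidates: code pages commonly found on DOS, Windows and Macintosh systems.
CANDIDATE_CODEPAGES = ["cp437", "cp850", "cp1252", "mac_roman"]


def decode_candidates(raw: bytes) -> dict[str, str]:
    """Decode raw bytes with each candidate code page.

    Code pages that cannot decode the bytes (e.g. cp1252 has
    undefined byte values) are skipped.
    """
    results = {}
    for codec in CANDIDATE_CODEPAGES:
        try:
            results[codec] = raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return results


def is_ambiguous(raw: bytes) -> bool:
    """True if the candidate code pages disagree about what the bytes mean."""
    return len(set(decode_candidates(raw).values())) > 1


if __name__ == "__main__":
    raw = b"caf\x82"  # 'cafe' with e-acute, as saved under DOS code page 437/850
    for codec, text in decode_candidates(raw).items():
        print(f"{codec:10} {text}")
    print("ambiguous:", is_ambiguous(raw))
```

With the sample bytes, both DOS code pages yield "café", while Windows-1252 and Mac Roman each yield a different third character. Plain ASCII text decodes identically everywhere, which is why the problems only surfaced with accented national-language characters.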
     
    Lesson: There is a need for a formal DTD! In EMIRE, the only formalisation of structure was given in the guidelines, which were not complied with. The great variation in the structures applied caused particular problems, especially across country files. It was therefore soon recognised as indispensable to work in accordance with a properly defined Document Type Definition, and to parse manuscripts thoroughly against the DTD before any further processing. A second lesson was the need for simple templates that support the required structure. Rather than trying to convey the concept of structure through guidelines, we found that authors would benefit from practical tools such as a word-processing template, which reflects and enforces the necessary structure without a high degree of abstraction. The use of the template for manuscript submission was to be stipulated in contracts, as was electronic submission (e.g. via e-mail).
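Parsing against a DTD can reject a badly structured manuscript mechanically because an SGML content model is essentially a regular expression over element names. The Python sketch below illustrates that idea only; it is not an SGML parser. It assumes a hypothetical content model `(id, term, def, xref*)` for a glossary entry, hard-codes the equivalent regular expression, and checks the ordered child elements of each entry. All element names and identifiers are invented for the example.

```python
from __future__ import annotations

import re

# Hypothetical content model for a glossary entry, in SGML notation:
#   <!ELEMENT entry - - (id, term, def, xref*)>
# The same model as a regular expression over space-terminated element names.
ENTRY_MODEL = re.compile(r"id term def (?:xref )*")


def matches_model(children: list[str]) -> bool:
    """Check an ordered list of child element names against the entry model."""
    # Terminating every name with a space keeps 'id' from matching 'idx', etc.
    sequence = "".join(name + " " for name in children)
    return ENTRY_MODEL.fullmatch(sequence) is not None


def report(entries: dict[str, list[str]]) -> list[str]:
    """Return the identifiers of entries whose structure breaks the model."""
    return [entry_id for entry_id, children in entries.items()
            if not matches_model(children)]


if __name__ == "__main__":
    manuscript = {
        "DE-001": ["id", "term", "def", "xref"],
        "DE-002": ["id", "def"],  # glossary term missing
    }
    print(report(manuscript))  # prints ['DE-002']
```

A real SGML parser does far more (tag minimisation, attributes, entities), but the principle is the same: the structure is stated once, formally, and every manuscript is checked against it before further processing.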
     
     

    Lack of editorial control

     
    EMIRE situation: Despite the appointment of a "scientific editor" responsible for the coherence of the entire series, and a thorough evaluation procedure for the quality of the content, there was no adequate editorial procedure for quality control of structure or format. As a consequence, considerable pre-publication rework of the manuscripts became necessary. As regards payment of the authors (signing off), only delivery of the content was considered a criterion for fulfilment of the contract; formal criteria such as adherence to the guidelines or to the stipulated formats were not considered at this point. These problems were usually discovered only during the production process, when it was no longer possible to refer the work back to the authors.
     
    Lesson: Enforce format quality contractually. To ensure that authors comply with guidelines and stipulations, it is absolutely necessary to make compliance a condition of payment and to anchor it properly in the contracts with authors. Signing off the work therefore has to take these provisions into account.
     
    These difficulties and lessons have taught the Foundation how to organise an ambitious integrated electronic publishing project better. In similar projects following the glossaries, most notably EIRO (Case Study 2), the lessons from the EMIRE experience were implemented, leading to a much more successful operation.
     
     

    Case Study 2: European Industrial Relations Observatory (EIRO)

     
    The European Industrial Relations Observatory (EIRO) is based on a network of leading research institutes in each of the countries of the European Union, at EU level, and in Norway. EIRO is co-ordinated by the European Foundation in Dublin, where the central Editorial Unit is also based. It is essentially a collaborative publishing project, with correspondents in National Centres in each country contributing a number of articles each month to the growing body of information that makes up the EIRO database.
     
    The purpose of EIRO is to collect, analyse and disseminate high-quality and up-to-date information on key developments in industrial relations in Europe, primarily to serve the needs of a core audience of national and European-level organisations of the social partners, governmental organisations and EU institutions. However, the information is also disseminated to a much wider audience through a number of distribution media, and is publicly available free of charge to all citizens of the EU.
     
    The underlying concept of EIRO existed in a previous project funded by the European Commission in the late 1980s, which fell into abeyance. The present EIRO project began with preliminary scoping and definition studies in 1994 and 1995, and formally started in the latter part of 1996, with the appointment of a full-time editor and the selection of correspondents in each of the participating countries. The publishing strategy was also defined at this time, with the help of the UK publishing consultants PIRA. In early 1997 the first EIRO product, the EIRObserver print bulletin, began publication as a bi-monthly magazine, and a limited-access database and associated Web site were developed, containing the complete collection of information gathered by EIRO. In January 1998 the EIROnline web-site (http://www.eiro.eurofound.ie/) was made publicly available and formally launched by the EU Commissioner for Employment and Social Affairs, Padraig Flynn.
     
     

    Project Organisation

     
    The EIRO production process begins with the National Centres (NCs), which produce a number of articles each month for submission to the EIRO database. The topics of the articles are decided in consultation with the chief editor of EIRO, who tries to assemble a range of articles on a similar theme from each of the NCs. The articles are written using an MS-Word template defined specifically for EIRO, and emailed to the Editorial Unit twice a month. There are two types of article: short news items of about 400 words, and longer feature articles of about 1,000 words. Articles must be delivered in English (optionally with the original in the country's native language). The English text becomes part of the SGML database, while the original-language article is stored in the database in MS-Word form.
     
    Once received by the Editorial Unit, articles are converted into SGML using a WordBasic macro, and from this point on all editing is done in a structured SGML editor (Author/Editor), using the EIRO Record DTD. When the articles for a particular month are finalised (usually in batches), they are uploaded to the EIROnline web-site and published. Every second month, a selection of the most interesting and relevant recent articles is assembled (using the EIRO Bulletin DTD) into the EIRObserver bulletin, which is printed and posted to a range of readers across the EU. An electronic version of the bulletin (in Adobe Acrobat format) is emailed to an extended readership who have requested it by registering their email addresses, and is also made available for download from the web-site.
     
     

    Lessons from EMIRE

     
    The EMIRE experience led to a number of significant measures being taken in the EIRO project to ensure a smooth production process. A dedicated team was assembled to form the central Editorial Unit, and take responsibility for the success of the project. This consists of a full-time editor and administrative assistant, working on-site at the Foundation, and a part-time technical consultant, to provide SGML expertise and general technical support to both the Editorial Unit and the National Centres, and liaise with the contractors building the database and Web site.
     
    Formal SGML DTDs were defined before start-up, for both the individual records and the paper bulletin. Based on the EIRO Record DTD, an MS-Word template was defined whose paragraph and character-level styles closely matched the DTD. A number of WordBasic macros were added to provide some structural validation of the EIRO records in MS-Word format, and to present dialog boxes for configuring the local environment and filling in record-specific details such as the author and record type.
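The structural validation performed by the WordBasic macros can be pictured as a check on the sequence of paragraph styles before a record leaves the National Centre. The Python sketch below is a stand-in for such a macro, not a transcription of the EIRO one: the style names, the `TEMPLATE_STYLES` set, the `validate_styles` function and its rules (a record opens with exactly one title, every paragraph uses a template style rather than a default style such as "Normal", and no styled paragraph is empty) are all assumptions for illustration. The document is represented as a list of (style, text) pairs.

```python
from __future__ import annotations

# Hypothetical paragraph styles defined by an EIRO-like template.
TEMPLATE_STYLES = {"RecordTitle", "Summary", "Heading", "BodyText"}


def validate_styles(paragraphs: list[tuple[str, str]]) -> list[str]:
    """Check (style, text) pairs taken from a word-processor document.

    Returns a list of problems; an empty list means the document passes.
    """
    if not paragraphs:
        return ["document is empty"]
    problems = []
    if paragraphs[0][0] != "RecordTitle":
        problems.append("first paragraph must use the RecordTitle style")
    titles = sum(1 for style, _ in paragraphs if style == "RecordTitle")
    if titles > 1:
        problems.append(f"found {titles} RecordTitle paragraphs, expected one")
    for number, (style, text) in enumerate(paragraphs, start=1):
        if style not in TEMPLATE_STYLES:
            # Text typed in a default style has no counterpart in the DTD.
            problems.append(f"paragraph {number} uses non-template style {style!r}")
        elif not text.strip():
            problems.append(f"paragraph {number} ({style}) is empty")
    return problems


if __name__ == "__main__":
    draft = [("Normal", "Untitled"), ("BodyText", "")]
    for problem in validate_styles(draft):
        print(problem)
```

Running a check like this at the author's desk, rather than in the Editorial Unit, is what lets problems be corrected before delivery instead of downstream.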
     
    A formal two-day training course was given to representatives of all EIRO National Centres, covering both the editorial and technical aspects of the project. The technical training introduced the EIRO MS-Word template and showed the use of all the style elements, the structure validation macro, and even the use of FTP as a backup transfer method should email delivery fail. After the first six months of operation this was supplemented by a further day of training, including individual reviews of the material delivered to correct particular problem areas. In general, the NCs responded very positively to these measures, and both the timeliness of delivery and the quality of the technical mark-up of the material have exceeded our expectations.
     
    Finally, to ensure that the National Centres provide the level of quality required for their material to be processed efficiently, the legal contract agreed by the NCs contains a number of clauses enforcing compliance with the delivery requirements of the project. This was a new departure for both the Foundation and the NCs themselves. The timeliness of delivery, the method, the format and the tools are all set out contractually, to remove any room for misunderstanding: monthly delivery, by email, of English-language MS-Word documents formatted with the defined templates.
     
     

    Technology and Tools

     
    The technical environment supporting the EIRO project consists of a mixture of off-the-shelf products and custom developments. NCs use a standard word processor, augmented by a custom template and macros. Editorial Unit staff use a standard SGML editor to work within the EIRO DTDs. Typesetting is done with Advent 3B2 typesetting software, which reads SGML directly and formats it using an EIRObserver style sheet. The database behind the web-site uses the POET object database as the record repository, augmented by programs written mainly in Java to import records into the database and convert them to HTML for Web viewing. Muscat search engine software provides the search facility, with some customisations to interface with POET and the Web and to present results in a structured fashion. Finally, a number of custom WordBasic and Perl scripts are used within the Editorial Unit for various batch processing jobs, such as conversion and validation of the records.
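The conversion of records to HTML for Web viewing can be sketched as a mapping from record elements to HTML elements. The production programs were written mainly in Java; the Python sketch below only illustrates the approach. It assumes the record is fully tagged (no omitted tags), so that it can be parsed as XML with the standard library, and the element names, the `ELEMENT_TO_HTML` mapping and the `record_to_html` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from record elements to HTML elements.
ELEMENT_TO_HTML = {"title": "h1", "summary": "p", "head": "h2", "p": "p"}


def record_to_html(record_markup: str) -> str:
    """Render a fully tagged record as a minimal HTML page.

    Child elements of the root are rendered in document order; elements
    with no entry in ELEMENT_TO_HTML are left out of the page.
    """
    root = ET.fromstring(record_markup)
    title = root.findtext("title", default="Untitled")
    body = []
    for child in root:
        html_tag = ELEMENT_TO_HTML.get(child.tag)
        if html_tag is None:
            continue
        # itertext() flattens any inline markup inside the element to plain text.
        text = escape("".join(child.itertext()))
        body.append(f"<{html_tag}>{text}</{html_tag}>")
    return ("<html><head><title>" + escape(title) + "</title></head><body>"
            + "".join(body) + "</body></html>")


if __name__ == "__main__":
    record = "<record><title>Pay talks</title><p>Unions and employers met.</p></record>"
    print(record_to_html(record))
```

Keeping the mapping in one table means a change to the Web presentation touches only the converter, never the stored SGML records.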
     
    A number of approaches to Word-to-SGML conversion were investigated, including a commercial conversion product (OmniMark) and an add-on to MS-Word enabling SGML authoring (SGML Author for Word). However, since the word-processing styles had been deliberately designed to map closely to SGML elements, it proved most effective to develop a custom WordBasic macro for this particular conversion problem.
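Because each template style corresponds to one SGML element, the conversion reduces largely to a table lookup plus escaping of markup-significant characters. The Python sketch below illustrates that idea; it is not the WordBasic macro used by the Editorial Unit. The style names, element names, the `record` wrapper element and the `to_sgml` function are hypothetical, and the escaped output assumes the DTD declares the `amp`, `lt` and `gt` entities (as the ISO entity sets do).

```python
from __future__ import annotations

from xml.sax.saxutils import escape

# Hypothetical one-to-one mapping from template paragraph styles to SGML elements.
STYLE_TO_ELEMENT = {
    "RecordTitle": "title",
    "Summary": "summary",
    "Heading": "head",
    "BodyText": "p",
}


def to_sgml(paragraphs: list[tuple[str, str]], root: str = "record") -> str:
    """Convert (style, text) pairs into fully tagged SGML markup.

    Raises KeyError for a style with no mapping, so text in an
    unexpected style is caught instead of being silently converted.
    """
    lines = [f"<{root}>"]
    for style, text in paragraphs:
        element = STYLE_TO_ELEMENT[style]
        # Escape &, < and > so that article text cannot be mistaken for markup.
        lines.append(f"<{element}>{escape(text)}</{element}>")
    lines.append(f"</{root}>")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = [("RecordTitle", "Pay talks resume"), ("BodyText", "Unions & employers met.")]
    print(to_sgml(doc))
```

The simplicity of this mapping is the point made above: once the template styles mirror the DTD, a general-purpose conversion tool adds little.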
     
    The EIRO National Centres have little difficulty in using the templates, although some problems were encountered in installing them on the various language editions of MS-Word being used. The WordBasic macros were originally written for an English edition, and needed some changes to run correctly on other language editions. Recently, further changes were made to support Word 97.
     
     

    Status and Future Development

     
    EIRO has progressed from a concept in late 1996 to a prime industrial relations resource by early 1998. It is considered an outstanding success, not only in the context of the industrial relations environment, but also as an excellent example of an integrated electronic publishing infrastructure. It allows efficient processing of considerable quantities of information by a small staff, and fast publication of that information on both electronic and paper media.
     
    Initially, it was planned that the paper bulletin would be published monthly in 1998. However, such has been the success of the web-site that the bulletin will remain a bi-monthly product, to ensure that the web edition receives the resources it requires. A number of new features are planned to enhance its accessibility for a non-English-speaking audience and to improve the range of services available to the industrial relations community.
     
    For the European Foundation, the success of the EIRO project has been the spur to work towards implementing a similar production process for all its publications. All reports will be delivered in structured word-processor file format, converted into SGML, and stored and maintained in an internal document database. From the database, individual reports will be published in an appropriate medium, whether paper, Acrobat, or Web formats, depending on the nature and value of the work. In addition, it is hoped the database will become an internal tool for searching for information, for enriching existing information with references to other material, and for developing new products from the catalogue of material.
     
    Bibliography
    European Foundation. Perspectives on the future: the work of the European Foundation for the Improvement of Living and Working Conditions (brochure). 1992. ISBN 92-826-5035-9; SY-76-92-908-EN-C.
    European Foundation. Improving quality of life for Europeans.
    European Foundation. European Employment and Industrial Relations Glossary series. Sweet & Maxwell; Office for Official Publications of the European Communities.
    European Foundation. EIRObserver: European Industrial Relations Observatory. ISSN 1028-0588.
