
 
 

Plato, SGML and revolution


 
Rafal Ksiezyk
SGML consultant
Polish Scientific Publishers
10 Miodowa Street
00-251 Warsaw, Poland
Phone: +48 22 6954 308
Fax: +48 22 6954 288
Email: ksiezyk@fuw.edu.pl
Web: www.pwn.com.pl
 
Biographical notice:
 
Rafal Ksiezyk
 
Rafal Ksiezyk is an SGML consultant and information engineer at Polish Scientific Publishers. He was responsible for implementing the SGML-based editorial system for the Encyclopaedia Department. His interests also include DTD writing and user training. He carried out the key conversions from typesetting formats to SGML.
 
He is the author of several newspaper articles popularising SGML technology in Poland and maintains the Web page "SGML in Poland" at http://www.fuw.edu.pl/~ksiezyk/sgml.html.
 
Rafal graduated from the Department of Physics of Warsaw University. He is now carrying out Ph.D. research on stochastic signal analysis.
 
ABSTRACT:

This paper presents the experience of the largest encyclopaedic publisher in Poland and eastern Europe in a revolutionary implementation of an SGML-based editorial system. It describes the idea of the so-called Mother Encyclopaedia, which stores reference data for all current and future publications. The approach uses microdocument techniques but goes further in applying them to reference publishing.

Keywords: encyclopaedia, knowledge base, microdocuments, publishing
 
 

Introduction

 

Insurrection is a Polish speciality. If you need an example from outside politics, have a look at the SGML implementation at Polish Scientific Publishers.
 
At this publisher, the largest encyclopaedia publisher in eastern Europe, the loud kingdom of typewriters was suddenly attacked by PCs during 1995. Then the first DTDs were written, and an editorial system based on MS Access, with a Polish SGML editor, was developed. There were long discussions and conversions through the night. And finally, in 1996, we woke up one morning in a free SGML world.
 
It sounds easy, but the life behind it was much more difficult.
 
 

Objectives

 
Nowadays there is less and less time, and more and more facts, for publishing a new encyclopaedia. During the production cycle, editors work with a large amount of information. It is worth getting more out of this effort than just one new publication.
 
 

One

 
SGML, you say? A difficult task. An encyclopaedia tries to address all the variety of the world, so it is extremely difficult to find a schema suitable for every case. And the contradictory character of the world is reflected in its texts. This is the challenge: to SGML the world. But the high density of information here is worth civilising. Employing SGML helped us shorten the production cycle and made it possible to publish the same material on several media.
 
The second requirement was to build a new publication on the basis of an existing one. This principle is well known. But in an encyclopaedia you cannot simply reuse an old article unchanged; cut & paste is not in fashion. And life around you does not wait until you finish: new kingdoms arise and old ones fall.
 
 

Two

 
Western publishing houses such as Britannica or Elsevier have grown their knowledge bases continuously since the 19th century, without breaks. They have parts that have not needed modification for a century. In such an environment you can produce new publications in an evolutionary way. For us, life was not so kind.
 
In 1990 we threw out our encyclopaedias, which were not very old but were written in newspeak, and started to write a new one from scratch, because our world had changed. No re-editing was possible; we were learning the new world with a pencil in hand (read: a keyboard under our fingers). And the target was moving. Once, in the middle of the second volume of a 6-volume encyclopaedia, we faced the tremendous problem of the break-up of the Soviet Union. We were lucky that "Soviet Union" begins with "Z" in Polish (Związek Radziecki), so its entry fell in a later volume. Even among encyclopaedists, in the heart of knowledge, some questions were difficult to answer. And where to look for an answer? In an encyclopaedia? Yes! But which one? We wanted one super-encyclopaedia.
 
Even without such great breaks in history, there are a lot of facts to be tracked. During a heavy (read: normal) production cycle, when many new editions are under way, it is difficult to say which publication holds the most valid information. None of them can be the source of sources. And you cannot maintain data for all publications separately at the same time.
 
 

One + Two > Three

 
One problem alone is not a problem. But when the two above meet, you cannot work a single day without a clear source of up-to-date information. We decided to establish the concept of a so-called Mother Encyclopaedia.
 
 

Idea

 
 

Plato

 

The idea of the ME (Mother Encyclopaedia) is borrowed from Plato. In the ME we place SGML instances of the ideas of all articles that may appear, or have already appeared, in real encyclopaedias. The real articles are the shadows on the wall of Plato's cave, cast by the ideas in the ME. They can differ from publication to publication, but the original is the same. Since articles in the ME have no standard body (they are pure ideas), they are linked to their children in particular publications; the children thus define them.
 
 

What are the benefits

 
 

Identification

 
Thanks to the ideas of articles, we can easily identify corresponding articles across encyclopaedias, because each one points to its idea in the ME. The traditional approach was to link articles from two or more publications directly. This resulted in a hardly manageable, half-linked set of entries with a high percentage of orphans, caused by different headwords for what was de facto the same article, or by a new article being placed under an existing headword. The new model helps us avoid this problem.

 
Figure: Relationship schema between articles in the former approach
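A minimal Python sketch of identification through the ME follows, assuming each child article simply records the ID of its ME idea, as the MEREF attribute does in the Warsaw example in the Implementation section. The `Article` class, the `group_by_idea` function, the publication names and the Krakow entry are hypothetical. Articles are grouped by the idea they point to instead of being linked pairwise, so different headwords for the same article no longer break the correspondence, and unlinked articles show up as orphans.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Article:
    publication: str          # hypothetical publication name
    headword: str
    meref: Optional[str]      # ID of the idea in the ME, or None if not yet linked


def group_by_idea(articles):
    """Group child articles by the ME idea they point to.

    Returns (groups, orphans): groups maps an ME ID to the articles that
    shadow that idea; orphans are articles without any ME reference.
    """
    groups = defaultdict(list)
    orphans = []
    for article in articles:
        if article.meref is None:
            orphans.append(article)
        else:
            groups[article.meref].append(article)
    return dict(groups), orphans


articles = [
    Article("Large edition", "Warszawa", "ME1"),
    Article("Small edition", "Warsaw", "ME1"),    # different headword, same idea
    Article("Small edition", "Krakow", None),     # not yet linked to the ME
]
groups, orphans = group_by_idea(articles)
for me_id, members in groups.items():
    print(me_id, [(a.publication, a.headword) for a in members])
print("orphans:", [a.headword for a in orphans])
```

With pairwise links, every new publication has to be linked to every existing one; here each article needs only one reference, to its idea, and an unlinked article is visible as an orphan instead of being hidden in a half-linked set.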

 
 
 

Easy update

 
With articles in the ME we can store and maintain a number of core data items whose updating was always a problem in the traditional environment. Examples of such data are dates of birth and death of persons, numbers of inhabitants of countries, cities and villages, lengths of rivers and heights of mountains, but also the pronunciation of headwords and the headwords themselves (when transcribed from foreign languages). Any news connected with the subject of an article may be appended to that article's idea in the ME, where it is in the right place and easy to find.

 
Figure: Relationship schema between articles in the current approach
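The following minimal Python sketch shows core data kept once per idea, together with the date each figure refers to, so that every publication reads the same current value. It assumes an in-memory model rather than the SGML storage the paper actually uses; the class and method names (`MEIdea`, `record`, `current`, `append_news`) are hypothetical. The Warsaw figures come from the example ME entry in the Implementation section; the later figure and the news item are placeholders, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class DatedValue:
    value: str
    date: int                 # year the figure refers to


@dataclass
class MEIdea:
    me_id: str
    headwords: list[str]
    hard_data: dict[str, list[DatedValue]] = field(default_factory=dict)
    news: list[str] = field(default_factory=list)

    def record(self, name: str, value: str, date: int) -> None:
        """Store a new dated value; older values are kept as history."""
        self.hard_data.setdefault(name, []).append(DatedValue(value, date))

    def current(self, name: str) -> DatedValue:
        """Return the most recent value of a core datum."""
        return max(self.hard_data[name], key=lambda dv: dv.date)

    def append_news(self, item: str) -> None:
        """Attach a news item to the idea, next to the data it may affect."""
        self.news.append(item)


warsaw = MEIdea("ME1", ["Warsaw", "Warszawa", "Varsovia"])
warsaw.record("INHABITANTS", "1.6 mlns", 1996)
warsaw.record("SURFACE", "495 km2", 1997)
# Hypothetical later update: placeholder value and year, not a real figure.
warsaw.record("INHABITANTS", "<revised figure>", 1998)
warsaw.append_news("<hypothetical news item about Warsaw>")
print(warsaw.current("INHABITANTS"))
```

Because history is kept rather than overwritten, a publication that printed an earlier figure can still be traced to the value it used.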

 
 
 

Reuse

 
The Mother's material is never published; it is used only internally. Because the DTDs of an ordinary encyclopaedia and of the ME are compatible at the low structural level, interchanging fragments between publications is easy. Subtrees containing the headword description, pronunciation and other hard data become microdocuments whose reuse can be automated in several ways. News items are recorded in the ME using the news fragment of the DTD.
 
 

Implementation

 

We found that SGML, not relational database tables, should be the storage format for the core data. The reason is the textual character of our information and the high variability of its form. The DTD for ME objects is compatible at the low structural level with the publications' doctypes, which enables interchange. Chunks of data thus become microdocuments that can be used to build other articles. Once the ME is built, this is where an editor can find the hard data he needs. Some of it (e.g. headwords or pronunciation, whose correctness must be guaranteed) is inserted into new articles and made read-only for additional safety.
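A minimal Python sketch of reusing an ME subtree as a read-only microdocument follows. It assumes the markup is close enough to XML to be handled by the standard `xml.etree.ElementTree` module; real SGML with omitted tags or other SGML-only features would need an SGML parser. The ME fragment is a trimmed copy of the Warsaw example below, while the function names and the `READ_ONLY_TAGS` policy (including the `PRON` element) are hypothetical. The subtree is copied into the new article, its `ID` is replaced by a `MEREF` pointing back to the origin, and edits to guarded elements are refused.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical policy: linked elements whose correctness must be guaranteed.
# PRON is an assumed pronunciation element; HEAD is taken from the example.
READ_ONLY_TAGS = {"HEAD", "PRON"}

# A trimmed copy of the Warsaw ME entry, written as well-formed XML.
ME_SOURCE = """
<ME.ENTRY ID="ME1">
  <HEAD.VARIANTS>
    <HEAD ID="ME1.1">Warsaw</HEAD>
    <HEAD ID="ME1.2">Warszawa</HEAD>
  </HEAD.VARIANTS>
</ME.ENTRY>
"""


def index_by_id(root):
    """Map every ID attribute in an ME entry to its element."""
    return {el.get("ID"): el for el in root.iter() if el.get("ID")}


def insert_microdocument(me_index, me_id):
    """Copy an ME subtree for use in a new article.

    The copy drops its IDs (they belong to the ME) and gains a MEREF
    pointing back to the original, so contact with the origin is kept.
    """
    piece = copy.deepcopy(me_index[me_id])
    for el in piece.iter():
        el.attrib.pop("ID", None)
    piece.set("MEREF", me_id)
    piece.tail = None          # whitespace after the original belongs to the ME
    return piece


def is_read_only(element):
    """Linked elements of a guarded type may not be edited in the child."""
    return element.tag in READ_ONLY_TAGS and element.get("MEREF") is not None


def edit_text(element, new_text):
    """Change an element's text unless the read-only policy forbids it."""
    if is_read_only(element):
        raise PermissionError(f"{element.tag} linked to {element.get('MEREF')} is read-only")
    element.text = new_text


me_index = index_by_id(ET.fromstring(ME_SOURCE))
head = insert_microdocument(me_index, "ME1.2")
print(ET.tostring(head, encoding="unicode"))
```

The deep copy leaves the ME itself untouched, and keeping the guard in the application rather than in the DTD lets the read-only policy vary by publication, as the next paragraph suggests it must.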
 
In practice, most material needs rewriting by the editor, so the only help we can offer is drag and drop. In our case the microdocument technique must be more flexible: hard linking where possible, otherwise soft and permissive linking. The policy here may depend on the application, the publication, time pressure or the reliability of your editors. Below is an example of a fictitious ME entry devoted to the idea of Warsaw.
 
<ME.ENTRY ID="ME1">
<HEAD.VARIANTS>
    <HEAD ID="ME1.1">Warsaw</HEAD>
    <HEAD ID="ME1.2">Warszawa</HEAD>
    <HEAD ID="ME1.3">Varsovia</HEAD>
</HEAD.VARIANTS>
<HARD.DATA>
    <INHABITANTS ID="ME1.4">
        <VALUE>1.6 mlns</VALUE> <DATE>1996</DATE>
    </INHABITANTS>
    <SURFACE ID="ME1.5">
        <VALUE>495 km<SUP>2</SUP></VALUE> <DATE>1997</DATE>
    </SURFACE>
</HARD.DATA>
</ME.ENTRY>
 
Consider a new publication with two entries: Warsaw (as a cross-reference to Warszawa) and the Polish name Warszawa. The linking would be done as follows:
 
<ENTRY MEREF="ME1">
    <HEAD>Warsaw</HEAD> <REFERENCE>Warszawa.</REFERENCE>
</ENTRY>

<ENTRY MEREF="ME1">
    <HEAD MEREF="ME1.2">Warszawa</HEAD>
    <BODY>capital city of Poland;
    <INHABITANTS MEREF="ME1.4"><VALUE>1.6 mlns</VALUE>
    <DATE>1997</DATE></INHABITANTS> of inhabitants; ...</BODY>
</ENTRY>
 
We assume three modes of information interchange between the ME and its children:
  • hard data from ME articles may be embedded in a new publication as live links, allowing updates on demand;
  • data from the ME may be used after adaptation to the style of the publication, in which case the new text only points to its origin;
  • using the link from the article currently being developed to its original in the ME, an editor can easily check the article against the centrally updated information, or contribute an update himself.

The application is responsible for handling links in the appropriate way.
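The modes above can be sketched in Python as follows, again assuming the markup is handled as XML with the standard `xml.etree.ElementTree` module. The sources are trimmed copies of the Warsaw examples; the function names and status labels are hypothetical. `check_links` covers the third mode, `refresh_live_link` the first, and the second mode needs no code, since an adapted text only keeps its MEREF pointer. Run on the Warsaw example, the check reports that the child's INHABITANTS element differs from its ME original, because the two carry different dates; a refresh then copies the ME version in.

```python
import copy
import xml.etree.ElementTree as ET

# Trimmed copies of the Warsaw ME entry and its child, written as well-formed XML.
ME_SOURCE = """
<ME.ENTRY ID="ME1">
  <HEAD.VARIANTS><HEAD ID="ME1.2">Warszawa</HEAD></HEAD.VARIANTS>
  <HARD.DATA>
    <INHABITANTS ID="ME1.4"><VALUE>1.6 mlns</VALUE> <DATE>1996</DATE></INHABITANTS>
  </HARD.DATA>
</ME.ENTRY>
"""

CHILD_SOURCE = """
<ENTRY MEREF="ME1">
  <HEAD MEREF="ME1.2">Warszawa</HEAD>
  <BODY>capital city of Poland;
  <INHABITANTS MEREF="ME1.4"><VALUE>1.6 mlns</VALUE>
  <DATE>1997</DATE></INHABITANTS> of inhabitants; ...</BODY>
</ENTRY>
"""


def normalized_text(element):
    """All text inside an element, with runs of whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def check_links(child_root, me_index):
    """Mode 3: compare every MEREF-linked element with its ME original."""
    report = []
    for el in child_root.iter():
        meref = el.get("MEREF")
        if meref is None:
            continue
        original = me_index.get(meref)
        if original is None:
            status = "missing"
        elif original.tag != el.tag:
            status = "idea"       # link to a whole idea rather than to one datum
        elif normalized_text(original) == normalized_text(el):
            status = "same"
        else:
            status = "differs"
        report.append((el.tag, meref, status))
    return report


def refresh_live_link(element, me_index):
    """Mode 1: replace a live-linked element's content with the ME version."""
    original = me_index[element.get("MEREF")]
    tail = element.tail           # text after the element belongs to the article
    element.clear()               # removes children, attributes, text and tail
    element.text = original.text
    for sub in original:
        element.append(copy.deepcopy(sub))
    element.set("MEREF", original.get("ID"))
    element.tail = tail


me_root = ET.fromstring(ME_SOURCE)
me_index = {el.get("ID"): el for el in me_root.iter() if el.get("ID")}
child = ET.fromstring(CHILD_SOURCE)

before = check_links(child, me_index)
refresh_live_link(child.find(".//INHABITANTS"), me_index)
after = check_links(child, me_index)
print(before)
print(after)
```

Whether a differing element is refreshed automatically or only reported to the editor is exactly the policy question raised earlier; the sketch leaves that decision to the caller.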
     
     

Going deeper

     
The above describes the simplest model. Things become more complicated when some of the ideas from the ME are mixed in a new publication. For instance, in a smaller edition of a big encyclopaedia, the two articles on the special and the general theory of relativity are combined into one. Do we then have two parents of one article? (That sounds normal enough.) We solve the problem pragmatically: if two parents are the most acceptable solution, we accept them. But generally, in such a case a new idea is born in the ME. To keep the family of articles together, we use another layer of linking: links inside the Mother Encyclopaedia point to the related (daughter) ideas at the parent level.
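A minimal Python sketch of this second layer of linking follows, assuming ideas are held in a dictionary keyed by ME ID. The IDs ME10 to ME12, the headwords and the function names are hypothetical. Merging creates a new idea that links down to the existing ones, and following those links from the new idea reaches every article of the family in every publication.

```python
from dataclasses import dataclass, field


@dataclass
class Idea:
    me_id: str
    label: str
    daughters: list[str] = field(default_factory=list)   # links between ideas inside the ME


def merge_ideas(ideas, new_id, label, parts):
    """Create a new ME idea for an article that combines existing ideas.

    The existing ideas stay untouched; the new idea links down to them,
    which keeps the family of articles together.
    """
    ideas[new_id] = Idea(new_id, label, list(parts))
    return ideas[new_id]


def family(ideas, me_id):
    """All ideas reachable from me_id through daughter links, itself included."""
    seen, stack = [], [me_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(ideas[current].daughters)
    return seen


def related_children(children, ideas, me_id):
    """Child articles, in any publication, that shadow an idea of the family."""
    members = set(family(ideas, me_id))
    return sorted(key for key, target in children.items() if target in members)


# Hypothetical IDs and headwords.
ideas = {
    "ME10": Idea("ME10", "special theory of relativity"),
    "ME11": Idea("ME11", "general theory of relativity"),
}
merge_ideas(ideas, "ME12", "theory of relativity", ["ME10", "ME11"])

children = {  # (publication, headword) -> ME idea the article points to
    ("Large edition", "relativity, special theory of"): "ME10",
    ("Large edition", "relativity, general theory of"): "ME11",
    ("Small edition", "relativity, theory of"): "ME12",
}
print(related_children(children, ideas, "ME12"))
```

The alternative mentioned above, one child with two parents, would instead require a list of MEREFs on the child article; the sketch follows the general rule of creating a new idea, so every child still points to exactly one idea.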
     
     

Summary

     
We believe that the model described here may be of interest to a wider community than reference publishers alone. It may be judged not elegant or pure enough. But we feel that an approach in which you use ready-made document building blocks when they fit, and otherwise modify them without losing contact with the origin, is the best you can do on our Earth, where spirit is mixed with body.
     
Acknowledgments
I would like to acknowledge the patience and help of my wife Karina and my son Marcel.
