
 

PRISMA: The new publishing process at Samsom Publishers

 Jan-Willem   de Koning
  Information Manager
  Samsom (Wolters Kluwer)  P.O. Box 4
 2400 MA Alphen aan den Rijn   The Netherlands
Phone: +31 172 466358
Fax: +31 172 421426
Email: j.koning@samsom.nl Web: www.samsom.nl
 
Biographical notice:
 
Jan-Willem joined Wolters-Kluwer in 1987 as an editor for CD-ROM productions. Later he was project leader and account manager for CD-ROM products and for the realization of content management to support those productions at various customers.
 

From 1992 Jan-Willem has had various managerial functions at Samsom (and subsidiaries) all aimed at realizing media neutral databases. In 1998 Jan-Willem became Information Manager for Samsom; responsible for introducing new production processes based on SGML, aimed at cheaper, faster and better production to help create value for our professional customers.
 Joris   van Gool
  SGML/OmniMark consultant
  CMG Trade, Transport & Industry  P.O. Box 8566
 3009 AN Rotterdam   The Netherlands
Phone: +31 10 2537000
Fax: +31 10 2537035
Email: Joris.van.Gool@cmg.nl Web: www.cmg.nl
 
Biographical notice:
 
Joris van Gool completed his Managerial Computer Science study at the Erasmus University in Rotterdam, the Netherlands in 1997 by writing a thesis called “Content Management for Traditional Publishers”. During his studies he was involved as an intern in long-term projects on multimedia document management and the manipulation of text.
 
Since joining CMG Trade, Transport and Industry in Rotterdam, he has mainly worked with publisher Samsom as an SGML/OmniMark consultant.
 
ABSTRACT:
 
Confronted with growing demands from our customers for more products, fit for use, on more media, with a growing concern for electronic publishing, Samsom had to consider media neutral storage of our texts. But how?
 
We based it on the following:

A sound neutral concept, not just "go, go SGML (Standard Generalized Markup Language)"! Our answer: market driven SGML
 
A good set of standards, SGML alone is not good enough. Our answer: simple and sound attitude, aimed at process and techniques
 
An organization to back it up. Our answer: A pragmatic mix of make and buy
 
In this presentation I want to share our experiences with you.
 

Market driven SGML

 

Samsom

 
Samsom is a Dutch-based medium-sized publisher (approximately 600 employees, FTE), part of Wolters Kluwer. We are organized in four business units and two service units. I work for the service unit “Realisation Services” and may call myself Information Manager.
 
Common denominator in our portfolio is that all our business units inform on “policy issues”. General managers within business or government administrations and their advisors, civil servants, human resources managers, marketing managers, communication specialists etc., all of them are involved in the process of policy making, advice or policy implementation.
 
Although a lot of our content is law-based, we do not publish many law textbooks. Law and fiscal law are of concern for us, but our target groups are not lawyers and fiscalists. A typical Samsom product explains law for a non-lawyer and describes the consequences of it. So we have a lot of checklists, common questions in the field of forms and standard letters etc.
 
The media we publish on are mostly magazines and loose-leafs. The last couple of years we have witnessed the coming of new product types on CD-ROM and Internet.
 

The drive behind neutral storage

 
Our customers are asking for more fit-for-use products instead of the one-size-fits-all loose-leaf we send them. Publishing on demand is the most fit-for-use, but there are a lot of intermediate forms and these are not all electronic.
 

This demand for fit-for-use products is mentioned in every market survey and customer panel we do. It also shows in the steady decline of subscriptions to our loose-leafs. Time-to-market is a new phenomenon for us.
 
Our production process however is still predominantly aimed at the manufacturing of loose-leafs and magazines only. It is a parallel process: each product has its own line of procedures and agreements about the style.

Figure: The old process

 
 
Confronted with growing demand from our customers for a custom-made solution, we would drown if we catered for that with more parallel processes. So we had to reorganize all the information that is used to create the products for this target group. The products must be more market driven, meaning that the layout, the medium and the composition of the product are not known up front. So when we look at the information within the business units the information should be reorganized in the following ways:
  •  information should be stored in a medium neutral storage format
  •  it must be possible to (re-)use chunks of information in several products
  •  the production process for the products must be efficient, and ideally automatic
 
Note: it is not the traditional concept of re-use we are interested in; we want to have multiple use possibilities. From that central and neutral storage, we want to be able to determine the product form in the last part of the production process.
 
Our concept is as follows.

Figure: The new process

 
 
PRISMA is the project that aims to implement this new concept at Samsom. PRISMA is an acronym for PRoduction of Information at Samsom on a Media neutral basis.
 
PRISMA starts with the recognition that we know the techniques but we are still using them in processes aimed at production with lead. So, instead of starting with document analysis we started with process analysis; if we want to automate the making of our products, we have to standardize. We want to standardize in SGML, of course, but we shall also have to standardize our product forms and production processes. We want to make our production street less complicated; for starters we have to make it a one-way street, no loops allowed!

The aim of PRISMA is to fully automate the production of loose-leafs at our main supplier, AlfaBase.
 

The Overall Business Process

 
The input for Samsom comes from third parties or is provided by editors. Laws, for instance, come from Kluwer Deventer, the Central Dutch Law Database (CWB). Where necessary, input is converted to SGML, either fully or semi-automatically. If the content will be managed at Samsom, the conversion might not be fully automatic. If the content is managed elsewhere, a fully automatic conversion process will be developed in order to always have the latest content available. These conversions are still in their infancy and are mostly done on a ‘per-publication’ basis.

After the input is in SGML (either after conversion or because SGML is its native format), the texts are edited in SGML editors (Adept Editor) using their respective DTDs. Either the information is constantly updated, or extra information (keywords, links) is added. All texts are stored in a central database called the MNO-database (Medium Neutral Storage database, a content repository).
 
If you introduce such a new concept in your company, you face the problem of getting all project stages in the right sequence. You cannot test a complete running publication by converting to SGML in phase one, update information in phase two and then start building your output-processes: that would mean stopping your publication for several months or having a test product without the real world problems. So naturally, we started with the output process and we'll focus on that in the rest of this talk.
 
For the output processes, the goal is to start with storage (publication-/media-/layout-neutral) and to go to a specific publication on a specific medium according to a specific layout. We wanted to do that with clear steps in a fixed order. Just like an old-fashioned production process, it needs conveyor belts to automate the output.
 

The use of SGML in this concept

 
Traditionally DTDs were made by taking an inventory of all books on the bookshelf, resulting in a DTD that can encompass every text but controls nothing: every exception is supported by the DTD. If you want to have a neutral concept that supports and controls your production process, you have to separate form and function.
 

First of all we defined Information types:
  •  Law
  •  Case law
  •  Comments
  •  Scientific article
  •  Address information
 
These Information types are quite homogeneous, for structure as well as for their content. By designing a DTD to just describe an Information type, you keep the goal of the DTD clear and its complexity manageable. It is difficult to make a DTD for a law, but it is much more difficult to make a DTD for any product containing law, comments, a checklist and much more.
 
While designing the information type DTD's the following questions are kept in mind:
  •  How can the information types serve as building blocks for the products? How will they be reused?
  •  How is the information used (information function)? This is the result of market research.
  •  What parts of the information should get a specific layout?
 

Of course the granularity here is as small as you need. We call it the smallest sellable unit. This concept gained popularity under the name of Micro Document Architecture.
 

The maintenance of our texts is based on these Information types. But of course we want to sell our products, so we have to take parts of the texts stored as Information types and assemble them into products. Via several standardized steps, all necessary information, such as the Table of Contents, is added and the product is composed into its final format. The steps regarding structure and adding information (automatic numbering for instance) are done at Samsom; the remaining composing steps and the addition of page numbers are done at our strategic partner AlfaBase. There is a very sharp line between the tasks at the respective sides and a Product-DTD helps to enforce that separation.
 
By this separation in Information types and Product-DTDs, we are able to keep our DTDs clean with the possibility to grow complex, while keeping our applications simple, sound, fast and manageable.
 
For such a radical change in your business process and the introduction of new technology, you need a company wide focus on the road ahead. We did this by creating the layout of the factory: a good set of standards.
 

A Good Set of Standards

 

Time-out

 
After a couple of pilot projects, steering committees, brainstorms and cowboy projects we took some time off to establish what we had and what we still had to conquer. We had a week ‘time-out’ with our programmers, SGML experts, project leaders and some friendly publishers and editors.

In this week of talking and laughing about our experiences, my aim as manager of the group was to get our experts from the “backseat to the driving wheel”. I wanted them to move from saying “yes, but….” to “No, like this!”.

So we had to establish standards, sell our ideas to management and move ahead implementing them. In this we succeeded at the beginning of 1998.

During 1998 we worked with our standards and used them while doing several smaller projects for specific publications. While doing so we grew confident that our approach would be successful for more products and decided to start a large project to establish a general production process for several paper products: loose-leaf, annuals and newsletters. We called this project Prisma-Folio.

The decision was made to initially provide 100% functionality for 60% of our publications. Another 20% will need enhancements in the process, which will be made later on. The last 20% are publications that still need customized processes or will probably be phased out. The 100/60 design philosophy needs a sharp cut in functionality to keep momentum, without falling into the trap of an “all colors that are black” mentality.
 

Areas for Standards

 
After a successful project a lot of publications will follow. Without setting standards beforehand, you will drown after your first successes. We identify several areas where we set up and maintain our standards: project guidelines, business process, DTDs, stylebooks and transformation.
 
Project guidelines. These projects were new to our organization and are done with a lot of external consultants. To give people a head start and keep the knowledge in the organization, we developed these guidelines.
 
Standardize the business process: concentrate on the common denominators, not the (few) exceptions. Build fully automated ‘conversion belts’ with 100% functionality for 60% of your publications.
 
Identify for each major step in the process the goal of the information you want to store, and what role an open standard can play in it. For us, that standard is SGML. Create DTDs for each major step, keeping the different goals in mind. Improve your development speed by defining what building blocks you need to create all of your DTDs, and by creating a sound DTD development process.
 
Every publication used its own stylebook, defined either by the publisher or by the printer. To create a standard process, these instructions to the printer had to be standardized.
 
Choose one development tool to build all the conversion and translation steps. During the learning process, while you are learning where to solve what problems, you want to have a powerful and diverse tool. Identify all smaller steps and decide again where to solve those problems. And here also, create building blocks and a development process to improve speed.
 
Stand firm for your standards! In order to keep the process fast, robust and simple, compromises need to be avoided. The process is complex enough by itself.
 

Process: Introducing ‘Conversion’ Belts

 

To make the new process work and get all possible benefits from the new approach, standard output processes must be designed, in two ways.

First, function: a new workflow in a new organization with new roles for people. This is a radical break with the past: linear one-way flows with clearly defined half-products.

Second, form: you need to design the technical conversion steps to support the new workflow. Of course, IT and SGML in combination with the right tools give you the possibility to radically change your workflow, especially if you can fully automate the production process. However, remember that ‘form follows function’, as we will show right now.
 
5 steps:
  1.  Define publication: which Information Types in what order.
  2.  Choose medium and product-type: translate medium/publication-neutral structure into publication-type specific styles, according to a so-called Product-DTD.
  3.  Choose formatting software product: translate the styles into product-specific styles (if necessary).
  4.  Choose stylesheet: compose the product using a (possibly product-specific) stylesheet.
  5.  For folio loose-leafs, the final step is creating supplements using Xydiff.
 
After each step, working towards that particular final product, a decision about the next step can be taken. For instance, a new branch can be chosen to choose another medium, other software-package, or other stylesheet. In the long run this will make you less dependent on providers and software packages or versions.
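
The one-way chain and the branching described above can be illustrated with a minimal Python sketch. It is not the OmniMark implementation used in PRISMA: the names `make_step`, `run_belt`, `folio_belt` and `other_belt` are hypothetical, and each placeholder step only records its label instead of transforming SGML.

```python
from typing import Callable, List

# A step takes the half-product of the previous step and returns the next one.
Step = Callable[[List[str]], List[str]]


def make_step(label: str) -> Step:
    """Build a placeholder step that only records its label (hypothetical)."""
    def step(half_product: List[str]) -> List[str]:
        return half_product + [label]
    return step


def run_belt(steps: List[Step]) -> List[str]:
    """Run the steps strictly in order; nothing is ever sent back upstream."""
    half_product: List[str] = ["neutral storage"]
    for step in steps:
        half_product = step(half_product)
    return half_product


# Steps 1 and 2 are shared; a branch differs only in the later steps.
shared = [make_step("1 define publication"),
          make_step("2 product type: loose-leaf")]
folio_belt = shared + [make_step("3 formatter styles"),
                       make_step("4 stylesheet A"),
                       make_step("5 supplements")]
other_belt = shared + [make_step("3 formatter styles"),
                       make_step("4 stylesheet B")]

folio_result = run_belt(folio_belt)
other_result = run_belt(other_belt)
print(folio_result)
print(other_result)
```

Because every step only sees the output of its predecessor, choosing another stylesheet or formatting package means swapping one entry in the list; the earlier steps stay untouched.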
 
Design philosophies:
  •  Fully automated.
  •  Linear. No feedback to earlier stages. Only the final product gets a Go/No Go decision.
  •  Use DTD's where necessary to explicitly define an interface. For instance, a Product-DTD defines the interface between the publisher and the graphic industry.
  •  Easy: minimize degrees of freedom in storage and the separate steps, still keeping diversity in the final products by adding new options in each step.
  •  Users can separately fine-tune the step they have responsibility for - by changing a stylesheet for instance - without compromising the automated process.
  •  Information is generated as late in the process as possible. Codes specific to the formatting-engine are generated in the latest steps. Make sure all problems are handled in the right place (do not solve page-problems at publisher, do not solve structure problems at graphic industry).
  •  Clear interface between publisher and graphic industry. Steps 1 and 2 are done at Samsom, all other steps at AlfaBase, our graphic industry partner. Use the Product-DTD (more on that later) to explicitly define the interface.
 
This is a radical break with the old way: no major compromises to the ‘old feelings’ based on print were allowed.
 

Step 1. Define publication

 
In this step, the publication is defined: which Information Types and instances from the storage will be incorporated in what order in the publication and what extra information (e.g. colophon, extra titles, TOC, numbers) will be added.
 

All this information is assembled using a configuration-file, which defines the publication and is edited by the publisher. At the same time, it removes hidden intelligence from the translation applications of step one and two.
 

To control the integrity of this file, and to make sure all necessary information is incorporated, we use a configuration-DTD.
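
As an illustration of step 1, the following Python sketch assembles a publication from a configuration. It assumes a much simplified model: the storage is a plain dictionary, and the names `Entry`, `assemble`, `storage` and `config` are hypothetical. The real configuration-file is an SGML document checked against the configuration-DTD.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    info_type: str  # e.g. "law" or "comments"
    instance: str   # key of the instance in the neutral storage


def assemble(config: List[Entry], storage: Dict[str, str]) -> List[str]:
    """Return the stored texts in the order the configuration lists them."""
    missing = [e.instance for e in config if e.instance not in storage]
    if missing:
        # Fail early: an incomplete publication must not enter the belt.
        raise KeyError(f"not in storage: {missing}")
    return [storage[e.instance] for e in config]


# Hypothetical storage content and publication configuration.
storage = {
    "law-17": "<law>...</law>",
    "comment-17": "<comments>...</comments>",
}
config = [Entry("comments", "comment-17"), Entry("law", "law-17")]

publication = assemble(config, storage)
print(publication)
```

The order and selection live entirely in the configuration, so the assembling code itself holds no publication-specific knowledge.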
 

Step 2. Choose medium and product-type

 
Although the order of instances has already been chosen (and this is often medium-dependent), in this step the medium and product-type are chosen. For this, medium/publication-neutral structures are translated into publication-type specific styles, according to some Product-DTD. Also, medium-neutral structures (“<link>”) are translated into medium-specific ones (“<footnote>”, “<hyperlink>”).
 
For instance, both hierarchies “<article> - <part> - <chapter>” and “<law> - <section> - <sub>” are translated into “<level1> - <level2> - <level3>” with the possible indication of the stylesheet-type that will be used. By removing information that is not needed any more, the number of styles for the formatting process is reduced considerably.
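
A minimal sketch of this translation, using Python's standard `xml.etree.ElementTree` on a well-formed XML stand-in for the SGML input. The mapping table follows the element names mentioned above; the function name `to_product_structure`, the `medium` values and the sample document are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Both neutral hierarchies map onto the same generic product levels.
LEVELS = {
    "article": "level1", "part": "level2", "chapter": "level3",
    "law": "level1", "section": "level2", "sub": "level3",
}


def to_product_structure(root: ET.Element, medium: str) -> ET.Element:
    """Rename neutral elements to product elements, in place."""
    for elem in root.iter():
        if elem.tag in LEVELS:
            elem.tag = LEVELS[elem.tag]
        elif elem.tag == "link":
            # A neutral link becomes a medium-specific construct.
            elem.tag = "hyperlink" if medium == "online" else "footnote"
    return root


source = "<law><section><sub>See <link>art. 3</link></sub></section></law>"
folio = to_product_structure(ET.fromstring(source), medium="paper")
folio_text = ET.tostring(folio, encoding="unicode")
print(folio_text)
```

After this step the downstream formatting only has to know about three levels and the medium-specific link element, whatever the information type was.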
 

Step 3. Choose formatting software product

 
In this step, hierarchic styles (“<title> in <level1> in <part>”) are translated into non-hierarchical styles (“<title1c>”). If for the next step a formatting software product with its own styles is used, the hierarchical styles are translated into product-specific styles.
 

In Prisma-Folio Xyvision is used to produce Postscript-files for printing. For instance, “<title> within <level1>” is translated into “{Title1}”, “<title> within <level2>” is translated into “{Title2b}”.
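
The idea of flattening context-dependent styles can be sketched in Python as a lookup on the pair (element, parent element). Only the two mappings named above come from the process description; the function `flatten_styles`, the sample document and the output notation (style name in braces followed by the text) are assumptions and not actual Xyvision input syntax.

```python
import xml.etree.ElementTree as ET

# (element, parent element) -> flat style name.
STYLE_TABLE = {
    ("title", "level1"): "Title1",
    ("title", "level2"): "Title2b",
}


def flatten_styles(root: ET.Element) -> list:
    """Return one flat, styled line per element that has a style."""
    styled = []

    def walk(elem, parent_tag):
        style = STYLE_TABLE.get((elem.tag, parent_tag))
        if style is not None:
            styled.append("{%s}%s" % (style, (elem.text or "").strip()))
        for child in elem:
            walk(child, elem.tag)

    walk(root, None)
    return styled


doc = ET.fromstring(
    "<level1><title>Taxes</title>"
    "<level2><title>Income</title></level2></level1>"
)
flat = flatten_styles(doc)
print(flat)
```

Once the context is folded into the style name, the composition step no longer needs to understand the document hierarchy.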
 

Step 4. Choose stylesheet

 

For the publication, a specific stylesheet is used and the final product is composed. All stylesheets for a specific publication type use the same styles, making them interchangeable. In Prisma-Folio, Xyvision is used as composition application, so all stylesheets are proprietary Xyvision ones.
 

Step 5. Xydiff

 

For folio loose-leafs, the final step is used to let Xydiff create supplements. This is done by presenting Xydiff the output from Xyvision of the current and an earlier run. Xydiff will find the differences and output only the changed leafs.
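
The principle of a supplement can be shown with a small Python sketch that compares two runs leaf by leaf. This is not how Xydiff works internally, which is not described here; the function `changed_leafs` and the representation of a run as a dictionary from leaf identifier to composed content are assumptions.

```python
from typing import Dict, List


def changed_leafs(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the identifiers of leafs that are new or differ from the earlier run."""
    return sorted(
        leaf for leaf, content in current.items()
        if previous.get(leaf) != content
    )


# Hypothetical composed output of an earlier and the current run.
earlier_run = {"A-1": "old intro", "A-2": "law text", "A-3": "checklist"}
current_run = {"A-1": "new intro", "A-2": "law text", "A-3": "checklist",
               "A-4": "extra comments"}

supplement = changed_leafs(earlier_run, current_run)
print(supplement)
```

Only the leafs in the supplement need to be printed and sent to subscribers; unchanged leafs stay in the binder.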
 

 SGML & DTDs

 

DTD modules and the DTD Manual

 
The concept of information types results in a lot of DTD's that have to be developed and managed. In order to support this, Samsom developed a set of standard DTD modules. These modules are included in every DTD. They implement standards in the area of inline elements, hyperlinking, titles, text hierarchy, etc. By using standard modules for DTD's the maintenance and development of DTD's is easier and “outsource-able”. Furthermore a DTD-manual is written in which all the standards are documented.
 

Product DTDs

 
From the information types within the MNO-database products have to be generated. To shorten the ‘time-to-market’ for a product, several standard product-types were defined for which standard production processes are implemented. The goal of the PRISMA project was to implement standard production processes for the annual, newsletters and loose-leaf products. For each product type a DTD was designed. This product DTD describes the structure of the specific product, thus it stores no semantic information. When information for a certain product has to be communicated from the publisher Samsom to, for example, a typesetter, the information is marked-up according to the product DTD by an automatic conversion process.
 
So for all the information-block types there is a conversion to the product DTDs. And the conversion from the product DTD to the final product, for example the conversion from the loose-leaf product DTD to Xyvision, is a standard conversion that is developed only once.
 
Through stylebooks the layout of the products can be configured.
 

Configuration DTDs

 

If you want to assemble products from information types that are stored in the MNO-database, you need some format to store publication-specific information. This includes which information types to use and in what order. For instance: [“Preface; 1 May 1999”, “TOC”, “Article; About XML”]. This is your publication definition for a specific publication on a specific date. Examples of information that is written in this definition: new titles for articles, the type of numbering to be generated, et cetera.
 
To control the validity of the publication definition, a ‘Configuration DTD' has been developed. A publisher will write his publication definition according to this DTD, and the validated SGML-file controls the translation process: extraction from the MNO-database and generation of redundant data.
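
The role of the Configuration DTD, rejecting an invalid publication definition before it reaches the translation process, can be mimicked in a few lines of Python. The sketch below works on the bracketed example given above. The rule set (`ALLOWED_TYPES`, at most one TOC) and the function names are hypothetical; the real check is done by validating an SGML file against the DTD.

```python
from typing import List, Tuple

# Hypothetical rule: only these entry types may appear in a definition.
ALLOWED_TYPES = {"Preface", "TOC", "Article"}


def parse_entry(text: str) -> Tuple[str, str]:
    """Split 'Type; detail' into its two parts; the detail may be empty."""
    kind, _, detail = text.partition(";")
    return kind.strip(), detail.strip()


def validate(definition: List[str]) -> List[str]:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    kinds = [parse_entry(text)[0] for text in definition]
    for position, kind in enumerate(kinds, start=1):
        if kind not in ALLOWED_TYPES:
            errors.append(f"entry {position}: unknown type {kind!r}")
    if kinds.count("TOC") > 1:
        errors.append("at most one TOC is allowed")
    return errors


definition = ["Preface; 1 May 1999", "TOC", "Article; About XML"]
problems = validate(definition)
print(problems)
```

Keeping such rules in a declarative place, here a set and in PRISMA a DTD, means the publisher gets feedback on the definition itself rather than a failure somewhere further down the belt.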
 

OmniMark

 
This powerful tool gave us the opportunity to get on track very fast, without giving up flexibility in the future. All the general steps (1, 2 and 3) are done with OmniMark, each step requiring 2-4 OmniMark scripts. Other conversion processes can use generic scripts or customized scripts within one step. The concept allows for these customized scripts. This results in faster development of new conversion processes and easier maintenance. Software libraries (in OmniMark) were built for all standard conversions (storage to output).
 

Organization

 

What do you do yourself, and what do you outsource?

 
Management of a publishing house had the tendency to ask: "Can't you just outsource this?"
 
First of all, there is no company that you can go to. In our experience you have to teach your suppliers. Of course, there are a lot of good companies, but they are all small.
 
Secondly, this is your primary business process, so you had better get heavily involved. So, start convincing your manager to invest in people first.

However, the labor market is not favorable for our line of work these days. Let's face it, a good programmer, analyst, or project manager can earn more at a software house.
 

Internals

 
The organization question doesn't stop at your standard make or buy decision: you also have to define the split of responsibilities in your own company. For this we defined functions and tasks to be performed in a media neutral storage project.
 

Our team

 
There are certain roles that always have to be filled:
  1.  project manager
  2.  SGML developer
  3.  conversion programmer (mainly OmniMark in our case)
  4.  system developer (hardware and software)
 
On top of this you want to
  1.  Coordinate the work
  2.  Maintain standards
  3.  Advise on the use of standards
  4.  And just develop
 
Combining these you get:
 
              Project management   SGML/DTD   Conversion   System
Coordination  internal             internal   internal     internal
Maintenance                        internal
Advice                             internal
Development   external             external   external     external
 
What you must do yourself is the maintenance of, and the advice about the use of, your DTDs; in general you have to bridge the gap between business demands and the translation of these demands into your data structuring (= DTD). This is the heart of your knowledge and is of strategic value for us.
 
Also, you will want to coordinate all the work. This you have to do yourself, and don't think it doesn't take time.
 
What you definitely can outsource is development; the project leader, the SGML developer and the conversion programmer are all external employees. Better still, it is a good thing to outsource this: fresh views and a flexible organization favor external employees here.

We chose to hire personnel from a limited number of companies, our strategic partners.
 

Team spirit

 
It is important to think about the fun in the work of the people that work for you. SGML can be boring and/or monotonous. A few things I learned:
  •  IT in Publishing is not readily available. Your average IT consultant doesn't know anything about text and text processing. Be prepared to invest in people; investing in external employees is part of that.
  •  Train internal and external employees in a joint program. This creates personal bonds and focuses the attention on the publishers' problem from day one.
  •  The team should be mixed. The average SGML taskforce in your company does not have enough women. This is by no means a sexist remark: women change the culture and make it more fun to work.
  •  Take your developers seriously, give them room to experience new techniques.
  •  Celebrate birthdays, have dinner with the group, including the externals.
 

Concluding remarks

 
What is it that we do? We publish! We are not interested in the difference between SGML and XML. We are interested in making a good profit based on high quality publications. We value our content more than our SGML standards. But of course our content is based on SGML now.
 
What makes this approach and this project different and interesting?
  •  Using SGML (just) as a means for achieving business goals (time to market, versatility).
  •  Combining this with redesign of the production process (no more proofreading at any stage).
  •  Radical design philosophies, at least radical to the existing industry.
  •  Although fully media-neutral, still embracing paper as an important medium (cash cow).
  •  Quality in print, but subject to functional standards.
  •  Quantity at last; no more endless SGML-pilots.
 
Acknowledgments
 
The author wishes to thank the strategic media neutral partners of Samsom:
  •  CMG - Full service information technology provider.
     Contact: Jurgen Vreeburg (jurgen.vreeburg@cmg.nl / www.cmg.nl)
  •  Daidalos - IT in publishing/knowledge management.
     Contact: Hans Richters (hans.richters@daidalos.nl / www.daidalos.nl)
  •  AlfaBase - Publication processors; prepublishing, printing and fulfilment.
     Contact: Klaas Groen (kgroen@alfabase.nl / www.alfabase.nl)
  •  and our main technical suppliers, OmniMark and Texcel
