| Croner &, Croner &, SGML — the first 3 years: opening the envelope! | Table of contents | Indexes | SGML and development documentation | |||
| Reich Thomas |
| Von Zadow Günter |
From Mainframe to Intranet |
Abstract: |
| This report describes our experience with a conversion project involving IBM BookMaster and HTML. |
| Credit Suisse, a large Swiss bank, has an existing document processing infrastructure based on an IBM host. They use IBM BookMaster, a markup language based on GML (Generalized Markup Language). Their BookMaster documents are traditionally distributed on paper. The content is the documentation of bank-internal software applications - everything from user manuals to technical descriptions for audit purposes. The document size varies from about five to several hundred pages. |
| To improve the accessibility of their documents the bank wants to publish and keep these documents in a corporate intranet in the future. While this intranet is coming into existence the old paper publishing process must be maintained. The coexistence of the old and the new “world” will be necessary for several years. Conversion tools are therefore needed to ease the way from BookMaster to HTML and back. |
| This paper is a description of the current status of the ongoing project. |
BookMaster compared to HTML |
| BookMaster and HTML are both markup languages. Both have headings and paragraphs etc. Both call them H1, H2... and P. But a closer look reveals many differences. Goals and concepts are quite different. |
BookMaster |
| The functionality of IBM BookMaster is directed towards formatting of high-quality paper documents. BookMaster is owned by a vendor. The scope of the language is well defined and stable. There are about 300 tags, many of which have attributes. There is practically no special editing tool; the user generates the markup “by hand” using a standard mainframe editor. BookMaster is available on the mainframe only. |
| BookMaster parses the source document against a built-in set of syntax rules. If it finds syntax errors it tries to correct them during the formatting process in a formal way. Thus, BookMaster in most cases produces a document ready for printing - even if the source document contains some incorrect markup. |
| Unfortunately, BookMaster documents may also contain DCF (IBM Document Composition Facility, formerly called Script) markup. BookMaster is built on DCF; it can be considered a macro language of DCF. DCF is procedural markup, whereas BookMaster offers descriptive markup. Some users feel that BookMaster does not suit all their layout needs, and so they also use DCF commands. Of course, their documents present an extra challenge to our project. |
HTML |
| The functionality of HTML is directed towards formatting of screen documents only. Many browsers are available. The language is practically defined by the editing tool that is being used, and many HTML editing tools are available. As a result, the scope of the language is being extended at a high rate (and in different directions) by many vendors. HTML tools are available on all of today’s platforms. |
| HTML browsers are very tolerant of syntax errors. In fact, the term “syntax error” does not really apply. The browsers either “understand” the markup or they ignore it. But they always show the document somehow. |
| If HTML is to be used as input for conversion the scope and the syntax of the language need to be defined. At the beginning of our project we agreed on using the HTML 2.0-DTD. Later, we switched to the HTML 3.2-DTD. Elements and attributes not defined in this DTD are ignored during conversion. |
| Since BookMaster and HTML have different goals (paper vs. screen) there are many functions in either system which are not available (and not needed) in the other. The document therefore inevitably loses part of its functions during conversion. This is illustrated in the following chapter. |
Examples |
| The following three examples show some of the differences of the two systems. |
Example 1: Cross references |
| Both HTML and BookMaster use identical attribute values to define the logical connection between reference and target. However, there are the following differences. |
|
HTML |
<P>To view some samples click <A HREF='#samp'>here</A>.
...
<H2><A NAME='samp'>Selected Samples</A></H2>
...
| The browser shows this reference: |
| To view some samples click here. |
BookMaster |
:P.Samples can be found in :HDREF REFID=samp..
...
:H2 ID=samp.Selected Samples
...
| The printed document shows this reference: |
| Samples can be found in “Selected Samples” on page 27. |
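| The mapping in Example 1 can be sketched in a few lines. The following Python fragment is only an illustration and is not the OmniMark code used in the project; the function name `convert_xrefs` and the regular expressions are assumptions, and only the three tags from the example are handled. It shows that the ID value carries over unchanged, while the generated page number has no counterpart on screen and is dropped. |

```python
import re

# Matches a BookMaster heading such as ":H2 ID=samp.Selected Samples".
HEADING = re.compile(r"^:H(\d) ID=(\w+)\.(.*)$", re.MULTILINE)

def convert_xrefs(bookmaster: str) -> str:
    """Convert :P, :Hn ID=... and :HDREF REFID=... to HTML (sketch only)."""
    # First pass: remember each heading title so a reference can display it.
    titles = {m.group(2): m.group(3).strip() for m in HEADING.finditer(bookmaster)}

    # Headings become HTML headings that contain a named anchor.
    html = HEADING.sub(
        lambda m: f"<H{m.group(1)}><A NAME='{m.group(2)}'>"
                  f"{m.group(3).strip()}</A></H{m.group(1)}>",
        bookmaster,
    )
    # References become links that show the target title (no page number).
    html = re.sub(
        r":HDREF REFID=(\w+)\.",
        lambda m: f"<A HREF='#{m.group(1)}'>{titles.get(m.group(1), m.group(1))}</A>",
        html,
    )
    # Paragraph tags map one to one.
    return re.sub(r"^:P\.", "<P>", html, flags=re.MULTILINE)
```

| Applied to the BookMaster source above, the sketch yields a link labelled “Selected Samples” instead of the word “here”, which is closer to what the printed document shows. |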
Example 2: Tables |
| Both systems include tables. However, the table models used are different. The BookMaster table model is complex and offers the functions of a professional typesetting system. The HTML 3.2 table model is less complex and is - again - oriented towards screen viewing. All details are different. |
| To illustrate the difference a small sample is modeled in both systems. The sample table has two rows and three columns. Row spanning: column 1. Column spanning: Row 1, column 2 and 3. Text alignment: various examples. |
HTML |
<TABLE BORDER='BORDER'>
<TR><TD VALIGN='MIDDLE' ALIGN='CENTER' ROWSPAN='2'> Barcelona</TD>
<TD ALIGN='CENTER' COLSPAN='2'> Princesa Sofia Intercontinental</TD></TR>
<TR><TD ALIGN='LEFT'>SGML</TD>
<TD ALIGN='RIGHT'>'97</TD></TR>
</TABLE>
BookMaster |
:TDEF ID='sample1' COLS='2* * *' ARRANGE='1 2 2/1 3 4' VALIGN='CENTER' ALIGN='CENTER CENTER LEFT RIGHT'
:TABLE REFID='sample1'.
:ROW.
:C 1.Barcelona
:C 2.Princesa Sofia Intercontinental
:C 3.SGML
:C 4.'97
:ETABLE.
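| One step in converting between the two table models is translating HTML row and column spans into a layout string. Judging from the sample above, each position in ARRANGE names the cell that occupies it, with rows separated by “/”; this reading is inferred from the sample only and is an assumption. The Python sketch below (the function name `arrange_from_spans` and its input format are hypothetical) computes such a string from the ROWSPAN and COLSPAN values of each cell. |

```python
def arrange_from_spans(rows):
    """Build an ARRANGE-style string from HTML span information.

    rows: one list per <TR>; each entry is a (rowspan, colspan) pair
    for one <TD>, in source order. Cells are numbered 1, 2, 3, ...
    """
    grid = []      # grid[r][c] holds the number of the cell at that position
    number = 0
    for r, cells in enumerate(rows):
        while len(grid) <= r:
            grid.append([])
        c = 0
        for rowspan, colspan in cells:
            # Skip positions already taken by a cell spanning down from above.
            while c < len(grid[r]) and grid[r][c] is not None:
                c += 1
            number += 1
            for dr in range(rowspan):
                while len(grid) <= r + dr:
                    grid.append([])
                row = grid[r + dr]
                # Pad the row so that the target columns exist.
                row.extend([None] * (c + colspan - len(row)))
                for dc in range(colspan):
                    row[c + dc] = number
            c += colspan
    return "/".join(" ".join(str(n) for n in row) for row in grid)
```

| For the sample table, `arrange_from_spans([[(2, 1), (1, 2)], [(1, 1), (1, 1)]])` gives `1 2 2/1 3 4`. The sketch assumes a well-formed table; alignment attributes and column widths would need separate handling. |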
Example 3: Semantic BookMaster constructs |
| There are various BookMaster tags with semantic meaning such as :FRONTM (front matter), :AUTHOR, :BACKM (back matter), :CAUTION, :CLETTER (cover letter), :DATE, :DDHD and :DTHD (header in definition list), :F (Formula), :GL (glossary list), :GRID, :INDEX, :LBLBOX (labeled box), :LITDATA (literal data) and many others. |
| The bank has also extended BookMaster by a number of customer specific tags like :INSTR (Instradierung), :SKACGR (special character graphics), :SKAnHD (standard headers with generated text), :DOCNAME (document name and other information which appears in header and footer lines only), and others. |
| HTML does not have specific markup for all of these. A BookMaster document that is converted to HTML loses this information. |
Project goals |
| Our main goal was to make the converter easily accessible and easily usable for every bank employee who deals with such documents. We reached this goal by developing a graphical user interface and by using a client/server technique, both of which are described below. |
| At the beginning we wanted to develop a conversion tool for both directions. Ideally, a round-trip conversion - for example from BookMaster to HTML and back to BookMaster - should end in the same document. The more we looked into the details, the clearer it became that this goal was out of reach. This is due to the difference in scope and purpose of the two systems, as described in the previous two chapters. |
| So in the course of our project we found out that we could not develop a universal conversion tool. We could however develop a tool which serves some particular purposes. We then re-defined our conversion goals to the following: |
|
Outlook |
| In the above goals a single document of one system is converted into a single document of the other system. As a future objective we are considering handling HTML document webs as well. |
| We define a “document web” as a group of HTML documents that form a hypertext system. Typically, these documents are relatively small (filling just one “page” or so). Typical BookMaster documents on the other hand are relatively large. Even a complete manual of several hundred pages is one BookMaster document. |
| In order to convert a document web to BookMaster the following information must be known: (1) Which documents belong to the web and (2) in what order they shall be arranged. Converting a BookMaster document to an HTML document web involves (1) a segmentation into small documents and (2) the generation of a table of contents page with the appropriate links for navigation. |
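| The two steps named for the BookMaster-to-web direction, segmentation and generation of a contents page, can be sketched as follows. This is a Python illustration under simplifying assumptions, not part of the project: the input is taken to be one already converted HTML body, segments start at each `<H1>`, and the function name `split_into_web` and the file naming scheme are hypothetical. |

```python
import re

def split_into_web(html: str, prefix: str = "part"):
    """Split one large HTML body at each <H1> and build a contents page.

    Returns a dict mapping file names to page contents.
    """
    # The zero-width lookahead keeps each <H1> with the segment it starts.
    chunks = [c for c in re.split(r"(?=<H1>)", html) if c.strip()]
    pages, toc = {}, ["<UL>"]
    for i, chunk in enumerate(chunks, start=1):
        name = f"{prefix}{i:03d}.html"
        match = re.search(r"<H1>(.*?)</H1>", chunk, re.DOTALL)
        title = match.group(1).strip() if match else f"Section {i}"
        pages[name] = chunk
        toc.append(f"<LI><A HREF='{name}'>{title}</A></LI>")
    toc.append("</UL>")
    pages["index.html"] = "\n".join(toc)
    return pages
```

| The reverse direction needs the same two pieces of information as input: the list of member documents and their order, which here are implied by the numbered file names. |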
Project phases |
| Our project started in September 1995 and is still continuing. Concluded project phases: |
|
| Planned |
|
Conversion technique |
| The conversion tool we use is OmniMark from OmniMark Technologies. OmniMark is an excellent and universal conversion tool which provides a programming environment. The conversion application is defined in a user-written program called an OmniMark script. The actual conversion is done by calling OmniMark to execute the script. |
| OmniMark is well equipped to perform conversion tasks in SGML applications. It understands a DTD and has a built-in SGML parser. One distinguishes between up-translations (from non-SGML to SGML), down-translations (from SGML to non-SGML), and cross-translations (no SGML parsing). A combination of these techniques is often required. |
| For our project a number of OmniMark scripts had to be written. The total number of lines of code is currently 4,000. Techniques used: |
| BookMaster to HTML |
| (1) Cross-translation to cover every BookMaster construct. (2) Up-translation resulting in an SGML instance conforming to the HTML 3.2-DTD. |
| HTML to BookMaster |
| (1) Cross-translation to clean up HTML (discard all markup which is unknown to the HTML 3.2-DTD). (2) Down-translation from a valid SGML instance to BookMaster. |
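| The clean-up step for the HTML-to-BookMaster direction can be illustrated outside OmniMark. The Python sketch below uses the standard library `html.parser` module; the class name `Cleaner`, the function `clean`, and the short `KNOWN` list are assumptions made for the example (a real list would be derived from the HTML 3.2-DTD). It discards unknown elements but keeps their text content, and it does not filter unknown attributes. |

```python
from html.parser import HTMLParser

# Illustrative subset only; a real list would come from the HTML 3.2 DTD.
KNOWN = {"p", "h1", "h2", "a", "table", "tr", "td", "ul", "li"}

class Cleaner(HTMLParser):
    """Copy the input through, dropping tags that are not in KNOWN."""

    def __init__(self):
        # Keep entity and character references as written in the source.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN:
            self.out.append(f"</{tag.upper()}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def clean(html: str) -> str:
    cleaner = Cleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)
```

| Because unknown markup is dropped silently rather than reported as an error, the second stage always receives input it can parse, at the price of losing whatever the discarded tags expressed. |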
| The application is implemented in client/server technique. OmniMark runs on the server (OS/2) whereas the graphical user interface operates on clients (OS/2 and Windows 95/NT). |
Documentation and testing |
| This chapter describes two areas that needed more attention than we thought at the beginning. |
Documentation |
| Before we actually started to write the OmniMark scripts we developed conversion tables - one for each direction. They contain, for example, an entry for each BookMaster tag and the appropriate action to be performed for it. If the tag can have attributes and if it must be handled differently depending on those attributes this is also documented. Of course, there are things that cannot be expressed in table format. In these cases another document format was chosen. |
| These conversion tables were discussed and agreed upon among those responsible for the project before the programming started. They were updated during the actual programming process. We find such documentation to be a good basis for discussion among the various parties of the project, and it certainly serves as a reference later. |
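| A conversion table of this kind translates naturally into a lookup structure. The Python sketch below is a hypothetical excerpt, not the project's actual table: the entries, the attribute-dependent rule for `UL`, and the names `CONVERSION_TABLE` and `lookup` are all assumptions chosen for illustration. Each entry maps a BookMaster tag to the HTML start and end markup to emit, and tags without an entry produce no markup. |

```python
from typing import Callable, Dict, Optional, Tuple

Attrs = Dict[str, str]
Rule = Callable[[Attrs], Tuple[str, str]]

def _ul(attrs: Attrs) -> Tuple[str, str]:
    # Attribute-dependent handling (illustrative assumption).
    if "COMPACT" in attrs:
        return ("<UL COMPACT>", "</UL>")
    return ("<UL>", "</UL>")

# Hypothetical excerpt: BookMaster tag -> (HTML start markup, HTML end markup).
CONVERSION_TABLE: Dict[str, Rule] = {
    "P": lambda attrs: ("<P>", "</P>"),
    "H2": lambda attrs: ("<H2>", "</H2>"),
    "UL": _ul,
    "INDEX": lambda attrs: ("", ""),   # no screen equivalent: emit nothing
}

def lookup(tag: str, attrs: Optional[Attrs] = None) -> Tuple[str, str]:
    """Return the markup for a tag; unknown tags yield empty markup."""
    rule = CONVERSION_TABLE.get(tag.upper(), lambda attrs: ("", ""))
    return rule(attrs or {})
```

| Keeping the table as data rather than scattering it through the conversion logic makes it easier to keep program and agreed documentation in step. |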
Testing |
| In order to test our converter we wrote many synthetic BookMaster and HTML documents. Each of them is rather small and features just one or two elements or related subjects. Whenever a substantial change is made to the converter we run the complete test suite and correct all bugs found. Subsequently, we are also using real documents for testing. We test both directions separately. |
| At the beginning we underestimated the amount of testing necessary. Testing and correcting takes almost as much time as the programming itself. |
Graphical user interface |
| Calling our various conversion programs with all necessary files and options “by hand” is cumbersome and error-prone. We therefore developed a graphical user interface. It consists of 6,500 lines of code (OS/2). |
| The user first selects the direction of the conversion. Subsequently, he selects input and output files and - if needed - some options. Note that the interface does not only deal with workstation-based files. The user also has the option to have his BookMaster files transferred to or from the mainframe (where they actually belong). |
| The following example illustrates this. The user specifies a conversion from BookMaster to HTML. He identifies the source file as residing on the mainframe. The user interface first calls a file transfer program which transfers the source file to the workstation. Then it sends the file to the server for conversion. Finally, it receives the resulting HTML file from the server. All this happens automatically. |
| Besides converting single files the user can also ask for the conversion of many files in a single run. This is called package handling. A naming convention is used for the output files. During processing the interface shows the current status. After completion the user can obtain an error log. Package handling also works in both directions. |
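| Package handling as described above can be sketched as a simple loop. The Python fragment below is an illustration only: the naming convention (same file stem, new extension), the direction label `bm2html`, and the function names are assumptions, and `convert` stands for whatever routine performs a single conversion. Failures are collected in an error log so that one bad file does not stop the run. |

```python
from pathlib import Path

def output_name(source: str, direction: str) -> str:
    # Hypothetical naming convention: same stem, new extension.
    ext = ".html" if direction == "bm2html" else ".bm"
    return str(Path(source).with_suffix(ext))

def convert_package(sources, direction, convert):
    """Convert many files in one run; log failures instead of stopping.

    convert: a callable taking a source name and returning converted text.
    Returns (results, errors): outputs by file name, and the error log.
    """
    results, errors = {}, []
    for i, src in enumerate(sources, start=1):
        try:
            results[output_name(src, direction)] = convert(src)
        except Exception as exc:
            errors.append(f"{src}: {exc}")
        print(f"[{i}/{len(sources)}] {src}")   # status display
    return results, errors
```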
Users |
Expectations |
| It is important to understand that the converter is used by many different end-users. Most of them are not experts in document processing. Most of them only occasionally write documents for publication. |
| The bank employees are more or less free to use the tools of their choice. A new tool is usually not introduced by management order. If the support department wants to introduce a new tool, technique, or method they must “sell” it to their users. |
| We found out that the expectations of the end-users are quite high: They consider the converter a “black box”. They do not want to analyze any error messages. Their source documents may have syntax errors, they may be incomplete, or they may be unnecessarily complex. Nevertheless, the users expect our converter to produce a result for every source document. The converter should never stop because of errors. |
| To cope with these expectations and problems we constantly modified and improved the OmniMark scripts as well as the user interface. We organized a feedback process from the users and solved most of the reported problems in the scripts. We are confident that our conversion tool will eventually work fine for BookMaster. We are however aware of the fact that every new HTML tool may present new problems. |
Coexistence of both worlds |
| Why is it necessary to have coexistence of mainframe and intranet for several years? Why do the users need round-trip conversions at all? Why can they not convert their documents to the new world just once? |
| We are talking of about 2,000 bank employees who work on computer applications. They work on many different tasks and their documentation needs vary a great deal. They all are potential users of our converter. |
| An important part of the documentation is supposed to reach every employee in the bank. The distribution today is based on a well-established mainframe infrastructure. The system is called POV (Print Output Verteilung, i.e. print output distribution). It is clear that for several years to come the mainframe technique will remain the only common link between all bank departments. The intranet technique however is still being developed and far from being able to reach every employee. |
| Furthermore, one has to accept that the ability and the need to turn to new techniques and tools vary among the employees. There are profound BookMaster experts whose main field of work is not documentation and who hesitate to turn to any new technique because the BookMaster functionality serves them very well. Projects in which, for example, half of the documentation is written in BookMaster and the other half in HTML must nevertheless have one documentation. |
| For all these reasons the bank has accepted that |
Consulting |
| The project is being run by FIDES Informatik, a Swiss consulting company which supports Credit Suisse in the development of strategic computer applications. FIDES defines the project. They install the converter at Credit Suisse and support the users. |
| The OmniMark scripts, the graphical user interface, and the client/server procedures are designed and written by DOSCO Document Systems Consulting, a German SGML consulting company. |
| The conversion concepts are jointly defined by FIDES and DOSCO. |
Summary |
| At first glance, a conversion from BookMaster to HTML and vice versa looks simple. A closer look however reveals many detail problems which must be solved before the converter is ready for the end-users. But with an excellent conversion tool and a considerable amount of work we are achieving good results. |