
 
 

Comparing Styling in Layout-driven & Content-driven Documents


 
Stephen Deach
  Sr. Computer Scientist
  Adobe Systems, Inc.
345 Park Avenue — W14-110
San Jose, California 95110-2704 USA
Email: sdeach@adobe.com Web: www.adobe.com
 
Biographical notice:
 
Stephen Deach
 
Stephen Deach has over 20 years’ experience in building publishing system software. This has included:
  • Newspaper systems — editorial, ad-layout, classified, and pagination;
  • Tech pubs production systems (including forms and table authoring tools);
  • Book and textbook production systems;
  • Illustration and ad production tools — commercial advertising, Yellow Pages, and newspaper markets;
  • Directory pagination — White and Yellow Pages;
  • Presentation graphics, chartmaking, and graphics editing tools; Imaging software in typesetters, imagesetters, and laser printers; and
  • Font creation, editing, and rendering tools.
 
At Adobe, he has worked on Illustrator and is currently working on FrameMaker.
 
He represents Adobe on the W3C XSL Working Group.
 
ABSTRACT:
 
This presentation describes the differences in production methods and formatting technologies of layout-driven documents (newspapers, magazines, advertising, and forms) vs. content-driven documents (textbooks, user’s manuals, most business documents).
 
Acronyms: DB (Database), DSSSL (Document Style Semantics and Specification Language), DTD (Document Type Definition), ISO (International Organization for Standardization), HTML (HyperText Markup Language), SGML (Standard Generalized Markup Language), TOC (Table of Contents), W3C (World Wide Web Consortium), XML (eXtensible Markup Language), XSL (eXtensible Style Language)
 

Keywords: Content-Driven Documents, Layout-Driven Documents, Style Sheets, DB, Database Publishing, DSSSL, ISO, SGML, W3C, XML, XSL
 
Copyright 1998 — Adobe Systems Incorporated, All Rights Reserved
 
 

Introduction

 
To allow the use of documents in different contexts, the developers of SGML first found it necessary to split the content from the layout and styling, then found it necessary to split the content-structuring rules (the DTD) from the content itself. Now, as we define XSL, we are finding a need for a similar subdivision within the formatting process, both to satisfy the needs of cross-media documents (browsers and print) and to bring to layout-driven documents the level of automation that is available today for content-driven documents.
 
In this presentation I’ll describe the similarities and differences between documents in the layout-driven class and those in the content-driven class, closing with a proposed architecture to support both.
 
I expect that much of this audience is familiar with the production of content-driven documents because that is where SGML has been most successful.
 
Documents that are predominantly content-driven include —
  • User’s manuals
  • Product documentation: military, aerospace, and other heavy industry
  • DB-driven documents: parts lists, price lists, insurance policies, phone books (both white and yellow pages), and dictionaries
  • Monologues: textbooks, other books, and other single-story documents
  • Collections of articles presented serially: conference proceedings, technical journals, short stories, and encyclopedias
  • Business documents: letters, white papers, proposals, specifications, and press releases
  • Financial documents: annual reports and securities filings
  • Legal documents: contracts, etc.
    But what about —
  • Newspapers, newsletters, and magazines
  • Packaging and point-of-sale materials
  • Advertising, brochures, and mailers
  • Retail and mail-order catalogs
  • Corporate identity packages
  • Presentations
    — and the ever painful —
  • Forms
    Then, what about —
  • Web pages — Many existing content-driven documents are being converted to be delivered on the Web, an environment that often demands a layout-driven presentation to be most effective.
    I’ll start by looking at the general characteristics of content-driven documents, then discuss those of layout-driven documents. I’ll repeat this alternation between content-driven and layout-driven as I discuss the production processes, specific formatting features, and software architectures in each class. Finally, I’ll present a hybrid architecture that could be used to author both content-driven and layout-driven documents. (This is not a product pre-announcement.)
     
     

    Characteristics of Documents In Each Class

     
    I’m going to summarize the characteristics common to typical content-driven documents, then those of typical layout-driven documents. One should remember that nothing is pure:
  • Layout-driven documents have content-driven components.
  • Similarly, content-driven documents have layout-driven components.
     

    Characteristics of Content-driven Documents

     
    Content-driven documents have a number of common characteristics —
  • They are single thread documents, either —
  • The document consumes as many pages as are needed to carry the full content.
  • Page designs are fairly simple and highly repetitive.
  • Content is organized hierarchically, with these exceptions —
     

    Characteristics of Layout-driven Documents

     
    Layout-driven documents have a number of common characteristics —
  • Most are multi-thread —
  • Position or juxtaposition conveys information or emphasis.
  • Often read in a hop-scotch manner
  • In a layout-driven document the content is often trimmed or rewritten to fit the allowed space (whereas content-driven documents add pages to hold all the content).
  • Though each content component (article, sidebar, directory entry, figure, or table) is individually well-structured; the overall document is a hodge-podge of different (often unrelated) content types.
  • Pages (often spreads) have complex designs, with far less repetition than those of content-driven documents. (I’ve often said that newspaper page design is an attempt to completely hide the underlying grid.)
  • Articles often jump to non-adjacent pages; thus they have multiple headline blocks and usually have "continued to/from" lines.
     

    Production Processes

     
    I have chosen examples of the production processes of a typical user’s manual for the content-driven case and of a newspaper for the layout-driven case. These examples were chosen to be about as different as possible to highlight how the process can dictate certain design decisions. Most other documents fall somewhere between these 2 extremes.
     
     

    Production Processes for Content-driven Documents

     
    The steps that one goes through to produce the typical large, content-driven document (somewhat simplified) —
  • You begin with a product/document concept. This includes a basic design concept and a general description of the nature of the content and illustrations.
  • Next, one typically puts together a high-level outline and a set of guidelines for the authors.
  • Then, the text creation task is broken apart, typically by chapter or section, across a group of authors and editors.
  • The authors gather the information necessary to write their sections. As they are written, sections are reviewed and edited or rewritten, often through multiple cycles.
  • Usually, during the latter part of this authoring/editing cycle, illustrations and photos are identified and produced. Again, multiple illustrators and photographers may be used. In some environments, many of the illustrative material requests are handled through extensive libraries of photos, artwork, and callout overlays.
  • Once a sufficient portion of the text and illustrations is available, the document’s assembly, formatting, and layout begin.
  • As the document approaches completion, the TOC and index are added. A product like FrameMaker is suitable for the authoring, editing, assembly, formatting, and TOC/indexing phases for many of these documents.
  • Finally, an electronic representation of the document is sent to the printer for final typesetting. Then, bluelines or sepias are signed-off; after which the document is printed, bound, and delivered.
     

    Production Processes for Layout-driven Documents

     
    I want to start this section with a question:
     
     
    "What would your reaction be if your boss came into your office and said, ‘I need you to find a way to produce a new 64-page document every day. I don’t just want a daily update; I want a complete, new document. I want new content, a different emphasis, a fresh layout. I want it totally new’?"
     
    As the world turns black around you and you vaguely feel your body hit the floor, you hear him amend his request to, "OK, how about weekly? By the way, this has to make a profit."
     
    When you come to and actually start to think about how you would organize your document and your production processes to do this, I suspect you would do something like the following —
  • Divide your document into a set of small autonomous pieces,
  • Farm much of the content creation out to third parties or rely on subscription services,
  • Structure the content in a general way so that content-production could be split across independent departments, and
  • Structure the day-to-day specialization, and
  • The profit center (advertising sales) would have control over the actual size of the document and significant control over the primary layout (at least over the ad placement).
    This is exactly the production strategy of your daily newspaper.
     
     
    Now before you say that this can’t happen to you, I’ll ask, "How many of you have been asked to put together and manage your company’s web page?" Those who have are looking at a very similar problem.
     
    I’ve outlined the general strategy of a newspaper’s production process. Let’s walk through the general workflow —
  • A page count target for a given date is set about 1 to 3 months in advance to give the sales department time to sell advertising. These targets take into account any special issues and seasonal advertising conditions. A newspaper is divided into sections. Ads are generally sold by section, though some documents sell specific pages or specific placements. Sections also provide a process management function. Some sections are driven by current events (the news, sports and business scores [that is what the stock market reports are anyway]), while the rest are feature sections that can be written in advance and used as needed (or used on a longer-term, well-defined schedule). This ratio of feature pages to news pages is decided at this time, as is the ratio of ad-space to editorial space in each section (these are often seasonally-adjusted, fixed ratios).
  • A few weeks before the issue, production of feature sections is begun. The feature and advertising-only sections are often printed in advance.
  • Usually, about a week before the date of publication, a final page budget for each issue is decided.
  • Next, advertising-only sections (except for the classifieds) and feature sections can be finalized.
  • The news sections are held until last. The remaining ads (especially classifieds) and content flow in daily. The ads are placed first (and billed). The remaining space is split up among the production departments, which fill it with a mixture of information that flows in from third parties and information that the paper must produce on its own. To fill this varying amount of space reliably, a newspaper divides the space among a number of small, semi-interchangeable articles.
  • As stated previously, the ad layout is set first. This is generally frozen before the editorial departments get control of the remaining space —
    There are 3 key points that I identified in this layout-driven process description —
  • First, the need for reliable delivery of a specified amount of content has forced the design decision to use small, semi-independent articles.
  • Second, the large number of articles and ads has dictated many of the aspects of the production process.
  • Third, it has also dictated that the reporter/writer give up ownership of the day-to-day content.
     

    Specific Formatting Features

     
     

    Similarities between Content-driven and Layout-driven Documents

     
    There are many similarities in the formatting of content-driven and layout-driven documents. For example, the level of general typography is fairly similar, so controls over the following are fairly similar —
  • Paragraph/headline/heading styles
  • Character/string styles
  • Table and table cell styles
  • International styles (covered by another speaker in this session)
    You may think that some of these style properties differ between content-driven and layout-driven applications — they don’t. What is happening is that you are really comparing products with 2 different levels of service.
     
    For example, if we take the basic justification properties —
  • The lowest service products only let you set the paragraph alignment to justified. You get no control over how the justification is performed.
  • In the mid-level product, you get paragraph-level control over the minimum, optimum, and maximum wordspacing and the option to enable or disable letterspacing, but no control over the letterspacing range.
  • In the high-end product, you get character-string level control over wordspacing, plus the ability to set the minimum, optimum, and maximum letterspacing and often control of the actual letterspacing-to-wordspacing algorithm.
    This span exists on the content side as one moves from basic word processors to high-end book production systems. The same span exists on the layout-driven side as one moves from basic drawing or presentation-graphics applications to high-end ad production products.
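
    As a rough illustration of these three service levels, the Python sketch below justifies a single line against a measure. It is not taken from any product: the names SpacingRange and justify_line, the uniform character width, and the units are assumptions made for the example. Word gaps are stretched or squeezed first; letterspacing is used only when a letterspacing range is supplied, which corresponds to the high-end case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpacingRange:
    """Minimum, optimum, and maximum spacing (hypothetical units, e.g. points)."""
    minimum: float
    optimum: float
    maximum: float

    def clamp(self, value: float) -> float:
        return max(self.minimum, min(self.maximum, value))


@dataclass(frozen=True)
class Justification:
    word_gap: float    # space placed between adjacent words
    letter_gap: float  # extra space placed between adjacent letters
    fits: bool         # True if the line exactly fills the measure


def justify_line(words: Sequence[str], char_width: float, measure: float,
                 word_space: SpacingRange,
                 letter_space: Optional[SpacingRange] = None) -> Justification:
    # Assumption: every character has the same width (a real composer
    # would use per-glyph widths and kerning).
    text_width = sum(len(w) for w in words) * char_width
    word_gaps = len(words) - 1
    letter_gaps = sum(len(w) - 1 for w in words)
    if word_gaps == 0:
        return Justification(word_space.optimum, 0.0, False)

    # Mid-level service: adjust word spacing within its allowed range.
    wanted = (measure - text_width) / word_gaps
    word_gap = word_space.clamp(wanted)
    if word_gap == wanted:
        return Justification(word_gap, 0.0, True)

    # High-end service: absorb what is left over with letterspacing.
    if letter_space is None or letter_gaps == 0:
        return Justification(word_gap, 0.0, False)
    leftover = measure - text_width - word_gap * word_gaps
    wanted_letter = leftover / letter_gaps
    letter_gap = letter_space.clamp(wanted_letter)
    return Justification(word_gap, letter_gap, letter_gap == wanted_letter)
```

    With a word-spacing range of 2 to 5 units, a line that needs 7 units per word gap cannot be justified at the mid level; passing a letterspacing range as the last argument lets the same line fit by spreading the remainder across the letter gaps.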
     
     

    Formatting Features More Common in Content-driven Documents

     
    The primary goal of the content-driven styling process is to minimize the amount of human interaction with the layout process. As much as possible is specified in rules.
  • Data Reorganization is necessary because parts of the document are presented at places that can only be determined by the layout process or after the pagination is finalized.
  • Simpler rule-based page designs, but more extensive control over pagination algorithms —
  • More extensive auto-numbering, complex page numbering, lists
  • Change bars
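
    The data reorganization point can be made concrete with a table of contents, whose page numbers are only known once pagination is final. The Python sketch below is a simplified assumption, not a description of any formatter: Block, paginate, and build_toc are hypothetical names, heights are counted in lines, and pagination is a plain greedy fill.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    kind: str   # "heading" or "para"
    text: str
    lines: int  # height of the block in lines (hypothetical unit)


def paginate(blocks: List[Block], lines_per_page: int) -> List[Tuple[int, Block]]:
    """Greedy pagination: start a new page when the next block will not fit."""
    placed = []
    page, used = 1, 0
    for block in blocks:
        if used > 0 and used + block.lines > lines_per_page:
            page += 1
            used = 0
        placed.append((page, block))
        used += block.lines
    return placed


def build_toc(placed: List[Tuple[int, Block]]) -> List[Tuple[str, int]]:
    """Reorganization step: pull headings out of the paginated flow."""
    return [(block.text, page) for page, block in placed if block.kind == "heading"]


manual = [
    Block("heading", "Introduction", 2), Block("para", "...", 30),
    Block("heading", "Production", 2), Block("para", "...", 20),
    Block("heading", "Architecture", 2), Block("para", "...", 10),
]
toc_40 = build_toc(paginate(manual, 40))  # TOC for a 40-line page
toc_30 = build_toc(paginate(manual, 30))  # same content, shorter page
```

    Because the TOC is derived from the paginated result by rule, changing the page depth changes the page numbers without anyone re-extracting them by hand.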
     

    Formatting Features More Common in Layout-driven Documents

     
    The goal of most layout-driven documents is to allow each page to be unique, thus more likely to catch the reader’s attention.
  • Templated or individually-drawn page or spread layouts
  • Magazine article opening
  • Dropped caps (or even lowers) everywhere
  • Ornamental pull quotes
  • Caption blocks ("Photos, clockwise from top...")
  • Jump -or- jump, jump, jump, jump, jump, jump... (backwards jumps are sometimes allowed)
     

    Software Architectures

     
    Traditionally, documents and applications have fallen into either the content-driven category or the layout-driven category — the Web will change this. Documents presented on the Web have many of the characteristics of layout-driven advertising. However, when you go to print that document, it would be better served by a content-driven presentation.
     
    Many people have said that they don’t believe one application could be developed to support both layout-driven and content-driven domains. I don’t support that belief.
  • The aggregate of software processing steps is fairly similar in both domains.
  • Many of the specification requirements are fairly similar. However, they do vary in how carefully they segregate the specifications of:
  • What is different is how they are organized, and as a result, the process flows are also different. Where in the process that certain things are specified varies —
  • But eventually, you end up pouring streams of styled text into sets of rectangles.
    In this section, I am going to look at the architectures of existing applications. As I do, I’ll identify areas that limit the use of the products across both domains. In most cases, these limitations involve:
  • failure to segregate major components of a compound document model, either at the style sheet (specification) or the formatter (implementation) level (tight bindings that should be loose bindings),
  • specification models that are back-derived from a specific implementation (a "tail wagging the dog" problem that has been worsened by WYSIWYG), -or-
  • lack of a well-developed abstract model.
     

    Software Architecture for Content-driven Documents

     
    Current content-driven applications have been fairly successful at adopting the content/styling/layout split that is necessary to support multipurpose-reuse of the content.
     
    Early typesetting systems (late 1970s) had direct, inline, cumulative markup for setting styling, column width, etc., but layout was done with an X-acto knife. When computer area composition came along, it was done mostly as a wrapper around the galley composer rather than by embedding inline markup, so the low-level styling was tightly bound to (even intermingled with) the content, but layout was separate. As these systems have aged over the past 20 years, they have been replaced with stronger indirect (style-sheet) systems that break the binding between the low-level styling and the content, and are thus much closer to the desired model.
     
    Word processors have not evolved as far.
  • They have traditionally controlled all layout by embedding markers in the text (such as section marks that govern pagination, page numbering, multi-column vs. single column layout, etc.; break markers to force line, column, and page-breaks). This is bad.
  • They have also required that the user directly perform most content reorganization. (For example: extract the TOC and manually insert it in the document, manually re-extract to get correct page numbers.) This, too, is bad.
  • On the bright side, most modern word processors support some level of object tagging and style indirection via style sheets (at least at the paragraph level), and some standardization and indirection in pagination through support for some level of page templates or page phasing (first, odd, even).
    Products like FrameMaker have gone a few steps further. They allow us to split styling from the layout and from the content.
  • Style sheets are supported at the character-string level, at the paragraph level, and at higher levels for special constructs.
  • Layout is separated into a template mechanism.
    Finally, products like FrameMaker+SGML go one step further, allowing indirect assignment of these styles based on the context in which an element appears. This is probably the minimum needed to support the needs of hybrid documents, though segregation of the reorganization and synchronization of parallel flows is probably still needed. (The lack of the explicit reorganization capability is one reason that FM+SGML doesn’t handle multi-article SGML documents very well.)
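
    To show what indirect, context-based style assignment means in practice, here is a minimal Python sketch. It does not reproduce FrameMaker+SGML's actual rule mechanism; the rule table, the element names, the style names, and the "most specific suffix wins" policy are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Each rule pairs a context (the tail of the element path, ending in the
# element being styled) with a style name. All names are hypothetical.
STYLE_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("chapter", "title"), "ChapterTitle"),
    (("section", "title"), "SectionHead"),
    (("sidebar", "para"), "SidebarBody"),
    (("para",), "Body"),
]


def assign_style(path: Sequence[str],
                 rules: List[Tuple[Tuple[str, ...], str]] = STYLE_RULES,
                 default: str = "Default") -> str:
    """Return the style for the element at the end of `path`.

    `path` lists element names from the document root down to the element.
    The longest rule context that matches the end of the path wins.
    """
    best_len, best_style = 0, default
    for context, style in rules:
        n = len(context)
        if n <= len(path) and tuple(path[-n:]) == context and n > best_len:
            best_len, best_style = n, style
    return best_style
```

    The same title element receives a different style depending on whether its parent is a chapter or a section, and the content itself carries no style name at all; swapping the rule table restyles the document without touching it.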
     
    DSSSL provides an adequate architecture for specifying the content reorganization, style assignment, specification of basic page design, and mapping of content to page areas. DSSSL was designed to support serializable processing of a document in a manner similar to batch book production formatters. It was not designed to handle hybrid or layout-driven documents. In order to support hybrid content-driven/layout-driven documents, I would prefer separation of the layout specification from the flow restructuring/synchronization, and separation of the reorganization from the styling. In addition, (again by intent) the requirements placed on the formatter (and formatting algorithms) are not fully specified and many of the actual decisions needed to perform the pagination and text placement are left to the implementor of the formatter.
     
     

    Software Architecture for Layout-driven Documents

     
    We aren’t as lucky here — Tight bindings run rampant.
     
    Why? ...
  • There is a wider diversity of applications and design requirements.
  • Most layout-driven documents have been perceived as one-offs, so no attempt has been made to look at underlying models. If one looks at today’s production models, that makes sense. However, if you try to automate the process over a family of documents, you need to move the decisions further forward in the process. This gives the designer more direct input into the process (some have called this capability "smart templates"). To do this successfully, an underlying model becomes necessary. This model must identify the content components and the relationships between them by some means other than specifying the positions of the components. This content specification must be separated from the presentation specification (layout and styling). This can be done! It is exactly what is done in a charting package. You provide a spreadsheet that specifies the data values and the relationships between those values. The charting package then lets you choose between a number of different chart types and lets you attach styling on a class-by-class basis. (You attach a style to the chart title, to pie-chart segments or to the graph lines, to segment or curve labels, to all value labels, etc.) Production systems for newspapers, magazines, and Yellow Pages have also been successful at splitting the content specification/creation from the presentation specification.
  • Attempts at standardization have been driven by application-/vendor-specific file formats, tightly tied to specific products and implementations. These generally work only for a narrow class of documents.
  • Generalization of the text content architecture to support SGML has been minimal to non-existent (until the advent of HTML, which has forced significant rethinking by these markets).
    As a result, there are no "standards-based" specifications for layout-driven document formatting. Illustrator, PageMaker, and Quark provide the most widely understood proprietary implementations.
     
    I expect that tools will evolve to support stronger layout-driven models over the next 5 years. HTML export has forced products in these markets to address the problem of creating structured content representations for individual articles. It is not a large step to allow importing of articles from HTML, as well as import and export of XML and SGML formats. These articles could have the headline blocks and photo assemblies (photo, photo-title, descriptive-caption, and credit) attached, or the headline blocks and photos could be carried separately, as required by the content use/reuse. In addition to the content/styling split forced by SGML/XML import, attempts to support reuse will force a similar content/layout split and the associated formalization of layout specification models.
     
    In most layout-driven environments, content reorganization is less important than it is in content-driven environments (except in areas involving DB publishing). The separation between content-structure-rules, content, styling for area composition, layout specification and the actual pagination/layout will become more critical.
     
     

    Architecture to Support Both

     
    Over the next few years, the web and the desire for document re-use/repurposing will cause tools to emerge that can create documents in both the content-driven and layout-driven forms and to convert documents from one to the other.
     
    To merge the capabilities of content-driven applications and layout-driven applications, it is necessary to divide the styling specification and the formatter implementation into 4 separate domains (these may be combined into a single style sheet, but the conceptual separation is necessary). This is analogous to the DTD/content split in SGML/XML.
     
    These domains are —
  • Content restructuring: Conversion of the document from the authoring environment to the presentation environment. This is similar to the flow tree construction process defined in DSSSL, where the content is reorganized into a flow tree with hierarchical flow objects.
  • Style assignment: Attaching styles to the formatter objects. This step requires knowledge of the context of the source object in the authoring environment and the context where it will appear in the presentation; as a consequence, the content reorganization and style assignment phases can be processed in a single step. This is also similar to the style assignment portion of the flow tree construction process defined in DSSSL.
  • Pagination: The process of dividing the document into pages and deciding the general layout of each page. (Division of Web documents into pages is handled by the reorganization phase until you go to print them; then traditional pagination is required.) Pagination usually involves the selection of a page design or page template based on a combination of page number, page phase, and the content pending for placement (usually including management of floating objects and footnotes). Once a page design is chosen, a single content queue is assigned to each page area. For a single paginator to support all document classes, it must be possible to specify page designs using rule-based pages, template-based pages, and tight-bound pages (content already embedded), a mixture of these methods on a page-by-page basis, or a mixture of differing layout strategies on an area-by-area basis within the page. DSSSL was designed to support content-driven documents with a single source tree. It didn’t need to split out the pagination or area composition from the content reorganization and style assignment; it therefore does the page sequence selection and assignment of flows to page areas (by class) during the flow tree construction. XSL will need to separate all pagination-related assignments into a separate pagination specification and a separate processing phase to support the mix of content-driven and layout-driven documents.
  • Area composition: The detailed placement of text within an area, including local subdivision of an area into columns; handling of tabs, indents, and horizontal alignments; breaking the text into lines; controlling line and paragraph spacing; and kerning and justification. International systems must also handle a number of special composition features, such as the Japanese rubi, warichu, and kumisuji; Arabic and Hebrew mixed writing directions; and special accent placement for many languages. Column subdivision is included at the area composition level rather than at the pagination level to support sideheads, straddle heads, side-by-side items (Q/A lists), and the "magazine article opening". The area-composition service must support the elide-to-fit capability if one wishes to automate the production of layout-driven documents. This split between the area composition and pagination is required for the style specification, but there are sufficient interactions in the formatter that the implementation of these 2 processing phases may continue to be combined.
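
    The four domains above can be sketched as four small functions. This is only a toy under stated assumptions, not a proposed implementation: every name here (FlowObject, restructure, assign_styles, page_design, compose_area, paginate) is hypothetical, page capacity is counted in lines, line breaking is delegated to Python's textwrap module, and the page design is chosen by page phase alone rather than by pending content.

```python
import textwrap
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FlowObject:
    role: str        # e.g. "title" or "para"
    text: str
    style: str = ""


def restructure(source: List[Tuple[str, str]]) -> List[FlowObject]:
    """Domain 1: reorder authoring content for presentation (titles first)."""
    flow = [FlowObject(role, text) for role, text in source]
    return sorted(flow, key=lambda obj: obj.role != "title")  # stable sort


def assign_styles(flow: List[FlowObject], sheet: Dict[str, str]) -> List[FlowObject]:
    """Domain 2: attach a style to each flow object by its role."""
    for obj in flow:
        obj.style = sheet.get(obj.role, "Body")
    return flow


def page_design(page_number: int) -> str:
    """Pick a page design by phase: first page, then odd/even."""
    if page_number == 1:
        return "first"
    return "odd" if page_number % 2 else "even"


def compose_area(obj: FlowObject, chars_per_line: int) -> List[str]:
    """Domain 4: break one flow object's text into lines."""
    return textwrap.wrap(obj.text, width=chars_per_line)


def paginate(flow: List[FlowObject], capacity: Dict[str, int],
             chars_per_line: int) -> List[Tuple[str, List[str]]]:
    """Domain 3: choose a design per page and fill it with composed lines.

    `capacity` maps a design name to lines per page (each must be > 0).
    """
    lines = [ln for obj in flow for ln in compose_area(obj, chars_per_line)]
    pages, number = [], 1
    while lines:
        design = page_design(number)
        room = capacity[design]
        pages.append((design, lines[:room]))
        lines = lines[room:]
        number += 1
    return pages
```

    Note that paginate calls compose_area directly: the two are specified separately, yet the implementation combines them, which mirrors the remark that these two processing phases may continue to be combined in the formatter. Supporting layout-driven work would mean replacing page_design and the capacity table with template-based or tight-bound page designs while leaving the other functions alone.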
    Why is this separation needed?
     
    Many of the style assignments are transferable across document types, but change with the target media. However, the document restructuring, pagination, and area composition requirements change as one crosses the content-driven/layout-driven line. They also change among document types or document components within each major class. You want the ability to swap out only the necessary portions of the style specification as you move from one document type or presentation type to another, or as you move from component to component within a given document.
     
    For example, there are separate page design tools for different document and component types —
     
    • Ad layout and most DTP applications use a freeform layout definition with tight binding between the content and specific placements. These templates specify a mixture of hard-placed content and flow-areas where the content can be mapped in later by the pagination process. In most uses, the hard-placed content will remain directly embedded in the layout specification.
    • Today’s specialized tools for producing graphs, charts, diagrams, etc. produce highly structured layouts with tightly bound or embedded content. In the future, many of these will be templates that can be selected and activated by the document’s content stream.
    • Newspapers and newsletters use interlocking-tiled page templates, mapping articles to areas based on priority. In the future, if articles are marked up appropriately, the formatter can adjust the article content so that the article fits in the target space.
    • User manuals and textbooks will continue to use rule-based page designs for the majority of their pages, with template-based handling of special-case pages.
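
    The newspaper case in the list above (articles mapped to tiled areas by priority, with content adjusted to fit) might look like the following Python sketch. It is an assumption-laden illustration: the names Paragraph, Article, Area, elide_to_fit, and place_articles are hypothetical, sizes are counted in lines, "marked up appropriately" is reduced to a single optional flag per paragraph, and the matching policy (most important article gets the largest free area) is just one simple choice.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Paragraph:
    lines: int
    optional: bool = False  # markup hint: may be cut to make the article fit


@dataclass
class Article:
    slug: str
    priority: int           # lower number = more important
    paragraphs: List[Paragraph]


@dataclass
class Area:
    name: str
    capacity: int           # lines available in this page area


def elide_to_fit(article: Article, capacity: int) -> Optional[List[Paragraph]]:
    """Drop optional paragraphs, last first, until the article fits.

    Returns the kept paragraphs, or None if it cannot be made to fit.
    """
    kept = list(article.paragraphs)
    total = sum(p.lines for p in kept)
    for i in range(len(kept) - 1, -1, -1):
        if total <= capacity:
            break
        if kept[i].optional:
            total -= kept[i].lines
            del kept[i]
    return kept if total <= capacity else None


def place_articles(articles: List[Article],
                   areas: List[Area]) -> Dict[str, Tuple[str, int]]:
    """Map articles to areas by priority; returns slug -> (area, lines used)."""
    free = sorted(areas, key=lambda a: a.capacity, reverse=True)
    placements: Dict[str, Tuple[str, int]] = {}
    for article in sorted(articles, key=lambda a: a.priority):
        for area in free:
            kept = elide_to_fit(article, area.capacity)
            if kept is not None:
                placements[article.slug] = (area.name, sum(p.lines for p in kept))
                free.remove(area)
                break  # stop iterating over `free` right after mutating it
    return placements
```

    Articles that cannot be trimmed to any free area are simply left unplaced, which corresponds to the editor holding a story or jumping it to another page.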
     
    Similar functionality differences across document types and component types exist in area composition, styling, and reorganization. It is necessary for the formatter (a browser is also a formatter) to handle all the possibilities. Specialized editing tools are required for the different component and document types and for different workflow steps, in order to foster designer, author, and editor efficiency.
     
     

    Conclusions

     
    I’m going to close this presentation with 2 related predictions —
  • Regularly producing both layout-driven documents and content-driven documents (often sharing the same content) is in your future.
  • Product architectures and standards will evolve to support this.
