PODi PPML Pageflex Peter Davis deBronkart objects reusability reusable | PPML (Personalized Print Markup Language) |
| a new XML-based industry standard print language |
| deBronkart, Dave |
| Dave deBronkart |
| Senior Consultant |
| PODi, 150 Lucius Gordon Dr, Suite 110, West Henrietta, New York 14586, USA. Phone: (716) 239-6014 Fax: (716) 239-6093 email: daved@podi.org web site: www.podi.org |
| Biography |
| Davis, Peter |
| Peter Davis |
| Principal Consulting Software Engineer |
| Pageflex Inc., 215 First Street, Cambridge, Massachusetts 02474, USA. Phone: 617-520-8345 Fax: 617-868-0784 email: pdavis@pageflexinc.com web site: www.pageflexinc.com |
| Biography |
| Abstract |
Introduction |
History |
Goals |
| The goals established for the PPML standard included: |
|
|
|
|
|
|
|
|
|
|
Major issues |
| During the development process of PPML, a number of substantial issues arose. Most of these were the result of conflicting assumptions and capabilities of existing solutions. These are discussed in more detail in the following sections. |
Different, conflicting prior implementations |
| When the PPML activity began, the state of the variable data printing industry was such that several vendors had introduced proprietary solutions, some of which were already in use and mature, and evolving towards second generation products. Needless to say, as is usually true when a standard is developed for an existing market, vendors were reluctant to jeopardize the market niches they had already established with these products. However, the working group participants recognized that expanding the market overall would benefit everyone, and that this could not be accomplished without some standards in place. |
| Despite these good intentions, there were still issues around the varying capabilities of different systems. For example, some systems optimized recurring data by caching pre-rasterized, unscreened data. Others cached already screened data. The latter method, while potentially gaining important performance benefits, reduces the opportunities for reuse, as the screened data cannot easily be relocated on the page without incurring the possibility of halftone phase errors. |
| Clearly, to arrive at a standard, we needed to define a model of a PPML consumer and how it processes data. For PPML, this model was developed incrementally as the standard emerged, rather than a priori. This allowed the model to be adapted to new requirements as various issues were brought to light. However, it also meant that some conflicting assumptions about the processing model were carried along fairly far into the process. |
Separation of layout and content |
| The PPML working group reached consensus early that designing a new page description language or content data format was not a goal of this effort. Rather, the activity was directed towards layering a format on top of existing content representations, in order to express characteristics such as element recurrence. In other words, PPML was to be a language for expressing metadata about the content. |
| A closely related issue was the idea that PPML should support multiple, possibly mixed content data formats. We wanted a solution that would embrace the various formats commonly in use today. This in itself proved a significant issue, since many vendors only offer support for one or a few existing formats, and the various content data formats all have different features and capabilities. |
Transport |
| The XML syntax was adopted early on in order to keep the PPML standard in line with other emerging standards in print and presentation, and to exploit XML’s features for representing structured documents. However, to meet the goal of separation of layout and content, and to support multiple content data formats, we had to find a way to combine non-XML data, including binary data, with the XML syntax of PPML. |
| The “Src” attribute sufficed to allow references to external non-XML data from within PPML. However, there was still a need to be able to package that data for transmission across a single data stream, and to be able to archive an entire job including all the requisite files. This is an absolute requirement for streaming/transactional applications, such as printing of credit card statements. For such applications, it must be possible to reload an archive and be absolutely certain that each page will print identically to the first run. People involved in this work thus require that the job be stored in a monolithic stream. (A note later in this paper discusses the challenge of efficiently packaging non-XML data into an XML stream.) |
| For this purpose, MIME packaging is the recommended solution. It is specifically designed for packaging of multiple data formats into a single stream, and standards already exist for specification of content types. (See Appendix 2 for the MIME content types supported for PPML documents.) |
Varying consumer capabilities |
| Ideally, every vendor with an existing product would benefit greatly if all these features could be added to their existing products, even though their capabilities differ. Development would be simpler and the important "time to market" goal would be supported. |
| This had profound effects on decisions about what the language must express. The design choice was: Should the language express the most powerful of today's features, or the least common denominator? Any feature that's specified in the language must be supported on every PPML machine; this influences how the machine must be engineered and what it will cost. |
| Could the language "legislate" what capabilities a Consumer must have? Again, this issue was addressed by refining our models of how a Producer and Consumer process a job, and deciding what capabilities are needed. |
Pre-declaring graphics states |
| To benefit from caching, PPML consumers must be able to use the pre-rasterized renditions of recurring objects in any position, scale, orientation or clipping in which they may appear. Some Consumers are able to place pre-rasterized elements anywhere on the page. Others, however, are limited by the fact that cached elements are pre-screened for efficiency. Similarly, some Consumers can rotate or clip pre-rasterized elements, while others cannot. |
| To avoid placing undue requirements on Consumers, PPML adopted the approach of declaring every possible transformation or clipping rectangle at the time a recurring element is defined. A REUSABLE_OBJECT definition includes one or more OCCURRENCEs, each of which specifies a transformation matrix and a clipping rectangle for the object. Objects are placed on the page by reference to a specific OCCURRENCE. In this way, the Consumer can make its own determination about which occurrences of a given object can be implemented with a single rasterization, and which may require multiple cached renditions. |
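| The following is a minimal Python sketch of how a Consumer might use pre-declared occurrences, not a rendering of actual PPML syntax. Only the names REUSABLE_OBJECT and OCCURRENCE come from the text; the `Occurrence` class, the six-number matrix layout (borrowed from PostScript), and the grouping rule are assumptions made for illustration. It models a hypothetical Consumer that can move a cached raster but cannot rotate, scale, or re-clip it. |

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    name: str
    matrix: tuple  # (a, b, c, d, tx, ty): assumed PostScript-style affine matrix
    clip: tuple    # (x, y, width, height) in object coordinates (assumed)


def renditions_needed(occurrences):
    """Group occurrences that a translate-only Consumer could serve from one raster."""
    groups = defaultdict(list)
    for occ in occurrences:
        a, b, c, d, _tx, _ty = occ.matrix
        # Same scale/rotation and same clip: only the position differs,
        # so one cached rendition can be reused.
        groups[(a, b, c, d, occ.clip)].append(occ.name)
    return list(groups.values())


logo_occurrences = [
    Occurrence("header", (1, 0, 0, 1, 36, 700), (0, 0, 100, 40)),
    Occurrence("footer", (1, 0, 0, 1, 36, 20), (0, 0, 100, 40)),
    Occurrence("rotated", (0, 1, -1, 0, 500, 300), (0, 0, 100, 40)),
]
groups = renditions_needed(logo_occurrences)
print(groups)
```

| A Consumer that caches pre-screened data would group more narrowly (for instance, also keying on position), which is exactly the kind of decision the pre-declaration approach leaves to each implementation. |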
Element references |
| During the early stages of PPML development, the model permitted object definitions to include references to previously defined objects. For instance, it was possible to define a REUSABLE_OBJECT containing a company logo, and then define another REUSABLE_OBJECT containing the logo plus some text. |
| However, this conflicted with the above-stated goal of making all of the possible transformations and clips of an object available when the object is defined. If objects could include references to others, then the included object had to specify every possible occurrence of the containing object among its own. The deeper the levels of inclusion become, the more the number of occurrences grows. To make this a tractable problem, PPML eliminated the possibility of defining objects by reference to earlier objects. It’s a simple matter to re-define the necessary content in each object which may contain it. |
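| The growth can be illustrated with a small worked calculation. This is a simplified model, assuming that every combination of placements across nesting levels yields a distinct transformation; the function name and the example numbers are hypothetical. |

```python
from math import prod


def occurrences_required(placements_per_level):
    """Occurrences the innermost object must declare, given the number of
    placements at each level of inclusion (innermost first)."""
    return prod(placements_per_level)


# A logo placed 2 ways inside a letterhead object, which has 3 occurrences
# inside a page-template object, which itself has 4 occurrences.
total = occurrences_required([2, 3, 4])
print(total)
```

| Under this model the count is the product of the placements at every level, so each extra level of inclusion multiplies the number of declarations rather than adding to it. |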
Caching |
| Since reusability (the ability to save content objects in memory) is the single most important aspect of the new language, differences in caching capability are absolutely pivotal. What assumptions should the new language make about the caching capability of the consumer? |
| Complicating this was the desire to make the language successful on all levels of machine. Of course a half-million-dollar digital printing press can cache images; but what about desktop machines? Users should be able to proof-print a PPML job on an affordable workgroup printer. |
| Consumers may also vary in the kinds of caching they can do. Some may choose to cache the source data so it can be reused later without re-downloading; some may cache at an intermediate stage called the display list; some may cache the object in an intermediate state, before the image content has been screened; others may only cache the final, rasterized, screened image. How should the language deal with this? How far should the Producer go in managing the Consumer’s activities? A goal of PPML was to provide as much information as possible about the usage characteristics of elements within a print job, but to avoid dictating any caching strategies or implementations. It was felt that vendors of PPML Consumers would compete based on the degree of optimization they can provide, among other things. |
Relocatability of pre-rasterized objects |
| Once a graphic has been screened, most systems cannot relocate it. If pre-screened elements are placed so they abut or overlap, any slight differences in the placement of the halftone dots become glaringly visible to many people. This means that if a graphic appears at a slightly different position on the page, it must be re-rasterized. This is time-consuming, and requires that every possible position of an object be pre-declared as a separate OCCURRENCE. |
| While some systems do make use of pre-screening to get the best performance from pre-rasterized data, this was not deemed to be a requirement. Currently, PPML permits objects to be positioned via an attribute on the MARK which actually places them on the page. |
Arbitrary transformations and clipping |
| While we agreed not to create another page description language, it became evident early on that it’s useful to allow PPML to specify transformations and clipping on external elements. For example, a PPML document might refer to an Encapsulated PostScript® (EPS) file, but scale it, rotate it, and position it on the page via the built-in capabilities of PPML, without the need to generate additional PostScript code. |
| To support this mechanism, we had to require that PPML Consumers be able to implement these transformations and clipping operations on all of the various content formats they support. While this is straightforward for PostScript, PDF, and others of that ilk, it remains to be seen whether other non-page-description formats (e.g., TIFF, JPEG) will be supported equally robustly. |
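| As an illustration of what scaling, rotating, and positioning an external element involves, here is a minimal Python sketch of affine matrix composition. The six-number matrix layout follows the PostScript convention and is an assumption here, as are all function names; the paper does not give PPML’s own notation. |

```python
import math


def scale(sx, sy):
    return (sx, 0.0, 0.0, sy, 0.0, 0.0)


def rotate(degrees):
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r), 0.0, 0.0)


def translate(tx, ty):
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def then(m, n):
    """Return the matrix that applies m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a2 * a1 + c2 * b1, b2 * a1 + d2 * b1,
        a2 * c1 + c2 * d1, b2 * c1 + d2 * d1,
        a2 * e1 + c2 * f1 + e2, b2 * e1 + d2 * f1 + f2,
    )


def apply(m, x, y):
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# Scale the element 2x, rotate it 90 degrees counter-clockwise,
# then position it at (100, 50) on the page.
placement = then(then(scale(2, 2), rotate(90)), translate(100, 50))
corner = apply(placement, 1, 0)
print(corner)
```

| A single composed matrix of this kind is what a Consumer would have to honor for every content format it accepts, whether the source is a page description or a raster image. |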
Scope of specification |
| PPML is specifically intended to support high-volume, automated production printing. This is an extensive, multi-stage process, involving far more than just the content of the page. We wanted the language to be clean, elegant, and efficient, and thus to have no more features than necessary; yet we didn’t want it to be so “pure” that it wouldn’t fit into today’s existing workflows. And the more we put into the language, the more it would affect the requirement of rapid time to market. More features take longer to develop, test, and debug. |
| So we faced multiple questions about how much should go into the language and how much should not. |
Imposition |
| Digital printing presents all of the challenges of traditional workflows. It’s not enough to compose an individual page – multiple pages must be imposed onto a sheet in specified positions. All the demands of CTP are present, with added complications caused by the fact that every sheet is different and must feed into a larger workflow. |
| Manual sorting of the sheets would be economically unacceptable, so the stack of printed sheets must contain documents in the correct sequence, so that when the pages are trimmed they are ready to use (or ready for the finishing process). For instance, postage can be the most expensive cost in a digital printing project, and substantial discounts are available if the documents are delivered to the post office in the correct order. So imposition software can provide real economic benefits. As another example, if the sheet contains 20 cards with serial numbers, they should be printed in stacks: in a stack of 50 sheets, the first sheet should contain cards #1, 51, 101, etc. |
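| The serial-number example above can be sketched in a few lines of Python. The function name and the 1-based numbering of sheets and positions are assumptions for illustration; the figures (20 cards per sheet, 50 sheets per stack) come from the example in the text. |

```python
def card_on_sheet(sheet, position, sheets_in_stack=50):
    """Serial number printed at a given position (1-based) on a given sheet (1-based)."""
    return (position - 1) * sheets_in_stack + sheet


cards_per_sheet = 20
first_sheet = [card_on_sheet(1, p) for p in range(1, cards_per_sheet + 1)]
last_card = card_on_sheet(50, cards_per_sheet)
print(first_sheet[:3], last_card)
```

| With this ordering, cutting the stack yields 20 piles, and the pile under position 1 holds cards 1 through 50 in sequence, the next pile 51 through 100, and so on, with no manual sorting. |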
| The design decision faced by the Working Group was, should imposition be specified in the language, or should PPML be limited to specifying the content of individual documents? We knew that some PPML products would be used completely standalone, so imposition must be specified, whether in PPML or some other language. But other PPML products would fit into a workflow that already has very sophisticated imposition software; should we limit these products to a less powerful feature set? To complicate things further, we wanted PPML to be universal, even on desktop machines whose sheet size is too small for imposed sheets. |
| The decision we made was to specify complete imposition instructions, but to make them optional: a Producer may omit any imposition elements, and a Consumer is entitled to ignore the in-stream imposition instructions and use its own imposition instead – or do no imposition at all. |
Media and finishing |
| This topic was an example of the challenges introduced by the convergence of promotional and transactional printing – a problem that’s entirely new to the era of high quality color digital printing. |
| Conventional graphic arts workflows involve loading a stack of paper into a press and “manufacturing” a stack of identical sheets, which are then removed from the press and fed into various finishing machines: folders, cutters, etc. But in the world of transactional printing, document printers have multiple online paper trays, and different paper sizes or colors are usually selected by online command. Should the PPML language, which was intended to provide a standard way of specifying reusable page content, include instructions that support online media selection? |
| Similarly, in the transactional world, online finishing equipment is commonplace. For PPML to be widely accepted, it must fit into existing workflows. Should the language specify finishing instructions? |
| As we began considering these questions we realized it’s a vast subject. We decided that media and finishing – the front and back ends of the process – deserve a separate specification of their own. It was decided that the scope of PPML would be restricted to specifying what goes on each sheet, not where the sheets come from or what happens to them afterward. |
Transport |
| As noted earlier, the nature of graphic arts work is that binary data must be included, and that’s not possible in XML. This requires defining a means of submitting the binary data to the Consumer. Should this be specified in the PPML language? |
| This question was complicated by the news that the W3C has recognized this problem and will be defining a solution. PPML will support that solution when it comes along. Until that happens, how much should PPML specify? |
| Again the decision was that PPML would be restricted to defining what goes on each sheet – it would not specify how the instructions are transported from Producer to Consumer. |
Source data |
| Ideally, we wanted PPML to permit content data to be included either by direct containment in the PPML elements, or by reference via URI. However, many of the appearance formats are not XML-safe, and may include binary data, “<” and “>” characters, etc. |
| PPML defines both EXTERNAL_DATA and INTERNAL_DATA element types. The INTERNAL_DATA element can specify an encoding, such as base64, which allows the data to be represented in an XML-safe way. |
| Processing of reusable elements implies that different components of a job may RIP in different sequences. It was necessary, therefore, to allow various source data elements to be concatenated for processing. For example, a job which uses multiple EPS files might want to use the same procedure sets, dictionaries, etc. with each one. To accomplish this, we defined a SOURCE element, which can contain multiple EXTERNAL_DATA and INTERNAL_DATA elements, all of the same format. All of the content data for a SOURCE are concatenated together for processing. |
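| A minimal Python sketch of the concatenation model follows. It is not PPML syntax: the tuple representation of parts, the `fetch` callback, and the sample file name are hypothetical, and base64 is used because the text names it as one possible encoding for INTERNAL_DATA. |

```python
import base64


def resolve_source(parts, fetch):
    """Concatenate the content data of one SOURCE, in order.

    parts: list of ("internal", base64_text) or ("external", reference) tuples.
    fetch: callable that returns the bytes behind an external reference.
    """
    chunks = []
    for kind, payload in parts:
        if kind == "internal":
            chunks.append(base64.b64decode(payload))
        elif kind == "external":
            chunks.append(fetch(payload))
        else:
            raise ValueError(f"unknown part kind: {kind}")
    return b"".join(chunks)


# A shared procedure set carried inline, followed by an external EPS file.
procset = b"/box { <</W 10>> } def\n"
encoded = base64.b64encode(procset).decode("ascii")
files = {"logo.eps": b"%!PS-Adobe-3.0 EPSF-3.0\nbox\n"}

data = resolve_source([("internal", encoded), ("external", "logo.eps")], files.__getitem__)
print(encoded)
```

| Note that the inline procedure set contains “<” and “>” characters, while its base64 form uses only letters, digits, “+”, “/” and “=”, which is what makes it safe to place inside an XML element. |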
| We did not allow EXTERNAL_DATA or INTERNAL_DATA elements to be defined once, and then used repeatedly. Some members felt that this posed a burden on the Consumer by requiring it to cache the data in source format to support subsequent references. This would be a useful feature to consider for future versions, however. |
Descriptive, not prescriptive |
| Earlier we mentioned the division of responsibilities between the Producer and Consumer. For instance, if the Producer knows an object will be printed 1000 times, should it tell the Consumer “RIP this and cache it”? |
| It was decided that to give such instructions reliably, the Producer would need detailed knowledge of the Consumer’s capabilities. Then, if decisions based on that information were embedded in the PPML data stream, the same stream would not work properly on a different machine. Thus, it was decided that PPML would merely convey what it knows – it would describe the object to be RIPped, but it would not prescribe how the Consumer should execute the job. |
| Several members of the group remarked that this also allows for a wide range of competitive approaches for different markets. |
Implementing the design in XML |
| Once we’d decided what we wanted to accomplish with the language, we came to the issue of how to express it in XML. This presented several more decisions. |
| Surprisingly, these decisions were more complex because XML is so popular. Its popularity makes people want to do “cool XML things” that may or may not be appropriate for any given application. |
| In the end, the PPML specification was task-oriented and direct, not abstract. This sacrifices a certain amount of potential versatility, but our requirements called for rapid time to market, not unlimited versatility. Here are some of the decisions that were made. |
DOM vs event-driven |
| Because personalized printing is structured and hierarchical there was a strong urge to take full advantage of XML’s structures when describing documents. However, that approach would have led to two important problems. |
|
| Therefore, against some objections, an event-driven model was chosen. |
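| For readers unfamiliar with the distinction, the sketch below shows event-driven processing using Python’s standard `xml.sax` module. The handler class is hypothetical, and the sample uses only the PPML, JOB and DOCUMENT element names with no attributes, so it is an illustration of the processing style rather than a valid PPML job. |

```python
import xml.sax


class DocumentCounter(xml.sax.ContentHandler):
    """Reacts to each element as it arrives; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0
        self.documents = 0

    def startElement(self, name, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if name == "DOCUMENT":
            # A Consumer could begin work on this document here,
            # before the rest of the job has been read.
            self.documents += 1

    def endElement(self, name):
        self.depth -= 1


sample = b"<PPML><JOB><DOCUMENT/><DOCUMENT/></JOB></PPML>"
handler = DocumentCounter()
xml.sax.parseString(sample, handler)
print(handler.documents, handler.max_depth)
```

| The handler keeps only a few counters in memory regardless of how long the job is, which is the property that distinguishes this style from building a complete DOM tree first. |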
Restrictions imposed by DTDs |
| Early in the process the working group encountered questions about element models, especially how to deal with the rules imposed by DTDs. Is it important that the specification conform to the rules of DTDs? For instance if the JOB element can contain six different component elements, and it really doesn’t matter what sequence they’re in, and there are sometimes good reasons for each one to come first, but they must all come before any DOCUMENT elements, must the model for JOB list all 720 possible permutations? |
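| The figure of 720 is simply 6 factorial. The short Python sketch below confirms the count and shows that the underlying rule is easy to state procedurally; the placeholder names A to F stand in for the six component elements, which the text does not name. |

```python
from itertools import permutations
from math import factorial

HEADER = ("A", "B", "C", "D", "E", "F")  # hypothetical stand-ins for the six elements

orderings = sum(1 for _ in permutations(HEADER))


def headers_precede_documents(children):
    """True if no non-DOCUMENT child appears after the first DOCUMENT."""
    seen_document = False
    for name in children:
        if name == "DOCUMENT":
            seen_document = True
        elif seen_document:
            return False
    return True


print(orderings, factorial(6))
print(headers_precede_documents(["C", "A", "DOCUMENT", "DOCUMENT"]))
print(headers_precede_documents(["A", "DOCUMENT", "B"]))
```

| The check takes one pass and a single flag, whereas a content model that spells out each ordering explicitly would need all 720 alternatives. |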
| Numerous group members were understandably eager to use standard XML tools wherever possible, so they favored “making the DTD king.” However, conversations with experienced XML developers produced the advice that the rules for DTDs were neither necessary nor sufficient to describe the real rules for most applications. Their advice was to decide what we need to express, and state that in the spec: we should not say “We want to do this but it’s not possible in a DTD.” |
| Thus, it was decided (again over some objections) that the DTD would come last, after the language was specified. |
Packaging non-XML data in the same stream |
| As noted above, graphic arts applications require the use of binary data for image content. Our research showed that this issue has been encountered before, for instance in medical imaging. (Example: a medical document may need to include a binary image of an x-ray or other photographic documentation.) |
| We considered using CDATA, but even that isn’t practical for the graphic arts. It’s entirely possible that any given image file might include the CDEnd string “]]>”. If we used CDATA, the PPML producer would have to scan every part of every image file to see if it contains that sequence. It’s not uncommon for such files to be 20MB or even 100MB in length, so this would have been a great burden. |
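| To make the cost concrete, here is a minimal Python sketch of the scan a Producer would have to run over every image before wrapping it in CDATA. The function name and chunk size are assumptions; the point is that every byte of every file must be read, with care taken over matches that straddle a chunk boundary. |

```python
import io


def stream_contains(stream, needle=b"]]>", chunk_size=1 << 20):
    """Scan a binary stream for needle without loading it all into memory."""
    keep = len(needle) - 1
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if needle in window:
            return True
        # Carry the last few bytes forward so a match split across
        # two chunks is still found.
        tail = window[-keep:] if keep else b""


image = io.BytesIO(b"\x00\x10ab]]>cd\xff")
found = stream_contains(image, chunk_size=2)
clean = stream_contains(io.BytesIO(b"\x00abc]]\xff"), chunk_size=2)
print(found, clean)
```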
| In late 1999 we also learned that the W3C is beginning to address this so-called “packaging issue”. There was an ad hoc meeting about it at XML ’99, with Tim Berners-Lee and about 20 other people, including a member of the PPML working group. When there’s an official W3C solution we will want to support it; but what should we do in the interim? |
| We chose to use the existing MIME standard. Binary data objects are interspersed with XML segments in a multi-part MIME stream. The receiving system separates the parts, recognizing which parts are image data and which parts are XML; the non-XML data gets recognized and stored at the receiving system, and the XML parts are handed to the PPML interpreter. |
| With this solution, the binary data is in the same stream but not within the XML. The outermost wrapper is MIME, not XML. |
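| The packaging approach can be sketched with Python’s standard `email` package, which implements multi-part MIME. This is an illustration under stated assumptions, not the packaging defined by the PPML specification: the content types `application/xml` and `application/octet-stream` are placeholders chosen here, the `pack` and `unpack` names are hypothetical, and the library’s default base64 transfer encoding is a property of this sketch rather than a PPML requirement. |

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def pack(ppml_xml: bytes, image: bytes) -> bytes:
    """Wrap one XML segment and one binary object in a single MIME stream."""
    stream = MIMEMultipart("mixed")
    stream.attach(MIMEApplication(ppml_xml, "xml"))          # placeholder type
    stream.attach(MIMEApplication(image, "octet-stream"))    # placeholder type
    return stream.as_bytes()


def unpack(raw: bytes):
    """Separate the stream into XML parts and non-XML parts."""
    xml_parts, binary_parts = [], []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # skip the outer container itself
        payload = part.get_payload(decode=True)
        if part.get_content_type() == "application/xml":
            xml_parts.append(payload)
        else:
            binary_parts.append(payload)
    return xml_parts, binary_parts


xml_segment = b"<PPML/>"
image_bytes = bytes(range(256)) + b"]]>"
xml_parts, binary_parts = unpack(pack(xml_segment, image_bytes))
print(len(xml_parts), len(binary_parts))
```

| Because the image travels as its own MIME part, it can contain any byte sequence, including “]]>”, without the Producer having to inspect it. |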
Group infrastructure issues |
| An interesting problem arose regarding the discussion infrastructure. To support group discussion, a Web-based product called WebBoard was used, which allows a discussion thread to be viewed with a conventional Web browser. While WebBoard’s features are appropriate for a collaborative effort like this, it has one problem: if a user posts a sample of XML code, the browser interprets the angle brackets as denoting HTML code, which of course it doesn’t understand, so nothing is displayed. Similarly, leading spaces are suppressed in HTML, so indents disappear. Members learned, the hard way, to indent using dots and to use other brackets, for instance: |
[PPML]
..[JOB]
....[DOCUMENT] |
Summary and future directions |
| The finished language specification is available on request from http://www.podi.org/ppml. The elements are defined in 70 pages, of which 30 are devoted to “manufacturing information” and 10 are introductory material. Thus, the job content model is completely described in 30 pages. |
| PPML has already been well accepted. Virtually every member company has issued a strong statement of support for the standard, and most have stated that they will have PPML products on display at DRUPA – nine weeks after the specification was released. Clearly, the goal of rapid time to market was achieved. |
| Future directions include both broadening the reach of PPML and restricting it: |
| Perhaps the best testimonial to the development of this standard is that all the members of the 1999 PPML working group have renewed their memberships for 2000. |
Appendix 1 |
| The following is Appendix 1 of the approved text of the PPML specification. |
PPML Working Group participants |
The PPML specification would not have been possible without the substantial efforts of the following companies and their designated participants. In alphabetical order, they are:
|
Prior work |
| While PPML as a standardized data format is new, the technology of variable data printing (VDP) is not. |
| PPML concepts were largely contributed by skilled developers of established VDP products from several members of PODi, including: |
Origins of PPML |
| PPML 1.0 grew out of a combined proposal approved in July 1999 by the PPML Working Group. This proposal was a merger of proposals from Scitex, Barco and Pageflex: Scitex, by way of its VPS language, contributed the foundation for the basic object model, object-level granularity, and job structure of PPML; Barco contributed the foundation for the production-centric parts of the specification, including major work on imposition; Pageflex contributed the original proposal for an XML-based language called PPML. NexPress contributed substantial work from its proposed vPDF specification, and Xerox presented additional information at the July conference based on its substantial experience with its VIPP PostScript-based variable data software. |
Appendix 2 |
| The following is another appendix from the approved text of the PPML specification. |
| The following are examples of the strings approved by IANA (the Internet Assigned Numbers Authority) that are to be used in the value of the Format attribute in the SOURCE element. These strings were developed for use in identifying the media type in a MIME stream; PPML is adopting them by reference because they are an existing standard that is well suited to PPML needs. |
| Most of these strings are from http://www.isi.edu/in-notes/iana/assignments/media-types/media-types . |
|