PDF Evolution and Compatibility: A Personal Anecdote

Adobe senior principal scientist Jim King recently posted an item on his blog introducing a paper he authored titled “On the Evolution of PDF.” In that document, King discusses how Adobe Systems has worked to maintain the compatibility of its Portable Document Format (PDF) while continuing to enhance its capabilities.

King describes three types of compatibility and how Adobe Systems has addressed these as PDF has evolved:

Backward-compatibility, the ability of newer products to process older PDF files.
Forward-compatibility, the ability of older products to “adequately” process newer PDF files (with the meaning of “adequately” subsequently explored).
Feature-compatibility, the ability of products that purposefully lack support for certain features to adequately process PDF files that contain those features.

Features like backward compatibility — which allows newer versions of the PDF viewers to read the earlier format — are not surprising (although it is shocking how often software vendors are unable to achieve even this level of compatibility with their products).

More noteworthy is Adobe’s attempt to attain forward compatibility in the Acrobat product line — that is, the ability of the earlier Acrobat products to correctly process later modifications to the PDF format. As one might suspect, forward compatibility is, as King points out, “more difficult” than backward compatibility.

While this level of compatibility is not always attainable, a personal anecdote from the early days of PDF demonstrates that, at times, even this elusive goal can be accomplished.

In May 1994, I was one of a group of representatives from higher education institutions participating in a series of customer advisory meetings focusing on Adobe’s Acrobat products. Acrobat 1.0 had been launched the previous June. At the time of these meetings Acrobat 2.0 was in the works, although its public introduction was still four months away.

By the spring of 1994, the Wharton School and the University of Pennsylvania were working on an initiative to enable the distribution of PDF documents across campus. As part of this project, Acrobat Reader had been installed on all the computer lab stations and public-access computers across the University. Remember that Acrobat Reader 1.0 was not free. The retail list price was $50 per seat (dropping to $35 for quantities of 500 or more). In 1993, the Wharton School had purchased a number of licenses for Acrobat 1.0 at a substantial discount. Later, in 1994, Penn had struck a deal to deploy Reader University-wide at no additional cost. (For a look back at this project, see the 1994 Penn Printout article, “Acrobatic Network: PennNet, Acrobat, and the Web at Penn.”)

This endeavor to make Acrobat a standard across campus was coming to fruition in the spring of 1994 when we met with Adobe Systems in Mountain View, CA for the briefings on Acrobat’s future. Adobe representatives unveiled many of the new features that would be available in the Acrobat 2.0 product family, and we were informed that, beginning with Acrobat 2.0, the Reader would be “freely distributable” (although there appeared to be significant internal debate about the details of how this would work).

In addition, Rob Babcock, then Acrobat product manager, told us that the PDF format was changing from an ASCII (plain text) format to one that would, by default, include binary data.

Originally a PDF file was, by definition, a plain text file. On page 8 of the first edition of the Portable Document Format Reference Manual, published June 1993, it states: “A PDF file is a 7-bit ASCII file, which means PDF files use only the printable subset of the ASCII character set to describe documents — even those with images and special characters.” Then, in 1994, we were being told that, although the Acrobat creation tools could still generate the earlier ASCII format, by default the next generation of Acrobat products would save files in the newer binary format.

Upon hearing this, my first thought was, “Oh, no. We’ve just deployed Reader 1.0 on computers across campus, and now you’re switching to a binary file format. We’ll have to get all the schools at Penn and all the computer labs across the University to upgrade to Acrobat 2.0 or they won’t be able to read the PDF files people will be creating.” This was before the widespread adoption of centrally managed imaging software for computer workstations. Given how decentralized the University was, this would have been a major undertaking.

Babcock calmly explained, “No, you won’t have to do that. Acrobat Reader 1.0 will read the new binary format without a problem.”

My initial reaction was: No, that can’t be true. The earlier product — which used, by definition, a plain text version of PDF — could read the later, binary-encoded documents? That couldn’t be right.

I had read enough of the PDF Reference Manual to have a basic understanding of the structure of a PDF file. As I pondered Babcock’s assertion, my mind raced through what I knew about the internals of PDF. I then realized that Babcock’s claim was, indeed, correct.

From its inception, PDF was, at least in part, a self-describing format. It specifies the filters used to encode its own data stream and, from the outset, Adobe’s Acrobat viewers were designed to interpret a PDF file through these filters. By changing the filter used to decode its own data, Acrobat was able to switch from a pure ASCII file to a binary-encoded format. Acrobat Reader 1.0 could read the binary files created by the forthcoming Acrobat 2.0 products.
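To make the idea concrete, here is a minimal sketch in Python of a reader that decodes a stream by applying whatever filters the stream itself declares. It is an illustration under stated assumptions, not Acrobat's actual implementation: the `DECODERS` table and the `read_stream` function are hypothetical names, the streams are modeled as simple (filter list, bytes) pairs rather than real PDF objects, and Python's `base64.a85decode` stands in for PDF's ASCII85 filter without the PDF end-of-data marker.

```python
import base64
import binascii

# Hypothetical decoder table: filter name -> decoding function.
# The names follow PDF filter names, but the framing is simplified.
DECODERS = {
    "ASCII85Decode": base64.a85decode,
    "ASCIIHexDecode": binascii.unhexlify,
}


def read_stream(declared_filters, raw):
    """Apply each declared filter in order and return the decoded bytes."""
    data = raw
    for name in declared_filters:
        data = DECODERS[name](data)
    return data


# Sample payload containing bytes outside the 7-bit ASCII range.
payload = bytes([0, 255, 16, 128]) + b"page content"

# ASCII-style stream: the payload is wrapped in an ASCII filter,
# so the stored bytes are all printable characters.
ascii_stream = (["ASCII85Decode"], base64.a85encode(payload))

# Binary-style stream: no ASCII filter is declared; bytes are stored raw.
binary_stream = ([], payload)

# The same reader code handles both, because each stream says how
# it was encoded.
decoded_ascii = read_stream(*ascii_stream)
decoded_binary = read_stream(*binary_stream)
```

In this sketch the reader never needs to know which kind of file it was handed. It only follows the list of filters that travels with the data, which is the property that let an older viewer cope with files that dropped the ASCII wrapping.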

This was my “a-ha” moment with PDF, the instant when I realized PDF was more than just its graphical abilities (impressive though these were). PDF was designed as a format that was not only expressive, but also durable. It was intended to be “future proof.”

In the intervening years, PDF has held fast to that promise. While not all changes to PDF were able to incorporate this type of forward compatibility, backward compatibility has remained strong. I have PDF documents from circa 1993 that still render flawlessly in the Acrobat 9 viewers. In fact, due to improvements in Acrobat’s font rendering technology, they look better than they did in the version of Acrobat with which they were created.

While some later additions to PDF — such as embedded video and Flash SWF files — cannot be accurately rendered by the earlier viewers, in most cases the features of a PDF document elegantly degrade in the earlier viewers. And, on some occasions — like the introduction of binary encoding in 1994 — Adobe’s Acrobat products have even provided forward compatibility.

By and large, as enhancements have been added to PDF, the vision of a durable, robust format has been retained throughout the evolution of the Acrobat product line.

Adobe co-founder Chuck Geschke once observed that your organization’s documents are more important than the software used to create them, and they need to outlive the computer platform on which they were generated. Thanks to PDF, they can.

2 thoughts on “PDF Evolution and Compatibility: A Personal Anecdote”

ann says:

July 18, 2010 at 9:49 AM

With this release we should have NO worries!


huioyu54 says:

August 25, 2010 at 4:06 AM

I agree, PDF is great. Now we don’t have to worry about finding converters so that we can read documents written by outdated word processors!
