PDF and eBooks: Linking Form and Content
By Pat Coyne
The other area where electronic texts need to compensate for their relative inflexibility is scrolling and searching. Until they try the alternatives, few people realise how quick, easy and efficient flipping through the pages of a book is. Pressing the PgDn key or dragging the scroll bars on a computer doesn't really compete. So any etext should have:
- Free-text search, preferably with Boolean, proximity and wild-card options.
- A good selection of bookmarks linked to chapters, sections, etc., as well as a linked table of contents, so that one click takes you to the relevant section.
- If it is a work of reference, a proper linked index. Text search may not be enough. For example, you might be trying to find out when Joseph Bloggs visited Mexico, but the book reads, "Joe high-tailed it to Yucatan, pursued by the Navy." A search for "Joseph Bloggs" or "Mexico" would never find that sentence.
The PDF Advantage
From all this, it will be clear that the Portable Document Format (PDF) is an extremely good format for producing electronic books, certainly much better than HTML or any other format currently available, although the Open eBook standard may provide stiff competition in the future. At the moment, our company uses PDF exclusively, because of what we see as its advantages:
- It handles fonts brilliantly, displaying and printing them accurately at all resolutions.
- Once you have created the PDF document, page numbering and layout are independent of the magnification or device used, unlike HTML, where pagination is browser-dependent. That is very important for referencing.
- There are sophisticated and flexible links, bookmarks and thumbnails, which facilitate text navigation.
- Very importantly from a publisher's point of view, PDFs can be easily produced from virtually any DTP or word processing software.
That said, there are a few improvements that I would like to see that would make PDF an even better format for book production.
Still Room to Improve
First, PDF is a final format. The books themselves are created in either a word processing or a DTP package. Since we may be putting in tens of thousands of links, it is quite impractical to add them in the PDF itself. Instead, the links are created by database programs that write either pdfmark operators, which are then incorporated into a PostScript file, or hyperlinks in Word's XML format, which are then converted to PostScript using Adobe's PDFMaker. Currently, the set of pdfmark operators is rather limited; it would certainly be useful if Adobe released or created operators for the whole set of commands available in Acrobat. Thomas Merz's book Web Publishing with Acrobat/PDF does detail a number of extra operators, but they are undocumented and I don't know how well they would work.
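As a rough illustration of the pdfmark route described above, the sketch below turns link records into pdfmark lines that could be added to a PostScript file before it is distilled. The `Link` record, its fields and the function names are hypothetical and are not our actual database schema; the keys used (`/Rect`, `/SrcPg`, `/Page`, `/View`, `/Border`, `/Subtype /Link`, `/ANN`, `/Title`, `/OUT`) are those documented in Adobe's pdfmark reference. Titles are assumed to be plain ASCII.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One hyperlink record, as it might come out of a links database (hypothetical)."""
    source_page: int  # page on which the clickable area sits (1-based)
    rect: tuple       # (llx, lly, urx, ury) of the active area, in points
    target_page: int  # page the link jumps to (1-based)


def escape_ps_string(text: str) -> str:
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def link_pdfmark(link: Link) -> str:
    """Build an /ANN pdfmark that places a link annotation on source_page."""
    llx, lly, urx, ury = link.rect
    return (
        f"[ /Rect [{llx} {lly} {urx} {ury}] /SrcPg {link.source_page} "
        f"/Page {link.target_page} /View [/Fit] "
        f"/Border [0 0 0] /Subtype /Link /ANN pdfmark"
    )


def bookmark_pdfmark(title: str, target_page: int) -> str:
    """Build an /OUT pdfmark that adds a top-level bookmark."""
    return (
        f"[ /Title ({escape_ps_string(title)}) /Page {target_page} "
        f"/View [/Fit] /OUT pdfmark"
    )


if __name__ == "__main__":
    # Example records; in practice these would be read from the database.
    print(bookmark_pdfmark("Stave 1 (Marley's Ghost)", 5))
    print(link_pdfmark(Link(source_page=3, rect=(72, 700, 200, 714), target_page=42)))
```

Generating the operators as plain text like this is what makes tens of thousands of links manageable: the database remains the single source of truth, and the PDF can be rebuilt whenever the text changes.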
As for PDFMaker, in our experience it has more bugs than a nest of termites, particularly when it has to cope with a large number of links. These include linking to the wrong page, putting in spurious links and simply falling flat on its face. (If Adobe would like a list of the shortcomings we have found, I will be pleased to oblige.) As it is, we use PDFMaker mostly to define the active area for links and program the rest ourselves. Now these problems may not be PDFMaker's fault, but rather the failings of Word's VBA; but since Adobe has not released the code for the macro, it is impossible to judge. If Adobe did release the code, I am sure it would stimulate some creative programming.
Secondly, I would like to see a more sophisticated version of Acrobat Search. In particular, it could do with much better Boolean and proximity functions, so that you can combine more than one Boolean operator in a query or specify proximity by sentence or paragraph. When you are dealing with large documents like books, those facilities are important.
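To make "proximity by sentence" concrete, here is a minimal sketch of the behaviour being asked for: a Boolean AND of several terms that only matches when all of them fall within the same sentence. It says nothing about how Acrobat Search works internally; the function names are made up for illustration, and the sentence splitter is deliberately naive (it will stumble on abbreviations such as "Mr.").

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def same_sentence(text: str, *terms: str) -> list[str]:
    """Return the sentences containing every term (case-insensitive, whole words)."""
    patterns = [re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms]
    return [s for s in split_sentences(text) if all(p.search(s) for p in patterns)]


if __name__ == "__main__":
    sample = (
        "Joe visited Mexico in spring. The Navy stayed home. "
        "Later the Navy pursued Joe to Yucatan."
    )
    # Only the last sentence mentions both terms.
    print(same_sentence(sample, "Joe", "Navy"))
```

Proximity by paragraph would follow the same pattern, splitting on blank lines instead of sentence punctuation.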
Thirdly, there is the question of Acrobat's operation on the Internet. Adobe has rightly stressed Acrobat's Web compatibility, but in our experience PDF files are still somewhat fragile on download. We regularly get emails asking why the free files on our web site are "password protected" or appear as garbage on the screen, when what has actually happened is that perfectly good files have been corrupted during download. Curiously, we have never had any problems reported with downloading the same files zipped, so it must be possible to make PDFs more robust. Some improvement here would certainly make our lot easier, especially since we plan to offer many more titles on the Internet.
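I can't say why the zipped files fare better in transit, but one property of the ZIP format is certainly relevant: every file in an archive is stored with a CRC-32 checksum, so damage can be detected rather than surfacing later as a garbled document. The sketch below shows how a downloaded archive could be checked with Python's standard `zipfile` module; the function name `zip_download_is_intact` is invented for this example.

```python
import io
import zipfile
import zlib


def zip_download_is_intact(data: bytes) -> bool:
    """Return True if the bytes form a ZIP archive whose members all pass CRC-32."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() reads every member and returns the name of the
            # first one that fails its checksum, or None if all are good.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error):
        # Not a readable archive at all, e.g. a truncated download.
        return False


if __name__ == "__main__":
    # Build a small archive in memory to stand in for a downloaded file.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("carol.pdf", b"%PDF-1.3 example payload")
    print(zip_download_is_intact(buffer.getvalue()))
```

A bare PDF carries no comparable whole-file checksum, which is one direction in which the format could be made more robust.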
The Web Trend
On a related issue, the world, like it or not, is increasingly demanding browser-enabled software, and so it must be questionable whether any software, like Acrobat, that has its own specific interface, will be acceptable in the longer term. Acrobat does of course run as a plug-in under IE or Netscape Navigator, but the fact that it has to be launched and has its own separate control panel can prove tiresome, if not actually confusing. The Adobe Document Server (ADS) is an ingenious attempt to solve the problem, but it is not really suitable for longer documents because each page has to be loaded separately for viewing. Ultimately Adobe should aim to have a browser that can move seamlessly between HTML/XML and PDF, including being able to index PDF documents automatically as they are loaded, so that they can be searched.
However, all caveats aside, Acrobat, even in its present incarnation, is a terrific product and one ideally suited to be a major player in the emerging world of electronic books. Judging by recent announcements, Adobe now realises that it has a real opportunity on its hands. Go for it.
This article originally appeared on Planet PDF