
PJ Modifications and Hints

PJ is a great library developed by Etymon to parse, create and modify PDF files. Here you will find some hints, modifications and extensions to PJ, either written by me or collected from various sources.
All modifications should be applied to version 1.10.

License

All code on this page is made available under the GNU General Public License (GPL), the same license that PJ uses.

Unpacking PJ under Windows

Some PJ class names differ only in the case of some letters. Because Windows file systems normally treat such names as the same file, you will get errors if you unpack the pj.jar file under Windows and then try to recompile it. Since I needed to access the PJ source from Windows, I have come up with a rather complicated way to do so:

  1. Get hold of a computer running Linux (or some other OS that distinguishes between upper and lower case characters and supports Samba).
  2. Install Samba.
  3. Copy pj.jar to this computer and unpack it there.
  4. Share the directory you unpacked pj.jar to using Samba.
  5. Mount this share from Windows and assign a drive letter.

Now you can work on this share using any Windows tools you like and keep the different character cases.
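
To see which entries are affected before unpacking, the jar can be scanned for names that differ only in case. The following is a minimal sketch that is not part of PJ; the class name CaseCollisionFinder and its method are made up for this illustration, and the jar path is taken from the command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of PJ: lists jar entries whose names
// collide on a case-insensitive file system.
class CaseCollisionFinder {

    // Groups names by their lower-cased form and returns every group
    // that contains more than one name.
    static List<List<String>> findCollisions(Iterable<String> names) {
        Map<String, List<String>> byLowerCase = new LinkedHashMap<>();
        for (String name : names) {
            String key = name.toLowerCase(Locale.ROOT);
            byLowerCase.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        List<List<String>> collisions = new ArrayList<>();
        for (List<String> group : byLowerCase.values()) {
            if (group.size() > 1) {
                collisions.add(group);
            }
        }
        return collisions;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = new ArrayList<>();
        // A jar is a zip archive, so ZipFile can enumerate its entries.
        try (ZipFile jar = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        for (List<String> group : findCollisions(names)) {
            System.out.println(group);
        }
    }
}
```

Run it as `java CaseCollisionFinder.java pj.jar` (Java 11 or later); each printed group is a set of entries that would overwrite each other under Windows.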

On the (now non-existent) PJ mailing list, someone claimed to have modified the PJ source to work without these character case issues. Unfortunately, I never got these modifications. Perhaps someday, I will do that myself...

Access a byte array instead of a file

The original PJ can only parse a PDF that is stored in a file. It relies on file access because handling the structure of a PDF efficiently requires many seeks (jumps to a specific position inside the file).

There are some cases in which you don't want to work on a file, e.g. when you use PJ in an applet. Another data structure that allows for efficient seeks is a byte array. To avoid rewriting large parts of the PJ code, I have created a wrapper for a byte array that simulates a RandomAccessFile, just the kind of object PJ wants to work on.
Unfortunately, RandomAccessFile cannot be extended directly, since many of its methods are declared final and thus cannot be overridden. So I have created three classes to work around this problem:

  • RandomAccess: An interface that provides all methods of RandomAccessFile used by PJ
  • RandomAccessFileWrapper: A wrapper around RandomAccessFile that implements all methods declared in RandomAccess. These methods are simply passed on to a standard RandomAccessFile.
  • RandomAccessByteArray: A class that implements all methods defined in RandomAccess on a byte array.
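
The relationship between the three classes can be sketched as follows. This is only an illustration of the idea and not the code from the download: the class names are the ones listed above, but the set of methods in the interface is an assumption (the real interface declares whatever PJ actually calls), and the bodies are my own minimal versions.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Assumed subset of RandomAccessFile methods; the real interface may differ.
interface RandomAccess {
    int read() throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void seek(long pos) throws IOException;
    long getFilePointer() throws IOException;
    long length() throws IOException;
    void close() throws IOException;
}

// Passes every call on to a standard RandomAccessFile.
class RandomAccessFileWrapper implements RandomAccess {
    private final RandomAccessFile file;

    RandomAccessFileWrapper(RandomAccessFile file) { this.file = file; }

    public int read() throws IOException { return file.read(); }
    public int read(byte[] b, int off, int len) throws IOException { return file.read(b, off, len); }
    public void seek(long pos) throws IOException { file.seek(pos); }
    public long getFilePointer() throws IOException { return file.getFilePointer(); }
    public long length() throws IOException { return file.length(); }
    public void close() throws IOException { file.close(); }
}

// Provides the same operations on a byte array plus a current position.
class RandomAccessByteArray implements RandomAccess {
    private final byte[] data;
    private int pos = 0;

    RandomAccessByteArray(byte[] data) { this.data = data; }

    // Returns the next byte as 0..255, or -1 at the end of the array.
    public int read() {
        return pos < data.length ? (data[pos++] & 0xFF) : -1;
    }

    // Copies up to len bytes into b; returns the count, or -1 at the end.
    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    // A seek is just an assignment, which is why a byte array is efficient here.
    public void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > Integer.MAX_VALUE) {
            throw new IOException("seek position out of range: " + newPos);
        }
        pos = (int) newPos;
    }

    public long getFilePointer() { return pos; }
    public long length() { return data.length; }
    public void close() { /* nothing to release for an in-memory array */ }

    public static void main(String[] args) throws IOException {
        RandomAccess ra = new RandomAccessByteArray("%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        ra.seek(5);
        System.out.println((char) ra.read()); // prints 1
    }
}
```

Code written against the interface does not need to know whether the bytes come from disk or from memory, which is what keeps the changes to the PJ classes small.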

Finally, I modified the PJ classes Pdf and PdfParser to create and work on a RandomAccess object instead of a RandomAccessFile object.

With these modifications, you can easily use PJ in an applet: Get an InputStream from the URL you want to read, copy the contents of this stream into a byte array (using ByteArrayOutputStream) and initialize Pdf with this byte array.
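
The stream-to-array step might look like the sketch below. The class name StreamToBytes and the method readFully are made up for this example, and the URL is taken from the command line. How the resulting array is passed to the modified Pdf class is deliberately left out, since that depends on the constructor in the downloaded sources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

// Hypothetical helper, not part of PJ.
class StreamToBytes {

    // Copies everything from the stream into a byte array.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the stream is exhausted.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // In an applet, the URL would typically point back to the applet's own server.
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            byte[] pdfBytes = readFully(in);
            System.out.println("Read " + pdfBytes.length + " bytes");
            // pdfBytes would now be handed to the modified Pdf class (not shown).
        }
    }
}
```

The whole document ends up in memory, so this approach suits PDF files of moderate size.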

Here are the modified PJ classes and the newly created classes. Download the zip file, copy the files located under com/etymon/pj to the PJ source code you extracted previously (see above) and leave the other files in the directories contained in the zip file.

Sorry, I originally forgot to include the modified PjObjectVector.java file, which is now listed above. Without this file, you will get an error when compiling Pdf.java.
Thanks to Niko Theiner for spotting this mistake.

 

Last modified: 2009-04-06