Sunday, September 2, 2007

iText - Java PDF library

I recently had to do some tuning on a project of mine that converts and merges documents into PDF format. The main performance issue turned out to be poorly performing SQL, and a function-based index fixed it. I then decided to try tuning a few other pieces of code, and that led me to try a newer version of the iText library... wow, was I pleasantly surprised.

In the project I use iText to merge PDF documents into a single document and also to convert TIF and JPG files to PDF format. The ERP system this project was going to be a part of shipped with iText 1.02, which is why I initially used that version for the PDF manipulation. As part of my tuning effort I decided to try the latest version, 2.0.4. On average, performance improved by over 40%. As an added bonus, some TIF files that failed to convert under 1.02 were handled flawlessly by the newer release.

In the near future I would like to benchmark iText against other Java PDF libraries (such as PDFBox) to see how it compares. The code that handles the PDF manipulation is isolated from the rest of the logic and implements an interface, so running that test shouldn't be too difficult: the implementing class is pluggable.
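To give a rough idea of what I mean by pluggable, here is a minimal sketch. It is not the code from my project: the names PdfProcessor, ConcatenatingProcessor and ProcessorBenchmark are made up for illustration, and the stand-in implementation just concatenates bytes rather than calling iText or PDFBox, so it runs without any library on the classpath. A real implementation would wrap the library calls behind the same interface.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface; each PDF library would get its own implementation. */
interface PdfProcessor {
    /** Merges the given documents, in order, into a single document. */
    byte[] merge(List<byte[]> documents);
}

/**
 * Stand-in implementation (assumption: for illustration only).
 * It simply concatenates the input bytes; it does NOT produce a valid PDF.
 */
class ConcatenatingProcessor implements PdfProcessor {
    @Override
    public byte[] merge(List<byte[]> documents) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] doc : documents) {
            out.write(doc, 0, doc.length);
        }
        return out.toByteArray();
    }
}

/** Times any PdfProcessor, so implementations can be compared side by side. */
class ProcessorBenchmark {
    /** Returns the average time per merge in nanoseconds. */
    static long averageNanos(PdfProcessor processor, List<byte[]> documents, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            processor.merge(documents);
        }
        return (System.nanoTime() - start) / iterations;
    }
}

class PdfProcessorDemo {
    public static void main(String[] args) {
        List<byte[]> documents = new ArrayList<>();
        documents.add(new byte[] {1, 2});
        documents.add(new byte[] {3});

        // Swap in a different implementation here to compare libraries.
        PdfProcessor processor = new ConcatenatingProcessor();
        long nanos = ProcessorBenchmark.averageNanos(processor, documents, 1000);
        System.out.println("average ns per merge: " + nanos);
    }
}
```

Because the benchmark only sees the interface, comparing two libraries comes down to passing in two different implementations with the same input files. For a fair comparison I would also want a warm-up run and real documents, which this sketch leaves out.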

A couple of lessons I learned:
  1. Write classes that do one thing and do it really well. I was lucky here: my project is relatively small and the code was well isolated, which made it easy to test the two versions side by side.
  2. For SourceForge projects that I rely on, I will sign up for the release mailing list and try out new versions. Open source projects like iText are continually evolving. I made the huge mistake of thinking that the art of PDF manipulation would not have advanced much since the release of 1.02 (which I think was sometime in 2004).
  3. iText is a great PDF library for Java. I had never used this library before this project but had heard a lot about it. What really impresses me the most about the project is the documentation and the fantastic examples on the site. I was able to use a lot of code from the examples and that left me to focus more on my program's logic rather than on the details of merging PDF files or converting images to PDF.

One last thing I want to mention is how painless the transition was from release 1.02 (circa 2004) to the latest release, 2.0.4. I just had to change two method calls from deprecated methods to new ones that were direct replacements. Great library: fast, stable API, well documented. I definitely recommend it.
