JPEG acceleration with libjpeg-turbo
2011-04-24, 16:16 by Jonas Wallden in Development
Pike, the language environment in which Roxen products execute, includes a very common open-source JPEG library called libjpeg. This library implements decoding and encoding of JPEG images and has been used for years and years. If you work with RXML you have probably used <cimg src="..." format="jpeg" /> at some time or another to generate image thumbnails.
Recently I stumbled upon a variant called libjpeg-turbo which has been designed as a drop-in replacement for libjpeg. The developers claim 2-4x faster execution due to large amounts of hand-written SSE assembly for Intel x86 and x86_64 architectures. The library has been adopted by Chromium and Firefox and that's naturally a stamp of approval that greatly reduces any fears of incompatibilities or future abandonment.
In addition to Pike we also ship ImageMagick with Roxen CMS and Roxen Editorial Portal. ImageMagick is used primarily with Roxen EP to offload scaling of images in news feeds. This is a task which can strain even the fastest server; it can take many seconds per image to generate the set of medium- and low-res thumbnails that EP needs. Luckily it runs as independent sub-processes that take advantage of all cores in your machine (up to an admin-defined limit), but obviously parallelization doesn't reduce latency for a single image. With that in mind, since ImageMagick also relies on libjpeg we have another candidate for the turbo library.
As of this week I've successfully compiled and integrated the libjpeg-turbo library for Pike and ImageMagick in all of our Mac OS X and RHEL 4/5 builds. Initial benchmarks on a Core i5 iMac show that the claimed 2-4x speed improvement was overly conservative. Have a look at some benchmark numbers for Image.JPEG.decode() and Image.JPEG.encode() in Pike, with the old libjpeg timing first and the libjpeg-turbo timing second:
3888 x 2600 pixel image (12 MB compressed)
|Operation||libjpeg||libjpeg-turbo|
|Decode||1.28 sec||0.26 sec|
|Encode (Q=95)||1.14 sec||0.14 sec|
|Encode (Q=75)||0.86 sec||0.09 sec|
|Encode (Q=25)||0.75 sec||0.07 sec|
640 x 480 pixel image (0.2 MB compressed)
|Operation||libjpeg||libjpeg-turbo|
|Decode||0.024 sec||0.005 sec|
|Encode (Q=95)||0.030 sec||0.004 sec|
|Encode (Q=75)||0.026 sec||0.003 sec|
|Encode (Q=25)||0.024 sec||0.002 sec|
In this particular test the replacement is roughly 5-12x faster than the old implementation! Of course these numbers represent in-memory throughput only, so ImageMagick will not see improvements on the same scale, but nevertheless it's very impressive! I can also add that all output was byte-for-byte identical with both libraries.
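For anyone who wants to check the arithmetic, here is a small Python sketch that derives the per-operation speedup from the timings listed above. It assumes the first timing in each row is the original libjpeg and the second is libjpeg-turbo; the names TIMINGS and speedups are made up for this illustration and are not part of Pike or Roxen.

```python
# Timings in seconds, copied from the benchmark rows above.
# Assumption: each row is (operation, libjpeg, libjpeg-turbo).
TIMINGS = {
    "3888 x 2600": [
        ("Decode", 1.28, 0.26),
        ("Encode (Q=95)", 1.14, 0.14),
        ("Encode (Q=75)", 0.86, 0.09),
        ("Encode (Q=25)", 0.75, 0.07),
    ],
    "640 x 480": [
        ("Decode", 0.024, 0.005),
        ("Encode (Q=95)", 0.030, 0.004),
        ("Encode (Q=75)", 0.026, 0.003),
        ("Encode (Q=25)", 0.024, 0.002),
    ],
}


def speedups(timings):
    """Return {(image, operation): old_time / new_time} for every row."""
    return {
        (image, op): old / new
        for image, rows in timings.items()
        for op, old, new in rows
    }


if __name__ == "__main__":
    ratios = speedups(TIMINGS)
    for (image, op), ratio in ratios.items():
        print(f"{image:12} {op:14} {ratio:5.1f}x")
    # Smallest and largest speedup across both images
    print(f"range: {min(ratios.values()):.1f}x to {max(ratios.values()):.1f}x")
```

The smallest ratio comes out at 4.8x (decoding the 640 x 480 image) and the largest at 12x (encoding the same image at Q=25), which matches the 5-12x figure once rounded.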
The updated library will be included in future builds of Roxen 5.x for the platforms mentioned earlier.