Posted by
Paul Kok  -  May 2009
Does anyone have experience creating PDFs from XML files? Or, more specifically: how do you make a 'download as PDF' button on a Roxen page?

I heard that there is no Roxen module for this so far. I think the options are: use an external tool or use XSL-FO. XSL-FO is a nice way (and a W3C standard) to format XML for (mainly) PDF documents. You describe the format in the XSL(-FO) file. But then you still have to convert it to a PDF file. Among others, Apache FOP can do that. Formatting Objects Processor (FOP) is a Java application that converts XSL-FO files to PDF or other printable formats.

I'm thinking of creating a Java Roxen module with FOP so I can create PDFs from an XSL-FO file in Roxen (which of course can be created with Roxen's XML -> XSLT transformation).

I downloaded the binary at: I got the zip with examples from the O'Reilly article. The article also mentions how to use FOP in your Java application.
Now, when you use the FOP binary, it's easy to transform an XSL-FO file:
         fop.bat krusty.pdf
It's even possible to do the XSLT transformation as well, so your input is an XML file and an XSL file, but of course that part can also be handled by Roxen. It generates the PDF file in the same folder as the .fo file.

The question now is how to implement it in Roxen:
1) Use the command line interface, called from a Roxen module (Java or Pike)
2) Use the FOP library in a Roxen module built with Java. The article also mentions how to use FOP in your Java application.

I would say that #2 would be more elegant. However, I think #2 would introduce unwanted complexity in a proof-of-concept test.
Java in Roxen ups the complexity a notch, which may make troubleshooting harder.
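
For a proof of concept along the lines of option 1, here is a minimal Java sketch that shells out to the FOP command line. It uses FOP's `-fo <input> -pdf <output>` form of the command. The class name `FopCli`, the method names, and the way the executable name is passed in (for example `fop.bat` on Windows) are assumptions for illustration, and nothing here is Roxen-specific; a real module would still have to wire this into Roxen.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: wraps a call to the FOP command line tool.
final class FopCli {

    // Builds the argument list for "fop -fo <in.fo> -pdf <out.pdf>".
    // fopExecutable is an assumption, e.g. "fop" or "fop.bat".
    static List<String> buildCommand(String fopExecutable, Path foFile, Path pdfFile) {
        return List.of(fopExecutable, "-fo", foFile.toString(), "-pdf", pdfFile.toString());
    }

    // Starts FOP, passes its console output through, and returns the exit code.
    static int run(String fopExecutable, Path foFile, Path pdfFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(fopExecutable, foFile, pdfFile))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.err.println("usage: FopCli <fop-executable> <in.fo> <out.pdf>");
            return;
        }
        int exitCode = run(args[0], Path.of(args[1]), Path.of(args[2]));
        System.out.println("fop exited with code " + exitCode);
    }
}
```

Keeping the command construction in its own method makes it easy to check without FOP installed, and a non-zero exit code is the simplest signal that the conversion failed.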

And then... maybe make an RXML tag <xslfo2pdf> like this?
      <xslfo2pdf>
        <xsltransform xsl="xml2xslfo.xsl">
             <insert file="test.xml"/>
        </xsltransform>
      </xslfo2pdf>

Would be very cool! :)

And then...
  - Just send the response to the client with the right HTTP headers? (Sounds like a good idea making the PDFs on the fly, yes.)
  - Store it in the cache database? (Probably a very good idea.)
  - Put it in the CVS? (I'd say a database cache is better.)
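
To make the first two points a bit more concrete, here is a rough Java sketch of the HTTP headers a PDF download usually carries and of one possible cache key. Hashing the XSL-FO input with SHA-256, so that unchanged input maps to the same cached PDF, is an assumption on my part and not something Roxen's cache prescribes. The names `PdfDelivery`, `downloadHeaders` and `cacheKey` are made up for illustration, and the file name is assumed not to contain quotes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for serving and caching generated PDFs.
final class PdfDelivery {

    // Headers that tell the browser to download the body as a PDF file.
    // Assumes fileName contains no double quotes.
    static Map<String, String> downloadHeaders(String fileName, long lengthInBytes) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        headers.put("Content-Length", Long.toString(lengthInBytes));
        return headers;
    }

    // Cache key: hex-encoded SHA-256 of the XSL-FO text (an assumed scheme).
    static String cacheKey(String xslFo) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(xslFo.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

With a key like this, a request first looks up the key in the cache and only runs FOP on a miss.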

I'm just thinking out loud! :)

Does anyone have more ideas or experience with this? Or is anyone interested in this at all?

Hope to hear from you guys!
Posted by
Martin Pedersen  -  May 2009
Interesting stuff there, Paul.

I think you are on the right track regarding the generation and caching of the PDFs. However, I think one must look out for the generation time of the PDF files. If this takes a lot of time, one alternative could be to generate the PDFs and store them in the cache on commit, i.e. using a commit hook. Do you have any tests that would give a hint of how long generating one PDF takes?
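
A rough way to get such a number could be a small timing harness like the Java sketch below. The class and method names are made up for illustration, and the task passed in is a placeholder for whatever actually triggers the PDF generation. It only measures wall-clock time averaged over a few runs, so treat the result as a hint and not a benchmark.

```java
// Hypothetical timing helper: averages wall-clock time over several runs.
final class GenerationTimer {

    // Returns the average number of milliseconds one run of the task takes.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }
}
```

If the average comes out well above what feels acceptable for a page request, that would be an argument for generating the PDFs on commit.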

It would be interesting to see how this develops, please keep us posted.