How do you convert docx/odt to pdf/html with Java? This question comes up all the time on forums like Stack Overflow, so I decided to write an article that enumerates the (open source) Java frameworks that can do it.
Here are some paid products that convert docx/odt to pdf/html:
- Aspose.Words for Java, which converts docx only.
- Docmosis, which converts both docx and odt.
- Muhimbi PDF Converter Services.
To be honest with you, I have not tried these solutions because they are not free, so I will not discuss them in this article.
Here are some open source products that convert docx/odt to pdf/html:
- JODConverter : JODConverter automates conversions between office document formats using OpenOffice.org or LibreOffice. Supported formats include OpenDocument, PDF, RTF, HTML, Word, Excel, PowerPoint, and Flash. It can be used as a Java library, a command line tool, or a web application.
- docx4j: docx4j is a Java library for creating and manipulating Microsoft Open XML (Word docx, Powerpoint pptx, and Excel xlsx) files. It is similar to Microsoft’s OpenXML SDK, but for Java. docx4j uses JAXB to create the in-memory object representation.
- XDocReport which provides:
Here are the criteria that I think are important for a converter:
- best rendering: the converter must not lose formatting information.
- fast: the converter must be as fast as possible.
- low memory usage, to avoid OutOfMemory problems.
- streaming: use InputStream/OutputStream instead of File. Streaming avoids some problems: the hard disk is not used, and no write permission on the hard disk is needed.
- easy to install: no need to install OpenOffice/LibreOffice or MS Word on the server to run the converter.
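To make the streaming criterion concrete, here is a minimal sketch of what a stream-based converter contract could look like. The names `StreamConverter`, `CopyConverter` and `convertInMemory` are hypothetical and invented for this illustration; they are not the API of JODConverter, docx4j or XDocReport. The placeholder implementation only copies bytes, where a real converter would parse the document and render it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamingConverterSketch {

    /** Hypothetical contract: the converter only sees streams, never a File. */
    public interface StreamConverter {
        void convert(InputStream in, OutputStream out) throws IOException;
    }

    /** Placeholder that copies bytes unchanged; a real converter would render here. */
    public static final class CopyConverter implements StreamConverter {
        @Override
        public void convert(InputStream in, OutputStream out) throws IOException {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            out.flush();
        }
    }

    /** Runs a conversion entirely in memory: no temp file, no write permission needed. */
    public static byte[] convertInMemory(StreamConverter converter, byte[] source) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            converter.convert(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = "fake document bytes".getBytes(StandardCharsets.UTF_8);
        byte[] output = convertInMemory(new CopyConverter(), input);
        System.out.println(output.length + " bytes produced without touching the disk");
    }
}
```

With a contract like this, the caller decides where the bytes come from and where they go (an HTTP request and response, a database blob, a file), which is why a stream-based API is more flexible than one tied to File.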
In this article I will introduce these 3 Java conversion frameworks and compare them, giving the pros and cons of each. I will try to be as frank as possible, because I’m one of the XDocReport developers.