I’ve spent a significant amount of time lately trying to solve this issue and ran into many barriers, so I thought I’d share my findings and the solution I used. Word 2007 SP2 and above (or Word 2007 with the Save As PDF add-in) natively support saving documents as PDF, so from C# you can save to PDF just as you would save a normal .doc or .docx (you can do this easily by adapting my code). Below this level it is not possible. There are a few other ways to solve the problem (notably a paid library or a web service), but most involve payment and some involve methods that, in my case, were not acceptable (e.g. having to install additional software or access locations that were locked down).
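For the native route mentioned above, here is a minimal sketch of what saving straight to PDF might look like. It assumes Word 2007 SP2 or later (or Word 2007 with the Save As PDF add-in) and a reference to the Microsoft.Office.Interop.Word assembly. The class and method names (NativePdfExport, SaveAsPdf) are placeholders of mine, and I have not tested this against every Word version.

```csharp
using System.IO;
using Microsoft.Office.Interop.Word;

public static class NativePdfExport
{
    // Hypothetical helper: saves a Word document directly as PDF.
    // ASSUMPTION: the installed Word supports wdFormatPDF (2007 SP2 or later,
    // or 2007 with the Save As PDF add-in); older versions will fail on SaveAs.
    public static string SaveAsPdf(string inputFile)
    {
        _Application word = new Application();
        object missing = System.Reflection.Missing.Value;
        object inputPath = inputFile;
        object outputPath = Path.ChangeExtension(inputFile, "pdf");
        object saveFormat = WdSaveFormat.wdFormatPDF;
        object saveChanges = WdSaveOptions.wdDoNotSaveChanges;

        try
        {
            Document doc = word.Documents.Open(ref inputPath, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing);
            try
            {
                // Same call as saving to .doc or .docx, only the format differs
                doc.SaveAs(ref outputPath, ref saveFormat, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing);
            }
            finally
            {
                // Close the document even if the save failed
                ((_Document)doc).Close(ref saveChanges, ref missing, ref missing);
            }
        }
        finally
        {
            // Always shut Word down so no WINWORD.EXE process is left behind
            ((_Application)word).Quit(ref missing, ref missing, ref missing);
        }

        return (string)outputPath;
    }
}
```

The try/finally blocks are a design choice of this sketch: they make sure the document is closed and Word quits even when the save throws.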
So to solve this issue for older versions of Word, I have instead utilised Word’s ability to save as HTML and then converted the HTML to PDF using the Pechkin library. Here’s how:
- Set your project to target x86 (right click your project, click Properties, select Build and then set “Platform target:” to x86) – this is essential because Pechkin currently only supports x86.
- Either use NuGet to install Pechkin (go to Tools -> Library Package Manager -> Package Manager Console and type “Install-Package Pechkin.Synchronized”) or install it manually:
  - Download Pechkin: Pechkin.zip (I found this difficult to get hold of outside of NuGet)
  - Unpack the files and include them somewhere in your project, e.g. a new Lib folder
  - Right-click on References and select Add Reference, then select Browse, choose Common.Logging.dll, Pechkin.dll and Pechkin.Synchronized.dll from the files you just unpacked, then click OK
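- Either in an existing class or in a new class, add the following using directives (System.IO is needed for Path and File, and your project needs a reference to the Microsoft.Office.Interop.Word assembly):

    using System.IO;
    using Pechkin;
    using Pechkin.Synchronized;
    using Microsoft.Office.Interop.Word;
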
- Add the following method, which will allow you to save your Word documents as HTML:

    public static string saveAsHtml(string inputFile)
    {
        // Load the MS Word application object that will carry out the conversion
        _Application oWord = new Application();

        // Use a dummy value for passing optional arguments
        object oMissing = System.Reflection.Missing.Value;

        // Save the .doc as HTML
        object oSaveFormat = WdSaveFormat.wdFormatHTML;

        // Pass a reference to a generic object to the COM function and load
        // the Word doc into memory
        object oMergedWordDocPath = inputFile;
        Document oWordDoc = oWord.Documents.Open(ref oMergedWordDocPath, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing);
        oWordDoc.Activate();

        // Use the same file name as the Word doc, except for the extension
        object oMergedHtmlDocPath = Path.ChangeExtension(inputFile, "htm");

        // Save the HTML file
        oWordDoc.SaveAs(ref oMergedHtmlDocPath, ref oSaveFormat, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing);

        // Close the document without saving changes to the original
        object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
        ((_Document) oWordDoc).Close(ref saveChanges, ref oMissing, ref oMissing);
        oWordDoc = null;

        // Close Word
        ((_Application) oWord).Quit(ref oMissing, ref oMissing, ref oMissing);
        oWord = null;

        // Return the path to the newly created HTML file
        return (string) oMergedHtmlDocPath;
    }

- Add the following method, which will allow you to save your HTML document as PDF:

    public static string saveHtmlAsPdf(string inputFile)
    {
        // Read the HTML into a byte array
        byte[] html = File.ReadAllBytes(inputFile);

        // Create the PDF converter and convert the HTML
        byte[] pdf = new SynchronizedPechkin(new GlobalConfig()).Convert(
            new ObjectConfig()
                .SetLoadImages(true)
                .SetPrintBackground(true)
                .SetScreenMediaType(true)
                .SetCreateExternalLinks(true),
            html);

        // Construct the new output name
        string outputFileName = Path.ChangeExtension(inputFile, "pdf");

        // Write the converted PDF bytes to the new file
        using (FileStream file = System.IO.File.Create(outputFileName))
        {
            file.Write(pdf, 0, pdf.Length);
        }

        // Return the path and filename of the new file
        return outputFileName;
    }

- Add the following method, which will make use of these methods to perform the conversion:

    public static string convertDocToPdf(string docPath)
    {
        string htmlFile = saveAsHtml(docPath);
        return saveHtmlAsPdf(htmlFile);
    }

- And finally wherever you wish to perform the conversion use the following:

    string pdfFile = convertDocToPdf(fileInputPath);

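The steps above leave the intermediate .htm file on disk next to the original document. If you don’t want to keep it, a small cleanup helper along these lines could be called once the PDF has been written. This is only a sketch: the names IntermediateHtmlCleanup and DeleteIntermediateHtml are hypothetical, and the "_files" folder suffix is an assumption about how Word names its folder of supporting files (it may differ by locale or Word version).

```csharp
using System.IO;

public static class IntermediateHtmlCleanup
{
    // Hypothetical helper: deletes the .htm file that sits next to docPath,
    // plus the companion folder of supporting files if one exists.
    public static void DeleteIntermediateHtml(string docPath)
    {
        // Same naming rule the conversion uses: swap the extension for "htm"
        string htmlPath = Path.ChangeExtension(docPath, "htm");
        if (File.Exists(htmlPath))
        {
            File.Delete(htmlPath);
        }

        // ASSUMPTION: supporting files live in "<document name>_files"
        string directory = Path.GetDirectoryName(htmlPath) ?? "";
        string filesFolder = Path.Combine(
            directory,
            Path.GetFileNameWithoutExtension(htmlPath) + "_files");
        if (Directory.Exists(filesFolder))
        {
            // true = delete the folder and everything inside it
            Directory.Delete(filesFolder, true);
        }
    }
}
```

Calling DeleteIntermediateHtml(fileInputPath) after the conversion would then remove the leftovers; both checks make the helper safe to call even if nothing was created.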
I hope someone else finds this useful and it saves them the many hours that I spent coming up with this solution!
NOTE: I kept receiving the following error:
Could not load file or assembly ‘Common.Logging, Version=2.1.1.0, Culture=neutral, PublicKeyToken=af08829b84f0328e’ or one of its dependencies. The located assembly’s manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)
The reason for this is that when NuGet installs Pechkin, it adds an assembly binding redirect that forces version 2.1.2.0 of Common.Logging to be used instead of any other version. There are two possible causes of this error:
- The assembly binding redirect is missing from your App.config. In this case, add the following inside the <configuration> element:

    <runtime>
      <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
        <dependentAssembly>
          <assemblyIdentity name="Common.Logging" publicKeyToken="af08829b84f0328e" culture="neutral" />
          <bindingRedirect oldVersion="0.0.0.0-2.1.2.0" newVersion="2.1.2.0" />
        </dependentAssembly>
      </assemblyBinding>
    </runtime>

- Your existing assembly binding redirect is not working. In this case I would recommend using Common.Logging 2.1.1.0. This version isn’t the easiest to find: you can try downloading it via NuGet, but that may not work as you may be told that Pechkin has a dependency on 2.1.2.0, in which case you can download it here: Common.Logging.dll.
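When working out which of the two cases applies, it can help to check which version of Common.Logging.dll has actually been copied next to your executable. The following is a minimal sketch using the standard AssemblyName.GetAssemblyName call, which reads the version from the file without loading the assembly. The class name and the assumption that the DLL sits in the application’s base directory are mine.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class AssemblyVersionCheck
{
    // Reads the assembly version from a DLL on disk without loading it
    public static Version GetAssemblyVersion(string assemblyPath)
    {
        return AssemblyName.GetAssemblyName(assemblyPath).Version;
    }

    public static void Main()
    {
        // ASSUMPTION: Common.Logging.dll is deployed alongside the executable
        string path = Path.Combine(
            AppDomain.CurrentDomain.BaseDirectory, "Common.Logging.dll");

        if (!File.Exists(path))
        {
            Console.WriteLine("Common.Logging.dll not found at " + path);
            return;
        }

        Console.WriteLine("Common.Logging version on disk: " + GetAssemblyVersion(path));
    }
}
```

If the version printed does not match the newVersion in your binding redirect (or the version named in the error), that mismatch is the likely culprit.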