How to save to PDF in C# using Word 2003 or 2007 without SP 2 (using Pechkin)

I recently spent a significant amount of time trying to solve this issue and hit many barriers along the way, so I thought I’d share my findings and the solution that I used. Word 2007 SP2 and above (or Word 2007 with the Save As PDF Add-in) natively support saving documents as PDF, so from C# you can save to PDF just as you would save a normal .doc or .docx; you can do this easily by adapting my code. Below this level it is not possible. There are a few ways to solve this problem (notably, using a paid library or a web service), but most involve payment and some involve methods that, in my case, were not acceptable (e.g. having to install additional software or having to access locations that were locked down).
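For readers who are on Word 2007 SP2 or later, the adaptation mentioned above might look roughly like the sketch below. It assumes the Office 2007 (version 12) or newer interop assembly, where `WdSaveFormat.wdFormatPDF` is defined; the class and method names (`DirectPdfSketch`, `SaveAsPdfDirect`) are hypothetical and not part of the original solution.

```csharp
using System.IO;
using Microsoft.Office.Interop.Word;

public static class DirectPdfSketch {
    // Hypothetical helper: saves a Word document straight to PDF.
    // Assumption: Word 2007 SP2+ (or the Save As PDF Add-in) is installed and the
    // referenced interop assembly defines WdSaveFormat.wdFormatPDF.
    public static string SaveAsPdfDirect(string inputFile) {
        object missing = System.Reflection.Missing.Value;
        object inputPath = inputFile;
        object pdfPath = Path.ChangeExtension(inputFile, "pdf");
        object saveFormat = WdSaveFormat.wdFormatPDF;
        object saveChanges = WdSaveOptions.wdDoNotSaveChanges;

        _Application word = new Application();
        try {
            // Open the source document (file name plus 15 optional arguments)
            Document doc = word.Documents.Open(ref inputPath,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing);
            try {
                // Save as PDF (file name, format, plus 14 optional arguments)
                doc.SaveAs(ref pdfPath, ref saveFormat,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing, ref missing);
            } finally {
                // Close the document even if the save failed
                ((_Document) doc).Close(ref saveChanges, ref missing, ref missing);
            }
        } finally {
            // Quit Word even if opening or saving failed
            ((_Application) word).Quit(ref missing, ref missing, ref missing);
        }

        return (string) pdfPath;
    }
}
```

The try/finally blocks are an addition of this sketch; they are there so that a failed open or save does not leave a Word process running in the background.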

So to solve this issue I have instead utilised Word’s ability to save to HTML and then converted this to PDF using the Pechkin library. Here’s how:

  1. Set your project to target x86 (right-click your project, click Properties, select Build and then set “Platform target:” to x86). This is essential because Pechkin currently only supports x86.
  2. Either use NuGet to install Pechkin (go to Tools -> Library Package Manager -> Package Manager Console and type “Install-Package Pechkin.Synchronized”) or manually install:
    1. Download Pechkin: Pechkin.zip (I found this difficult to get hold of outside of NuGet)
    2. Unpack the files and include them somewhere in your project e.g. a new Lib folder
    3. Right-click on References and select Add Reference, then select Browse and choose Common.Logging.dll, Pechkin.dll and Pechkin.Synchronized.dll from the files you just unpacked then click OK
  3. Either in an existing class or in a new class add the following using statements (System.IO is needed for the Path and File calls below):
    using System.IO;
    using Pechkin;
    using Pechkin.Synchronized;
    using Microsoft.Office.Interop.Word;
  4. Add the following method which will allow you to save your Word documents as HTML:
    public static string saveAsHtml(string inputFile) {
        // Load the required MS Word app object that will allow us to carry out the conversion
        _Application oWord = new Application();
    
        // Use a dummy value for passing optional arguments
        object oMissing = System.Reflection.Missing.Value;
    
        // Save in Word's HTML format (the HTML is converted to PDF in a later step)
        object oSaveFormat = WdSaveFormat.wdFormatHTML;
    
        // Pass a reference to a generic object to the COM function and load
        // the Word doc into memory
        object oMergedWordDocPath = inputFile;
        Document oWordDoc = oWord.Documents.Open(ref oMergedWordDocPath, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
        oWordDoc.Activate();
    
        // Use the same file name except for the extension as the Word doc
        object oMergedHtmlDocPath = Path.ChangeExtension(inputFile, "htm");
    
        // Save the HTML file
        oWordDoc.SaveAs(ref oMergedHtmlDocPath, ref oSaveFormat, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
    
        object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
    
        // Close the document
        ((_Document) oWordDoc).Close(ref saveChanges, ref oMissing, ref oMissing);
        oWordDoc = null;
    
        // Close Word
        ((_Application) oWord).Quit(ref oMissing, ref oMissing, ref oMissing);
        oWord = null;
    
        // Return the path to the newly created HTML file
        return (string) oMergedHtmlDocPath;
    }
  5. Add the following method which will allow you to save your HTML document as PDF:
    public static string saveHtmlAsPdf(string inputFile) {
        // Read in the HTML to a byte array
        byte[] html = File.ReadAllBytes(inputFile);
    
        // Create the PDF converter and convert the HTML bytes to PDF bytes
        byte[] pdf = new SynchronizedPechkin(
            new GlobalConfig()).Convert(
                new ObjectConfig()
                .SetLoadImages(true)
                .SetPrintBackground(true)
                .SetScreenMediaType(true)
                .SetCreateExternalLinks(true), html);
    
        // Construct the new output name
        string outputFileName = Path.ChangeExtension(inputFile, "pdf");
    
        // Write the converted PDF bytes to the new file
        using (FileStream file = System.IO.File.Create(outputFileName)) {
            file.Write(pdf, 0, pdf.Length);
        }
    
        // Return the path and filename of the new file
        return outputFileName;
    }
  6. Add the following method which will make use of these methods to perform the conversion:
    public static string convertDocToPdf(string docPath) {
        string htmlFile = saveAsHtml(docPath);
    
        return saveHtmlAsPdf(htmlFile);
    }
  7. And finally wherever you wish to perform the conversion use the following:
    string pdfFile = convertDocToPdf(fileInputPath);
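If step 1 is missed, the failure can be hard to diagnose. One option is to check the process bitness before converting, as in the minimal sketch below. The names `PlatformGuard`, `IsX86Process` and `EnsureX86` are hypothetical, and the check only tests for a 32-bit process (pointer size of 4 bytes), which on a typical Windows desktop corresponds to x86.

```csharp
using System;

public static class PlatformGuard {
    // True when the current process is 32-bit (pointers are 4 bytes wide).
    public static bool IsX86Process() {
        return IntPtr.Size == 4;
    }

    // Fails fast with a readable message when the process is not 32-bit.
    public static void EnsureX86() {
        if (!IsX86Process()) {
            throw new PlatformNotSupportedException(
                "This conversion must run as a 32-bit process; set \"Platform target\" to x86.");
        }
    }
}
```

Calling `PlatformGuard.EnsureX86();` immediately before `convertDocToPdf(fileInputPath)` would surface a misconfigured platform target as a clear exception.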

I hope someone else finds this useful and it saves them the many hours that I spent coming up with this solution!

NOTE: I kept receiving the following error:

Could not load file or assembly ‘Common.Logging, Version=2.1.1.0, Culture=neutral, PublicKeyToken=af08829b84f0328e’ or one of its dependencies. The located assembly’s manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)

The reason for this is that when using NuGet to download Pechkin, it included an Assembly Binding Redirection to use version 2.1.2.0 instead of any other version. There are two different reasons for getting this error:

  1. The Assembly Binding Redirection code in your App.config is missing. If so, add the following inside the <configuration> element:

      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <assemblyIdentity name="Common.Logging" publicKeyToken="af08829b84f0328e" culture="neutral" />
            <bindingRedirect oldVersion="0.0.0.0-2.1.2.0" newVersion="2.1.2.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
  2. Your existing Assembly Binding Redirection is not working. In this case I would recommend using Common.Logging 2.1.1.0. This version isn’t the easiest to find: you can try to download it via NuGet, but that may not work because you may be told that Pechkin has a dependency on 2.1.2.0, in which case you can download it here: Common.Logging.dll.
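To work out which of the two cases applies, it can help to check which version of Common.Logging.dll actually ends up in your output folder and compare it with the version in the error message and in the binding redirect. The following is a minimal diagnostic sketch: `AssemblyVersionCheck` and its methods are hypothetical names, and it assumes the DLL is copied next to your executable.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class AssemblyVersionCheck {
    // Reads the version recorded in an assembly file's manifest
    // without loading the assembly into the application.
    public static Version GetAssemblyVersion(string dllPath) {
        return AssemblyName.GetAssemblyName(dllPath).Version;
    }

    // Prints the version of the Common.Logging.dll found next to the executable.
    // Assumption: the DLL is deployed to the application's base directory.
    public static void Report() {
        string dllPath = Path.Combine(
            AppDomain.CurrentDomain.BaseDirectory, "Common.Logging.dll");

        if (File.Exists(dllPath)) {
            Console.WriteLine("Common.Logging version on disk: " + GetAssemblyVersion(dllPath));
        } else {
            Console.WriteLine("Common.Logging.dll not found in " + AppDomain.CurrentDomain.BaseDirectory);
        }
    }
}
```

If the version printed differs from the `newVersion` in your binding redirect, the redirect is pointing at an assembly that is not actually deployed.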

About Stephen Pickett


Stephen Pickett is a programmer, IT strategist and architect, project manager and business analyst, Oracle Service Cloud and telephony expert, information security specialist, all-round geek. He is currently Technical Director at Connect Assist, a social business that helps charities and public services improve quality, efficiency and customer engagement through the provision of helpline services and CRM systems.

Stephen is based in south Wales and attended Cardiff University to study Computer Science, in which he achieved a 2:1 grading. He has previously worked for Think Consulting Solutions, a leading voice on not-for-profit fundraising, Fujitsu Services and Sony Manufacturing UK as a software developer.

Stephen is the developer of ThinkTwit, a WordPress plugin that allows you to display multiple Twitter feeds within a blog.

By Stephen Pickett