Second Life of a Hungarian SharePoint Geek

October 27, 2011

Using SharePoint 2010 Word Automation Services to convert documents synchronously

Filed under: SP 2010, Word Automation Services | Peter Holpar @ 21:43

A few months ago a fellow developer asked me how to use Word Automation Services (WAS) in an application that requires synchronous document conversion. In this post I show a simple way to do that.

As you might know (if not, you can read a bit more about it at the end of this post), you can submit Word documents to WAS and let it convert them to PDF or to other formats such as XPS. WAS works as a timer job, so conversion happens according to the job's schedule, which you should set based on the number of documents to be converted and the free resources of the server. As with any timer job, you can start the Word Automation Services Timer Job immediately, either from the web UI or from custom code.

For the sample method, I pass in the document content as a byte array, and the method returns the converted document as a byte array as well. My first implementation was Stream-based, but I found byte arrays easier to work with in this case (see the reason a bit later).
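If you do keep a Stream-based variant, note that a single Stream.Read call is not guaranteed to fill the buffer; it may return fewer bytes than requested. A minimal sketch of reading a stream completely into a byte array, using only base class library types (StreamUtil is a hypothetical helper name, not part of SharePoint):

```csharp
using System;
using System.IO;

public static class StreamUtil
{
    // Reads the stream to its end and returns the content as a byte array.
    // Stream.Read may return fewer bytes than requested, so we keep reading
    // until it returns 0, which signals the end of the stream.
    public static byte[] ReadFully(Stream input)
    {
        if (input == null) throw new ArgumentNullException("input");
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[81920];
            int read;
            while ((read = input.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

For an SPFile you can avoid the issue entirely: SPFile.OpenBinary() returns the whole file content as a byte array.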

After preparing and starting the ConversionJob, we start the WAS timer job if immediate conversion is requested, then wait until the conversion finishes (successfully or not) or until the timeout interval elapses. On timeout, we cancel the conversion. Finally, we display any conversion errors and delete the documents from the working document library if requested.
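The wait-with-timeout part of this flow does not depend on SharePoint, so it can be factored out. A minimal sketch as a reusable helper (Polling and WaitUntil are hypothetical names); it uses a Stopwatch instead of DateTime.Now, which is a deliberate swap because Stopwatch is not affected by system clock changes:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Calls isFinished every pollInterval until it returns true or the timeout elapses.
    // Returns true if the condition was met, false if we gave up because of the timeout.
    public static bool WaitUntil(Func<bool> isFinished, TimeSpan timeout, TimeSpan pollInterval)
    {
        if (isFinished == null) throw new ArgumentNullException("isFinished");
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (isFinished()) return true;

            // Compute the remaining time once, so we never sleep with a negative value
            TimeSpan remaining = timeout - elapsed.Elapsed;
            if (remaining <= TimeSpan.Zero) return false;

            // Do not oversleep the deadline
            Thread.Sleep(remaining < pollInterval ? remaining : pollInterval);
        }
    }
}
```

In ConvertDocument the condition would create a new ConversionJobStatus and check whether Succeeded + Failed has reached the number of submitted files; if WaitUntil returns false, the caller cancels the job.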

```csharp
using System;
using System.Threading;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;
using Microsoft.Office.Word.Server.Service;

private byte[] ConvertDocument(SPWeb web, byte[] docToConvert, bool isImmediate,
    String conversionLibName, int timeOutSecs, bool deleteDocs)
{
    byte[] result = null;
    SPList conversionLib = web.Lists[conversionLibName];
    SPFolder folder = conversionLib.RootFolder;

    // Get the default proxy for the current Word Automation Services instance
    SPServiceContext serviceContext = SPServiceContext.GetContext(web.Site);
    WordServiceApplicationProxy wordServiceApplicationProxy =
        (WordServiceApplicationProxy)serviceContext.GetDefaultProxy(typeof(WordServiceApplicationProxy));

    ConversionJob job = new ConversionJob(wordServiceApplicationProxy);
    job.UserToken = web.CurrentUser.UserToken;
    job.Settings.UpdateFields = true;
    job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
    job.Settings.OutputFormat = SaveFormat.PDF;

    String docFileName = Guid.NewGuid().ToString("D");

    // we replace possible existing files on upload,
    // although there is a minimal chance for GUID duplicates 🙂
    // (Files.Add also creates the list item for the file)
    SPFile docFile = folder.Files.Add(docFileName + ".docx", docToConvert, true);

    String docFileUrl = String.Format("{0}/{1}", web.Url, docFile.Url);
    // same name as the source, with ".pdf" replacing the 5-character ".docx"
    String pdfFileUrl = String.Format("{0}/{1}.pdf",
        web.Url, docFile.Url.Substring(0, docFile.Url.Length - 5));

    job.AddFile(docFileUrl, pdfFileUrl);

    // let's do the job 🙂 (this only queues it for the WAS timer job)
    job.Start();

    if (isImmediate)
    {
        // run the timer job now instead of waiting for its schedule
        // (from PowerShell: Start-SPTimerJob "Word Automation Services")
        StartServiceJob("Word Automation Services Timer Job");
    }

    // set up timeout
    TimeSpan timeSpan = new TimeSpan(0, 0, timeOutSecs);
    DateTime conversionStarted = DateTime.Now;

    // poll the job status once a second until our single file is done or we time out
    ConversionJobStatus cjStatus = new ConversionJobStatus(wordServiceApplicationProxy, job.JobId, null);
    int finishedConversionCount = cjStatus.Succeeded + cjStatus.Failed;
    while ((finishedConversionCount != 1) && ((DateTime.Now - conversionStarted) < timeSpan))
    {
        Thread.Sleep(1000);
        cjStatus = new ConversionJobStatus(wordServiceApplicationProxy, job.JobId, null);
        finishedConversionCount = cjStatus.Succeeded + cjStatus.Failed;
    }

    // timed out -> cancel conversion
    if (finishedConversionCount != 1)
    {
        job.Cancel();
    }

    // output the possible failed conversion error(s)
    foreach (ConversionItemInfo cii in cjStatus.GetItems(ItemTypes.Failed))
    {
        Console.WriteLine("Failed conversion. Input file: '{0}'; Output file: '{1}'; Error code: '{2}'; Error message: '{3}';",
            cii.InputFile, cii.OutputFile, cii.ErrorCode, cii.ErrorMessage);
    }

    SPFile convertedFile = web.GetFile(pdfFileUrl);
    // shouldn't be missing (unless there was a conversion error),
    // but we check to be sure
    if ((convertedFile != null) && (convertedFile.Exists))
    {
        // OpenBinary returns the whole content; a single Stream.Read call
        // is not guaranteed to fill the buffer
        result = convertedFile.OpenBinary();

        // delete result doc if requested
        if (deleteDocs)
        {
            convertedFile.Delete();
        }
    }

    // delete source doc if requested
    if (deleteDocs)
    {
        docFile.Delete();
    }

    return result;
}
```

To start the immediate conversion in the ConvertDocument method, I used a slightly modified version of the StartServiceJob method introduced in a former post of mine.

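```csharp
using System.Linq;
using Microsoft.SharePoint.Administration;

// Runs every timer job definition whose TypeName equals jobTypeName.
// If serviceTypeName is not null, only jobs of the matching service are run.
private void StartServiceJob(string serviceTypeName, string jobTypeName)
{
    SPFarm.Local.Services.ToList().ForEach(
        svc => svc.JobDefinitions.ToList().ForEach(
            jd =>
            {
                if ((jd.TypeName == jobTypeName) && ((serviceTypeName == null) || (serviceTypeName == svc.TypeName)))
                {
                    jd.RunNow();
                }
            }));
}

// Convenience overload: look for the job in all services of the farm
private void StartServiceJob(string jobTypeName)
{
    StartServiceJob(null, jobTypeName);
}
```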

The following code snippet shows a sample call of the ConvertDocument method. Here we request an immediate conversion with a 240-second timeout, use the standard Shared Documents document library as the working library, and delete the temporary files afterwards.

```csharp
// web: the SPWeb of the site that contains the working library
DateTime startTime = DateTime.Now;
byte[] doc = File.ReadAllBytes(@"C:\Data\HelloWorld.docx");
byte[] pdf = ConvertDocument(web, doc, true, "Shared Documents", 240, true);
if (pdf != null)
{
    File.WriteAllBytes(@"C:\Data\HelloWorld.pdf", pdf);
}
Console.WriteLine("Duration of conversion: {0} ms", (DateTime.Now - startTime).TotalMilliseconds);
```

The sample above requires further work before you use it in a real application. First, you should add some extra error handling; for example, check whether GetDefaultProxy found a default WordServiceApplicationProxy at all (it can return null) before creating the ConversionJob.

Next, instead of submitting documents to WAS one by one, it is better to create a ConvertDocument version that converts multiple documents in a single job. In this case you can use an array of byte arrays, which I found easier than handling multiple streams simultaneously (for example, disposing each of them through using blocks).
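A minimal sketch of the bookkeeping such a batch version needs, assuming each entry is uploaded with folder.Files.Add and each input/output URL pair is passed to job.AddFile within one ConversionJob (BatchEntry and BatchPlanner are hypothetical names). The completion check generalizes the single-file test in ConvertDocument: the job is done once Succeeded + Failed reaches the number of submitted documents.

```csharp
using System;
using System.Collections.Generic;

// One entry per submitted document; Index ties a converted result back to its input.
public sealed class BatchEntry
{
    public int Index;
    public string InputFileName;
    public string OutputFileName;
}

public static class BatchPlanner
{
    // Assigns a unique, GUID-based input/output file name pair to each document.
    public static List<BatchEntry> Plan(byte[][] documents, string inputExtension, string outputExtension)
    {
        if (documents == null) throw new ArgumentNullException("documents");
        List<BatchEntry> entries = new List<BatchEntry>(documents.Length);
        for (int i = 0; i < documents.Length; i++)
        {
            string baseName = Guid.NewGuid().ToString("D");
            entries.Add(new BatchEntry
            {
                Index = i,
                InputFileName = baseName + "." + inputExtension,
                OutputFileName = baseName + "." + outputExtension
            });
        }
        return entries;
    }

    // The whole batch is finished once every submitted file either succeeded or failed.
    public static bool IsFinished(int succeeded, int failed, int total)
    {
        return succeeded + failed >= total;
    }
}
```

Keeping the Index lets you return the converted documents as a byte[][] in the same order as the input, with null for items that failed or timed out.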

You can also extend the method to support other output formats, such as XPS.
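The sample hardcodes SaveFormat.PDF and derives the output URL by cutting a 5-character ".docx" suffix, so the output extension must change together with job.Settings.OutputFormat (for example SaveFormat.XPS with "xps"). A minimal sketch of an extension-agnostic URL helper, assuming URLs use "/" as the path separator (ConversionUrls is a hypothetical name):

```csharp
using System;

public static class ConversionUrls
{
    // Replaces the extension of the last path segment of a URL, e.g.
    // ".../Shared Documents/abc.docx" with "xps" gives ".../Shared Documents/abc.xps".
    // If the last segment has no extension, the new one is simply appended.
    public static string ChangeExtension(string inputUrl, string newExtension)
    {
        if (inputUrl == null) throw new ArgumentNullException("inputUrl");
        if (newExtension == null) throw new ArgumentNullException("newExtension");
        int lastSlash = inputUrl.LastIndexOf('/');
        int lastDot = inputUrl.LastIndexOf('.');
        // Only a dot inside the last segment separates the extension
        string stem = (lastDot > lastSlash) ? inputUrl.Substring(0, lastDot) : inputUrl;
        return stem + "." + newExtension.TrimStart('.');
    }
}
```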

In a real-life application you probably wouldn't want to start an immediate conversion on every request, because it might put a heavy load on your servers. Instead, you can create a dedicated queue, letting privileged users submit specific document types for immediate conversion while other documents follow the default conversion schedule.
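A minimal sketch of such a routing rule, deciding the isImmediate argument of ConvertDocument. It assumes "document type" means file extension, which is only one possible interpretation (SharePoint content types would work similarly); ConversionPolicy is a hypothetical name. Keep in mind that requests routed to the scheduled path still need a timeout longer than the WAS timer job interval if you wait for them synchronously.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical policy: privileged users get immediate conversion for selected
// extensions; everything else waits for the regular WAS timer job schedule.
public sealed class ConversionPolicy
{
    private readonly HashSet<string> immediateExtensions;

    // Extensions include the leading dot, e.g. ".docx"; matching ignores case.
    public ConversionPolicy(IEnumerable<string> immediateExtensions)
    {
        this.immediateExtensions = new HashSet<string>(immediateExtensions, StringComparer.OrdinalIgnoreCase);
    }

    // Returns the value to pass as the isImmediate argument.
    public bool IsImmediate(bool isPrivilegedUser, string fileName)
    {
        if (!isPrivilegedUser || string.IsNullOrEmpty(fileName)) return false;
        int dot = fileName.LastIndexOf('.');
        if (dot < 0) return false;
        return immediateExtensions.Contains(fileName.Substring(dot));
    }
}
```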

Although our original goal was a synchronous conversion method, sometimes it is more convenient to convert asynchronously, for example to avoid blocking the UI thread. To support that in your application, you can run ConvertDocument on a separate thread and raise your own custom .NET events to report the outcome of the conversion job.
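A minimal sketch of that pattern, assuming the synchronous conversion is wrapped in a Func<byte[], byte[]> delegate (AsyncConverter and ConversionCompletedEventArgs are hypothetical names). It is generally advised not to share SPSite/SPWeb objects across threads, so a real wrapper would open its own SPSite and SPWeb inside the delegate.

```csharp
using System;
using System.Threading;

// Event data: the converted document (null on failure) and any exception thrown.
public sealed class ConversionCompletedEventArgs : EventArgs
{
    public ConversionCompletedEventArgs(byte[] result, Exception error)
    {
        Result = result;
        Error = error;
    }

    public byte[] Result { get; private set; }
    public Exception Error { get; private set; }
}

// Runs a synchronous converter on a background thread and reports the
// outcome through the ConversionCompleted event.
public sealed class AsyncConverter
{
    private readonly Func<byte[], byte[]> convert;

    public AsyncConverter(Func<byte[], byte[]> convert)
    {
        if (convert == null) throw new ArgumentNullException("convert");
        this.convert = convert;
    }

    public event EventHandler<ConversionCompletedEventArgs> ConversionCompleted;

    public void ConvertAsync(byte[] document)
    {
        Thread worker = new Thread(() =>
        {
            byte[] result = null;
            Exception error = null;
            try { result = convert(document); }
            catch (Exception ex) { error = ex; }

            // Copy the delegate to avoid a race with unsubscribing handlers
            EventHandler<ConversionCompletedEventArgs> handler = ConversionCompleted;
            if (handler != null) handler(this, new ConversionCompletedEventArgs(result, error));
        });
        worker.IsBackground = true;
        worker.Start();
    }
}
```

The event is raised on the worker thread, so a Windows Forms or WPF handler has to marshal back to the UI thread (for example via Control.Invoke or Dispatcher.Invoke) before touching controls.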

11 Comments

  1. I understand that ConversionJob is persisted in a database.

    If a system is used to convert thousands of documents a day, this database will eventually become full, and increasingly slow before that.

    Once I have the required conversion completed, I no longer need the ConversionJob.

    How can this database be kept in order?

    Also, supposing the system were to go down before a conversion starts, the ConversionJob ID should be kept in a database.

    Can the ConversionJob ID be serialised into a database, and how can this ID then be used to retrieve the ConversionJob?

    Comment by Alan Masters — February 18, 2012 @ 08:34

  2. Hi,

    I have just implemented a custom webservice to do synchronous document conversion. I am able to queue the job, get the correct SPJobDefinition from the service and call RunNow on it, but it seems that the job still only runs every 15 seconds. Do you have any idea what might be causing this and if there’s possibly another setting that might affect this anywhere?

    Comment by Matthew — April 3, 2012 @ 07:55

    • Hi, unfortunately it seems to be normal. Even if one calls RunNow, there is still a small time gap before the job actually starts. I think the same is true for the admin UI as well.

      Comment by Peter Holpar — April 9, 2012 @ 20:38

  3. Hi
    I am receiving a “Security validation for the page is invalid” error when the RunNow() method is called. I have created a web service that runs within SharePoint; how can I resolve this? I have tried to use elevated privileges, but this does not seem to work either.
    Thanks

    Comment by kevin — April 19, 2012 @ 14:07

  4. Thanks very much! This was exactly what I was looking for 🙂

    Comment by John Mc — May 11, 2012 @ 00:56

  5. Hi, I am receiving an Access Denied error on the RunNow() method, please guide me.

    Comment by Prasad — July 18, 2012 @ 07:13

    • Did you solve your access denied problem?

      Comment by trpclman — November 15, 2012 @ 15:35

      • Hi,
        did anybody manage to fix that access denied error?
        I am fighting with it and it is winning big time…
        I will appreciate any help.

        Many thanks,
        Julian

        Comment by Julian — February 12, 2013 @ 12:42

  6. Hi. I have converted the Word file to PDF successfully. But my Word document contains some images, and after conversion to PDF those images are somewhat degraded.
    How can I improve the image quality in PDF files using Word Automation Services?

    Comment by Srinivas — April 15, 2013 @ 10:20

  7. (WordServiceApplicationProxy)serviceContext.GetDefaultProxy(typeof(WordServiceApplicationProxy)) returns null 😦

    Comment by Thé noir — December 31, 2013 @ 12:06

  8. Hi Peter,

    Thanks for the excellent solution. Like some of your other readers, I am receiving an “Access Denied” error when attempting to call RunNow() on the job definition. I believe it may have something to do with this: ContentService.RemoteAdministratorAccessDenied = false.

    Have you experienced any such issues, and if so, were you able to find a solution?

    Thanks,
    Everett

    Comment by everettcomstock — January 9, 2014 @ 07:57

