InfoPath loves Microsoft Flow

Although it has been repeatedly said that the history of InfoPath is over, for many companies still building workflows’ forms using InfoPath is as obvious as using Excel. However, when doing that in Office 365 and SharePoint Online the product’s boundaries are really visible and are becoming a real pain. 

I’ve been struggling some time ago with an issue related to the workflow’s instance size (here), what was directly related to the size of the InfoPath form together with attached files. The obvious solution I was thinking then was moving those attachments away from the form, into a dedicated SharePoint library, but I wasn’t able to do that easily mainly because of the algorithm that is used in InfoPath to handle file attachments.

How InfoPath handles attachments?

File attachments in InfoPath form are being uploaded directly to the form’s XML structure and encoded into a Base64 string:

InfoPath XML structure for attachments

But what turns out, InfoPath does not only encode file contents, but also adds additional information: header that stores such information as size, version and filename length, then the filename itself. 

Although the header size is fixed, it always has 24 bytes, the amount of bytes for the filename varies on its length. That additional number of bytes is always stored in the byte no. 20 as a DWORD information and the filename itself is stored using Unicode, so the length must be multiplied by 2.

Therefore, to get the actual file contents, the Base64 string must be decoded, then the first 24 bytes + the filename length*2 have to be cut off.

Read more about programmatically encoding and decoding InfoPath attachments here: https://support.microsoft.com/en-us/help/892730/how-to-encode-and-decode-a-file-attachment-programmatically-by-using-v

Looking for a solution

During looking for the best solution I first started with Nintex, however Nintex does not have any actions allowing it to convert the Base64 strings back into the binary data, which is needed for the file to be created. Then I switched to Microsoft Flow. I was really impressed about the amount of functions allowing me to convert one data type into another. I also found a function converting Base64 to a binary.

However, after the first euphory I got stuck looking for an approach to split actual file contents from the header. Having in mind, that the header size is not fixed, I wasn’t able to simply calculate how many first characters from Base64 I should cut off (using a simple equation, that each character corresponds to 6 bits, and 8 bits is a byte and so on…). Flow does not have as well any function that would allow to work on the binary data in a form of an array.

And then it dawned on me! I opened my Azure portal and navigated to Function Apps. This service really allows to extend capabilities of Flows and Logic Apps (and Nintex workflows as well) into the areas not available before.

What is Azure Function?

Basically this is a service allowing execution of a small code, function, method in a cloud. It is charged either per consumption (so for each run of the function, there may be delays, price list) or as an app service (hosted on a VM that you own in Azure).

What is also super cool about Azure Functions is the variety of languages the function can be written in. You can choose between JavaScript, PHP, Python, PowerShell, Bash and of course C# and F# (source). The code can be written using the built-in editor, or using Visual Studio (source).

Azure Function editor

Functions can also be invoked in many ways (source). I chose HTTP trigger, so that I was able to use it via the “HTTP Request” action.

The solution

I wrote a simple function, that does the following:

  1. Receives the Base64 string using the POST request;
  2. Decodes it into the byte array, extracts header and filename;
  3. Encodes file content’s back into Base64 string;
  4. Returns filename together with the Base64 string back to the requester.

The code itself is pretty easy, based on what can be found in Technet:

using System.Net;
using System.IO;
using System.Text;

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    log.Info("C# HTTP trigger function processed a request.");

    // Get request body
    dynamic requestBody = await req.Content.ReadAsAsync<object>();
    string base64String = requestBody?.data;

    byte[] attachmentNodeBytes = Convert.FromBase64String(base64String);

    // Byte 20 contains a DWORD with the length of the
    // filename buffer. The filename is stored using Unicode 
    // so the length is must be multiplied by 2.
    int fnLength = attachmentNodeBytes[20] * 2;
    byte[] fnBytes = new byte[fnLength];
 
    // The actual filename starts at byte 24 
    for (int i = 0; i < fnLength; ++i)
    { 
        fnBytes[i] = attachmentNodeBytes[24 + i]; 
    }

    // Convert the filename bytes to a string. The string 
    // terminates with \0 so the actual filename is the 
    // original filename minus the last character ! 
    char[] charFileName = UnicodeEncoding.Unicode.GetChars(fnBytes); 
    string fileName = new string(charFileName); 
    fileName = fileName.Substring(0, fileName.Length - 1);

    // The file is located after the header, which is 24 bytes long
    // plus the length of the filename. 
    byte[] fileContents = new byte[attachmentNodeBytes.Length - (24 + fnLength)];

    for (int i = 0; i < fileContents.Length; ++i)
    {
        fileContents[i] = attachmentNodeBytes[24 + fnLength + i];
    }

    string fileContentsInBase64 = Convert.ToBase64String(fileContents);

    return base64String == null
        ? req.CreateResponse(HttpStatusCode.BadRequest, "Please pass a name on the query string or in the request body")
        : req.CreateResponse(HttpStatusCode.OK, fileName + "###" + fileContentsInBase64);
}

The Flow is also not complicated. It does the following:

  1. Receives a HTTP Request containing information: InfoPath filename, name of the attachment field in InfoPath, target library name;
  2. Reads form’s file contents, using the xpath function gets Base64 strings and stores them in array;
  3. Then for each item from an array, it calls my Azure Function to split file contents from its name;
  4. Then creates a file in the given library, using obtained from Azure Function filename and file contents;
  5. Then replaces the Bae64 string in the InfoPath form with an empty string and updating the form so that the attachment is removed.

Move Infopath attachments flow

The expression using xpath function to get list of attachments looks as below:

xpath(xml(body('Get_file_content_using_path')), concat('//*[contains(local-name(), "', triggerBody()['text_1'], '")]/text()'))

Where the “text_1” is one of the flow’s input parameters, the name of the attachments control.

Results (yeah, I know Azure Function’s run history wasn’t refreshed when I opened it, but it did execute for real):

Microsoft Flow moving InfoPath attachments to SPO library

Possible development

The possible next steps to develop the solution may be:

  1. Insertion of a hyperlink to the file, uploaded into library, back to the form, to a dedicated field in the form, next to the attachment control;
  2. Creation of subfolders, ex. for each form flow can create a separate subfolder using its name, to store its files.

 

How do you like the solution? Please share your comments!