In my daily work over business processes in Office 365, specifically in SharePoint Online, one thing annoys me the most – namely the lack of mechanisms to inform me that the workflow has hung up – that it is in a “Suspended” state.

The solution for that issue is not provided by Microsoft or Nintex – the company with which products I have been working for a quite long time. There are only workarounds, but they are inadequate, and I wanted to be able to react proactively and not reactively to any flow suspension event.

“Suspended” – what does it mean?

In a situation where a workflow does not have access to a given resource (due to lack of permissions or the fact that the resource does not exist), or it can’t access a remote service or it can’t convert one type of data into another – it usually “freezes” – it sets itself in a state in which it waits for an administrator’s action. In such a case the administrator should solve the problem that the flow came across and then resume it.

While in SharePoint on-premise this status is explicitly communicated and in fact, even the Nintex Hawkeye analytical tool is able to capture this fact and record it in the presented data, in SharePoint Online this situation is much more complicated.

It turns out that in the SPO only the internal status of the workflow changes to “Suspended”, but the status for the user does not change and as a result the workflow is still perceived as working.

Workaround? You welcome!

Since I already knew that I could not count on third-party solutions, I started looking for possible alternatives. Basically, I found only one – working on alerts and workflow history lists.

Namely, it’s about using the History list (usually a Workflow History list, available at:
http://sharepoint/site/lists/Workflow%20History) to create there a dedicated view that will only show those elements that will contain specific phrases in “Description”. I use “error”, “exception”, “not found”, “failed”:

Suspended workflows view
Configuration of filtering for the view of suspended flows

Finally, set an alert for this view to receive a notification if a new record appears. Apparently ok, but:

  • there are much more reasons for the suspension of flow, very often no entry in the flow history appears.
  • having many processes on different sites, you have to create a lot of such views, which in turn translates into the inability to easily manage this.
  • an element in workflow history does not tell us directly which list the affected element is from – it’s hard to find a page with a suspended flow status.

Maybe PowerShell? Or CSOM?

Looking for help, of course, I came across various solutions that use PowerShell or CSOM. However, even if I would like to use them, I have not been able to run them in the SharePoint Online environment. Maybe that is just me. Maybe the environment πŸ™‚

SP.WorkflowServices REST API

And so, looking for solutions I came up with the idea of using the REST API. And since it is REST, it’s Microsoft Flow πŸ™‚ The script was supposed to be simple:

  1. Obtaining Request Digest token
  2. Retrieving list of sites and list to monitor
  3. For each site get the list of workflow subscription GUIDs ( /_api /SP.WorkflowServices.WorkflowSubscriptionService.Current/EnumerateSubscriptions())
  4. For each subscription retrieve a list of active instances (/_api/SP.WorkflowServices.WorkflowInstanceService.Enumerate())
  5. For each instance check status – if = 2 – means suspended.

Possible internal statuses of the instance

NotStarted : 0, Started : 1, Suspended : 2, Canceling : 3, Canceled : 4, Terminated : 5, Completed : 6, NotSpecified : 7, Invalid : 8

And everything would be beautiful, if not for the fact that the “Enumerate” method for “WorkflowInstanceService” in SharePoint Online is … unavailable! I made sure by checking the list of available methods using the URL: /_api/$metadata.

How did I do it?

I understood that the only way to get to an instances is to simply enumerate site workflows and each of the list elements separately.

So I decided to use the two methods:

  1. SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForSite()
  2. SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForListItem()

Configuration list

In SharePoint, I created an “administrative” list, which holds information about the sites and lists to be monitored, for the presence of suspended flows. The list stores the following data:

  1. Website URL for monitoring
  2. List names, separated by a semicolon
  3. GUIDs list, separated by a semicolon

Microsoft Flow flows

Basically, these are two flows:

  1. The main one, which is run by the schedule (in my case once, for 4h), which for each site collects a list of subscriptions and workflows on the site, then, for each of the lists indicated for monitoring, runs …
  2. An additional flow that examines the 50 most recently modified items.

Main workflow

Variables used in the workflow

In the beginning, workflow takes a list of sites and semicolon separated lists and then initiates the variables used in the next action:

  1. temp subscription object and subscription object – these are two “object” variables used to build an object that stores information about workflow subscriptions on a given site, in the form: {“GUID”: “Name”},
  2. list names and list guids – two “array” variables that hold names and GUIDs of defined lists, converted to list, using the expression: “split (list_name_semicolon_separated, ‘;’)”,
  3. report variable – a text variable to which information about found, suspended flows is written and which finally serves as the content of the sent e-mail,
  4. counter – a numeric variable that is used by a loop passing values ​​from the list of GUIDs.
Part 2 of flow, downloading the subscription list

Then the workflow starts working for each site defined for monitoring. And yes, for each site it retrieves the Request Digest token (POST query to _api/contextinfo and the value extraction using the “Parse JSON” action), and then for each subscription found it adds it to the subscription object parameters:

  1. Using the expression: union(addProperty(variables (‘var_TempSubscInstanceName_dict’), item()?[‘Id’], item()?[‘Name’]), variables(‘var_SubscInstanceName_dict’)) create an object in variable temp subscription object, combining the existing variable subscription object with the GUID data and Name of the subscription being processed.
  2. Overwrite the variable subscription object with the result of the “Compose” action.
Get information about suspended flows in the current site

Next, I get the list of workflow from the site, which are in the
“Suspended” status “Suspended”:

/_api/SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForSite()?$Filter=Status eq 2&$select=Id,FaultInfo,WorkflowSubscriptionId,InstanceCreated,LastUpdated

For each of them I add to the final message information about it: what is its name, what is its address and the reason why it found itself in this state.

I download the workflow name from the subscription object – knowing the workflow GUID, I discover its title:

variables('var_SubscInstanceName_dict')?[item()?['WorkflowSubscriptionId']]
Getting information about suspended workflows in the list

Next, for each monitored list defined for a given site, I run a second flow, which task is to review the recently modified, 50 elements and check if there are no suspended flows on them.

To run the flow, I need to pass the information:

  1. URL of the site
  2. List name
  3. List’s GUID
  4. GUID and subscription names object

Here is where I am using the counter variable. I need it to get the right values for each rotation of the loop, from the list names and GUIDs variables:

variables('var_ListGuids_coll')?[variables('var_Counter_int')] variables('var_ListNames_coll')?[variables('var_Counter_int')]

At the end, the main workflow sends me an e-mail with the prepared content and a list of information about suspended workflows on sites and lists that have been found.

Additional workflow

Its task is to browse the recently modified, 50 elements of the indicated list. Workflow is triggered by a POST request.

The beginning of the additional flow operation

Like the main flow, it also retrieves the Request Digest token first. Then it initializes the variables used in the next actions and gets a list of items to check. Note that I only download the ID and Title of each element, sorting them descending after the modification date:

_api/web/lists(['ListGUID'])/Items()?$select=ID,Title&$orderby=Modified desc&$top=50
Checking in the loop whether the items have no suspended flows

Then, for each element, the workflow checks if there is any suspended workflow associated with it. I do this using the following query:

/_api/SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForListItem(listId='['ListGUID']', itemId='['Id']')?$Filter=Status eq 2&$select=Id,FaultInfo,WorkflowSubscriptionId,InstanceCreated,LastUpdated

Then, for each result found, information about it is added to the results message.

Finally, the workflow returns the result of its operation – a list of elements and names of workflows that are suspended.

E-mail message

Finally, the workflow sends a message to me where I can see where the suspended workflows are currently located and directly from it I can go straight to the workflow status page to eventually resume it:

An e-mail with information about suspended flows

Summary

The discussed solution provides me with data that allows me to proactively react to the suspension of workflows. I am no longer put in a situation where the client informs me that “this workflow probably does not work” – I am able to resume it faster, thanks to which the end user will not even notice the problem.

One drawback of the solution is that I have to keep the list with the names and GUIDs for monitoring manually, but this is a job I do not do often.

Importantly, the workflow for 12 sites and approx. 20 lists (approx. 1000 records) works approx. 15 minutes.

If you like this solution or have any questions – leave a comment! πŸ™‚