Discovering suspended workflows with Microsoft Flow
Table of contents:
In my daily work over business processes in Office 365, specifically in SharePoint Online, one thing annoys me the most – namely the lack of mechanisms to inform me that the workflow has hung up – that it is in a “Suspended” state.
The solution for that issue is not provided by Microsoft or Nintex – the company with which products I have been working for a quite long time. There are only workarounds, but they are inadequate, and I wanted to be able to react proactively and not reactively to any flow suspension event.
“Suspended” – what does it mean?
In a situation where a workflow does not have access to a given resource (due to lack of permissions or the fact that the resource does not exist), or it can’t access a remote service or it can’t convert one type of data into another – it usually “freezes” – it sets itself in a state in which it waits for an administrator’s action. In such a case the administrator should solve the problem that the flow came across and then resume it.
While in SharePoint on-premise this status is explicitly communicated and in fact, even the Nintex Hawkeye analytical tool is able to capture this fact and record it in the presented data, in SharePoint Online this situation is much more complicated.
It turns out that in the SPO only the internal status of the workflow changes to “Suspended”, but the status for the user does not change and as a result the workflow is still perceived as working.
Workaround? You welcome!
Since I already knew that I could not count on third-party solutions, I started looking for possible alternatives. Basically, I found only one – working on alerts and workflow history lists.
Namely, it’s about using the History list (usually a Workflow History list, available at:
http://sharepoint/site/lists/Workflow%20History) to create there a dedicated view that will only show those elements that will contain specific phrases in “Description”. I use “error”, “exception”, “not found”, “failed”:
Finally, set an alert for this view to receive a notification if a new record appears. Apparently ok, but:
- there are much more reasons for the suspension of flow, very often no entry in the flow history appears.
- having many processes on different sites, you have to create a lot of such views, which in turn translates into the inability to easily manage this.
- an element in workflow history does not tell us directly which list the affected element is from – it’s hard to find a page with a suspended flow status.
Maybe PowerShell? Or CSOM?
Looking for help, of course, I came across various solutions that use PowerShell or CSOM. However, even if I would like to use them, I have not been able to run them in the SharePoint Online environment. Maybe that is just me. Maybe the environment π
SP.WorkflowServices REST API
And so, looking for solutions I came up with the idea of using the REST API. And since it is REST, it’s Microsoft Flow π The script was supposed to be simple:
- Obtaining Request Digest token
- Retrieving list of sites and list to monitor
- For each site get the list of workflow subscription GUIDs ( /_api /SP.WorkflowServices.WorkflowSubscriptionService.Current/EnumerateSubscriptions())
- For each subscription retrieve a list of active instances (/_api/SP.WorkflowServices.WorkflowInstanceService.Enumerate())
- For each instance check status – if = 2 – means suspended.
Possible internal statuses of the instance
NotStarted : 0, Started : 1, Suspended : 2, Canceling : 3, Canceled : 4, Terminated : 5, Completed : 6, NotSpecified : 7, Invalid : 8
And everything would be beautiful, if not for the fact that the “Enumerate” method for “WorkflowInstanceService” in SharePoint Online is β¦ unavailable! I made sure by checking the list of available methods using the URL: /_api/$metadata.
How did I do it?
I understood that the only way to get to an instances is to simply enumerate site workflows and each of the list elements separately.
So I decided to use the two methods:
- SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForSite()
- SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForListItem()
Configuration list
In SharePoint, I created an “administrative” list, which holds information about the sites and lists to be monitored, for the presence of suspended flows. The list stores the following data:
- Website URL for monitoring
- List names, separated by a semicolon
- GUIDs list, separated by a semicolon
Microsoft Flow flows
Basically, these are two flows:
- The main one, which is run by the schedule (in my case once, for 4h), which for each site collects a list of subscriptions and workflows on the site, then, for each of the lists indicated for monitoring, runs β¦
- An additional flow that examines the 50 most recently modified items.
Main workflow
In the beginning, workflow takes a list of sites and semicolon separated lists and then initiates the variables used in the next action:
- temp subscription object and subscription object – these are two “object” variables used to build an object that stores information about workflow subscriptions on a given site, in the form: {“GUID”: “Name”},
- list names and list guids – two “array” variables that hold names and GUIDs of defined lists, converted to list, using the expression: “split (list_name_semicolon_separated, ‘;’)”,
- report variable – a text variable to which information about found, suspended flows is written and which finally serves as the content of the sent e-mail,
- counter – a numeric variable that is used by a loop passing values ββfrom the list of GUIDs.
Then the workflow starts working for each site defined for monitoring. And yes, for each site it retrieves the Request Digest token (POST query to _api/contextinfo and the value extraction using the “Parse JSON” action), and then for each subscription found it adds it to the subscription object parameters:
- Using the expression: union(addProperty(variables (‘var_TempSubscInstanceName_dict’), item()?[‘Id’], item()?[‘Name’]), variables(‘var_SubscInstanceName_dict’)) create an object in variable temp subscription object, combining the existing variable subscription object with the GUID data and Name of the subscription being processed.
- Overwrite the variable subscription object with the result of the “Compose” action.
Next, I get the list of workflow from the site, which are in the
“Suspended” status “Suspended”:
/_api/SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForSite()?$Filter=Status eq 2&$select=Id,FaultInfo,WorkflowSubscriptionId,InstanceCreated,LastUpdated
For each of them I add to the final message information about it: what is its name, what is its address and the reason why it found itself in this state.
I download the workflow name from the subscription object – knowing the workflow GUID, I discover its title:
variables('var_SubscInstanceName_dict')?[item()?['WorkflowSubscriptionId']]
Next, for each monitored list defined for a given site, I run a second flow, which task is to review the recently modified, 50 elements and check if there are no suspended flows on them.
To run the flow, I need to pass the information:
- URL of the site
- List name
- List’s GUID
- GUID and subscription names object
Here is where I am using the counter variable. I need it to get the right values for each rotation of the loop, from the list names and GUIDs variables:
variables('var_ListGuids_coll')?[variables('var_Counter_int')] variables('var_ListNames_coll')?[variables('var_Counter_int')]
At the end, the main workflow sends me an e-mail with the prepared content and a list of information about suspended workflows on sites and lists that have been found.
Additional workflow
Its task is to browse the recently modified, 50 elements of the indicated list. Workflow is triggered by a POST request.
Like the main flow, it also retrieves the Request Digest token first. Then it initializes the variables used in the next actions and gets a list of items to check. Note that I only download the ID and Title of each element, sorting them descending after the modification date:
_api/web/lists(['ListGUID'])/Items()?$select=ID,Title&$orderby=Modified desc&$top=50
Then, for each element, the workflow checks if there is any suspended workflow associated with it. I do this using the following query:
/_api/SP.WorkflowServices.WorkflowInstanceService.Current/EnumerateInstancesForListItem(listId='['ListGUID']', itemId='['Id']')?$Filter=Status eq 2&$select=Id,FaultInfo,WorkflowSubscriptionId,InstanceCreated,LastUpdated
Then, for each result found, information about it is added to the results message.
Finally, the workflow returns the result of its operation – a list of elements and names of workflows that are suspended.
E-mail message
Finally, the workflow sends a message to me where I can see where the suspended workflows are currently located and directly from it I can go straight to the workflow status page to eventually resume it:
Summary
The discussed solution provides me with data that allows me to proactively react to the suspension of workflows. I am no longer put in a situation where the client informs me that “this workflow probably does not work” – I am able to resume it faster, thanks to which the end user will not even notice the problem.
One drawback of the solution is that I have to keep the list with the names and GUIDs for monitoring manually, but this is a job I do not do often.
Importantly, the workflow for 12 sites and approx. 20 lists (approx. 1000 records) works approx. 15 minutes.
If you like this solution or have any questions – leave a comment! π
Michel Mendes
Very clever solution! Well done!
Tomasz Poszytek
Thank you!
Enrico Knapp
Excellent work! I will definitely adopt this and use this to monitor my installations too! Thanks for your work and sharing your result! You’re a genius!
Tomasz Poszytek
Thank you Enrico! π
Palesa Emmanuel Sikwane
Great post Tomasz!!
Tomasz Poszytek
Thank you! π So glad to see such a feedback from you.
Raj
Great work Tomasz, i think i can implement this solution using SharePoint 2013 workflows, right?
Tomasz Poszytek
Yes, this should work for both SP2010 and 2013 workflows, since API is the same. At least in SPO π
Helena Wise
Hello! Thank you for the great idea and implementation, it very useful! I’m adopting the flows you described to monitor my workflows. It works but sometimes the ‘additional flow’ takes more than 2 min and failed with the error:
“The execution of template action ‘Response’ is failed: the client application timed out waiting for a response from service. This means that workflow took longer to respond than the alloted timeout value. The connection maintained between the client application and service will be closed and client application will get an HTTP status code 504 Gateway Timeout.”
Could you share please the settings of your HTTP Request from main flow and Response from additional flow?
Tomasz Poszytek
Hi, actually I can’t π This is a commercial project, please send me the mail if you need assistance.
SClark
Great work for SPD, but as they’re obsolete, it’s now time to make “DISCOVERING SUSPENDED MICROSOFT FLOWS WITH MICROSOFT FLOW.”
Tomasz Poszytek
Unfortunately cloud flows are not having status “suspended”. They can be simply stuck, but still running.