To understand process flow is to understand movement over time. Within process mining, we consider a process to be a flow of tasks driven by events, and each of those events typically changes a task's state.
Most project management systems observe a series of events over a period of time and record them in a relational database. For instance, a task may transition from design to develop to deploy, and the time of each event is recorded inside a system of record.
This allows for basic analysis. We can query a task and compute how long it sits in each state before moving on. This is a structured event log, where events constitute the core of the data being analyzed. To facilitate process analysis, we transform the event data we mine into a task-centric view of what's happening.
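As a rough sketch, here is what that task-centric view might look like in Python. The field names, states, and timestamps are illustrative assumptions, not our actual schema:

```python
from datetime import datetime

# Hypothetical structured event log for a single task; field names are
# illustrative, not a real schema.
events = [
    {"task": "FEAT-42", "to_state": "design",  "at": datetime(2024, 1, 2, 9, 0)},
    {"task": "FEAT-42", "to_state": "develop", "at": datetime(2024, 1, 9, 14, 0)},
    {"task": "FEAT-42", "to_state": "deploy",  "at": datetime(2024, 1, 16, 11, 0)},
]

# Task-centric view: how long did the task sit in each state before moving on?
events.sort(key=lambda e: e["at"])
for current, nxt in zip(events, events[1:]):
    dwell = nxt["at"] - current["at"]
    print(f'{current["task"]}: {current["to_state"]} -> {nxt["to_state"]} after {dwell}')
```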
We store the event data in a non-structured document within our database that captures the events a task has undergone. Using this document, we create a graph representation of those events. This lets us apply graph theory to understand the relationships between a task's past states and compare them to other task graphs within our system. The generated graph is directed and cyclic.
We invest in this effort of transforming the data because graphs make the connection between two states explicit: the nodes represent complicated, multi-faceted states, and the edges represent the space between two statuses. These edges allow us to understand things like how long a transition took, who or what changed the status, and which direction the status moved.
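A minimal sketch of that structure, using the networkx library purely for illustration; the state names, actors, and edge attributes are assumptions rather than our internal model:

```python
import networkx as nx

# Transitions for one task, already pulled from the task document; shape is illustrative.
transitions = [
    {"from": "design",  "to": "develop", "by": "alice", "hours_in_from": 120},
    {"from": "develop", "to": "review",  "by": "bob",   "hours_in_from": 64},
    {"from": "review",  "to": "develop", "by": "carol", "hours_in_from": 8},   # sent back
    {"from": "develop", "to": "deploy",  "by": "bob",   "hours_in_from": 40},
]

# Directed graph: nodes are task states, edges carry the when/who/how-long metadata.
# A MultiDiGraph keeps repeated transitions between the same pair of states.
G = nx.MultiDiGraph()
for t in transitions:
    G.add_edge(t["from"], t["to"], actor=t["by"], hours=t["hours_in_from"])

print(G.edges(data=True))
```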
This allows the platform to appreciate state relative to time, simple causality, and eventually dependent origination. Importantly, structuring the data in this way allows for comparative graph analysis to determine key differences in flow. That can unlock insights into how two different teams move work through their own processes; for example, when comparing two QA teams, process mining might uncover that one team sends more work back to the development team, whereas the other deploys code to production more quickly but that code more often ends up back in development.
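To make that comparison concrete, here is a hedged sketch of how backward transitions might be surfaced for two hypothetical QA teams. The hard-coded "happy path" ordering of states is an assumption for illustration; a real system would derive it from the mined process itself:

```python
import networkx as nx

# Illustrative "happy path" ordering of states.
ORDER = {"design": 0, "develop": 1, "qa": 2, "deploy": 3}

def backward_edges(graph: nx.MultiDiGraph) -> list:
    """Transitions that move backwards relative to the happy path."""
    return [(u, v) for u, v in graph.edges() if ORDER.get(v, 0) < ORDER.get(u, 0)]

# Two hypothetical QA teams' aggregated transition graphs.
team_a, team_b = nx.MultiDiGraph(), nx.MultiDiGraph()
team_a.add_edges_from([("develop", "qa"), ("qa", "develop"), ("qa", "deploy")])
team_b.add_edges_from([("develop", "qa"), ("qa", "deploy"), ("deploy", "develop")])

print("Team A reverts:", backward_edges(team_a))  # [('qa', 'develop')] - catches issues in QA
print("Team B reverts:", backward_edges(team_b))  # [('deploy', 'develop')] - ships fast, reverts later
```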
Imagine you wanted to answer the question “how much time should a major feature spend in design before it goes to engineering?” If the task spends too much time in design, that may represent waste from the team moving too slowly. But spending too little time might generate equal process waste, because it increases the likelihood the task will be returned to design after it has gone into development (or even production).
Trying to structure the records and perform this type of analysis on relational data is complex. Once the hard work of structuring the data as graph elements is done, however, this analysis becomes trivial.
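For illustration only, here is a sketch of how that design-time question might be answered once each task's history has been summarized from its graph; the numbers and field names are invented:

```python
from statistics import median

# Hypothetical per-task summaries extracted from the task graphs:
# hours spent in design, and whether an edge later pointed back to design.
tasks = [
    {"hours_in_design": 8,   "returned_to_design": True},
    {"hours_in_design": 40,  "returned_to_design": False},
    {"hours_in_design": 60,  "returned_to_design": False},
    {"hours_in_design": 6,   "returned_to_design": True},
    {"hours_in_design": 200, "returned_to_design": False},  # long tail: possible waste
]

returned = [t["hours_in_design"] for t in tasks if t["returned_to_design"]]
clean    = [t["hours_in_design"] for t in tasks if not t["returned_to_design"]]

print("Median design time, later reworked:", median(returned), "hours")
print("Median design time, shipped clean: ", median(clean), "hours")
```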
The simple example above brings up one more important consideration. We all know time only moves in one direction, but processes can move forward or backward. Of course, not all process movement is equal: moving backwards is a sign of waste that becomes exponentially more expensive the further back in the process a task needs to be reverted. A poor design decision is trivial to fix during the design phase, slightly more expensive to fix while the code is being written, but dramatically more expensive once that code is in production (particularly when subsequent code relies on the original code). In sum, the cost of moving a task backwards through a process rises sharply with the number of steps it has to reverse.
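One way to express that idea, purely as an illustrative sketch: weight each backward transition by how many steps it reverses. The exponential base here is an assumed weighting, not a measured cost model:

```python
# Assumed happy-path ordering of states, for illustration only.
ORDER = ["design", "develop", "review", "deploy", "production"]
INDEX = {state: i for i, state in enumerate(ORDER)}

def revert_cost(from_state: str, to_state: str, base: float = 2.0) -> float:
    """Hypothetical waste weight for a backward transition: grows exponentially
    with the number of process steps being reversed."""
    steps_back = INDEX[from_state] - INDEX[to_state]
    return base ** steps_back if steps_back > 0 else 0.0

print(revert_cost("develop", "design"))     # 2.0  -> caught early, cheap
print(revert_cost("production", "design"))  # 16.0 -> caught in production, expensive
```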
With this structure in place, we can understand, visualize, and query complex process flow and distinguish optimal from wasteful processes in a new way. This allows us to easily compute normalized metrics for process health like the Sprint Performance Score (SPS) or the Product Delivery Score. It allows us to tell the story of how a sprint is going wrong and better intervene to right the ship.
This all starts with an understanding of time: putting a time-centric data structure at the heart of our analysis engine to understand tasks, events, and ultimately complex processes.