The real importance of software delivery metrics comes not from the metrics themselves, but from the shared understanding and alignment that discussing them creates. In some cases software development metrics need to evolve, but they will always be imperfect, incomplete, and open to manipulation. Like Schrödinger’s cat, the act of examining the metric means the state may change. Regardless of the exact numbers in any given metric, the real business value comes from understanding what those numbers tell you about how the work is flowing. Understanding the process is far more important, and far more nuanced, than totaling the points.
I’ll elaborate with a real-world example. One of our customers reached out to us a few weeks ago because the sprint velocities they were seeing in Jira weren’t matching the velocities they were seeing in Bloomfilter. As an engineer working on our integration layer with other systems, I did not want to hear that. For this customer, we ingest and mine their sprint data directly from Jira, so we would expect our basic metrics (like velocity) to look identical between Bloomfilter and Jira. What was going on?
My curiosity piqued, I dug into the problem. Matching velocities is a trickier exercise than it sounds. At first glance you might think that calculating velocity is just adding up the point total of completed tasks, but tasks can have a surprisingly complex life cycle before they’re complete. They might be marked “done” several times by several different teams, and just because they’re “done” right now doesn’t guarantee they won’t be pulled back into development tomorrow. To handle this complexity, we use sophisticated process mining algorithms on a graph representation of this data, so our calculations are not always as simple as A + B = C.
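To see why it isn’t a simple sum, here’s a toy sketch in Python contrasting a naive point total with a calculation that respects each task’s status history. The StatusEvent schema and field names are invented for illustration; this is not our actual process-mining implementation.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class StatusEvent:
    task_id: str
    points: int
    status: str     # e.g. "To Do", "In Progress", "Done"
    at: datetime    # when the task entered this status

def naive_velocity(events: list[StatusEvent]) -> int:
    # "Just add up the points": count every task that was ever marked Done.
    done_ids = {e.task_id for e in events if e.status == "Done"}
    points = {e.task_id: e.points for e in events}
    return sum(points[t] for t in done_ids)

def lifecycle_velocity(events: list[StatusEvent], sprint_end: datetime) -> int:
    # Walk the full status history and only count tasks whose *latest*
    # status as of sprint end is Done -- a task can bounce in and out of
    # Done several times before the sprint closes.
    latest: dict[str, StatusEvent] = {}
    for e in sorted(events, key=lambda e: e.at):
        if e.at <= sprint_end:
            latest[e.task_id] = e
    return sum(e.points for e in latest.values() if e.status == "Done")
```

The two functions can disagree whenever a task gets marked “done” and then pulled back, and that’s before you account for deletions, re-pointing, or work that spans sprints.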
So I kept digging, trying to figure out why we didn’t always agree. What I found surprised me: we were both right.
Wait… what? Development velocity is a measurement of something. You can’t measure the same thing twice, get different results, and think that’s okay! Since we aren’t in the realm of quantum physics, the current state should be knowable. God does not play dice, and all that.
The truth is more nuanced and interesting. SDLC metrics have far more edge cases than you might expect. You can find plenty of different ways to define velocity, but broadly, it’s a measure of how much work got done. What constitutes “work being done,” though, is subject to interpretation and disagreement.
Here’s a real-world example. If I spend today completing a task and tomorrow the business decides that the feature represented by that task is no longer valuable, does that count as “work being done”? Even if we remove the task from the sprint and delete the issue in Jira, the engineer doesn’t get that day of work back. This was one of the discrepancies we were seeing: if you delete a task, Jira doesn’t count it anymore, but if you delete a task after the work is done, we still count it. That’s exactly what happened here.
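To make the difference concrete, here’s a minimal sketch of the two counting policies. The Task fields are invented for illustration, and this simplifies both Jira’s behavior and our own logic.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Task:
    key: str
    points: int
    done_at: Optional[datetime]     # when the task last reached Done, if ever
    deleted_at: Optional[datetime]  # when the issue was deleted, if ever

def velocity_ignoring_deletions(tasks: list[Task]) -> int:
    # Jira-style: a deleted issue simply disappears from the total.
    return sum(t.points for t in tasks if t.done_at and t.deleted_at is None)

def velocity_crediting_finished_work(tasks: list[Task]) -> int:
    # The other stance: if the work was finished before the issue was
    # deleted, the engineer still spent that day -- count it.
    return sum(
        t.points
        for t in tasks
        if t.done_at and (t.deleted_at is None or t.done_at <= t.deleted_at)
    )
```

Neither answer is wrong; they’re just answering slightly different questions about the same sprint.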
What’s more interesting is why the wrong work got done in the first place. Velocity, in the physics sense, isn’t just about how fast you’re going - it’s how fast you’re going and in what direction. So the velocity for the sprint we were looking at was actually faster than we thought, but maybe it was trending in the wrong direction - building an unnecessary feature, creating a new bug, or duplicating work. That’s the real problem here.
Another scenario: I do work on an un-pointed task. Once I’ve finished my work, I can go ahead and mark that task as done, and it’ll be counted towards velocity… or not, since it’s unpointed. If I assign points to a task after it’s complete, should those points count towards velocity? Honestly, I can see arguments for and against, but at Bloomfilter we ultimately decided that developers should get credit for the work they’ve done, and we shouldn’t get in the way of people trying to do the right thing.
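In code, that policy might look something like the sketch below. Again, this is illustrative, with invented names, not our production implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class PointChange:
    value: int
    changed_at: datetime

def credited_points(point_history: list[PointChange],
                    done_at: Optional[datetime]) -> int:
    # Credit the most recent estimate on a completed task, even if the
    # points were only filled in after the work was marked done. A stricter
    # policy would ignore any estimate made after done_at.
    if done_at is None or not point_history:
        return 0
    latest = max(point_history, key=lambda p: p.changed_at)
    return latest.value
```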
Here again, the process is more interesting than the point total, and it raises a better question: why wasn’t this card pointed in the first place? Plenty of explanations come to mind. It could’ve been created on the fly mid-sprint to address a new issue that arose (I know I’m guilty of this). It could’ve been pulled into a sprint before a refinement session because the team is moving that fast. We want to find out why. For our customer, this turned out to be a case-by-case situation, but we want to give their engineers credit where it’s due, and help them make the process changes they need to avoid these anti-patterns in the future.
These are conversations worth having. Rather than sweeping this weirdness under the rug, we were able to go back to our customer and explain exactly what was happening. That allowed them to go to their development teams and start changing how they decide what work to do and when. Now we’re building analytics to tell Bloomfilter users when these same scenarios happen to them. Developers will get the credit they deserve for their work, team leaders can direct their teams more effectively, and technology leaders using Bloomfilter will have much more clarity on what’s really going on in their development processes.
Building software is hard. There are so many moving parts - people, programs, and policies interweaving in all sorts of ways, attempting to produce great products. There’s ambiguity and complexity at every step, no matter what. Even in a metric as seemingly simple as velocity, there are so many stories to tell, so much nuance under the covers. The most important thing is not to get hung up on the numbers themselves, but to keep communicating about “why” the numbers are what they are, and what story those numbers are telling about the health of the process.