# general
j
I’m curious, who actually measures/understands DORA metrics? https://thenewstack.io/google-says-you-might-be-doing-dora-metrics-wrong/
t
Now that you bring it up, I'm really confused about the "change failure rate". The best teams have less than 5% and the worst more than 64%. But the best teams deploy many times every day, and the worst teams deploy less than once per month. In my previous job, we released 50 times per day. If we had a change failure rate of 5%, we would have 2.5 incidents per day!!! If you release once per month and have a 50% change failure rate, you would have an incident every second month or so. Sounds like the worst teams have a much more stable environment. I don't understand this number. In our case, we had a change failure rate below 0.1%. What am I missing?
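(A quick back-of-the-envelope sketch of the arithmetic in the question above. The deploy counts and failure rates are the ones quoted in this thread, not official DORA benchmarks; change failure rate is treated simply as the share of deployments needing immediate remediation.)

```python
# Change failure rate (CFR) is the share of deployments that need immediate
# remediation, so the absolute number of failed changes is deploys * CFR.
# All figures below are the ones quoted in the thread, used for illustration.

def failed_changes(deploys_per_period: float, change_failure_rate: float) -> float:
    """Expected number of deployments needing immediate attention per period."""
    return deploys_per_period * change_failure_rate

# 50 deploys/day at the 5% "best teams" CFR:
print(failed_changes(50, 0.05))      # 2.5 failed changes per day

# One deploy/month at a 50% CFR:
print(failed_changes(1, 0.50))       # 0.5 per month, i.e. one every second month

# 50 deploys/day at a 0.1% CFR (the rate described above):
print(failed_changes(50, 0.001))     # 0.05 per day, i.e. roughly one per month of working days
```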
j
@Nathen Harvey i’m gonna let you take that one (because math)
t
less than 0.1% is also less than 5%, which still puts you in the "best teams" category. Also, if you deploy 50x/day, you will have a quick fix coming vs it taking weeks or months to fix. And if you deploy 50x/day, you probably have things like blue-green deployments, rollbacks, etc. Or you would if you had 2.5 incidents/day.
t
True. But my comment still stands. Why do the best teams have a change failure rate of <5% (it used to be <10%)? Anyone deploying daily (the best teams) with more than 1% has more incidents in absolute terms than the worst teams. I've been advocating for the DORA metrics many times at user groups and conferences, and this number is just plain weird, and not convincing at all.
j
part of the change is that they were using groupings of numbers before, like 1 to 15% and 16 to 30%; now they are using a sliding bar where any whole number from 1 to 100 can be chosen, so it's more accurate
t
I get that, but why not change the slider then (our number was 0.07%)? Or is it actually the case that teams that release frequently have way more incidents? If that is the case, why is that not described in the DORA report and Accelerate? I believe uptime was added as the fifth metric a few years back. That would IMO be a much better number. If teams with high throughput had higher uptime (e.g., 99.99%) and teams with low throughput had lower uptime (e.g., 99.5%)... then it would be convincing.
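(To make that uptime comparison more tangible, here is a small sketch converting the example figures above into annual downtime. The percentages are the illustrative numbers from the message, not DORA data.)

```python
# Convert the illustrative uptime percentages from the message above
# into hours of downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

for uptime in (0.9999, 0.995):
    downtime_minutes = (1 - uptime) * MINUTES_PER_YEAR
    print(f"{uptime:.2%} uptime -> {downtime_minutes / 60:.1f} hours of downtime per year")

# 99.99% -> ~0.9 hours (about 53 minutes) of downtime per year
# 99.50% -> ~43.8 hours of downtime per year
```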
t
DORA also says not to fixate on industry numbers and to compare your own organization against itself over time.
j
And benchmark against yourself.
r
@Jennifer Riggins i actually asked that in a survey i'm about to publish...
j
Do let me know @Roni Floman. Well, let us all know!
t
I'm honestly trying to understand why a 5% or 10% change failure rate for companies that deploy multiple times per day is not unacceptably high. It literally means that you would have incidents daily. Is that the case?
Benchmark against yourself.
If that is the advice, why does the State of DevOps report share this matrix every year? I understand that if we improve over time it's better. DORA is supposed to be a scientific study, and when I ask questions, I feel you are just saying that I should not look at the numbers. If this were 5% vs. 10% I would understand, but I'm talking about two orders of magnitude: 10% vs. 0.07%. It's like code coverage: 80% is good, 90% is better. But can we agree that 30% is not enough? Can we also agree that multiple daily incidents are not good? And do we agree that that would be the case if you have a change failure rate of, say, 10% and you deploy 50 times per day?
j
@Thomas Jespersen if those changes are bets, released with progressive delivery where you can use feature flags and automatic rollbacks and observe how the systems are handling the changes, is that a bad thing?
t
image.png
n
👋 thanks for the discussion here. a couple of quick thoughts …

It's important to look at all four metrics together; they serve as a proxy for the batch size of changes, and we find that smaller batches generally lead to better delivery performance. When looking at a set of changes, one where 5% require immediate attention (e.g., hotfix, rollback, roll forward) is more stable than one where 50% require that immediate attention. The time to recover from a failed deployment is less than one hour for the elite group and 1-6 months for the low group. Deploying an application 50 times in one day and providing immediate human intervention post-deployment 5 times a day may or may not be unacceptably high; context really matters.

The DORA report includes these performance levels each year, in part, because we want to provide a snapshot of the current state of software delivery. The levels are not set in advance. Rather, they are a reflection of the data we've collected from survey respondents.

Availability was added in 2018 and renamed to "reliability" in 2021 to better reflect operational performance. It was always meant as a measure of the "ability for technology teams and organizations to make and keep promises and assertions about the software product or service they are operating." Uptime may be a component of that availability.
I’d also recommend https://dora.community as a good place to continue these types of discussions
t
OK. It seems that nobody wants to answer my truly honest question: do elite teams truly have multiple daily deployments that require immediate attention? I don't understand how a 5% change failure rate is acceptable if it means you have daily incidents. Again, I'm not a noob trying to understand DevOps. I was CTO for 80 engineers, and we did 50 daily deployments and had a change failure rate of 0.07%, which is two orders of magnitude better than elite teams according to DORA. I know we had a good setup, and I've talked about it at conferences and user groups many times. Here is a 1-hour talk I did a year ago that shows our SDLC and where I talk for 10 minutes about DORA: https://www.youtube.com/live/3D04VfzX-oM?si=1g75W7BvxdIopwJa&t=18008. Back then, we did continuous delivery (manual approval step), but we changed to continuous deployment (push on green), which both improved our deployment frequency and lowered our change failure rate significantly.
n
Sorry, let me try to answer your specific question… There is a cluster of participants in the 2023 survey who report that it takes less than a day for a change to go from committed to production and they’re pushing changes to production on demand. Those same participants report needing to intervene after a deployment 5% of the time and an ability to recover a failed deployment in less than an hour. This cluster represents about 18% of the survey respondents and is called the “elite” cluster for software delivery performance. Those measures (on demand, less than one day, 5%, and less than one hour) represent the mean of each measure within that cluster. Your experience seems to reinforce the findings: speed and stability of software changes are not in opposition, they move together. Your changes are likely either fast and stable or slow and unstable. Thanks for sharing the video and for giving the talk!
Do elite teams truly have multiple daily deployments that require immediate attention? The data can’t tell us this specifically. The deployment frequency of the elite group is “On demand (multiple deploys per day)”, but that could mean 2 deploys per day. There might be 60 deploys in a month where 3 require immediate attention. It could also mean that an application is deployed 50 times a day where 2-3 require immediate attention.
And, I think, our collective lived experience is that if you’re deploying the same application 50 times a day and intervening 2-3 times a day, you might be at risk of negatively impacting the well-being of folks on that team in such a way that you’d prioritize changing something.
Likewise, our expectation, and your lived experience, is that if you’re able to deploy an application 50 times a day, you’re likely only intervening 1-2 times a month.
We also expect that if you’re deploying the same app 50 times a day without meeting the reliability expectations of your users, that other outcomes (organizational performance, well-being, etc.) are likely to suffer. The feature-driven team in the 2023 report has good software delivery performance but some of the worst org performance, team performance, and job satisfaction.
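(A small sketch to make the scenarios above concrete. The deploy and intervention counts are the ones mentioned in this thread; the ~21 working days per month is an assumption added here for illustration, not a DORA figure.)

```python
# The same change failure rate (CFR) implies very different absolute workloads
# depending on deploy frequency.

WORKING_DAYS_PER_MONTH = 21  # assumption for illustration, not from the thread

def implied_cfr(deploys: float, interventions: float) -> float:
    """Change failure rate implied by deploy and intervention counts."""
    return interventions / deploys

# "60 deploys in a month where 3 require immediate attention":
print(implied_cfr(60, 3))                      # 0.05 -> 5% CFR, ~3 interventions/month

# "deployed 50 times a day where 2-3 require immediate attention":
monthly_deploys = 50 * WORKING_DAYS_PER_MONTH  # ~1,050 deploys/month
print(implied_cfr(monthly_deploys, 2.5 * WORKING_DAYS_PER_MONTH))  # still 5%, but ~50 interventions/month

# "deploy an application 50 times a day ... intervening 1-2 times a month":
print(implied_cfr(monthly_deploys, 1.5))       # ~0.0014 -> ~0.14% CFR
```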
t
There might be 60 deploys in a month where 3 require immediate attention.
Hmm... that makes more sense. Maybe my misconception was that elite teams would deploy even more frequently. With 50 deployments per day, I guess that made us an outlier even within the elite cluster. One of my ambitions as CTO was to do one deployment per engineer per day. We were very close, especially after changing to continuous deployment (push on green). It made for a tremendously safe work environment, where nobody feared making changes to production. We configured our pipelines to pause deployments after 3 PM, which meant that we only had a couple of "incidents" outside business hours per year, and in 10 years, I don't recall an incident during the night.
j
I have to ask @Thomas Jespersen: what org do you work at, and are you interested in being interviewed about developer productivity engineering there? 😉
t
I stopped last month after 8 years, so I cannot talk as freely anymore. If you watch the video I shared, you will see how open I was. I showed our real DevOps setup and simulated a real incident on stage.
n
@Thomas Jespersen were all of those engineers working on the same application or service?
I would say that regularly shipping the same application to production 50 times per day is quite an achievement. And likely one that removed software delivery as a constraint for many of your organizational goals.
t
were all of those engineers working on the same application or service?
Kind of. We had 12 teams (8 product teams and 4 small platform teams). Each product team owned 1-2 self-contained systems (big microservices, including micro frontends, that could be deployed completely in isolation). In total, we had a dozen self-contained systems, and each of these was deployed a handful of times per day. For the customers, it was one product.
n
Got it. So 50 deploys/day is really for the organization. DORA’s software delivery performance survey questions are at the service level: “For the primary application or service you work on…” I wonder if the folks working on those product teams would say they deploy 50 or 5 times a day if the survey offered that level of precision in the answers. Either way, those teams were realizing elite software delivery performance. When that’s no longer the constraint, where do other bottlenecks show up for the team? For the organization?
t
I'm not sure what they would answer, as most did not consider a self-contained system a product or service. But let's say they answer 5 times per day, which is roughly 1,800 deploys per year. They would also say that they made releases that needed immediate attention only a handful of times per year. For some teams, it may have been a dozen times per year; for others, zero. That is still one or two orders of magnitude lower than the 5% for elite teams. BUT I realize now that we likely made significantly smaller deployments than an average elite team. The smaller the better.
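(Making that last estimate concrete with the per-team numbers given above: roughly 1,800 deploys per year per team. The "5 per year" case is an illustrative stand-in for "a handful", not a number from the thread.)

```python
# Implied change failure rate for a team deploying ~5 times/day (~1,800/year).
# The 5 failed changes/year case is an illustrative stand-in for "a handful".
deploys_per_year = 1800

for failed_per_year in (12, 5, 0):
    cfr = failed_per_year / deploys_per_year
    print(f"{failed_per_year} failed changes/year -> {cfr:.2%} change failure rate")

# 12/year -> 0.67%  (about an order of magnitude below the 5% elite mean)
#  5/year -> 0.28%
#  0/year -> 0.00%
```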