The guide to developer productivity: Part 1

Rebecca Dodd
Freelance writer, The Basement Office

Aug 29, 2024

The topic of developer productivity is often fraught. Developers can feel pressured to hit arbitrary commit targets or story points that bear little relationship to real customer value. Engineering managers who try to introduce productivity measures are often met with skepticism or even resentment, making it feel like developers just don’t want to be held accountable. In reality, developers often care just as much as leadership about delivering value; it’s just that no one is aligned on how to measure progress towards it.

In this two-part series, we explore a number of ways to approach this thorny topic: how to measure productivity holistically, how to root out and address bottlenecks in your team, and how to roll out meaningful changes without ruffling everyone’s feathers.

Can developer productivity be quantified?

How you define developer productivity is closely tied to how you measure it.

Lines of code, number of commits or features shipped—these are quantitative activity metrics: easier to measure, but not meaningful on their own. How many lines of code does it take to deliver a great customer experience? How many commits is reasonable for a productive developer to produce each day?

A lot of airtime has been given to the flaws in trying to distill productivity down to pure metrics (see the resounding backlash to McKinsey), so let’s consider the other side of the coin:

  • How much time are developers able to spend on truly differentiated work that contributes to solving business problems?
  • Do developers have adequate opportunity to get into (and remain in) flow, avoiding constant context switching?
  • Are developers shipping the right things within the right time frame to meet business needs?

These questions are a little harder to answer with activity metrics, but should get you closer to measuring the type of productivity that actually impacts business goals.

We’ll explore some ways to measure productivity from both angles, but first it’s helpful to identify metrics that aren’t meaningful in isolation. Tracking these metrics can still be valuable, but it’s important to put them into context.

How not to measure engineering productivity

Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure”

Are activity metrics ever useful in measuring engineering productivity? Yes and no.

Tracking employees’ hours in the office every week won’t tell you much about whether they’re moving important projects forward in that time. By the same token, measuring the number of commits, lines of code, or other one-dimensional activity metrics in isolation is unlikely to give you great insight into whether or not your teams are being productive in ways that actually impact your company. A lot of programming and architectural work requires deep thought and reflection (and in many cases, happens away from the computer).

“Some of the hardest and most impactful engineering work will be all but invisible on any set of individual metrics. You want people to trust that their manager will have their backs and value their contributions appropriately at review time, if they simply act in the team’s best interest. You do not want them to waste time gaming the metrics or courting personal political favor.”—Charity Majors, Can Engineering Productivity Be Measured?

However, quantitative activity metrics can be a directional signal that points you towards where to dig in to uncover possible inefficiencies or bottlenecks.

“I always say that a lack of commits for engineers is something that poses a question rather than giving you any answers,” says Emily Field, Engineering Manager at Retool. “If your team is not contributing code, you have to investigate whether there’s an issue with the development environment, or if there’s a lack of clarity in terms of expectations, a lack of training, lack of a skill set match to what you’re working on—there are a lot of possible reasons.”

Team productivity > individual productivity

In the webinar How to thoughtfully measure engineering productivity—beyond lines of code, the panel agreed that engineering productivity metrics and dashboards should be used to track team performance, not individuals’.

Even in cases where you intentionally choose metrics as a target instead of just using them as signal, you don’t want to create an environment where:

  • Psychological safety is damaged because team members fear retribution for missing targets (see “The Anxiety Zone” in Amy Edmondson’s matrix), or
  • Individual developers’ performance or eligibility for promotion hinges on hitting a target, creating a perverse incentive. Nora Jones, Founder and CEO of Jeli, shared an example of an anti-pattern from a previous employer:

    “People had to create large promotion packets in order to get their employees promoted. The org got so big that leadership would go through those packets really quickly, just scanning for metrics. What ended up happening is that a bunch of people started submitting code at the same time a week before promotion packets were due, because they were trying to adhere to the OKRs they had set out for themselves to get promoted. But all that code being shipped ended up leading to a number of incidents during that time … where they were trying to improve productivity, they actually ended up hurting themselves.”

How engineering managers are actually tracking productivity

With those words of caution out of the way, let’s explore what actually useful productivity metrics could look like. With dozens, even hundreds of possible ways to measure productivity, the data you track and how you slice it should be based on what your company is optimizing for. Here are some of the common ways companies are gathering data on productivity:

DORA metrics

In their book Accelerate, the DevOps Research and Assessment (DORA) researchers Dr. Nicole Forsgren, Jez Humble, and Gene Kim share four key metrics tracked by high-performing teams:

  • Deployment frequency
  • Lead time for changes
  • Mean time to recovery (MTTR)
  • Change failure rate

The first two track development velocity, whereas the latter two are indicators of stability.
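
To make these concrete, here is a minimal sketch (in Python, with made-up event records) of how the four metrics could be computed once you have deployment and incident timestamps. The data shapes, the seven-day window, and the choice to start lead time at commit authorship are illustrative assumptions; your CI/CD system and incident tracker will dictate the real sources and definitions.

```python
from datetime import datetime
from statistics import mean

# Illustrative event records; in practice these come from your CI/CD
# system and incident tracker.
deployments = [
    # (deployed_at, change_committed_at, caused_failure)
    (datetime(2024, 8, 1, 10), datetime(2024, 7, 31, 15), False),
    (datetime(2024, 8, 2, 9),  datetime(2024, 8, 1, 11),  True),
    (datetime(2024, 8, 5, 14), datetime(2024, 8, 5, 9),   False),
]
incidents = [
    # (started_at, resolved_at)
    (datetime(2024, 8, 2, 10), datetime(2024, 8, 2, 13)),
]
period_days = 7  # measurement window (an assumption)

# Deployment frequency: deployments per day over the window
deployment_frequency = len(deployments) / period_days

# Lead time for changes: hours from commit to deploy, averaged
lead_time_hours = mean(
    (deployed - committed).total_seconds() / 3600
    for deployed, committed, _ in deployments
)

# Mean time to recovery: hours from incident start to resolution
mttr_hours = mean(
    (resolved - started).total_seconds() / 3600
    for started, resolved in incidents
)

# Change failure rate: share of deployments that caused a failure
change_failure_rate = sum(failed for _, _, failed in deployments) / len(deployments)

print(f"Deployments/day:     {deployment_frequency:.2f}")
print(f"Lead time (hours):   {lead_time_hours:.1f}")
print(f"MTTR (hours):        {mttr_hours:.1f}")
print(f"Change failure rate: {change_failure_rate:.0%}")
```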

The SPACE framework

SPACE “provides a way to think rationally about productivity in a much bigger space and to choose metrics carefully in a way that reveals not only what those metrics mean, but also what their limitations are if used alone or in the wrong context.” Instead of explicitly laying out which metrics to track, SPACE proposes five dimensions of developer productivity:

  • Satisfaction and well-being: Employee satisfaction, burnout
  • Performance: Reliability of code, customer satisfaction, adoption and retention
  • Activity: Count of commits, PRs, code reviews, CI builds, incidents
  • Communication and collaboration: Documentation, time to onboard new team members, quality of reviews
  • Efficiency and flow: Interruptions vs focus time, number of handoffs in a process

For each dimension, there are suggestions of possible metrics and how to gather them, but the paper’s authors recommend that teams capture several metrics across at least three of the dimensions, combined with qualitative data (such as surveys).
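
As a sketch of what that recommendation might look like in practice, here is a hypothetical selection spanning three dimensions plus a survey signal. The specific metrics and their descriptions are illustrative choices, not prescriptions from the framework itself.

```python
# Hypothetical metric selection spanning three SPACE dimensions, paired
# with qualitative survey data as the paper recommends. The metrics
# themselves are examples, not part of the framework.
space_metrics = {
    "satisfaction_and_wellbeing": {
        "quarterly_survey_score": "self-reported satisfaction, 1-5 scale",
    },
    "activity": {
        "prs_merged_per_week": "team-level count, used as a signal only",
    },
    "efficiency_and_flow": {
        "weekly_focus_hours": "contiguous 2h+ blocks without meetings",
        "pr_review_turnaround": "hours from PR opened to first review",
    },
}
```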

Qualitative productivity data

Having 1:1 conversations with your reports and running surveys can give you insight into the bottlenecks (real and perceived). In both cases though, the data you uncover is a clue rather than an answer in itself, and you will likely need a blend of quantitative and qualitative data points to unearth your blockers.

“During a 1:1 an engineer expresses frustration at waiting a long time for a code review. Later on, another engineer tells you code review has been challenging because pull requests are too big … Metrics can supplement these conversations and paint a more comprehensive picture. Adding objective data points allows you and your team to better grasp the current state of your processes and have discussions that go beyond gut feel.” —Ale Paredes, How to leverage metrics to manage remotely

Aggregate metrics

Aggregate metrics can be a useful way to identify trends or bottlenecks, without getting into the weeds of individuals’ performance. These are relatively easy to gather from existing data:

  • How many contiguous blocks of focus time vs time fragmented by meetings do your developers have access to each week? (See the sketch after this list.)
  • How many Slack messages are people sending, and are they in public or private channels vs DMs?
  • How long does it take for a team member to get a response when they ask for help?
  • How are incidents distributed among your team and/or across time zones?
  • How long does it take for a PR to be reviewed?
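
The first question above, for example, can be answered from calendar data alone. Here is a minimal sketch, assuming a simple export of meeting start and end times and a (debatable) two-hour definition of a focus block:

```python
from datetime import datetime, timedelta

# Hypothetical calendar export: (start, end) of each meeting in one workday.
meetings = [
    (datetime(2024, 8, 26, 9, 30), datetime(2024, 8, 26, 10, 0)),
    (datetime(2024, 8, 26, 13, 0), datetime(2024, 8, 26, 14, 0)),
]

workday_start = datetime(2024, 8, 26, 9, 0)
workday_end = datetime(2024, 8, 26, 17, 0)
focus_block = timedelta(hours=2)  # what counts as "focus time" is an assumption

def focus_blocks(meetings, day_start, day_end, minimum):
    """Count gaps between meetings that are long enough for focused work."""
    blocks = 0
    cursor = day_start
    for start, end in sorted(meetings):
        if start - cursor >= minimum:
            blocks += 1
        cursor = max(cursor, end)
    if day_end - cursor >= minimum:
        blocks += 1
    return blocks

print(focus_blocks(meetings, workday_start, workday_end, focus_block))
# -> 2 (10:00-13:00 and 14:00-17:00); the 9:00-9:30 gap is too short to count
```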

The Pragmatic Engineer also explores 17 companies’ developer productivity metrics, which you can read for more inspiration.

It’s easy to get overwhelmed by all the possible measures you could be paying attention to—but keep in mind that tracking too many indicators is likely to do more harm than good. Every decision you make will have trade-offs, and if it’s unclear what the topmost priority is, you could make an argument in favor of anything. Start by identifying a handful of metrics that are most relevant to what your organization is optimizing for.

Use gamification to your advantage

There’s also nothing inherently wrong with developers gaming the system to hit target metrics, as long as those metrics align with your organization’s goals.

“I’m convinced that most devs are intrinsically bent toward gamification and that we need to approach our methodology with that in mind. … Constantly trying to improve and build efficiencies is a good thing. Trying to move vanity metrics is not. Good leaders will encourage the former and discourage the latter… ”—Subterranean Alien on r/programming

Chosen unwisely, targets and incentives can create undesirable side effects, as seen in the example above, in which engineers’ promotion efforts resulted in an increase in incidents. Goals Gone Wild describes a number of well-intentioned goals that backfired for leaders who didn’t think through how they could be gamed: Sears’ hourly sales targets led to overcharging customers and unnecessary work, and Ford developed a small, fuel-efficient car with a major flaw—it could ignite on impact. These cautionary tales serve as a reminder to think through how your metrics might be gamed and whether those second-order effects are aligned with (or contradict) the overarching goal.

You get what you measure when it comes to developer productivity

Code review is a common blocker to shipping code. If pull requests are growing stale and you want to encourage your team to unblock their teammates, you can set a target of “X% of pull requests are reviewed within Y hours.” The metric is aligned with your goal, and targeting a percentage rather than “all” leaves some realistic room for outliers.
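
As an illustration, here is a rough sketch of how that percentage could be computed from GitHub’s REST API. The repository name, token variable, and 24-hour threshold are placeholders, and pagination, draft PRs, and bot reviews are ignored for brevity:

```python
import os
from datetime import datetime, timedelta

import requests  # third-party HTTP client

OWNER, REPO = "your-org", "your-repo"  # placeholders
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
TARGET = timedelta(hours=24)  # the "Y hours" in your target

def parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

# Recently closed pull requests for the repo
prs = requests.get(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    params={"state": "closed", "per_page": 50},
    headers=HEADERS,
).json()

reviewed = within_target = 0
for pr in prs:
    reviews = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{pr['number']}/reviews",
        headers=HEADERS,
    ).json()
    submitted = [r["submitted_at"] for r in reviews if r.get("submitted_at")]
    if not submitted:
        continue  # never reviewed; worth tracking as its own number
    first_review = min(parse(ts) for ts in submitted)
    reviewed += 1
    within_target += first_review - parse(pr["created_at"]) <= TARGET

if reviewed:
    print(f"{within_target / reviewed:.0%} of reviewed PRs got a first review within 24h")
```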

If you want to encourage a culture of iteration and shipping more often, targeting an increase in the number of pull requests may result in engineers ‘gaming’ the system by breaking down PRs into even smaller diffs than usual. In this case, that is the behavior you want to encourage, so everyone wins.

Tracking a blend of qualitative and aggregate quantitative data takes some of the perceived threat out of developer productivity metrics. Monitoring aggregate metrics can point you towards bottlenecks in your system, while asking your developers about their experiences helps them to feel heard and exposes inefficiencies that aren’t always visible in pure activity metrics.

Want more on boosting developer productivity? Stay tuned for part two of this series, where we go deeper into finding and addressing bottlenecks, and ways to introduce improvements effectively.

Rebecca Dodd
Freelance writer, The Basement Office
Rebecca is a freelance writer specializing in content for developers and their coworkers.