The Bottleneck Moves

AI coding tools made generation fast. They didn't make software delivery easy — they moved the constraint.

by Alex Tanton

Six months into using AI coding tools seriously, teams start noticing something.

The generation is faster. That part worked. A senior engineer with Claude Code on a familiar codebase can produce in an hour what used to take a day. Velocity metrics are up. Sprint demos look good. Stakeholders are happy.

And then, quietly, something else happens. The bottleneck moves.

What the demos don’t show

The productivity story about AI coding tools focuses almost entirely on generation speed. That's the thing that's easy to measure: lines of code per hour, features per sprint, time from ticket to PR. These metrics go up. The demos and benchmarks are broadly accurate.

What’s harder to measure: what happens to that code over the next six months.

Whether the team can debug it when something breaks at 2am. Whether a new engineer can pick up the module and extend it without introducing three new bugs. Whether the assumptions the AI baked in — subtly, invisibly, because the prompt was a little vague — are still true in production when the data looks different from the test set.

Generation is the fast part of the software lifecycle. It’s also, in most engineering teams, not the bottleneck. The bottleneck is confidence: can we ship this, maintain this, extend this, own this?

AI tools are phenomenally good at making generation faster. They have essentially no effect on the confidence problem unless you build around them deliberately.

The constraint shifts

Here’s the pattern I see in teams six to twelve months into serious AI adoption:

When generation gets fast, integration becomes the constraint. PRs are larger, code is generated rather than authored, and reviewing AI output requires different eyes than reviewing hand-written code. Teams that kept their review process unchanged find that the review starts to slip. Not dramatically — just a little cursory. The generated code looks plausible. The tests pass. Nobody reads three layers deep.

When integration is managed, maintainability becomes the constraint. The code that shipped fast in Q3 is now generating subtle bugs in edge cases in Q4. The original engineers have moved on. The AI made architectural decisions that aren’t documented anywhere. The module works until the assumptions it was built on stop being true — and nobody on the current team can fully reconstruct what those assumptions were.

When velocity is up, on-call load shifts in ways that don’t show in sprint metrics. Incidents are still rare. But they’re harder to diagnose. The mean time to understand what broke increases even as the mean time to ship features decreases. That gap is where teams feel the friction without being able to name it.

The difficulty didn’t disappear. It redistributed.

The teams getting ahead of it

The teams I’ve seen navigate this well share a quality that I’d describe as disciplined absorption.

They’re not slowing down. They’re using the productivity gains from AI tooling as an opportunity to build better scaffolding — tighter review practices, more explicit documentation habits, process triggers that catch drift before it becomes an incident.

Concretely, that looks like:

Review for intent, not just implementation. When AI generates code, the review question isn’t just “does this look right” — it’s “do I understand every decision this made, and can I own it when it breaks?” That’s a higher bar, but it’s the bar that actually produces maintainable code.

Definition of done that includes comprehension. Not “tests pass” but “someone on this team can explain why this code works, what it assumes, and what would make it fail.” That someone doesn’t have to be the person who generated it. But it has to be someone.

Documentation that lives in the code, not outside it. AI agents forget context between sessions. The code needs to carry more of the why — not just what the function does, but what invariant it protects, what the edge case is, why this approach was chosen over the obvious alternative. That's always been good practice. It's more load-bearing now. A short sketch of what that can look like follows after this list.

Monitoring calibrated to the new failure modes. AI-generated code fails differently than authored code. The failures are often subtle — not crashes, but plausible-but-wrong behavior that passes validation. Monitoring and alerting need to be set up with that failure mode in mind, not just the failure modes you were used to.
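
To make the documentation point concrete, here is a minimal sketch of what a why-comment can look like. The billing scenario, the function name, and the invariant are all invented for illustration; the shape is what matters. The comment records what the code assumes, why this approach was chosen, and what would make it fail.

```python
def merge_usage_records(records):
    """Collapse duplicate usage records into one entry per (account, day).

    Why this approach: downstream billing assumes at most one entry per
    account per day (a made-up invariant for this example). Upstream events
    can arrive twice during retries, so we sum quantities rather than
    rejecting duplicates, which would drop legitimately retried events.

    What would make it fail: a producer that starts sending corrections as
    negative quantities; this function would silently net them out.
    """
    merged = {}
    for record in records:
        key = (record["account_id"], record["day"])
        merged[key] = merged.get(key, 0) + record["quantity"]
    return merged
```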
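
And for the monitoring point, a sketch of a plausibility check aimed at the quiet failure mode: output that looks fine but violates an assumption. The names and thresholds here are hypothetical; the idea is to alert when an assumption breaks rather than only when an exception is thrown.

```python
import logging

logger = logging.getLogger("billing.plausibility")

def check_invoice_plausibility(invoice_total, line_items):
    """Flag plausible-but-wrong output instead of waiting for a crash.

    The code that computes invoice_total may never throw; it can simply
    return a number that is quietly wrong. So we alert on violated
    assumptions, using thresholds made up for this sketch.
    """
    expected = sum(item["amount"] for item in line_items)
    if abs(invoice_total - expected) > 0.01:
        # Don't block the pipeline; surface the drift for a human to look at.
        logger.warning(
            "invoice total %.2f does not match line items %.2f",
            invoice_total, expected,
        )
    if not (0 <= invoice_total <= 1_000_000):
        logger.warning("invoice total %.2f outside expected range", invoice_total)
```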

The last mile of AI adoption

There’s a version of this story that treats the bottleneck shift as a problem with AI coding tools. I don’t think that’s right.

The tools are genuinely getting better. Generation quality is up. Context handling is better. The agents are smarter about asking clarifying questions rather than making decisions silently.

But no amount of better tooling eliminates the last mile. The last mile is always the part that’s hardest to automate — the integration with human judgment, organizational context, and institutional memory that makes code maintainable rather than just functional.

For AI coding tools, the last mile is everything that happens after generation: the review, the documentation, the test strategy, the on-call preparation, the review-process calibration. None of it is exciting. None of it shows up in the benchmarks.

All of it is what separates teams that ship AI code well from teams that just ship AI code fast.

The bottleneck moves. The work of moving it moves with it.
