
Our team has mostly moved to agent-driven development. Agents create tickets, write the code, open PRs, and post review comments. The output is way up. More PRs, more lines, more velocity.

But there are also more bugs. A lot of recent PRs are just fixing regressions from other agent-generated PRs. Or reverting changes that broke something adjacent.

I don’t think we’re unique here. Any team using agents heavily is probably hitting this.

The review burden

Here’s the part I didn’t expect: reviewing PRs as a human is more tiring now, not less.

Net productivity has increased. But lines of code per feature have increased too, and so have PR comments (agents tend to be very verbose). There’s so much more to read that real human review can no longer keep up.

The diffs are bigger. The context is harder to trace. I have to check whether the PR accidentally undoes something someone else shipped last week. To review properly, I need to understand the problem, check the history, trace the affected systems, verify the approach doesn’t break adjacent behavior.

At that point I’ve basically done the same work as if I’d taken the task myself. Except I didn’t choose the approach, so now I’m reverse-engineering someone else’s (or some agent’s) decisions.

And if I use an agent to review? Agent reviews flag a mix of real issues, edge cases that’ll never happen, and outright non-issues. When the developer tells the agent “address all review comments” without triaging, the follow-up commits sometimes introduce regressions from changes that didn’t need to happen.

The code gets messier with each round.

A shift in code review

For the last 20 years, PR review was where senior developers mentored juniors. You’d point out conventions, share domain knowledge, explain why a certain approach is better. The junior reads your comments, learns, writes better code next time.

It occurs to me that my PR comments are no longer being read by humans. An agent reads them, applies the fixes, moves on. The developer probably never sees them.

If I point out a better convention or a cleaner pattern, there’s really no point. The agent won’t remember it next time. And honestly, the agent knows the better approach.

So if my review comments are just a conversation with an agent that forgets everything by the next PR, what’s the point?

I think the point changes.

You’re not mentoring anymore.

Focus on architecture and boundaries

What’s still worth reviewing is the shape of the solution. Not the code itself.

Does this logic belong on the client or the server? Should this live in the API layer or the domain layer? Is this feature creeping into a service that shouldn’t own it? Should this be a separate module, a separate boundary entirely?

These are architectural questions. Agents don’t think about them. They solve the ticket in front of them, in the file in front of them. They won’t suggest “actually, this problem is better solved two layers up” or “this should be a server-side concern, not client-side.” They don’t reason about where responsibilities should live across a system.

So that’s what review becomes. You’re reviewing the solution, not the code. Not the style, not the comments, not the naming. Leave that to linters. There’s too much code now anyway, and honestly, agents read code better than we do.

The value of a human reviewer is pointing out: “This works, but it’s in the wrong place.”
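To make the “wrong place” critique concrete, here’s a deliberately simplified sketch. The names, the discount rule, and the layering are invented for illustration; the point is only the shape: the same logic, once written where the ticket pointed, and once moved to the layer that should own it.

```typescript
// Hypothetical example: a pricing rule that "works" on the client,
// but belongs in the server's domain layer.

// Client-side version: correct output, wrong place. Anyone can bypass
// it by calling the API directly, and every other client (mobile,
// batch jobs) has to duplicate the rule.
function clientSideTotal(price: number, isMember: boolean): number {
  return isMember ? price * 0.9 : price;
}

// Domain-layer version: the rule lives with the entity it governs,
// so all callers get the same behavior and it's enforced server-side.
class Order {
  constructor(readonly price: number, readonly isMember: boolean) {}

  total(): number {
    return this.isMember ? this.price * 0.9 : this.price;
  }
}
```

An agent solving the ticket “apply a member discount at checkout” will happily produce the first version, because that’s the file in front of it. Noticing that the rule belongs on `Order` is exactly the kind of review comment that still needs a human.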

Responsibility hasn’t changed

One thing that hasn’t shifted: whoever works on the task owns it.

The reviewer’s job is to point out how things could be done better. That’s it. The reviewer doesn’t take on responsibility for the outcome.

The developer who opens the PR is responsible. They have to test it. In fact, testing matters more now than before, because the developer didn’t write the code. An agent did. You can’t rely on the intuition of “I wrote this, I know it works.” You have to actually verify.

The developer has to own that baseline: does this work, does it break anything adjacent, is it actually correct. And when something breaks in production later, the developer picks it up and fixes it. Not the reviewer. Not the agent.

More agent code means more code. More code means more bugs. That invariant hasn’t changed.



@samwize

¯\_(ツ)_/¯
