Upgrading with an AI agent: Rector yes, the rest no

I let an AI agent upgrade a small project. The Rector part got done. Everything else was left untouched. An honest field report.

Upgrading with an AI agent: Rector first, then a pat on the back. That only becomes a problem when NPM, containers, and tests are simply left untouched. I tried it on a small project and had an AI agent do the upgrade. Everything was neatly described as spec-driven development: a clean problem statement, a clear target state, defined quality criteria. The result was illuminating, just not in the way I'd hoped.

I'm writing this because in recent weeks I've seen a lot of voices celebrating AI agents for framework upgrades as if they'd reinvented the wheel. My impression after trying it in practice: the agent is a good junior. A very fast junior. But just like with a real junior, the question isn't whether they can do something. It's what they overlook when left to themselves.

What the agent really did well

It delivered the Rector part. It picked the right rule sets, applied the PHP and TYPO3-specific transformations, adjusted namespaces, rewrote deprecated calls, and worked through the classes in a tidy commit rhythm. That's worth a lot. Anyone who's done a TYPO3 or PHP upgrade by hand knows how much grunt work sits in those steps. The agent does that part better today than I do on a tired Friday afternoon.

It also ran tests. It reacted to red output and fixed errors locally. The commits were readable, the branch structure clean. On the surface that looked very much like "done".

What the agent did not do

But the project wasn't just Rector. The project was a whole deployment, and that's exactly where things started to crumble. The agent pushed through the PHP-side upgrade cleanly, but:

The package.json stayed on a Node major version that was officially end of life. The NPM dependencies had known vulnerabilities, visible in any audit, that nobody had touched. The frontend build setup ran unchanged afterwards, with loaders that should have been reconfigured for the new TYPO3 version. The container base images still dragged their old PHP extensions along, and two config flags whose behaviour changed in the new version were carried over unchanged.
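The Node finding is the kind of thing a small script can flag before anyone reviews a diff. What follows is a minimal sketch, not what I ran in the project: it assumes the project declares its Node version in the `engines.node` field of package.json (many projects don't), and the function names and the sample version numbers are made up for illustration. The oldest major that still counts as supported is something you pass in yourself after checking the current Node release schedule.

```python
import json
import re
from typing import Optional


def declared_node_major(package_json_text: str) -> Optional[int]:
    """Return the first major version named in engines.node, or None if absent."""
    data = json.loads(package_json_text)
    constraint = (data.get("engines") or {}).get("node")
    if not constraint:
        return None
    # Take the first number in the constraint, e.g. ">=14.0.0" -> 14.
    match = re.search(r"\d+", str(constraint))
    return int(match.group()) if match else None


def node_finding(package_json_text: str, oldest_supported_major: int) -> str:
    """Describe whether the declared Node major is older than the supported floor.

    oldest_supported_major is an assumption supplied by the caller.
    """
    major = declared_node_major(package_json_text)
    if major is None:
        return "no engines.node declared: check the build image by hand"
    if major < oldest_supported_major:
        return (
            f"Node {major} is below the oldest supported major "
            f"{oldest_supported_major}"
        )
    return f"Node {major} is within the supported range"


if __name__ == "__main__":
    # Placeholder values, not the versions from the actual project.
    sample = '{"name": "demo", "engines": {"node": ">=14.0.0"}}'
    print(node_finding(sample, oldest_supported_major=20))
```

A check like this doesn't upgrade anything. It only makes the gap visible, which is exactly what was missing in my review.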

And finally: the test coverage it kept was the test coverage of the old project. The new APIs that came in with the upgrade were untested. The agent didn't suggest closing that gap on its own. It did what was in the spec: "upgrade PHP/TYPO3, keep tests green". It kept them green. In the old scope.

Why this isn't an AI problem but a spec problem

That's the sentence that became clearer to me again and again during the review. The agent didn't make any mistakes. It did what I told it to. That's not a new observation in software work. That's the same problem as Jira tickets you hand to external developers without context. Or tenders where the customer gets exactly what's written there but not what they meant.

The difference is: with humans, we've learned that good craft also includes questioning the spec. An experienced developer doesn't take a ticket at face value. They ask a clarifying question. They say: "You want to upgrade PHP, but then we also have to do Node, otherwise the frontend breaks." I didn't get that pushback from the agent. And I'm not sure I'll get it any time soon.

What I'm changing

I'll keep using AI agents for upgrades. I'd be a fool not to. The Rector part of a TYPO3 upgrade saves me a day that I can spend elsewhere. But I've radically expanded my spec. It's no longer just "PHP/TYPO3 to X.Y". It now explicitly lists:

- PHP and TYPO3 major upgrade including Rector sets
- NPM/Node major upgrade plus audit and breaking-change review
- Container base images including PHP extensions and system libraries
- CI pipeline adjustments and runner versions
- Configuration review against the changelogs of the new version
- Test coverage for new APIs with target coverage

That's more work writing the spec. But it's the work I'd have to do as a senior anyway. The agent saves me typing the changes, not the thinking.
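One way to keep that list from quietly shrinking back to "tests green" is to hold it as data and check a finished agent run against it. The following is only a sketch under my own assumptions: the short keys and the function names are invented for this example, and how you decide that an item counts as completed (a CI job, a review, a checkbox) is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass(frozen=True)
class SpecItem:
    key: str          # hypothetical short identifier, invented for this sketch
    description: str  # wording taken from the spec list above


UPGRADE_SPEC: Sequence[SpecItem] = (
    SpecItem("php_typo3", "PHP and TYPO3 major upgrade including Rector sets"),
    SpecItem("node_npm", "NPM/Node major upgrade plus audit and breaking-change review"),
    SpecItem("containers", "Container base images including PHP extensions and system libraries"),
    SpecItem("ci", "CI pipeline adjustments and runner versions"),
    SpecItem("config", "Configuration review against the changelogs of the new version"),
    SpecItem("tests", "Test coverage for new APIs with target coverage"),
)


def open_items(spec: Sequence[SpecItem], completed_keys: Iterable[str]) -> List[SpecItem]:
    """Return the spec items that have not been marked as completed, in spec order."""
    done = set(completed_keys)
    return [item for item in spec if item.key not in done]


def upgrade_is_done(spec: Sequence[SpecItem], completed_keys: Iterable[str]) -> bool:
    """The upgrade counts as done only when no spec item is left open."""
    return not open_items(spec, completed_keys)


if __name__ == "__main__":
    # Roughly the state my first agent run ended in: only the Rector part done.
    for item in open_items(UPGRADE_SPEC, ["php_typo3"]):
        print(f"OPEN  {item.key}: {item.description}")
```

The point of the design is that "done" is derived from the whole list, so a green test run alone can never flip it to true.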

Anyone currently experimenting with AI agents and writing "Wow, this works!" on LinkedIn, please look again in four weeks. Check whether the site still runs. Whether the build still works. Whether the security reports in the monitoring have got quieter, or the customer just hasn't noticed yet. An upgrade is a state, not a commit. And that state doesn't end at Rector.

Questions I often hear about this

A few things readers regularly ask me on this topic.

Which part stays human regardless?

The thinking. I still decide what I want to upgrade, why, with which target state, and which compromises I accept. The agent handles the typing and the grunt work, I handle the direction.

What's the biggest misconception about AI-driven upgrades?

That "tests green" equals "upgrade done". An upgrade is more than the code: containers, frontend, CI, config, security reports. If that's missing from the spec, the agent won't do it.

Would you let an agent loose on a larger client project?

Only under review. I let the agent produce the diffs, but I review them like I would for a new team member. That eats time, but it's the insurance that nothing breaks quietly after the release.

Is spec-driven development overkill for AI-driven upgrades?

No, I'd say the opposite. Without a clear spec the agent runs in directions you'll have to claw back later. A cleanly written spec is the one hour that saves the day.

Which agent did you actually use?

I currently use several, depending on the project. I deliberately don't put the name in here, because the tools overtake one another every few months. The more important question for me is how I phrase the spec — the agent is interchangeable, the spec isn't.

I'll take a look at your spec.

If you'd like to take this deeper

I advise individual IT leaders under OnlyOle — 1:1, no agency overhead. If that sounds relevant to you, we'll talk about your situation directly.

Go to OnlyOle