The architect-implementer split: why your expensive model shouldn't write code

7 Apr 2026 · 4 min read ·

The most expensive thing your AI coding agent does is think about what to build. The actual file creation is commodity work. This is obvious once you see it, but most agent setups ignore it. They run Claude Opus or GPT-4o for everything — planning the architecture and writing the boilerplate and running the tests. That is like paying a senior architect to lay bricks.

The industry is converging on a split. An expensive, high-quality model reads the codebase, designs the solution, and produces code snippets. It never writes diffs directly. A cheap, fast model takes those snippets, places them in the right files, runs tests, and commits. It never reasons about architecture. Aider shipped this as architect/editor mode in September 2024 — Opus reasons, GPT-4o-mini applies edits. Goose has a lead/worker split where the expensive model handles the first N turns then hands off. Traycer runs Sonnet for planning and Grok-fast as parallel scouts. OpenDev defines five model roles with different capabilities.

The economic logic is simple. GLM-5.1, DeepSeek, and Qwen-family models are free or nearly free. Claude Opus is fifteen dollars per million output tokens on API, or rate-limited on a subscription. If eighty percent of your agent’s work is mechanical — and it is — you are burning expensive tokens on cheap work.

Every implementation I have found ships one or two pieces of this pattern. Nobody ships all five: model split where the architect plans and the implementer builds, task queue dispatch with durable execution and retry, coaching injection where structured feedback from reviewers is prepended to every dispatch, signal-based stall detection that catches monologue and built-nothing rather than relying on timeouts, and worker reflection where the implementer reports what surprised it after each task, feeding back into coaching.

Aider has the model split but not the rest. Goose has the split and rudimentary stall detection. OpenHands has excellent stall detection but is not designed for model-split dispatch. The combination matters because these features compound. Coaching makes the cheap model better over time. Reflection surfaces patterns that coaching misses. Stall detection prevents the cheap model from burning hours on a dead end. The task queue makes the whole thing asynchronous — dispatch work before bed, review results in the morning.

Once you have exact code in a spec, you reach the deeper insight: why send it through a language model at all? I learned this the hard way. I wrote a 38KB spec with exact Python code blocks and dispatched it to GLM-5.1. It echoed the spec back as output text. Zero files created. The model treated my instructions as a document to summarise. The fix is not a better prompt, though “You are an executor, create the files” helps. The fix is removing the model from the loop entirely for deterministic work. A manifest format where the executor creates files via bash at zero tokens, runs tests at zero tokens, and only invokes the cheap model if tests fail. Happy path cost: zero LLM tokens for file creation. This is the Aider insight taken one step further. Aider separates reasoning from editing. Manifest mode separates the deterministic from the probabilistic. If you can write the code block, you have already done the hard part. The model’s job is not to copy your code — it is to fix what you got wrong.

On my setup: Claude Opus for judgment, GLM-5.1 via ZhiPu for implementation at zero cost, Temporal for durable dispatch to a remote worker. The expensive model never touches implementation. The cheap model never touches architecture. A typical overnight batch: four to six tasks dispatched before bed, results reviewed in the morning. The architect spent maybe two thousand Opus tokens writing specs. The implementer spent unlimited free tokens executing them. Total Opus cost for a night of coding: less than a single chat message. The constraint is not the model — it is the spec quality. A prose description wastes the implementer’s turns on interpretation. Exact code blocks get copied faithfully. The architect’s job is to produce snippets sharp enough that the implementer is just a file-creation machine.

I am building this as mtor — named after the kinase that regulates when cells start translating mRNA into protein. Because that is what this is: regulating when AI agents start translating specs into code.