Test-first dispatch for AI coding agents
/ 2 min read
I run an architect-implementer split for AI coding: Claude (expensive, good judgment) designs, GLM-5.1 (free, unlimited tokens) implements. The architect dispatches tasks to the implementer via Temporal workflows.
The problem I kept hitting: the implementer writes both the tests AND the implementation. The tests always pass — because they validate what was built, not what was intended. Circular validation.
The fix: the architect writes the tests
# Before (prose spec → ribosome writes everything)mtor "Add research mode to mtor. Research tasks search outside the codebase and return structured synthesis..."
# After (test file IS the spec)# CC writes assays/test_research_mode.pymtor "Make assays/test_research_mode.py pass."The test file encodes the contract — what functions should exist, what they should return, what edge cases matter. The implementer’s job is to make red tests green. No prose ambiguity.
Why this works
Tests are judgment. Writing a good test requires understanding what the code should do — its contract with callers, its edge cases, its failure modes. That’s architect work.
Implementation is execution. Making tests pass is mechanical. Read the test, understand the assertion, write code that satisfies it. That’s implementer work.
Circular validation disappears. When the same agent writes both test and implementation, it optimizes for internal consistency, not correctness. Splitting the roles breaks the loop.
Enforcement
Soft guidance drifts. I added a hard gate in the dispatch CLI:
if mode == "build" and not re.search(r"test_\w+\.py|assays/", prompt): sys.exit(_err( cmd, "Build tasks require a test file.", "NO_TEST_FILE", 'Write tests first, then: mtor "Make assays/test_feature.py pass."', ))The dispatcher refuses to send work without a test file reference. The coaching layer (prepended to every dispatch) tells the implementer: “Tests are written by the architect, not you. If dispatched without a test file, run existing tests only — do NOT write new tests yourself.”
Three enforcement layers:
- CLI gate (pre-dispatch): refuses without test reference
- Coaching (in-flight): tells implementer not to write its own tests
- Chaperone review (post-build): rejects diffs that add functions without corresponding tests
The deeper insight
This isn’t just about AI agents. It’s the same principle behind TDD in human teams: the person who defines “done” shouldn’t be the person who declares “done.” Separating specification from implementation is as old as engineering itself.
The AI version just makes it literal: the architect writes test_feature.py, the implementer runs pytest until it’s green. The test file IS the interface between judgment and execution.