I run an architect-implementer split for AI coding: Claude for design, GLM-5.1 for implementation. The architect dispatches tasks to the implementer via Temporal workflows. The problem I kept hitting: the implementer writes both the tests and the implementation. The tests always pass, because they validate what was built rather than what was intended. Circular validation.
The fix is that the architect writes the tests. The test file encodes the contract — what functions should exist, what they should return, what edge cases matter. The implementer’s job is to make red tests green. Instead of dispatching “add research mode to mtor with structured synthesis,” I write the test file first, then dispatch “make this test file pass.” No prose ambiguity.
Tests are judgment. Writing a good test requires understanding what the code should do — its contract with callers, its edge cases, its failure modes. That is architect work. Implementation is execution. Making tests pass is mechanical — read the test, understand the assertion, write code that satisfies it. That is implementer work. When the same agent writes both test and implementation, it optimises for internal consistency, not correctness. Splitting the roles breaks the loop.
Soft guidance drifts, so I added hard enforcement in three layers. A CLI gate at dispatch time refuses to send work without a test file reference. A coaching note prepended to every dispatch tells the implementer that the architect writes the tests and that it must not write new ones. A chaperone review after the build rejects diffs that add functions without corresponding tests.
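The three layers can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the actual tooling: the function names, the note wording, and the regex heuristic are all hypothetical, and the Temporal dispatch itself is left out.

```python
import re
from pathlib import Path
from typing import List, Optional

# Hypothetical wording; the actual coaching note is not quoted anywhere.
COACHING_NOTE = (
    "Tests are written by the architect. Do not add or edit tests; "
    "make the referenced test file pass."
)


def build_dispatch(task: str, test_file: Optional[str]) -> str:
    """Layers 1 and 2: refuse to dispatch without a test file, then prepend the note."""
    if not test_file or not Path(test_file).is_file():
        raise ValueError("refusing to dispatch: no existing test file referenced")
    return f"{COACHING_NOTE}\n\nTest file: {test_file}\nTask: {task}"


# Matches added diff lines such as "+def name(" or "+    async def name(",
# but not "+++ b/path" file headers.
ADDED_DEF = re.compile(r"^\+(?!\+\+)\s*(?:async\s+)?def\s+(\w+)\s*\(", re.MULTILINE)


def untested_functions(diff_text: str, test_source: str) -> List[str]:
    """Layer 3: functions added in the diff whose names never appear in the tests."""
    added = ADDED_DEF.findall(diff_text)
    return [
        name for name in added
        if not re.search(rf"\b{re.escape(name)}\b", test_source)
    ]
```

A chaperone built on this would reject the build whenever `untested_functions` returns a non-empty list. Matching on names is a crude heuristic: it flags private helpers too, and it cannot tell whether a test that mentions a function actually exercises it.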
This is not just about AI agents. It takes the test-first discipline of TDD in human teams and adds a separation of roles: the person who defines done should not be the person who declares done. Separating specification from implementation is as old as engineering itself. The AI version just makes it literal: the architect writes the test file, and the implementer runs pytest until it is green. The test file is the interface between judgment and execution.