Autonomy Starts at the Check


Most conversations about agent autonomy start in the wrong place. They ask how capable the model is, how long the context window is, how many tools the agent can call, or how much freedom the user is willing to grant. Those questions matter, but they are not the first boundary. The first boundary is whether the system can tell when the agent is right.

An agent is not autonomous because it can try a task. It is autonomous when the surrounding system can decide whether the attempt worked. Without that check, the agent is not operating independently. It is producing work that a human still has to interpret and inspect, and whose risk the human still has to absorb. The verification work has not disappeared. It has just been moved onto the reviewer.

This makes task routing much clearer. If the objective is clear and the result is machine-checkable, the agent can run. If the objective is clear but only a human can judge the result, the next useful move is usually to build the check. If signals exist but the objective is fuzzy, the work is to sharpen the target until those signals mean something. If neither the objective nor the verification surface is clear, calling the task autonomous is just optimism with better tooling.
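The four cases above can be written down as a small decision function. This is only a sketch: the names `Route` and `route_task` and the two boolean inputs are illustrative assumptions, and in practice deciding whether an objective is "clear" or a result is "machine-checkable" is itself a judgment call.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical labels for the four routing outcomes described above."""
    RUN_AGENT = "run the agent"
    BUILD_CHECK = "build the check first"
    SHARPEN_TARGET = "sharpen the target first"
    NOT_READY = "not ready for autonomy"


def route_task(objective_clear: bool, machine_checkable: bool) -> Route:
    """Route by the state of the verification surface, not by model strength."""
    if objective_clear and machine_checkable:
        return Route.RUN_AGENT
    if objective_clear:
        # Only a human can judge the result today, so the check is the work.
        return Route.BUILD_CHECK
    if machine_checkable:
        # Signals exist, but they mean little until the target is sharper.
        return Route.SHARPEN_TARGET
    return Route.NOT_READY


print(route_task(objective_clear=True, machine_checkable=False).value)
```

Note that the function takes no argument describing how capable the agent is. That omission is the point of the routing rule.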

This is why tests, linters, screenshots, health checks, replayable logs, and acceptance criteria are not bureaucratic accessories around agent work. They are the thing that turns effort into autonomy. A model can generate code without a test suite. It can summarize research without source-health checks. It can click through a browser without an assertion on the page state. In all three cases, the agent may still be useful, but the responsibility for knowing whether it succeeded remains outside the system.

The mistake is to route work by confidence in the agent rather than by maturity of the verification surface. A stronger model helps at the margin. It makes better first moves and recovers from messier states. But if the task has no executable notion of success, the stronger model mostly becomes a more fluent producer of unverifiable output. It may reduce the number of obvious mistakes while making the remaining uncertainty harder to see.

The better design habit is to ask what would make the task checkable before asking which agent should do it. For coding work, that may be a failing test written first. For research, it may be source counts, primary-source retrieval, freshness checks, and explicit gaps. For operational work, it may be a before-and-after state that survives the turn. For writing, it may be a rubric that rejects the known failure modes rather than a vague instruction to make the piece good.
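For the writing case, a rubric that rejects known failure modes might look like the sketch below. The three checks and their thresholds are hypothetical placeholders, not a recommended standard. A real rubric would be built from the failures actually seen in past drafts.

```python
import re

# Hypothetical failure modes; each entry maps a rule name to a pass/fail check.
RUBRIC = {
    "no placeholder text": lambda text: (
        "TODO" not in text and "lorem ipsum" not in text.lower()
    ),
    "has a short title line": lambda text: (
        bool(text.strip()) and len(text.strip().splitlines()[0]) <= 80
    ),
    "no paragraph over 150 words": lambda text: all(
        len(paragraph.split()) <= 150
        for paragraph in re.split(r"\n\s*\n", text)
    ),
}


def failed_checks(text: str) -> list[str]:
    """Return the names of the rubric rules this draft violates."""
    return [name for name, passes in RUBRIC.items() if not passes(text)]


draft = "Autonomy Starts at the Check\n\nTODO: write the second paragraph."
print(failed_checks(draft))
```

A rubric like this does not say whether the piece is good. It only makes a few known ways of being bad machine-detectable, which is enough to give the agent something to repair against.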

This changes the meaning of delegation. Delegating a task to an agent without a check is not really delegation. It is asking the agent to create something and asking the human to absorb the verification burden afterward. Delegating a task with a check is different. The human defines the target and the system carries more of the loop. The agent can attempt, observe, repair, and stop because the environment has a way to answer back.

The practical unit of autonomy is therefore not the agent. It is the checked loop: a clear target, an action surface, a feedback signal, and a stop condition. When that loop exists, autonomy can expand safely. When it does not, the honest next step is not to give the agent more freedom. It is to build the missing check.
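As a rough illustration, the checked loop can be sketched in a few lines. Everything here is an assumption made for the example: `checked_loop`, `attempt`, `check`, and the toy functions are hypothetical names, and a real action surface would be an agent call rather than a string-returning function.

```python
from typing import Callable, Optional, Tuple


def checked_loop(
    attempt: Callable[[Optional[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[bool, Optional[str], int]:
    """Attempt, observe, repair, and stop.

    attempt: the action surface; receives the last feedback (None at first).
    check: the feedback signal; returns (passed, message).
    max_attempts: the stop condition when the check never passes.
    """
    feedback: Optional[str] = None
    result: Optional[str] = None
    for attempt_number in range(1, max_attempts + 1):
        result = attempt(feedback)
        passed, feedback = check(result)
        if passed:
            return True, result, attempt_number
    return False, result, max_attempts


# Toy stand-ins: the "agent" adds a test only after being told one is missing.
def toy_attempt(feedback: Optional[str]) -> str:
    return "patch" if feedback is None else "patch + test"


def toy_check(result: str) -> Tuple[bool, str]:
    return ("test" in result, "missing test")


print(checked_loop(toy_attempt, toy_check))
```

The loop itself is trivial. What makes it possible is `check`: without a function that can answer back, there is nothing to iterate against and no principled place to stop.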