The governance mistake is to treat an agent like a model with extra hands.
A normal model risk process asks familiar questions. What is the input? What is the output? How was the model validated? Who approved the result? That works when the system is basically a function. It breaks when the model is embedded in a workflow that logs into tools, searches sources, calls APIs, edits documents, archives messages, or changes records.
In that world, the risk is not only that the model reasons badly. The risk is that the environment lies to the model and the transcript still looks clean. A search tool returns partial results. A credential is missing. A backend times out. A page is stale. A command succeeds but acts on the wrong state. The model then turns that partial evidence into fluent prose, and the governance record says the agent did the work.
That is not a model governance problem in the old sense. It is a workflow governance problem.
Every’s recent piece on Codex-native apps is useful here because it reports Dan Shipper’s distinction between delegation and collaboration. Sometimes the agent goes away and does a task. Sometimes it sits beside the human while the work unfolds. Governance has to make the same distinction, because the evidence required for each mode is different.
Delegation needs completion evidence. If the agent was sent to research, triage, reconcile, or draft, the control question is not whether a human eventually saw the final answer. The control question is whether the system can prove the agent had enough valid inputs to produce that answer. Which tools ran? Which ones failed? Were all required sources reachable? Was any result empty, stale, truncated, cached, rate-limited, or unauthenticated? If the answer cannot carry that provenance, the task was not governed. It was merely supervised after the fact.
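A minimal sketch of what completion evidence could look like in code, assuming each tool invocation is recorded with a status drawn from the failure modes listed above. The names `Status`, `ToolRun`, and `completion_gaps` are hypothetical, chosen for illustration rather than taken from any existing framework.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status vocabulary mirroring the failure modes named in the text.
    OK = "ok"
    FAILED = "failed"
    EMPTY = "empty"
    STALE = "stale"
    TRUNCATED = "truncated"
    CACHED = "cached"
    RATE_LIMITED = "rate_limited"
    UNAUTHENTICATED = "unauthenticated"


@dataclass(frozen=True)
class ToolRun:
    tool: str
    status: Status


def completion_gaps(runs: list[ToolRun], required: set[str]) -> list[str]:
    """Return one line per required tool that did not produce clean evidence.

    An empty list means every required tool ran and reported OK.
    If a tool ran more than once, only its last recorded run is considered.
    """
    last_run = {run.tool: run for run in runs}
    gaps = []
    for tool in sorted(required):
        run = last_run.get(tool)
        if run is None:
            gaps.append(f"{tool}: never ran")
        elif run.status is not Status.OK:
            gaps.append(f"{tool}: {run.status.value}")
    return gaps
```

With runs recorded for a healthy `search` tool and a rate-limited `crm` tool, and a required set that also includes `ledger`, the function reports two gaps: the rate-limited call and the tool that never ran. A delegated task whose gap list is non-empty would carry that list alongside the answer instead of being presented as complete.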
Collaboration needs attention evidence. If the human is working alongside the model, the control question is not whether the human was present. Presence is cheap. The question is whether the human saw the right intermediate state before making the decision. A reviewer who sees polished text but not the missing backend is not reviewing the work. They are reviewing the agent’s cover letter for the work.
This is where “human in the loop” becomes too vague to be useful. A human in the loop is not a control. It is a role. It becomes a control only when the system gives that human the evidence needed to intervene. An approval button attached to an opaque workflow is governance theater. A review screen that says what the agent attempted, what succeeded, what failed, and what confidence the output earned is closer to a control.
The same point applies to audit trails. Prompt and output logs are not enough for agentic systems. They preserve the conversation, but they do not preserve the work. If an agent used five tools to produce a recommendation, the audit trail needs the execution path: tool calls, parameters, source freshness, backend counts, exception states, retries, approvals, and skipped steps. Otherwise the record is legible only at the narrative layer, exactly where agents are best at smoothing over uncertainty.
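One way to picture an execution-path record, as opposed to a prompt and output log, is a per-step structure that keeps the fields listed above. This is a sketch under assumptions: the field names, the `StepRecord` type, and the JSON layout are all hypothetical and would need to match whatever the firm's retention systems actually require.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StepRecord:
    # All field names are illustrative assumptions, not a standard schema.
    tool: str
    parameters: dict
    source_fetched_at: Optional[str] = None  # ISO 8601 timestamp, for freshness checks
    backends_answered: int = 0
    backends_expected: int = 0
    exception: Optional[str] = None
    retries: int = 0
    approved_by: Optional[str] = None
    skipped: bool = False


def needs_attention(step: StepRecord) -> bool:
    """Flag steps a reviewer should see: skipped, errored, or partially answered."""
    return (
        step.skipped
        or step.exception is not None
        or step.backends_answered < step.backends_expected
    )


def audit_record(task_id: str, steps: list[StepRecord]) -> str:
    """Serialize the execution path so the record preserves the work, not just the prose."""
    payload = {
        "task_id": task_id,
        "steps": [asdict(step) for step in steps],
        "steps_needing_attention": [s.tool for s in steps if needs_attention(s)],
    }
    return json.dumps(payload, sort_keys=True)
```

The design choice worth noting is that the attention list is computed from the steps rather than written by the agent, so the narrative layer cannot smooth it over.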
This changes how AI risk should be tiered. Today, many governance processes tier AI systems by use case, model capability, data sensitivity, or impact. Those still matter, but agents add another dimension: execution dependence. A low-stakes summarizer with no tools is mostly an output-quality problem. A medium-stakes agent with weak tools can become high-risk because it acts on a world it cannot reliably observe. The danger is not intelligence. The danger is confident action against unverified state.
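As a rough illustration of execution dependence as a tiering input, the sketch below escalates a base tier by one level when an agent uses tools against state it cannot verify. The three-level scale and the one-step escalation rule are assumptions made for the example, not a recommended calibration.

```python
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def effective_tier(base: Tier, uses_tools: bool, state_verified: bool) -> Tier:
    """Escalate one tier (capped at HIGH) when tools act on unverified state.

    `base` stands for the tier assigned from use case, data sensitivity, and impact.
    """
    if uses_tools and not state_verified:
        return Tier(min(base + 1, Tier.HIGH))
    return base
```

Under these assumptions a tool-free summarizer stays where its use case puts it, while a medium-stakes agent with unverified tool state is treated as high-risk.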
The practical control surface is tool truth. Before an agent makes a claim, the system should know whether the tools behind that claim were healthy enough to support it. Before an agent takes an action, the system should know whether the state it is acting on is current. Before a human approves, the system should show what evidence the approval is actually covering. These are not nice-to-have observability features. They are the control environment.
This also means assurance has to move earlier. Post-hoc review is too late if the agent has already turned missing evidence into a clean recommendation. The workflow should degrade before the prose does. If only two of seven research backends answered, the output should say so or refuse the stronger claim. If the document store returned stale content, the agent should not be allowed to cite it as current. If a credential is missing, the system should fail closed rather than invite the model to improvise around the gap.
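The three rules in that paragraph can be sketched as gates that run before the model writes anything. Everything here is an assumption for illustration: the function names, the 0.8 coverage threshold, and the labels for claim strength are hypothetical, and real thresholds would be set per workflow.

```python
from datetime import datetime, timedelta


class MissingCredential(RuntimeError):
    """Raised so the workflow stops instead of letting the model improvise."""


def require_credential(name: str, credentials: dict[str, str]) -> str:
    # Fail closed: a missing or empty credential halts the step.
    value = credentials.get(name)
    if not value:
        raise MissingCredential(f"credential '{name}' is missing; failing closed")
    return value


def is_current(fetched_at: datetime, now: datetime, max_age: timedelta) -> bool:
    # Content older than max_age may be shown, but not cited as current.
    return now - fetched_at <= max_age


def claim_gate(answered: int, expected: int, min_fraction: float = 0.8) -> tuple[str, str]:
    """Decide how strong a claim the evidence supports, and what must be disclosed."""
    if expected <= 0 or not 0 <= answered <= expected:
        raise ValueError("expected must be positive and answered must be in [0, expected]")
    disclosure = f"{answered} of {expected} backends answered"
    if answered == expected:
        return "strong", disclosure
    if answered / expected >= min_fraction:
        return "qualified", disclosure
    return "refuse_strong_claim", disclosure
```

For the example in the text, `claim_gate(2, 7)` falls below the assumed threshold, so the stronger claim is refused and the disclosure string travels with whatever weaker output is produced.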
For banks and other regulated firms, this is the shift that matters. The governed object is no longer just a model artifact. It is the chain of model, tools, data, permissions, execution state, human review, and retained evidence. You can have a perfectly reasonable model inside an ungoverned workflow. You can also have a mediocre model inside a tightly governed workflow that is easier to trust because its limits are visible.
Agent-native apps will make this unavoidable. Once software is designed for agents to inhabit, the application surface includes everything the agent can touch. Governance has to follow that surface. The control cannot stop at the model card, the prompt library, or the final approval. It has to live in the workflow itself.
The clean governance question is no longer “was a human in the loop?” It is “what did the system prove before the human was asked to trust it?”