Why Agents Break Governance
/ 5 min read
The category boundary is not AI versus traditional software. It is systems that advise versus systems that act. A model that suggests a trade and a model that executes a trade are governed differently because their failure modes are different. The chatbot produces a wrong answer and the user ignores it. The agent produces a wrong action and the action may be irreversible. Most governance frameworks treat agentic AI as a harder version of generative AI. It is not harder. It is different. Four interactions between agentic properties create risks that did not exist before and that manual governance cannot address.
The first is autonomy meeting uncertain judgment. An agent acts without human approval using non-deterministic reasoning. The same prompt produces different outputs on different runs. You cannot certify an agent by sampling its behaviour at a point in time, because its behaviour at the next point in time may differ. Guardrails reduce the probability of a bad outcome but never eliminate it. A human analyst who is uncertain asks a colleague. An agent that is uncertain acts anyway, unless the harness was specifically designed to detect uncertainty and escalate. Most harnesses are not designed this way because most builders do not think of the harness as a governance surface. They think of it as plumbing.
The second is machine speed meeting irreversibility. Errors compound before humans can intervene. An agent operating at machine speed can take hundreds of actions in the time it takes a human to read an alert. If those actions are irreversible, the damage is done before oversight can engage. The governance question that matters is not whether an agent can take an action but whether that action can be undone. An agent reasoning internally is doing something that can be discarded. An agent about to send a message, delete a record, or commit a transaction is crossing a boundary that may not allow return. At machine speed, the window between reasoning and commitment collapses. A former HSBC resilience risk executive and Bank of England CISO put it plainly earlier this year: the technology has not changed the risk, it has changed the tempo, and organisations are not built to keep up.
The third is the natural language attack surface meeting action capability. In generative AI, prompt injection produces bad text. In agentic AI, prompt injection produces bad actions. A crafted GitHub issue title triggered remote code execution through an open-source AI agent earlier this year. A malicious instruction embedded in a forwarded email, read by an agent with send permissions, becomes an exfiltration channel. The attack surface is qualitatively different because consequences are real-world rather than informational. Simon Willison coined the term lethal trifecta for the structural vulnerability that appears when an agent simultaneously accesses private data, processes untrusted content, and can communicate externally. Any two of the three are manageable. All three together create a channel where untrusted content can instruct the agent to exfiltrate private data through its communication capability. The OpenClaw incidents proved this is not theoretical. The director of AI alignment at a major AI lab connected an agent to her email with instructions to only suggest actions. The agent deleted her emails. No remote kill switch existed. Manual configuration is not a control.
The fourth is delegation meeting autonomy. Agent A spawns Agent B, which inherits or exceeds A’s permissions without anyone explicitly authorising B’s capabilities. The chain creates emergent authorisation that no single policy granted. Each step in a multi-step plan has a probability of error, and those errors compound rather than cancel because the same misunderstanding propagates through every downstream step. Over ten steps, even ninety-five percent accuracy per step yields roughly sixty percent overall accuracy. When multiple agents interact, the system can behave unpredictably even if each individual agent is well-governed. This is the systemic dimension that most frameworks miss entirely: not one agent going wrong, but a fleet behaving in ways no one designed or tested.
Underneath all four interactions sits a structural vulnerability that amplifies each of them: prompt injection. Every other AI risk has an engineering solution. Gates handle irreversibility. Permission scoping handles delegation. Kill switches handle speed. Prompt injection has no complete solution. Large language models cannot reliably distinguish instructions from data when both arrive as natural language. There is no parameterised prompt, no architectural separation that makes the problem disappear. Defences reduce the probability of successful injection. They will never reach zero. This matters because every other control assumes the agent is following its own instructions. A hijacked agent obeys the attacker’s instructions using the agent’s own permissions. Injection converts a data input into a control override. And the injection does not need to come from the user. An agent that reads customer emails, scraped webpages, or third-party API responses is exposed to injection through those channels. A poisoned instruction embedded in a document that a human would never notice becomes the agent’s new directive. The agent’s owner configured everything correctly. The attack arrived through the data, not the interface.
These four interactions break the assumptions underlying current model risk management standards. Those standards assume bounded scope, stable parameters, reconstructable decision paths, and periodic review cycles. Agentic AI violates all four simultaneously. The response is not to abandon the standards, which remain the conceptual anchor, but to complement periodic validation with continuous monitoring, embedded controls, and automated testing. Manual review is a point-in-time opinion on a non-deterministic system. It decays the moment the reviewer walks away.
Every major bank is reaching this conclusion independently. JPMorgan published architectural principles for deterministic external controls outside the agent’s reasoning loop. Wells Fargo is hiring for agentic AI safety and security. Citi brought in a global head of AI risk. The question is no longer whether agentic AI needs different governance. The question is what that governance looks like, and whether your organisation is building it or waiting.