Terry Li

Two influential takes from the last six months.

Simon Willison, October 2025: “My own interest in MCPs has waned ever since I started taking coding agents seriously. Almost everything I might achieve with an MCP can be handled by a CLI tool instead.” A month later: “I don’t use MCP at all any more when working with coding agents.”

Cloudflare, February 2026: “The Cloudflare API has over 2,500 endpoints. Exposing each one as an MCP tool would consume over 2 million tokens. With Code Mode, we collapsed all of it into two tools and roughly 1,000 tokens of context.” Their new MCP server exposes search() and execute(). The agent writes JavaScript against the OpenAPI spec inside a V8 isolate, and the spec never enters the context window.

Both are credible. Both are widely cited. They appear to contradict each other. They don’t. They’re answering different questions.

The question underneath

The real question is: where does the sandbox live, and who owns the tools?

MCP as a protocol is agnostic about both. That is its weakness. Willison and Cloudflare have each picked a concrete answer, and their answers are right for their respective situations.

Willison’s sandbox is a coding agent’s filesystem. He has Claude Code (or similar) with shell access on a machine he controls. The sandbox is the agent harness plus whatever Docker isolation he adds for risky operations. The tools are CLIs installed on his own machine. In this world, MCP is pure overhead. The CLIs are already there, the model knows how to call --help, and token-expensive tool schemas buy nothing.

Cloudflare’s sandbox is a V8 isolate running on their edge. Their tools are 2,500 API endpoints that exist whether or not an agent is calling them. They are not going to install 2,500 CLIs on every user’s machine. They need something that ships with an OpenAPI spec and can run the agent’s code close to the API. A Dynamic Worker isolate — no filesystem, no environment variables, no outbound fetch unless explicitly allowed — is the right shape for that.

Neither is wrong. They are solving different problems.

Three shapes, three criteria

Generalise this and you get three viable architectures for agent tool-use in 2026.

Agent-local CLI inventory. The agent runs on a machine it trusts. Tools are binaries on PATH. Discovery happens via --help. This is Willison’s world, and it is also the right answer for any individual developer or small team with direct machine access.
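
Discovery-by---help is cheap to sketch. This is an illustrative stand-in, not anyone’s actual harness: discover() is a hypothetical name, and node stands in for whatever binary the agent finds on PATH.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical sketch: learn a CLI tool's surface on demand by reading
// its --help output, instead of shipping a tool schema upfront.
function discover(tool: string): string {
  try {
    return execFileSync(tool, ["--help"], { encoding: "utf8" });
  } catch (err: any) {
    // Some tools print help to stderr or exit non-zero; keep whatever we got.
    return err.stdout || err.stderr || "";
  }
}

// The agent pays the token cost only for tools it actually needs.
console.log(discover("node").slice(0, 200));
```

The design point is that the "schema" lives with the tool and is fetched lazily, so a machine with a hundred installed CLIs adds zero tokens of baseline context.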

Server-side code mode. The tool owner runs a sandbox and exposes two meta-tools — search and execute. The agent writes code against a typed API, the server runs it, and only the result crosses back. This is Cloudflare’s pattern; Block’s Goose already implements a client-side variant, and Anthropic’s SDK offers “Programmatic Tool Calling”, which does the same thing.
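
The two-meta-tool shape fits in a few lines. Everything here is illustrative: the endpoint catalog is invented, and execute() evaluates a callback in-process where a real server would run agent-written code in an isolate.

```typescript
type Endpoint = { path: string; method: string; summary: string };

// A stand-in for a large OpenAPI spec; real catalogs hold thousands of entries.
const catalog: Endpoint[] = [
  { path: "/zones", method: "GET", summary: "List zones" },
  { path: "/zones/{id}/dns_records", method: "GET", summary: "List DNS records" },
  { path: "/zones/{id}/purge_cache", method: "POST", summary: "Purge cache" },
];

// Meta-tool 1: search() returns only the endpoints matching a query,
// so the full spec never enters the model's context window.
function search(query: string): Endpoint[] {
  const q = query.toLowerCase();
  return catalog.filter(
    (e) => e.summary.toLowerCase().includes(q) || e.path.includes(q)
  );
}

// Meta-tool 2: execute() runs agent-written code against a typed client
// and returns only the result across the boundary.
async function execute<T>(
  code: (call: (e: Endpoint) => Promise<unknown>) => Promise<T>
): Promise<T> {
  const call = async (e: Endpoint) => ({ called: `${e.method} ${e.path}` }); // stub
  return code(call);
}

// The agent first searches, then writes code against the hits.
const hits = search("dns");
execute(async (call) => call(hits[0])).then((r) => console.log(r));
```

The context cost is fixed at the two meta-tool schemas, regardless of how many endpoints sit behind them — which is exactly the 2,500-endpoints-to-two-tools collapse described above.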

Hosted sandbox with curated tools. A third party provides a sandboxed Linux container with pre-installed CLIs that wrap approved APIs. The agent is given shell access inside the sandbox only. Everything is logged, network egress is whitelisted, credentials are short-lived. This is E2B, Modal, AWS Bedrock AgentCore, Agent-Infra’s AIO Sandbox, and the architecture any regulated enterprise will actually land on.
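
The policy layer of that third shape can be sketched as a guarded runner. This is a toy under invented names: egressWhitelist, auditLog, and runInSandbox are hypothetical, and the "sandbox" here is an in-process stub rather than a real container.

```typescript
// Hypothetical hosted-sandbox policy: every command is audit-logged,
// and network egress is checked against an explicit whitelist.
const egressWhitelist = ["api.internal.example.com"];
const auditLog: string[] = [];

function allowEgress(host: string): boolean {
  return egressWhitelist.includes(host);
}

function runInSandbox(cmd: string, targetHost?: string): string {
  auditLog.push(`${new Date().toISOString()} ${cmd}`); // Linux-native-style audit trail
  if (targetHost && !allowEgress(targetHost)) {
    return `blocked: egress to ${targetHost} not whitelisted`;
  }
  return `ok: ${cmd}`; // a real sandbox would exec inside the container here
}

console.log(runInSandbox("curl https://evil.example.net", "evil.example.net"));
console.log(runInSandbox("report-gen --quarter Q1", "api.internal.example.com"));
```

In the real architectures named above, the logging, egress control, and credential scoping come from the platform (VPC rules, IAM, container runtimes) rather than application code; the sketch only shows where each check sits.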

The decision criterion for each is clean.

  • Agent runs locally on a machine you control → CLI inventory.
  • You are a service provider with a large API surface → server-side code mode.
  • Agent runs for users who cannot be given a shell (compliance, multi-tenant, untrusted code) → hosted sandbox.
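
The three criteria above can be encoded as a straightforward lookup. The type and function names are mine, not from any of the cited systems; the priority order (shell restrictions first, then API surface, then local control) is one reasonable reading of the list.

```typescript
type Situation = {
  localMachine: boolean;    // agent runs on a machine you control
  largeApiSurface: boolean; // you are a service provider with many endpoints
  usersLackShell: boolean;  // compliance, multi-tenant, or untrusted code
};

type Architecture = "cli-inventory" | "code-mode" | "hosted-sandbox";

function choose(s: Situation): Architecture {
  // Restrictions dominate: if users cannot be given a shell,
  // nothing else matters.
  if (s.usersLackShell) return "hosted-sandbox";
  if (s.largeApiSurface) return "code-mode";
  if (s.localMachine) return "cli-inventory";
  return "hosted-sandbox"; // safest default when none clearly applies
}

console.log(choose({ localMachine: true, largeApiSurface: false, usersLackShell: false }));
```

An individual developer with a trusted laptop lands on "cli-inventory"; flip usersLackShell to true and the answer changes to "hosted-sandbox" regardless of the other flags.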

Where this leaves MCP the protocol

MCP still has a job. All three architectures above can use MCP as the transport — the thing the agent talks to. Cloudflare’s code-mode server is itself an MCP server. E2B has MCP adapters. CLI-first setups still use MCP for the handful of tools that genuinely benefit from it, like persistent browser sessions or complex multi-step workflows.

What does not survive is the default from 2025: wrap everything in MCP, expose hundreds of tools upfront, and trust that context windows will grow fast enough. GitHub’s official MCP server, which consumes tens of thousands of tokens before the agent does anything, is the shape people are retreating from.

The pattern replacing it in every case is progressive disclosure. Tools exist on disk or behind a search function. The agent discovers them when needed. The context window only sees what is relevant to the current task. Willison’s CLIs do this via --help. Cloudflare does it via search(). Hosted sandboxes do it via filesystem navigation. Claude Skills do it via frontmatter scanning.
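
The frontmatter-scanning variant can be sketched in miniature. The file contents and field names here are illustrative, loosely modeled on the skills pattern: the agent loads only cheap metadata upfront and pulls a full body into context on demand.

```typescript
// Two hypothetical skill files, each with YAML-style frontmatter
// followed by the full (expensive) instructions.
const skills: Record<string, string> = {
  "deploy.md": "---\nname: deploy\ndescription: Ship to staging\n---\nFull multi-step instructions...",
  "triage.md": "---\nname: triage\ndescription: Label new issues\n---\nFull multi-step instructions...",
};

// Upfront, the agent sees only the frontmatter of every skill...
function scan(): { name: string; description: string }[] {
  return Object.values(skills).map((raw) => {
    const frontmatter = raw.split("---\n")[1];
    const get = (key: string) =>
      frontmatter.match(new RegExp(`${key}: (.*)`))?.[1] ?? "";
    return { name: get("name"), description: get("description") };
  });
}

// ...and loads a body into the context window only when the task calls for it.
function load(name: string): string {
  const raw = Object.values(skills).find((r) => r.includes(`name: ${name}`));
  return raw ? raw.split("---\n")[2] : "";
}

console.log(scan());
console.log(load("deploy"));
```

The same shape underlies all four examples in the paragraph above: a cheap index (help text, search results, directory listing, frontmatter) stands between the agent and the expensive full content.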

It is the same idea reached from four directions, and it is the bet to make.

For enterprises specifically

If you are standing up agent infrastructure inside a regulated institution — a bank, a hospital, a government department — and someone hands you “just install these MCP servers,” you now have grounds to push back. The right shape is almost certainly a hosted sandbox with a curated CLI inventory wrapping your approved internal APIs. It gives you audit trails that are already native to Linux, identity scoping via short-lived credentials, network isolation via VPC rules, and change control via standard CI/CD. None of which requires a new threat model, and all of which your security team already understands.

The twelve-month forecast I would bet on: MCP survives as the glue between agents and tools. Almost no enterprise directly exposes more than a handful of tools over raw MCP. The rest live behind either a code-mode gateway or a sandbox runtime. The teams that built on sand in 2025 will be migrating in 2026.

Sources