After the Harness


The generic agent harness is becoming a bad place to compete.

That does not mean agents are over. It means the middle of the stack is being eaten. The model companies have the strongest reason to build the runtime around their models. They can tune the prompt format, tool loop, browser surface, memory layer, eval harness, sandbox, and product affordances together. When the model changes, they can change the harness with it. Everyone else building the same generic layer is fighting the owner of the terrain.

This is uncomfortable if the imagined product was “an agent platform.” It is liberating if the product was always the work.

The interesting question is not whether agents can browse, call tools, invoke MCP servers, load skills, write files, or run code. Those things matter, but they are substrate. The better question is what recurring human workflow becomes possible once that substrate is assumed.

Most enterprise work is not waiting for a smarter chat box. It is waiting for someone to redraw the path the work travels. Where does a request enter? Which sources are authoritative? Which judgment can be assisted and which judgment must remain accountable? What evidence has to survive the run? What is allowed to change automatically? What requires approval? What failure should stop the workflow rather than produce a graceful-looking answer? Where should the human be shown uncertainty, not just output?

That is not harness engineering. It is workflow design.
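One way to see the difference is to write a few of those questions down as data instead of prompts. The sketch below is only an illustration, not a recommended framework: `WorkflowSpec`, `ProposedStep`, `route`, the example source and action names, and the 0.8 confidence threshold are all hypothetical. It assumes the harness already exists and only decides whether a proposed step may proceed automatically, must go to an accountable reviewer, or should stop the run.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO = "auto"                      # allowed to change automatically
    NEEDS_APPROVAL = "needs_approval"  # an accountable human signs off
    STOP = "stop"                      # halt instead of producing a graceful-looking answer


@dataclass(frozen=True)
class WorkflowSpec:
    """Hypothetical answers to the design questions, written down as data."""
    intake_channel: str                    # where a request enters
    authoritative_sources: frozenset[str]  # which sources count as evidence
    auto_allowed: frozenset[str]           # actions that may run unattended
    approval_required: frozenset[str]      # actions that need a reviewer
    min_confidence: float = 0.8            # assumed threshold; below it, show uncertainty


@dataclass
class ProposedStep:
    action: str
    sources: list[str]   # evidence the agent relied on
    confidence: float    # however the surrounding system chooses to estimate it


def route(step: ProposedStep, spec: WorkflowSpec) -> Decision:
    # Missing or non-authoritative evidence stops the run outright.
    if not step.sources or any(s not in spec.authoritative_sources for s in step.sources):
        return Decision.STOP
    # An action nobody decided on in advance is also a stop, not a guess.
    if step.action not in spec.auto_allowed | spec.approval_required:
        return Decision.STOP
    # Accountable judgments and low-confidence steps go to a human.
    if step.action in spec.approval_required or step.confidence < spec.min_confidence:
        return Decision.NEEDS_APPROVAL
    return Decision.AUTO


spec = WorkflowSpec(
    intake_channel="support_inbox",
    authoritative_sources=frozenset({"policy_db"}),
    auto_allowed=frozenset({"draft_reply"}),
    approval_required=frozenset({"issue_refund"}),
)
print(route(ProposedStep("draft_reply", ["policy_db"], 0.95), spec))
print(route(ProposedStep("issue_refund", ["policy_db"], 0.99), spec))
print(route(ProposedStep("draft_reply", ["old_wiki"], 0.95), spec))
```

Nothing in this sketch is clever, and that is the point. The hard part is not the routing function but agreeing on what belongs in the spec.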

MCP solves connection. Skills solve reusable instruction. Neither solves the operating model. A connected agent with a skill library can still inhabit a terrible process. It can retrieve the wrong source faster, apply an obsolete rule more fluently, or escalate to a human at the one point where review is least useful. The agent did not fail. The shape of the work around it did.

The value after the harness is in the boring nouns that rarely make the demo: intake, evidence, authority, exception, review, memory, audit, feedback. These are not decorations around an agent. They are the thing that makes an agent useful in a real institution.

This is why vertical agents are easy to mock and hard to replace. A thin vertical wrapper is fragile. If all it does is place domain language around a general-purpose model, the next platform release will swallow it. But a real vertical system is not a wrapper. It contains the domain’s evidence hierarchy, its tolerance for ambiguity, its approval boundaries, its common exceptions, its data cleanup habits, and its definition of a finished piece of work. That is much harder for a model lab to absorb from the outside.

The mistake is to think of the platform as the product. In most serious workflows, the platform is only the room where the product can finally happen.

The product is the redesigned path from request to acceptable outcome.

That path includes models and tools, but it also includes the parts a model company cannot infer from generic usage logs: institutional trust, local data quality, reviewer psychology, control appetite, customer obligations, and all the small rituals by which an organization decides that work is good enough to stand behind.

So yes, the agent harness may become commodity. Good. Commodities are useful. Roads, databases, browsers, and payment rails became more valuable when people stopped treating each one as the whole business.

The work left after the harness is the work that was always hardest to outsource: deciding what good work is, then building a system that can travel toward it without pretending the judgment disappeared.