The Session Boundary Is Why You Still Don't Have AI Agents

I was wrapping up a late session a few weeks ago, trying to dispatch a set of background audit agents before closing my terminal. The idea was straightforward: kick off some autonomous checks, go to sleep, wake up to results. Instead, I watched each agent die the moment the session closed. The terminal window shut. The processes vanished. The work had not happened.

That’s when it clicked. We’ve spent two years talking about AI agents, and almost everything marketed as an “agent” today is nothing of the sort. The distinction isn’t about how well the model reasons. It isn’t about chain-of-thought or tool use or context windows. It’s about one brutally simple question: does this thing require a human to be present while it runs?

If the answer is yes, you have a very sophisticated interactive tool. You do not have an agent.

This sounds like a minor implementation detail until you try to build anything that actually runs while you sleep. Every major AI assistant, every “agentic workflow” framework, every demo that starts with “watch it autonomously do X” implicitly assumes someone is watching. Close the laptop and it stops. The session boundary is load-bearing in a way that almost nobody talks about, because the demos never show the moment the user disconnects.

The plumbing to fix this is not particularly hard. You need a persistent execution environment — a process that doesn’t die with a terminal session — and you need some kind of state store so the agent can resume after failures rather than starting over. Daemonize the process, write state to disk, add retry logic with backoff. Senior engineers have been doing this with cron jobs and message queues for decades. The AI layer sits on top of infrastructure patterns that are genuinely old and well-understood. In that sense, the plumbing is almost free.

What is not free is the trust framework. And this is where the “AI agents for enterprise” conversation usually goes wrong, because the hard questions aren’t technical.

When an agent runs autonomously overnight and hits a decision it wasn’t explicitly designed for, what does it do? If it proceeds, who is accountable for the outcome? If it escalates, to whom, through what channel, and what happens if no one is available? When it fails — and it will fail — what’s the audit trail? Can you reconstruct what the agent decided and why? Can you demonstrate to a regulator that the process was controlled, even though no human was present at the moment of execution?

Banks are a useful case study because the appetite is real and the constraints are genuine. Every large bank I’ve encountered over the past two years has some version of an “AI agents” initiative. The vision is usually the same: autonomous processes that handle routine work, flag anomalies, maybe draft initial analyses. The gap between that vision and what actually ships is almost never the model quality. It’s the operating model. What can the agent decide autonomously? What must it escalate, and through which channel, and with what latency? Where do results surface? Who owns the agent’s outputs in the same way a person owns their work product?

These questions sound like change management, which is why technologists sometimes wave them away as someone else’s problem. They’re not. They are the product. An AI agent without a trust framework is a liability, not a capability. It’s a process that runs invisibly, makes decisions nobody reviewed, and produces outputs that nobody officially owns. For a bank, that’s not just unhelpful — it’s actively dangerous.

The session boundary is a useful forcing function because it makes this concrete. You cannot build a session-independent execution layer without immediately confronting the operating model questions. Who restarts the agent if it crashes at 3am? Where do results go if the person who kicked it off has left the company? What’s the escalation path if the agent hits a genuinely novel situation mid-run? The moment you try to answer these questions, you realize the AI reasoning is the easy part. The hard part is the institutional scaffolding around it.

So where does this lead? If the missing ingredient is the trust framework and not the model capability, then the competitive moat for anyone building AI agent infrastructure isn’t smarter reasoning — it’s the audit trail, the escalation protocol, the accountability chain. The teams that figure out session-independent execution with robust observability will run circles around teams still iterating on prompts. And the organizations that treat “agentic AI” as a governance problem first, a technology problem second, will actually ship agents that do things while you sleep — instead of watching them die when you close your laptop.

That reframe — governance first, technology second — is uncomfortable for most technology teams. It should be uncomfortable. It means the product work starts long before anyone writes a line of code.

Related Articles

LLM evals aren't data science

Guardrails Beat Guidance

Your Output Is Your Selections

Share this article