After Automation, Judgment Becomes Infrastructure


Dan Shipper’s “After Automation” argues that as AI makes execution cheaper, the scarce work shifts toward framing, taste, and judgment. I think the next step is to ask what happens when that judgment itself has to become infrastructure.

The usual story about automation is still too attached to the old bottleneck. It imagines a task with a human on one side and a machine on the other. The machine gets better, takes more of the task, and the human either supervises or moves on. That story works for a factory step, a spreadsheet macro, or a narrow software workflow. It is less useful once the machine can draft, search, compare, plan, call tools, and produce a plausible next action in domains where the problem was never only execution.

Once execution gets cheap, the important question is not “can this be done?” More things can be done than should be done. The important questions become: what is worth attempting, what frame should govern the attempt, what standard should the output meet, what evidence would make it acceptable, and who is responsible for accepting it. Automation moves work out of the visible middle and into the edges. Upstream, the human has to define the task well enough for the system to aim at the right thing. Downstream, the human or institution has to decide whether the result deserves reliance.

That downstream step is often described as review, but review is too small a word. It suggests a person glancing over an answer after the productive work is complete. In serious work, review is part of production. The standard you will use to judge the answer changes the answer you should ask for. A system that knows it must preserve source links, quote sensitive text exactly, separate fact from inference, or route uncertain cases to a human will behave differently from one asked to “write a summary.” The acceptance criteria are not paperwork after the work. They are the work’s shape.
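One way to picture acceptance criteria as part of production is to write them down as checks that exist before any draft does. The following is a minimal sketch, not a real system: the `Draft` fields, the criterion names, and the self-reported `confidence` score with its 0.8 threshold are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    text: str
    source_links: list[str]   # links the draft claims to rely on
    quotes: list[str]         # passages the draft presents as verbatim
    confidence: float         # hypothetical self-reported score in [0, 1]


@dataclass
class Criterion:
    name: str
    check: Callable[[Draft, str], bool]  # (draft, source_text) -> passed?


def has_source_links(draft: Draft, source_text: str) -> bool:
    # The draft must cite at least one source.
    return len(draft.source_links) > 0


def quotes_are_exact(draft: Draft, source_text: str) -> bool:
    # Every quoted passage must appear verbatim in the source.
    return all(quote in source_text for quote in draft.quotes)


# Hypothetical criteria; a real list would come from the people
# who are accountable for accepting the output.
CRITERIA = [
    Criterion("has_source_links", has_source_links),
    Criterion("quotes_are_exact", quotes_are_exact),
]


def evaluate(draft: Draft, source_text: str, min_confidence: float = 0.8):
    """Return (decision, failed_criteria) for a draft."""
    failed = [c.name for c in CRITERIA if not c.check(draft, source_text)]
    if failed:
        return "reject", failed
    if draft.confidence < min_confidence:
        # Passing the checks is not enough; uncertain cases go to a person.
        return "route_to_human", []
    return "accept", []
```

The point of the sketch is the ordering: the criteria are defined first, and the same list could be handed to the generating system as part of the task, so the standard shapes the request as well as the verdict.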

This is why the most interesting AI systems are not just faster versions of old workflows. They are memory systems for judgment. A good workflow does not merely produce an output. It records why the output was acceptable, which failure mode was caught, what instruction should change next time, and which part of the process can be made deterministic. The compounding asset is not the prompt. The compounding asset is the loop that turns a corrected mistake into a future constraint.

Prompts are a fragile place to store that kind of learning. They are too easy to copy without context, too easy to bloat, and too easy to confuse with the work itself. Some judgment belongs in prompts, especially the part that describes intent and taste. But repeated judgment wants to harden. A recurring check becomes a rubric. A recurring rubric becomes a test. A recurring manual search becomes a tool. A recurring correction becomes a guardrail. The system gets better when the human’s judgment leaves a residue that future work can actually use.
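The hardening step can be sketched in a few lines. This is an illustration under loud assumptions: the `GuardrailStore` class, its method names, and the choice of a regular expression as the form a correction takes are all hypothetical. Real corrections often need richer checks than a pattern match.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Guardrail:
    reason: str    # why the reviewer rejected the original output
    pattern: str   # regex that should never appear in accepted output


@dataclass
class GuardrailStore:
    rules: list[Guardrail] = field(default_factory=list)

    def record_correction(self, reason: str, forbidden_pattern: str) -> None:
        # A reviewer's one-off correction becomes a standing rule.
        self.rules.append(Guardrail(reason, forbidden_pattern))

    def violations(self, output: str) -> list[str]:
        # Return the reasons for every rule the output breaks.
        return [
            rule.reason
            for rule in self.rules
            if re.search(rule.pattern, output, re.IGNORECASE)
        ]


store = GuardrailStore()
store.record_correction(
    reason="Do not promise delivery dates",
    forbidden_pattern=r"guaranteed by \w+",
)

print(store.violations("Delivery is guaranteed by Friday."))
print(store.violations("Delivery is expected next week."))
```

The reviewer catches the mistake once. After that, the check runs without the reviewer, and the stored `reason` keeps the context that a copied prompt would lose.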

This is the part of “AI-native work” that still feels under-described. Many teams will get good at using agents to create more drafts, more options, more analyses, and more implementation attempts. That is useful, but it also increases the supply of competent sameness. If every team can generate a reasonable first pass, the differentiator is no longer first-pass competence. It is the quality of the taste loop around it. The team that knows what to reject, what to preserve, what to make repeatable, and what never to delegate will pull away from the team that merely produces more.

In regulated or high-stakes environments, this is not an aesthetic point. It becomes a control design problem. If AI reduces the cost of execution, then controls cannot sit only around execution. They have to move upstream into task framing and downstream into acceptance. What is the system allowed to attempt? Which sources is it allowed to rely on? Which claims require evidence? Which outputs need human signoff? Which actions are reversible? Which failures should become tests? These are governance questions, but they are also product questions. The control surface is the user experience.
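A few of those questions can be expressed as an explicit policy that sits in front of the system rather than behind it. The sketch below is an assumption-heavy illustration: the `Action` and `Policy` types, the action kinds, and the source names are invented, and the rule that irreversible actions need signoff is one possible design choice, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                  # what the system wants to attempt
    sources: tuple[str, ...]   # sources it relied on
    reversible: bool           # can the action be undone?


@dataclass(frozen=True)
class Policy:
    allowed_kinds: frozenset[str]
    allowed_sources: frozenset[str]

    def decide(self, action: Action) -> str:
        # Upstream control: is the system allowed to attempt this at all?
        if action.kind not in self.allowed_kinds:
            return "deny"
        # Is every source it relied on an approved one?
        if not set(action.sources) <= self.allowed_sources:
            return "deny"
        # Downstream control: irreversible actions need a human decision.
        if not action.reversible:
            return "needs_signoff"
        return "allow"


# Hypothetical policy for a customer-communications workflow.
policy = Policy(
    allowed_kinds=frozenset({"draft_email", "send_email"}),
    allowed_sources=frozenset({"crm", "policy_docs"}),
)

print(policy.decide(Action("draft_email", ("crm",), reversible=True)))
print(policy.decide(Action("send_email", ("crm",), reversible=False)))
print(policy.decide(Action("draft_email", ("web",), reversible=True)))
```

Written this way, the governance answer and the product behavior are the same object: what the policy returns is what the user sees.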

The tempting mistake is to treat the human as the control. Put a person in the loop, say judgment remains central, and declare the system safe enough. But a human in the loop is not automatically judgment in the loop. If the system gives the reviewer no evidence trail, no uncertainty signal, no comparison point, no memory of prior failures, and no cheap way to reject or correct the output, then the human is mostly a liability sponge. The workflow has preserved accountability without preserving agency.

A better system makes judgment easier to exercise and harder to forget. It shows the basis for claims. It narrows the review surface to the parts that matter. It distinguishes generation from acceptance. It remembers corrections. It routes repeated checks out of human attention and into deterministic machinery. It lets the human spend less time being a ceremonial checkpoint and more time doing the work that remains scarce: choosing the frame, naming the standard, noticing the exception, and deciding what can be trusted.

That is the real post-automation frontier. Not a world where agents do everything, and not a retreat into artisanal human judgment as a badge of superiority. The frontier is the design of systems where human judgment compounds. The human does not disappear. The human also does not remain trapped at the keyboard, performing every step by hand to prove seriousness. The human sets the shape, the machine increases the number of attempts, and the system turns the best corrections into infrastructure.

After automation, judgment becomes the product. The next question is whether it can also become durable.