The Unexplainable Alpha

The ceiling on my AI agent systems is not execution. Execution is the part that works. I can run eight agents overnight, farm out research across models, coordinate multi-step workflows that would have taken a week of manual effort. The ceiling is what I point them at. The quality of the directive is the binding constraint, and no amount of orchestration sophistication fixes a bad one.

This is the taste problem. Not taste in the aesthetic sense — not “good design” or “refined preferences.” Taste as value forecasting. The ability to look at a space of possible work and identify what will matter before the evidence is in. Applied to AI agents, the taste question is deceptively simple: what should we work on next?

Most people treat that as a planning question. It’s not. Planning is what happens after the taste call. The taste call is upstream: which problems are worth solving, which angles are worth pursuing, which deliverables will compound rather than decay. A well-planned sprint on the wrong problem is still waste. An agent system that executes flawlessly against a mediocre brief produces mediocre output at scale.

Taste is distinct from popularity, which is what other people value now. Taste is knowing what’s good before the market confirms it — value forecasting, not trend following. Every practitioner has had the experience of building something that felt right — felt important — before they could fully articulate why, and then watching the landscape shift to validate the bet six months later. That’s taste operating. It’s also the experience of watching someone with bad taste execute brilliantly on something nobody ends up needing.

VCs have been running this playbook for decades. Early-stage investing is pure taste — no data, no market validation, a hundred pitches a week, pick two. The returns are power-law: one good taste call pays for fifty bad ones. The associates can do everything except the taste part. Same pattern applies to anyone running AI agent systems: the agents are the associates, and your taste is the portfolio allocation.

The uncomfortable part: taste requires courage. Forecasting value means committing to a view before the evidence is complete. Publishing, shipping, betting resources on a direction when the only signal you have is your own accumulated judgment. The person who waits for consensus before acting has outsourced their taste to the crowd, and the crowd’s taste is mediocre by definition — it’s an average.

I’ve tried to encode taste into my agent systems. Quality bars, scoring rubrics, heuristics mined from hundreds of sessions. Some of it works. I can tell an agent “don’t produce anything below this standard” and the output improves. I can give it a rubric that captures 70% of what I’d flag in a manual review. But the remaining 30% is where the real taste lives, and it resists encoding because the world generates novel situations faster than I can write rules for them. This is the expert system lesson from the 1980s, replaying in a new context: codified judgment handles the cases you’ve seen before, but taste is specifically the capacity to judge cases you haven’t.
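A minimal sketch of what that encoding looks like in practice. The rubric names, weights, and threshold below are hypothetical illustrations, not the actual system: a weighted checklist catches the failure modes you have already seen, while everything novel falls through to manual review — which is exactly where the un-encoded 30% lives.

```python
# Hypothetical quality rubric: known failure modes encoded as weighted checks.
# The checks only cover cases seen before; novel cases fall through to a human.

RUBRIC = [
    # (name, weight, predicate over the draft text)
    ("has_concrete_claim", 0.4, lambda text: any(ch.isdigit() for ch in text)),
    ("not_too_short",      0.3, lambda text: len(text.split()) >= 50),
    ("no_hedging_filler",  0.3, lambda text: "it depends" not in text.lower()),
]

def rubric_score(text: str) -> float:
    """Sum the weights of the checks this draft passes (0..1)."""
    return sum(weight for _, weight, check in RUBRIC if check(text))

def review(text: str, threshold: float = 0.7) -> str:
    """Auto-accept drafts that clear the encoded bar; route the rest
    to manual review, where actual taste still has to be applied."""
    return "auto-accept" if rubric_score(text) >= threshold else "manual-review"
```

The limitation is structural, not a matter of writing more checks: each predicate encodes a judgment already made, so the rubric is always one step behind the situations the world generates next.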

The gap between encoded taste and actual taste is measurable. I track what I produce versus what I actually use — what gets referenced again, what influences a decision, what compounds into something larger. The things I produce that I never touch again are my taste errors. They felt worth doing at the time. They weren’t. The gap between “seemed worth doing” and “was worth doing” is the taste error rate, and reducing it is the real skill development in this era. Not prompt engineering. Not agent architecture. The upstream judgment about what to build.
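One way to make that measurement concrete, sketched with hypothetical fields (`Artifact` and `referenced_later` are illustrative, not a real tracking schema): the taste error rate is simply the share of produced artifacts that were never touched again.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    referenced_later: bool  # was it reused, cited, or built upon?

def taste_error_rate(produced: list[Artifact]) -> float:
    """Fraction of produced work never used again -- the gap between
    'seemed worth doing' and 'was worth doing'."""
    if not produced:
        return 0.0
    unused = sum(1 for a in produced if not a.referenced_later)
    return unused / len(produced)

history = [
    Artifact("research-memo", referenced_later=True),
    Artifact("competitor-teardown", referenced_later=False),
    Artifact("pricing-model", referenced_later=True),
    Artifact("trend-report", referenced_later=False),
]
# taste_error_rate(history) -> 0.5: half the output was a taste error
```

The metric is crude — "referenced later" is a lagging, binary proxy for value — but tracking it over time is what turns taste from a vibe into a calibrated forecast.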

Here’s the arbitrage I keep coming back to. Right now, most people using AI agents are prompting for summaries. Asking for first drafts. Using the technology to do the same things slightly faster. The people who are encoding their taste — their judgment about what matters, their quality standards, their instinct for which problems are load-bearing — into systems that run autonomously are building something that compounds. The encoded taste gets tested against reality every time an agent produces output. The feedback loop tightens. The heuristics sharpen.

This window has maybe two or three years. After that, the infrastructure for encoding judgment into agent systems will be commoditized too, and the advantage will shift to whoever started accumulating calibrated taste earliest. The moat isn’t the system. The moat is the years of encoded, tested, refined judgment running through the system.

Everything else in this stack is a solved problem on a long enough timeline. Faster models, better orchestration, cheaper inference, more capable tool use — all of it converges. The thing that doesn’t converge is the quality of the question you ask before any of that machinery starts running.

In finance they call it alpha — the returns that survive after you strip out every explainable factor. Quants have spent forty years encoding alpha into algorithms. They keep partially succeeding — each factor captured and commoditized — and yet the residual persists. Taste is the unexplainable alpha of knowledge work. It shrinks as you encode more, but it never hits zero, because the world keeps generating situations your rules haven’t seen.

Taste is the last moat because it’s the one thing that has to be earned.


Related: [Agentic Engineering Principles](Agentic Engineering Principles) · [Exoskeleton, Not Colleague](Exoskeleton, Not Colleague) · [AI Thinking Partner Not QA Bot](AI Thinking Partner Not QA Bot) · [Programs over Prompts](programs-over-prompts)