ai-engineering
27 essays on this topic.
- Move the gate to the package manager
When a supply chain attack lands and the timeline is asking for discipline, the durable fix is one layer down — at the package manager, not at your attention span.
- How I Used 6 LLMs to Write One Word Doc Comment
The value of running six frontier models isn't six perspectives — it's six chances to be wrong, which means you can set a much higher bar for what counts as right.
- Biology as a Design Constraint: How Cell Biology Names Generate Architecture
Using cell biology naming not as metaphor but as engineering manual — how mTOR's biology predicted circuit breakers, autophagy, and negative feedback loops before we designed them.
- Your AI Agent's Quality Gate Is Lying to You
A 96% rejection rate that was actually a 96% false positive rate — how a monitoring blind spot turned a productive overnight batch into apparent failure.
- Test-first dispatch for AI coding agents
The architect writes the tests. The implementer makes them pass. No prose specs, no circular validation.
- I made my coding agent dispatch system improve itself
I dispatched a 952-line monolithic CLI through my own coding-agent dispatch system to be refactored into seven modules. It worked. Notes on what self-bootstrap reveals about agent harness design.
- What 16,000 Simon Willison posts reveal about the state of AI coding agents
I scraped 16,181 of Simon Willison's posts and analysed the 395 from 2026. An inflection in November 2025, GLM-5 closing the gap, and why the harness — not the model — is the competitive moat.
- Correctness is model-determined
I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.
- The Navigation Problem in Agent Flywheels
Your agent system shouldn't stop when the task list is empty. The real bottleneck isn't execution — it's discovering what's worth doing next.
- Programs Over Prompts
The temptation in agent systems is to make everything a prompt. But most of the work is deterministic — and deterministic work deserves code, not suggestions.
- The Pipeline Paradox
Monitoring systems need consumers before they need features
- When to Make Your Pipeline Agentic
Most LLM pipelines don't need agents. The ones that do share a specific pattern — the step needs to decide what to do next, not just process what it's given.
- The CLI Boundary
Which parts of an AI dev workflow can be wrapped in a CLI, and which can't — learned the hard way by building the wrong thing and measuring it.
- Guardrails Beat Guidance
Prompt instructions are suggestions. Hooks are constraints. One survives a model swap.
- Your Output Is Your Selections
AI commoditises execution. What remains is taste — the 'that's the one' reflex. And the only way to sharpen it is to ship and see what reality says back.
- The Skill Is Knowing What Matters
The bottleneck in a world of AI tools isn't crafting the output — it's knowing which output is worth crafting.
- Push Not Pull
AI agents that require you to go looking for their results aren't agents — they're automation with better UX. The loop closes when results arrive, not when you remember to check.
- The Human Bus Problem
Adding more AI tools doesn't make you faster if you're still the junction between every agent step.
- The Identification Problem
Having great AI delegation tools and not using them isn't a tool problem — it's a pattern recognition problem, and that distinction changes everything.
- The Last 10% Is the Feedback Loop
The execution layer of an AI system is only half the infrastructure — the reporting layer is what determines whether anyone acts on the results.
- The Session Boundary Is Why You Still Don't Have AI Agents
The gap between AI assistants and AI agents isn't about reasoning capability — it's about whether the thing can survive your laptop closing.
- Agentic AI in Production Looks Like a Workflow
The gap between 'agentic AI' hype and what actually ships in production turns out to be a workflow — and that's a feature, not a failure.
- Shifting Priors Is Not Finding Truth
An experiment with AI deliberation revealed something uncomfortable: accumulating confident opinions feels like convergence on truth, but isn't.
- The Deliberation Format Is the Product
I ran an experiment to find where multi-model deliberation adds value. The answer surprised me: it's the structured format, not the model diversity.
- Stop Asking Which AI Model Is Better. Ask Which Phase.
The planning/execution split is more useful than any benchmark comparison.
- LLM evals aren't data science
Evaluating LLM systems requires judgment, not statistics. That shifts who's qualified to do it — and where the gap is in most organisations.
- Progressive disclosure in MCP tools
When building MCP servers, search should return scannable summaries — not full content. Let the model decide what to read.