Topic

ai-engineering

27 essays on this topic.

Papers

13 May 2026
Move the gate to the package manager
When a supply chain attack lands and the timeline is asking for discipline, the durable fix is one layer down — at the package manager, not at your attention span.
18 Apr 2026
How I Used 6 LLMs to Write One Word Doc Comment
The value of running six frontier models isn't six perspectives — it's six chances to be wrong, which means you can set a much higher bar for what counts as right.
8 Apr 2026
Biology as a Design Constraint: How Cell Biology Names Generate Architecture
Using cell biology naming not as a metaphor but as an engineering manual: how mTOR's biology predicted circuit breakers, autophagy, and negative feedback loops before we designed them.
8 Apr 2026
Your AI Agent's Quality Gate Is Lying to You
A 96% rejection rate that was actually a 96% false positive rate — how a monitoring blind spot turned a productive overnight batch into apparent failure.
7 Apr 2026
Test-first dispatch for AI coding agents
The architect writes the tests. The implementer makes them pass. No prose specs, no circular validation.
7 Apr 2026
I made my coding agent dispatch system improve itself
I dispatched a 952-line monolithic CLI through my own coding-agent dispatch system to be refactored into seven modules. It worked. Notes on what self-bootstrap reveals about agent harness design.
7 Apr 2026
What 16,000 Simon Willison posts reveal about the state of AI coding agents
I scraped 16,181 of Simon Willison's posts and analysed the 395 from 2026. An inflection in November 2025, GLM-5 closing the gap, and why the harness — not the model — is the competitive moat.
7 Apr 2026
Correctness is model-determined
I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.
19 Mar 2026
The Navigation Problem in Agent Flywheels
Your agent system shouldn't stop when the task list is empty. The real bottleneck isn't execution — it's discovering what's worth doing next.
19 Mar 2026
Programs Over Prompts
The temptation in agent systems is to make everything a prompt. But most of the work is deterministic — and deterministic work deserves code, not suggestions.
17 Mar 2026
The Pipeline Paradox
Monitoring systems need consumers before they need features.
14 Mar 2026
When to Make Your Pipeline Agentic
Most LLM pipelines don't need agents. The ones that do share a specific pattern — the step needs to decide what to do next, not just process what it's given.
14 Mar 2026
The CLI Boundary
Which parts of an AI dev workflow can be wrapped in a CLI, and which can't — learned the hard way by building the wrong thing and measuring it.
13 Mar 2026
Guardrails Beat Guidance
Prompt instructions are suggestions. Hooks are constraints. One survives a model swap.
13 Mar 2026
Your Output Is Your Selections
AI commoditises execution. What remains is taste — the 'that's the one' reflex. And the only way to sharpen it is to ship and see what reality says back.
13 Mar 2026
The Skill Is Knowing What Matters
The bottleneck in a world of AI tools isn't crafting the output — it's knowing which output is worth crafting.
13 Mar 2026
Push Not Pull
AI agents that require you to go looking for their results aren't agents — they're automation with better UX. The loop closes when results arrive, not when you remember to check.
13 Mar 2026
The Human Bus Problem
Adding more AI tools doesn't make you faster if you're still the junction between every agent step.
13 Mar 2026
The Identification Problem
Having great AI delegation tools and not using them isn't a tool problem; it's a pattern-recognition problem, and that distinction changes everything.
13 Mar 2026
The Last 10% Is the Feedback Loop
The execution layer of an AI system is only half the infrastructure — the reporting layer is what determines whether anyone acts on the results.
13 Mar 2026
The Session Boundary Is Why You Still Don't Have AI Agents
The gap between AI assistants and AI agents isn't about reasoning capability — it's about whether the thing can survive your laptop closing.
13 Mar 2026
Agentic AI in Production Looks Like a Workflow
The gap between 'agentic AI' hype and what actually ships in production turns out to be a workflow — and that's a feature, not a failure.
13 Mar 2026
Shifting Priors Is Not Finding Truth
An experiment with AI deliberation revealed something uncomfortable: accumulating confident opinions feels like convergence on truth, but isn't.
13 Mar 2026
The Deliberation Format Is the Product
I ran an experiment to find where multi-model deliberation adds value. The answer surprised me: it's the structured format, not the model diversity.
12 Mar 2026
Stop Asking Which AI Model Is Better. Ask Which Phase.
The planning/execution split is more useful than any benchmark comparison.
11 Mar 2026
LLM evals aren't data science
Evaluating LLM systems requires judgment, not statistics. That shifts who's qualified to do it — and where the gap is in most organisations.
11 Mar 2026
Progressive disclosure in MCP tools
When building MCP servers, search should return scannable summaries — not full content. Let the model decide what to read.