Posts about ai-engineering
-
Biology as a Design Constraint: How Cell Biology Names Generate Architecture
Using cell biology naming not as metaphor but as engineering manual — how mTOR's biology predicted circuit breakers, autophagy, and negative feedback loops before we designed them.
-
Your AI Agent's Quality Gate Is Lying to You
A 96% rejection rate that was actually a 96% false positive rate — how a monitoring blind spot turned a productive overnight batch into apparent failure.
-
Test-first dispatch for AI coding agents
The architect writes the tests. The implementer makes them pass. No prose specs, no circular validation.
-
Correctness is model-determined
I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.
-
I made my coding agent dispatch system improve itself
mtor dispatched a coding task to improve itself — the tool that sends work to AI agents was improved by an AI agent.
-
What 16,000 Simon Willison posts reveal about the state of AI coding agents
Analysis of Simon Willison's blog corpus reveals AI coding agents crossed a reliability threshold in late 2025 and are now reshaping software engineering.
-
The Navigation Problem in Agent Flywheels
Your agent system shouldn't stop when the task list is empty. The real bottleneck isn't execution — it's discovering what's worth doing next.
-
Programs Over Prompts
The temptation in agent systems is to make everything a prompt. But most of the work is deterministic — and deterministic work deserves code, not suggestions.
-
The Pipeline Paradox
Monitoring systems need consumers before they need features.
-
When to Make Your Pipeline Agentic
Most LLM pipelines don't need agents. The ones that do share a specific pattern — the step needs to decide what to do next, not just process what it's given.
-
The CLI Boundary
Which parts of an AI dev workflow can be wrapped in a CLI, and which can't — learned the hard way by building the wrong thing and measuring it.
-
Guardrails Beat Guidance
Prompt instructions are suggestions. Hooks are constraints. One survives a model swap.
-
Your Output Is Your Selections
AI commoditises execution. What remains is taste — the 'that's the one' reflex. And the only way to sharpen it is to ship and see what reality says back.
-
The Skill Is Knowing What Matters
The bottleneck in a world of AI tools isn't crafting the output — it's knowing which output is worth crafting.
-
Push Not Pull
AI agents that require you to go looking for their results aren't agents — they're automation with better UX. The loop closes when results arrive, not when you remember to check.
-
The Human Bus Problem
Adding more AI tools doesn't make you faster if you're still the junction between every agent step.
-
The Identification Problem
Having great AI delegation tools and not using them isn't a tool problem — it's a pattern recognition problem, and that distinction changes everything.
-
The Last 10% Is the Feedback Loop
The execution layer of an AI system is only half the infrastructure — the reporting layer is what determines whether anyone acts on the results.
-
The Session Boundary Is Why You Still Don't Have AI Agents
The gap between AI assistants and AI agents isn't about reasoning capability — it's about whether the thing can survive your laptop closing.
-
Agentic AI in Production Looks Like a Workflow
The gap between 'agentic AI' hype and what actually ships in production turns out to be a workflow — and that's a feature, not a failure.
-
Shifting Priors Is Not Finding Truth
An experiment with AI deliberation revealed something uncomfortable: accumulating confident opinions feels like convergence on truth, but isn't.
-
The Deliberation Format Is the Product
I ran an experiment to find where multi-model deliberation adds value. The answer surprised me: it's the structured format, not the model diversity.
-
Stop Asking Which AI Model Is Better. Ask Which Phase.
The planning/execution split is more useful than any benchmark comparison.
-
LLM evals aren't data science
Evaluating LLM systems requires judgment, not statistics. That shifts who's qualified to do it — and where the gap is in most organisations.
-
Progressive disclosure in MCP tools
When building MCP servers, search should return scannable summaries — not full content. Let the model decide what to read.