Topic

ai-engineering

27 essays on this topic.

  1. Move the gate to the package manager

    When a supply chain attack lands and the timeline is asking for discipline, the durable fix is one layer down — at the package manager, not at your attention span.

  2. How I Used 6 LLMs to Write One Word Doc Comment

    The value of running six frontier models isn't six perspectives — it's six chances to be wrong, which means you can set a much higher bar for what counts as right.

  3. Biology as a Design Constraint: How Cell Biology Names Generate Architecture

    Using cell biology naming not as metaphor but as engineering manual — how mTOR's biology predicted circuit breakers, autophagy, and negative feedback loops before we designed them.

  4. Your AI Agent's Quality Gate Is Lying to You

    A 96% rejection rate that was actually a 96% false positive rate — how a monitoring blind spot turned a productive overnight batch into apparent failure.

  5. Test-first dispatch for AI coding agents

    The architect writes the tests. The implementer makes them pass. No prose specs, no circular validation.

  6. I made my coding agent dispatch system improve itself

    I dispatched a 952-line monolithic CLI through my own coding-agent dispatch system to be refactored into seven modules. It worked. Notes on what self-bootstrap reveals about agent harness design.

  7. What 16,000 Simon Willison posts reveal about the state of AI coding agents

    I scraped 16,181 of Simon Willison's posts and analysed the 395 from 2026. An inflection in November 2025, GLM-5 closing the gap, and why the harness — not the model — is the competitive moat.

  8. Correctness is model-determined

    I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.

  9. The Navigation Problem in Agent Flywheels

    Your agent system shouldn't stop when the task list is empty. The real bottleneck isn't execution — it's discovering what's worth doing next.

  10. Programs Over Prompts

    The temptation in agent systems is to make everything a prompt. But most of the work is deterministic — and deterministic work deserves code, not suggestions.

  11. The Pipeline Paradox

    Monitoring systems need consumers before they need features.

  12. When to Make Your Pipeline Agentic

    Most LLM pipelines don't need agents. The ones that do share a specific pattern — the step needs to decide what to do next, not just process what it's given.

  13. The CLI Boundary

    Which parts of an AI dev workflow can be wrapped in a CLI, and which can't — learned the hard way by building the wrong thing and measuring it.

  14. Guardrails Beat Guidance

    Prompt instructions are suggestions. Hooks are constraints. One survives a model swap.

  15. Your Output Is Your Selections

    AI commoditises execution. What remains is taste — the 'that's the one' reflex. And the only way to sharpen it is to ship and see what reality says back.

  16. The Skill Is Knowing What Matters

    The bottleneck in a world of AI tools isn't crafting the output — it's knowing which output is worth crafting.

  17. Push Not Pull

    AI agents that require you to go looking for their results aren't agents — they're automation with better UX. The loop closes when results arrive, not when you remember to check.

  18. The Human Bus Problem

    Adding more AI tools doesn't make you faster if you're still the junction between every agent step.

  19. The Identification Problem

    Having great AI delegation tools and not using them isn't a tool problem — it's a pattern recognition problem, and that distinction changes everything.

  20. The Last 10% Is the Feedback Loop

    The execution layer of an AI system is only half the infrastructure — the reporting layer is what determines whether anyone acts on the results.

  21. The Session Boundary Is Why You Still Don't Have AI Agents

    The gap between AI assistants and AI agents isn't about reasoning capability — it's about whether the thing can survive your laptop closing.

  22. Agentic AI in Production Looks Like a Workflow

    The gap between 'agentic AI' hype and what actually ships in production turns out to be a workflow — and that's a feature, not a failure.

  23. Shifting Priors Is Not Finding Truth

    An experiment with AI deliberation revealed something uncomfortable: accumulating confident opinions feels like convergence on truth, but isn't.

  24. The Deliberation Format Is the Product

    I ran an experiment to find where multi-model deliberation adds value. The answer surprised me: it's the structured format, not the model diversity.

  25. Stop Asking Which AI Model Is Better. Ask Which Phase.

    The planning/execution split is more useful than any benchmark comparison.

  26. LLM evals aren't data science

    Evaluating LLM systems requires judgment, not statistics. That shifts who's qualified to do it — and where the gap is in most organisations.

  27. Progressive disclosure in MCP tools

    When building MCP servers, search should return scannable summaries — not full content. Let the model decide what to read.