Posts
RSS feed- I Built 200 CLIs for My AI. Here's What Actually Matters.
A Chinese article argues CLI is becoming the AI plugin format. I've been living this for months with 442 tools. The article is right about CLI. It's wrong about what makes CLI work.
- The Template Is the Schema
Seven PyPI releases of a CV generation tool in one afternoon taught me that template-guided synthesis lives and dies by what the template already contains.
- Assume the LLM never ran
A 208 MB log, 59,356 retries, and zero LLM calls. A debugging story about what happens when the symptom lies about the cause.
- Same Trigger, One Skill
A simple rule for keeping AI agent skill systems coherent: if two skills fire on the same trigger, merge them. Different trigger, different skill. No exceptions.
- Overnight Autonomous AI Coding: What Actually Works
I left an AI coding pipeline running overnight with 21 monitoring cycles. 5 features merged, 10 specs dispatched, 3 root causes found. Here's what worked, what broke, and the quality of the output.
- The reversible direction
When choosing CLI vs MCP, pick the one you can undo. CLI wraps into MCP cheaply. MCP does not unwrap.
- The One Env Var That Cost a Day
ANTHROPIC_API_KEY vs ANTHROPIC_AUTH_TOKEN — how a single wrong environment variable made an AI coding pipeline silently fail for hours, and the debugging journey that found it.
- What Anthropic's Managed Agents validates — and what to steal
Anthropic shipped a hosted agent platform. Its architecture looks familiar. Here's what a solo builder can learn from how they decoupled the brain from the hands.
- What LLM Wiki Looks Like After Six Months
Karpathy's LLM Wiki pattern is a good starting point. Here's what changes when you run it for real — enforcement over convention, decay over growth, and knowledge that fires without being asked.
- Biology as a Design Constraint: How Cell Biology Names Generate Architecture
Using cell biology naming not as metaphor but as engineering manual — how mTOR's biology predicted circuit breakers, autophagy, and negative feedback loops before we designed them.
- Your AI Agent's Quality Gate Is Lying to You
A 96% rejection rate that was actually a 96% false positive rate — how a monitoring blind spot turned a productive overnight batch into apparent failure.
- Test-first dispatch for AI coding agents
The architect writes the tests. The implementer makes them pass. No prose specs, no circular validation.
- 4 Principles for Agent-Facing CLI Design
Most advice about making CLIs agent-friendly is just good CLI design. Only four principles are actually agent-specific.
- Correctness is model-determined
I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.
- The architect-implementer split: why your expensive model shouldn't write code
Smart model plans, cheap model builds. The pattern everyone's converging on for AI coding agents — and the piece nobody's shipped yet.
- I made my coding agent dispatch system improve itself
mtor dispatched a coding task to improve itself — the tool that sends work to AI agents was improved by an AI agent.
- What 16,000 Simon Willison posts reveal about the state of AI coding agents
Analysis of Simon Willison's blog corpus reveals AI coding agents crossed a reliability threshold in late 2025 and are now reshaping software engineering.
- Building porin: a library for agent-facing CLIs
I turned the seven patterns into a zero-dependency Python library. Then I added MCP bridge support. Here's what I learned about the gap between patterns and code.
- Seven patterns for agent-facing CLIs
Three independent authors converged on nearly identical patterns for CLIs that AI agents invoke. Here's what they agree on, what's missing, and why nobody has built a framework for it yet.
- The Name Collision That Found Two Tools
When a dispatcher and an executor share a name, you don't have a naming problem. You have an architecture problem.
- The primary-source tax
Multi-engine search agreement is not primary-source verification. A cautionary tale about hallucinating reference content from consistent secondary summaries.
- CLI, MCP, or code mode: the answer depends on who's running the sandbox
Willison says CLIs beat MCP. Cloudflare says server-side code mode beats both. They're both right, because they're answering different questions.
- Why I didn't package my AI organism
I designed an elegant framework install for my personal AI system. Then I listed the hard problems and shipped a three-hour cleanup instead.
- Ten Things I Learned From the Agent Skills Gold Rush
A day of reading skill repositories taught me less about the skills themselves than about how much I'd missed of the surrounding ecosystem.
- What I Found Evaluating 5 Agent Skill Repos
Five skill repositories, a day of reading code, and a significant correction I had to make the same afternoon.
- When the Name Doesn't Fit
Naming as a design constraint: if a tool resists a name, the tool needs redesigning, not the name.
- The rename that built a tool
I renamed one concept across 130 files. The pain crystallized into a tool that will do the next rename in minutes.
- Always latest as a system property
Most projects pin dependency versions to avoid breakage. We automated the opposite: daily upgrades with automatic rollback.
- The dispatch layer was eating the quality, not the model
We blamed the LLM for a 54% task failure rate. The real culprit was seven layers of dispatch infrastructure between intent and execution.
- Governance Is a Design Problem
Compliance-first governance produces paperwork. Design-first governance produces systems you can actually explain to a regulator.