Posts about ai-agents
-
Why I didn't package my AI organism
I designed an elegant framework installer for my personal AI system. Then I listed the hard problems and shipped a three-hour cleanup instead.
-
The rename that built a tool
I renamed one concept across 130 files. The pain crystallized into a tool that will do the next rename in minutes.
-
The Boundary Is an Assessment
The tool/skill distinction isn't a property of the capability. It's a property of the context it operates in.
-
The Test Before the Output
The line between tool and skill is whether you can write the test before seeing the result.
-
Judgment Is a Moving Boundary
The line between tool and skill isn't a property of the task. It's a property of how well you understand the task.
-
Skills Should Die
Every AI skill should be trying to make itself unnecessary. The ones that survive are the ones that haven't been understood yet.
-
The LLM Is the Tool
When the transformation is predictable, the LLM is just a runtime. A cheaper, more flexible runtime than custom code.
-
270 Agents While I Slept
I ran an autonomous agent loop overnight — 43 waves, ~270 dispatches, ~250 vault files produced. Here's what I learned about building systems that work while you sleep.
-
The Unexplainable Alpha
In AI agent systems, execution commoditizes. Research commoditizes. Coordination commoditizes. Taste — the ability to forecast what will matter — is the bottleneck that doesn't automate away.
-
The Navigation Problem in Agent Flywheels
Your agent system shouldn't stop when the task list is empty. The real bottleneck isn't execution — it's discovering what's worth doing next.
-
Programs Over Prompts
The temptation in agent systems is to make everything a prompt. But most of the work is deterministic — and deterministic work deserves code, not suggestions.
-
Taste Is the Bottleneck
When you can run 60 agents overnight, knowing what to build matters more than building it.
-
Meta-Skills Are the Multiplier
We cut from 181 skills to 35 and added a 15-row routing table. Behavior improved across the board. The lesson: meta-skills compound, tool wrappers just add.
-
Optimize for Routing, Not Tokens
With 1M context windows, token savings are rounding error. The real metric is P(right tool | user intent) — does your agent reach for the right tool at the right moment?
-
The Reliability Hierarchy: Hooks, Rules, Skills
In AI agent systems, use the most reliable trigger mechanism that fits — yet most builders reach for skills for everything, making the weakest mechanism their default.
-
Skills as Prototype, MCP as Production
Skills and MCP servers aren't competitors. They're different stages of the same lifecycle. Build the procedure as a skill first. Graduate the tool parts to MCP when they stabilize.
-
The Three Paradigms of Agent Knowledge
Agent knowledge systems have three fundamental paradigms: static context, dynamic tools, and retrieval. Most stop at two. The third is the biggest unexploited opportunity.
-
Match Form to Access Pattern
The governing principle for structuring knowledge in AI agent systems isn't 'always atomic' — it's matching how knowledge is stored to how it's accessed.
-
Play Within the Design
Every AI coding platform has mechanisms designed for specific purposes. Using them as intended beats clever hacks — and the reason is deeper than cleanliness.
-
Stealing from Peers: A Truth-Seeking Discipline
Most people scan competitors for positioning. I scan them for transferable patterns — and route each steal to every domain it applies to.
-
The Boring Future of AI Agents
The real arrival of AI agents isn't spectacular. It's when you stop noticing.
-
The Treadmill and the Loop
Getting ahead of AI best practices is a treadmill. The durable skill is testing assumptions faster than they expire.
-
Personas Exploit a Blind Spot in LLM-as-Judge Evaluation
Persona prompting generates the exact type of hallucination that automated LLM judges reward as 'depth.' Two experiments, blind evaluation, and a fact-check that flipped the finding.
-
The Persona Paradox in AI Agent Teams
Personas hurt for structured tasks, help for judgment-heavy tasks. Two experiments, blind evaluation, frontier models. The distinction is task-dependent, not binary.
-
The Debate Round Is Where Value Lives
Independent parallel reviews produce overlapping findings. The cross-critique round produces resolution. That's where multi-agent value actually emerges.
-
Planning Needs Eyes
A 3-pass AI planning pipeline caught 0 of 6 design issues. The same planning done in-session with tool access caught 2.5. Planning isn't a prompt problem — it's a tools problem.
-
Put the Rule Where It Fires
Documenting a rule is half a loop. The rule only works when it fires at the moment of decision — not when it sits in a file nobody reads.
-
What Human Memory Teaches AI Agents (and What It Doesn't)
A calculator doesn't simulate forgetting — it manages its context budget. What to cherry-pick from cognitive science for AI agent memory, and what to leave behind.
-
The MTEB Leader Barely Beats a Free Model on Agent Memory
I benchmarked 10 memory backends and multiple embedding models on actual agent memory retrieval. The results challenge common assumptions about what matters.
-
The Agent Governance Gap Is Already Here
Agentic AI isn't a future governance problem — it arrived ungoverned, and this week saw the first enforcement action.