A Skill Is Not a Prompt


The easiest mistake in agent systems is to rename prompts and think the architecture improved.

Garry Tan planted the seed of this neatly: “Skillify it.” The phrase works because it points away from a better prompt and toward a more complete capability.

A prompt is an instruction. It tells the model what to do in this moment, under this context, with whatever discipline the model can still summon by the time it reaches the relevant paragraph. Sometimes that is enough. A one-off request does not need infrastructure. If the task is ephemeral, the instruction can be ephemeral too.

A skill is different. A skill should exist only when the same kind of judgment keeps recurring. It is the place where the system admits that the work has a shape. The model should not have to rediscover the same operating rule every time. The human should not have to remember the same caveat every time. The instruction earns a durable home because repetition has exposed something reusable.

But that durable home is not the finished product. A markdown file full of good advice is still only advice. It can drift. It can fire in the wrong situation. It can tell the model to use a tool that no longer works. It can look authoritative while quietly depending on a human to notice whether anything succeeded. That is prompt sprawl with better stationery.

The useful unit is a capability package.

The judgment lives in prose because judgment often needs prose. It needs context, boundaries, examples, anti-patterns, and the small distinctions that would be awkward or brittle as code. But the parts that require reliability should not stay in prose. File parsing, state checks, retries, source counting, diff inspection, and output validation belong in deterministic code. If the model can hallucinate it and the system cannot afford the hallucination, it should not be left as an instruction.
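To make the split concrete, here is a minimal Python sketch of two of the checks named above, source counting and output validation, moved out of prose and into code. The function names, the URL pattern, and the two-source threshold are illustrative assumptions, not part of any particular framework.

```python
import re

# Assumption: a "source" is any http(s) URL cited in the draft.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def count_distinct_sources(text: str) -> int:
    """Count distinct URLs in text, ignoring trailing sentence punctuation."""
    urls = {match.rstrip(".,;:") for match in URL_PATTERN.findall(text)}
    return len(urls)


def validate_report(text: str, min_sources: int = 2) -> list[str]:
    """Return a list of problems; an empty list means the draft passes."""
    problems = []
    if not text.strip():
        problems.append("report is empty")
    found = count_distinct_sources(text)
    if found < min_sources:
        problems.append(
            f"expected at least {min_sources} distinct sources, found {found}"
        )
    return problems


if __name__ == "__main__":
    draft = "See https://example.com/a. Also https://example.com/b."
    print(validate_report(draft))
```

The prose half of the skill can still say when sources matter and which ones to trust. Whether the draft actually contains enough of them is no longer something the model is asked to promise.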

Then the package needs checks. The code needs tests. The skill needs examples that prove it behaves correctly on the cases that caused it to exist. The trigger needs negative examples, because the most expensive skill failure is often not failing to run, but running at the wrong time. A good skill with a bad resolver is worse than no skill. It gives the system confidence in the wrong context.
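A trigger check with negative examples might look like the sketch below. The resolver, the "release notes" skill it guards, and the example requests are all hypothetical; a real resolver could be a classifier or a model call, but the shape of the test stays the same.

```python
def should_trigger(request: str) -> bool:
    """Hypothetical resolver for a skill that writes release notes."""
    text = request.lower()
    mentions_topic = "release notes" in text or "changelog" in text
    asks_to_write = any(verb in text for verb in ("write", "draft", "generate"))
    return mentions_topic and asks_to_write


# Requests the skill should handle.
POSITIVE = [
    "Draft release notes for v2.3",
    "Please generate the changelog for this sprint",
]

# Requests that look similar but should NOT fire the skill.
NEGATIVE = [
    "Where are the release notes stored?",
    "Write a unit test for the parser",
    "Summarize the changelog format we use",
]


def check_trigger() -> list[str]:
    """Return a description of every misrouted example; empty means all pass."""
    failures = []
    for request in POSITIVE:
        if not should_trigger(request):
            failures.append(f"missed: {request}")
    for request in NEGATIVE:
        if should_trigger(request):
            failures.append(f"fired wrongly: {request}")
    return failures


if __name__ == "__main__":
    print(check_trigger())
```

The negative list is the part that tends to get skipped. Each entry records a situation where the skill would have run with confidence and been wrong.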

The moment the capability can touch external state, the bar rises again. Now the package needs idempotency, permissions, observability, and rollback. A skill that only helps the model think can be lightweight. A skill that sends, deletes, commits, purchases, schedules, or mutates records is no longer just an instruction layer. It is part of the control surface.
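As a rough illustration of that higher bar, the sketch below wraps a state-changing action in a gate that handles three of the four concerns: permissions (an allow-list), idempotency (a key derived from the action and its parameters), and observability (an in-memory log). Rollback is left out because it depends on the system being mutated. The class name `ActionGate` and its methods are assumptions made for this example, and a production version would persist the keys and the log rather than hold them in memory.

```python
import hashlib
import json
from typing import Any, Callable


class ActionGate:
    """Hypothetical wrapper around actions that touch external state."""

    def __init__(self, allowed_actions: set[str]) -> None:
        self.allowed_actions = allowed_actions
        self.completed: dict[str, Any] = {}    # idempotency key -> result
        self.log: list[tuple[str, str]] = []   # (outcome, action) pairs

    def _key(self, action: str, params: dict[str, Any]) -> str:
        # Same action with the same parameters always yields the same key.
        payload = json.dumps({"action": action, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(
        self,
        action: str,
        params: dict[str, Any],
        handler: Callable[[dict[str, Any]], Any],
    ) -> Any:
        if action not in self.allowed_actions:
            self.log.append(("denied", action))
            raise PermissionError(f"action not permitted: {action}")
        key = self._key(action, params)
        if key in self.completed:
            # Already done once: return the stored result, do not repeat it.
            self.log.append(("skipped", action))
            return self.completed[key]
        result = handler(params)
        self.completed[key] = result
        self.log.append(("executed", action))
        return result


if __name__ == "__main__":
    sent: list[str] = []
    gate = ActionGate(allowed_actions={"send_email"})

    def send(params: dict[str, Any]) -> str:
        sent.append(params["to"])
        return "ok"

    gate.run("send_email", {"to": "a@example.com"}, send)
    gate.run("send_email", {"to": "a@example.com"}, send)  # retried, not re-sent
    print(sent, gate.log)
```

A retry from the model, or a second invocation of the skill, then repeats the request without repeating the side effect.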

This is the difference between vibe coding and an operating system for agents. The distinction is not whether the instruction is written in Markdown. Markdown is just a convenient container. The distinction is whether the instruction can be invoked at the right time, tested against known failures, backed by deterministic machinery where reliability matters, and bounded when action becomes expensive.

This also explains why some skills should disappear. Once the judgment becomes predictable, the package should move downward. A recurring decision becomes a rule. A rule becomes a test. A repeated tool sequence becomes a script. A skill is not a badge of sophistication. It is a temporary home for judgment that has not yet collapsed into something cheaper and more reliable.

So the question is not “should this be a prompt or a skill?” The question is what part of the work is still judgment, what part is already deterministic, and what proof would let the system carry the responsibility without silently handing it back to the human.

Call it a skill if you want. But build it like a package.