What 60K Stars Actually Validates
Garry Tan released gstack — twenty-three Claude Code skills encoding his judgment as a startup CEO. It hit 60K stars in days. I have been building the same thing for months. Different domain, same architecture: markdown skills, role-based decomposition, filesystem state between sessions, hooks for deterministic enforcement. Neither of us copied the other. We converged.
When two people solving different problems arrive at the same structural decisions independently, the structure is probably load-bearing. That is what 60K stars validates — not Garry Tan’s specific opinions about code review, but the SKILL.md file as a platform primitive.
Twenty-three skills organised as a virtual engineering team. CEO review, eng manager review, design review, security officer, QA, release manager, investigate, learn. Each encodes a role with explicit judgment rules, not generic instructions. The interesting parts are not the roles. They are the enforcement patterns underneath.
His security and engineering review skills score every finding from one to ten and suppress anything below five from the main report. Low-confidence findings go to an appendix. This is obvious in retrospect: every review skill I have seen, including mine, reports everything at equal weight, which means the important findings drown in noise.

His anti-sycophancy constraint is not "be honest"; that is too vague for an LLM. Instead: never say "that is an interesting approach," take a position. Never say "could work," say whether it will work based on evidence. Never say "there are many ways to think about this," pick one. These are specific banned phrases that close the exact escape hatches LLMs actually use.

When his CEO review skill identifies potential scope expansions, it presents each one as a separate question with effort estimate, risk, and a recommendation. Not "here are five things, yes or no," but one at a time, each with context.

And his investigate skill auto-freezes edits to the narrowest directory after forming a hypothesis. The freeze state is written to a file and enforced by a PreToolUse hook. The LLM cannot be prompted to bypass it. Deterministic safety, not polite instruction.
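The freeze hook is the part worth stealing, because it is enforced in code rather than in the prompt. Here is a minimal sketch of what such a hook could look like, assuming Claude Code's PreToolUse convention of passing the tool call as JSON on stdin and treating exit code 2 as a block. The `.freeze` file name, its format, and the tool names checked are illustrative assumptions, not gstack's actual implementation.

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that refuses edits outside a frozen directory.

Assumptions: the tool call arrives as JSON on stdin with "tool_name" and
"tool_input" keys; exit code 2 blocks the call and feeds stderr back to the
model. The .freeze file (hypothetical) holds the one directory still editable.
"""
import json
import os
import sys

FREEZE_FILE = ".freeze"  # hypothetical location of the freeze state


def is_allowed(freeze_dir: str, file_path: str) -> bool:
    """Allow an edit only if the target path lies inside the frozen scope."""
    target = os.path.abspath(file_path)
    scope = os.path.abspath(freeze_dir)
    return os.path.commonpath([target, scope]) == scope


def main() -> int:
    if not os.path.exists(FREEZE_FILE):
        return 0  # no freeze active; allow everything
    with open(FREEZE_FILE) as f:
        freeze_dir = f.read().strip()
    call = json.load(sys.stdin)
    if call.get("tool_name") not in ("Edit", "Write"):
        return 0  # only file-modifying tools are gated
    path = call.get("tool_input", {}).get("file_path", "")
    if path and not is_allowed(freeze_dir, path):
        print(f"Edits are frozen to {freeze_dir}; refusing {path}",
              file=sys.stderr)
        return 2  # exit code 2 blocks the tool call
    return 0

# Entry point when wired up as a hook: sys.exit(main())
```

The point is that the decision runs outside the model: no prompt injection, no persuasive argument, no jailbreak changes what the hook returns.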
I built a system called vivesca over the past year. Different domain — consulting and personal infrastructure instead of startup engineering. But the architectural decisions are identical. Skills are markdown files with YAML frontmatter and prose bodies. Each skill encodes a role, not a procedure. Cross-session state lives on the filesystem. Deterministic enforcement uses hooks, not prompts.
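To make the shared format concrete, here is a hypothetical skill file in the shape both systems use: YAML frontmatter identifying the skill, a prose body encoding the role's judgment. The `name` and `description` fields follow Anthropic's skill frontmatter; the body paraphrases the confidence-gating and anti-sycophancy rules described above, and is my illustration, not a file from either repository.

```markdown
---
name: security-review
description: Review a change as a security officer. Use before merging
  anything that touches auth, secrets, or user data.
---

Act as a security officer reviewing this change.

Score every finding from 1 to 10 for confidence. Report only findings
scoring 5 or above in the main report; move the rest to an appendix.

Never say a finding "could be" a problem. State whether it is, based on
evidence in the diff, and take a position on whether to ship.
```

Everything load-bearing is plain text: debuggable with a text editor, diffable in version control, composable with other skills.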
Where we diverge is portability. His skills are generic — they encode startup CEO judgment that applies to any codebase. That is why they have 60K stars. Mine are domain-tuned — they encode consulting workflows, regulatory monitoring, career management, content publishing. They assume my infrastructure, my tools, my context. Same format, same architecture, completely different knowledge. His are portable judgment. Mine are situated judgment. Both are SKILL.md files.
The SKILL.md pattern is converging from multiple directions. Anthropic ships it natively. Vercel, Microsoft, and independent developers all publish skills in the same format. Garry Tan arrives at it independently from the user side. When a format converges this fast without a standards body pushing it, the format is solving a real problem. LLMs are capable but lack judgment. Skills are how you inject judgment without fine-tuning. A markdown file that says “think like a security officer with these specific constraints” is a cheaper, more debuggable, more composable unit of judgment than any alternative.
The patterns gstack validated — confidence gating, anti-sycophancy constraints, opt-in ceremonies, hook enforcement — are not one person’s innovations. They are the patterns that survive contact with heavy daily usage. Anyone building twenty-plus skills and using them seriously will discover them, because the defaults fail at scale. The 60K stars are not for the markdown. They are for the judgment encoded in the markdown. That is what is hard to produce and easy to consume.