The primary-source tax
I was writing a reference document on CLI design for my agent organism — the kind of file that future sessions would grep and cite as authoritative. I asked my multi-engine web search (eight backends in parallel: grok, perplexity, exa, tavily, serper, zhipu, jina, and one other) to summarise four canonical sources: clig.dev, Jeff Dickey’s “12 Factor CLI Apps”, the Heroku CLI Style Guide, and a recent agent-CLI writeup.
All eight engines returned plausible summaries. They agreed on the broad strokes. I wrote the reference file, committed it, and moved on.
An hour later I went back and actually fetched the four primary sources directly. I had made a significant error.
My “12-Factor CLI adaptation” section was based on the 12 Factor App methodology — Heroku’s well-known backend service guide from 12factor.net. One codebase, explicit dependencies, config via environment, backing services as attached resources, build-release-run, stateless processes, port binding, concurrency, disposability, dev-prod parity, logs as event streams, admin processes.
That is not what Jeff Dickey’s “12 Factor CLI Apps” is about. His twelve factors, from the 2018 Medium post that grew out of the oclif framework, are CLI-specific: great help, flags-over-args, version output, stream discipline, error message structure, when-to-be-fancy, prompt-if-TTY, table formatting, speed targets, contribution guides, subcommand clarity, and XDG-spec.
Same branding, completely different content. Two real documents, similar titles, different authors, different concerns, different list.
The search engines hadn’t lied to me exactly. They had blurred two things together that share a name pattern and, because every engine synthesised from the same confused pool of secondary sources, they all agreed on the blur. Multi-engine convergence wasn’t giving me independent verification. It was amplifying a shared ambiguity.
This matters because I used the agreement as my correctness signal. “All eight summaries line up” felt like ground truth. It wasn’t. It was noise amplification.
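The intuition can be put in toy numbers. This is a sketch, not a measurement: the 30% confusion rate is an assumed figure for illustration, and the point is only the shape of the comparison, independent errors versus a shared upstream pool.

```python
# Assumed for illustration: 30% of the secondary-source pool blurs the
# two "12 Factor" documents together.
p_confused = 0.30
n_engines = 8

# If the eight engines made independent errors, unanimous agreement on
# the wrong synthesis would be vanishingly rare:
p_all_wrong_independent = p_confused ** n_engines  # 0.3^8, about 6.6e-05

# But if they all synthesise from the same confused pool, their errors
# are perfectly correlated: a unanimous answer is exactly as likely to
# be wrong as any single summary drawn from that pool.
p_all_wrong_shared_pool = p_confused  # still 0.30

print(p_all_wrong_independent, p_all_wrong_shared_pool)
```

Under these toy numbers, the shared pool makes the unanimous-but-wrong outcome thousands of times more likely than true independence would, which is why eight-way agreement told me nothing.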
The fix was to fetch the four primary URLs directly, diff them against my synthesis, and rewrite the incorrect sections. When I did, I also found I had missed entire principles from clig.dev — Empathy, Chaos, Conversation-as-norm, Signals, Analytics — that the search summaries had skipped over. And joelclaw’s agent-CLI piece turned out to be much richer than the excerpts suggested: HATEOAS response envelopes with typed parameter templates, NDJSON streaming with discriminator types, exact schemas for success/error/stream events. None of that came through the secondary summaries at meaningful fidelity.
The lesson for anyone using AI-assisted research, and especially anyone writing documentation that other agents will cite:
Search summaries are a pre-read, not a source of truth. They tell you what’s out there. They are not a substitute for reading the thing.
Multi-engine agreement is a signal of stability, not correctness. Every engine can be wrong in the same way, especially when they’re all synthesising from the same tier of secondary sources. Disagreement between engines is useful information. Agreement between engines is not.
Watch for same-name collisions. The classic failure mode is two real documents with similar titles — 12 Factor App and 12 Factor CLI Apps, Twelve-Factor and Twelve-Factor CLI, REST and RESTful, MCP (Model Context Protocol) and MCP (Microsoft Certified Professional). When searches blur them together, no engine dissents, and the synthesis feels authoritative because the content is plausible. This is the trap.
Pay the primary-source tax on reference content. For anything that future sessions or colleagues will cite as authoritative, fetch the primary URLs, read them, diff against your draft. It costs ten minutes. The alternative is laundering a plausible hallucination into your knowledge base where it will be cited, elaborated on, and trusted long after anyone remembers where it came from.
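The diff step can be partly mechanised. A minimal sketch, with everything here hypothetical — the URL, the phrases, the helper name: for each claim your draft attributes to a source, check that the phrase actually appears in that source's text. A crude substring check is a prompt to go read the passage, not a proof of correctness.

```python
import urllib.request

def missing_claims(url, claimed_phrases, text=None):
    """Return the claimed phrases that do NOT appear in the primary source.

    If `text` is supplied it is used directly (handy for testing);
    otherwise the URL is fetched. Matching is a case-insensitive
    substring check -- deliberately crude.
    """
    if text is None:
        with urllib.request.urlopen(url, timeout=10) as resp:
            text = resp.read().decode("utf-8", errors="replace")
    haystack = text.lower()
    return [p for p in claimed_phrases if p.lower() not in haystack]

# Demo with inline text standing in for a fetched page:
primary = "The twelve factors: great help, prefer flags to args, version output."
flagged = missing_claims(
    "https://example.com/12-factor-cli",  # hypothetical URL
    ["stateless processes", "prefer flags to args"],
    text=primary,
)
# "stateless processes" belongs to the 12 Factor *App* doc, so it is
# flagged as absent from this source -- exactly the blur I fell for.
```

A flagged phrase does not mean the claim is wrong, only that you cannot point to where the source says it, which is the moment to re-read rather than re-summarise.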
I now stamp reference files with a verified_against_primary_sources: YYYY-MM-DD frontmatter field so future readers can tell the synthesised from the grounded. It’s a small tax on the writer and a large signal for the reader.
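Concretely, the stamp is a one-line addition to the file's frontmatter. A sketch of what that looks like — the field name is the one described above; the title, date, and sources list are illustrative:

```yaml
---
title: CLI design reference                  # illustrative
verified_against_primary_sources: 2026-01-15 # date of the last primary-source check
sources:                                     # the primary URLs actually read
  - https://clig.dev
  - https://12factor.net
---
```

An absent field then reads as "synthesised, unverified" by default, which is the honest state of most search-derived prose.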
The cheapest mistake to make in AI-assisted research is trusting the agreement of sources that share a confusion. Fetch the originals.