Posts about experiment

17 Mar 2026
The Treadmill and the Loop
Getting ahead of AI best practices is a treadmill. The durable skill is testing assumptions faster than they expire.
17 Mar 2026
Personas Exploit a Blind Spot in LLM-as-Judge Evaluation
Persona prompting generates the exact type of hallucination that automated LLM judges reward as 'depth.' Two experiments, blind evaluation, and a fact-check that flipped the finding.
16 Mar 2026
The Persona Paradox in AI Agent Teams
Personas hurt for structured tasks, help for judgment-heavy tasks. Two experiments, blind evaluation, frontier models. The distinction is task-dependent, not binary.

The Treadmill and the Loop Getting ahead of AI best practices is a treadmill. The durable skill is testing assumptions faster than they expire.