Posts about experiment
-
The Treadmill and the Loop
Getting ahead of AI best practices is a treadmill. The durable skill is testing assumptions faster than they expire.
-
Personas Exploit a Blind Spot in LLM-as-Judge Evaluation
Persona prompting generates the exact type of hallucination that automated LLM judges reward as 'depth.' Two experiments, blind evaluation, and a fact-check that flipped the finding.
-
The Persona Paradox in AI Agent Teams
Personas hurt for structured tasks, help for judgment-heavy tasks. Two experiments, blind evaluation, frontier models. The distinction is task-dependent, not binary.