Topic
evaluation
4 essays on this topic.
Papers
- A framework rejection is not the end of the evaluation
Forty thousand stars are voting on something. The framework verdict was correct. Closing the file was premature: the value still lives in the dependency tree.
- Cross-Model Review: Why Model Diversity Beats Model Capability
When AI models review each other's work, independence matters more than intelligence. The same principle that makes external audit valuable makes cross-model review sharper than same-family review.
- Personas Exploit a Blind Spot in LLM-as-Judge Evaluation
Persona prompting generates the exact type of hallucination that automated LLM judges reward as 'depth.' Two experiments, blind evaluation, and a fact-check that flipped the finding.
- The Debate Round Is Where Value Lives
Independent parallel reviews produce overlapping findings. The cross-critique round produces resolution. That's where multi-agent value actually emerges.