Writing Library

Topic

benchmarks

1 essay on this topic.

Papers

7 Apr 2026
Correctness is model-determined
I benchmarked four AI coding harnesses on 12 tasks, all running the same model. The harness barely affects correctness; the model determines it.

Terry Li AI controls & agent architecture

Thinking out loud, one idea at a time.