1 essay on this topic.
I benchmarked four AI coding harnesses on 12 tasks using the same model. The harness barely matters for correctness — it's all about the model.