It’s tempting to treat the ability to summarise well as a proxy for intelligence. The reasoning is seductive: compression demands understanding. You can’t reliably discard the inessential without grasping what’s essential. This is why Feynman’s “explain it simply” test feels right — lossy compression that preserves meaning requires a genuine model of the subject.
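Taken literally, that last claim is information theory: an ideal code spends -log2(p) bits on a symbol of probability p, so the better your model of the source, the shorter the compressed output. Here is a toy sketch of that equivalence; the sample text, variable names, and character-level framing are all illustrative, not anything from the argument itself:

```python
import math
from collections import Counter

def code_length_bits(text: str, model: dict) -> float:
    """Ideal code length of `text` under a character model: an optimal
    code spends -log2(p) bits on a symbol of probability p (Shannon)."""
    return sum(-math.log2(model[ch]) for ch in text)

text = "the essential and the inessential"

# A know-nothing model: every character in the text equally likely.
alphabet = set(text)
uniform = {ch: 1 / len(alphabet) for ch in alphabet}

# A model that has "understood" the text's character statistics.
counts = Counter(text)
fitted = {ch: n / len(text) for ch, n in counts.items()}

print(f"uniform model: {code_length_bits(text, uniform):.1f} bits")
print(f"fitted model:  {code_length_bits(text, fitted):.1f} bits")
```

The fitted model "understands" nothing beyond letter frequencies, yet it already compresses better; a model that understood words, then syntax, then meaning would do better still. That is the sense in which compression tracks understanding.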
But the proxy breaks down when you look at what summarisation doesn’t test.
Summarisation requires abstraction, salience detection, and hierarchical modelling — real cognitive abilities. What it doesn’t require is generative ability (creating novel structures, not just compressing existing ones), adversarial reasoning (stress-testing ideas rather than faithfully condensing them), or transfer (applying a compressed model in a new domain).
Large language models make the case cleanly. They are superb summarisers, yet arguably weak on all three of those dimensions. If summarisation were a reliable marker of general intelligence, we'd have to accept that GPT-4 is broadly intelligent, a claim most people working closely with these systems would push back on.
The sharper framing: good summarisation is a hallmark of comprehension, and comprehension is a component of intelligence, not the whole of it. The fuller picture probably requires compression plus generation plus calibration, the ability to know what you don't know.
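Of the three, calibration is the one that's easiest to pin down numerically. Here is a minimal sketch of one common measure, expected calibration error, with toy data and a simple equal-width binning scheme chosen purely for illustration:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bucket predictions by stated confidence, then compare average
    confidence to actual accuracy within each bucket. Zero means the
    model's confidence matches how often it is actually right."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight bucket by its share of data
    return ece

# Toy data: a model that claims 90% confidence but is right 60% of the
# time is overconfident; its 60%-confidence claims are well calibrated.
conf = [0.9, 0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6, 0.6]
hits = [1, 0, 1, 0, 1, 1, 0, 1, 1, 0]
print(f"ECE: {expected_calibration_error(conf, hits):.3f}")
```

A system that says "I'm 60% sure this is the main point" and is right 60% of the time is, in this narrow but measurable sense, one that knows what it doesn't know.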
Interestingly, the inverse holds more reliably. Poor summarisation is a fairly strong signal of poor comprehension. The test is more useful as a filter than a crown — better at identifying what’s missing than confirming what’s present.