Intelligence is a Shape, Not a Scalar
François Chollet posted something recently that I keep thinking about. It sounds reasonable and is mostly wrong:
One of the biggest misconceptions people have about intelligence is seeing it as some kind of unbounded scalar stat, like height. “Future AI will have 10,000 IQ”, that sort of thing. Intelligence is a conversion ratio, with an optimality bound. Increasing intelligence is not so much like “making the tower taller”, it’s more like “making the ball rounder”. At some point it’s already pretty damn spherical and any improvement is marginal.
He’s right about the scalar part. Intelligence is not height. “10,000 IQ” is meaningless. He’s right that there are diminishing returns near an optimum. He’s right that speed, memory, and recall are separate from the core conversion ratio.
Where he’s wrong is the ball.
The Claim
Chollet defines intelligence as the efficiency with which a system converts experience into generalizable models. Sample efficiency. How little data do you need to see before you can handle novel situations? This is a clean definition. It has a theoretical optimum (Solomonoff induction), and Chollet’s claim is that human intelligence is already close to that optimum. The ball is already pretty round.
The supporting evidence is real. Humans score ~85% on ARC (the Abstraction and Reasoning Corpus, which Chollet designed to measure exactly this). Current AI systems, with vastly more data and compute, score significantly lower. Human sample efficiency on fluid reasoning tasks is genuinely impressive. We generalize from very few examples. We transfer knowledge across domains. We build theoretical models that predict situations we have never encountered.
Chollet also argues that the advantages machines will have (processing speed, unlimited working memory, perfect recall) are “mostly things humans can also access through externalized cognitive tools.” Calculators, databases, notebooks. The scaffolding can be externalized. The core intelligence is already near-optimal.
This is a good argument. I think it’s wrong in three ways, and the third way is the one that worries me.
No Free Lunch
The No Free Lunch theorem says: averaged over all possible problems, no algorithm outperforms any other. Any algorithm that performs well on one class of problems must perform correspondingly poorly on another. Optimality is always relative to a distribution.
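You can check this by brute force on the smallest interesting case. The sketch below (my illustration, not from the post; the two search orders are arbitrary choices) enumerates every Boolean function on a three-point domain and shows that two different fixed-order searchers need, on average, exactly the same number of queries to find a maximum:

```python
from itertools import product

# All Boolean functions on a 3-point domain: 2**3 = 8 of them,
# each represented as a tuple (f(0), f(1), f(2)).
DOMAIN = [0, 1, 2]
functions = list(product([0, 1], repeat=len(DOMAIN)))

def queries_to_find_max(f, order):
    """Queries a fixed-order searcher needs to find a 1 (the maximum),
    charging the full domain size when no 1 exists."""
    for i, x in enumerate(order, start=1):
        if f[x] == 1:
            return i
    return len(order)

searcher_a = [0, 1, 2]   # scans left to right
searcher_b = [2, 1, 0]   # scans right to left

avg_a = sum(queries_to_find_max(f, searcher_a) for f in functions) / len(functions)
avg_b = sum(queries_to_find_max(f, searcher_b) for f in functions) / len(functions)

print(avg_a, avg_b)  # identical: averaged over ALL functions, neither searcher wins
```

Either searcher beats the other on particular functions (searcher_a finds the maximum of (1, 0, 0) in one query, searcher_b needs three), but over the uniform distribution of all functions the averages are equal. Performing well somewhere costs you somewhere else.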