LLMs Are Not High IQ

Practicing for intelligence tests doesn’t improve performance much. IQ tests really seem to measure some innate ability that is relatively unresponsive to training. Processing speed and short-term memory capacity seem to play a role in this ability, but I have yet to encounter a satisfying explanation of exactly why practice helps so little.

What does this mean for the intelligence of LLMs? After all, they’re trained on vast amounts of data and don’t have the same speed or memory limitations humans have. Do they somehow overcome humans’ limitations and increase their IQ with more training data?

The answer isn’t clear, and there are no authoritative studies on the question. The likely reason is that IQ scores for LLMs aren’t very informative, even if the problem of published IQ tests contaminating the training data could be resolved. As a result, few people bother administering IQ tests to LLMs.

There is one paper showing that model size matters, but the more interesting observation is that, so far, frontier LLMs underperform humans on some components of IQ tests, such as Raven’s Progressive Matrices. LLMs aren’t good at visual reasoning, which is a known limitation. In one comparison, GPT-4o achieved an accuracy of up to 68% on advanced matrix reasoning, a category that includes Raven’s, while humans in the 99th percentile achieved up to 95%. I couldn’t find any comparison of the most recent models like GPT-5.2 against human performance.
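
To make the comparison concrete, here is a minimal sketch of how such an accuracy score is computed. The item format and the `ask_model` stub are assumptions for illustration, not the setup used in the paper; a real evaluation would send each rendered matrix to the model and score its answer against the test’s key.

```python
import random

# Assumed Raven's-style item format: eight answer options, one correct.
# Raven's Advanced Progressive Matrices (Set II) has 36 such items.
ITEMS = [
    {"id": i, "choices": list("ABCDEFGH"), "answer": random.choice("ABCDEFGH")}
    for i in range(36)
]

def ask_model(item):
    """Placeholder for a real model call, e.g. an API request that sends
    the rendered matrix image plus the answer choices and returns a label.
    Here it just guesses, which yields the ~12.5% chance baseline."""
    return random.choice(item["choices"])

def accuracy(items, predict):
    """Fraction of items where the model's choice matches the keyed answer."""
    correct = sum(predict(item) == item["answer"] for item in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"accuracy: {accuracy(ITEMS, ask_model):.1%}")
```

Swapping the stub for an actual model call and the synthetic answer key for the test’s real key is all that’s needed to produce accuracy figures like the 68% and 95% cited above.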

It seems likely that improving LLMs’ performance in the domains where they are still relatively weak, such as visual reasoning, will do the most to increase their usefulness.
