AI Rationalizations

AIs like ChatGPT’s o3 take time to think before they answer. While doing so, o3 provides some commentary on its thinking process. For example, it mentions the websites it consults to find an answer. The strange thing is that much of the commentary may not at all be related to how o3 comes up with its final answer. When I ask what it got from one of the websites it said it was looking at, it confesses that it didn’t actually go there. There are plenty of other reports of the thought processes it displays being only vaguely related to the answer.

It’s ironic that we don’t fully understand the minds that we have created. We don’t know how our own minds work, but why is it so hard to understand what’s going on under the hood of o3? Why does the tendency for post-hoc rationalizations for how we come up with thoughts seem to be universal? Is there something about the emergence of thought that is not amenable to causal explanations?