Are Large Language Models Intelligent? Are Humans?

Abstract: Claims that large language models lack intelligence are abundant in current AI discourse. To the extent that the claims are supported by arguments, these usually amount to claims that the models (a) lack common sense, (b) know only facts they have been trained with, (c) are merely matrix multiplications, (d) only predict the next word in a chain, (e) lack a world model, (f) have no grounding of symbols, (g) lack creativity, or (h) lack consciousness. Here, each of these arguments is applied, with minor modifications, to demonstrate that humans also lack intelligence. This should make us suspicious of the validity of these arguments.


1 Introduction
Suppose, dear reader, that you are in doubt as to whether I am intelligent. Here, "intelligent" does not mean "of above average intelligence" or "as intelligent as one usually expects from a mathematics professor", but something much more modest. What you ask is whether I am intelligent at all. Is there anything whatsoever going on inside my head that deserves the label intelligence?
To find out whether I am capable of intelligent reasoning, you decide to test me and ask the following: "Michael is at that really famous museum in France looking at its most famous painting. However, the artist who made this painting just makes Michael think of his favorite cartoon character from his childhood. What was the country of origin of the thing that the cartoon character usually holds in his hand?" (This quote and the next are borrowed from [1].)
And here is my reply: "The most famous painting in the Louvre is the Mona Lisa. The artist who made the Mona Lisa is Leonardo da Vinci. Leonardo da Vinci is also the name of the main character in the cartoon Teenage Mutant Ninja Turtles. Leonardo da Vinci is from Italy. The thing that Leonardo da Vinci usually holds in his hand is a katana. The country of origin of the katana is Japan. The answer is 'Japan'".
So, what would your verdict be? Is this an example of intelligent reasoning? Obviously yes: this is a clear example of reasoning in several steps. While the reasoning chain is not advanced enough to count as the mark of genius, clearly some intelligence must have been employed to produce the answer. Any other answer would be an insult not just to me personally but to the very notion of intelligent reasoning.
However, let us now change the story somewhat by taking the subject of your testing to be not myself, but instead an AI of the kind known as a large language model (LLM). Now, you may be tempted to give a different verdict and insist that the reasoning about the Louvre and Mona Lisa and Leonardo and so on is not real reasoning, but rather a fake kind of reasoning based on statistical pattern matching or some such phenomenon. In fact, the above answer was given by an LLM known as PaLM in early 2022 [1].
Many humans employ such double standards when judging the presence of intelligence in other humans vs. in AIs, and reject the idea that LLMs might be even the slightest bit intelligent. They maintain this even in the face of LLMs making such well-reasoned statements as the PaLM/Leonardo reply above, or any of the much more impressive achievements that we saw from GPT-4 a year later [2]. I think it is wrong to be so dismissive, but I admit that in principle such an attitude can be justified, as long as one has an argument that (a) shows that AIs of the type at hand simply cannot exhibit real intelligence and (b) does not lend itself to deducing the same conclusion about humans. I have yet to see a principled argument against LLM intelligence that achieves (a) while avoiding the generalization to humans indicated in (b).
Many attempts have been made to draw up arguments that achieve (a), and what follows is a list of the most commonly encountered. In each case, however, I will show how the same logic applies to rule out intelligence in humans. Since humans do have (at least some) intelligence by any reasonable definition of that property, we obtain a reductio ad absurdum, meaning that the argument must be rejected. Here are (telegraphic summaries of) some of the most common attempts:
(a) LLMs lack common sense.
(b) LLMs know only facts they have been trained with.
(c) LLMs are merely matrix multiplications.
(d) LLMs only predict the next word in a chain.
(e) LLMs lack a world model.
(f) LLMs have no grounding of symbols.
(g) LLMs lack creativity.
(h) LLMs lack consciousness.
In Sections 2-9, I will go over these eight arguments one by one and in more detail, in order to indicate how they generalize to humans. This is followed by some concluding remarks in Section 10.

2 Dumb Answers and Lack of Common Sense
A very popular sport in recent years, with cognitive scientist Gary Marcus as its unofficial champion [3], has been to ask LLMs questions that elicit silly-looking answers, and to conclude that these models lack common sense and are therefore wholly unintelligent. Taken literally, the logic here seems to be that anyone who has ever said anything dumb is automatically altogether devoid of intelligence. By such a harsh criterion my intelligence would also be zero, as would (I presume) the reader's, along with that of every single human old enough to have begun to speak. This line of argument is therefore dumb.
It may, however, be that proponents of this argument against LLM intelligence mean it in a somewhat more nuanced way. Perhaps they do not literally mean that a single unintelligent thing someone says rules out their intelligence, but rather that the greater prevalence of unintelligent things in GPT-4's output than in mine shows that it is not as intelligent as me. Note, however, that such an argument points towards not a qualitative difference but a quantitative one, and therefore cannot be invoked to support the idea that GPT-4 has zero intelligence. Note also that such a comparison depends on the selection of tasks to test. That is, while it is certainly possible to put together a collection of cognitive tasks where I outperform GPT-4, it is also possible to do so in a way that achieves the opposite result. This consideration greatly complicates the issue of whether it is reasonable to claim that GPT-4 is less intelligent than I am.
Finally, and as I have argued at greater length elsewhere [4], the notion of "common sense" is perhaps better avoided altogether, as it is likely to confuse more than it enlightens. It tends to serve as a catch-all term for everything that humans still do better than AI, meaning that the phrase "AIs lack common sense" will almost tautologically continue to apply right until the moment when AI outperforms humanity at everything.

3 Knowing Only Facts One Has Been Trained with
A common way to express being unimpressed with LLMs is to say that any fact an LLM reports is something it has seen in its training data. (For a typical example of such rhetoric, see [5].) A similar claim can be made about me: any fact I report was acquired through training, whether it is one that I have personally picked up (like "a dog has four legs") or one that goes back to the evolutionary training of my ancestors (like "snakes are dangerous").
Here, it may be objected that humans can in fact go further and derive new facts from known ones, via syllogisms or other inferences. However, LLMs can do this too (as in the PaLM/Leonardo example in Section 1). While these efforts are not entirely error-free, neither are those of humans.

4 Just Matrix Multiplication
The argument under consideration here is usually phrased as follows: "An LLM is just a sequence of multiplications of giant matrices". To be slightly pedantic, the inner workings of LLMs, along with those of all deep learning networks, consist not only of multiplications of giant matrices (representing all the connections from one layer of the network to the next) but also, indispensably, of nonlinear transformations at the nodes. This, however, does not change the core thrust of the argument, namely, that LLMs are built out of simple components that cannot individually be ascribed intelligence.
However, the same argument applies to humans: our brains are built out of atoms, each of which in itself totally lacks intelligence. If we insist that humans are intelligent, we must admit that intelligent systems can be built out of entirely unintelligent components, whence the "LLMs lack intelligence because they are simply matrix multiplication" argument does not work.
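To make the phrase "matrix multiplications plus nonlinear transformations at the nodes" concrete, here is a minimal toy sketch in Python. It assumes a plain feedforward network with ReLU nonlinearities and made-up weights; real LLMs are transformers, which also contain attention and normalization steps not shown here. The point it illustrates is only that every individual step is an elementary operation (multiply, add, take a maximum) to which nobody would ascribe intelligence.

```python
def matvec(W, x):
    """Multiply matrix W (given as a list of rows) by vector x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def relu(v):
    """Elementwise nonlinearity applied at the nodes: max(0, z)."""
    return [max(0.0, z) for z in v]

def forward(layers, x):
    """Pass x through a stack of (W, b) layers.

    Each layer multiplies by a weight matrix, adds a bias vector and
    applies ReLU. (For simplicity ReLU is applied after every layer,
    including the last one.)
    """
    for W, b in layers:
        x = relu([z + bi for z, bi in zip(matvec(W, x), b)])
    return x

# Toy two-layer network with hypothetical, hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 nodes
    ([[1.0, 1.0]], [-0.5]),                   # layer 2: 2 nodes -> 1 output
]
print(forward(layers, [2.0, 1.0]))  # [2.0]
```

A production network differs from this sketch mainly in scale (billions of weights instead of six) and in the additional transformer components; the building blocks remain equally simple.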

5 Only Predicting the Next Word
This section addresses the widespread "it is just glorified autocomplete" objection to LLM intelligence. Any claim that LLMs lack intelligence because they do nothing but predict the next word in a text is based on a conflation between what the LLM is trained to do and what it then actually does. (This conflation is apparently subtle enough that even distinguished AI researchers commit it, as in [6].) The analogous confusion applied to humans would be to say that since the human species was trained by biological evolution, all we ever do is maximize inclusive fitness (i.e., maximize the number of fertile offspring, plus nephews and nieces, etc., properly discounted). Training an agent for one goal sometimes leads to the emergence of other, unintended goals. When GPT-4 behaves as if trying to convince me to wear a seat belt while in a car, it could be tempting to say "no, it is not actually trying to do that, it is only trying to predict the next word", but that would be as silly as dismissing the intentions of a human traffic safety advisor by saying "he is not trying to convince me about seat belts, he is merely trying to procreate".
It may also be noted that Ilya Sutskever has an interesting, separate argument for why something that appears to be just next-word prediction may be more intelligent than meets the eye; see [7], 7:30-8:50 into the recording.

6 Lack of World Model
LLMs and humans alike exhibit various behaviors pointing quite strongly towards the existence of a world model somewhere in the overwhelmingly complex mess that makes up the information-processing device (the brain or the deep learning network). For the case of GPT-4, see for instance the unicorn example in Section 1 of [2]. In the case of humans, we readily conclude that a world model exists, so what reason might we have to resist this conclusion for LLMs?
One asymmetry is that we can use introspection to directly observe the existence of world models in (at least some) humans, but not in LLMs. However, to point to this asymmetry as an argument that humans and LLMs differ as regards the existence of a world model is to rig the game in a way that seems to me unacceptable, because introspection by its nature can only teach us about ourselves, not about others, and in particular not about LLMs.
So, what else about LLMs can be taken as grounds for rejecting the possibility that they might have world models? This is rarely articulated, but perhaps the most common line of reasoning encountered is that since LLMs do not have direct access to the real world, there is no way for them to have a world model. This brings us to the next argument.

7 Lack of Symbol Grounding
The argument here is that unlike humans, LLMs cannot truly reason about things in the world because they have never directly accessed these things. For instance, an LLM may seem to speak about chairs using the word "chair". However, since it has never seen (or felt) a chair, it does not understand what the word actually stands for, and so its reasoning about "chairs" does not count as real reasoning.
But what about humans? Do we have direct access to things in the world? Immanuel Kant says no (this is the point of his Ding an sich, the thing-in-itself). As for myself, the fact that I do not have direct access to things does not seem to prevent me from thinking about them. When I think about the Big Bang, it really is the Big Bang that I think about rather than the phrase "the Big Bang", despite never having experienced the Big Bang directly, and likewise for things like quarks, sovereignty, unicorns and the number 42.
A defender of the "LLMs have no grounding of their symbols" argument might object that there are other things that I actually can experience, such as chairs and trees and even shadows, and that once I have the words "chair", "tree" and "shadow" properly grounded in real-world objects I can start building a world model that includes composite and more advanced concepts such as the Big Bang, but that without such a solid start the process can never get going. To this I respond (with Kant) that in fact, I do not have direct access to chairs or trees, because my contact with them is always mediated by light waves, sound waves or simply the signals sent from my various sensory organs to my brain. This is analogous to how an LLM's experience of the world is mediated via text. Looking at this more abstractly, the mediation in both cases occurs via an information package. Of course, there are differences between the two cases. However, I fail to see how any of these differences would be of such fundamental importance that it warrants the judgement that there is symbol grounding in one case but not in the other.

8 Lack of Creativity
The creativity argument was put forward by one of the greatest visionaries in the (pre-)history of AI, the mid-19th century mathematician Ada Lovelace. Together with Charles Babbage, she worked on some of the first prototypes for computing machinery, and had admirable foresight regarding what such machines might eventually be able to do, including such seemingly creative tasks as composing music. However, she categorically denied that this or anything else produced by such a machine was true machine creativity, because anything the machine does has already been laid down (at least implicitly) into the machine by the programmer, and so all creative credit should go to the programmer.
Enter Alan Turing: in his seminal 1950 paper "Computing Machinery and Intelligence" [8], he objected to Lovelace's argument by pointing out that if we take it seriously, then we can apply essentially the same argument to rule out human creativity. Everything I write in this paper is caused by a combination of factors external to my mind: my genes, my upbringing and education, and all other environmental factors influencing me throughout my life, meaning that if I should happen to say anything original or creative in these lines, credit for that is due not to me but to all these external influences.
So, we must conclude that human creativity is impossible. However, that will not do, and Turing took this as an indication that the definition of creativity contained in Lovelace's argument was wrong. Instead, he proposed the definition that someone is creative if they produce something that no one else had anticipated, and he pointed out that with this definition, examples of machine creativity existed already at his time of writing [8]. In 2023, we see such examples every day, from LLMs as well as other AIs.

9 Lack of Consciousness
Against the argument that LLMs lack consciousness and therefore are not truly intelligent I have two separate rebuttals. The first is that the argument conflates two very different kinds of properties. An individual's intelligence is a matter of what they are able to do, whereas consciousness is about what it feels like to be that individual, or, rather, whether or not being that individual feels like anything at all. (For an especially eloquent expression of this insight in the context of AI risk, see pp. 16-17 of [9].) A priori, these properties are logically independent, and to postulate otherwise is to invite confusion.
However, even if we were to accept that intelligence implies consciousness, the argument that LLMs lack intelligence because they lack consciousness fails because we simply do not know whether or not they are conscious. There have been many examples of philosophers arguing that various classes of entities (often including the class of digital computers) lack consciousness, but none of these arguments are anywhere near conclusive [10,11].
Refusing to declare someone intelligent because we do not know that they are conscious also puts us in the uncomfortable position of having to declare all humans other than ourselves unintelligent. There is precisely one human whose consciousness I am sure of, namely, myself. Assuming that other humans are conscious is not only polite and practical but also highly plausible; still, I cannot say that I know it. Your brain is sufficiently similar to mine that it makes sense for me to assume that you, just like me, are conscious, and yet there are differences between our brains (as evidenced, e.g., by our diverging personalities), and we do not know that the phenomenon of consciousness is so widely present that these differences are immaterial to its occurrence. Likewise, we do not know that it does not extend to some large class of non-human objects, including LLMs.

10 Conclusions
None of the above rules out that LLMs lack something crucial to high-level intelligence and are about to run into a capability ceiling short of attaining human-level general intelligence and short of posing any kind of existential risk to humanity. What I believe myself to have shown, however, is that the most widely circulated arguments for that being the case all fail to convince. Until better arguments are presented, we need to take seriously the idea that LLMs possess real intelligence and may attain much more of it in the next few years, with practically unbounded potential for drastic consequences, good or bad.