The Technium

Why Are LLMs Smart?


A popular way to explain how current LLMs work is to say that “all” they do is predict the next most likely word in a sentence. From one perspective, this is correct. Trained on all human language, the LLMs distilled billions of word sequences so that they can imitate authentic-sounding strings of words that have never been said before. These sentences sound plausible because, having been trained on millions of ordinary human texts, the models were predicting what an average human might say next. They really did succeed at the task they were built for.
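To make the “predict the next most likely word” idea concrete, here is a deliberately tiny sketch. It is a toy under loud assumptions, not how any real LLM works: it simply counts which word follows which in a sample text (a bigram model) and picks the most frequent follower. Real LLMs use large neural nets over tokens and condition on long stretches of preceding text. The function names `train_bigrams` and `predict_next` and the sample sentence are hypothetical, made up for this illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    counts = followers.get(word.lower())  # .get avoids creating empty entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample text for the toy model.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

Asked for the word after “the,” this toy answers “cat,” because “cat” follows “the” twice in the sample and “mat” only once. It can only echo word pairs it has already counted, which is a long way from composing a coherent paragraph.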

What is harder to account for are the emergent creative abilities of the LLMs.

The amount of intelligence required to compose one coherent sentence can almost be reduced to the rules in a grade-school grammar book. But the amount of intelligence needed to produce a string of sentences focused on one topic — a paragraph — far exceeds any rules. And the amount of intelligence wrapped up in a string of paragraphs, as in a conversation, begins to approach a pattern we call “thinking.” Keep in mind all the work a human needs to do to write a coherent page of text. As researchers scaled up the size and scope of LLMs, they were stunned to find that their systems could begin to imitate the elemental patterns of human thinking found in paragraphs and conversations.

They were shocked because at no point in their invention did they try to program in the elemental process of thinking, or intelligence. They were “merely” extending the patterns of language. The collective surprise of an LLM such as ChatGPT is that by extending the pattern of language, we can arrive at some level of intelligence that is useful beyond language.

If programmers did not program ChatGPT with logical deduction skills, where does the intelligence in its models come from? Why can LLMs behave so intelligently (even if not infallibly), when no one has programmed them to be intelligent? The apparent intelligence of LLMs has been very troubling to experts in the AI field, because there was no theory of intelligence that predicted large models of language would be able to deduce logic, or solve the mathematics of the protein-folding problem.

Intelligence locked in language

One explanation is that the elemental intelligence exhibited by LLMs is locked within human writing and in language itself. You can construct a sentence using a grammar rulebook, but to construct a paragraph you need logic, deduction, and reasoning. And further, as any teacher will tell you, to create a coherent essay (a string of paragraphs) you need some kind of clear thinking. The voluminous training material scooped up by the LLM creators is more than just words, more than just sentences, more than just paragraphs. All those trillions of words are embedded in articles, books, essays, rants, replies, comments, tweet-threads, arguments, debates, stories, tales, accounts, reports, blogs. These, and a hundred other long forms, contain intelligence in their arrangement of words. It is the architecture of language that conveys the intelligence.

An essay, if it is any good, contains an intelligence beyond what is contained in a mere sentence. A scientific paper contains scientific logic within its structure — the paper is an argument with hypothesis and evidence. A threaded debate contains lawyerly deduction in its text. A fictional tale contains the architecture of a narrative in its sentences. In short, the text of humans contains the thinking of humans. When you think hard to put your argument into words on a page, the final text you create also contains the intelligence you put into it. The full text of this very essay you are reading holds both a representation of my thinking and, in a small but important way, the actual thinking itself. That logic is held in the pattern of its words. The order and choice of words over the span of a whole essay therefore contains intelligence — and the big surprise is that LLMs can extract that intelligence, simulate it to write a new essay, and increasingly apply it in other fields.

So the first grand surprise of LLMs is that the intelligence we experience in them derives from the intelligence we have inadvertently coded into human text, rather than from any explicit software code. There appears to be a deep, fundamental relationship between language and thinking. Human writing is thus not only a reflection of the structure of language, but to some degree also a reflection of human thinking. Distill the patterns in human writing at scale, and you also get some patterns of human thinking. Imitate human writing and conversation, and you can imitate human intelligence, at least in part.

What’s missing

The kind of smartness embedded in LLMs is knowledge-based. They have become know-it-alls, with strong verbal skills — recall, grammar, deduction, analogy. It’s surprising and impressive that they’re as smart as they are. But our own kind of intelligence includes other forms of smartness they don’t yet have: intuition, continuous learning, disruptive insight.

So the current question is: where would those elements of intelligence come from? If LLMs get their smartness from human writing, what would be the foundational training source for intuition and greater creativity?

Two bets

The frontier model makers (Anthropic, OpenAI, Google, xAI) are betting trillions of dollars that they can find these other elements of intelligence simply by continuing to scale up LLMs. What if we extend them to ridiculous scales — neural nets with trillions of parameters, running on millions of chips, trained not just on all the text humans have written but on all the data humans have collected? Won’t even greater degrees of human intelligence emerge? The frontier AI companies are betting they can reach AGI (artificial general intelligence) this way.

But we don’t know if this is the way. My suspicion is that there will be diminishing returns on scaling neural nets. There are already plenty of experiments trying to shrink neural nets through clever mathematics, so they run smaller, cheaper, faster. There are experiments with non-neural-net architectures entirely, including some returns to old-school symbolic reasoning. And there are experiments in hybrids, adding some special sauce to the neural nets. At some point, adding yet more neurons won’t help. Our own relatively tiny brains are a testament to intelligence at small, limited scale, running on only 25 watts.

Our brains seem to be “merely” neural nets too, limited as they may be. But my guess is that our creativity and leaps of insight come not from what we know (knowledge) but from how we know it. Unlike current LLMs, our brains are capable of continuous learning. We iterate around and around, compounding small differences into large meanings, getting closer to a breakthrough on each cycle of thought and learning. Much of our smartness rests not only on our knowledge but also on our ability to keep learning. Right now, the smartness of LLMs is based primarily on their encyclopedic knowledge, on extracting the intelligence humans have structured into our encyclopedias, books, and everything we write. They are superhuman in their grasp of knowledge, and the structure of that knowledge unleashes bits of reasoning and smartness. That will probably not be enough to go all the way to the kind of creativity and insight human brains can produce. That variety of intelligence will likely require algorithms for continuous learning, or a different design than neural nets alone.

Bottom-up systems keep surprising us

For decades, during several “AI winters,” the smartest computer scientists strongly believed that neural nets would never produce the kind of AI they have already produced. They were totally surprised that neural nets worked. (It turns out that the main thing they had lacked before was scale.) They were further astounded that it was neural nets running language translation models that first generated bits of intelligence. No one, not even the scientists working on those early language models, was expecting that.

So wide, bottom-up systems like neural nets keep surprising us. They may not be able to take us all the way, but they have almost always been the best place to start, and have taken us much further than we expected. Neural nets will probably keep surprising us.

Their first leap in intelligence came unexpectedly from the structure of our language. I am betting that their second leap of intelligence will come from something equally unexpected.






© 2023