
Understanding (in) Artificial General Intelligence



The two main positions regarding the possibility of “artificial general intelligence” (AGI) can be framed as “Geoffrey Hinton vs Yann LeCun”, the two “godfathers of AI” who received the 2018 Turing Award (roughly a “Nobel Prize” of computer science), together with Yoshua Bengio, for their work on deep learning [1]. In an interview with CNN, Hinton said: “if it gets to be much smarter than us, it’ll be very good at manipulation because it will have learned that from us… It’ll figure out ways of manipulating people to do what it wants” [2]. The “it” obviously refers to AGI. In a recent interview with Wired, when asked about AGI, LeCun responded bluntly: “Machine learning is great. But the idea that somehow we’re going to just scale up the techniques that we have and get to human-level AI? No. We’re missing something big to get machines to learn efficiently, like humans and animals do. We don’t know what it is yet” [3]. This opposition is the good old “paranoia vs skepticism” divide familiar in the field of AI: pushing either position to its extreme results in an extreme distortion of reality.

Aside from their popcorn-grabbingly exciting tweet wars [4], however, the disagreement between Hinton and LeCun stems from a more fundamental philosophical problem, which Hinton describes in his interview with the New Statesman: “It’s all a question of whether you think that when ChatGPT says something, it understands what it’s saying” [5]. For Hinton, large language models (LLMs) like GPT-4 can be said to understand what they are saying. Hinton is here a radical reductionist materialist, for whom “there is no mental stuff as opposed to physical stuff”, for whom there are “only nerve fibres coming in” and all we do is react to sensory input. Thus, for Hinton, “understanding isn’t some kind of magic internal essence. It’s an updating of what it knows”. For LeCun, by contrast, today’s LLMs “still don’t understand that if A is the same as B, then B is the same as A” [6], and they still lack common sense. Here we can see that Hinton’s position is a behaviorist and connectionist one, while LeCun’s is more of a constructivist and symbolic one (no wonder Meta AI, where LeCun works as Chief AI Scientist, is among the biggest explorers of neuro-symbolic AI with common sense). Can we really say that ChatGPT understands what it says? In what sense does it understand what it says?

What does an LLM even do? In principle, nothing more than computing the most probable next word in a sequence of words. But before it can generate any sequence, it must first learn from massive corpora of data (books, articles, webpages…) which words usually appear in which sequences, by tracking relationships or correlations between words. This is where the so-called “attention mechanism” comes in. The pre-training itself is self-supervised, meaning that the model needs no data labelling by humans to find patterns in a dataset. The real crux, however, is the architecture that allows the model to learn so efficiently: the transformer. It was first proposed in 2017 by a group of Google researchers in a paper titled “Attention Is All You Need”, with the initial intent of improving the accuracy and speed of machine translation, which at that time still used recurrent neural networks (RNNs) as its foundation [7]. RNNs process a sequence token by token, whereas transformers attend to the whole sequence in parallel. A transformer is still a neural network, but with its parallel attention mechanism it can learn context better. Pre-train it on massive corpora of data, and we get an AI that seems to understand. Keep the decoder stack of the transformer and let it generate one word at a time, and we get an AI that can produce stories and essays. Voilà! A generative pre-trained transformer (GPT).
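To make this description a bit more concrete, here is a minimal sketch in Python (NumPy) of the two ingredients just described: scaled dot-product self-attention, the core operation of the transformer, and next-token prediction as picking the most probable word. All weights, dimensions and the toy vocabulary are made up for illustration; this is a sketch of the idea, not the actual GPT implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turns raw scores into probabilities.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X.
    Every position attends to every other position in parallel."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how strongly each token attends to each other token
    return softmax(scores) @ V                   # weighted mix of value vectors

# Toy "next-token prediction": project the last position onto a tiny vocabulary
# and pick the most probable word. Weights are random, purely for illustration.
rng = np.random.default_rng(0)
d, vocab = 16, ["party", "store", "parade", "eggs"]
X = rng.normal(size=(5, d))                      # 5 tokens already in the sequence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wout = rng.normal(size=(d, len(vocab)))

H = self_attention(X, Wq, Wk, Wv)
probs = softmax(H[-1] @ Wout)                    # probability distribution over the next word
print(vocab[int(np.argmax(probs))], probs.round(3))
```

Training adjusts the weight matrices so that these probabilities match the word statistics of the corpus; generation simply repeats the last step, appending one predicted word at a time.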

So all LLMs are essentially artificial neural networks (ANNs), that is, networks whose design is loosely inspired by the brain’s own natural neural network. But even such a description might still sound overly abstract. To get the gist of what one is facing here, one should instead ask: what is an ANN, really, in concrete physical reality? The answer is obviously: there is no ANN in concrete physical reality, because it is an abstract model. But if anything in physical reality implements the model, then at the bottom of it all we are left with transistors switching on and off (letting current through or not). That is literally what the word “digital” amounts to in concrete physical reality. One could even say that the greatest irony of our vastly digitalized society is that the word “digital” itself has been thoroughly mystified. Are we ready, then, to say that a complex configuration of transistors can really understand anything, that it is intelligent?
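To see how little mystery there is at this level, here is a minimal sketch (with arbitrary, illustrative weights) of a two-layer network’s forward pass: everything reduces to multiply-adds and a simple clipping operation, which the hardware in turn realizes as transistors switching on and off.

```python
import numpy as np

# A two-layer "neural network" forward pass is nothing but matrix multiply-adds
# plus a nonlinearity. No extra "neural" substance is involved.
def tiny_ann(x, W1, b1, W2, b2):
    hidden = np.maximum(0, x @ W1 + b1)   # ReLU "neurons": clipped multiply-adds
    return hidden @ W2 + b2               # output layer: more multiply-adds

rng = np.random.default_rng(1)
x = rng.normal(size=4)                    # an arbitrary input vector
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)
print(tiny_ann(x, W1, b1, W2, b2))        # two numbers, produced by pure arithmetic
```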

But one should also remember that the brain is nothing but a complex system of neurons which, every now and then, pass electrochemical signals along. The difference is that we are carbon-based while machines are silicon-based (the two elements belong to the same group of the periodic table). People like Hinton believe that this fact alone opens up the possibility that GPT can be said to understand things the way we do. In fact, we still do not know how neuronal activity in the brain generates understanding—but understanding does happen! On this view, we are just input-output systems, like AIs. Is understanding, then, really just a more sophisticated connecting of bits of data? LeCun would agree with this physicalist view to some extent, but for him it is not enough.

LeCun’s position here rests on a fair observation of an irony: practically every vision of a future in which AI becomes AGI and beyond (including Hinton’s and especially Nick Bostrom’s) lacks a specific description of a mechanism by which AGI, let alone artificial superintelligence (ASI), could arise from the baseline intelligence we have now. Such doomsday scenarios or digital utopias take a huge leap of faith: they take our contemporary AI paradigm—that is, connectionist AI—for granted and extrapolate it. Forget about ASI: how can we even “generalize” the contemporary forms of narrow AI (playing chess, driving cars, generating essays) into AGI (playing chess and driving cars and generating essays…)? Is some sort of embodiment in robots necessary, as in old-fashioned Hollywood imaginings? And if so, how do we resolve the so-called “Moravec’s paradox”, the observation that “it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility” [8]?

One candidate for the “something big” that LeCun thinks we are missing in AI research is “common sense”. In fact, we are now seeing a kind of resurgence of the study of common sense in AI, with Hector J. Levesque and Ronald Brachman leading the way. An AI with common sense was actually what the original AI pioneers of 1956—John McCarthy, Marvin Minsky, Allen Newell and so on—had in mind before the advent of deep learning and big data. Such an AI is now termed “good old-fashioned AI” (GOFAI). Those pioneers emphasized an intelligence based not on learning from big data, but on common sense. What is “common sense”? Levesque and Brachman give a working definition in their Machines like Us: “Common sense is the ability to make effective use of ordinary, everyday, experiential knowledge in achieving ordinary, everyday, practical goals” [9]. And for Levesque and Brachman, our contemporary “AI systems display expertise, not common sense”. Common sense is thus the general component. To build it, they propose a knowledge representation and reasoning (KR&R) type of AI.

In their book, Levesque and Brachman propose a scenario involving a self-driving car; here is my version: it is 31 December and you want to buy some things for a New Year’s party at your house with friends. You are lazy, so you send your self-driving car to do the shopping instead, telling it to go to your favorite grocery store. The car approaches an intersection; the traffic light is red, so the car stops and waits for the green light. Five minutes go by, but the light is still red. The car’s sensors detect some activity and noise around the intersection, but no action is suggested. The car’s computers—excellent with obstacle avoidance and lane following—are not equipped to understand that drivers on the other side are getting out of their cars and gathering together. The car’s navigation system offers no rerouting suggestion: there is no traffic jam and no accident. So the car sits there, waiting for the light to turn green. Fifteen more minutes go by, which means your friends have arrived, and you are stressed out because the car has not even notified your app that it has reached the store! Now, of course, what the car has encountered is a New Year’s parade! But does it know this? If you were driving, you would definitely change your route, because you understand the situation. Does the car understand the situation? Does it even have to? Well, in our case, as humans, we do need to understand. And if an AGI is a human-level AI (or beyond), then it definitely needs to understand. To understand the above situation is to have common sense.
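To illustrate what a knowledge representation and reasoning approach amounts to in this scenario, here is a deliberately toy sketch: explicit facts plus an explicit commonsense rule that the car’s perception stack lacks, processed by naive forward chaining. The predicates and rules are hypothetical, invented for this example; a real KR&R system would be vastly richer.

```python
# Toy forward chaining over hand-written commonsense knowledge (illustrative only).
facts = {
    "date(dec_31)",
    "light(red, duration>15min)",
    "drivers_leaving_cars",
    "crowd_gathering",
}

# A hypothetical commonsense rule the data-driven car lacks:
# a long red light plus a gathering crowd on a holiday suggests a parade, so reroute.
rules = [
    ({"date(dec_31)", "crowd_gathering", "light(red, duration>15min)"}, "likely(parade)"),
    ({"likely(parade)"}, "action(reroute)"),
]

def forward_chain(facts, rules):
    """Keep applying rules whose premises are all known until nothing new follows."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules) - facts)   # -> {'likely(parade)', 'action(reroute)'}
```

The hard part, of course, is not the chaining but acquiring and representing the “ordinary, everyday, experiential knowledge” in the first place, which is exactly what Levesque and Brachman are after.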

One important note is that Levesque and Brachman’s scenario involves an embodied intelligence, that is, an AI with the body of a car. Such a scenario might be an unfair counterargument to Hinton’s position, because his thesis is that you do not have to “act on” the world physically in order to understand it: “That’s awfully tough on astrophysicists. They can’t act on black holes!”. So what about a disembodied intelligence like GPT-4? Researchers at Microsoft claim that GPT-4 shows some “sparks” of common sense and AGI: they asked GPT-4 to draw a unicorn three times over the course of a month and found that the AI seemed to get the concept of a unicorn right [10]. GPT-4 was then asked how to stack a book, nine eggs, a laptop, a bottle and a nail: while an older model recommended placing the eggs on top of the nail, GPT-4 arranged the items so the eggs would not break. However, do such results demonstrate a real understanding of the world, or are they merely consequences of the mechanism of LLMs trained on big data? Does GPT-4 really understand why the eggs should not break, given the state of affairs of the world, or does it only connect words related to the word “eggs” with one another? It boils down to this: does GPT-4 really understand, or does it merely behave as if it understands?

For Hinton, there is no difference: to exhibit “understanding behavior” simply is to understand. He might even wonder why people find it so difficult to accept that LLMs understand just as our brains do, given that we still do not understand how we manage to understand at all. Sure. Such an insight is emphasized by Sean Carroll in an episode of his Mindscape podcast titled “AI Thinks Different”, recorded after the Sam Altman drama and OpenAI’s mysterious Q* project—allegedly an AGI—made headlines: “One possibility is that human beings, despite the complexity of our brains, are ultimately at the end of the day, pretty simple information processing machines and thought of as input-output devices, thought of as some sensory input comes into this black box and it does the following things, maybe we’re just not that complicated” [11]. Maybe, fundamentally, we are computationally very simple. But, as the episode’s title suggests, Carroll maintains that AI thinks in a different way from us. For Carroll, the big difference lies in the fact that we humans have a “model of the world”, one that contains knowledge of the laws of physics. Indeed, a “worldview” is what Levesque and Brachman say common sense allows us to construct, and which in turn produces common sense. A worldview is what enables us to understand something in different ways, even though the thing interpreted is the same. We do not only understand; we understand through a specific worldview.

And maybe to understand is always to understand through a specific worldview. This is precisely what Immanuel Kant found out centuries ago with his introduction of the “transcendental a priori” inherent in our thought. For related reasons, William Egginton even argues that “Kant wouldn’t fear AI”, because “Kant realized that human cognition could not be reduced to following instructions, no matter how complex” [12]. Indeed, philosophy is one of the domains that computation cannot wholly grasp, because it deals directly with worldviews. One does not simply understand; one understands—for instance—as a Derridean. I once had a long conversation with GPT-4 in which I prompted it to be a Derridean and asked it to elaborate on how LLMs function. It was disappointing: its answers were very general, and it did not even try to play the role. Had we asked Jacques Derrida himself, we would have gotten a very eccentric—and to an extent “annoying”—answer typical of a deconstructionist (remember the interview in which Derrida problematized the very word “elaborate”). Derrida’s worldview permeates his choice of words and manner of speaking. Now, the irony is that a deconstructionist might say that a worldview is nothing but a connecting of certain words with certain functions in a certain way, and that there is no such thing as a “totalizing principle” that we call a “worldview”. Is that not how an LLM works?

I remember that in March 2023, Neil Gaiman tweeted: “ChatGPT doesn’t give you information. It gives you information-shaped sentences”, to which François Chollet, the author of the Keras deep-learning library, replied: “For an LLM there is no difference between saying something true, something false, or pure nonsense” [13]. A connectionist AI champion such as Hinton would reject Chollet’s characterization by saying that such a differentiation is not necessary. But that still does not explain why humans feel the need to make the differentiation in the first place.

There is one thing that a human deconstructionist has and an LLM does not: the “passion”, or maybe even the “commitment”, to occupy a certain partial position in looking at things. We experience this “passion” of partiality when we listen to Derrida, for instance. It might be called the luxury of having an opinion and a sentiment toward something. Ask GPT-4 or PaLM 2 (Bard) for their opinion on things and they will answer with a kind of apology: “I’m sorry, as an LLM, I do not have an opinion, but I can give you information…”. Of course, they are programmed to do that, and one could surely, in principle, design a “deconstructionist AI” of sorts, but it would not be general intelligence. What is curious is that human intelligence involves an interplay between partiality and generality, one that is difficult or maybe impossible for machines to imitate. In The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do, Erik J. Larson, an AI researcher who has worked on DARPA-funded projects, identifies this interplay with what Charles Sanders Peirce called “abductive reasoning”: in contrast to induction, which “moves from facts to generalizations that give us (never certain) knowledge of regularity”, abduction “moves from the observation of a particular fact to a rule or hypothesis that explains it” [14].

Abductive reasoning is a kind of guesswork, but not merely “random stabs at truth”. It is the enigmatic domain of hunches and detective work that we typically use as a starting point before we can even begin deductive or inductive chains of reasoning. “Without a prior abductive step”, Larson continues, “inductions are blind, and deductions are equally useless” [15]. Scientists and mathematicians do it all the time. Copernicus had to ignore mountains of data accumulated through the Ptolemaic worldview (so his move was neither deductive nor inductive) in order to propose his heliocentric model, the founding gesture now regarded as the hallmark of the Scientific Revolution. Here one cannot help but remember Planck’s heuristic solution to the ultraviolet catastrophe that gave birth to quantum physics, or all of Einstein’s annus mirabilis papers for that matter. In order to deduce or induce, one has to have a framework (a worldview) and prior contextual knowledge first, because only then can one propose a hypothesis or identify a problem at all.

William Dembski challenges Larson’s thesis that our contemporary AI cannot do abduction by showing that GPT-4 “performed admirably” in choosing the best explanation for some problems, thus making it capable of “inference to the best explanation” (IBE) [16]. But Dembski here conflates IBE with abduction. Abduction is not IBE. As Niya Stoimenova, an AI researcher at DEUS, put it: “Generally, abduction covers the act of generating a novel case (where learnings can be transferred from one context to another). IBE, on the other hand, is a very special and more context-specific form of induction that doesn’t necessarily require you to identify patterns quantitatively (i.e., you don’t need to observe a pattern 10.000 times to formulate a rule)” [17]. She refers to a paper by William H. B. McAuliffe, a scholar of Charles Sanders Peirce, that explains in more detail why abduction is not IBE [18].

Simply put, the fact that abduction involves “the act of generating a novel case” explains how Copernicus could propose his heliocentric model and how Descartes could eccentrically proclaim “cogito ergo sum“, which seemingly came out of nowhere—neither can be reduced to computing a “best explanation” for given data, because both break out of the implicit worldview inscribed in the “best explanation” of that data. Abduction is generative, but not in the way an LLM is. A simple reading of the history of science and philosophy is enough to give us another insight into our mysterious human intellectual capabilities: what Thomas Kuhn termed a “paradigm shift”, a change of worldviews. We do not only have worldviews; we can also change our worldviews as a whole. Such a dynamic is akin to a self-referential process in which a program rewrites its own program. This problem of self-programming remains unsolved, and may even be impossible to solve, for it is rooted in a more fundamental problem, the limits of computability: there are problems that a computer cannot solve, not because of empirical or technological constraints, but because of limits inherent in computation itself. Alan Turing realized this (he also used the term “intuition” for what we here call “abduction”); a minimal sketch of his diagonal argument is given below. But a fuller treatment is for another essay.
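The classic example of such an inherent limit is Turing’s halting problem: no program can decide, for every program and input, whether that program halts. Here is a minimal sketch of the diagonal argument, written around a hypothetical `halts` oracle that the argument itself shows cannot exist; it is an illustration of the reasoning, not working code for a decision procedure.

```python
# Sketch of Turing's diagonal argument. The point is precisely that no correct,
# total `halts` procedure can exist, so the oracle below is hypothetical.
def halts(program, argument) -> bool:
    """Hypothetical oracle: returns True iff program(argument) eventually halts."""
    raise NotImplementedError("No such total, correct procedure can exist.")

def paradox(program):
    # Do the opposite of whatever the oracle predicts about the program run on itself.
    if halts(program, program):
        while True:          # loop forever if the oracle says "halts"
            pass
    return "halted"          # halt if the oracle says "loops forever"

# Feeding paradox to itself yields a contradiction either way:
# if halts(paradox, paradox) is True, then paradox(paradox) loops forever;
# if it is False, then paradox(paradox) halts. Hence halts cannot exist.
```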

So, who is right: Hinton or LeCun? One might infer from the exposition above that I am more of a LeCunian (in addition to being a Lacanian). I kind of am, but this does not mean that for me a connectionist AI such as GPT-4 is useless—hell, I use it all the time for work and daily routines. What I find problematic in the connectionist paradigm of AI is that it tends to overlook its inherent limitations, to the point of producing nightmarish doomsday scenarios. Thus we get a collective paranoia boosted by the media. But I am also not ready to be as optimistic as LeCun is, especially with regard to how humans would handle AI. Sure, LeCun has emphasized that Meta is “open”—unlike OpenAI—about its AI research, but he has not emphasized enough how dangerous AI could be in the hands of people with bad intentions. Here I am heeding Norbert Wiener, one of the founding figures of our informational age (alongside Alan Turing and Claude Shannon), who said that “the machine’s danger to society is not from the machine itself but from what man makes of it” [19]. Yes, as in Hinton’s and Bostrom’s apocalyptic visions, our future might involve AIs that dominate humans, but those AIs would in turn be controlled by a certain group of people! It would be, in essence, the good old story of l’exploitation de l’homme par l’homme.

References

[1] https://www.acm.org/media-center/2019/march/turing-award-2018

[2] Video interview with CNN: https://edition.cnn.com/videos/tv/2023/05/02/the-lead-geoffrey-hinton.cnn

[3] https://www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/

[4] https://venturebeat.com/ai/ai-pioneers-hinton-ng-lecun-bengio-amp-up-x-risk-debate/

[5] https://www.newstatesman.com/long-reads/2023/06/men-made-future-godfathers-ai-geoffrey-hinton-yann-lecun-yoshua-bengio-artificial-intelligence

[6] https://www.cnbc.com/2023/12/03/meta-ai-chief-yann-lecun-skeptical-about-agi-quantum-computing.html

[7] Ashish Vaswani et al., “Attention Is All You Need” (2017). Access the paper here: https://arxiv.org/abs/1706.03762

[8] Hans Moravec, Mind Children: The Future of Robot and Human Intelligence, (London: Harvard University Press, 1988), p. 15.

[9] Hector J. Levesque and Ronald J. Brachman, Machines like Us: Toward AI with Common Sense, (Cambridge: The MIT Press, 2022), p. 12.

[10] Sébastien Bubeck et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4” (2023). Access the paper here: https://arxiv.org/abs/2303.12712

[11] Read the full transcript of the episode in Carroll’s own blog: https://www.preposterousuniverse.com/podcast/2023/11/27/258-solo-ai-thinks-different/

[12] https://time.com/6309281/immanuel-kant-ai-essay/

[13] https://twitter.com/fchollet/status/1639692810659188737

[14] Erik J. Larson, The Myth of Artificial Intelligence: Why Computers Can’t Think the Way We Do, (Cambridge: The Belknap Press, 2021), p. 160.

[15] Ibid., p. 161.

[16] https://evolutionnews.org/2023/09/inferring-the-best-explanation-via-artificial-intelligence/

[17] https://towardsdatascience.com/on-why-machines-can-think-40edafce293d

[18] McAuliffe, William H. B., “How Did Abduction Get Confused with Inference to the Best Explanation?”, Transactions of the Charles S. Peirce Society, vol. 51, no. 3, 2015, pp. 300–19. Access the paper here: https://www.jstor.org/stable/10.2979/trancharpeirsoc.51.3.300

[19] Norbert Wiener, The Human Use of Human Beings: Cybernetics and Society, (London: Free Association Books, 1989), p. 182.




