Prelude to Artificial General Intelligence | by Pedro Uria-Recio | Jul, 2024

On November 17, 2023, Sam Altman, the Chief Executive Officer of OpenAI, was removed by the company’s Board of Directors. Four days later, on November 21, OpenAI announced Altman’s reinstatement as CEO along with a new initial Board.

According to leaked information, concerns surfaced within OpenAI regarding an AI breakthrough called Q*, potentially a precursor to Artificial General Intelligence (AGI), prompting lead researchers to voice reservations to the Board. Following these discussions, Sam Altman was abruptly removed from OpenAI, sparking a tumultuous series of days involving Altman, the board, and employees. The termination appeared to be connected to Altman’s inclination to quickly commercialize Q*, in contrast to the board’s focus on prioritizing safety measures [Knight].

AGI is a collective term that emerged as researchers and experts in AI contemplated creating machines capable of replicating human-like intelligence in all its cognitive aspects, which would include precisely what algorithms cannot do today: the ability to reason, devise strategies, make decisions under uncertainty, and represent knowledge, such as common sense. It also encompasses planning future actions, learning from past experiences, and effectively communicating in natural language. Furthermore, it is considered desirable for AGI to possess specific physical attributes, such as visual and auditory perception, and the ability to detect potential dangers and take action.

Any one of these elements would represent the next evolutionary step for AI, as the AI algorithms that we have today are only capable of solving specific, pre-defined problems and can make decisions only within the specific context of AI programming. Pioneers in AI, such as Alan Turing and John McCarthy, laid down fundamental ideas for AGI in their work decades ago, postulating how general cognitive abilities would take shape as an evolutionary step.

Despite uncertainty about the nature of Q*, some people speculated that it could signify a groundbreaking architectural development comparable to the advent of transformers, introduced by Google in 2017, the foundational technology behind all current Large Language Models (LLMs), including OpenAI’s GPT-4 [TechCrunch]. Speculating about what Q* could be is a handy path to introduce the current areas of research toward achieving AGI. This is the framework for this chapter.

The following pages take a deep dive into today’s multiple fields of study, which effectively constitute the starting gun for AGI.

Machines of Tomorrow: From AI Origins to Super Intelligence & Post Humanity. How AI Will Shape Our World.

Link to the book: Machines of Tomorrow

One of the potential explanations of what Q* actually is concerns OpenAI’s initiative to solve mathematical problems. While this may seem like a simple accomplishment that AI has already addressed, there is significance in AI actually grasping mathematical reasoning. The true importance lies in AI’s potential to comprehend mathematical proofs, with far-reaching implications across various domains, given the foundational role of mathematics in the world [Berman].

This reminds us of the “General Problem Solver” designed by Allen Newell and Herbert Simon in 1957 and the symbolic logic approach defended by John McCarthy, which we discussed in Chapter 4. More than 60 years have passed since then, but machines still struggle with tasks involving logic and reasoning and still cannot proceed without specific, coded directives.

The capacity to independently generate mathematical or logical proofs requires a deeper comprehension of the proofs themselves, which would surpass the predictive abilities of current LLMs. When confronted with a problem involving mathematical or logical concepts, LLMs may provide correct answers without genuinely understanding the underlying theoretical proofs, merely reproducing patterns from their training data. This underscores the existing limitations in their ability to grasp the rationales behind logical principles. Comprehending why answers are correct or incorrect, rather than merely predicting the next character in a sequence, is currently an insurmountable challenge for LLMs.

In this regard, a research paper titled “STaR: Bootstrapping Reasoning with Reasoning” was published by Stanford and Google in May 2022 [Zelikman et al.]. The paper explores the generation of step-by-step chains of thought to enhance the performance of language models in complex reasoning tasks, such as common-sense questions or mathematics. The concept of “Chain of Thought” involves guiding the model to reason through intermediate steps when faced with challenging problems rather than arriving at complex solutions all at once. This step-wise approach results in more accurate answers. The paper introduces a framework named STaR (Self-Taught Reasoner), which iteratively enhances an AI model’s complex reasoning capabilities through a cyclic process. First, STaR generates rationales for answering questions based on a few examples. When an answer is incorrect, it regenerates the rationale using the correct answer as a hint, and then fine-tunes the model on the collected rationales. It then returns to the first step and continues iterating until the answers are good enough. OpenAI published a paper along the same lines in May 2023, where it also suggested breaking down large problems into intermediary reasoning steps and applying feedback to each intermediate step, not only to the final result, as had been common practice until then [Lightman et al.].
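One STaR cycle can be sketched in a few lines of Python. The `generate` and `fine_tune` callables here are hypothetical stand-ins for real model calls, not APIs from the paper:

```python
def star_iteration(generate, problems, fine_tune):
    """One STaR cycle: generate rationales, repair the failures, fine-tune.

    `generate(problem, hint=None)` returns (rationale, answer);
    `fine_tune(examples)` returns an updated model/generator.
    Both are placeholders for real model calls.
    """
    examples = []
    for problem, correct in problems:
        rationale, answer = generate(problem)
        if answer == correct:
            # Keep rationales that led to a correct answer.
            examples.append((problem, rationale, correct))
        else:
            # "Rationalization": retry with the correct answer as a hint,
            # so the model learns a plausible reasoning path to it.
            rationale, _ = generate(problem, hint=correct)
            examples.append((problem, rationale, correct))
    return fine_tune(examples)
```

In the paper itself, each iteration fine-tunes from the original base model rather than the previous checkpoint; the sketch abstracts such details away.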

This multi-step approach is very intuitive and mirrors how human beings think. When confronted with a complex mathematical problem, or when required to write a significant piece of programming code, we do not immediately arrive at the ultimate solution in one go. Instead, we break down problems into smaller components, address each part individually, and then integrate these solutions to derive the overall answer. This systematic approach is particularly evident in coding, but also in any kind of engineering project, as well as in writing a book. Similarly, applying the same modularity principles to algorithms could increase the rationality of AI systems.

There is a second theory about what Q* could be. The term Q* hints at a connection with fundamental themes in the scientific literature on reinforcement learning, specifically Q-learning and the A* algorithm. Q-learning is the most common reinforcement learning algorithm, and we discussed it in Chapter 6. Additionally, A* is a classic graph-search algorithm developed in 1968 to plan routes for robots, in particular a robot called Shakey, which we will talk about in Chapter 11. Shakey was the first mobile, all-purpose robot with self-reflective reasoning; it could entirely deconstruct commands into their most basic parts, whereas other contemporary robots needed instructions for each step of a more complex task.
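For reference, the tabular update rule that gives Q-learning its name is simple to state. This is the textbook formulation, not anything specific to Q*:

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    `alpha` is the learning rate, `gamma` the discount factor.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q

# A Q-table that defaults to 0 for unseen state-action pairs.
Q = defaultdict(float)
```

Repeated over many experienced transitions, this update converges toward the value of acting optimally from each state, which is exactly the ingredient for the look-ahead planning discussed below.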

Consequently, there has been speculation that Q* might involve a fusion of Q-learning and A* search with the very ambitious objective of bridging an LLM with the foundational aspects of deep reinforcement learning.

Reinforcement learning is compelling due to its capacity to look ahead and plan future moves within a complex environment of possibilities and its ability to learn through self-play. Both tactics played a pivotal role in the success of AlphaGo, the machine-learning software we talked about in Chapter 7. AlphaGo not only defeated the best Go players worldwide but surpassed them significantly. Notably, lookahead planning and self-play have not been integral to LLMs thus far [Berman].

Look-ahead planning is a process wherein a model anticipates future scenarios to generate improved actions or outputs. Presently, LLMs face challenges in executing effective look-ahead planning. Their responses often rely on predicting the next probable token in a sequence, lacking precise foresight and strategic planning. One way of applying look-ahead to LLMs would employ a tree-shaped structure to systematically explore various optimization possibilities for solving a problem through the trial-and-error process of a reinforcement learning algorithm. Such techniques might not substantially improve the model’s ability to plan ahead, but they would partially augment its capability to address logic and reasoning challenges. However, it is unlikely that models trained this way offer a truly profound comprehension of the underlying reasons for the validity or invalidity of logical or mathematical arguments.
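A tree-shaped look-ahead of the kind described above can be sketched as a recursive search over candidate next steps. The `propose` and `score` functions are hypothetical stand-ins for model calls that suggest continuations and rate partial solutions:

```python
def tree_lookahead(state, propose, score, depth=3, branch=3):
    """Explore `branch` candidate next steps per level, `depth` levels
    ahead, and return the best-scoring sequence of steps.

    `propose(state, branch)` yields candidate next steps (e.g. "thoughts");
    `score(state)` rates a finished sequence. Both stand in for model calls.
    """
    if depth == 0:
        return [], score(state)
    best_path, best_value = [], float("-inf")
    for step in propose(state, branch):
        # Recurse: imagine taking this step, then look further ahead.
        path, value = tree_lookahead(state + [step], propose, score,
                                     depth - 1, branch)
        if value > best_value:
            best_path, best_value = [step] + path, value
    return best_path, best_value
```

With real models, `propose` would sample continuations from the LLM and `score` would be a learned evaluator; the trial-and-error exploration is what connects this to reinforcement-learning-style search.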

The cornerstone of applying reinforcement learning to LLMs is the idea of self-play. Self-play involves an agent enhancing its gameplay by interacting with slightly varied versions of itself. Self-play has also not been part of the standard training techniques of LLMs: LLMs do not play against themselves to keep learning better answers. Instead, as we know from the previous chapter, LLMs are trained through self-supervised learning. In the field of LLMs, most instances of self-play are likely to resemble automatic AI feedback, with an AI playing against itself, rather than competitive interactions with humans.

Automatic AI feedback basically means that an AI model gets feedback about its strengths and weaknesses automatically from another AI system whose main function is to evaluate the first model. The concept of AI feedback is a fundamental area of research at this moment. Current LLMs such as GPT undergo training using RLHF (Reinforcement Learning from Human Feedback), a method wherein the model is instructed and refined based on feedback from human evaluators, who manually score the AI on how good or bad its answers are, or how appropriate they are in terms of ethics, bias, or politeness. This method has been effective in developing the first generation of LLMs. However, it is a time-consuming and costly process due to its dependence on human input [Christiano et al.].

Another important concept is self-improvement. Self-improvement entails an AI system engaging in repeated self-play, surpassing human performance by exploring various possibilities within the game environment. The shift from human scoring to automatic AI-based scoring at scale, especially if another AI model is involved in the hand-offs, would allow AI models to self-improve, representing a watershed breakthrough in the development of AGI.

This methodology was again prominently demonstrated in the case of AlphaGo. AlphaGo was initially designed to learn by imitating expert human players. By doing so, it reached a level comparable to the best human players but fell short of surpassing them. The breakthrough came with self-improvement, where the AI played millions of games in a closed sandbox environment, optimizing its performance based on a simple reward function of winning the game. This new approach allowed AlphaGo to surpass human capabilities, outperforming top human players within 40 days of self-improvement.

There are several ways in which self-improvement could work with LLMs. Consider a scenario in which queries are directed to an LLM. Typically, the model provides an answer, but it is not easy to know how good that answer is. However, introducing a second agent to scrutinize and validate the initial agent’s work markedly improves the quality of results. This would be comparable to the GAN models used in deep-fakes that we discussed in the previous chapter, where one model is trained to create realistic images, and another is trained to evaluate how realistic those images are. Each one feeding into the other, the two parts of the GAN enter into a self-improvement cycle.
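A minimal sketch of this two-agent loop, assuming hypothetical `generate` and `critique` model calls (this is an illustration of the idea, not any known product's implementation):

```python
def refine_with_critic(prompt, generate, critique, threshold=0.9, max_rounds=5):
    """Keep regenerating an answer until the critic's score clears a threshold.

    `generate(prompt, feedback=None)` produces an answer, optionally
    conditioned on the critic's feedback; `critique(prompt, answer)`
    returns (score, feedback). Both stand in for separate model calls.
    """
    answer = generate(prompt, feedback=None)
    for _ in range(max_rounds):
        score, feedback = critique(prompt, answer)
        if score >= threshold:
            break
        # Revise the answer using the critic's feedback, GAN-style:
        # one model produces, the other evaluates, each feeding the other.
        answer = generate(prompt, feedback=feedback)
    return answer
```

If the critic's scores were also used as a training signal for the generator, the loop would become a form of automatic AI feedback rather than mere inference-time filtering.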

In the case of Go, the reward function used to evaluate how good or bad the results are is very explicit. It is determined by the number of stones a player has on the board and the territory they control. The main challenge for LLMs lies in the absence of a general reward criterion, unlike in the game of Go, where winning or losing is clear and thus programmable. Language, being diverse and multifaceted, lacks a single discernible reward function or reward definition for swiftly evaluating all decisions regarding output, e.g., creating content.

While the potential for self-improvement in narrow domains exists, extending this concept to the general case remains an open question in the field of AI. Answering that question might unlock the key to AGI.

Apart from self-play, there are other ways of creating self-improving algorithms. One of them is genetic algorithms. Q* has not been connected directly to genetic algorithms, but they are conceptually similar to self-play.

Genetic algorithms mimic the concepts of natural selection and genetics, which hold that individuals with advantageous traits have a higher chance of surviving, procreating, and passing those traits on to the following generation. This is actually not a new concept. Early evolutionary algorithms were pioneered by Lawrence J. Fogel in the early 1960s [Fogel], but there is currently renewed interest in them due to their applications in AI.

In a genetic algorithm, a population of potential solutions to a given problem is represented as a set of individual software programs, or individuals for short, each encoded as a string of parameters or variables. These individuals are then evaluated based on their fitness, which measures how well they solve the problem at hand. The fitter individuals are more likely to be selected to form the next generation, simulating the natural selection process.

Much as in nature, the genetic algorithm operates through a cycle of selection, crossover, and mutation. During selection, individuals are chosen based on their fitness to serve as parents for the next generation. Crossover involves combining the genetic information of two parent individuals to create offspring with a mix of their traits. That genetic information is the encoding of the individual itself, for example, its hyperparameters. Mutation introduces random changes into the offspring’s genetic information, adding diversity to the population. This process is repeated over several generations, and over time, the population evolves toward better solutions to the problem.
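The selection/crossover/mutation cycle can be demonstrated end to end on a toy problem: evolving a bit string toward all ones (the classic "OneMax" exercise). The code is a minimal sketch, not a production-grade evolutionary framework:

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=50,
           mutation_rate=0.02, seed=0):
    """Evolve a population of bit strings toward higher fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Selection: tournament of two, the fitter individual wins.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        next_pop = []
        for _ in range(pop_size):
            p1, p2 = pick(), pick()
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best = evolve(sum)  # fitness = number of ones in the string
```

Swapping the bit string for an encoding of a model's hyperparameters, and `sum` for a validation-accuracy measure, turns the same loop into the kind of evolutionary search over AI systems described above.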

When solving optimization problems with a large and complex solution space, genetic algorithms are helpful. They have been successfully used in a variety of fields, including engineering, finance, and Machine Learning, to find solutions that may be challenging to develop using conventional optimization techniques.

The drawback with genetic algorithms is similar to that of reinforcement learning, specifically that they necessitate an objective function that defines the algorithm’s effectiveness. In Darwinian natural selection, applied to living species, the reward function is not dying before reproduction [Darwin]. As we have highlighted, defining a proper reward function for LLMs is challenging.

The third speculative theory about Q* suggests a potential connection between Q* and synthetic data. Synthetic data is another promising area of research to accelerate the learning of AI systems toward AGI. Synthetic data is not real, in the sense of training data gathered from real-world sources, but it is realistic enough for an AI algorithm to be trained effectively on it.

Acquiring high-quality datasets poses a ubiquitous and formidable challenge. Companies possessing an exceptionally valuable, distinct, and well-maintained dataset hold significant value. Only a few companies have extensive and unique datasets of the likes of Google, Amazon, Meta, Reddit, and a few others slightly lower on the totem pole, such as mobile operators and banks. Notably, OpenAI lacks its own exclusive dataset and sources datasets from various channels, including purchases and open-source datasets. If AI could autonomously generate synthetic datasets, that would eliminate the reliance on this limited number of sources. Many prominent companies and startups are working on synthetic data, but there are severe obstacles to maintaining quality and avoiding premature stagnation.

For example, in the case of self-driving cars, only a few companies like Google’s Waymo or Tesla have been able to build huge datasets with millions of hours of real video of roads because they started incorporating video cameras and scaled data collection years ago. Other more traditional car manufacturers, like GM or Ford, are also building self-driving cars, but they started incorporating video cameras much later and do not have the extensive video dataset that Google or Tesla have. Synthetic data will be hugely useful for them to train their driving algorithms. We will talk more about self-driving cars in Chapter 13 when we talk about robot mobility.

The chief advantage of synthetic data is that it is an avenue to introduce completely innovative ideas or approaches into a model. Models trained on static datasets are only as reliable as those datasets and may be unable to genuinely generate new ideas. Current LLMs heavily rely on their training set, producing responses derived from existing knowledge rather than generating genuinely new and innovative ideas. Coming back to the example of AlphaGo above, when AlphaGo was using training data from expert human players, the only strategies it could learn were those used by humans, but there could be other, much better strategies that are not reflected in this limited training dataset. Similarly, by providing an AI model with synthetic data of good quality, we open up the solution space in which the model can learn.

Synthetic data could be highly valuable for training AGI. Even combining all available datasets might fall short of meeting the data requirements for training advanced AI, such as AGI. The solution lies in synthetic data or a hybrid of real and synthetic data, and there is speculation that Q* might be leveraging this approach. Variations or mutations created through synthetic data could be employed to train algorithms through self-play with automatic AI feedback or through genetic algorithms.

Synthetic data also represents a very slippery slope in the evolution of AI. One of the problems with synthetic data, which is realistic but not real, is of course that it makes it very difficult to differentiate between real content and realistic content or, in terms of output, between what is real and what is fabricated. While it is an avenue to introduce completely innovative ideas or approaches into a model, it can also be used to inject personal biases. The algorithm, in fact, cannot make the distinction, and this opens the door to massive manipulation of information, such as fake news, fake videos, fake evidence in criminal cases, and any highly biased input. We will talk about this in detail in the context of dystopian outcomes in Chapter 23.

Finally, we will cover two areas of current research that are not related to the Q* rumors that surfaced with the firing of Sam Altman but hold promise to advance AGI. The first one is using sensory data, which comes from our senses, primarily video, to train algorithms. For some prominent AI leaders, including Yann LeCun, harnessing sensory data for training algorithms has the potential to expedite the acquisition of knowledge [LeCun].

Animals and humans show rapid cognitive development with much less data input than current AI systems, which require enormous amounts of training data. At the moment, LLMs are typically trained on text datasets that would take a human 20,000 years to read. Even with all of this training data, these models still have trouble with basic ideas like logical or mathematical reasoning. Humans, on the other hand, require much less textual training data to reach a higher level of understanding.

According to LeCun, the explanation for this lies in humans encountering a broad range of data types beyond mere text. Specifically, a significant portion of the information we receive is in the form of images and videos, which is a highly rich and inherently contextual format. If we take into account visual data and the richness of images in comparison to text, humans’ data intake surpasses the training data of a LLM, even from a young age. For instance, a two-year-old’s exposure to visual data is approximately 600 terabytes, while the training data for an LLM typically amounts to about 20 terabytes. This implies that a two-year-old has been exposed to 30 times more data than an LLM typically receives during its training process. Advertising executive Fred Barnard once famously coined, “A picture is worth a thousand words,” and as it turns out, he was directionally correct, though slightly overstating the reality.

According to LeCun, the reason humans learn faster is not solely because our brains are larger than current LLMs. Instead, he gives another reason to support the argument that video data also holds significant importance in the training process. Animals, including parrots, corvids, octopuses, and dogs, are also considerably more intelligent than current LLMs. These animals possess roughly a few trillion synapses, which aligns closely with the parameter counts of current LLMs. GPT-4 is rumored to have 1.76 trillion parameters, while GPT-3 has 175 billion. The parameters of an artificial neural network are roughly analogous to brain synapses.

By way of comparison, human brains are indeed much bigger than current LLMs. Humans have around 100 billion neurons and between 100 and 1,000 trillion synapses — with younger people having many more synapses than older ones [Herculano‐Houzel] [Wanner] [Zhang] [Yale].

New architectures that can mimic the effective learning seen in humans and the above animals are being developed to use sensory data in the learning process of advanced AI models. Adding more text data — whether synthetic or not — works as a temporary measure. But integrating sensory data, especially video, is the ideal solution that might be able to get scientists closer to AGI. Video has more bandwidth than text and superior internal structure because it contains spatial, movement, audio, and textual data. Video also offers more learning opportunities than text because of its natural repetition, and it offers significant knowledge about the structure of the world.

Ultimately, as the Romans quite aptly put it, “de gustibus et coloribus non est disputandum” (about taste and colors there is no dispute). Exclusively text-based training where data includes the word “green,” for example, will never generate the contextual understanding that green to me can be blue to you. Or when someone says, “my, that food is delicious,” the true meaning can only be understood by seeing whether it was said with rolling eyes or not. As there are no extensive databases of real videos suitable for training AI on contextual common sense, it is highly likely that synthetic video will be used for this, opening up the risk of specific bias, as we have reviewed previously.

A “world model” in AI refers to a comprehensive representation of an environment that is used in reinforcement learning. World models encapsulate the key elements, dynamics, and relationships within that environment, allowing an AI agent to simulate and understand its surroundings. World models have also been used in robotics training since the early days. These models enable the robot to interpret and predict events, facilitating decision-making and planning. We will talk about this in detail in Chapter 11.
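The planning role of a world model can be sketched as follows: the agent queries a learned transition model to imagine rollouts, then picks the action leading to the best imagined outcome. The `world_model` function here is a hypothetical stand-in for a learned predictor:

```python
def plan_with_world_model(state, actions, world_model, horizon=3):
    """Pick the action whose best imagined rollout yields the most reward.

    `world_model(state, action)` returns (next_state, reward) and stands
    in for a learned model of the environment's dynamics.
    """
    def rollout_value(s, depth):
        # Value of the best imagined action sequence from state s.
        if depth == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            next_s, reward = world_model(s, a)  # imagined, not real, step
            best = max(best, reward + rollout_value(next_s, depth - 1))
        return best

    def action_value(a):
        next_s, reward = world_model(state, a)
        return reward + rollout_value(next_s, horizon - 1)

    return max(actions, key=action_value)
```

The key point the sketch illustrates is that all the trial and error happens inside the model's imagination; the agent only commits to the first action of the best rollout, which is what makes world models attractive for decision-making and planning.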

Human beings also use mental representations similar to world models. These representations are built through sensory perception, experience, and learning, allowing individuals to understand and navigate the world around them. Human world models encompass various experiential elements, including spatial relationships, cause-and-effect dynamics, and social interactions. They inform what is colloquially referred to as one’s “worldview.” Similar to robot training, humans use these mental representations for decision-making, planning, and adapting to new situations. In many ways, relying on them to inform decisions is inescapable.

Current LLMs do not explicitly incorporate elaborated world models, and integrating them is one of the advanced avenues being explored today to achieve AGI. It could offer two key advantages toward the goal of instantiating AGI: First, by incorporating a broader understanding of the environment, LLMs could generate more contextually relevant and informed responses, unlocking the model’s ability to engage in nuanced conversations, comprehend more complex scenarios, and provide more accurate and context-aware information. Second, world models might also enable LLMs to simulate and reason about different situations, potentially improving their capacity for common sense reasoning and problem-solving in diverse contexts.

We note that world models might also be related to the topic of human and AI consciousness. Consciousness is the state of being aware of and able to perceive one’s thoughts, sensations, feelings, and surroundings. This is a complicated topic that science knows little about. We will discuss this in the context of Superintelligence in Chapter 26.
