Generative AI (think DALL·E, ChatGPT-4, and many more) is all the rage. Its remarkable successes, and occasional catastrophic failures, have kick-started important debates about both the scope and the dangers of advanced forms of artificial intelligence. But what, if anything, does this work reveal about natural intelligences such as our own?
I’m a philosopher and cognitive scientist, and I have spent my entire career trying to understand how the human mind works. Drawing on research spanning psychology, neuroscience, and artificial intelligence, my search has led me towards a picture of how natural minds work that is both interestingly similar to, yet also deeply different from, the core operating principles of the generative AIs. Examining this contrast may help us better understand them both.
The AIs learn a generative model (hence their name) that enables them to predict patterns in various kinds of data or signal. “Generative” here means that they learn enough about the deep regularities in some dataset to create plausible new versions of that kind of data for themselves. In the case of ChatGPT the data is text. Knowing about the many faint and strong patterns in a huge library of texts allows ChatGPT to produce plausible new texts, sculpted by user prompts. For example, a user might request a story about a black cat written in the style of Ernest Hemingway. There are also AIs that specialize in other kinds of data, such as images, which lets them create new paintings in the style of, say, Picasso.
What does this have to do with the human mind? According to much contemporary theorizing, the human brain has learnt a model to predict certain kinds of data, too. But in this case the data to be predicted are the various barrages of sensory information registered by sensors in our eyes, ears, and other perceptual organs. Now comes the crucial difference. Natural brains must learn to predict those sensory flows in a very special kind of context: the context of using the sensory information to select actions that help us survive and thrive in our worlds. This means that among the many things our brains learn to predict, a core subset concerns the ways our own actions on the world will alter what we subsequently sense. For example, my brain has learnt that if I accidentally tread on my cat’s tail, the next sensory stimulations I get will often include the sound of wailing, the sight of squirming, and occasionally the pain of a well-deserved retaliatory scratch.
This kind of learning has special virtues. It helps us separate cause from mere correlation. Seeing my cat is strongly correlated with seeing the furniture in my apartment, but neither one causes the other. Treading on my cat’s tail, by contrast, causes the subsequent wailing and scratching. Knowing the difference is crucial if you are a creature that needs to act on its world to bring about desired effects (or to avoid undesired ones). In other words, the generative model that issues natural predictions is constrained by a familiar and biologically critical goal: selecting the right actions to perform at the right times. That means knowing how things currently are and (crucially) how they will change if we act and intervene on the world in certain ways.
How do ChatGPT and the other contemporary AIs look when compared with this understanding of human brains and human minds? Most obviously, current AIs tend to specialize in predicting rather specific kinds of data: sequences of words, in the case of ChatGPT. At first sight, this suggests that ChatGPT might more properly be seen as a model of our textual outputs rather than (like a biological brain) a model of the world we live in. That would be a very significant difference indeed. But that move is arguably a little too swift. Words, as the wealth of great and not-so-great literature attests, already depict patterns of every kind, such as patterns among looks, tastes, and sounds. This gives the generative AIs a real window onto our world. Still missing, however, is that crucial ingredient: action. At best, text-predictive AIs get a kind of verbal fossil trail of the effects of our actions upon the world. That trail is made up of verbal descriptions of actions (“Andy trod on his cat’s tail”) along with verbally couched information about their typical effects and consequences. Despite this, the AIs have no practical ability to intervene on the world, and so no way to test, evaluate, and improve their own world-model, the one making the predictions.
This is an important practical limitation. It is rather as if someone had access to a huge library of data concerning the design and outcomes of all previous experiments, but were unable to conduct any of their own. But it may have deeper significance too. Plausibly, it is only by poking, prodding, and generally intervening upon our worlds that biological minds anchor their knowledge to the very world it is meant to describe. By learning what causes what, and how different actions will affect our future worlds in different ways, we build a firm basis for our later understanding. It is that grounding in actions and their effects that later enables us to truly understand the sentences we encounter, such as “The cat scratched the person who trod on its tail.” Our generative models, unlike those of the generative AIs, are forged in the fires of action.
Might future AIs build anchored models in this way too? Might they start to run experiments in which they launch responses into the world to see what effects those responses have? Something a bit like this already occurs in online advertising, political campaigning, and social media manipulation, where algorithms can launch ads, posts, and reports and adjust their future behavior according to the specific effects on buyers, voters, and others. If more powerful AIs closed the action loop in these ways, they would be starting to turn their currently passive and “second-hand” window onto the human world into something closer to the kind of grip that active beings like us have on our worlds.
But even then, there’d be other things missing. Many of the predictions that structure human experience concern our own internal physiological states. For example, we experience thirst and hunger in ways that are deeply anticipatory, allowing us to remedy looming shortfalls in advance and so stay within the correct zone for bodily integrity and survival. This means that we exist in a world where some of our brain’s predictions matter in a very special way. They matter because they enable us to continue to exist as the embodied, energy-metabolizing beings that we are. We humans also benefit hugely from collective practices of culture, science, and art, which allow us to share our knowledge and to probe and test our own best models of ourselves and our worlds.
In addition, we humans are what might be called “knowing knowers”: we depict ourselves to ourselves as having knowledge and beliefs, and we have slowly designed the complex worlds of art, science, and technology to test and improve that knowledge and those beliefs. For example, we can write papers making claims that are swiftly challenged by others, and then run experiments to try to resolve the differences of opinion. In all these ways (even bracketing obvious but currently intractable questions about ‘true conscious awareness’) there seems to be a very large gulf separating our special kinds of knowing and understanding from anything the AIs have so far achieved.
Could AIs one day become prediction machines with a survival instinct, running baseline predictions that proactively seek to create and maintain the conditions for their own existence? Could they thereby become increasingly autonomous, protecting and manufacturing their own hardware and drawing power as needed? Could they form a community, and invent a kind of culture? Could they start to model themselves as beings with beliefs and opinions? There is nothing in their current situation to drive them in these familiar directions. But none of these dimensions is obviously off-limits either. If changes were to occur along some or all of those key missing dimensions, we might yet be glimpsing the soul of a new machine.