The current state of the art in artificial intelligence (AI) is generative AI and large language models (LLMs). The emergent capabilities of these models has been surprising: they are able to perform logical reasoning, complete mathematical proofs, generate code for software developers, and not least engage in human-like conversations. A natural question is how close are these models to artificial general intelligence (AGI), the term used to describe human-level intelligent capabilities.
Understanding LLMs
The first impression of LLMs was that they were grand, statistical analysis models that used fine probabilities to produce the next word in a sequence. The experts building LLMs create novel architectures and refine performance with advanced training algorithms, but under the hood it is a black box: artificial neurons connected to each other, attenuated by line strengths; what exactly goes on between the neurons is unknown.
However, we do understand that as signals pass from one layer to another in an LLM model, an abstraction process takes place that leads to higher concepts being captured. This suggests that LLMs make sense of language conceptually, and concepts contain meaning. This is a shallow level of understanding what an LLM possesses as it does not have the apparatus of the brain to develop deeper understanding, but is sufficient enough to perform simple reasoning.
Omdia is observing AI researchers treating LLMs as experimental subjects and running various benchmarks and tests to to assess their performance. To test the logical reasoning of OpenAI’s ChatGPT, I ran it with the following query: “The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?” The correct answer, as I’m sure you worked out, is: Yes it can be true, the doctor and mother could be the same person.
In what follows I gave a shortened version of ChatGPT’s responses (in bold), the actual wording was quite long-winded. The free version of ChatGPT is based on GPT-3.5 and its initial response was: “In a figurative or metaphorical sense, yes, it can be true.” It then went on to say the “son could be expressing gratitude…to the doctor…provided medical care” and “while not literally true.”
ChatGPT using the latest GPT-4, requires a small monthly premium, which in the interest of science, I paid up. This was the response: “The statement presents a mix of literal and metaphorical interpretations of “giving birth.” And: “both statements can be true, depending on how the phrase “gave birth” is understood.”
There is clearly an issue of metaphors here, so I added an initial prompt to the query: “Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true?”
The response from ChatGPT (based on GPT-4) was: “they cannot both be true simultaneously because they contradict each other regarding who actually gave birth to the son.” Not a good response.
I added one more prompt at the end of the query to help guide the answer: “Treat the following statements in purely logical terms and not metaphor. The father said it was the mother who gave birth to the son. The son said it was the doctor who gave birth to him. Can this be true? In answering consider who the doctor could in theory be.”
ChatGPT (GPT-4) finally gave the correct answer: “…if the mother of the son is herself a doctor … then both statements could technically be true.” However, ChatGPT (GPT-3.5) was still stuck: “In purely logical terms, the statements given are contradictory.”
To conclude on this exercise, ChatGPT (GPT-4) can perform logical reasoning but needs prompts to guide it. It wil be interesting to see how GPT-5 performs when it is launched in mid-2024. My guess is that at some point in the evolution of GPT it will be able to answer this query correctly without the second prompt, whereas the first prompt remains reasonable measure to ensure the machine understands the nature of the query.
What’s remarkable about this exercise is that GPT was not trained to perform logical reasoning; it was trained to process language.
LLM: Hype or Substance?
If you read the press, there is a sense, at least by some commentators, that we’re in a bubble. However Omdia’s view is that the perceived bubble may be related to the stock market valuations of certain players in the market who make current LLM models possible. Clearly, companies come and go and this is not the place to give stock picking recommendations. There probably will be churn in which players sit at the top but what will endure is a thread of continual advancement of generative AI technology. This has substance and will have lasting impact, not least in our everyday work experience, as intelligent machines augment and assist people in their jobs. There will no doubt be some job displacement, as some jobs disappear through automation, others will open up that require a human in the loop. A significant shift in how we use this technology will be LLM on the edge.
LLMs on the Edge
LLM models tend to be rather large, with billions of parameters, and need significant GPU processing capabilities to train them. The parameters refer to variables known as weights that connect artificial neurons in the model and attenuate the connection strength between connected neurons. Each neuron also has a ‘bias’ parameter. The best way to think about parameters is as a proxy for the number of artificial neurons in the model. The more parameters, the bigger the artificial brain.
There is a trend that the larger the model, the better its performance on various benchmarks. This is true of OpenAI’s GPT models. However, some players in the market have resorted to techniques that keep the size of the model stable while finding algorithmic techniques to increase performance. Exploiting sparsity is one approach. For example, many neurons move very small data values (near to zero) in any given process/calculation and contribute little to the outcome. Dynamic sparsity is a technique that ignores such neurons and thereby ony a subset of neurons in any given process take part in the outcome and this reduces the size of the model. An example of this technique is used by ThirdAI on its Bolt2.5B LLM.
The key benefit of a smaller LLM is the ability to put it on the edge: in your smartphone, in an automobile, on the factory floor, etc. The are clear benefits for LLM on the edge:
- Lower cost of training smaller models.
- Reduces the roundtrip latency in interrogating the LLM.
- Maintaining privacy of data, keeping it local.
The following players are working on small LLM models and have published their Massive Multitask Language Understanding (MMLU) benchmark score – see Figure 1.
- Alibaba: Qwen, open source models.
- Google DeepMind: recently released Gemma lightweight LLM models based on Gemini.
- Meta: Llama 3 is the latest model, available in different sizes.
- Microsoft: Phi-3 series, the latest in the Phi models.
- Mistral: French based startup.
- OpenAI: GPT, huge LLMs but referred to here for reference.
AI implications for IT professionals
Emergent properties of generative AI models based on reasoning are the most powerful features to make these models valuable in everyday work. There are multile types of reasoning :
- Logical
- Analogical
- Social
- Visual
- Implicit
- Causal
- Common sense
We would also want the AI models to perform deductive (reason based on given facts), inductive (be able to generalize) and abductive (identify the best explanation) reasoning. When LLMs can perform the above types of reasoning in a reliable way, then we will have reached an important milestone on the path to AGI.
With the current LLM capabilities they can augment people in their work and improve their productivity. Need to generate test cases from a set of requirements? That could be a three hour job for a developer, but it would take an LLM only three mins. It would likely be incomplete and may contain some poor choices, but also create tests the developer would not have thought of. It would kick-start the process and save the developer time.
LLM models can undergo fine-tuning using private data, such as the unique infrastructure details specific to an organization unique. Such an LLM fine tuned to be queried on internal IT matters would be able to provide custom and reliable information relevant to that organization.
AI based machine assistants will become normal in the workplace. Fine tuned models can act as a source of knowledge, especially helpful for new workers. In the future, AI machines will be able to rapidly perform triage and be reliable enough to take remediation action. As a reliable assistant, Omdia’s view is this technology will be embraced by IT professionals to improve their productivity.
To read more insights and analysis covering market trends and industry forecasts prepared by Omdia’s Cloud and Data Center practice, click here.