The discussion about Large Language Models (LLMs) is closely tied to that of AI chatbots, a technology that is becoming more and more common but whose inner workings many people still don’t know.
In this sense, an LLM can be considered “the skeleton” of any chatbot, such as ChatGPT: it allows the chatbot to understand users’ questions, process what is requested and provide effective answers in a short time. Let’s find out more about it.
The history of LLMs
The idea behind Large Language Models is not as recent as you might think: it has been around for almost 100 years. It was first theorized in the 1930s, but it remained only a brilliant theory, not achievable with the technologies of the time.
In 1966 ELIZA saw the light: it can be considered the first conversational program in history, and it was developed at MIT (Massachusetts Institute of Technology).
Without going into too much detail, this program relied on a series of conversational scripts to answer user questions. Of course, you shouldn’t imagine the potential of the current technology behind ChatGPT, but it still represented a great invention at the time.
The first idea of an LLM dates back to the 1930s, but the intuition remained only a brilliant theory for decades
Clearly, despite the idea behind the project, ELIZA does not have much in common with current AI chatbots, which required almost 50 years of evolution before becoming a reality within everyone’s reach.
In 2013 came word2vec, one of the first widely adopted natural language processing (NLP) algorithms capable of taking a word and converting it into an array of numbers, also called a vector.
Here too, it was a rather complex system, still a long way from what we can do today with the available chatbots, but it nevertheless represented a big step forward in the sector.
One of the first LLMs was BERT, a model released by Google in 2018 and applied to its search engine in 2019 to help it better “understand” user searches and, of course, deliver better results.
ChatGPT, however, arrived on the market in 2022, paving the way for the technologies that are gradually conquering every sector and making giant strides almost daily.
How do Large Language Models work?
One of the fundamental parts of an LLM is “training”, a kind of learning phase in which the model is fed a huge amount of data.
Books, articles, websites, texts and anything else that can shape the program and help it “learn” a certain topic, so that it can respond efficiently to user requests.
Naturally, the type of training varies depending on the type of model being considered. Take DALL-E, for example: its ultimate goal is to produce an image starting from a textual input.
This requires different training from that of a text-only chatbot: in this case, the training data must also include images of various kinds.
Returning to “classic” chatbots, the text they process is divided into “tokens”: these small units can represent a word, part of a word, a punctuation mark or anything else that helps the model analyze small portions of information, making its operation more efficient.
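To make the idea of tokens more concrete, here is a minimal sketch in Python. It is deliberately simplified: it splits text into whole words and punctuation marks with a regular expression, whereas the chatbots discussed here typically rely on more sophisticated subword tokenizers. The function names `tokenize` and `build_vocab` are invented for this illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation-mark tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Give each distinct token an integer ID, in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


tokens = tokenize("Hello, how are you? Hello!")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]

print(tokens)  # ['Hello', ',', 'how', 'are', 'you', '?', 'Hello', '!']
print(ids)     # [0, 1, 2, 3, 4, 5, 0, 6]
```

Note how the repeated word “Hello” receives the same ID both times: the model ends up working with these numbers rather than with the raw characters.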
Another crucial feature of this technology is neural networks, which allow the system to predict which words will come next in a sentence, helping it to contextualize each term.
A key part of the LLM discussion is training, which requires huge amounts of data so that a chatbot, for example, can answer user questions
This procedure allows the model to give a coherent meaning to a text and to offer the user answers relevant to what is requested.
Context analysis happens through a vector representation, with each word represented as a vector, to help the model understand the meaning of words through a mathematical system.
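A small Python sketch can show how vectors let a model compare meanings mathematically. The three-number vectors below are invented by hand purely for illustration (real models learn vectors with hundreds or thousands of dimensions from data), and cosine similarity is just one common way to measure how close two vectors are.

```python
import math

# Hand-made toy vectors: the values are assumptions, not learned embeddings.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])

print(f"king vs queen: {sim_royal:.2f}")
print(f"king vs apple: {sim_fruit:.2f}")
```

With these toy values, “king” and “queen” score much closer to 1 than “king” and “apple”, which is the kind of numerical signal a model can use to treat related words alike.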
Of course, it is good to remember that an LLM also takes into account the syntax of sentences, evaluating not only the meaning of words but also the way they are positioned within a sentence. As a result, the responses are not only effective but also grammatically correct.
Also very important in this discussion are the so-called attention mechanisms, which allow the model to give greater weight to some parts of the text based, precisely, on the context.
In this way the model will be able to give more accurate answers by placing greater attention on what it considers the key parts of the text.
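The weighting performed by an attention mechanism can be sketched in a few lines of Python. This is a simplified version of scaled dot-product attention, assuming a single query and fixed toy vectors. In a real model the queries, keys and values are produced by learned transformations, which are omitted here, and the names `softmax` and `attention` are chosen for this illustration.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Three toy "tokens"; the query points in the same direction as the first one.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

The first and third tokens, whose keys overlap with the query, receive more weight than the second, so they contribute more to the output. This is what “giving greater weight to some parts of the text” means in practice.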
Large Language Models, practical uses
There are many Large Language Models on the market and they have been developed by some of the most important hi-tech companies in the world, such as OpenAI, Google, Meta and many others.
Generally these models are used, as just mentioned, within the various chatbots powered by artificial intelligence, which are becoming increasingly popular and are coming into contact with very different technologies.
Clearly, we are talking about an innovation that is gradually affecting all technological sectors, from more specialized ones, such as business productivity suites, to entertainment solutions and everyday technologies like smartphones.
The reason for this diffusion is simple: these systems are extremely versatile and can be used to power an endless number of services, from customer support to search engines and voice assistants.
The future developments of these models are promising and could really change people’s lives: think of the healthcare sector, with the possibility of simulating the evolution of a patient’s clinical picture, for example.
Of course, at the moment we are talking about a product that is not free of errors and of what are called “hallucinations”, but technological progress runs fast. These models are destined to become a key to understanding the future, and that prediction is already not too far from coming true.