István ÜVEGES: “All that Glitters ain’t Gold” – Tradeoffs between “Small” and Large Language Models

In the rapidly evolving world of artificial intelligence, a debate rages over the practical usefulness of large language models (LLMs) versus smaller AI models for businesses and industries. This post examines whether smaller models currently hold practical advantages over their larger counterparts in industrial applications. Three factors matter most here: the cost of entry, reliability (consider the recent service disruptions at large providers such as OpenAI), and how well smaller models can be tailored to specific business needs and data.

Cost implications: the heavy price of large models

Training LLMs is a complex and costly undertaking. Just how costly is illustrated by an announcement OpenAI made at this year’s Dev Day: the company will now offer the creation of customized GPT-4 models as part of its services. OpenAI estimates the up-front cost at roughly $2-3 million, with model creation taking months.

The cost of such a project is made up of many factors. The two most important are the hardware resources needed to train the model (rented or purchased) and the time the training takes. For generative models, the number of model parameters also matters. A GPT-4 model, for instance, has billions of parameters (OpenAI has not published the exact figure), while BERT models, which belong to the first generation of LLMs, can have as few as 110 million. In broad terms, the more parameters a model has, the more GPUs must work in parallel to train and run it efficiently.
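To make the link between parameter count and hardware concrete, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not from the announcement or from any vendor: 2 bytes per parameter for serving (16-bit weights only), about 16 bytes per parameter for mixed-precision training with the Adam optimizer (weights, gradients and optimizer state), and an 80 GB accelerator. Activations, batch size and parallelism overhead are ignored, so the results are lower bounds. The 175-billion-parameter entry is a hypothetical large model chosen for illustration, not the size of GPT-4.

```python
# Rough lower bound on the number of GPUs needed to hold a model in memory.
# All constants below are illustrative assumptions, not vendor figures.

BYTES_PER_PARAM_SERVING = 2     # assumption: 16-bit weights only
BYTES_PER_PARAM_TRAINING = 16   # assumption: mixed precision + Adam state
GPU_MEMORY_BYTES = 80 * 10**9   # assumption: one 80 GB accelerator


def min_gpus(num_params: int, bytes_per_param: int,
             gpu_memory_bytes: int = GPU_MEMORY_BYTES) -> int:
    """Return the minimum number of GPUs whose combined memory holds the model state.

    Activations and communication buffers are ignored, so real needs are higher.
    """
    total_bytes = num_params * bytes_per_param
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-total_bytes // gpu_memory_bytes))


if __name__ == "__main__":
    models = {
        "BERT-sized model (110 million)": 110_000_000,
        "hypothetical large model (175 billion)": 175_000_000_000,
    }
    for name, params in models.items():
        serving = min_gpus(params, BYTES_PER_PARAM_SERVING)
        training = min_gpus(params, BYTES_PER_PARAM_TRAINING)
        print(f"{name}: at least {serving} GPU(s) to serve, {training} to train")
```

Under these assumptions the 110-million-parameter model fits on a single device even for training, while the hypothetical 175-billion-parameter model needs at least 5 devices just to hold its weights and at least 35 to train.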

In addition, the market demand for high-capacity GPUs has driven up costs significantly in recent years, often making these devices expensive and difficult to access. It is true that technological solutions are being developed that can reduce some of the resource requirements for model training, but their application is still a very complex engineering task.

The key question here is: how much worse does an older, lower-end model that can be built for a few thousand dollars perform than today’s state-of-the-art but costly models? For a predictive application, a difference of 10-20% in accuracy can separate a market-leading solution from a mid-range one, but what about a difference of only a few percent? Moreover, a model’s true performance often becomes clear only once training has been completed in full, that is, after the money has been spent. This is a significant risk for startups, for example, where the effective use of capital is vital.

Service disruption: the challenge of reliability

Large language models, while offering a wide range of capabilities, are subject to service disruptions, which raise questions about their reliability. Their dependence on massive, sophisticated computing infrastructure increases the risk of such outages and affects their suitability for continuous industrial use.

In industry, there are many situations where an AI-based application needs to be available around the clock. In such cases, the question rightly arises: can a company afford to outsource the operation of this technology? For small companies that lack the technological expertise and operational experience of large global service providers, outsourcing such infrastructure is almost always the right choice. For large enterprises, however, the answer is far less obvious.
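What round-the-clock availability means in practice is easier to judge once an availability target is converted into permitted downtime. The short sketch below does that arithmetic. The targets used (99%, 99.9%, 99.99%) are illustrative assumptions, not figures quoted by any provider.

```python
# Convert an availability target into the downtime it permits per year.
# The targets below are illustrative assumptions, not any provider's SLA.

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours per year a service may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

A 99% target still permits more than three and a half days of outage per year, which is why the choice between outsourcing and self-hosting depends on how much interruption the business can actually tolerate.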

Customizability and data protection: the advantage of smaller models

One of the key advantages of smaller AI models is their customizability. Smaller, industry-specific models can be trained and fine-tuned more efficiently for specific business applications. They can therefore deliver better results that are closely tailored to the needs of the industry or business. In contrast, large, general-purpose language models often produce less accurate, more generic outputs that may not be fully suited to specialized applications.

In addition, smaller, self-hosted models make it easier to keep control over protected data. This is particularly beneficial for businesses that rely on sensitive or highly specific data sets. More generally, most companies would not want snippets of proprietary code, records of internal conversations, or other company documents to end up in the training data of externally hosted models.

Google’s domain-specific models, such as Med-PaLM 2 and Sec-PaLM, also show how customized models can effectively integrate industry-specific knowledge and data sets. This customizability extends to startups and smaller businesses, which can fine-tune smaller models to their own needs with far fewer resources than an LLM requires.

Conclusion: a shift towards smaller, more practical models

To sum up, while LLMs developed by OpenAI, Meta, Google and others have significantly advanced AI capabilities, in practical industrial use they are currently overshadowed by smaller, more specialized AI models. The high cost, the potential for service disruption and the inherently generic nature of LLMs make them less suitable for specific industrial applications. In contrast, smaller models are cost-effective, reliable, and customizable to specific business needs and data, which makes them the more practical choice for industry at present.

As the technology continues to evolve, the balance between large and small models may shift. OpenAI’s recently announced GPTs service, for example, is a step forward in the kind of customization discussed above. For now, however, the balance still tips towards smaller AI models for specific industrial applications.

István ÜVEGES is a researcher in Computer Linguistics at MONTANA Knowledge Management Ltd. and a researcher at the Centre for Social Sciences, Political and Legal Text Mining and Artificial Intelligence Laboratory (poltextLAB). His main interests include practical applications of Automation, Artificial Intelligence (Machine Learning), Legal Language (legalese) studies and the Plain Language Movement.