For many years, businesses and their data science teams valued accuracy above all else when it came to a model’s performance. Increasingly, however, other factors and trade-offs have come into play depending on the business context of models. 

From biases buried in training data to the runtime and resourcing costs of services underpinned by large language models (LLMs), businesses looking to harness the power of AI face many challenges. One way foundation model providers have sought to address these concerns is by offering smaller versions of their flagship products, known as small language models (SLMs).

Increasingly, SLMs are proving a viable and cost-effective alternative to LLMs, but as with the adoption of any advanced technology, use cases must be well defined to deliver true value.

Cost management

A good place to start when deciding between an SLM and an LLM is to understand how much power the solution you're implementing actually needs. Rudimentary tasks may not require AI at all to be automated, and many tasks will not require the full force of an LLM, whose costs can ramp up quickly due to the resources, such as GPU time, needed to support it. In most business contexts, LLMs will also be hosted by cloud service providers like Azure and AWS, so a high volume of calls to the model will push costs up further.

This is where the viability of SLMs over LLMs becomes apparent. Most foundation model providers today release their products in different sizes; Meta's Llama 3, for example, comes in 8B- and 70B-parameter versions. Each version is charged at a different rate, with SLM offerings being much cheaper. This matters when testing any use case, and particularly for small and mid-sized organizations operating under tight financial constraints.
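To make the cost difference concrete, the sketch below estimates monthly spend from call volume and token usage. The per-1K-token prices are hypothetical placeholders, not any provider's real rates; substitute your cloud provider's current pricing before drawing conclusions.

```python
# Rough cost comparison for an SLM vs. an LLM at a given call volume.
# The per-token prices below are hypothetical placeholders, not real
# provider rates -- substitute your provider's current pricing.

def monthly_cost(calls_per_month, tokens_per_call, price_per_1k_tokens):
    """Estimate monthly model spend from call volume and token usage."""
    total_tokens = calls_per_month * tokens_per_call
    return total_tokens / 1000 * price_per_1k_tokens

SLM_PRICE = 0.0002  # hypothetical $ per 1K tokens for a small model
LLM_PRICE = 0.003   # hypothetical $ per 1K tokens for a large model

calls, tokens = 500_000, 800  # e.g. a busy internal chatbot
print(f"SLM: ${monthly_cost(calls, tokens, SLM_PRICE):,.2f}/month")
print(f"LLM: ${monthly_cost(calls, tokens, LLM_PRICE):,.2f}/month")
```

Even with made-up prices, the point stands: at high call volumes, a per-token price gap of an order of magnitude compounds into a large monthly difference.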

More recent versions of a foundation model may yield better performance, but if an earlier, cheaper version or a smaller variant performs well enough, upgrading is likely an unnecessary expense. Some SLMs are small enough to run locally on a device such as a laptop or even a mobile phone. GPT-2, for example, is now fairly outdated but can be downloaded and run on a standard modern laptop. These local use cases are mostly restricted to inference, however: training a model locally would be so time-intensive that it would likely not be worth doing.

The best use cases for SLMs and LLMs

SLM-powered applications such as sentiment analysis and chatbots are quickly emerging as the most readily adoptable AI use cases. Chatbots can be deployed internally for knowledge discovery and Q&A tasks, or externally for customer service. These are often domain-specific and will require some level of fine-tuning, but once up and running they can quickly streamline workflows and increase efficiency.

A use case like clustering documents, such as grouping customer support tickets by topic and assigning a priority level to each one, is well served by an SLM. For more intricate tasks, however, such as parsing HR documents for niche information or building an advanced classification engine spanning documents and files across systems, an LLM is the more appropriate choice.
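The shape of the ticket-grouping task can be sketched in a few lines. A real pipeline would use an SLM's text embeddings to measure similarity; here, plain bag-of-words Jaccard overlap stands in so the example stays self-contained, and the tickets and threshold are invented for illustration.

```python
# Toy sketch of grouping support tickets by topic. A real pipeline would
# compare SLM embeddings; word-set overlap (Jaccard) stands in here.

def jaccard(a: set, b: set) -> float:
    """Overlap between two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

def cluster(tickets, threshold=0.2):
    """Greedily group tickets whose word overlap with a cluster's first
    member exceeds the threshold."""
    clusters = []  # each cluster is a list of (ticket, word set) pairs
    for text in tickets:
        words = set(text.lower().split())
        for group in clusters:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((text, words))
                break
        else:
            clusters.append([(text, words)])
    return [[t for t, _ in group] for group in clusters]

tickets = [
    "password reset link not working",
    "password reset email never arrives",
    "invoice shows wrong total",
    "wrong total on my invoice",
]
for group in cluster(tickets):
    print(group)
```

The greedy single-pass design keeps the sketch short; production systems would use a proper clustering algorithm over embedding vectors instead.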

This is because the context window (the amount of text a model can take into account when responding to a prompt) is generally much smaller for SLMs. SLMs are also more prone to hallucination because they are trained on far less data: with a smaller knowledge base, they are far more likely to produce inaccurate guesses at the answers needed. This immediately rules them out of more sensitive applications, such as medical diagnostics, engineering, and financial services, where AI adoption is in any case still at a fairly early stage.
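A small context window often forces you to chunk long inputs before the model ever sees them. The sketch below illustrates the arithmetic; the 512-token window and the reserve for the prompt and reply are illustrative figures, not any specific model's limits, and a word count stands in for a real tokenizer.

```python
# Sketch: fitting a long document into a small context window by
# chunking. Word count stands in for a real tokenizer, and the window
# size is an illustrative figure, not a specific model's limit.

CONTEXT_WINDOW = 512   # hypothetical SLM window, in tokens
RESERVED = 112         # room kept back for the prompt and the reply

def chunk(words, size):
    """Split a list of words into pieces that fit the available budget."""
    return [words[i:i + size] for i in range(0, len(words), size)]

document = ["word"] * 2000  # stand-in for a long document
budget = CONTEXT_WINDOW - RESERVED
pieces = chunk(document, budget)
print(len(pieces), "chunks of at most", budget, "words")
```

Every chunk then costs a separate model call, which is one way a small window quietly feeds back into the cost discussion above.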

A particularly useful architecture built from SLMs is the mixture-of-experts (MoE) model, such as Mistral's Mixtral 8x7B. As the name suggests, the model combines eight expert sub-models of 7 billion parameters each. Working together, this collection of smaller experts can often outperform larger models. And although running eight SLMs might seem to incur similar costs to an LLM, Mixtral 8x7B typically routes each token through only two of the eight experts at a time.
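The routing idea can be shown with a toy gate in the spirit of that design: score all eight experts, but run only the top two and mix their outputs by gate weight. The gate scores and the "expert" computations below are invented stand-ins, not Mixtral's actual weights or layers.

```python
import math

# Minimal sketch of top-2 mixture-of-experts routing. The gate scores
# and the "expert" computation are toy stand-ins for illustration.

NUM_EXPERTS, TOP_K = 8, 2

def softmax(xs):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, token):
    """Run only the top-k experts and mix their outputs by gate weight."""
    weights = softmax(gate_scores)
    top = sorted(range(NUM_EXPERTS), key=lambda i: weights[i])[-TOP_K:]
    output, experts_run = 0.0, 0
    for i in top:
        expert_out = (i + 1) * token  # toy "expert" computation
        output += weights[i] * expert_out
        experts_run += 1
    return output, experts_run

scores = [0.1, 2.0, -1.0, 0.5, 1.8, 0.0, -0.3, 0.2]  # toy gate scores
_, active = route(scores, token=1.0)
print(active, "of", NUM_EXPERTS, "experts active")
```

Because only two experts execute per token, the per-call compute stays close to that of a 2x7B model even though the full parameter count is much larger.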

Of course, there will always be limitations for SLMs, as they are trained on much less data. Perhaps the most exciting field of AI today, with many emerging use cases, is that of multimodal models in which diverse types of data, such as text, images, and video can be processed simultaneously, moving us closer to mimicking the capabilities of a human brain. Currently, SLMs are not powerful enough for the more advanced multimodal use cases, such as video generation, as their “brains” are not big enough to handle such complex tasks. LLMs will therefore be at the forefront of AI-led innovation, but SLMs will likely deliver the most immediate business value. 

Don’t forget model principles 

Whether choosing an LLM or an SLM, the fundamentals of model selection must take precedence. An LLM trained on poor-quality data, for example, can perform worse than an SLM trained on high-quality data, which is why it's crucial to experiment with different offerings before committing to one.

Many models today are open source or openly available, allowing users to experiment and test use cases. If an SLM meets the performance requirements, there's no need to pay for an LLM. Starting small is generally a good idea, since it leaves room to upgrade to an LLM later; going the other way and downgrading from an LLM to an SLM, however, may significantly affect performance.

A key consideration for selecting either option is whether the solution will need to scale up, or if the use case is specific enough to be contained. For example, a customer service chatbot will likely not need to produce wildly different responses from one month to the next. 

To ensure the right choice is made, prioritize a robust discovery phase in which the likely evolution of the use case is fully considered. This will surface the technical and financial constraints that ultimately determine whether an SLM or an LLM is the right fit.

