Language models in generative AI – does size matter?
Monday, April 7, 2025, 11:00 AM, from InfoWorld
Generative AI applications revolve around the large language model, or LLM. From the launch of ChatGPT through to today, LLMs have been the focal point for generative AI. They have attracted billions in funding, and innovation has been rife. But are they as essential as they appear?
Today, the rise of multiple LLMs from OpenAI, Anthropic, Google, and others runs alongside new projects based on small language models, or SLMs. SLMs are by definition smaller than LLMs: they are trained on smaller, more specific sets of data, and they can be used to fulfill more specific requirements or tasks. Because they use less data in their training, they cost less to create, which means that companies can create and train their own SLMs. SLMs can potentially function in more places as well. Because they require fewer resources, SLMs can operate in edge environments or on mobile devices rather than needing the significant amounts of compute that LLMs do.

SLMs can also affect how developers architect their applications around generative AI. With LLMs, the cost of training means the model's knowledge has a cut-off point, because retraining is expensive. If you want to add more recent context, you have to supply it separately, e.g. by using vector search and retrieval-augmented generation, or RAG. SLMs, being smaller and cheaper to retrain, give you the option of retraining instead. Better still, combining retraining and RAG can help your generative AI system use the most relevant and semantically similar material in its responses to user requests. Depending on how timely your system needs to be, you can create a pipeline that streams new data into your system. To make this work in near real time, the incoming data has to be converted into vectors and added to the vector database, so that it can be used in searches for semantically similar information as part of your RAG approach. Periodically, the SLM can be retrained with that new data included.
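As a minimal sketch of that ingestion step, assuming the sentence-transformers and chromadb libraries, the example below embeds each incoming document and stores it for similarity search. The model name, collection name, and sample data are illustrative choices, not anything prescribed by a particular product.

```python
# Minimal sketch: embed incoming documents and add them to a vector
# store so RAG queries can find them in near real time.
# Assumes sentence-transformers and chromadb are installed; the model
# and collection names are illustrative choices, not requirements.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
client = chromadb.Client()
collection = client.get_or_create_collection("streamed_docs")

def ingest(doc_id: str, text: str) -> None:
    """Convert one streamed document into a vector and store it."""
    vector = model.encode([text])[0].tolist()
    collection.add(ids=[doc_id], embeddings=[vector], documents=[text])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Fetch the k most semantically similar documents for a RAG prompt."""
    query_vector = model.encode([query])[0].tolist()
    result = collection.query(query_embeddings=[query_vector], n_results=k)
    return result["documents"][0]

# New data becomes searchable as soon as it is ingested.
ingest("sensor-42", "Pump 7 reported abnormal vibration at 09:14.")
print(retrieve("Which equipment showed vibration issues?"))
```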
Advantages of SLMs

Deploying SLMs is now an option for organizations that want to use generative AI but also want complete control over their environments. Why might you need this? For example, you may want control over which data is in place from the start: the equivalent of a digital clean room, where you know all the potential ingredients used to generate results. This can also be useful for auditing responses to questions and getting a full audit trail back. When you use another company's LLM, you will only have a rough idea of what data was used to train the model. When you use your own SLM, you can be confident of what documents and data were included.

While SLMs may be useful for smaller generative AI applications or edge AI deployments, there is another field where they have great potential. Agentic AI, the latest iteration of generative AI, uses multiple agents trained to fulfill specific tasks in order to produce results. The aim here is to create and support a process from beginning to end with multiple, specialized agents. Whereas LLM services can be useful for responding generically to queries and interacting with users, agentic AI takes advantage of specialized SLMs to provide more targeted responses that support different steps in an end-to-end process. With different autonomous agents involved at different steps, SLMs can play an important role in how you design agentic systems.

The reason for this is that multi-agent applications can use far more resources than stand-alone AI applications to reach their end result. A generative AI application will use a certain number of tokens to process a response, e.g. for embedding requests into vectors. Tokens roughly correspond to the words and word fragments in prompts, with longer and more complex prompts consuming more tokens. Each component in an application will consume tokens to respond to a request. Depending on the number of agents and steps within a process, the number of tokens will be significantly higher for agentic AI: each agent creates a response that consumes tokens, then passes it on to the next step (in turn consuming tokens) to create the next response (consuming tokens again), before the final response is created and sent back to the user. Capgemini estimates that, for a service carrying out one request per minute in response to one sensor event, a single-agent service would cost around $0.41 per day, while a multi-agent system would cost around $10.54, approximately 26 times as much.
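As a back-of-the-envelope illustration, the sketch below models that cascade in Python. The token counts and the price per 1,000 tokens are invented for demonstration and are not Capgemini's figures or methodology; the point is simply that each agent re-reads a growing context before adding its own output, so token consumption grows faster than the agent count.

```python
# Illustrative cost model for the token cascade in a multi-agent system.
# All numbers (tokens per step, price per 1K tokens) are assumptions for
# demonstration only; they do not reproduce Capgemini's estimate.

PRICE_PER_1K_TOKENS = 0.002   # assumed blended price, USD
REQUESTS_PER_DAY = 60 * 24    # one request per minute, as in the article

def pipeline_tokens(num_agents: int, prompt_tokens: int = 200,
                    output_tokens: int = 300) -> int:
    """Total tokens for one request passing through a chain of agents.

    Each agent re-reads the accumulated context (the original prompt plus
    every earlier agent's output) and then produces its own output.
    """
    total = 0
    context = prompt_tokens
    for _ in range(num_agents):
        total += context + output_tokens  # input read + output generated
        context += output_tokens          # output is passed downstream
    return total

for agents in (1, 5, 10):
    daily = pipeline_tokens(agents) * REQUESTS_PER_DAY
    cost = daily / 1000 * PRICE_PER_1K_TOKENS
    print(f"{agents:2d} agent(s): {daily:>12,} tokens/day ~ ${cost:,.2f}/day")
```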
SLMs vs. LLMs for agentic AI

From this cost comparison, there are two areas to consider. First, using SLMs rather than full-blown LLMs can bring the cost of a multi-agent system down considerably. Employing smaller, more lightweight language models to fulfill specific requirements will be more cost-effective than using LLMs for every step in an agentic AI system. This means looking at the right component for each element of a multi-agent system, rather than assuming that a "best of breed" approach is automatically the best approach. Second, agentic AI should be adopted where multi-agent processes can provide more value per transaction than simpler single-agent models. The choice here affects how you price your service, what customers expect from AI, and how you deliver your service overall. Alongside the technical and architectural elements, you will also have to consider what your line-of-business team wants to achieve. While simple AI agents can carry out specific tasks or automate repetitive ones, they generally require human input to complete those requests. Agentic AI takes things further by delivering greater autonomy within business processes, employing that multi-agent approach to adapt continuously to dynamic environments. With agentic AI, companies can use AI to independently create, execute, and optimize results around a business process workflow. The overall goal is to replace fragile, static business processes with dynamic, context-aware automation.

The future of generative AI is hybrid

The world of generative AI has advanced rapidly over the past few years. While there has been enormous investment in large language models, the entrance of new models like DeepSeek has changed the conversation around how to support generative AI deployments. For many organizations, language models have shifted from being at the center of the architecture of their generative AI applications to being a commodity component. This will continue, with developers adopting a hybrid approach to which models they use and how they deploy them. SLMs and LLMs will be used alongside each other to deliver the most relevant results at a given cost and compute level.

How will developers adapt to this new world of SLMs, LLMs, and other models to choose from? There will need to be more testing of how applications perform with these systems in place, so that developers can see how different SLMs and LLMs perform alongside the other components that make up their generative AI applications. This applies both to the relevance of results and to cost profiles. When new models come out, they should be tested to see what improvements they deliver while other elements, such as data, AI weights, and integrations, remain the same. The risk of making wholesale changes, particularly with a non-deterministic application like generative AI, is that you won't be able to pin down the exact impact of any single change.

To make testing easier, developers can look at open source projects that handle the integration between different components as part of the overall application design. Projects like Langflow make it easier to connect generative AI services such as LLMs, SLMs, vector data stores, and outputs into the whole application. When you are looking at hybrid deployments with multiple models potentially running alongside each other, that integration side will be critical. It can be helpful to visualize these processes as "agentic flows," where one agent's output becomes the input of another agent, and so on. This visual approach makes it easier to build agentic systems, and easier to manage the integrations between elements over time.
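To show the shape of such a flow in code, here is a minimal sketch. The Agent class, the call_model stand-in, and the model names are hypothetical scaffolding for illustration; they are not Langflow's API, and a real system would substitute actual SLM and LLM clients.

```python
# Minimal sketch of an "agentic flow": each step is an agent bound to a
# model sized for its task, and each agent's output feeds the next agent.
# call_model() is a placeholder for a real model client; the model names
# below are hypothetical, not references to actual products.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    model: str          # e.g. a small local model, or a hosted LLM
    instructions: str

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call (a local SLM server, a hosted API, etc.)."""
    return f"[{model} response to: {prompt[:40]}...]"

def run_flow(agents: list[Agent], user_request: str) -> str:
    """Pipe each agent's output into the next agent's input."""
    payload = user_request
    for agent in agents:
        payload = call_model(agent.model, f"{agent.instructions}\n\n{payload}")
    return payload

# Specialized SLMs handle the narrow steps; a larger model writes the
# user-facing answer, echoing the article's cost argument.
flow = [
    Agent("extract", "local-slm-3b", "Pull the key facts from this request."),
    Agent("plan", "local-slm-3b", "Draft the steps needed to fulfill it."),
    Agent("respond", "hosted-llm", "Write the final answer for the user."),
]
print(run_flow(flow, "Summarize yesterday's sensor alerts and suggest actions."))
```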
Looking ahead, the world of language models will continue to evolve, and new models will enter the arena. SLMs can help developers deliver generative AI applications more efficiently and turn potential projects into production deployments, particularly as multi-agent and agentic AI solidify around real-world use cases. Both SLMs and LLMs will play a role in seizing valuable opportunities and delivering generative AI applications at cost-effective levels.

Dom Couldwell is head of field engineering EMEA at DataStax.

Generative AI Insights provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld's technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.

https://www.infoworld.com/article/3954775/language-models-in-generative-ai-does-size-matter.html