
Key strategies for MLops success in 2025

Tuesday, February 18, 2025, 10:00 AM, from InfoWorld
Integrating and managing artificial intelligence and machine learning effectively within business operations has become a top priority for organizations looking to stay competitive in an ever-evolving landscape. However, for many organizations, harnessing the power of AI/ML in a meaningful way is still an unfulfilled dream. Hence, I thought it would be helpful to survey some of the latest MLops trends and offer some actionable takeaways for conquering common ML engineering challenges.

As you might expect, generative AI models differ significantly from traditional machine learning models in their development, deployment, and operations requirements. I’ll walk through these differences, which range from training and the delivery pipeline to monitoring, scaling, and measuring model success, and leave you with a few key questions organizations should address to guide their AI/ML strategy.

Ultimately, by focusing on solutions, not just models, and by aligning MLops with IT and devops systems, organizations can unlock the full potential of their AI initiatives and drive measurable business impacts.

The foundations of MLops

Like many things in life, to successfully integrate and manage AI and ML in business operations, organizations first need a clear understanding of the foundations. The first fundamental of MLops today is understanding the differences between generative AI models and traditional ML models.

Generative AI models differ significantly from traditional ML models in terms of data requirements, pipeline complexity, and cost. GenAI models can handle unstructured data like text and images, often requiring complex pipelines to process prompts, manage conversation history, and integrate private data sources. In contrast, traditional models focus on structured, task-specific data and are generally optimized for a single, well-defined challenge, making them simpler and more cost-effective.

Cost is another major differentiator. The calculations behind generative AI models are more complex, resulting in higher latency, greater demand for compute power, and higher operational expenses. Traditional models, on the other hand, often utilize pre-trained architectures or lightweight training processes, making them more affordable for many organizations. When deciding between a generative AI model and a traditional model, organizations must evaluate these criteria against their individual use cases.

Model optimization and monitoring techniques

Optimizing models for specific use cases is crucial. For traditional ML, fine-tuning pre-trained models or training from scratch are common strategies. GenAI introduces additional options, such as retrieval-augmented generation (RAG), which allows the use of private data to provide context and ultimately improve model outputs. Choosing between general-purpose and task-specific models also plays a critical role: do you really need a general-purpose model, or can you use a smaller model trained for your specific use case? General-purpose models are versatile but often less efficient than smaller, specialized models built for specific tasks.
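
To make the fine-tuning route concrete, here is a minimal sketch of adapting a small pre-trained encoder to a classification task using the Hugging Face Transformers Trainer API. The model name and the public dataset are illustrative stand-ins for your own labeled data:

```python
# A minimal fine-tuning sketch; model and dataset are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # small, general-purpose encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)

dataset = load_dataset("imdb")  # stand-in for your task-specific data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=16),
    # A small subsample keeps the sketch cheap to run.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```

A small model fine-tuned this way can often match a general-purpose model on a narrow task at a fraction of the serving cost.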

Model monitoring also requires distinctly different approaches for generative AI and traditional models. Traditional models rely on well-defined metrics like accuracy, precision, and F1 score, which are straightforward to evaluate. In contrast, generative AI models often involve more subjective metrics, such as user engagement or relevance. Good metrics for genAI models are still lacking, and the right choice comes down to the individual use case. Assessing a generative model is complicated and can require supporting business metrics to confirm that the model is behaving as intended. In any scenario, businesses must design architectures whose outputs can be measured against the desired outcome.
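
For the traditional side, those metrics are a few library calls away. A minimal sketch using scikit-learn, with placeholder labels standing in for a real validation set:

```python
# Monitoring a traditional classifier with standard metrics.
from sklearn.metrics import accuracy_score, f1_score, precision_score

y_true = [1, 0, 1, 1, 0, 1]  # ground-truth labels (placeholders)
y_pred = [1, 0, 0, 1, 0, 1]  # model predictions (placeholders)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```

No comparable one-liners exist for generative outputs, which is why business metrics and human review carry more weight there.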

Advancements in ML engineering

Traditional machine learning has long relied on open source solutions, from open source architectures like LSTM (long short-term memory) and YOLO (you only look once) to open source libraries like XGBoost and Scikit-learn. These solutions have become the standards for most challenges thanks to their accessibility and versatility. For genAI, however, commercial solutions like OpenAI’s GPT models and Google’s Gemini currently dominate, because building comparable models from scratch means massive data requirements, intricate training, and significant costs.

Despite the popularity of commercial generative AI models, open-source alternatives are gaining traction. Models like Llama and Stable Diffusion are closing the performance gap, offering cost-effective options for organizations willing to fine-tune or train them on their own data. However, open-source models can come with licensing restrictions and integration challenges, making ongoing compliance and efficiency harder to ensure.

Efficient scaling of ML systems

As more and more companies invest in AI, a few best practices for data management and classification, along with key architectural approaches, should be considered for scaling ML systems and ensuring high performance.

Leveraging internal data with RAG

Important questions revolve around data: What is my internal data? How can I use it? Can I train on this data, and is it structured correctly for training? One powerful strategy for scaling ML systems with genAI is retrieval-augmented generation. RAG uses internal data to change the context given to a general-purpose model. By embedding and querying internal data, organizations can provide context-specific answers and improve the relevance of genAI outputs. For instance, uploading product documentation to a vector database allows a model to deliver precise, context-aware responses to user queries.
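
As a minimal sketch of that flow, the example below embeds a handful of internal documents with an open-source sentence-embedding model and uses a plain dot-product search in place of a real vector database. The model name and documents are illustrative:

```python
# A minimal RAG sketch: embed docs, retrieve the best match, build a prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model

docs = [  # stand-ins for internal product documentation
    "To reset your password, open Settings > Account > Reset Password.",
    "Invoices are generated on the first business day of each month.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "How do I reset my password?"
q_vec = embedder.encode([query], normalize_embeddings=True)[0]

best = int(np.argmax(doc_vecs @ q_vec))  # cosine similarity via dot product
prompt = f"Context: {docs[best]}\n\nQuestion: {query}"
print(prompt)  # this context-enriched prompt goes to the generative model
```

In production, the in-memory search would be replaced by a vector database, but the retrieve-then-prompt pattern stays the same.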

Key architectural considerations

Creating scalable and efficient MLops architectures requires careful attention to components like embeddings, prompts, and vector stores. Fine-tuning models for specific languages, geographies, or use cases ensures tailored performance. An MLops architecture that supports fine-tuning is more complicated, and organizations should prioritize A/B testing across the various building blocks to optimize outcomes and refine their solutions.
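
One lightweight way to run those A/B tests is deterministic, hash-based assignment of users to pipeline variants, so every user consistently sees the same configuration. A hypothetical sketch; the variant configurations are assumptions:

```python
# Deterministic 50/50 assignment of users to two pipeline variants.
import hashlib

VARIANTS = {
    "A": {"embedding_model": "small-v1", "prompt_template": "v1"},
    "B": {"embedding_model": "large-v2", "prompt_template": "v2"},
}

def assign_variant(user_id: str) -> str:
    # Hash the user ID into a stable bucket from 0 to 99.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"

config = VARIANTS[assign_variant("user-123")]
print(config)  # the building blocks this user's requests will flow through
```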

Metrics for model success

Aligning model outcomes with business objectives is essential. Metrics like customer satisfaction and click-through rates can measure real-world impact, helping organizations understand whether their models are delivering meaningful results. Human feedback remains the best practice for evaluating generative models. Human-in-the-loop systems help fine-tune metrics, check performance, and ensure models meet business goals.

In some cases, advanced generative AI tools can assist or replace human reviewers, making the process faster and more efficient. By closing the feedback loop and connecting predictions to user actions, organizations create opportunities for continuous improvement and more reliable performance.
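
A hypothetical sketch of closing that loop: log every prediction together with the user action that followed, so online metrics such as click-through rate can be computed per model version. The event schema here is an assumption:

```python
# Append prediction/action pairs to a log for per-version metrics.
import json
import time

def log_event(model_version: str, prediction: str, user_action: str) -> None:
    event = {
        "ts": time.time(),
        "model": model_version,
        "prediction": prediction,
        "action": user_action,  # e.g. "clicked", "ignored", "thumbs_down"
    }
    with open("feedback_log.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

log_event("genai-v2", "recommended: product-42", "clicked")
```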

Focus on solutions, not just models

The success of MLops hinges on building holistic solutions rather than isolated models. Solution architectures should combine a variety of ML approaches, including rule-based systems, embeddings, traditional models, and generative AI, to create robust and adaptable frameworks.
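
As a hypothetical illustration of such a layered solution, the sketch below routes a request through a rule-based fast path, a traditional intent classifier, and a generative fallback. All components are stubbed; the names are illustrative:

```python
# A layered solution: rules first, then a traditional model, then genAI.

def classify_intent(text: str) -> str:
    # Stand-in for a traditional classifier, e.g. a scikit-learn pipeline.
    return "faq" if "how" in text.lower() else "open"

def generate_answer(text: str) -> str:
    # Stand-in for a RAG-backed call to a generative model.
    return f"[genAI answer to: {text}]"

def handle_request(text: str) -> str:
    # 1. Rule-based system: cheap, deterministic, easy to audit.
    if text.strip().lower() in {"hi", "hello"}:
        return "Hello! How can I help?"
    # 2. Traditional model: handle simple, well-defined intents.
    if classify_intent(text) == "faq":
        return "See the FAQ entry retrieved for this question."
    # 3. Generative model: open-ended requests fall through to genAI.
    return generate_answer(text)

print(handle_request("How do I reset my password?"))
```

Each layer can be measured and swapped independently, which is what makes the overall solution adaptable.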

Organizations should ask themselves a few key questions to guide their AI/ML strategies:

Do we need a general-purpose solution or a specialized model?

How will we measure success and which metrics align with our goals?

What are the trade-offs between commercial and open-source solutions, and how do licensing and integration affect our choices?

Here is the key: You are not just building models anymore; you are building solutions. You are building architectures with many moving parts, and each building block has the power to change the experience and the metrics you get from the solution. As MLops continues to evolve, organizations must adapt by focusing on scalable, metrics-driven architectures. By leveraging the right combination of tools and strategies, businesses can unlock the full potential of AI and machine learning to drive innovation and deliver measurable business results.

Yuval Fernbach is the co-founder and CTO of Qwak and currently serves as VP and CTO of MLops following Qwak’s acquisition by JFrog. In his role, he pioneers a fully managed, user-friendly machine learning platform, enabling creators to reshape data, construct, train, and deploy models, and oversee the complete machine learning life cycle.



Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
https://www.infoworld.com/article/3821146/key-strategies-for-mlops-success-in-2025.html
