How Deutsche Telekom designed AI agents for scale
Tuesday, July 8, 2025, 11:00 AM, from InfoWorld
Across 10 countries in Europe, Deutsche Telekom serves millions of users, each with their own questions, needs, and contexts. Responding quickly and accurately isn’t just good service; it builds trust, drives efficiency, and impacts the bottom line. But doing that consistently depends on surfacing the right information at the right time, in the right context.
In early 2023, I joined a small cross-functional team formed under an initiative led by our chief product officer, Jonathan Abrahamson. I was responsible for engineering and architecture within the newly formed AI Competence Center (AICC), with a clear goal: improve customer service across our European operations. As large language models began to show real promise, it became clear that generative AI could be a turning point, enabling faster, more relevant, and context-aware responses at scale. This kicked off a focused effort to solve a core challenge: how to deploy AI-powered assistants reliably across a multi-country ecosystem. That led to the development of LMOS, a sovereign, developer-friendly platform for building and scaling AI agents across Telekom. Frag Magenta OneBOT, our customer-facing assistant for sales and service across Europe, was one of the first major products built on top of it.

Today, LMOS supports millions of interactions, significantly reducing resolution time and human handover rates. Just as important, LMOS was designed to let engineers build AI agents with tools they already know, and it has now reached a point where business teams can define and maintain agents for new use cases. That shift has been key to scaling AI with speed, autonomy, and shared ownership across the organization.

Building a sovereign, scalable agentic AI platform

Amid the urgency, there was also a quiet shift in perspective. This wasn’t just a short-term response; it was an opportunity to build something foundational — a sovereign platform, grounded in open standards, that would let our existing engineering teams build AI applications faster and with more flexibility. In early 2023, production-ready generative AI applications were rare. Most work was still in early-stage retrieval-augmented generation (RAG) experiments, and the risk of becoming overly dependent on closed third-party platforms was hard to ignore. So instead of assembling a stack from scattered tools, we focused on the infrastructure itself, something that could grow into a long-term foundation for scalable, enterprise-grade AI agents. It wasn’t just about solving the immediate problem. It was about designing for what would come next.

LMOS: Language Model Operating System

What started as a focused effort on chatbot development quickly surfaced deeper architectural challenges. We experimented with LangChain, a popular framework for integrating LLMs into applications, and fine-tuned Dense Passage Retrieval (DPR) models for German-language use cases. These early prototypes helped us learn fast, but as we moved beyond experimentation, cracks started to show. The stack became hard to manage. Memory issues, instability, and a growing maintenance burden made it clear this approach wouldn’t scale. At the same time, our engineers were already deeply familiar with Deutsche Telekom’s JVM-based systems, APIs, and tools. Introducing unfamiliar abstractions would have slowed us down.

So we shifted focus. Instead of forcing generative AI into fragmented workflows, we set out to design a platform that felt native to our existing environment. That led to LMOS, the Language Model Operating System, a sovereign PaaS for building and scaling AI agents across Deutsche Telekom. LMOS offers a Heroku-like experience for agents, abstracting away life-cycle management, deployment models, classifiers, observability, and scaling while supporting versioning, multitenancy, and enterprise-grade reliability.

At the core of LMOS is Arc, a Kotlin-based framework for defining agent behavior through a concise domain-specific language (DSL). Engineers could build agents using the APIs and libraries they already knew, with no need to introduce entirely new stacks or rewire development workflows. At the same time, Arc was built to integrate cleanly with existing data science tools, making it easy to plug in custom components for evaluation, fine-tuning, or experimentation where needed. Arc also introduced ADL (Agent Definition Language), which allows business teams to define agent logic and workflows directly, reducing the need for engineering involvement in every iteration and enabling faster collaboration across roles. Together, LMOS, Arc, and ADL helped bridge the gap between business and engineering, while integrating cleanly with open standards and data science tools, accelerating how agents were built, iterated, and deployed across the organization.
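To make the idea of a concise agent DSL concrete, here is a minimal, self-contained Kotlin sketch of the pattern: an agent declared in a single expression, with its prompt and permitted tools spelled out declaratively. The agent builder, the Agent type, and the billing-agent example below are illustrative assumptions made for this article, not the actual Arc DSL syntax.

```kotlin
// Illustrative sketch of a declarative agent definition in Kotlin.
// The builder below is a stand-in for the idea of a concise DSL; it is
// not the actual Arc API.

data class Agent(
    val name: String,
    val description: String,
    val systemPrompt: String,
    val tools: List<String>,
)

class AgentBuilder {
    var name: String = ""
    var description: String = ""
    var systemPrompt: String = ""
    private val tools = mutableListOf<String>()

    // Register a back-end tool the agent is allowed to call.
    fun tool(toolName: String) {
        tools.add(toolName)
    }

    fun build(): Agent = Agent(name, description, systemPrompt, tools.toList())
}

// DSL entry point: configure and build an agent in a single expression.
fun agent(configure: AgentBuilder.() -> Unit): Agent =
    AgentBuilder().apply(configure).build()

// Hypothetical billing agent limited to two back-end tools.
val billingAgent = agent {
    name = "billing-agent"
    description = "Answers billing questions for Deutsche Telekom customers"
    systemPrompt = "You help customers understand their invoices and charges."
    tool("get-invoice")
    tool("get-contract-details")
}

fun main() {
    println("Defined agent '${billingAgent.name}' with tools ${billingAgent.tools}")
}
```

Because such a definition is ordinary Kotlin, it can be versioned, reviewed, and deployed like any other JVM artifact, which is the point of keeping agent development inside the tools engineers already know.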
Vector search and the role of contextual retrieval

By grounding LMOS in open standards and avoiding unnecessary architectural reinvention, we built a foundation that allowed AI agents to be designed, deployed, and scaled across geographies. But platform infrastructure alone wasn’t enough. Agent responses often depend on domain knowledge buried in documentation, policies, and internal data sources, and that required retrieval infrastructure that could scale with the platform. We built structured RAG pipelines powered by vector search to provide relevant context to agents at run time.

Choosing the right vector store was essential. After evaluating various options, from traditional database extensions to full-featured, dedicated vector systems, we selected Qdrant, an open-source, Rust-based vector database that aligned with our operational and architectural goals. Its simplicity, performance, and support for multitenancy and metadata filtering made it a natural fit, allowing us to segment data sets by country, domain, and agent type, ensuring localized compliance and operational clarity as we scaled across markets.

Wurzel: Rooting retrieval in reusability

To support retrieval at scale, we also built Wurzel, an open-source Python ETL (extract, transform, load) framework tailored for RAG. Named after the German word for “root,” Wurzel enabled us to decentralize RAG workflows while standardizing how teams prepared and managed unstructured data. With built-in support for multitenancy, job scheduling, and back-end integrations, Wurzel made retrieval pipelines reusable, consistent, and easy to maintain across diverse teams and markets. Wurzel also gave us the flexibility to plug in the right tools for the job without fragmenting the architecture or introducing bottlenecks. In practice, this meant faster iteration, shared infrastructure, and fewer one-off integrations.

Agent building with LMOS Arc and semantic routing

Agent development in LMOS starts with Arc. Engineers use its DSL to define behavior, connect to APIs, and deploy agents using microservice-style workflows. Once built, agents are deployed to Kubernetes environments via LMOS, which handles versioning, monitoring, and scaling behind the scenes. But defining behavior wasn’t enough. Agents needed access to relevant knowledge to respond intelligently. Vector-powered retrieval pipelines fed agents with context from internal documentation, FAQs, and structured policies. Qdrant’s multi-tenant vector store provided localized, efficient, and compliant data access, as sketched below.
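The sketch below illustrates that retrieval pattern in miniature: document chunks carry country and domain metadata, and a query is first narrowed by tenant metadata and then ranked by cosine similarity against chunk embeddings. Everything here, including the in-memory corpus and the placeholder embed function, is an assumption made for illustration; in production this role is played by Qdrant and the RAG pipelines described above.

```kotlin
import kotlin.math.sqrt

// Illustrative, in-memory sketch of metadata-filtered vector retrieval.
// The point is the pattern (filter by tenant metadata, then rank by cosine
// similarity), not any particular vector database client API.

data class Chunk(
    val text: String,
    val embedding: FloatArray,
    val country: String,  // e.g. "DE", "HR"
    val domain: String,   // e.g. "billing", "sales"
)

fun cosine(a: FloatArray, b: FloatArray): Double {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Placeholder embedding; a real pipeline would call an embedding model.
fun embed(text: String): FloatArray =
    FloatArray(8) { i -> ((text.hashCode() shr i) and 0xFF) / 255f }

fun retrieve(
    query: String,
    corpus: List<Chunk>,
    country: String,
    domain: String,
    topK: Int = 3,
): List<Chunk> {
    val queryVector = embed(query)
    return corpus
        .filter { it.country == country && it.domain == domain }  // tenant isolation
        .sortedByDescending { cosine(queryVector, it.embedding) } // semantic ranking
        .take(topK)
}

fun main() {
    val corpus = listOf(
        Chunk("How to read your invoice", embed("How to read your invoice"), "DE", "billing"),
        Chunk("Roaming tariffs overview", embed("Roaming tariffs overview"), "DE", "sales"),
        Chunk("Kako platiti racun", embed("Kako platiti racun"), "HR", "billing"),
    )
    retrieve("Why is my invoice higher this month?", corpus, country = "DE", domain = "billing")
        .forEach { println(it.text) }
}
```

Filtering on tenant metadata before ranking is what keeps data segmented per country and domain, which is the compliance and clarity property the section above emphasizes.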
To make agent collaboration more effective, we also introduced semantic routing. Using embeddings and vector similarity, agents could classify and route customer queries, such as complaints, billing, and sales requests, without relying entirely on LLMs. This brought greater structure, interpretability, and precision to how agents operated together. Together, Arc, Wurzel, Qdrant, and the broader LMOS platform enabled us to build agents quickly, operate them reliably, and scale them across business domains without compromising developer speed or enterprise control.

‘Heroku’ for agents

I often describe LMOS as “Heroku for agents.” Just like Heroku abstracted the complexity of deploying web apps, LMOS abstracts the complexity of running production-grade AI agents. Engineers don’t need to manage deployment models, classifiers, monitoring, or scaling — LMOS handles all that.

Today, LMOS powers customer-facing agents, including the Frag Magenta OneBOT assistant. We believe this is one of the first multi-agent platforms to go live, with planning and deployment beginning before OpenAI released its agent SDK in early 2024. It is arguably the largest enterprise deployment of multiple AI agents in Europe, currently supporting millions of conversations across Deutsche Telekom’s markets. The time required to develop a new agent has dropped to a day or less, with business teams now able to define and update operating procedures without relying on engineers. Handovers to human support for API-triggering Arc agents are around 30%, and we expect this to decrease as knowledge coverage, back-end integration, and platform maturity continue to improve.

Scaling sovereign AI with open source and community collaboration

Looking ahead, we see the potential applications of LMOS continuing to grow, especially as agentic computing and retrieval infrastructure mature. From the beginning, we built LMOS on open standards and infrastructure primitives like Kubernetes, ensuring portability across developer machines, private clouds, and data centers. In that same spirit, we decided to contribute LMOS to the Eclipse Foundation, allowing it to evolve with community participation and remain accessible to organizations beyond our own. As more teams begin to understand how semantic search and structured retrieval ground AI in trusted information, we expect interest in building on LMOS to increase.

What’s guided us so far isn’t just technology. It’s been a focus on practical developer experience, interoperable architecture, and hard-won lessons from building in production. That mindset has helped shift us from model-centric experimentation toward a scalable, open, and opinionated AI stack, something we believe is critical for bringing agentic AI into the real world, at enterprise scale.

Arun Joseph is a former engineering and architecture lead at Deutsche Telekom.

— Generative AI Insights provides a venue for technology leaders to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
https://www.infoworld.com/article/4018349/how-deutsche-telekom-designed-ai-agents-for-scale.html