
How DeepSeek innovated large language models

Thursday March 13, 2025. 10:00 AM , from InfoWorld
The release of DeepSeek roiled the world of generative AI last month, leaving engineers and developers wondering how the company achieved what it did, and how they might take advantage of the technology in their own stacks.

The DeepSeek team built on developments that were already known in the AI community but had not been fully applied. The result is a model that appears to be comparable in performance to leading models like Meta’s Llama 3.1, but was built and trained at a fraction of the cost.

Most importantly, DeepSeek released its work as open access technology, which means others can learn from it and create a far more competitive market for large language models (LLMs) and related technologies. 

Here’s a glimpse at how DeepSeek achieved its breakthroughs, and what organizations must do to take advantage of such innovations when they emerge so quickly.

Inside the DeepSeek models

DeepSeek released two models in late December 2024 and late January 2025: DeepSeek V3, a powerful foundational model comparable in scale to GPT-4; and DeepSeek R1, designed specifically for complex reasoning and built on the V3 foundation. Here’s a look at the technical strategy for each.

DeepSeek V3

New mix for precision training: DeepSeek leveraged eight-bit floating-point (FP8) matrix multiplication for faster operations, while implementing custom logic to accumulate results at higher precision. They also utilized Nvidia’s WGMMA parallel operators (warpgroup matrix multiply-accumulate, pronounced “wagamama”).
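The core trick can be sketched in a few lines: do the multiplications in a cheap 8-bit format, but accumulate the sums in a wider type so precision isn’t lost. The toy below simulates this with int8 and per-tensor scaling (a simplification; DeepSeek’s actual kernels use FP8 with much finer-grained scaling on GPU):

```python
import numpy as np

def quantize_8bit(x):
    """Quantize a float32 tensor to int8 with a per-tensor scale."""
    scale = np.abs(x).max() / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def matmul_8bit(a, b):
    """Multiply in 8-bit, accumulate in a wider type, then rescale --
    the 'custom accumulation logic' idea in miniature."""
    qa, sa = quantize_8bit(a)
    qb, sb = quantize_8bit(b)
    # int32 accumulation avoids overflow and precision loss during the sum
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)

a = np.random.randn(64, 128).astype(np.float32)
b = np.random.randn(128, 32).astype(np.float32)
approx = matmul_8bit(a, b)
exact = a @ b
# The low-precision product tracks the exact one closely
rel_err = np.abs(approx - exact).mean() / np.abs(exact).mean()
```

The payoff is that the inner loop runs on cheap 8-bit hardware paths while the accumulator keeps the numerics stable.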

Taking multi-token prediction to the next level: Clearly inspired by Meta’s French research team, which pioneered predicting multiple tokens simultaneously, DeepSeek pushed the concept even further with an enhanced implementation.
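The basic idea of multi-token prediction is that each position in a sequence is trained to predict several future tokens at once rather than just the next one. As a rough illustration (not DeepSeek’s implementation), this helper builds those training targets; standard next-token training is the k=1 special case:

```python
import numpy as np

def multi_token_targets(tokens, k=2):
    """For each position i, build the targets tokens[i+1 .. i+k].
    Only positions with all k future tokens available are kept."""
    n = len(tokens)
    return np.stack(
        [tokens[j + 1 : n - k + j + 1] for j in range(k)],
        axis=1,
    )

seq = np.array([5, 9, 2, 7, 3, 8])
targets = multi_token_targets(seq, k=2)
# targets[i] = [tokens[i+1], tokens[i+2]]
```

Extra prediction heads trained on these targets give the model a denser learning signal per sequence, and can also speed up inference via speculative decoding of the extra tokens.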

Expert use of “common knowledge”: The basic concept of Mixture-of-Experts (MoE) is akin to activating different parts of the brain based on the task—just as humans conserve energy by engaging only the necessary neural circuits. Traditional MoE models split the network into a limited number of “experts” (e.g., eight experts) and activate only one or two per query. DeepSeek introduced a far more granular approach, incorporating an idea originally explored by Microsoft Research—the notion that some “common knowledge” needs to be processed by model components that remain active at all times.
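A minimal sketch of that combination might look like this, with experts reduced to small linear maps and all names and dimensions purely illustrative: a gate routes each input to its top-k routed experts, while shared experts run unconditionally to handle the “common knowledge”:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_ROUTED, N_SHARED, TOP_K = 16, 8, 1, 2

# Each "expert" here is just a small linear map for illustration
routed = [rng.standard_normal((D, D)) for _ in range(N_ROUTED)]
shared = [rng.standard_normal((D, D)) for _ in range(N_SHARED)]
gate_w = rng.standard_normal((D, N_ROUTED))

def moe_forward(x):
    """Shared experts always run ('common knowledge'); only the
    top-k routed experts fire, weighted by softmax gate scores."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]                  # chosen expert indices
    w = np.exp(logits[top]); w /= w.sum()              # softmax over the chosen
    out = sum(x @ shared[i] for i in range(N_SHARED))  # always-on path
    out += sum(wi * (x @ routed[i]) for wi, i in zip(w, top))
    return out

y = moe_forward(rng.standard_normal(D))
```

Because only a few routed experts fire per token, compute per query stays small even as the total parameter count grows, and the always-on shared experts free the routed ones to specialize.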

DeepSeek R1

Rewarding reasoning at scale: Much like AlphaGo Zero learned to play Go solely from game rules, DeepSeek R1 Zero learns how to reason from a basic reward model—a first at this scale. While the concept isn’t new, successfully applying it to a large-scale model is unprecedented. DeepSeek’s research captures some profound moments, such as the “aha moment” when DeepSeek R1 Zero realized on its own that spending more time thinking leads to better answers (I wish I knew how to teach that).
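The reward signal itself can be surprisingly simple. As a toy sketch in the spirit of the rule-based rewards DeepSeek describes (the exact scoring and tag format here are illustrative, not taken from their training code), a completion earns a small bonus for showing its reasoning in a structured format and a large one for a correct final answer:

```python
import re

def reward(completion, reference_answer):
    """Toy rule-based reward: a format bonus for visible reasoning,
    plus an accuracy bonus for a matching final answer."""
    r = 0.0
    if re.search(r"<think>(.*?)</think>", completion, re.DOTALL):
        r += 0.1                       # format reward: reasoning is present
    if completion.strip().endswith(str(reference_answer)):
        r += 1.0                       # accuracy reward: final answer matches
    return r

good = "<think>17 + 25 = 42</think> The answer is 42"
print(reward(good, 42))   # 1.1
```

Reinforcement learning against even a crude signal like this, at sufficient scale, is what lets the model discover on its own that longer deliberation earns higher reward.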

Curating a “cold start”: The DeepSeek R1 model also leverages a more traditional approach, incorporating cold-start data from DeepSeek V3. While no groundbreaking techniques seem to be involved at this stage, patience and meticulous curation likely played a crucial role in making it work.

These DeepSeek advances are a testament to open research, and how it can help the progress of humankind. One of the most interesting next steps? The great team at Hugging Face is already working to reproduce DeepSeek R1 in its Open R1 project.

The importance of LLM agnosticism

The limiting factor for AI will not be uncovering business value or model quality; it will be whether companies maintain an agnostic strategy with their AI partners.

DeepSeek shows that betting on a single LLM provider will be a losing game. Some organizations have locked themselves into a single vendor, whether OpenAI, Anthropic, or Mistral. But the ability of new players to disrupt the landscape in a single weekend makes it clear: companies need an LLM-agnostic approach. 

A multi-LLM infrastructure avoids the dangers of vendor “lock-in” and makes it easier to integrate and switch between models as the market evolves. Essentially, this future-proofs any LLM decision by ensuring optionality through a company’s AI journey.
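In practice, LLM agnosticism often comes down to a thin abstraction layer: call sites depend on one internal interface, and vendors plug in behind it. The registry below is a hypothetical sketch (names and signatures are illustrative, not any specific vendor SDK), with stub functions standing in for real API calls:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class LLMProvider:
    name: str
    complete: Callable[[str], str]   # stands in for a vendor SDK call

registry: Dict[str, LLMProvider] = {}

def register(provider: LLMProvider) -> None:
    registry[provider.name] = provider

def complete(prompt: str, provider: str = "default") -> str:
    """Call sites depend on this one function, so swapping vendors
    becomes a configuration change, not a code change."""
    return registry[provider].complete(prompt)

# Stub providers stand in for real vendor integrations
register(LLMProvider("default", lambda p: f"[vendor-a] {p}"))
register(LLMProvider("deepseek", lambda p: f"[deepseek] {p}"))

print(complete("hello"))                       # routed to vendor-a
print(complete("hello", provider="deepseek"))  # same call site, new model
```

When a disruptive new model appears, adding it is a one-line registration rather than a rewrite of every integration point.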

Enterprises must also maintain control through careful governance. DeepSeek and the fast-emerging world of agentic AI show how chaotic and fast-moving the AI landscape has become. In a world of open-source reasoning models and rapidly multiplying vendors, engineering teams will need to maintain rigorous testing, robust guardrails, and continuous monitoring. 

If you can meet these needs, technologies like DeepSeek will be a huge positive for all businesses by increasing competition, driving down costs, and opening new use cases that more companies can capitalize on.

Florian Douetteau is co-founder and CEO of Dataiku, the universal AI platform that provides the world’s largest companies with control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents.



Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
https://www.infoworld.com/article/3842447/how-deepseek-innovated-large-language-models.html
