
DeepSeek triggers shock waves for AI giants, but the disruption won’t last

Tuesday, January 28, 2025, 12:00 PM, from ComputerWorld
Chinese start-up DeepSeek rocked the tech industry Monday after the company’s new generative AI (genAI) bot hit Apple’s App Store and Google’s Play Store, with downloads almost immediately exceeding those of OpenAI’s ChatGPT. Shares of US AI model makers and chipmakers were hit hard by the newcomer’s arrival; Google, Meta, and OpenAI all initially suffered, and chipmaker Nvidia’s stock closed the day down 17%. (The tech-heavy Nasdaq exchange lost more than 600 points.)

The impact of DeepSeek’s open-source AI model lies in matching the performance of US models at a fraction of the cost by using compute and memory resources more efficiently. But industry analysts believe investors are dramatically overreacting to DeepSeek’s effect on US tech firms and others.

“The market is incorrectly presuming this as a zero-sum game,” said Chirag Dekate, a vice president analyst at Gartner Research. “They’re basically saying, ‘Maybe we don’t need to build data centers anymore, maybe we’re not as energy starved because DeepSeek showed us we can do more with less.’”

Giuseppe Sette, president of AI tech firm Reflexivity, agreed, stressing that DeepSeek took the market by storm by doing more with less.

“In layman terms, they activate only the most relevant portions of their model for each query, and that saves money and computation power,” Sette said. “This shows that with AI, the surprises will keep on coming in the next few years. And even though that might be a bit of a shocker today, it’s extremely bullish in the long-term — because it opens the way for deeper and broader adoption of AI at all scales.”
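Sette is describing sparse “mixture-of-experts” routing, in which a router activates only a few expert sub-networks per token. As a rough illustration only (the expert count, sizes, and routing below are assumptions, not DeepSeek’s published architecture), a minimal Python sketch:

import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # total expert sub-networks in the layer (illustrative)
TOP_K = 2         # experts actually activated per token
D_MODEL = 16      # hidden width (illustrative)

experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_layer(x):
    # Score each expert for this token, then run only the top-k;
    # the remaining experts cost no compute at all.
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()              # softmax over selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
output = moe_layer(token)                 # only 2 of 8 expert matmuls ran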

In essence, the markets have overlooked that companies such as Google, Meta, and OpenAI can replicate DeepSeek’s efficiencies with more mature, scalable AI models that offer better security and privacy.

“This is not a ‘the sky is falling moment’ for markets. I think they should take a close look at what this actually is: there are techniques you can implement to more effectively scale your AI models,” Dekate said.

Another looming problem for the newcomer is that DeepSeek is purported to filter out content that could be viewed as critical of the Chinese Communist Party.

DeepSeek’s release of its R1 and R1-Zero reasoning models on Jan. 20 quickly drew attention for two key aspects:

DeepSeek eliminates human feedback in training, speeding up model development, according to tech analyst Ben Thompson; a sketch of the general idea follows below.

DeepSeek requires less memory and compute power, needing fewer GPUs to perform the same tasks as other models.
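On the first of those aspects: public descriptions of R1-Zero say it replaces human feedback with automatically checkable, rule-based rewards during reinforcement learning. A minimal Python sketch of that general idea (the tags and point values are illustrative assumptions, not DeepSeek’s actual reward function):

import re

def rule_based_reward(response, ground_truth):
    # Score a model response with no human rater in the loop.
    reward = 0.0
    # Format reward: reasoning must appear inside the expected tags
    # (a stand-in for an automated format check).
    if re.search(r"<think>.*?</think>", response, re.DOTALL):
        reward += 0.5
    # Accuracy reward: compare the extracted final answer against a
    # mechanically checkable reference, such as a math result.
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        reward += 1.0
    return reward

print(rule_based_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # 1.5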

DeepSeek claims its breakthroughs in AI efficiency cost less than $6 million and took less than two months to develop.

John Belton, a portfolio manager at Gabelli Funds, an asset management firm whose funds include shares of Nvidia, Microsoft, Amazon, and others, said DeepSeek’s achievements are real, but some of the company’s claims are misleading.

“No, you cannot recreate DeepSeek with $6 million and the extent to which they distilled existing models (took shortcuts potentially without license) is an unknown,” Belton said via email to Computerworld. “However, they have made key breakthroughs that show how to reduce training and inference costs.”

Belton also pointed out that DeepSeek isn’t new. Its creator, Liang Wenfeng, a hedge fund manager and AI enthusiast, published a paper on the performance breakthroughs more than a month ago and released a model with similar methods a year ago.

Dekate said DeepSeek’s rollout was particularly timely because just last month news outlets were publishing stories about leading providers running into AI scaling limitations.

As organizations continue to embrace genAI tools and platforms and explore how they can create efficiencies and boost worker productivity, they’re also grappling with the high costs and complexity of the technology.

DeepSeek improved memory bandwidth efficiency with two key innovations. The first was adopting a lower-precision number format, switching from FP32 (32-bit) to FP8 (8-bit) for model training. “They’re using the same amount of memory to store and move more data,” Dekate said.

One analogy is the onramp to a major city highway, where the highway is the data path. If the onramp has only one lane, there are only two ways to address traffic congestion:

Increase the width of the roadway to fit more traffic

Reduce the size of the vehicles so that more of them fit on the roadway

DeepSeek did both. It created smaller vehicles: by using smaller data values (8-bit instead of 32-bit), it was able to pack more data into the same footprint.
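To put rough numbers on the analogy, the sketch below compares byte footprints only. NumPy has no native FP8 type, so an 8-bit integer stands in purely to show the size difference; real FP8 training uses hardware formats such as E4M3 or E5M2.

import numpy as np

n = 1_000_000                            # one million values to move
fp32 = np.ones(n, dtype=np.float32)      # 32-bit "vehicles"
eight_bit = np.ones(n, dtype=np.int8)    # 8-bit stand-in for FP8

print(fp32.nbytes)       # 4000000 bytes
print(eight_bit.nbytes)  # 1000000 bytes: 4x more values per "lane"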

The second key innovation was optimizing and compressing the key-value cache. DeepSeek used compression algorithms to cut the memory the cache consumes; the cache is exercised in both phases of handling a prompt, ingesting the input and generating the response.

“They utilized underlying compute and memory resources incredibly efficiently,” Dekate said. “That is an amazing accomplishment, because they’re utilizing the underlying GPU resources more productively. Their models are able to perform at leadership-class levels while using a relatively lower scale of resources.”
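One common way to compress a key-value cache is to store a small low-rank latent per past token and expand it back to full-width keys and values on demand. The following is a minimal sketch of that general technique; the dimensions and projections are illustrative assumptions, not DeepSeek’s published design.

import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_LATENT = 1024, 128   # latent is 8x smaller per cached token

W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
W_up_k = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)
W_up_v = rng.standard_normal((D_LATENT, D_MODEL)) / np.sqrt(D_LATENT)

kv_cache = []  # holds compressed latents instead of full keys/values

def cache_token(hidden):
    # Compress a token's hidden state; store only the small latent.
    kv_cache.append(hidden @ W_down)      # d_model -> d_latent

def read_cache():
    # Rebuild full-width keys and values from the cached latents.
    latents = np.stack(kv_cache)          # (seq_len, d_latent)
    return latents @ W_up_k, latents @ W_up_v

for _ in range(10):                       # cache ten tokens of state
    cache_token(rng.standard_normal(D_MODEL))

keys, values = read_cache()               # full K/V from 8x less cache memory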

Enterprises can benefit as well by adopting the techniques DeepSeek introduced, because they reduce the cost of adoption by requiring fewer compute resources for inferencing and training. Lower model costs should benefit innovators such as OpenAI and reduce the cost of applying AI across industries.

By using resources more efficiently, DeepSeek enables faster, broader AI adoption by other companies, driving growth in AI development, demand, and infrastructure.

And in the end, DeepSeek’s algorithm still needs AI accelerator technology to work — meaning GPUs and ASICs.

“It’s not the case that DeepSeek just woke up one day and had an amazing breakthrough. No, they’re using sound engineering techniques and they’re using some of the leading AI accelerators — and GPUs happen to be table stakes,” Dekate said. “And they use thousands of them. It’s not like they discovered a new technique that blew this whole space wide open. No. You still need AI accelerators to perform model training.”

Even in the most pessimistic view, if DeepSeek’s AI costs are just 5% of those of other leading AI models, that efficiency eventually benefits those other models by reducing their own costs and allowing faster adoption.

For enterprises, Dekate said, it’s worth exploring DeepSeek and similar models internally and in private settings. “Your legal team evaluates the terms and conditions of your ecosystem quite extensively. They’ll ask if privacy is protected. Are the data sources filtered? Are AI model responses filtered in any sense?” he said.

Before jumping in, enterprises should carefully consider these details. “Models like Gemini and GPT offer reliable, secure responses with enterprise-level protections, unlike many open models that lack these controls,” Dekate argued.

“Once everything settles, the net-net is that DeepSeek has developed very specific capabilities that are quantitative and that’s something to learn from, just as they did from Llama 3,” Dekate said.
https://www.computerworld.com/article/3810766/deepseek-triggers-shock-waves-for-ai-giants-but-the-di...

