A smarter approach to training AI models
Monday, February 24, 2025, 10:00 AM, from InfoWorld
Artificial intelligence, long assumed to be handily dominated by American labs and researchers, was thrown for a loop by DeepSeek's R1. And yet R1 is still subject to many of the same pitfalls as other models; we must make radical advances to move beyond the current limitations of AI. Though the exact costs associated with the model are disputed, R1's release made it clear that there is substantial room for innovation outside of the large incumbents.

As that innovation plays out, AI models are beginning to hit the limits of compute. Model size is far outpacing Moore's Law and the advances in AI training chips, and training runs for large models can cost tens of millions of dollars in chips alone, a problem acknowledged by prominent AI engineers including Ilya Sutskever. The costs have grown so high that Anthropic has estimated it could cost as much to update Claude as it did to develop it in the first place, and companies like Amazon are spending billions to erect new AI data centers to keep up with the demands of building new frontier models. DeepSeek showed us that throwing more compute at the problem ad infinitum isn't the only viable strategy. But while R1 improves on the immense costs associated with frontier models, it remains firmly within the deep learning paradigm, optimizing existing techniques for training traditional models. Every model we have today, including R1, is bound by the scaling laws of deep learning, the very wall that Sutskever and others have pointed to.
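To get a feel for where those tens of millions come from, consider a rough back-of-envelope calculation using the commonly cited approximation that training a dense transformer takes about 6 FLOPs per parameter per token. Every figure below is an illustrative assumption, not a number from this article:

```python
# Back-of-envelope training cost via the widely cited C ~ 6 * N * D
# approximation for dense transformer training FLOPs.
# All figures are illustrative assumptions, not numbers from the article.
N = 400e9            # parameters (hypothetical frontier-scale model)
D = 10e12            # training tokens
flops = 6 * N * D    # ~2.4e25 total training FLOPs

peak = 1e15          # ~1 PFLOP/s peak per modern accelerator (order of magnitude)
mfu = 0.4            # realistic fraction of peak actually utilized
rate = 3.00          # assumed dollars per accelerator-hour

gpu_hours = flops / (peak * mfu) / 3600
print(f"{gpu_hours:,.0f} GPU-hours -> ${gpu_hours * rate:,.0f}")
# -> roughly 17 million GPU-hours, about $50M under these assumptions
```

Halve the utilization or double the token count and the bill doubles, which is why even constant-factor optimizations of the kind R1 demonstrated matter so much.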
Effectively, we're stumbling around in the dark: we find methods that work well, drive them to exhaustion, and are then left to figure out how to deal with their negative consequences, like hallucinations, after AI is widely deployed. This stands in contrast to other sciences, where practice must be reconciled with theory, as electromagnetism was with Maxwell's equations. But maybe none of this is necessary. With a better foundational understanding of how AI works, we can approach model training and deployment in new ways that require a fraction of the energy and compute, bringing the rigor of other sciences to AI through a principles-first approach. As the Endowed Chair's Fellow at the University of California San Diego, I spent the last five years singularly focused on this problem. Here's what I learned.

Deep neural networks are nearing the end of their useful lives

While AI has only recently entered the forefront of public consciousness, the field has a textured history stretching back more than 50 years, marked by several winters of muted enthusiasm, a stark contrast to the present day. One of the earliest winters involved Frank Rosenblatt and his perceptron machine, to which virtually all machine learning today can trace its roots. Marvin Minsky, with Seymour Papert, published a book highlighting several perceived inadequacies of the perceptron, greatly diminishing Rosenblatt's oeuvre and causing a significant decline in neural network research and funding. As it turned out, many of these issues were ameliorated by larger, multi-layer perceptrons. Rosenblatt's research was vindicated here at UCSD by Rumelhart, Hinton, and Williams' seminal 1986 work on backpropagation, paving the way for the AI of today.
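Minsky's objection, and its resolution, fit in a few lines of code. The sketch below (illustrative, not from the article) shows Rosenblatt's update rule failing on XOR, the canonical function no single linear threshold unit can represent, and a hand-wired two-layer network of the same units solving it. This is essentially what "larger and more complex perceptrons" buys:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels

step = lambda z: (z > 0).astype(int)

# 1) A single perceptron trained with Rosenblatt's update rule.
#    XOR is not linearly separable, so the weights cycle forever.
w, b = np.zeros(2), 0.0
for _ in range(100):
    for xi, yi in zip(X, y):
        err = yi - step(xi @ w + b)
        w += err * xi
        b += err
print("single perceptron:", step(X @ w + b), "target:", y)  # never matches

# 2) Two layers of the same threshold units, weights chosen by hand:
#    hidden unit A computes OR, hidden unit B computes AND, and the
#    output fires on "A and not B", which is exactly XOR.
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])  # both hidden units sum the inputs
b1 = np.array([-0.5, -1.5])              # thresholds: OR at 0.5, AND at 1.5
h = step(X @ W1 + b1)
out = step(h @ np.array([1.0, -1.0]) - 0.5)
print("two-layer network:", out)         # [0 1 1 0]
```

Backpropagation's contribution was showing how to learn such hidden-layer weights from data rather than wiring them by hand.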
Returning closer to the present day, we find the commercial development of AI beholden to "The Bitter Lesson." After Nvidia's CUDA enabled efficient tensor operations on GPUs, and deep networks like AlexNet drove unprecedented progress across varied fields, the previously diverse methods competing for dominance on machine learning benchmarks homogenized into one: throwing more compute at deep learning. There is perhaps no greater example of the bitter lesson than large language models, which have displayed incredible emergent capabilities with scaling over the past decade.

Could we really reach artificial general intelligence (AGI), that is, systems amounting to the archetypal depictions of AI seen in Blade Runner or 2001: A Space Odyssey, simply by adding more parameters to these LLMs and more GPUs to the clusters they're trained on? My work at UCSD was predicated on the belief that this scaling would not lead to true intelligence. And as we've seen in recent reporting on top AI labs like OpenAI, and from luminaries like François Chollet, the way we've been approaching deep learning has hit a wall. "Now everybody is searching for the next big thing," as Sutskever aptly puts it. Is it possible that, with techniques like applying reinforcement learning to LLMs à la OpenAI's o3 (techniques that are undoubtedly computationally intensive), we are ignoring the wisdom of the bitter lesson? What if, instead, we sought to understand a "theory of everything" for learning, and then doubled down on that?

We have to deconstruct, then reconstruct, how AI models are trained

Rather than settling for black-box approximations, at UCSD we developed technology that explains how neural networks actually learn. Deep learning models are built from artificial neurons vaguely similar to our own: data is filtered forward through them, and errors are then backpropagated up the network to learn features in the data (that second step has no analogue in biology). It is this feature learning mechanism that drives the success of AI in fields as disparate as finance and healthcare. Imagine differentiating between a cat and a dog. In fractions of a second, your brain draws upon learned features of both classes, such as cats having whiskers, to make a judgment. A traditional neural network arrives at such features through the backpropagation process just described.

We can isolate this feature learning process, the part integral to AI, from the extraneous machinery that has become typical of deep learning models, and double down on that phenomenon. This is the impetus for an entirely new, backpropagation-free AI stack that is orders of magnitude more performant than today's state-of-the-art methods. By eschewing the inefficient and less theoretically justified parts of deep learning, we create a path to the next generation of truly intelligent AI, one we have seen surpass the wall that deep learning has hit.
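The specific stack is beyond the scope of this article, but backpropagation-free feature learning itself is not exotic. A classic example (an illustration only, not the method described above) is Oja's rule, a purely local update that drives a single neuron's weights toward the top principal component of its inputs, i.e. the dominant feature of the data, with no backward pass at all:

```python
# Oja's rule: a single linear neuron y = w . x, updated with
#     w <- w + lr * y * (x - y * w)
# converges to the data's top principal component -- feature learning
# from a local rule, with no backpropagation. Illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data whose dominant direction lies along (1, 1).
cov = np.array([[3.0, 2.0], [2.0, 3.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=5000)

w = rng.normal(size=2)
w /= np.linalg.norm(w)
lr = 0.01
for x in X:
    y = w @ x
    w += lr * y * (x - y * w)  # Hebbian growth with a local decay term

# Compare with the top eigenvector of the covariance matrix (PCA).
top = np.linalg.eigh(cov)[1][:, -1]
print("Oja :", w / np.linalg.norm(w))
print("PCA :", top)  # same direction, up to sign
```

Rules in this family learn features with one forward pass per example and no stored activations, which is where the efficiency argument for backpropagation-free training begins.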
We have to understand how learning works, and build models with interpretability and efficiency in mind from the ground up, especially as high-risk applications of AI in sectors like finance and healthcare demand more than the nondeterministic behavior we've become accustomed to. Deep learning has delivered incredible advances over the past decade, but it is time to build the next evolution of AI beyond it. It is this type of thinking and research that will help the US wrestle the AI mantle back from China and ensure that America leads the next wave of AI innovation.

Cyril Gorlla is co-founder and CEO at CTGT. He thanks François Chollet, Dalton Caldwell, and Rajesh Gupta for reading drafts of this article.

Generative AI Insights provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld's technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
https://www.infoworld.com/article/3828438/a-smarter-approach-to-training-ai-models.html