Are we creating too many AI models?
Friday, March 28, 2025, 10:00 AM, from InfoWorld
A few days ago, I stared at yet another calendar invitation from a vendor eager to showcase its “groundbreaking” large language model (LLM). The irony wasn’t lost on me: just weeks ago, this same company had proudly touted its environmental initiatives and impressive environmental, social, and governance scores. Now it was launching another resource-hungry AI model into an already saturated market.
As I joined the call, the familiar enthusiasm bubbled through my screen: “revolutionary capabilities,” “state-of-the-art performance,” “competitive advantage.” But all I could think about was a massive data center somewhere, humming with thousands of GPUs, consuming megawatts of power to train what was essentially another variation of existing technology. I couldn’t help but wonder: How do they reconcile their sustainability promises with the carbon footprint of their AI ambitions? It felt like watching someone plant trees while simultaneously burning down a forest.

The world is seeing an explosion of LLMs, with hundreds now in existence. They range from proprietary giants, such as GPT-4 and PaLM, to open source alternatives, such as Llama or Falcon. Open source accessibility and corporate investment have fueled this boom, creating a crowded ecosystem where every organization wants its own version of AI magic. Few seem to realize that this growth comes at a staggering cost.

Access to these AI powerhouses has become remarkably democratized. Although some premium models such as GPT-4 restrict access, many powerful alternatives are free or available at minimal cost. The open source movement has further accelerated this trend: Llama, Mistral, and numerous other models are freely available for anyone to download, modify, and deploy.

Environmental and economic impact

As I look at charts showing the sheer number of LLMs, I can’t help but consider the impact at a time when resources are finite. Training alone can cost up to $5 million for flagship models, and ongoing operational expenses run to millions per month.

Many people and organizations don’t yet grasp the staggering environmental impact of AI. Training a single LLM requires enormous computational resources, the equivalent of powering several thousand homes for a year. The carbon footprint of training just one major model can equal the annual emissions of 40 cars, or approximately 200 tons of carbon dioxide, when traditional power grids are used. Inference, the work of generating outputs, is less resource intensive per request but grows quickly with use, resulting in annual costs of millions of dollars and energy consumption measured in gigawatt-hours.

The numbers become even more concerning at the scale of current operations. Modern LLMs carry hundreds of billions of parameters: GPT-3 uses 175 billion, BLOOM operates with 176 billion, and Google’s PaLM pushes this to 500 billion. Each model requires hundreds of thousands of GPU hours to train, consuming massive amounts of electricity and demanding specialized hardware infrastructure.

Computational demands translate directly into environmental impact through energy consumption and the hardware’s own carbon footprint. The location of training facilities significantly affects this impact: models trained in regions that rely on fossil fuels can produce up to 50 times more emissions than those powered by renewable energy sources.
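To put figures like these in perspective, here is a minimal back-of-the-envelope sketch. Every input is an assumption chosen for illustration (GPU count, run length, per-GPU power draw, data-center overhead, grid carbon intensity), not a measurement of any particular model, but with these placeholder values the result lands in the same ballpark as the roughly 200 tons cited above.

```python
# Back-of-the-envelope estimate of training energy and emissions.
# Every input below is an illustrative assumption, not a measured figure
# for any real model; substitute your own numbers.

gpu_count = 1_000            # accelerators used for the training run (assumed)
training_days = 30           # wall-clock duration of the run (assumed)
gpu_power_kw = 0.5           # average draw per accelerator, in kW (assumed)
pue = 1.1                    # data-center power usage effectiveness (assumed)
grid_kg_co2_per_kwh = 0.4    # grid carbon intensity (assumed); a coal-heavy
                             # grid can be ~0.8, a largely renewable one ~0.02,
                             # which is roughly where the "up to 50 times"
                             # spread between regions comes from

gpu_hours = gpu_count * training_days * 24            # 720,000 GPU-hours
energy_kwh = gpu_hours * gpu_power_kw * pue           # ~396,000 kWh
co2_tons = energy_kwh * grid_kg_co2_per_kwh / 1000    # ~158 metric tons

print(f"{gpu_hours:,} GPU-hours, {energy_kwh:,.0f} kWh, {co2_tons:,.0f} tons CO2")
```

Multiply an estimate like this across the hundreds of broadly similar models now in circulation and the aggregate cost becomes hard to ignore.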
Too much duplication

Some level of competition and parallel development is healthy for innovation, but the current situation looks increasingly wasteful. Multiple organizations are building similar capabilities, each contributing a massive carbon footprint. This redundancy becomes particularly questionable when many models perform similarly on standard benchmarks and real-world tasks.

The differences in capabilities between LLMs are often subtle; most excel at similar tasks such as language generation, summarization, and coding. Although some models, like GPT-4 or Claude, may slightly outperform others on benchmarks, the gap is typically incremental rather than revolutionary.

Most LLMs are trained on overlapping data sets, largely publicly available internet content (Wikipedia, Common Crawl, books, forums, news, and so on). This shared foundation leads to similarities in knowledge and capabilities, as the models absorb the same factual data, linguistic patterns, and biases. Variations arise from fine-tuning on proprietary data sets or slight architectural adjustments, but the core general knowledge remains highly redundant across models. Consequently, their outputs often reflect the same information, with minimal differentiation, especially for commonly accessed knowledge.

This redundancy raises the question: Do we need so many similarly trained LLMs? Moreover, the improvements from one LLM version to the next are marginal at best. All the data has already been used for training, and our capacity to generate new data organically won’t produce significant improvements.

Slow down, please

A more coordinated approach to LLM development could significantly reduce the environmental impact while maintaining innovation. Instead of each organization building from scratch, we could achieve similar capabilities at far less environmental and economic cost by sharing resources and building on existing open source models, as sketched below.

Several potential solutions exist:

- Create standardized model architectures that organizations can use as a foundation.
- Establish shared training infrastructure powered by renewable energy.
- Develop more efficient training methods that require fewer computational resources.
- Implement carbon impact assessments before developing new models.

I use LLMs every day. They are invaluable for research, including the research for this article. My point is that there are too many of them, and too many do mostly the same thing. At what point do we figure out a better way?
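As a concrete illustration of the “build on existing open source models” suggestion, here is a minimal sketch that attaches a small LoRA adapter to an existing open-weight model instead of pretraining a new one. It assumes the Hugging Face transformers and peft libraries; the model name and hyperparameters are placeholders, not recommendations.

```python
# Minimal sketch: adapt an existing open-weight model with a LoRA adapter
# rather than training a new LLM from scratch. The base model and the
# hyperparameters below are placeholders chosen for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # any open-weight causal LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA trains a few million adapter weights on top of the frozen base model,
# so the compute (and energy) cost is a tiny fraction of a full pretraining run.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# ...then fine-tune on the organization's own data with a standard training loop.
```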
https://www.infoworld.com/article/3855672/are-we-creating-too-many-ai-models.html