AI coding assistants are on a downward spiral
Monday, February 24, 2025, 10:00 AM, from InfoWorld
We’re living in a strange time for software development. On the one hand, AI-driven coding assistants have shaken up a hitherto calcified IDE market. As RedMonk cofounder James Governor puts it, “suddenly we’re in a position where there is a surprising amount of turbulence in the market for editors,” when “everything is in play” with “so much innovation happening.” Ironically, that very innovation in genAI may be stifling innovation in the software those coding assistants increasingly recommend. As AWS developer advocate Nathan Peck highlights, “the brutal truth beneath the magic of AI coding assistants” is that “they’re only as good as their training data, and that stifles new frameworks.”
In other words, genAI-driven tools are creating powerful feedback loops that foster winner-takes-all markets, making it hard for innovative new technologies to take root.

No room for newbies

I’ve written before about genAI’s tendency to undermine its own sources of training data. In the software development world, ChatGPT, GitHub Copilot, and other large language model (LLM) tools have had a profoundly negative effect on sites like Stack Overflow, even as they’ve had a profoundly positive impact on developer productivity. Why ask a question on Stack Overflow when you can ask Copilot? But every time a developer does that, one fewer question goes into the public repository that feeds LLM training data.

Just as bad, we don’t know whether the training data is correct in the first place. As I recently noted, “The LLMs have trained on all sorts of good and bad data from the public Internet, so it’s a bit of a crapshoot as to whether a developer will get good advice from a given tool.” Presumably each LLM has a way of weighting certain sources of data as more authoritative, but if so, that weighting is completely opaque. AWS, for example, is probably the best source of information on how Amazon Aurora works, but it’s unclear whether developers using Copilot will see documentation from AWS or a random Q&A on Stack Overflow. I’d hope the LLMs would privilege the creator of a technology as the best source of information about it, but who knows?

And then there’s the inescapable feedback loop that Peck points out. It’s worth quoting him at length. Here’s how he describes the loop:

1. Developers choose popular incumbent frameworks because AI recommends them
2. This leads to more code being written in these frameworks
3. Which provides more training data for AI models
4. Making the AI even better at these frameworks, and even more biased toward recommending them
5. Attracting even more developers to these incumbent technologies

He then describes how this affects him as a JavaScript developer. JavaScript has been a hotbed of innovation over the years, with a new framework seemingly emerging every other day. I wrote about this back in 2015, and that frenetic pace has continued for the past decade. It won’t necessarily continue, though, as Peck details, because the LLMs actively discourage developers from trying anything new. Peck describes working with the new Bun runtime: “I’ve seen firsthand how LLM-based assistants try to push me away from using the Bun native API, back to vanilla JavaScript implementations that look like something I could have written 10 years ago.” Why? Because that’s what the volume of training data is telling the LLMs to suggest.
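To make Peck’s point concrete, here’s a minimal sketch of the kind of divergence he describes: Bun’s built-in Bun.serve API next to the Node-style http server that an assistant trained mostly on pre-Bun code tends to suggest instead. The ports and response text are illustrative assumptions, not taken from Peck’s post.

```js
// Both versions run under Bun; they listen on different ports so this
// file can start them side by side.
import { createServer } from "node:http";

// Bun-native: Bun.serve is built into the Bun runtime.
// Assistants trained mostly on pre-Bun code rarely suggest this form.
Bun.serve({
  port: 3000,
  fetch(req) {
    return new Response("Hello from Bun.serve");
  },
});

// "Vanilla" Node-style equivalent: the decade-old pattern assistants tend
// to recommend, because it dominates their training data.
createServer((req, res) => {
  res.end("Hello from vanilla JavaScript");
}).listen(3001);
```

Both snippets do the same job; the difference is that only the second shows up in training data in volume.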
The rich get richer, in other words, and new options struggle to get noticed at all. That’s always been somewhat true, of course, but now it’s institutionalized by data-driven tools that don’t listen to anything beyond sheer volume of data. As Peck concludes, this “creates an uphill battle for innovation.” It’s always hard to launch or choose new technology, but AI coding assistants make it that much harder. He offers a provocative but apt example: If ChatGPT had been “invented before Kubernetes reached mainstream adoption…, I don’t think there would have ever been a Kubernetes.” The LLMs would have pushed developers toward Mesos or other already available options rather than the new (but eventually superior) one.

What to do? Open it up

It’s not clear how we resolve this looming problem. We’re still in the “wow, this is cool!” phase of AI coding assistants, and rightly so. But at some point the tax we’re paying will become evident, and we’ll need to figure out how to extricate ourselves from the hole we’re digging.

One thing seems clear: As much as closed-source options may have worked in the past, it’s hard to see how they can survive in the future. As Gergely Orosz posits, “LLMs will be better in languages they have more training on,” and almost by definition they’ll have more access to open source technologies. “Open source code is high-quality training,” he argues, and starving the LLMs of training data by locking up one’s code and documentation is a terrible strategy.

So that’s one good outcome of this seemingly inescapable LLM feedback loop: more open code. It doesn’t solve the problem of LLMs being biased toward older, established code and thereby inhibiting innovation, but it at least pushes us in the right direction for software generally.
https://www.infoworld.com/article/3830735/ai-coding-assistants-are-on-a-downward-spiral.html