Why AI projects fail, and how developers can help them succeed
Monday, June 23, 2025, 11:00 AM, from InfoWorld
Even as we emerge from generative AI’s tire-kicking phase, it’s still true that many (most?) enterprise artificial intelligence and machine learning projects will derail before delivering real value. Despite skyrocketing investment in AI, the majority of corporate ML initiatives never graduate from proof of concept to production. Why? Well, a CIO survey found that “unclear objectives, insufficient data readiness, and a lack of in-house expertise” sink many AI projects, but I like Santiago Valdarrama’s list even more. At the heart of much of this AI failure is something I uncovered way back in 2013 as “big data” took off: “Everyone’s doing it, but no one knows why.”

Let’s look at how developers can improve the odds of AI success.

Not every problem needs AI

First, as much as we may want to apply AI to a burgeoning list of business problems, quite often AI isn’t needed or isn’t even advisable in the first place. Not every task warrants a machine learning model, and forcing AI into scenarios where simpler analytics or rule-based systems suffice is a recipe for waste, as I’ve written. “There is a very small subset of business problems that are best solved by machine learning; most of them just need good data and an understanding of what it means,” data scientist Noah Lorang once observed. In other words, solid data analysis and software engineering often beat AI wizardry for everyday challenges.

The best strategy is clarity and simplicity. Before writing a line of TensorFlow or PyTorch, step back and ask: “What problem are we actually trying to solve, and is AI the best way to solve it?” Sometimes a straightforward algorithm or even a spreadsheet model is enough. ML guru Valdarrama advises teams to start with simple heuristics or rules before leaping into AI. “You’ll learn much more about the problem you need to solve,” he says, and you’ll establish a baseline for future ML solutions.
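To make that concrete, here is a minimal sketch of what a pre-ML baseline might look like for a hypothetical churn problem. The customer fields, thresholds, and rules are illustrative assumptions, not anything prescribed by Valdarrama or the article; the point is that a few explainable rules give you a measurable bar that any future model must clear.

```python
# A minimal sketch of a rule-based baseline built before reaching for ML.
# The churn scenario, field names, and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class Customer:
    days_since_last_login: int
    support_tickets_open: int
    monthly_spend_trend: float  # e.g., -0.2 means spend dropped 20%

def churn_risk_heuristic(c: Customer) -> bool:
    """Flag likely churners with simple, explainable rules."""
    if c.days_since_last_login > 30:
        return True
    if c.support_tickets_open >= 3 and c.monthly_spend_trend < -0.1:
        return True
    return False

# Measure this baseline against labeled outcomes first; an ML model
# only earns its complexity by beating it on the same metric.
customers = [Customer(45, 0, 0.0), Customer(5, 4, -0.25), Customer(10, 1, 0.1)]
print([churn_risk_heuristic(c) for c in customers])  # [True, True, False]
```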
Garbage in, garbage out

Even a well-chosen AI problem will falter if it’s fed the wrong data. Enterprise teams often underestimate the critical-but-unexciting task of data preparation: curating the right data sets, cleaning and labeling them, and ensuring they actually represent the problem space. It’s no surprise that, according to Gartner research, nearly 85% of AI projects fail due to poor data quality or a lack of relevant data. If your training data is garbage (biased, incomplete, outdated), your model’s outputs will be garbage as well, no matter how advanced your algorithms.

Data-related issues are cited as a top cause of failure for AI initiatives. Enterprises frequently discover their data is siloed across departments, rife with errors, or simply not relevant to the problem at hand. A model trained on idealized or irrelevant data sets will crumble against real-world input. Successful AI/ML efforts, by contrast, treat data as a first-class citizen. That means investing in data engineering pipelines, data governance, and domain expertise before spending money on fancy algorithms. As one observer puts it, data engineering is the “unsung hero” of AI. Without clean, well-curated data, “even the most advanced AI algorithms are rendered powerless.”

For developers, this translates to a focus on data readiness. Make sure you have the data your model needs and that you need the data you have. If you’re predicting customer churn, do you have comprehensive, up-to-date customer interaction data? If not, no amount of neural network tuning will save you. Don’t let eagerness for AI blind you to the essential grunt work of ETL, data cleaning, and feature engineering.
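As a concrete illustration, here is a minimal sketch of the kind of data-readiness checks worth running before any training job, assuming a pandas DataFrame of customer interactions. The column names and thresholds are hypothetical; the idea is simply to fail loudly on missing, stale, or duplicated data before it poisons a model.

```python
# A minimal sketch of pre-training data-readiness checks. Column names
# and thresholds are hypothetical illustrations.

import pandas as pd

def check_data_readiness(df: pd.DataFrame) -> list[str]:
    problems = []
    # Completeness: too many missing values makes features unusable.
    null_rates = df.isna().mean()
    for col, rate in null_rates.items():
        if rate > 0.20:
            problems.append(f"{col}: {rate:.0%} missing")
    # Freshness: stale data won't represent current behavior.
    if "event_date" in df:
        age_days = (pd.Timestamp.now() - pd.to_datetime(df["event_date"]).max()).days
        if age_days > 30:
            problems.append(f"newest record is {age_days} days old")
    # Duplicates inflate apparent volume and can leak across train/test splits.
    dup_rate = df.duplicated().mean()
    if dup_rate > 0.05:
        problems.append(f"{dup_rate:.0%} duplicate rows")
    return problems

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "event_date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"],
    "spend": [10.0, None, None, 25.0],
})
print(check_data_readiness(df))  # flags missing spend, stale dates, duplicates
```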
Poorly defined success

Too many AI/ML initiatives launch with vague hopes of “delivering value” but no agreed-upon way to quantify that value. A lack of clear metrics is a well-documented AI project killer. For example, a retail company might deploy a machine learning model to personalize offers to customers but fail to decide whether success is defined by increased click-through rates, higher revenue per customer, or improved retention. Without that clarity, even a technically accurate model might be deemed a flop. In the generative AI arena especially, many teams roll out models without any systematic evaluation in place. As ML engineer Shreya Shankar notes, “Most people don’t have any form of systematic evaluation before they ship… so their expectations are set purely based on vibes.” Vibes might feel good in a demo, but they collapse in production. It’s hard to declare a win (or acknowledge a loss) when you didn’t define what winning looks like from the start.

The solution is straightforward: Establish concrete success metrics up front. For example, if you’re building an AI fraud detection system, success might be “reduce false positives by X% while catching Y% more fraud.” Setting one or two clear KPIs focuses the team’s efforts and provides a reality check against hype. It also forces a conversation with business stakeholders: If we achieve X metric, will this project be considered successful? Developers and data scientists should insist on this clarity. It’s better to negotiate what matters up front than to try to retroactively justify an AI project with cherry-picked stats.
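Here is a minimal sketch of what encoding that fraud-detection KPI as an automated ship/no-ship gate might look like. The target percentages and the baseline counts are hypothetical assumptions; the value is that “success” becomes a test the team can run, not a vibe.

```python
# A minimal sketch of turning agreed-upon KPIs into a testable gate,
# using the article's fraud-detection example. Targets and baselines
# are hypothetical.

def evaluate_against_kpis(y_true: list[int], y_pred: list[int],
                          baseline_fp: int, baseline_caught: int) -> bool:
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # KPI: reduce false positives by 20% while catching 10% more fraud.
    fp_ok = fp <= 0.80 * baseline_fp
    recall_ok = caught >= 1.10 * baseline_caught
    print(f"false positives: {fp} (target <= {0.80 * baseline_fp:.1f})")
    print(f"fraud caught: {caught} (target >= {1.10 * baseline_caught:.1f})")
    return fp_ok and recall_ok

# Ship only if the agreed-upon KPIs actually pass -- no vibes.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
ok = evaluate_against_kpis(y_true, y_pred, baseline_fp=2, baseline_caught=3)
print("ship" if ok else "iterate")
```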
Ignoring the feedback loop

Let’s say you’ve built a decent first version of an AI/ML model and deployed it. Job done, right? Hardly. One major reason AI initiatives stumble is the failure to plan for continuous learning and iteration. Unlike traditional software, an AI model’s performance can and will drift over time: Data distributions shift, and users react in unexpected ways. In other words, our pristine AI dreams must face the real world. If you ignore feedback loops and omit a plan for ongoing model tuning, your AI project will quickly become a stale experiment that fails to adapt. The real key to AI success is to constantly tune your model, something many teams neglect amid the excitement of a new AI launch.

In practice, this means putting in place what modern MLops teams call “the data flywheel”: monitoring your model’s outputs, collecting new data on where it’s wrong or uncertain, retraining or refining the model, and redeploying improved versions. Shankar warns that too often “teams expect too high of accuracy … from an AI application right after it’s launched and often don’t build out the infrastructure to continually inspect data, incorporate new tests, and improve the end-to-end system.” Model deployment isn’t the finish line; it’s the start of a long race.
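As a rough illustration, here is a minimal sketch of one turn of such a data flywheel: score incoming records and route low-confidence cases back into a review-and-retrain queue. The record shape, confidence threshold, and toy model are all hypothetical assumptions.

```python
# A minimal sketch of one turn of the "data flywheel": monitor predictions,
# collect the cases the model is unsure about, and queue them for labeling
# and retraining. All thresholds and record shapes are hypothetical.

from typing import Callable

def flywheel_step(records: list[dict],
                  predict: Callable[[dict], tuple[int, float]],
                  confidence_floor: float = 0.70) -> list[dict]:
    """Return records that should go back into the training pipeline."""
    review_queue = []
    for record in records:
        label, confidence = predict(record)
        # Low-confidence predictions are where the model is telling you
        # its training data no longer matches the world.
        if confidence < confidence_floor:
            review_queue.append({**record, "model_label": label,
                                 "confidence": confidence})
    return review_queue

# Toy model: confident on small values, unsure on large (drifted) ones.
def toy_predict(record: dict) -> tuple[int, float]:
    return (1, 0.95) if record["amount"] < 100 else (1, 0.55)

stream = [{"amount": 40}, {"amount": 250}, {"amount": 30}, {"amount": 900}]
print(flywheel_step(stream, toy_predict))  # the 250 and 900 cases need review
```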
All talk, no walk

Finally, too many organizations excel at impressive AI prototypes and pilot projects but stop short of investing the hard work to turn those demos into dependable, production-grade systems at scale. Why does this happen so frequently? One reason is the hype-fueled rush we touched on earlier. When CEOs and board members pressure the company to “get on the AI train,” there’s an incentive to show progress fast, even if that progress is only superficial. As I’ve suggested, we’ve sometimes “allowed the promise of AI to overshadow current reality.”

Another factor is what could be called “pilot purgatory.” Organizations spin up numerous AI proofs of concept to explore use cases but fund them minimally and isolate them from core production systems. Often these pilots die not because the technology failed but because they were never designed with production in mind. An endless stream of disconnected experiments is costly and demoralizing. It creates “pilot fatigue” without yielding tangible benefits. Some of this is fostered by organizational dynamics. In today’s market, it may be easier to get C-level executives to invest in your project if it has AI sprinkled on top. As IDC’s Ashish Nadkarni indicates, “Most of these [failed] genAI initiatives are born at the board level … not because of a strong business case. It’s trickle-down economics to me.”

To avoid this trap, you need to allocate sufficient time and resources to harden a prototype for production: plugging it into real data workflows, adding user feedback channels, handling edge cases, implementing guardrails (like prompt filtering or human fallback for sensitive tasks), etc.
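For the guardrails piece specifically, here is a minimal sketch of prompt filtering with a human fallback for sensitive tasks. The topic lists and routing logic are hypothetical and deliberately naive; production systems typically use trained classifiers and policy engines rather than keyword matching, but the routing structure is the same.

```python
# A minimal sketch of prompt filtering plus human fallback. The keyword
# lists and routing rules are hypothetical illustrations, not a real
# policy engine.

BLOCKED_TOPICS = {"password", "ssn", "credit card"}
SENSITIVE_TOPICS = {"refund", "legal", "medical"}

def route_request(prompt: str) -> str:
    text = prompt.lower()
    if any(topic in text for topic in BLOCKED_TOPICS):
        return "blocked: request refused by policy"
    if any(topic in text for topic in SENSITIVE_TOPICS):
        return "human_fallback: routed to a support agent"
    return "model: handled by the AI system"

for prompt in ["What's your refund policy?",
               "Tell me a customer's credit card number",
               "Summarize this ticket"]:
    print(route_request(prompt))
```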
Success, in short, will ultimately come down to developers.

Developers to the rescue

It’s easy to be cynical about enterprise AI given the high failure rates. Yet amid the wreckage of failed projects are shining examples of AI done right, often led by teams that balanced skepticism with ingenuity. The differentiator is usually a developer mindset that puts substance over show. Indeed, production-grade AI “is all the work that happens before and after the prompt,” as I’ve suggested.

The good news is that the power to fix these failures lies largely in our hands as developers, data scientists, and technology leaders. We can push back when a project lacks a clear objective or success metric and insist on answering “why” before jumping into “how.” We can advocate for the boring-but-crucial work of data quality and MLops, reminding our organizations that AI is not magic; it’s engineering. When we do embrace an AI solution, we can do so with eyes open and a plan for the full product life cycle, not just the demo.

https://www.infoworld.com/article/4010313/why-ai-projects-fail-and-how-developers-can-help-them-succ...