10 machine learning mistakes and how to avoid them
Monday, February 24, 2025, 10:00 AM, from InfoWorld
Machine learning technology is taking hold across many sectors as its applications are more widely adopted. The research firm Fortune Business Insights forecasts the global machine-learning market will expand from $26.03 billion in 2023 to $225.91 billion by 2030. Use cases for machine learning include product recommendations, image recognition, fraud detection, language translation, diagnostic tools, and more.

A subset of artificial intelligence, machine learning refers to the process of training algorithms to make predictive decisions using large sets of data. The potential benefits of machine learning are seemingly limitless, but it also poses some risks. We asked tech leaders and analysts for the most common ways they’ve seen machine learning projects fail. Here’s what they told us.

10 ways machine learning projects fail

1. AI hallucinations
2. Model bias
3. Legal and ethical risks
4. Poor data quality
5. Model overfitting and underfitting
6. Legacy system integration issues
7. Performance and scalability issues
8. Lack of transparency and trust
9. Not enough domain-specific knowledge
10. Machine learning skills shortage

AI hallucinations

In the context of machine learning, a hallucination is when a large language model (LLM) perceives patterns or objects that either don’t exist or are imperceptible to humans. When expressed in generated code or chatbot responses, hallucinations lead to output that isn’t useful.

“In today’s environment, concerns like hallucinations are at an all-time high,” says Camden Swita, head of AI/machine learning at unified data platform provider New Relic. He notes that recent research indicates that a large majority of machine learning engineers have observed signs of hallucinations in their LLMs.

Combating hallucinations requires pivoting from focusing solely on generating content, Swita says. “Rather, developers must emphasize summarization tasks and utilize advanced techniques like retrieval-augmented generation (RAG), which greatly reduce hallucinations,” he says. “Furthermore, anchoring AI outputs on true, validated, and regulated data sources reduces the likelihood of producing misleading information.”
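The grounding step Swita describes can be prototyped in a few lines. Below is a minimal sketch of the RAG pattern, using TF-IDF retrieval from scikit-learn as a stand-in for a production vector store; `call_llm()` is a hypothetical placeholder for whatever LLM client you use, and the policy documents are invented for illustration.

```python
# Minimal RAG sketch: ground the prompt in vetted documents before calling
# the model. TF-IDF stands in for a real vector store; call_llm() is a
# hypothetical placeholder for any LLM client.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Refunds are processed within 14 days of a return request.",
    "Premium support is available 24/7 for enterprise customers.",
    "Orders over $50 ship free within the continental US.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    # Instructing the model to answer only from retrieved, validated context
    # is what curbs hallucination; ungrounded, it may invent a policy.
    prompt = (f"Answer using ONLY the context below. If the answer is not "
              f"in the context, say so.\n\nContext:\n{context}\n\n"
              f"Question: {query}")
    return call_llm(prompt)  # hypothetical LLM client call
```

The key design choice is that the model is told to answer only from the retrieved, vetted context, and to say so when the answer isn’t there rather than invent one.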
Model bias

Organizations need to watch out for model bias, the presence of systematic errors in a model that cause it to consistently make incorrect predictions. Such errors can come from the algorithm used to train the model, the selection of the training data, the choice of features used when creating the model, or other issues.

“Data used to train machine learning models must contain accurate group representation and diverse data sets,” says Sheldon Arora, CEO of StaffDNA, a company that uses AI to help match candidates with jobs in the healthcare sector. “Too much representation from any one given group results in not accurately reflecting the population. Continuously monitoring model performance ensures equitable representation from all demographic groups.”

Addressing bias is crucial to success in the modern AI landscape, Swita says. “Best practices include implementing continuous surveillance, alerting mechanisms, and content filtering to help proactively identify and rectify biased content,” he says. “Through these methodologies, organizations can develop AI frameworks that prioritize validated content.”

To resolve bias, organizations need to embrace a dynamic approach that includes continually refining systems to keep pace with rapidly evolving models, Swita says. “Strategies need to be meticulously tailored for combating bias,” he says.
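One concrete form of the continuous monitoring Arora and Swita recommend is tracking error rates per demographic slice and alerting when any group drifts from the rest. The sketch below is a minimal, framework-free illustration; the group labels and the 0.05 tolerance are assumptions made for the example, not a standard.

```python
# Per-group performance monitoring sketch: compute error rates per
# demographic slice and flag groups that drift past a tolerance.
from collections import defaultdict

def group_error_rates(records):
    """records: iterable of (group, y_true, y_pred) tuples."""
    totals, errors = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        errors[group] += int(y_true != y_pred)
    return {g: errors[g] / totals[g] for g in totals}

def bias_alerts(records, tolerance=0.05):
    """Flag groups whose error rate exceeds the (unweighted) mean
    group error rate by more than `tolerance`."""
    rates = group_error_rates(records)
    mean_rate = sum(rates.values()) / len(rates)
    return [g for g, r in rates.items() if r - mean_rate > tolerance]

# Toy predictions: (group, true label, predicted label)
predictions = [("group_a", 1, 1), ("group_a", 0, 0),
               ("group_b", 1, 0), ("group_b", 0, 1)]
print(group_error_rates(predictions))  # {'group_a': 0.0, 'group_b': 1.0}
print(bias_alerts(predictions))        # ['group_b']
```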
Legal and ethical risks

Machine learning comes with certain legal and ethical risks. Legal risks include discrimination due to model bias, data privacy violations, security leaks, and intellectual property violations. These and other risks can have repercussions for developers and users of machine learning systems.

Ethical risks include the potential for harm or exploitation, misuse of data, lack of transparency, and lack of accountability. Decisions based on machine learning algorithms can negatively affect individuals, even if that was not the intent.

Swita reiterates the need to anchor models and output on trusted, validated, and regulated data. “By adhering to regulations and standards governing data usage and privacy, organizations can reduce the legal and ethical risks associated with machine learning,” he says.

Poor data quality

As with any technology that relies on data to generate positive outcomes, machine learning needs high-quality data to be successful. Poor data quality can lead to flawed models and unacceptable results. Market analysis by research firm Gartner shows that a majority of organizations have issues with their data, with many citing data unreliability and inaccuracy as top reasons for not trusting AI.

“Leaders and practitioners struggle with the tension between preparing data for prototypes and ensuring readiness for the real world,” says Peter Krensky, a senior director and analyst on the analytics and AI team at Gartner. “To address these challenges, organizations must move beyond perfection and adopt approaches that align governance with data’s intended purpose, fostering trust and adaptability.”

Machine learning depends heavily on data quality, says Marin Cristian-Ovidiu, CEO of online entertainment site Online Games. “Bad data [leads to] inaccurate predictions, like a recommendation system promoting irrelevant content because of biased input,” he says. To fix this, organizations must adopt strong data-cleaning processes and diverse datasets, Cristian-Ovidiu says.

High-quality data is essential for building reliable machine learning models, adds Arora. “Data should be regularly scrubbed, and preprocessing techniques need to be implemented to ensure accuracy,” he says. “Good data is the key to training models effectively and receiving reliable output.”

In addition to inaccurate and otherwise flawed data, organizations might find themselves dealing with data points that don’t contribute meaningfully to a particular task. Teams can identify irrelevant data using techniques such as data visualization and statistical analysis. Once identified, that data can be removed from datasets before training the models.
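As a rough illustration of that last step, the pandas sketch below deduplicates, drops incomplete rows, and uses simple statistics to flag columns that contribute nothing to the task. The thresholds are illustrative assumptions, it presumes a numeric target column, and anything it flags should be reviewed with a domain expert before removal.

```python
# Basic data-cleaning sketch: remove duplicates and incomplete rows, then
# flag columns that carry little or no signal for the task.
import pandas as pd

def clean(df: pd.DataFrame, target: str) -> pd.DataFrame:
    df = df.drop_duplicates()
    df = df.dropna()  # or impute, if labeled rows are scarce
    for col in list(df.columns):
        if col == target:
            continue
        # Near-constant columns carry almost no information.
        if df[col].nunique() <= 1:
            df = df.drop(columns=col)
        # Numeric columns with essentially zero correlation to a numeric
        # target are candidates for removal (illustrative 0.01 threshold).
        elif (pd.api.types.is_numeric_dtype(df[col])
              and abs(df[col].corr(df[target])) < 0.01):
            df = df.drop(columns=col)
    return df
```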
Model overfitting and underfitting

In addition to the data used, the models themselves can be a source of fault in machine learning projects. Overfitting happens when a model is trained to fit too closely to a training set, which results in poor performance on new data. Models are typically trained on a known data set to make predictions on new data, but an overfit model won’t generalize well to new data and won’t be able to perform its intended tasks.

“A model is said to be overfit if it performs well on training data but not on new data,” says Elvis Sun, a software engineer at Google and founder of PressPulse, a company that uses AI to help connect journalists and experts. “When it gets too complicated, the model ‘memorizes’ the training data rather than figuring out the patterns.”

Underfitting is when a model is too simple to accurately capture the relationship between input and output variables. The result is a model that performs poorly on both training data and new data. “Underfitting [happens] when the model is too simple to represent the true complexity of the data,” Sun says.

Teams can use cross-validation, regularization, and the appropriate model architecture to address these problems, Sun says. Cross-validation assesses the model’s performance on held-out data, demonstrating its capacity for generalization. Regularization techniques such as L1 or L2 discourage overfitting by penalizing model complexity and promoting simpler, more broadly applicable solutions, he says. “Businesses can balance model complexity and generalization to produce reliable, accurate machine-learning solutions,” he says.
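Both remedies are a few lines in scikit-learn. The sketch below runs five-fold cross-validation on a synthetic dataset and compares an unregularized linear model against an L2-regularized one (Ridge); L1 regularization would swap in Lasso the same way. The dataset and parameters are illustrative, not a recommendation.

```python
# Cross-validation plus L2 regularization, the two remedies Sun mentions.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Many features relative to samples: a setup prone to overfitting.
X, y = make_regression(n_samples=200, n_features=50, noise=10.0,
                       random_state=0)

for name, model in [("unregularized", LinearRegression()),
                    ("L2 (Ridge, alpha=1.0)", Ridge(alpha=1.0))]:
    # Five-fold cross-validation: train on 4/5 of the data, score on the
    # held-out fifth, rotate, and average.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R^2 = {scores.mean():.3f}")
```

A large gap between training-set scores and cross-validation scores is the classic symptom of overfitting; regularization typically narrows it.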
Legacy system integration issues

Integrating machine learning with legacy IT systems might involve evaluating the existing infrastructure’s readiness for machine learning, creating an integration process, using application programming interfaces (APIs) to exchange data, and other steps. Regardless of what’s involved, it’s crucial to ensure the systems in place can support new machine learning-based products.

“Legacy systems may not meet the infrastructure requirements of machine learning tools, and this may lead to either inefficiencies or incomplete integration,” says Damien Filiatrault, founder and CEO of Scalable Path, a software staffing agency. “For example, a demand-forecasting machine learning model might be incompatible with the current inventory management software used by a retail company. So, for such an implementation to take place, the system must first be assessed thoroughly.”

Machine learning models can be integrated with older systems through APIs and microservices that enable interaction among them, Filiatrault says. “In addition, data scientists and IT teams collaborating across functions in staggered rollouts ensure smoother adoption.”

Performance and scalability issues

Scalability is another concern, particularly as the use of machine learning grows over time. If systems cannot maintain their performance and efficiency when dealing with significantly larger datasets, increased complexity, and higher computational demands, the results will likely not be acceptable. Machine learning models must be able to handle growing data volumes without a significant decline in performance or speed.

“Unless a company is using scalable cloud computing resources, they won’t be able to handle fluctuating amounts of data,” Arora says. “Depending on the size of data sets, more complex models may be required. Distributed computing frameworks allow for parallel computations of large datasets.”

Lack of transparency and trust

Machine learning applications can tend to function like a “black box,” which makes it challenging to explain their outcomes, Filiatrault says. “In healthcare and other similar environments where confidentiality is key, this lack of transparency can be detrimental to users’ confidence,” he says. “Using interpretable models whenever possible or employing explanation frameworks like SHAP [SHapley Additive exPlanations] could help address this problem.”

Proper documentation and visualization of decision-making processes might also help foster user trust and compliance with regulations to guarantee the ethical use of AI, Filiatrault says.

“Models often give results without explanations,” says Cristian-Ovidiu. “For example, a player engagement model might increase retention but provide no clarity on what factors mattered. Use models that are easy to understand, and get help from experts to check the results.”
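As a sketch of the SHAP approach Filiatrault mentions, the snippet below attributes a tree model’s predictions to individual features. It assumes the open-source `shap` package is installed; the model and data are synthetic stand-ins for something like a player engagement model.

```python
# Explain a tree model's predictions with SHAP values.
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a real dataset and model.
X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
# One row per prediction, one column per feature: how much each feature
# pushed that prediction above or below the model's baseline.
shap_values = explainer.shap_values(X[:20])
shap.summary_plot(shap_values, X[:20])
```

Each row of the resulting values shows how much each feature pushed a prediction above or below the model’s baseline, which is exactly the “what factors mattered” answer the quote above asks for.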
Not enough domain-specific knowledge

Effective use of machine learning frequently requires extensive knowledge of the issue or field being addressed, Sun says. Companies that lack the right people on their teams may find this type of domain-specific knowledge to be a major issue. “Depending on factors like industry-specific data structures, business procedures, and laws and regulations, machine learning solutions may or may not be successful,” Sun says.

To bridge this gap, machine learning professionals must collaborate closely with those in related fields, Sun says. “By combining the technical expertise of the machine learning team with the situation-specific knowledge of the domain experts, businesses can create better machine learning models,” he says. “This type of collaboration can take the form of problem definition, training dataset creation, or continuous feedback loops during model development and deployment.”

Machine learning skills shortage

As with so many other areas of technology, organizations are faced with a shortage of the machine learning skills they need. “Talent challenges often stem from a shortage of skills and the need to bridge gaps between technical and non-technical stakeholders,” Krensky says. “Many organizations struggle with change management, which is critical for driving adoption and aligning teams with evolving capabilities.”

Organizations are overcoming these challenges by focusing on reskilling, fostering collaboration across disciplines, and embracing new roles such as AI translators, Krensky says.

https://www.infoworld.com/article/3812589/10-machine-learning-mistakes-and-how-to-avoid-them.html