Musk’s Grok 3 ‘94% Inaccurate’: Here’s How Other AI Chatbots Fare Against Truth
Thursday, March 13, 2025, 11:12 PM, from eWeek
A bombshell study by the Tow Center for Digital Journalism has exposed a major flaw in AI-powered search engines: they're terrible at citing news accurately. After analyzing eight AI search platforms, researchers found that more than 60 percent of responses contained incorrect or misleading citations. Some AI chatbots performed better than others (Perplexity had a 37 percent error rate), but Elon Musk's Grok 3 was the worst offender, generating incorrect citations a staggering 94 percent of the time.
AI's citation chaos

Many AI search engines and chatbots, including ChatGPT and Grok, cite traditional news sources such as the BBC, The New York Times, or Reuters to bolster their trustworthiness. The logic is simple: if the chatbot links to a trusted publication, the user is more likely to believe the response. But the study found that many of these citations don't actually link back to the original content. Instead, the chatbots fabricate links, cite syndicated or plagiarized copies, or misattribute articles to other publishers. Incorrect citations tarnish the reputations of both the chatbot and the publishers it cites. Worse, users who don't check the sources may unknowingly spread misinformation, reinforcing the chatbot's inaccuracies.

AI chatbots and access to restricted content

Beyond citation errors, the study uncovered troubling inconsistencies in how AI chatbots handle restricted content. Some chatbots, including ChatGPT and Perplexity, either failed to answer queries about publishers that had explicitly allowed crawler access, or successfully answered questions about content they should not have been able to reach. Perplexity Pro correctly identified nearly a third of 90 excerpts from articles that should have been off-limits. One particularly alarming discovery: Perplexity's free version correctly answered all 10 queries about paywalled National Geographic articles, even though National Geographic has explicitly disallowed Perplexity's crawlers (the first sketch below shows how such restrictions are declared and checked). While AI models can sometimes infer information from publicly available references, this finding raises concerns about whether Perplexity is respecting publisher restrictions as promised. Press Gazette reported that Perplexity referred to New York Times content 146,000 times in January, despite the publisher blocking its crawlers. ChatGPT, meanwhile, answered fewer queries about restricted content than other chatbots, but still tended to provide incorrect information rather than simply declining to answer.

The problem with AI's overconfidence

One of AI's biggest flaws is its unwavering confidence, even when it is wrong. AI search engines rarely admit that they don't know something; instead, they fill the gaps with authoritative-sounding but incorrect answers. This "hallucination" effect makes misinformation hard to spot, especially for users unfamiliar with the topic.

Why human oversight is more important than ever

With AI search engines still struggling with accuracy, human judgment remains essential. Fact-checking, cross-referencing sources, and applying critical thinking are all necessary to separate fact from fiction; the second sketch below shows one such check. Until AI platforms drastically improve their sourcing reliability, users should remain skeptical of AI-generated citations.
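The crawler restrictions at issue above are declared in a publisher's robots.txt file, which compliant crawlers are expected to consult before fetching pages. The following Python sketch uses the standard library's robots.txt parser to ask whether a given crawler user-agent may fetch a page; the site, article path, and outcome are illustrative assumptions, not data from the Tow Center study.

```python
# Minimal sketch: check whether a crawler user-agent is permitted to
# fetch a page under a site's robots.txt rules. The article path is
# hypothetical; PerplexityBot and GPTBot are the user-agent tokens
# commonly used by Perplexity's and OpenAI's crawlers.
from urllib.robotparser import RobotFileParser

SITE = "https://www.nationalgeographic.com"

robots = RobotFileParser()
robots.set_url(f"{SITE}/robots.txt")
robots.read()  # downloads and parses the live robots.txt

article = f"{SITE}/science/article/example"  # hypothetical URL
for agent in ("PerplexityBot", "GPTBot", "*"):
    verdict = "allowed" if robots.can_fetch(agent, article) else "disallowed"
    print(f"{agent}: {verdict}")
```

The study's point is that a Disallow rule is only a request: it carries no technical enforcement, which is how content a publisher has blocked can still surface in a chatbot's answers.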
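In that spirit, one quick manual check on an AI-supplied citation is confirming that the link resolves at all and actually lands on the publisher the chatbot attributed it to. Here is a minimal sketch of such a check; the cited URL is hypothetical.

```python
# Minimal sketch of one manual fact-check: verify that a cited URL
# loads (HTTP 200) and that its final host, after any redirects,
# belongs to the publisher the chatbot named. The URL is hypothetical.
import urllib.error
import urllib.request
from urllib.parse import urlparse

def citation_resolves(url: str, expected_domain: str) -> bool:
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            host = urlparse(resp.geturl()).netloc
            return resp.status == 200 and host.endswith(expected_domain)
    except (urllib.error.URLError, TimeoutError):
        return False  # dead link, fabricated URL, or unreachable host

# Hypothetical citation returned by a chatbot:
print(citation_resolves("https://www.reuters.com/technology/example", "reuters.com"))
```

A dead link or a mismatched domain does not prove fabrication on its own, since pages move and paywalls interfere, but it flags exactly the failure modes the study describes: invented URLs, syndicated copies, and misattributed publishers.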
Read the entire Tow Center study to learn more about its findings, and learn more about the risks of using generative AI and the steps you can take to mitigate them.

https://www.eweek.com/news/ai-chatbot-citation-problem/