LLMs aren’t enough for real-world, real-time projects
Tuesday, June 24, 2025, 11:00 AM, from InfoWorld
The major builders of large language models (LLMs)—OpenAI, DeepSeek, and others—are mistaken when they claim that their latest models, like OpenAI’s o-series or DeepSeek’s R1, can now “reason.” What they’re offering isn’t reasoning. It’s simply an advanced text predictor with some added features. To unlock AI’s true transformative potential, we need to move beyond the idea of reasoning as a one-size-fits-all solution. Here’s why.
If 2024 belonged to ChatGPT, OpenAI hoped to dominate 2025 with the o-series, promising a leap in LLM reasoning. Early praise for its attempts to curb hallucinations quickly faded when China’s DeepSeek matched its capabilities at a fraction of the cost—on a laptop. Then came Doubao, an even cheaper rival, shaking the AI landscape. Chip stocks dropped, US tech dominance faltered, and even Anthropic’s Claude 3.5 Sonnet came under scrutiny. But the real issue with the LLM paradigm isn’t just cost—it’s the illusion that all of its inherent flaws have been solved. That’s a dangerous path that could lead to painful dead ends. Despite all the progress, issues like hallucination remain unresolved.

This is why I believe the future of AI doesn’t lie in artificial general intelligence (AGI) or endlessly scaling LLMs. Instead, it lies in fusing LLMs with knowledge graphs—particularly when enhanced by retrieval-augmented generation (RAG), which combines the power of structured data with generative AI models. No matter how cheap or efficient, an LLM is fundamentally a fixed, pre-trained model, and retraining it is always costly and impractical. In contrast, knowledge graphs are dynamic, evolving networks of meaning, offering a far more adaptable and reliable foundation for reasoning. Enriching an LLM’s conceptual map with structured, interconnected data through graphs transforms its output from probabilistic guesswork into precision. This hybrid approach enables true practical reasoning, offering a dependable way to tackle complex enterprise challenges with clarity—something that LLM “reasoning” often falls short of delivering.

We need to distinguish between true reasoning and the tricks LLMs use to simulate it. Model makers are loading their latest models with shortcuts. Take OpenAI, for example, which now injects code when a model detects a calculation in the context window, creating the illusion of reasoning through stagecraft rather than intelligence. But these tricks don’t solve the core problem: the model doesn’t understand what it’s doing. While today’s LLMs have patched classic logic failures—like struggling to determine how long it would take to dry 30 versus five white shirts in the sun—there will always be countless other logical gaps. The difference is that graphs provide a structured, deep foundation for reasoning rather than masking limitations with clever tricks.

The limits of LLM ‘reasoning’

We’ve seen the consequences of forcing ChatGPT into the role of a reasoning engine: it fabricates confident but unreliable answers, or risks exposing proprietary data that then feeds back into training—a fundamental flaw. Tasks like predicting financial trends, managing supply chains, or analyzing domain-specific data require more than surface-level reasoning.

Take financial fraud detection, for example. An LLM might be asked, “Does this transaction look suspicious?” and respond with something that sounds confident—“Yes, because it resembles known fraudulent patterns.” But does it actually understand the relationships between accounts, historical behavior, or hidden transaction loops? No. It’s simply echoing probability-weighted phrases from its training data. True fraud detection requires structured reasoning over the financial networks buried within your transaction data—something LLMs alone cannot provide.
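To make the fraud example concrete, here is a minimal sketch of the kind of structural check a graph makes trivial: finding circular money flows among accounts. It uses the open-source networkx library; the account names and amounts are invented for illustration, and a production system would run an equivalent query against a graph database rather than an in-memory toy graph.

```python
# Minimal sketch: detecting transaction loops in a financial network.
# Account names and amounts are invented; networkx is open source.
import networkx as nx

# Directed graph of money flows: an edge A -> B means A paid B.
transactions = [
    ("acct_1", "acct_2", 9_500),
    ("acct_2", "acct_3", 9_400),
    ("acct_3", "acct_1", 9_300),  # closes a loop back to acct_1
    ("acct_4", "acct_5", 120),
]

g = nx.DiGraph()
for src, dst, amount in transactions:
    g.add_edge(src, dst, amount=amount)

# A cycle of payments is a classic laundering pattern. Detecting it is
# a structural fact about the network, not a probability-weighted guess.
for cycle in nx.simple_cycles(g):
    print("suspicious loop:", " -> ".join(cycle))
```

The point of the sketch is that the loop is found by traversing relationships the data actually contains, which is exactly the kind of evidence an LLM alone never sees.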
The problem becomes even more concerning when we consider the deployment of LLMs in real-world applications. Take, for example, a company using an LLM to summarize clinical trial results or predict drug interactions. The model might generate a response like, “This combination of compounds has shown a 30% increase in efficacy.” But what if those trials weren’t conducted together, if critical side effects are overlooked, or if regulatory constraints are ignored? The consequences could be severe.

Now consider cybersecurity, another domain where a wrong response could have catastrophic consequences. Imagine your CSO asking an LLM, “How should we respond to this network breach?” The model might suggest actions that sound plausible but are completely misaligned with the organization’s actual infrastructure, latest threat intelligence, or compliance needs. Following AI-generated cybersecurity advice without scrutiny could leave the company even more vulnerable.

And let’s not overlook enterprise risk management. Suppose a group of business users asks an LLM, “What are the biggest financial risks for our business next year?” The model might confidently generate an answer based on past economic downturns. However, it lacks real-time awareness of macroeconomic shifts, government regulations, or industry-specific risks, and it simply does not have current corporate information. Without structured reasoning and real-time data integration, the response, while grammatically perfect, is little more than educated guessing dressed up as insight.

This is why structured, verifiable data is absolutely essential in enterprise AI. LLMs can offer useful insights, but without a real reasoning layer—such as knowledge graphs and graph-based retrieval—they’re essentially flying blind. The goal isn’t just for AI to generate answers, but to ensure it comprehends the relationships, logic, and real-world constraints behind those answers.

The power of knowledge graphs

The reality is that business users need models that provide accurate, explainable answers while operating securely within the walled garden of their corporate infosphere. Consider the training problem: a firm signs a major LLM contract, but unless it gets a private model, the LLM won’t fully grasp the organization’s domain without extensive training. And once new data arrives, that training is outdated—forcing another costly retraining cycle. This is plainly impractical, no matter how customized the o1, o2, o3, or o4 model is. In sharp contrast, supplementing an LLM with a well-designed knowledge graph—especially one that employs dynamic algorithms—solves the problem by updating context rather than requiring retraining.

Whereas an LLM like o1 might correctly interpret a question like “How many x?” as a sum, we need it to understand something more specific, such as “How many servers are in our AWS account?” That’s a database look-up, not an abstract mathematical question. A knowledge graph ensures that a first attempt at practical AI can reason over your data reliably. Moreover, with a graph-based approach, LLMs can be used securely with private data—something even the best LLM on its own can’t manage.

The smart move is to go beyond the trivial. AI needs knowledge graphs, retrieval-augmented generation, and advanced retrieval methods like vector search and graph algorithms—not just low-cost training models, impressive as they may seem.
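As a rough illustration of how a knowledge graph turns “How many servers are in our AWS account?” into a look-up rather than a guess, here is a minimal GraphRAG-style sketch. The graph contents, the retrieval helper, and the LLM stub are all hypothetical; in a real deployment the retrieval step would be a query against a graph database such as Memgraph, and the generation step a call to whatever LLM the organization uses.

```python
# Minimal GraphRAG-style sketch (graph contents and helper names are
# hypothetical). Retrieval answers the factual part from structured
# data; the LLM only phrases the answer, so the count cannot be
# hallucinated.
import networkx as nx

# Toy knowledge graph of a corporate infosphere.
kg = nx.DiGraph()
kg.add_edge("aws_account", "server_a", relation="CONTAINS")
kg.add_edge("aws_account", "server_b", relation="CONTAINS")
kg.add_edge("aws_account", "server_c", relation="CONTAINS")
kg.add_edge("server_a", "eu-west-1", relation="IN_REGION")

def retrieve_servers(graph: nx.DiGraph) -> list[str]:
    """Structured look-up: every node the AWS account CONTAINS."""
    return [
        dst
        for src, dst, data in graph.edges(data=True)
        if src == "aws_account" and data["relation"] == "CONTAINS"
    ]

def llm(prompt: str) -> str:
    """Stand-in for a real LLM call; here it simply echoes the prompt."""
    return prompt

servers = retrieve_servers(kg)
context = f"Verified fact from the knowledge graph: {len(servers)} servers."
answer = llm(f"{context}\nQuestion: How many servers are in our AWS account?")
print(answer)

# New infrastructure is handled by updating the graph, not retraining:
kg.add_edge("aws_account", "server_d", relation="CONTAINS")
print(len(retrieve_servers(kg)))  # now 4, with no model retraining
```

The last two lines show the argument about retraining in miniature: when the environment changes, the graph is updated in place and the next answer reflects it immediately, with no retraining cycle.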
Dominik Tomicevic leads European software company Memgraph, provider of an open-source in-memory graph database that’s purpose-built for dynamic, real-time enterprise applications.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
https://www.infoworld.com/article/4010304/llms-arent-enough-for-real-world-real-time-projects.html