Open-washing and the illusion of AI openness
Tuesday, December 3, 2024, 10:00 AM, from InfoWorld
A peculiar trend has taken hold within the AI gold rush: Major players in the space—from OpenAI to Google and Microsoft—have started to heavily market their AI models as “open.” These companies are using terms like “open AI” to align with notions of transparency, collaboration, and shared progress associated with open source software. However, a closer look reveals that this embrace of “openness” is largely performative—a concept now termed “open-washing.”
Open-washing in AI refers to companies overstating their commitment to openness while keeping critical components proprietary. The approach isn’t new. We’ve seen cloud-washing, AI-washing, and now open-washing, all called out here. Marketing departments want the concept of being “open” to put their companies in the virtuous category of firms that save baby seals from oil spills. I don’t knock them for it, but let’s not get too far over our skis, billion-dollar tech companies.

Although these companies claim that their generative AI and large language models (LLMs) are open to all, the reality is that these systems remain locked within frameworks controlled by the very same corporations. Rather than fostering genuine openness, these strategies frequently cement the concentration of power in the hands of a select few. What appears on the surface to be democratic and collaborative is, in practice, a polished marketing strategy to perpetuate control.

This is closely tied to cloud computing, considering that many of these open-washed models run on cloud providers and are built and sold by cloud providers as well. All of the cool kids have LLMs these days; cloud providers are right in there with theirs.

What is open-washing in AI?

AI firms often tout the open source accessibility of their models. But when you dig deeper, it becomes clear that the critical aspects of these systems—data sets, infrastructure, training methods, and even the practical use of LLMs—remain tightly guarded. These are not minor components but the core elements that drive generative AI systems’ functionality, innovation potential, and scalability. Companies maintain intellectual property dominance by presenting certain parts of their pipelines as open while controlling much of the ecosystem and extracting value from users seeking to customize or extend their tools.

A prime example is the release of LLMs “under permissive licenses” that claim anyone can use or adapt them. This might indeed appear to democratize AI, especially for less-experienced developers or startups. However, these models often withhold critical elements, such as the complete training data sets or the computational resources needed to replicate the models from scratch.

False perceptions

At the heart of open-washing is a distortion of the principles of openness, transparency, and reusability. Transparency in AI would entail publicly documenting how models are developed, trained, fine-tuned, and deployed. This would include full access to the data sets, weights, architectures, and decision-making processes involved in the models’ construction. Most AI companies fall short of this level of transparency. By selectively releasing parts of their models—often stripped of key details—they craft an illusion of openness.

Reusability, another pillar of openness, fares much the same. Companies allow access to their models via APIs or lightweight downloadable versions but prevent meaningful adaptation by tying usage to proprietary ecosystems. This partial release offers a calculated level of reusability that maximizes big cloud’s value extraction while minimizing the risk from would-be competitors. For example, OpenAI’s GPT models are accessible, but their integrations are invariably tied to specific web clients, maintenance libraries, and applications owned by the company. Enterprise developers do not get free rein to adjust, adapt, or redistribute these models; doing so runs afoul of the licensing agreements.
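To make the reusability point concrete, here is a minimal sketch of what that kind of “openness” usually amounts to in practice. It assumes the official openai Python SDK (v1.x); the model name and prompt are illustrative choices, not details from the article. You can reach the hosted model through the vendor’s client library and an API key, but nothing in this workflow exposes the weights, the training data, or the serving stack.

# Minimal sketch: "open" access via the vendor's hosted API only.
# Assumes the official openai Python package (v1.x); model name and
# prompt are illustrative.
import os
from openai import OpenAI

# The only way in is the vendor's endpoint, gated by an API key.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-4o-mini",  # a hosted model; the weights are not downloadable
    messages=[
        {"role": "user", "content": "Summarize the license terms that apply to your output."}
    ],
)
print(response.choices[0].message.content)

# What this access does NOT include: the model weights, the training data
# sets, the fine-tuning pipeline, or any way to run or adapt the model
# outside the vendor's infrastructure and terms of service.

Contrast that with open source software in the traditional sense, where the same developer could inspect, modify, and redeploy the entire system on their own infrastructure.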
One developer friend of mine put it best when he said, “This stuff is about as open as a bank vault.”

Concentrated AI power

The development of generative AI hinges on immense resources: massive data sets, computing power, and specialized frameworks. Training a state-of-the-art LLM consumes enormous amounts of energy and hardware. There is a reason most enterprises are not building LLMs as they once planned: They simply can’t afford it. I’ve been advising my clients to focus on more tactical uses of AI, including small language models and agentic AI. LLMs have become remote resources, which is perhaps where they will remain.

Even permissively licensed models, such as Meta’s Llama 3, come with restrictive terms that limit how they can be deployed or adapted. This selective transparency ensures that smaller organizations remain dependent on these corporations’ ecosystems, cementing the power imbalance.

Moreover, the labor-intensive process of curating, labeling, and moderating data sets is often obscured. Despite the rhetoric of democratization, these firms exploit global labor forces and maintain critical data sets in silos. This makes replication nearly impossible.

What does this mean for enterprises?

Enterprise leaders need to ask the tough questions when someone says their AI model is “open.” What exactly can you modify? Where is the complete documentation? Can you take the model and run it anywhere you want? You’ll likely find restrictions. When backed into a corner, providers will remind you that they are a business, not a charity. You should expect to pay for value, and I think that is just fine. However, I believe their customers would like them to be less tricky about it.

I wrote this article because many of my enterprise friends and clients have been moving forward with their cloud AI plans and are rightfully concerned. Big cloud providers may want to take note, because the tide could turn toward companies with much more straightforward approaches to selling their LLMs and other AI tech.

My advice? Don’t get caught up in the open-washing hype. Focus on what these AI tools can do for your business within their constraints. And remember, if it looks too open to be true, it probably is.
https://www.infoworld.com/article/3615678/open-washing-and-the-illusion-of-ai-openness.html