Meta will offer its Llama AI model as an API too

Wednesday, April 30, 2025, 05:34 PM, from InfoWorld
Meta has unveiled a preview version of an API for its Llama large language models. The new offering transforms Meta’s popular open-source models into an enterprise-ready service, directly challenging established players like OpenAI while addressing a key concern for enterprise adopters: freedom from vendor lock-in.

“We want to make it even easier for you to quickly start building with Llama, while also giving you complete control over your models and weights without being locked into an API,” Meta said in a statement during its first-ever LlamaCon developer forum.

The Llama API represents Meta’s evolution from simply releasing open-source models to operating cloud-based AI infrastructure of its own.

Greyhound Research chief analyst Sanchit Vir Gogia said, “They’re shifting the battlefield from model quality alone to inference cost, openness, and hardware advantage.”

OpenAI SDK compatibility

The new service will offer one-click API key creation, interactive model playgrounds, and immediate access to Meta’s latest Llama 4 Scout and Llama 4 Maverick models, the company said.

Integration with existing infrastructure is straightforward through lightweight SDKs in both Python and TypeScript. Meta has maintained compatibility with the OpenAI SDK, allowing developers to convert existing applications with minimal code changes.
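In practice, that compatibility means an existing OpenAI-based client can be pointed at the Llama API largely by swapping the API key and base URL. The Python sketch below illustrates the idea; the endpoint, environment variable name, and model identifier are assumptions for illustration, not Meta’s documented values.

    # Minimal sketch of the OpenAI-SDK compatibility path.
    # The base URL, model identifier, and env var name are assumptions,
    # not Meta's documented values.
    import os

    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["LLAMA_API_KEY"],          # key from one-click creation (assumed name)
        base_url="https://api.llama.com/compat/v1/",  # hypothetical OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="Llama-4-Scout",  # placeholder identifier
        messages=[{"role": "user", "content": "Summarize the Llama API in one sentence."}],
    )
    print(response.choices[0].message.content)

Because only the client construction changes, the rest of an OpenAI-based codebase can in principle remain untouched.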

The solution includes tools for fine-tuning and evaluation, enabling developers to create custom versions of the new Llama 3.3 8B model — potentially reducing costs while improving performance for specific use cases.

Chip partnerships

Meta will collaborate with AI chip makers Cerebras and Groq to improve inference speed, a critical factor in production AI applications.

Cerebras, known for its specialized AI chips, promises dramatically faster performance compared to conventional GPU solutions. According to third-party benchmarks cited by the company, Llama 4 Scout runs on its chips at over 2,600 tokens per second, compared to OpenAI’s ChatGPT running at approximately 130 tokens per second.

“Developers building agentic and real-time apps need speed,” said Andrew Feldman, CEO of Cerebras. “With Cerebras on Llama API, they can build AI systems that are fundamentally out of reach for leading GPU-based inference clouds.”

Similarly, Groq’s Language Processing Unit (LPU) chips deliver speeds of up to 625 tokens per second. Jonathan Ross, Groq’s CEO, emphasized that their solution is “vertically integrated for one job: inference,” with every layer “engineered to deliver consistent speed and cost efficiency without compromise.”
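To put those figures in context: at Cerebras’s claimed 2,600 tokens per second, a 1,000-token response would stream in roughly 0.4 seconds; at Groq’s 625 tokens per second, about 1.6 seconds; and at the cited 130 tokens per second, closer to 7.7 seconds.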

Neil Shah, VP for research and partner at Counterpoint Research, said, “By adopting cutting-edge but ‘open’ solutions like Llama API, enterprise developers now have better choices and don’t have to compromise on speed and efficiency or get locked into proprietary models.”

Greyhound’s Gogia said that Meta’s strategic tie-ups with Groq and Cerebras to support the Llama AI “mark a decisive pivot in the LLM-as-a-Service market.”

Exploiting hesitancy about proprietary AI

The Llama API enters a market where OpenAI’s GPT models have established early dominance, but Meta is leveraging key advantages to attract enterprise customers who remain hesitant about proprietary AI infrastructure.

“Meta’s Llama API presents a fundamentally different proposition for enterprise AI builders — it’s not just a tool, but a philosophy shift,” Gogia noted. “Unlike proprietary APIs from OpenAI or Anthropic, which bind developers into opaque pricing, closed weights, and restrictive usage rights, Llama offers openness, modularity, and the freedom to choose one’s own inference stack.”

Meta explicitly commits to data privacy, saying it does not use prompts or model responses to train its AI models; this directly addresses concerns about other providers using customer data to improve their systems. Furthermore, its data portability guarantee ensures that models built on the Llama API are not locked to Meta’s servers but can be moved and hosted wherever enterprises wish.

This approach creates a unique middle ground: enterprise-grade convenience with the ultimate exit strategy of complete model ownership.

Market impact and future plans

Currently available as a limited free preview with broader access planned “in the coming weeks and months,” the Llama API positions Meta as a direct competitor to OpenAI, Microsoft, and Google. The company describes this release as “just step one,” with additional enterprise capabilities expected throughout 2025.

Prabhu Ram, VP of the industry research group at CyberMedia Research, described Meta’s Llama API as a faster, more open, and modular alternative to existing LLM-as-a-service offerings. “However, it still trails proprietary platforms like OpenAI and Google in ecosystem integration and mature enterprise tooling,” he said.

For technical teams eager to test these performance claims, accessing Llama 4 models powered by Cerebras and Groq requires only a simple selection within the API interface.
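Meta has not published final model identifiers for these variants, so the following Python sketch is hypothetical; the model names, endpoint, and environment variable are assumptions meant only to show that the accelerator choice rides along in the request itself.

    # Hypothetical sketch: choosing a Cerebras- or Groq-backed Llama 4 variant
    # by model name. All identifiers below are illustrative assumptions.
    import os

    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["LLAMA_API_KEY"],          # assumed env var name
        base_url="https://api.llama.com/compat/v1/",  # hypothetical endpoint
    )

    for model_name in ("Llama-4-Scout-Cerebras", "Llama-4-Scout-Groq"):  # assumed names
        reply = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": "Reply with the word ready."}],
        )
        print(model_name, "->", reply.choices[0].message.content)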

Industry analysts suggest Meta’s entry could accelerate price competition in the AI API market while raising the bar for inference performance. For enterprises developing customer-facing AI applications, the performance improvements could enable new categories of applications where response time is critical.

“Meta’s long-term impact will hinge on how effectively it can close the ecosystem gap and deliver enterprise-grade solutions atop its open model stack,” Ram concluded.
https://www.infoworld.com/article/3975132/meta-will-offer-its-llama-ai-model-as-an-api-too.html
