Meta introduces Llama Stack distributions for building LLM apps

Friday September 27, 2024. 12:30 AM , from InfoWorld

Looking to ease the development of generative AI applications, Meta is sharing its first official Llama Stack distributions, to simplify how developers work with Llama large language models (LLMs) in different environments.

Unveiled September 25, Llama Stack distributions package multiple Llama Stack API providers that work well together to provide a single endpoint for developers, Meta announced in a blog post. The Llama Stack defines building blocks for bringing generative AI applications to market. These building blocks span the development life cycle from model training and fine-tuning through to product evaluation and on to building and running AI agents and retrieval-augmented generation (RAG) applications in production. A repository for Llama Stack API specifications can be found on GitHub.

Meta also is building providers for the Llama Stack APIs. The company is looking to ensure that developers can assemble AI solutions using consistent, interlocking pieces across platforms. Llama Stack distributions are intended to enable developers to work with Llama models in multiple environments including on-prem, cloud, single-node, and on-device, Meta said. The Llama Stack consists of the following set of APIs:

Inference

Safety

Memory

Agentic System

Evaluation

Post Training

Synthetic Data Generation

Reward Scoring

Each API is a collection of REST endpoints. The introduction of the Llama Stack distributions is happening alongside Meta’s release of Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices.