Google’s AI Edge Gallery will let developers deploy offline AI models — here’s how it works
Monday, June 2, 2025, 02:21 PM, from InfoWorld
Google has launched AI Edge Gallery, an open-source platform that enables developers to run advanced AI models directly on Android devices, with iOS support planned for a future release.
Released under the Apache 2.0 license and hosted on GitHub, this experimental app harnesses Google’s AI Edge platform to deliver machine learning (ML) and generative AI (GenAI) capabilities without relying on cloud connectivity. Aimed at enterprise developers, it emphasizes data privacy and low latency, offering a robust tool for building secure, efficient applications.

“On-device AI execution via AI Edge Gallery creates a new developer paradigm where privacy becomes a performance feature rather than a compliance burden,” said Abhishek Anant Garg, analyst at QKS Group. “Eliminating network dependencies not only reduces latency to near-zero but also transforms intermittent connectivity from a failure mode into an irrelevant variable.”

A curated hub for on-device AI

Google’s AI Edge Gallery is built on LiteRT (formerly TensorFlow Lite) and MediaPipe, both optimized for running AI on resource-constrained devices. It supports open-source models from Hugging Face, including Google’s Gemma 3n, a small multimodal language model that handles text and images, with audio and video support in the pipeline. The 529MB Gemma 3 1B model delivers up to 2,585 tokens per second during prefill inference on mobile GPUs, enabling sub-second tasks such as text generation and image analysis. Models run fully offline on CPUs, GPUs, or NPUs, preserving data privacy.

The app includes a Prompt Lab for single-turn tasks such as summarization, code generation, and image queries, with templates and tunable settings (e.g., temperature, top-k). A RAG library lets models reference local documents or images without fine-tuning, while a Function Calling library enables automation such as API calls or form filling. Int4 quantization cuts model size by up to 4x compared with bf16, reducing memory use and latency, according to a Google blog post. A companion Colab notebook helps developers quantize, fine-tune, and convert models for edge deployment.
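The temperature and top-k settings exposed in the Prompt Lab are standard decoding controls for language models. As a rough illustration of what these knobs do (a generic sketch, not AI Edge Gallery's actual implementation), a minimal top-k sampler with temperature can be written in Python like this:

```python
import math
import random

def sample_top_k(logits, k=3, temperature=1.0, rng=None):
    """Sample the next token from a {token: logit} dict, keeping only
    the k highest-scoring candidates and sharpening (low temperature)
    or flattening (high temperature) the distribution."""
    rng = rng or random.Random()
    # Keep the k most likely candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Temperature-scaled softmax over the survivors
    # (max-subtracted for numerical stability).
    m = max(logit / temperature for _, logit in top)
    weights = [math.exp(logit / temperature - m) for _, logit in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

logits = {"cat": 2.0, "dog": 1.5, "car": 0.2, "sky": -1.0}
print(sample_top_k(logits, k=1))                  # k=1 is greedy: always "cat"
print(sample_top_k(logits, k=3, temperature=0.7))  # one of "cat", "dog", "car"
```

Lowering temperature makes the model favor its top choice more strongly; lowering k simply removes unlikely tokens from consideration before sampling.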
Model sizes range from 500MB to 4GB, with more than a dozen options available on the LiteRT Hugging Face community hub.

The setup and enterprise applications

To get started with AI Edge Gallery, developers must enable Developer Mode on their Android devices (Settings > About phone > tap Build number seven times). After downloading the latest APK (v1.0.3) from GitHub, the app can be installed via ADB with the command “adb install -t ai-edge-gallery.apk” or through a file manager with “Unknown Sources” enabled. As an experimental alpha release, the app may exhibit some instability; iOS support is expected soon.

The platform is especially useful for processing sensitive data locally, helping industries such as healthcare and finance maintain compliance by keeping records on-device. Its offline capabilities support field applications such as equipment diagnostics, while MediaPipe integration facilitates IoT deployments in retail and manufacturing. The Function Calling library enables automation features, including voice-driven form filling and document summarization.

Abhishek Ks Gupta, partner and national sector leader at KPMG in India, said that on-device AI like Google’s Edge Gallery is a “revolutionary shift for privacy and security by keeping data local.” He added, “It’s fundamentally more secure for that specific data but demands a new security focus — on protecting the device fleet and the models themselves.”

AI Edge Gallery’s performance will vary by hardware. Pixel 8 Pro devices, for example, can handle larger models smoothly, while mid-tier devices may experience higher latency. “The challenge lies in reconciling model sophistication with mobile hardware realities: developers must become virtuosos of efficiency rather than simply orchestrators of cloud abundance,” Garg said. He added that the ceiling for on-device generative AI isn’t just technical; it’s conceptual.
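Function calling of the kind described above generally works by having the model emit a structured tool call that the app parses and routes to local code. The article does not document the library's actual API, so the registry and function names below are purely illustrative; a minimal sketch of the dispatch pattern in Python:

```python
import json

# Hypothetical tool registry; the real Function Calling library's API
# is not specified in the article, so these names are illustrative.
def fill_form(name: str, email: str) -> dict:
    """Stand-in for a local action, e.g. filling a form on-device."""
    return {"status": "filled", "name": name, "email": email}

TOOLS = {"fill_form": fill_form}

def dispatch(model_output: str) -> dict:
    """Parse a model-emitted call such as
    {"name": "fill_form", "args": {"name": "...", "email": "..."}}
    and invoke the matching registered function."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]  # raises KeyError if the model names an unknown tool
    return fn(**call["args"])

result = dispatch(
    '{"name": "fill_form", "args": {"name": "Ada", "email": "ada@example.com"}}'
)
print(result)  # {'status': 'filled', 'name': 'Ada', 'email': 'ada@example.com'}
```

Keeping the registry explicit means the model can only trigger actions the developer has deliberately exposed, which matters when the automation runs entirely on-device.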
“On-device generative AI is hitting the same wall that plagued early mobile computing — trying to shrink desktop paradigms into handheld form factors,” he said. “Current approaches that require gigabytes of model weights and sustained high-TOPS performance are fundamentally misaligned with mobile realities. We need AI architectures designed from the ground up for intermittent, low-power, context-aware operation — rather than scaled-down versions of cloud-centric models.”

The big push for local AI processing

Google’s AI Edge Gallery launches amid a broader industry shift toward local AI processing. Apple’s Neural Engine, embedded across iPhones, iPads, and Macs, powers real-time language processing and computational photography, all on-device to preserve privacy. Qualcomm’s AI Engine, built into Snapdragon chips, drives voice recognition and smart assistants in Android smartphones. Samsung uses embedded NPUs in Galaxy devices to accelerate generative AI tasks without cloud dependence.

Google’s approach is more foundational. “Edge Gallery reflects a shift from direct competition to platform orchestration,” QKS Group’s Garg said. “Rather than battling Apple or Qualcomm on features, Google is building the infrastructure for mobile AI itself. It’s a meta-competitive strategy — akin to becoming the Linux of mobile AI: ubiquitous, invisible, and indispensable.”

The AI Edge Gallery and LiteRT initiative represent what Garg calls “a masterclass in platform strategy.” By creating the infrastructure, open-sourcing the tools, and seeding the ecosystem, Google is making on-device AI broadly accessible while retaining control over runtimes and model distribution. “Like Intel in the PC era, Google is quietly positioning itself to power the edge AI wave — ubiquitous, essential, and largely invisible,” Garg said.

More Google news and insights:
Who needs Google technology? Probably not you
Google previews Gemini 2.5 Flash hybrid reasoning model
Google adds natural language query capabilities to AlloyDB
https://www.infoworld.com/article/4000176/googles-ai-edge-gallery-will-let-developers-deploy-offline...