
Google intros EmbeddingGemma for on-device AI

Monday, September 8, 2025, 06:14 PM, from InfoWorld
With EmbeddingGemma, Google is offering a multilingual text-embedding model designed to run directly on mobile phones, laptops, and other edge devices, aimed at mobile-first generative AI.

Unveiled September 4, EmbeddingGemma has a 308-million-parameter design that lets developers build applications using techniques such as retrieval-augmented generation (RAG) and semantic search that run directly on the target hardware, Google explained. Based on the lightweight Gemma 3 model architecture, EmbeddingGemma is trained on more than 100 languages and is small enough to run in less than 200MB of RAM with quantization. It offers customizable output dimensions, from 768 down to 128 via Matryoshka representation, and a 2K-token context window.
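The Matryoshka representation mentioned above means a 768-dimension embedding can simply be truncated to a shorter prefix (for example 128 dimensions) and renormalized, trading some accuracy for memory and speed. A minimal sketch of that truncation step in Python with NumPy; the dimension sizes come from the article, but the vector here is a random stand-in, not real EmbeddingGemma output:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components of a Matryoshka-style embedding
    and L2-renormalize so cosine similarity still behaves correctly."""
    head = vec[:dims]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(0)
full = rng.normal(size=768)       # stand-in for a 768-dim embedding
full /= np.linalg.norm(full)

small = truncate_embedding(full, 128)  # 6x smaller vector, still unit-length
print(small.shape)
```

Storing the 128-dimension prefix instead of the full vector cuts an on-device index to roughly one sixth of the size, which is the point of shipping adjustable dimensions on RAM-constrained hardware.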

EmbeddingGemma enables developers to build flexible, privacy-centric, on-device applications, according to Google. Model weights for EmbeddingGemma can be downloaded from Hugging Face, Kaggle, and Vertex AI. Paired with the Gemma 3n model, EmbeddingGemma can unlock new use cases for mobile RAG pipelines, semantic search, and more, Google said. EmbeddingGemma works with tools such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LM Studio, Weaviate, Cloudflare, LlamaIndex, and LangChain. Documentation for EmbeddingGemma can be found at ai.google.dev.
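In a mobile RAG pipeline like the ones described above, the retrieval step is a nearest-neighbor search over unit-length document embeddings. The sketch below shows only that step; the `embed` function is a toy bag-of-words stand-in (a real pipeline would get vectors from EmbeddingGemma through one of the listed tools, such as sentence-transformers), and the documents and vocabulary are invented for illustration:

```python
import numpy as np

# Toy vocabulary for the stand-in embedder; a real model needs none of this.
VOCAB = sorted({"gemma", "embedding", "model", "runs", "on", "device",
                "phones", "needs", "under", "200mb", "of", "ram",
                "paris", "is", "the", "capital", "france"})

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding standing in for EmbeddingGemma:
    one count per vocabulary word, L2-normalized."""
    words = [w.strip(".,?!") for w in text.lower().split()]
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

docs = [
    "EmbeddingGemma runs on device and needs under 200MB of RAM.",
    "Paris is the capital of France.",
]
doc_vecs = np.stack([embed(d) for d in docs])   # the on-device "index"

query = "Which model runs on phones?"
scores = doc_vecs @ embed(query)   # cosine similarity, since vectors are unit-length
best = docs[int(np.argmax(scores))]
print(best)
```

The retrieved passage would then be handed to a generator such as Gemma 3n as context, which is the "G" in the RAG pipeline; everything here runs locally, which is what keeps the pipeline privacy-centric.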
https://www.infoworld.com/article/4052426/google-intros-embeddinggemma-for-on-device-ai.html

