According to a Google DeepMind blog post published June 3, 2026, Google released Gemma 4 12B, a 12-billion-parameter member of the Gemma 4 family designed for on-device, multimodal, agentic workflows.
The blog post and the official Gemma 4 documentation state that the model accepts text, images, and native audio and is engineered to run on consumer laptops with 16 GB of RAM or VRAM.
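To put the 16 GB figure in context, the sketch below estimates how much memory the weights of a 12-billion-parameter model occupy at a few storage precisions. The precisions are assumptions chosen for illustration; the sources cited here do not say which precision or quantization scheme Google has in mind, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


PARAMS = 12e9  # 12 billion parameters, as stated for Gemma 4 12B

# Hypothetical storage precisions; the source does not specify one.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone would exceed 16 GB, while 8-bit or 4-bit weights would fit with room left for everything else the runtime needs.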
Google documents that the model uses an encoder-free approach that projects raw vision and audio inputs directly into the LLM backbone, and it is released under an Apache 2.0 license with open weights available for download (sources: Google DeepMind blog; Gemma 4 docs).
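As a rough illustration of what "projecting raw inputs directly into the backbone" can mean, the toy sketch below cuts an image into patches, maps each flattened patch to the model's embedding width with a single linear projection, and places the results in the same sequence as text embeddings. Everything specific here is an assumption made for illustration: the patching scheme, the single linear layer, the sizes, and the helper names `patchify` and `project` are hypothetical and are not taken from Google's documentation.

```python
import random


def patchify(image, patch):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [
                value
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for value in pixel
            ]
            patches.append(flat)
    return patches


def project(vectors, weights):
    """Multiply each input vector by a (d_in x d_model) weight matrix."""
    columns = list(zip(*weights))  # d_model columns, each of length d_in
    return [
        [sum(x * w for x, w in zip(vec, column)) for column in columns]
        for vec in vectors
    ]


random.seed(0)
SIDE, PATCH, CHANNELS, D_MODEL = 8, 4, 3, 8  # toy sizes, not Gemma's real dimensions

# A random 8 x 8 RGB "image" standing in for raw pixel input.
image = [[[random.random() for _ in range(CHANNELS)] for _ in range(SIDE)]
         for _ in range(SIDE)]

# Hypothetical projection matrix; in a trained model these would be learned weights.
weights = [[random.gauss(0, 0.02) for _ in range(D_MODEL)]
           for _ in range(PATCH * PATCH * CHANNELS)]

# No separate vision encoder: patches go straight through one projection.
image_tokens = project(patchify(image, PATCH), weights)

# Stand-in text embeddings of the same width, so both share one sequence.
text_tokens = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(5)]
sequence = image_tokens + text_tokens
```

The point of the sketch is only the data flow: image patches and text tokens end up as vectors of the same width in a single sequence, with no pretrained vision encoder in between.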