# Google DeepMind Releases Gemma 4 12B On-Device Multimodal Model

> Google DeepMind released Gemma 4 12B on June 3, 2026. The encoder-free multimodal model runs on standard laptops with 16GB RAM, delivers performance nearing the 26B MoE variant, and supports a context window of up to 256K tokens and more than 140 languages.

*Published 2026-06-07*

Gemma 4 12B is a unified, encoder-free multimodal model in the Gemma 4 family developed by Google DeepMind for on-device deployment.

Google released the model on June 3, 2026. It uses a single decoder-only transformer that projects raw image patches and audio waveforms directly into the embedding space via lightweight linear layers.
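
To make the idea of projecting raw patches through a linear layer concrete, here is a minimal pure-Python sketch. It is not Gemma's implementation: the patch size, embedding width, random weights, and the helper names `patchify` and `linear` are all hypothetical, chosen only to illustrate how an image can become a sequence of token-like vectors without a separate vision encoder.

```python
import random

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec = [
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ]
            patches.append(vec)
    return patches

def linear(vec, weight, bias):
    """Apply y = W x + b, with weight stored as one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]

# Hypothetical toy sizes, chosen only for illustration (not Gemma's real values).
PATCH, CHANNELS, D_MODEL = 4, 3, 8
rng = random.Random(0)
in_dim = PATCH * PATCH * CHANNELS
weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

# An 8x8 RGB image of made-up pixel values in [0, 1).
image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(8)] for _ in range(8)]

# One D_MODEL-sized vector per patch.
image_tokens = [linear(p, weight, bias) for p in patchify(image, PATCH)]
```

In a design like the one described, vectors such as `image_tokens` would sit in the same sequence as text token embeddings and pass through the decoder together. Audio could presumably be handled the same way by slicing the waveform into fixed-length frames instead of patches.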

Details are available from the Google blog post announcing the model and the official Gemma model card.

## What benchmarks has Gemma 4 12B achieved?

According to the official model card, Gemma 4 12B Unified achieves 77.2% on MMLU Pro, 72.0% on LiveCodeBench v6, and 69.1% on MMMU Pro (vision).

## How does the encoder-free design benefit on-device AI?

By eliminating dedicated encoders, the architecture reduces complexity and memory requirements, allowing the 12B model to run efficiently on standard consumer laptops equipped with 16GB of RAM or unified memory.
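
A rough back-of-the-envelope calculation shows why weight precision matters for a 16GB machine. The article does not say which numeric precision is used on laptops, so the precisions below are assumptions. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead.

```python
PARAMS = 12e9  # parameter count from the article

# Assumed storage precisions; the article does not state which one is used.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params, bytes_per_param):
    """Weight storage in gigabytes (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params * bytes_per_param / 1e9

footprints = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in footprints.items():
    fits = "fits" if gb < 16 else "does not fit"
    print(f"{name}: {gb:.0f} GB of weights, {fits} in 16 GB")
```

Under these assumptions, 16-bit weights alone would need about 24 GB, so fitting in 16GB would likely depend on 8-bit or lower quantization, with the remaining memory shared by the operating system and the context cache.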

> Gemma 4 12B delivers performance nearing our larger 26B MoE model on standard benchmarks, but at less than half the total memory footprint.
>
> Olivier Lacombe and Gus Martins, Director of Product Management and Product Manager, Google DeepMind

## What are the main specifications of the Gemma 4 12B model?

Gemma 4 12B Key Specifications

| Specification | Details |
| --- | --- |
| Parameters | 12 billion |
| Context Window | Up to 256K tokens |
| Languages Supported | Over 140 |
| License | Apache 2.0 |
| Availability | Hugging Face as `google/gemma-4-12B-it` |

## What input modalities does Gemma 4 12B support?

- Text inputs and generation
- Image understanding and analysis
- Audio waveform processing
- Video input handling

The model is part of a broader family including E2B, E4B, 26B A4B MoE, and 31B variants. It is optimized for Google AI Edge and LiteRT-LM.

Weights are released under Apache 2.0 and accessible via Hugging Face.

## Sources

1. [Gemma 4 12B delivers performance nearing our larger 26B MoE model on standard benchmarks, but at less than half the total memory footprint.](https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/)
2. [Gemma 4 12B Unified achieves 77.2% on MMLU Pro, 72.0% on LiveCodeBench v6, and 69.1% on MMMU Pro (vision).](https://ai.google.dev/gemma/docs/core/model_card_4)

---
Source: https://aiintelreport.com/news/gemma-4-12b-on-device-multimodal-model
