
Google unveils Gemma 3n which runs locally on your devices with less memory

When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.


Paul Hill · Neowin · @ziks_99 · May 21, 2025 01:02 EDT

At its Google I/O 2025 event, the search giant unveiled a host of new AI tools, most notably Gemini 2.5 Flash, which everyone has access to. Another interesting development was on the small LLM front, where the company unveiled Gemma 3n, a model designed to run directly on your personal devices.
The biggest advancement in Gemma 3n is an innovation developed by Google DeepMind called Per-Layer Embeddings (PLE), which reduces the model's memory requirements. Gemma 3n comes in raw parameter counts of 5B and 8B, but it has memory overheads comparable to 2B and 4B models. Google claims these models can run with a memory footprint of just 2GB and 3GB, respectively.
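To see why those footprints are notable, a quick back-of-the-envelope check helps (the precisions below are assumptions for illustration, not official figures): 5B parameters would need roughly 5GB of weights at 8-bit, and still about 2.5GB even at 4-bit, so a total footprint near 2GB implies a large share of the parameters need not stay resident in memory at once.

```python
# Rough weight-memory arithmetic for a 5B-parameter model
# (assumed precisions; illustrative only, not official figures).

params_5b = 5e9
bytes_8bit = params_5b * 1.0   # one byte per parameter at 8-bit
bytes_4bit = params_5b * 0.5   # half a byte per parameter at 4-bit

gb = 1e9
print(f"8-bit weights: {bytes_8bit / gb:.1f} GB")  # well above the ~2GB claim
print(f"4-bit weights: {bytes_4bit / gb:.1f} GB")  # still above the ~2GB claim
```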
Aside from the smaller memory footprint, techniques such as PLE, key-value cache (KVC) sharing, and advanced activation quantization allow Gemma 3n to start responding about 1.5x faster on mobile, with much better quality, compared to Gemma 3 4B. Gemma 3n also offers a mix'n'match capability that lets it dynamically create submodels that better fit your specific use case.
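The intuition behind per-layer embeddings can be sketched with a toy peak-memory calculation (this is not Google's implementation; the layer count and sizes below are hypothetical): if each layer's embedding parameters are loaded on demand rather than kept resident, peak memory tracks the core weights plus a single layer's table, not the sum of every table.

```python
# Toy illustration of the per-layer-embedding idea.
# All sizes are hypothetical, chosen only to show the shape of the saving.

NUM_LAYERS = 24
CORE_WEIGHTS_GB = 1.6       # always-resident transformer weights (hypothetical)
PER_LAYER_EMBED_GB = 0.12   # one layer's embedding table (hypothetical)

# Naive approach: every layer's embedding table resident at once.
naive_peak_gb = CORE_WEIGHTS_GB + NUM_LAYERS * PER_LAYER_EMBED_GB

# PLE-style approach: stream in one layer's embeddings at a time.
streamed_peak_gb = CORE_WEIGHTS_GB + PER_LAYER_EMBED_GB

print(f"naive peak:    {naive_peak_gb:.2f} GB")
print(f"streamed peak: {streamed_peak_gb:.2f} GB")
```

With these made-up numbers the resident footprint drops from about 4.5GB to about 1.7GB, which mirrors how a 5B-parameter model can behave like a much smaller one in memory terms.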

Another benefit is that Gemma 3n uses local execution, meaning it's powered entirely by your device; no data ever goes off to a remote server where your prompts could be inspected. This also means it can be used without an internet connection, which is a massive advantage.
It's also said to be much better at handling multimodal inputs, as it can understand audio, text, and images, and is said to have significantly enhanced video understanding. This allows it to perform transcription, translation, and interleaved inputs across modalities, enabling understanding of complex multimodal interactions.
Finally, Gemma 3n also promises to be better with non-English languages. Users will see improved performance particularly in Japanese, German, Korean, Spanish, and French, and the model performs strongly on multilingual benchmarks such as WMT24++, where it scores 50.1%.
You can begin using Gemma 3n now, directly in your browser, on Google AI Studio with no setup needed. Developers who want to integrate Gemma 3n locally can do so via Google AI Edge, which provides the necessary tools and libraries. This second method gives you text and image understanding and generation capabilities today, with more to come in the future.
