• featured
gemma-4-26B-A4B-it Robot

gemma-4-26B-A4B-it

The Gemma 4 26B-A4B-it (Instruct) represents Google DeepMind’s latest evolution in Mixture-of-Experts (MoE) architecture, released in April 2026. This model is specifically optimized for high-throughput efficiency, balancing the massive knowledge base of a 26B parameter model with the inference speed of a much smaller 4B active parameter model.

$0.13/1M input tokens; $0.4/1M output tokens

Input

No template available.
You can add a prompt template in the admin panel.

Output

Google Gemma 4 26B-A4B-it Documentation

Gemma 4 26B-A4B-it is a sparse Mixture-of-Experts (MoE) model. While it contains a total of 26.2 Billion parameters, only 4.1 Billion parameters are activated per token. This allows for the reasoning capabilities of a large model with the latency profile of a lightweight model.

Key Capabilities

  • Efficiency Powerhouse: Optimized for real-time applications where low latency is critical.
  • 256K Context Window: Natively supports massive document processing and long-form conversations.
  • Multimodal Ready: Seamlessly handles text-to-text and image-to-text reasoning.
  • Enhanced Coding: Specifically fine-tuned on the "StarCoder-3" dataset for advanced Python and TypeScript generation.
  • Safety Aligned: Built with Google's latest constitutional AI safety guardrails for enterprise use.

Request Parameters

To interact with this model via vLLM or OpenAI-compatible endpoints:

Parameter Type Required Description
model string Yes Use "google/gemma-4-26b-a4b-it".
messages array Yes Standard chat format (role/content).
max_tokens integer No Maximum generation length. Default: 4096.
temperature float No Recommended: 0.1 for coding; 0.7 for chat.
top_p float No Nucleus sampling. Default: 0.9.

Optional Parameters

Parameter Type Default Description
frequency_penalty float 0.0 Prevents repetitive word usage.
stop array null Sequences to end generation (e.g., ["\nUser:"]).
logprobs boolean false Returns the probability of the generated tokens.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales