Google Gemma 4 26B-A4B-it Documentation

Gemma 4 26B-A4B-it is a sparse Mixture-of-Experts (MoE) model. While it contains a total of 26.2 Billion parameters, only 4.1 Billion parameters are activated per token. This allows for the reasoning capabilities of a large model with the latency profile of a lightweight model.

Key Capabilities

Efficiency Powerhouse: Optimized for real-time applications where low latency is critical.
256K Context Window: Natively supports massive document processing and long-form conversations.
Multimodal Ready: Seamlessly handles text-to-text and image-to-text reasoning.
Enhanced Coding: Specifically fine-tuned on the "StarCoder-3" dataset for advanced Python and TypeScript generation.
Safety Aligned: Built with Google's latest constitutional AI safety guardrails for enterprise use.

Request Parameters

To interact with this model via vLLM or OpenAI-compatible endpoints:

Parameter	Type	Required	Description
`model`	`string`	Yes	Use `"google/gemma-4-26b-a4b-it"`.
`messages`	`array`	Yes	Standard chat format (role/content).
`max_tokens`	`integer`	No	Maximum generation length. Default: `4096`.
`temperature`	`float`	No	Recommended: `0.1` for coding; `0.7` for chat.
`top_p`	`float`	No	Nucleus sampling. Default: `0.9`.

Optional Parameters

Parameter	Type	Default	Description
`frequency_penalty`	`float`	`0.0`	Prevents repetitive word usage.
`stop`	`array`	`null`	Sequences to end generation (e.g., `["\nUser:"]`).
`logprobs`	`boolean`	`false`	Returns the probability of the generated tokens.

gemma-4-26B-A4B-it

Input

Output

Google Gemma 4 26B-A4B-it Documentation

Key Capabilities

Request Parameters

Optional Parameters

Unlock the most affordable AI hosting