Google / Gemma 4 31B IT

Quantizations

Quant   Quantized by    Size      Decode       Prefill       Score
        Unsloth         13.7 GB   13.6 tok/s   153.4 tok/s   Runs poorly
        Unsloth         17.1 GB   16.9 tok/s   159.2 tok/s   Runs poorly
        MLX Community   17.1 GB   N/A          N/A           N/A
        ggml            17.4 GB   N/A          N/A           N/A
        Unsloth         17.5 GB   N/A          N/A           N/A
        Unsloth         20.2 GB   13.0 tok/s   147.6 tok/s   Runs poorly
        Unsloth         21.7 GB   14.3 tok/s   157.6 tok/s   Barely runs
        MLX Community   24.3 GB   N/A          N/A           N/A
        LM Studio       26.8 GB   N/A          N/A           N/A
q8_0    Unknown         30.4 GB   N/A          N/A           N/A
        MLX Community   31.4 GB   N/A          N/A           N/A

Device Comparison

Results include only trials with 4,096 input tokens and 1,024 output tokens.
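As a rough illustration of what these trial sizes mean in practice, the sketch below estimates the wall-clock time of one trial from a prefill speed and a decode speed. It assumes the prompt is processed first at the prefill rate and the output is then generated at the decode rate, with both rates constant. The function name and this simple additive model are assumptions made for illustration, not the method used to produce the results.

```python
def estimated_trial_seconds(prefill_tok_s, decode_tok_s,
                            input_tokens=4096, output_tokens=1024):
    """Estimate seconds for one trial: prompt processing plus generation.

    Hypothetical helper: assumes prefill and decode run back to back
    at constant speeds, which real hardware only approximates.
    """
    if prefill_tok_s <= 0 or decode_tok_s <= 0:
        raise ValueError("speeds must be positive")
    prefill_seconds = input_tokens / prefill_tok_s   # time to read the prompt
    decode_seconds = output_tokens / decode_tok_s    # time to write the output
    return prefill_seconds + decode_seconds


# Speeds taken from the 13.7 GB Unsloth row of the quantization table.
total = estimated_trial_seconds(prefill_tok_s=153.4, decode_tok_s=13.6)
print(f"{total:.1f} s")
```

Under these assumptions the 13.7 GB Unsloth quantization would need about 102 seconds per trial, most of it spent in decode.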

Decode / Prefill Speeds

21 devices