Google / Gemma 4 31B IT

Quantizations

| Quant | Quantized by | Size | Decode | Prefill | Score |
|---|---|---|---|---|---|
| | Unsloth | 13.7 GB | N/A | N/A | N/A |
| | Unsloth | 17.1 GB | 11.7 tok/s | 100.3 tok/s | Runs poorly |
| | Unsloth | 17.1 GB | N/A | N/A | N/A |
| | ggml | 17.4 GB | N/A | N/A | N/A |
| | Unsloth | 17.5 GB | 11.1 tok/s | 97.7 tok/s | Runs poorly |
| | Unsloth | 20.2 GB | N/A | N/A | N/A |
| | Unsloth | 20.6 GB | 9.4 tok/s | 92.3 tok/s | Runs poorly |
| q8_0 | Unknown | 30.4 GB | 7.3 tok/s | 102.8 tok/s | Barely runs |
| N/A | MLX Community | 31.4 GB | 6.9 tok/s | 91.1 tok/s | Barely runs |

Device Comparison

Results include trials with 4,096 input tokens and 1,024 output tokens only.
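
To put the decode and prefill figures in context, the sketch below estimates how long one such trial might take end to end. It assumes that prefill and decode run one after the other at the constant rates reported in the Quantizations table, and it ignores model load time and any other overhead. The function name `estimate_trial_seconds` is hypothetical and not part of the benchmark itself.

```python
# Trial shape stated in the source: 4,096 input tokens and 1,024 output tokens.
INPUT_TOKENS = 4096
OUTPUT_TOKENS = 1024


def estimate_trial_seconds(prefill_tok_s: float, decode_tok_s: float) -> float:
    """Rough estimate: time to process the prompt plus time to generate the output.

    Assumption (not from the source): the two phases are strictly sequential
    and each runs at a constant rate.
    """
    prefill_seconds = INPUT_TOKENS / prefill_tok_s
    decode_seconds = OUTPUT_TOKENS / decode_tok_s
    return prefill_seconds + decode_seconds


# (size in GB, decode tok/s, prefill tok/s) for the rows of the
# Quantizations table that report measurements.
measured = [
    (17.1, 11.7, 100.3),
    (17.5, 11.1, 97.7),
    (20.6, 9.4, 92.3),
    (30.4, 7.3, 102.8),
    (31.4, 6.9, 91.1),
]

for size_gb, decode, prefill in measured:
    seconds = estimate_trial_seconds(prefill, decode)
    print(f"{size_gb:5.1f} GB: ~{seconds:.0f} s per trial")
```

Under these assumptions, decode accounts for most of the estimated time in every measured row, because the decode rate is roughly an order of magnitude lower than the prefill rate.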

Decode / Prefill Speeds

14 devices