google/gemma-3-1b-pt

📊 Model Parameters

Total Parameters 1,301,875,840
Context Length 32,768
Hidden Size 1152
Layers 26
Attention Heads 4
KV Heads 1

💾 Memory Requirements

FP32 (Full) 4.85 GB
FP16 (Half) 2.42 GB
INT8 (Quantized) 1.21 GB
INT4 (Quantized) 620.8 MB
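
Weight memory is simply total parameters × bytes per parameter; the table reports binary units (GiB/MiB) and covers weights only, not activations or cache. A minimal sketch reproducing the figures above:

```python
# Rough weight-memory estimate: total parameters x bytes per parameter.
# Weights only; runtime usage adds activations, KV cache, and framework overhead.

TOTAL_PARAMS = 1_301_875_840  # from the Model Parameters table

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

def weight_memory(total_params: int) -> dict[str, str]:
    out = {}
    for dtype, nbytes in BYTES_PER_PARAM.items():
        size = total_params * nbytes
        if size >= 1024**3:
            out[dtype] = f"{size / 1024**3:.2f} GB"
        else:
            out[dtype] = f"{size / 1024**2:.1f} MB"
    return out

print(weight_memory(TOTAL_PARAMS))
# {'FP32': '4.85 GB', 'FP16': '2.42 GB', 'INT8': '1.21 GB', 'INT4': '620.8 MB'}
```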

🔑 KV Cache (Inference)

Per Token (FP16) 26.62 KB
Max Context FP32 1.62 GB
Max Context FP16 832.0 MB
Max Context INT8 416.0 MB
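
KV-cache size per token is 2 × layers × KV heads × head dim × bytes per element (one key and one value vector per layer), and the full-context figures multiply that by the 32,768-token context. A minimal sketch reproducing the table (note it mixes decimal KB for the per-token value with binary MB/GB for the totals):

```python
# KV-cache size: per token, every layer stores one key and one value vector
# per KV head, each of length head_dim.

NUM_LAYERS = 26
NUM_KV_HEADS = 1
HEAD_DIM = 256
MAX_CONTEXT = 32_768

def kv_bytes_per_token(bytes_per_elem: int) -> int:
    # 2 = one key tensor + one value tensor
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem

fp16_per_token = kv_bytes_per_token(2)
print(f"Per token (FP16): {fp16_per_token / 1000:.2f} KB")                          # 26.62 KB
print(f"Max context FP32: {kv_bytes_per_token(4) * MAX_CONTEXT / 1024**3:.2f} GB")  # 1.62 GB
print(f"Max context FP16: {fp16_per_token * MAX_CONTEXT / 1024**2:.1f} MB")         # 832.0 MB
print(f"Max context INT8: {kv_bytes_per_token(1) * MAX_CONTEXT / 1024**2:.1f} MB")  # 416.0 MB
```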

⚙️ Model Configuration

Core Architecture

Vocabulary Size 262,144
Hidden Size 1,152
FFN Intermediate Size 6,912
Number of Layers 26
Attention Heads 4
Head Dimension 256
KV Heads 1

Context & Position

Max Context Length 32,768
RoPE Base Frequency 1,000,000
Sliding Window Size 512
Layer Attention Types [26 items]
RoPE Scaling Not set
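
Gemma 3 interleaves local sliding-window attention layers (window 512) with global-attention layers, which is what the per-layer attention-type list encodes. As an illustration only, not the model's actual implementation, a causal sliding-window mask can be built like this: position i may attend to position j only when j <= i and i - j < window.

```python
# Illustrative causal sliding-window attention mask (not the model's actual code).
# True means "query position i may attend to key position j".

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

# With window=512, a query at position 1000 sees positions 489..1000;
# a global-attention layer would instead see all positions 0..1000.
mask = sliding_window_mask(seq_len=8, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.......
# xx......
# xxx.....
# .xxx....
# ..xxx...
# ...xxx..
# ....xxx.
# .....xxx
```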

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 256
Output Softcapping Not set
Attention Softcapping Not set

Activation & Normalization

RMSNorm Epsilon 1e-06
Activation Function gelu_pytorch_tanh
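
gelu_pytorch_tanh is the tanh approximation of GELU, gelu(x) ≈ 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). A minimal reference implementation for comparison:

```python
import math

def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU, as used by gelu_pytorch_tanh.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

print(gelu_tanh(1.0))   # ~0.8412
print(gelu_tanh(-1.0))  # ~-0.1588
```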

Special Tokens

BOS Token ID 2
Pad Token ID 0
EOS Token ID 1

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding
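
Every value in this section comes from the model's published config. A minimal sketch of reading it with Hugging Face transformers (attribute names follow the usual transformers conventions and may vary slightly between versions; the checkpoint is gated, so license acceptance and authentication are required):

```python
# Read the configuration fields listed above straight from the Hub.
# Requires a transformers version with Gemma 3 support and access to the
# gated google/gemma-3-1b-pt repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-3-1b-pt")

for name in (
    "vocab_size",
    "hidden_size",
    "intermediate_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "head_dim",
    "max_position_embeddings",
    "rope_theta",
    "sliding_window",
    "query_pre_attn_scalar",
    "rms_norm_eps",
    "hidden_activation",
    "tie_word_embeddings",
):
    print(f"{name}: {getattr(config, name, 'not set')}")
```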