google/gemma-3-270m

📊 Model Parameters

Total Parameters 435,870,336
Context Length 32,768
Hidden Size 640
Layers 18
Attention Heads 4
KV Heads 1
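
The headline total can be reconstructed from the configuration values listed further down the page. The sketch below assumes the standard Gemma 3 layer layout (grouped-query attention, a gated MLP, four per-layer RMSNorms plus Q/K norms); note that 435,870,336 appears to count the tied LM head as a separate matrix, while the unique weight count is roughly 268M.

vocab, hidden, ffn, layers = 262_144, 640, 2_048, 18
heads, kv_heads, head_dim = 4, 1, 256

embedding = vocab * hidden                  # 167,772,160 input embedding weights
attn = (hidden * heads * head_dim           # Q projection
        + 2 * hidden * kv_heads * head_dim  # K and V projections (1 KV head)
        + heads * head_dim * hidden)        # output projection
mlp = 3 * hidden * ffn                      # gate, up, and down projections
norms = 4 * hidden + 2 * head_dim           # per-layer RMSNorms + Q/K norms
per_layer = attn + mlp + norms              # 5,573,632

unique = embedding + layers * per_layer + hidden  # + final norm -> 268,098,176
with_lm_head = unique + embedding                 # -> 435,870,336, the figure above
print(f"{unique:,} unique / {with_lm_head:,} counting the LM head separately")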

💾 Memory Requirements

FP32 (Full) 1.62 GiB
FP16 (Half) 831.4 MiB
INT8 (Quantized) 415.7 MiB
INT4 (Quantized) 207.8 MiB
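
These figures follow directly from the parameter count above: bytes per parameter times 435,870,336, reported in binary units (MiB/GiB). A quick sketch of the arithmetic:

params = 435_870_336
for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    mib = params * bytes_per_param / 2**20
    print(f"{name}: {mib / 1024:.2f} GiB" if mib >= 1024 else f"{name}: {mib:.1f} MiB")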

🔑 KV Cache (Inference)

Per Token (FP16) 18.43 KB
Max Context FP32 1.12 GiB
Max Context FP16 576.0 MiB
Max Context INT8 288.0 MiB
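
The cache arithmetic: two tensors (K and V) per layer, each holding one KV head of 256 values per token. A sketch, assuming every layer caches the full context in FP16:

layers, kv_heads, head_dim, context = 18, 1, 256, 32_768

values_per_token = 2 * layers * kv_heads * head_dim   # 9,216 cached values per token
per_token_fp16 = values_per_token * 2                 # 18,432 bytes ≈ 18.43 KB
max_context_fp16 = per_token_fp16 * context / 2**20   # 576.0 MiB
print(per_token_fp16, f"{max_context_fp16:.1f} MiB")
# In practice the sliding-window layers (window 512) cache far less than the full context.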

⚙️ Model Configuration

Core Architecture

Vocabulary Size 262,144
Hidden Size 640
FFN Intermediate Size 2,048
Number of Layers 18
Attention Heads 4
Head Dimension 256
KV Heads 1

Context & Position

Max Context Length 32,768
RoPE Base Frequency 1,000,000
Sliding Window Size 512
Layer Attention Types [18 items]
RoPE Scaling Not set
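
For reference, the values in the lists above and below can be collected into a single config-style dictionary. The field names mimic Hugging Face transformers conventions and are illustrative only; defer to the model's shipped config.json for the authoritative keys and values.

gemma3_270m_config = {
    "vocab_size": 262_144,
    "hidden_size": 640,
    "intermediate_size": 2_048,
    "num_hidden_layers": 18,
    "num_attention_heads": 4,
    "num_key_value_heads": 1,
    "head_dim": 256,
    "max_position_embeddings": 32_768,
    "rope_theta": 1_000_000.0,
    "sliding_window": 512,                      # used on the sliding-attention layers
    "rms_norm_eps": 1e-06,
    "hidden_activation": "gelu_pytorch_tanh",
    "query_pre_attn_scalar": 256,
    "attention_bias": False,
    "attention_dropout": 0.0,
    "tie_word_embeddings": True,
}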

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 256
Output Softcapping Not set
Attention Softcapping Not set
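
The query pre-attention scalar replaces the usual 1/sqrt(head_dim) factor on the attention logits (here the two coincide, since both are 256), and with softcapping unset no tanh capping is applied afterwards. A minimal sketch of that scaling:

import torch

def attn_weights(q: torch.Tensor, k: torch.Tensor, query_pre_attn_scalar: int = 256) -> torch.Tensor:
    # Scale logits by query_pre_attn_scalar**-0.5 rather than head_dim**-0.5.
    scores = (q @ k.transpose(-2, -1)) * query_pre_attn_scalar ** -0.5
    return torch.softmax(scores, dim=-1)      # no softcapping step

q = torch.randn(1, 4, 8, 256)    # (batch, query heads, seq, head_dim)
k = torch.randn(1, 4, 8, 256)    # the single KV head, already broadcast to 4 query heads
print(attn_weights(q, k).shape)  # torch.Size([1, 4, 8, 8])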

Activation & Normalization

RMSNorm Epsilon 1e-06
Activation Function gelu_pytorch_tanh
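
Minimal sketches of both pieces: a generic RMSNorm (Gemma implementations scale by (1 + weight); treat the exact parameterization as an implementation detail) and PyTorch's tanh-approximated GELU, which is what gelu_pytorch_tanh refers to.

import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-06) -> torch.Tensor:
    # x / sqrt(mean(x^2) + eps), followed by an elementwise learned scale
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

gelu_tanh = torch.nn.GELU(approximate="tanh")   # tanh-approximated GELU

x = torch.randn(2, 640)
print(rms_norm(x, torch.ones(640)).shape, gelu_tanh(x).shape)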

Special Tokens

BOS Token ID 2
Pad Token ID 0
EOS Token ID 1

Data Type

Model Dtype bfloat16
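
A quick way to confirm the dtype and special-token IDs locally is to load the model with the standard transformers auto classes; a sketch (the printed values should line up with the tables above, but defer to the shipped config and tokenizer):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-270m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

print(tokenizer.bos_token_id, tokenizer.pad_token_id, tokenizer.eos_token_id)
print(model.dtype)   # torch.bfloat16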