google/gemma-2-2b

📊 Model Parameters

Total Parameters 3,204,165,888
Context Length 8,192
Hidden Size 2,304
Layers 26
Attention Heads 8
KV Heads 4
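
The headline parameter count can be reproduced from the configuration values further down; the breakdown below is a back-of-the-envelope sketch assuming Gemma 2's GQA attention and GeGLU MLP layout, not an official accounting.

```python
# Back-of-the-envelope parameter count from the configuration values listed below.
vocab_size, hidden, ffn = 256_000, 2_304, 9_216
layers, q_heads, kv_heads, head_dim = 26, 8, 4, 256

embedding = vocab_size * hidden                      # token embedding matrix
attention = (hidden * q_heads * head_dim             # Q projection
             + 2 * hidden * kv_heads * head_dim      # K and V projections (GQA)
             + q_heads * head_dim * hidden)          # output projection
mlp = 3 * hidden * ffn                               # gate, up, down (GeGLU)
norms = 4 * hidden                                   # four RMSNorms per Gemma 2 block

total = embedding + layers * (attention + mlp + norms) + hidden  # + final norm
print(f"{total:,}")              # 2,614,341,888 with tied embeddings
print(f"{total + embedding:,}")  # 3,204,165,888 if the LM head is counted separately
```

The 3,204,165,888 figure above corresponds to the second count, i.e. the embedding and output head tallied separately even though the weights are tied.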

💾 Memory Requirements

FP32 (Full) 11.94 GB
FP16 (Half) 5.97 GB
INT8 (Quantized) 2.98 GB
INT4 (Quantized) 1.49 GB
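
These weight footprints follow directly from the parameter count times the bytes per element (weights only, no activations or KV cache); a minimal sketch of the arithmetic, using binary gigabytes (1024³ bytes):

```python
params = 3_204_165_888
bytes_per_param = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for name, width in bytes_per_param.items():
    gib = params * width / 1024**3    # weights only
    print(f"{name}: {gib:.2f} GB")    # 11.94 / 5.97 / 2.98 / 1.49
```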

🔑 KV Cache (Inference)

Per Token (FP16) 106.50 KB
Max Context FP32 1.62 GB
Max Context FP16 832.0 MB
Max Context INT8 416.0 MB
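
The KV cache numbers follow from 2 (K and V) × layers × KV heads × head dimension × bytes per element; a sketch of the arithmetic, noting that the per-token figure is in decimal KB while the max-context figures use binary MB/GB:

```python
layers, kv_heads, head_dim, context = 26, 4, 256, 8_192

def kv_bytes(bytes_per_elem: float, tokens: int = 1) -> float:
    # K and V tensors for every layer, per token
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens

print(kv_bytes(2) / 1_000)             # 106.50 KB per token (FP16)
print(kv_bytes(4, context) / 1024**3)  # 1.62 GB at full context (FP32)
print(kv_bytes(2, context) / 1024**2)  # 832.0 MB at full context (FP16)
print(kv_bytes(1, context) / 1024**2)  # 416.0 MB at full context (INT8)
```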

⚙️ Model Configuration

Core Architecture

Vocabulary Size 256,000
Hidden Size 2,304
FFN Intermediate Size 9,216
Number of Layers 26
Attention Heads 8
Head Dimension 256
KV Heads 4
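
These fields mirror the Hugging Face config; a minimal sketch of reading them programmatically (assumes transformers ≥ 4.42, which added Gemma 2 support, and access to the checkpoint):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-2-2b")
print(cfg.vocab_size)            # 256000
print(cfg.hidden_size)           # 2304
print(cfg.intermediate_size)     # 9216
print(cfg.num_hidden_layers)     # 26
print(cfg.num_attention_heads)   # 8
print(cfg.head_dim)              # 256
print(cfg.num_key_value_heads)   # 4
```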

Context & Position

Max Context Length 8,192
RoPE Base Frequency 10000.0
Sliding Window Size 4,096
Layer Attention Types [26 items]
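
The 26-entry layer attention list reflects Gemma 2's interleaving of sliding-window and global attention layers; a sketch of that pattern, where the exact ordering (which parity gets the 4,096-token window) is an assumption:

```python
# Assumed ordering: even-indexed layers use the sliding window,
# odd-indexed layers attend over the full context.
num_layers = 26
layer_attention_types = [
    "sliding_attention" if i % 2 == 0 else "full_attention"
    for i in range(num_layers)
]
print(layer_attention_types[:4])  # ['sliding_attention', 'full_attention', ...]
```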

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 256
Output Softcapping 30.0
Attention Softcapping 50.0
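
Both softcapping values refer to Gemma 2's tanh-based logit capping, which bounds attention scores at ±50 and final LM-head logits at ±30; a minimal sketch:

```python
import torch

def soft_cap(x: torch.Tensor, cap: float) -> torch.Tensor:
    # Squashes values smoothly into (-cap, cap): cap * tanh(x / cap)
    return cap * torch.tanh(x / cap)

raw_scores = torch.randn(8, 64, 64) * 100   # dummy attention scores
raw_logits = torch.randn(1, 256_000) * 100  # dummy LM-head logits

capped_scores = soft_cap(raw_scores, 50.0)  # Attention Softcapping
capped_logits = soft_cap(raw_logits, 30.0)  # Output Softcapping
```

Since the query pre-attention scalar of 256 equals the head dimension, the query scaling 1/√256 matches the standard 1/√d_head attention factor.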

Activation & Normalization

Activation Function gelu_pytorch_tanh
RMSNorm Epsilon 1e-06
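
A sketch of the two pieces named here: an RMSNorm with ε = 1e-06 and the tanh-approximated GELU that gelu_pytorch_tanh refers to. The (1 + weight) scaling mirrors Gemma's RMSNorm variant and is stated here as an assumption.

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm: rescale by the reciprocal root mean square, no mean subtraction.
    rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return x * rms * (1.0 + weight)  # assumed Gemma-style (1 + weight) scaling

x = torch.randn(2, 2304)
y = rms_norm(x, torch.zeros(2304))

# gelu_pytorch_tanh is the tanh approximation of GELU:
h = torch.nn.functional.gelu(x, approximate="tanh")
```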

Special Tokens

BOS Token ID 2
Pad Token ID 0
EOS Token ID 1
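
The same IDs are exposed by the tokenizer; a quick check (assumes the tokenizer files are available locally or via the Hub):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-2b")
print(tok.bos_token_id, tok.eos_token_id, tok.pad_token_id)  # 2 1 0
```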

Data Type

Model Dtype float32
Layer Types: Attention, MLP/FFN, Normalization, Embedding
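
The checkpoint is stored in float32, but per the memory table above it can be loaded in half precision to roughly halve the weight footprint; a minimal sketch (assumes transformers ≥ 4.42 and enough memory for ~6 GB of weights):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.bfloat16,  # or torch.float16; float32 needs ~12 GB of weights
)
```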