google/gemma-2-9b

📊 Model Parameters

Total Parameters 10,159,209,984
Context Length 8,192
Hidden Size 3584
Layers 42
Attention Heads 16
KV Heads 8

💾 Memory Requirements

FP32 (Full) 37.85 GB
FP16 (Half) 18.92 GB
INT8 (Quantized) 9.46 GB
INT4 (Quantized) 4.73 GB
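
These figures are simply the parameter count multiplied by the storage width of each format, reported in binary gigabytes. A minimal sketch of that arithmetic, using the total from the parameter table above (raw weight storage only; runtime overhead such as activations is not included):

```python
# Raw weight storage for google/gemma-2-9b at several precisions.
# The parameter count is taken from the table above; bytes-per-parameter
# values are the standard storage widths for each format.
TOTAL_PARAMS = 10_159_209_984

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {TOTAL_PARAMS * width / 1024**3:.2f} GB")
# FP32: 37.85 GB, FP16: 18.92 GB, INT8: 9.46 GB, INT4: 4.73 GB
```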

🔑 KV Cache (Inference)

Per Token (FP16) 344.06 KB
Max Context FP32 5.25 GB
Max Context FP16 2.62 GB
Max Context INT8 1.31 GB
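
The cache stores one key vector and one value vector per layer per KV head for every token, so its size is 2 × layers × KV heads × head dimension × bytes per value × tokens. Because grouped-query attention keeps only 8 KV heads (versus 16 query heads), the cache is half of what full multi-head attention would require. A sketch of the arithmetic using the architecture values below:

```python
# KV-cache size for google/gemma-2-9b from the architecture values in this card.
NUM_LAYERS = 42
KV_HEADS = 8
HEAD_DIM = 256
MAX_CONTEXT = 8192

def kv_cache_bytes(num_tokens: int, bytes_per_value: float) -> float:
    # Factor of 2: one key tensor plus one value tensor per layer per KV head.
    return 2 * NUM_LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * num_tokens

print(kv_cache_bytes(1, 2) / 1e3)                # ~344.06 KB per token (FP16)
print(kv_cache_bytes(MAX_CONTEXT, 4) / 1024**3)  # ~5.25 GB (FP32)
print(kv_cache_bytes(MAX_CONTEXT, 2) / 1024**3)  # ~2.62 GB (FP16)
print(kv_cache_bytes(MAX_CONTEXT, 1) / 1024**3)  # ~1.31 GB (INT8)
```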

⚙️ Model Configuration

Core Architecture

Vocabulary Size 256,000
Hidden Size 3,584
FFN Intermediate Size 14,336
Number of Layers 42
Attention Heads 16
Head Dimension 256
KV Heads 8
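
These values can be read directly from the published configuration. A hedged sketch using Hugging Face transformers (field names follow the library's Gemma 2 config class and may differ slightly between versions):

```python
# Read the architecture values above from the published config.
# The repository is gated, so this may require an accepted license and
# `huggingface-cli login`.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-2-9b")
print(cfg.vocab_size, cfg.hidden_size, cfg.intermediate_size)  # 256000 3584 14336
print(cfg.num_hidden_layers, cfg.num_attention_heads,
      cfg.num_key_value_heads, cfg.head_dim)                   # 42 16 8 256
```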

Context & Position

Sliding Window Size 4,096
Max Context Length 8,192
RoPE Base Frequency 10,000.0
Layer Attention Types alternating sliding-window / global (42 entries)
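
Gemma 2 alternates between local sliding-window attention and global attention across its 42 layers. The sketch below is illustrative only; which parity of layers is local versus global, and the exact mask construction, follow the published config rather than this snippet:

```python
# Illustrative only: an alternating layer-attention-type list and a
# sliding-window causal mask consistent with the sizes above.
import torch

NUM_LAYERS = 42
SLIDING_WINDOW = 4096

# Half the layers attend locally within a 4,096-token window, the other half
# attend globally over the full 8,192-token context.
layer_attention_types = [
    "sliding_window" if i % 2 == 0 else "global" for i in range(NUM_LAYERS)
]

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where a query position may attend to a key position.
    q = torch.arange(seq_len).unsqueeze(1)
    k = torch.arange(seq_len).unsqueeze(0)
    return (k <= q) & (q - k < window)

mask = sliding_window_causal_mask(8192, SLIDING_WINDOW)  # bool, shape (8192, 8192)
```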

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 256
Output Softcapping 30.0
Attention Softcapping 50.0
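
The query pre-attention scalar and the two softcapping values enter the computation roughly as follows: attention scores are scaled by 1/sqrt(query_pre_attn_scalar) rather than 1/sqrt(head_dim), attention logits are bounded with a tanh at ±50, and final output logits at ±30. A sketch of that recipe (not a reproduction of any particular kernel):

```python
import torch

QUERY_PRE_ATTN_SCALAR = 256.0   # used in place of head_dim for score scaling
ATTN_SOFTCAP = 50.0             # bound on attention logits
FINAL_SOFTCAP = 30.0            # bound on output (lm_head) logits

def softcap(x: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly limits x to the open interval (-cap, cap).
    return cap * torch.tanh(x / cap)

def attention_logits(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # q, k: (..., seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / QUERY_PRE_ATTN_SCALAR**0.5
    return softcap(scores, ATTN_SOFTCAP)

def output_logits(hidden: torch.Tensor, lm_head_weight: torch.Tensor) -> torch.Tensor:
    # Embeddings are tied, so the output projection reuses the input
    # embedding matrix (vocab_size, hidden_size).
    return softcap(hidden @ lm_head_weight.T, FINAL_SOFTCAP)
```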

Activation & Normalization

Activation Function gelu_pytorch_tanh
RMSNorm Epsilon 1e-06
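
gelu_pytorch_tanh is the tanh approximation of GELU, and RMSNorm divides by the root mean square of the hidden dimension using the epsilon above. A minimal sketch of both operations (the model's exact weight parameterization may differ):

```python
import math
import torch

def gelu_tanh(x: torch.Tensor) -> torch.Tensor:
    # The tanh approximation of GELU ("gelu_pytorch_tanh").
    return 0.5 * x * (1.0 + torch.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x.pow(3))))

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Divide by the root mean square over the hidden dimension, then rescale.
    rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x / rms * weight
```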

Special Tokens

BOS Token ID 2
Pad Token ID 0
EOS Token ID 1
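
These IDs can be confirmed from the tokenizer. A hedged sketch (the repository is gated, so this may require accepting the license and authenticating):

```python
from transformers import AutoTokenizer

# Gated repository: may require an accepted license and `huggingface-cli login`.
tok = AutoTokenizer.from_pretrained("google/gemma-2-9b")
print(tok.bos_token_id, tok.eos_token_id, tok.pad_token_id)  # 2 1 0
```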

Data Type

Model Dtype float32
Layer Types: Attention, MLP/FFN, Normalization, Embedding
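
The checkpoint's declared dtype is float32; in practice the weights are usually loaded in half precision, which matches the FP16 figure in the memory table above. A hedged sketch with transformers (argument names may vary between library versions):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16 to roughly halve the FP32 footprint listed above.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b",
    torch_dtype=torch.bfloat16,
    device_map="auto",   # optional; requires the accelerate package
)
```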