xai-org/grok-2

📊 Model Parameters

Total Parameters 54,859,911,680
Context Length 131,072
Hidden Size 8192
Layers 64
Attention Heads 64
KV Heads 8

💾 Memory Requirements

FP32 (Full) 204.37 GB
FP16 (Half) 102.18 GB
INT8 (Quantized) 51.09 GB
INT4 (Quantized) 25.55 GB
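The figures above follow directly from the parameter count times bytes per weight; a minimal sketch (note the table reports binary gibibytes, labeled GB):

```python
# Weight-memory footprint from the parameter count, in binary GiB.
PARAMS = 54_859_911_680

def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 2**30

for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {weight_memory_gib(PARAMS, nbytes):.2f} GB")
# → FP32: 204.37 GB, FP16: 102.18 GB, INT8: 51.09 GB, INT4: 25.55 GB
```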

🔑 KV Cache (Inference)

Per Token (FP16) 262.14 KB
Max Context FP32 64.00 GB
Max Context FP16 32.00 GB
Max Context INT8 16.00 GB
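The KV-cache numbers can be reproduced from the architecture fields below (64 layers, 8 KV heads, head dimension 128): each token stores one key and one value vector per layer. A quick sketch; note the per-token figure uses decimal KB while the full-context totals use binary GB:

```python
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 64, 8, 128, 131_072

def kv_bytes_per_token(bytes_per_value: int) -> int:
    # K and V tensors per layer, each with kv_heads * head_dim values.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value

print(kv_bytes_per_token(2) / 1e3)              # → 262.144 (decimal KB, FP16)
print(kv_bytes_per_token(2) * CONTEXT / 2**30)  # → 32.0 (binary GB, FP16)
print(kv_bytes_per_token(4) * CONTEXT / 2**30)  # → 64.0 (binary GB, FP32)
```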

⚙️ Model Configuration

Core Architecture

Vocabulary Size 131,072
Hidden Size 8,192
Number of Layers 64
Attention Heads 64
FFN Intermediate Size 32,768
KV Heads 8
Head Dimension 128
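These dimensions are internally consistent with grouped-query attention (8 KV heads serving 64 query heads); a quick sanity check:

```python
HIDDEN, HEADS, KV_HEADS, HEAD_DIM = 8192, 64, 8, 128

assert HIDDEN == HEADS * HEAD_DIM  # 64 query heads × 128 dims = 8192
assert HEADS % KV_HEADS == 0       # query heads divide evenly into KV groups
print(HEADS // KV_HEADS)           # → 8 query heads share each KV head
```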

Context & Position

Max Context Length 131,072
RoPE Base Frequency 208,533,496
Sliding Window Size -1 (disabled)
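Assuming the standard RoPE inverse-frequency schedule (an assumption; the exact position-embedding implementation is not specified here), the unusually large base stretches the longest rotation wavelength far beyond the 131,072-token context:

```python
import math

ROPE_BASE = 208_533_496
HEAD_DIM = 128

def rope_frequencies(head_dim: int, base: float) -> list[float]:
    # One inverse frequency per pair of head dimensions, standard RoPE.
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

freqs = rope_frequencies(HEAD_DIM, ROPE_BASE)
# Longest wavelength in tokens — on the order of 1e9, far above 131,072:
print(2 * math.pi / freqs[-1])
```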

Attention Configuration

Attention Dropout 10.0%
Tied Embeddings No
Output Softcapping 50.0
Attention Softcapping 30.0
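Softcapping here is presumably the tanh-based logit bound popularized by Gemma 2 (an assumption; the source lists only the cap values): logits are squashed smoothly into (-cap, cap) rather than hard-clipped.

```python
import math

def softcap(x: float, cap: float) -> float:
    # Smooth bound: output approaches ±cap as |x| grows, identity near 0.
    return cap * math.tanh(x / cap)

print(softcap(1000.0, 30.0))  # attention logit, capped just below 30
print(softcap(1000.0, 50.0))  # output logit, capped just below 50
```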

Mixture of Experts

Expert FFN Size 16,384
Experts per Token 2
Number of Experts 8
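With 8 experts and top-2 routing, each token activates only two of the 16,384-wide expert FFNs. A hypothetical routing sketch (the actual router is not described in the source):

```python
import math

def top2_route(router_logits: list[float]) -> list[tuple[int, float]]:
    # Softmax over the 8 expert logits, keep the 2 largest,
    # renormalize their weights so the chosen pair sums to 1.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: -probs[i])[:2]
    total = sum(probs[i] for i in top2)
    return [(i, probs[i] / total) for i in top2]

chosen = top2_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3])
print(chosen)  # two (expert_index, weight) pairs
```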

Activation & Normalization

Activation Function gelu
RMSNorm Epsilon 1e-12

Dropout (Training)

Hidden Dropout 10.0%

Special Tokens

BOS Token ID 101
EOS Token ID 102
Pad Token ID 0

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding