xai-org/grok-2

📊 Model Parameters

Total Parameters 54,859,911,680
Context Length 131,072
Hidden Size 8192
Layers 64
Attention Heads 64
KV Heads 8

💾 Memory Requirements

FP32 (Full) 204.37 GB
FP16 (Half) 102.18 GB
INT8 (Quantized) 51.09 GB
INT4 (Quantized) 25.55 GB
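The figures above are plain byte counts over the parameter total (values are GiB, i.e. 2**30 bytes); a minimal sketch:

```python
TOTAL_PARAMS = 54_859_911_680

def weight_memory_gib(params: int, bytes_per_param: float) -> float:
    """Raw weight storage only; ignores activations, KV cache, and runtime buffers."""
    return params * bytes_per_param / 2**30

for label, width in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{label}: {weight_memory_gib(TOTAL_PARAMS, width):.2f} GiB")
```

This reproduces the table exactly: 204.37, 102.18, 51.09, and 25.55 GiB.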

🔑 KV Cache (Inference)

Per Token (FP16) 262.14 KB
Max Context FP32 64.00 GB
Max Context FP16 32.00 GB
Max Context INT8 16.00 GB
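These cache sizes follow directly from the layer, KV-head, and head-dimension numbers above; a sketch:

```python
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 64, 8, 128, 131_072

def kv_cache_bytes_per_token(bytes_per_value: int) -> int:
    # one K and one V vector per layer per KV head, each HEAD_DIM wide
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value

fp16_token = kv_cache_bytes_per_token(2)   # 262,144 bytes ≈ 262.14 KB per token
fp16_full = fp16_token * CONTEXT / 2**30   # 32.0 GiB at the full 131,072-token context
```

Note the 8-fold saving from grouped-query attention: with 64 KV heads instead of 8, the full-context FP16 cache would be 256 GiB.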

⚙️ Model Configuration

Core Architecture

Hidden Size 8,192
Number of Layers 64
Attention Heads 64
KV Heads 8
Head Dimension 128
FFN Intermediate Size 32,768
Vocabulary Size 131,072
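A quick consistency check of how these dimensions relate under grouped-query attention (GQA):

```python
HIDDEN, HEADS, KV_HEADS, HEAD_DIM = 8_192, 64, 8, 128

assert HIDDEN == HEADS * HEAD_DIM   # 64 query heads of width 128 span the hidden size
assert HEADS % KV_HEADS == 0        # query heads divide evenly into KV groups
GROUP = HEADS // KV_HEADS           # 8 query heads share each K/V head
```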

Context & Position

RoPE Base Frequency 208,533,496
Sliding Window Size -1 (disabled)
Max Context Length 131,072
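Assuming the standard RoPE frequency scheme, theta_i = base**(-2i/d) (the table only gives the base, so the formula is an assumption), the per-dimension inverse frequencies would be:

```python
BASE, HEAD_DIM = 208_533_496, 128

# Standard RoPE inverse frequencies; an assumed formula, not confirmed by the table.
inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]
# inv_freq[0] is 1.0 (fast-rotating dimensions); later entries decay toward zero,
# giving very long wavelengths — consistent with the 131,072-token context.
```

The unusually large base (~2.1e8 vs. the common 10,000) is typical of long-context RoPE scaling.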

Attention Configuration

Tied Embeddings No
Output Softcapping 50.0
Attention Softcapping 30.0
Attention Dropout 10.0%
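Softcapping here presumably refers to the common tanh form, cap * tanh(x / cap); the exact form is an assumption, since the table only gives the cap values (30.0 for attention scores, 50 for output logits):

```python
import math

def softcap(x: float, cap: float) -> float:
    """Squash a score smoothly into (-cap, cap); near-identity for small |x|."""
    return cap * math.tanh(x / cap)
```

Unlike hard clipping, this keeps gradients nonzero for large scores while bounding their magnitude.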

Mixture of Experts

Number of Experts 8
Experts per Token 2
Expert FFN Size 16,384
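A hypothetical top-2 router over 8 experts, to illustrate what these numbers mean (a sketch, not Grok-2's actual routing code):

```python
import numpy as np

NUM_EXPERTS, TOP_K, HIDDEN = 8, 2, 8_192

rng = np.random.default_rng(0)
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02  # hypothetical router weights
x = rng.standard_normal(HIDDEN)                               # one token's hidden state

logits = x @ router_w                          # one score per expert
top = np.argsort(logits)[-TOP_K:]              # pick the 2 highest-scoring experts
gates = np.exp(logits[top] - logits[top].max())
gates /= gates.sum()                           # renormalize over the chosen experts
# Each token thus runs only 2 of the 8 expert FFNs (each of intermediate size 16,384),
# which is why active parameters per token are far below the 54.9B total.
```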

Activation & Normalization

RMSNorm Epsilon 1e-05
Activation Function gelu

Dropout (Training)

Hidden Dropout 10.0%

Special Tokens

BOS Token ID 101
EOS Token ID 102
Pad Token ID 0

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding