Grok (xAI, Elon Musk's AI company)
grok-2
Model ID: xai-org/grok-2
📊 Model Parameters
Total Parameters: 54,859,911,680
Context Length: 131,072
Hidden Size: 8,192
Layers: 64
Attention Heads: 64
KV Heads: 8
💾 Memory Requirements (weights only)
FP32 (full precision): 204.37 GB
FP16 (half precision): 102.18 GB
INT8 (quantized): 51.09 GB
INT4 (quantized): 25.55 GB
🔑 KV Cache (Inference)
Per Token (FP16): 262.14 KB
Max Context, FP32: 64.00 GB
Max Context, FP16: 32.00 GB
Max Context, INT8: 16.00 GB
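The KV-cache figures follow from the grouped-query attention layout: only the 8 KV heads (head dimension 128) are cached, for both K and V, in every layer. A minimal sketch of the arithmetic; note the table appears to report the per-token figure in decimal KB but the full-context figures in GiB:

```python
# Sketch: KV-cache size from the architecture above (GQA: 8 KV heads of dim 128).
# The cache stores one K and one V vector per KV head, per layer, per token.

LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128
MAX_CONTEXT = 131_072

def kv_bytes_per_token(bytes_per_elem: float) -> float:
    # 2 tensors (K and V) x layers x KV heads x head dim x element size
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem

print(kv_bytes_per_token(2) / 1e3)                    # 262.144 KB per token at FP16
print(MAX_CONTEXT * kv_bytes_per_token(4) / 1024**3)  # 64.0 GiB at FP32, full context
print(MAX_CONTEXT * kv_bytes_per_token(2) / 1024**3)  # 32.0 GiB at FP16
print(MAX_CONTEXT * kv_bytes_per_token(1) / 1024**3)  # 16.0 GiB at INT8
```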
⚙️ Model Configuration

Core Architecture
KV Heads: 8
Head Dimension: 128
Vocabulary Size: 131,072
Hidden Size: 8,192
Number of Layers: 64
Attention Heads: 64
FFN Intermediate Size: 32,768
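With 64 query heads but only 8 KV heads, the attention layers use grouped-query attention: each group of 8 query heads shares one K/V head. A minimal shape sketch of the projections under those dimensions (a generic GQA layout, not xAI's implementation):

```python
import torch

# Shape sketch of grouped-query attention with the dimensions listed above.
HIDDEN, N_HEADS, N_KV_HEADS, HEAD_DIM = 8192, 64, 8, 128
GROUP = N_HEADS // N_KV_HEADS  # 8 query heads share each KV head

x = torch.randn(1, 16, HIDDEN)  # (batch, seq, hidden)

q_proj = torch.nn.Linear(HIDDEN, N_HEADS * HEAD_DIM, bias=False)     # 8192 -> 8192
k_proj = torch.nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)  # 8192 -> 1024
v_proj = torch.nn.Linear(HIDDEN, N_KV_HEADS * HEAD_DIM, bias=False)  # 8192 -> 1024

q = q_proj(x).view(1, 16, N_HEADS, HEAD_DIM)
k = k_proj(x).view(1, 16, N_KV_HEADS, HEAD_DIM)
v = v_proj(x).view(1, 16, N_KV_HEADS, HEAD_DIM)

# Each KV head is repeated GROUP times so every query head has a matching K/V;
# only the un-repeated K/V (8 heads) ever needs to be cached.
k = k.repeat_interleave(GROUP, dim=2)  # (1, 16, 64, 128)
v = v.repeat_interleave(GROUP, dim=2)
print(q.shape, k.shape, v.shape)
```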
Context & Position
RoPE Base Frequency: 208,533,496
Sliding Window Size: -1 (disabled)
Max Context Length: 131,072
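The RoPE base of 208,533,496 (far above the common 10,000) stretches the rotary frequencies so positions stay distinguishable across the 131,072-token context. A minimal sketch of how per-dimension frequencies are derived under the standard RoPE formula (illustrative; any additional scaling scheme xAI applies is not captured here):

```python
import torch

# Standard RoPE inverse frequencies with the base listed above (illustrative).
ROPE_BASE = 208_533_496
HEAD_DIM = 128
MAX_CONTEXT = 131_072

# One frequency per pair of head dimensions: base ** (-2i / head_dim)
inv_freq = 1.0 / (ROPE_BASE ** (torch.arange(0, HEAD_DIM, 2).float() / HEAD_DIM))

positions = torch.arange(MAX_CONTEXT).float()
angles = torch.outer(positions, inv_freq)  # (131072, 64) rotation angles
cos, sin = angles.cos(), angles.sin()      # applied to Q/K pairs in every layer
print(inv_freq[:4], angles.shape)
```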
Attention Configuration
Tied Embeddings: No
Output Softcapping: 50.0
Attention Softcapping: 30.0
Attention Dropout: 10.0%
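Softcapping bounds logits with a tanh squash: values are divided by the cap, passed through tanh, and rescaled by the cap, so no score can exceed the cap in magnitude. A minimal sketch using the two caps listed above, assuming the common tanh-based formulation (the exact placement in xAI's stack is an assumption):

```python
import torch

# Tanh softcapping (assumed form): capped = cap * tanh(logits / cap)
# keeps values in (-cap, cap) while staying nearly linear around zero.

ATTN_CAP = 30.0    # applied to attention scores before softmax
OUTPUT_CAP = 50.0  # applied to the final LM-head logits

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    return cap * torch.tanh(logits / cap)

scores = torch.tensor([-120.0, -5.0, 0.0, 5.0, 120.0])
print(softcap(scores, ATTN_CAP))    # large magnitudes saturate near +/-30
print(softcap(scores, OUTPUT_CAP))  # output logits saturate near +/-50
```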
Mixture of Experts
Expert FFN Size: 16,384
Experts per Token: 2
Number of Experts: 8
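Each MoE layer holds 8 expert FFNs (intermediate size 16,384) and routes every token to the 2 highest-scoring experts, so only a quarter of the expert weights are active per token. A minimal top-2-of-8 routing sketch under those numbers, using a generic softmax router (xAI's actual gating is not described in this table):

```python
import torch

# Generic top-2-of-8 MoE routing sketch (illustrative, not xAI's router).
# Real sizes are hidden=8192 and expert_ffn=16384; scaled down here so the
# sketch runs in a few MB of memory.
HIDDEN, EXPERT_FFN, N_EXPERTS, TOP_K = 256, 512, 8, 2

router = torch.nn.Linear(HIDDEN, N_EXPERTS, bias=False)
experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(HIDDEN, EXPERT_FFN, bias=False),
        torch.nn.GELU(),  # activation function listed above
        torch.nn.Linear(EXPERT_FFN, HIDDEN, bias=False),
    )
    for _ in range(N_EXPERTS)
)

def moe_layer(x: torch.Tensor) -> torch.Tensor:
    # x: (tokens, hidden). Pick the 2 best experts per token, mix by router weight.
    gate = torch.softmax(router(x), dim=-1)  # (tokens, 8)
    weights, idx = gate.topk(TOP_K, dim=-1)  # (tokens, 2) each
    weights = weights / weights.sum(dim=-1, keepdim=True)
    out = torch.zeros_like(x)
    for k in range(TOP_K):
        for e in range(N_EXPERTS):
            mask = idx[:, k] == e
            if mask.any():
                out[mask] += weights[mask, k : k + 1] * experts[e](x[mask])
    return out

print(moe_layer(torch.randn(4, HIDDEN)).shape)  # torch.Size([4, 256])
```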
Activation & Normalization
RMSNorm Epsilon: 1e-05
Activation Function: gelu

Dropout (Training)
Hidden Dropout: 10.0%
Special Tokens
BOS Token ID: 101
Pad Token ID: 0
EOS Token ID: 102

Data Type
Model Dtype: bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding