allenai/Olmo-3-1125-32B

📊 Model Parameters

Total Parameters 32,233,522,176
Context Length 65,536
Hidden Size 5,120
Layers 64
Attention Heads 40
KV Heads 8
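
These figures are mutually consistent: with the architecture listed under ⚙️ Model Configuration, the total parameter count can be reproduced exactly. A minimal sketch, assuming an OLMo-2-style layer layout (untied embeddings, QK-norm, post-attention and post-feedforward RMSNorms, no biases):

```python
# Reproduce the total parameter count from the listed architecture.
vocab, hidden, layers = 100_278, 5_120, 64
heads, kv_heads, ffn = 40, 8, 27_648
head_dim = hidden // heads                                 # 128

embed = vocab * hidden                                     # input embeddings
lm_head = vocab * hidden                                   # untied output head

attn = (hidden * heads * head_dim                          # Q projection
        + 2 * hidden * kv_heads * head_dim                 # K and V projections
        + heads * head_dim * hidden)                       # O projection
mlp = 3 * hidden * ffn                                     # SwiGLU: gate, up, down
norms = (heads * head_dim + kv_heads * head_dim            # QK-norm weights
         + 2 * hidden)                                     # two per-layer RMSNorms

total = layers * (attn + mlp + norms) + embed + lm_head + hidden  # + final norm
print(f"{total:,}")  # 32,233,522,176
```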

💾 Memory Requirements

FP32 (Full) 120.08 GiB
FP16 (Half) 60.04 GiB
INT8 (Quantized) 30.02 GiB
INT4 (Quantized) 15.01 GiB
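
Weight memory is simply the parameter count times the bytes per parameter; the table above uses binary units (GiB). A minimal sketch:

```python
# Weight memory per precision, in GiB (binary gigabytes).
params = 32_233_522_176
GiB = 1024 ** 3
for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / GiB:.2f} GiB")
# FP32: 120.08 GiB / FP16: 60.04 GiB / INT8: 30.02 GiB / INT4: 15.01 GiB
```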

🔑 KV Cache (Inference)

Per Token (FP16) 256.00 KiB
Max Context FP32 32.00 GiB
Max Context FP16 16.00 GiB
Max Context INT8 8.00 GiB
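
With grouped-query attention (8 KV heads at head dim 128, i.e. 5,120 / 40), the cache costs 2 × layers × kv_heads × head_dim × bytes per element for each token, once for K and once for V. A sketch reproducing the table:

```python
# KV cache size per token and at the full 65,536-token context.
layers, kv_heads, head_dim, context = 64, 8, 128, 65_536
for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    per_token = 2 * layers * kv_heads * head_dim * nbytes  # K and V
    print(f"{name}: {per_token / 1024:.0f} KiB/token, "
          f"{per_token * context / 1024**3:.2f} GiB at max context")
# FP16: 256 KiB/token, 16.00 GiB at max context
```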

⚙️ Model Configuration

Core Architecture

Vocabulary Size 100,278
Hidden Size 5,120
FFN Intermediate Size 27,648
Number of Layers 64
Attention Heads 40
KV Heads 8

Context & Position

Max Context Length 65,536
RoPE Base Frequency 500,000
RoPE Scaling yarn (factor: 8.0)
Sliding Window Size 4,096
Layer Attention Types [64 items]
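
A sketch of how these settings would map onto a Hugging Face-style configuration; the field names below are assumptions for illustration, not copied from the released config.json. Note that a YaRN factor of 8.0 over a 65,536-token context implies a pre-extension window of 8,192 tokens:

```python
# Hypothetical config fields for the context/position settings above.
rope_config = {
    "max_position_embeddings": 65_536,
    "rope_theta": 500_000,                          # RoPE base frequency
    "rope_scaling": {"rope_type": "yarn", "factor": 8.0},
    "sliding_window": 4_096,
}
original_window = (rope_config["max_position_embeddings"]
                   / rope_config["rope_scaling"]["factor"])
assert original_window == 8_192
```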

Attention Configuration

Tied Embeddings No
Attention Bias No
Attention Dropout 0%

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06
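
For reference, a minimal PyTorch sketch of this pairing: RMSNorm with eps = 1e-6 and a SiLU-gated (SwiGLU-style) MLP at the listed dimensions. This illustrates the named components, not the model's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMSNorm: scale by inverse root-mean-square, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SiLU-gated MLP at the listed sizes (5,120 -> 27,648 -> 5,120)."""
    def __init__(self, dim: int = 5_120, hidden: int = 27_648):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```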

Special Tokens

BOS Token ID Not set
Pad Token ID 100,277
EOS Token ID 100,257
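
In practice these IDs surface through the tokenizer. A hedged check via the transformers API, assuming the tokenizer ships with the checkpoint under this repo name:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/Olmo-3-1125-32B")
assert tok.eos_token_id == 100_257   # generation stops here
assert tok.pad_token_id == 100_277   # used for batch padding
assert tok.bos_token_id is None      # no BOS token is prepended
```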

Data Type

Model Dtype bfloat16
Layer Types Attention, MLP/FFN, Normalization, Embedding
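
To run the checkpoint in its native precision, pass the dtype explicitly at load time. A sketch using the standard transformers loading path:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "allenai/Olmo-3-1125-32B",
    torch_dtype=torch.bfloat16,   # matches the model dtype listed above
    device_map="auto",            # shard across available devices
)
```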