Allen AI OLMo - fully open language models released with their training data and code:
- OLMo-2-1124-7B
- OLMo-2-1124-13B
- Olmo-3-1025-7B
- Olmo-3-1125-32B

allenai/Olmo-3-1125-32B
📊 Model Parameters
Total Parameters: 32,233,522,176
Context Length: 65,536
Hidden Size: 5,120
Layers: 64
Attention Heads: 40
KV Heads: 8
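Most of these headline figures can be read straight from the checkpoint's Hugging Face config. A minimal sketch, assuming the model ID above is available on the Hub, that the installed transformers version is recent enough to know the Olmo 3 architecture, and that the config uses the usual Llama-style field names (these names are an assumption, not confirmed for this checkpoint):

```python
from transformers import AutoConfig

# Model ID taken from this page; availability on the Hub is assumed.
MODEL_ID = "allenai/Olmo-3-1125-32B"

cfg = AutoConfig.from_pretrained(MODEL_ID)

# Field names follow the common Hugging Face convention for decoder-only
# models; the actual names may differ for this architecture.
print("hidden size:     ", cfg.hidden_size)              # expected 5,120
print("layers:          ", cfg.num_hidden_layers)        # expected 64
print("attention heads: ", cfg.num_attention_heads)      # expected 40
print("KV heads:        ", cfg.num_key_value_heads)      # expected 8
print("context length:  ", cfg.max_position_embeddings)  # expected 65,536
print("vocab size:      ", cfg.vocab_size)               # expected 100,278
```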
💾 Memory Requirements
FP32 (Full): 120.08 GB
FP16 (Half): 60.04 GB
INT8 (Quantized): 30.02 GB
INT4 (Quantized): 15.01 GB
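The weight footprints above follow from the total parameter count times bytes per parameter, divided by 2^30 (so the "GB" figures are binary gibibytes). A minimal sketch reproducing them:

```python
# Weight-memory estimate: parameters x bytes per parameter.
# The page's figures match division by 2**30 (GiB), so that is used here.
TOTAL_PARAMS = 32_233_522_176  # from the table above

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30
    print(f"{precision}: {gib:.2f} GB")
# FP32: 120.08 GB, FP16: 60.04 GB, INT8: 30.02 GB, INT4: 15.01 GB
```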
🔑 KV Cache (Inference)
Per Token (FP16): 262.14 KB
Max Context (FP32): 32.00 GB
Max Context (FP16): 16.00 GB
Max Context (INT8): 8.00 GB
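The per-token figure is consistent with a standard grouped-query-attention KV cache: 2 tensors (K and V) x layers x KV heads x head dim x bytes per element, with head dim = hidden size / attention heads = 5,120 / 40 = 128. A minimal sketch, assuming every layer caches full-context K/V (which is what the table's max-context numbers imply; sliding-window layers would cache less):

```python
# KV-cache size per token and at full context, matching the table above.
# Assumes full-context K/V in all layers; head_dim is derived from
# hidden_size / num_attention_heads.
HIDDEN_SIZE = 5_120
NUM_HEADS = 40
NUM_KV_HEADS = 8
NUM_LAYERS = 64
CONTEXT_LEN = 65_536

head_dim = HIDDEN_SIZE // NUM_HEADS  # 128

def kv_bytes_per_token(bytes_per_elem: float) -> float:
    # 2 = one K tensor plus one V tensor per layer.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * bytes_per_elem

fp16_per_token = kv_bytes_per_token(2)               # 262,144 bytes
fp16_full_ctx = fp16_per_token * CONTEXT_LEN          # 16 GiB
fp32_full_ctx = kv_bytes_per_token(4) * CONTEXT_LEN   # 32 GiB
int8_full_ctx = kv_bytes_per_token(1) * CONTEXT_LEN   #  8 GiB

print(f"per token (FP16): {fp16_per_token / 1e3:.2f} KB")   # 262.14 KB
print(f"max context FP16: {fp16_full_ctx / 2**30:.2f} GB")  # 16.00 GB
print(f"max context FP32: {fp32_full_ctx / 2**30:.2f} GB")  # 32.00 GB
print(f"max context INT8: {int8_full_ctx / 2**30:.2f} GB")  #  8.00 GB
```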
⚙️ Model Configuration
Core Architecture
Vocabulary Size: 100,278
Hidden Size: 5,120
FFN Intermediate Size: 27,648
Number of Layers: 64
Attention Heads: 40
KV Heads: 8
Context & Position
Max Context Length: 65,536
RoPE Base Frequency: 500,000
RoPE Scaling: yarn (factor: 8.0)
Sliding Window Size: 4,096
Layer Attention Types: [64 items]
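With a YaRN factor of 8.0, the 65,536-token maximum context corresponds to an 8x extension of an 8,192-token pre-scaling window (65,536 / 8.0 = 8,192); that 8,192 figure is inferred from the factor, not stated on this page. A rough sketch of how such a rope_scaling entry typically looks in a Hugging Face config, with field names assumed rather than confirmed for this checkpoint:

```python
# Hypothetical shape of the RoPE-scaling block; key names follow the common
# Hugging Face convention and are an assumption here.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 8.0,
    # Inferred as 65,536 / 8.0; not stated explicitly on this page.
    "original_max_position_embeddings": 8192,
}

max_context = int(rope_scaling["factor"]
                  * rope_scaling["original_max_position_embeddings"])
print(max_context)  # 65536
```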
Attention Configuration
Tied Embeddings: No
Attention Bias: No
Attention Dropout: 0%
Activation & Normalization
Activation Function: silu
RMSNorm Epsilon: 1e-06
Special Tokens
BOS Token ID: Not set
Pad Token ID: 100,277
EOS Token ID: 100,257
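The special-token IDs can be cross-checked against the tokenizer itself. A minimal sketch, assuming the tokenizer ships in the same Hub repository as the model:

```python
from transformers import AutoTokenizer

# Tokenizer repo assumed to be the same as the model ID on this page.
tok = AutoTokenizer.from_pretrained("allenai/Olmo-3-1125-32B")

print("bos:", tok.bos_token_id)  # expected None ("Not set")
print("pad:", tok.pad_token_id)  # expected 100277
print("eos:", tok.eos_token_id)  # expected 100257
```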
Data Type
Model Dtype: bfloat16
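Given the bfloat16 native dtype and the roughly 60 GB half-precision footprint above, loading for inference could look like the sketch below; it assumes a transformers version that supports the Olmo 3 architecture, the accelerate package for device_map, and hardware with enough memory to hold the weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "allenai/Olmo-3-1125-32B"  # model ID from this page

# bfloat16 matches the checkpoint's native dtype and the ~60 GB figure above.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

inputs = tokenizer("OLMo is a fully open language model", return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```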
Layer types legend: Attention, MLP/FFN, Normalization, Embedding