zai-org/GLM-4.5

📊 Model Parameters

Total Parameters 352,797,814,784
Context Length 131,072
Hidden Size 5120
Layers 92
Attention Heads 96
KV Heads 8

💾 Memory Requirements

FP32 (Full) 1314.27 GiB
FP16 (Half) 657.14 GiB
INT8 (Quantized) 328.57 GiB
INT4 (Quantized) 164.28 GiB
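The weight-memory figures above follow directly from the total parameter count and the bytes per weight; the values correspond to binary gibibytes (2³⁰ bytes) and cover raw weights only, excluding activations and KV cache. A minimal sketch that reproduces the table:

```python
# Estimate raw weight memory for GLM-4.5 from its parameter count.
TOTAL_PARAMS = 352_797_814_784

BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_memory_gib(dtype: str) -> float:
    """Raw weight footprint in GiB (2**30 bytes); no activations or KV cache."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(dtype):.2f} GiB")
```

Halving the bit width halves the footprint, which is why INT4 lands at one quarter of the FP16 figure.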

🔑 KV Cache (Inference)

Per Token (FP16) 376.83 KB
Max Context FP32 92.00 GiB
Max Context FP16 46.00 GiB
Max Context INT8 23.00 GiB
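The per-token figure is the usual grouped-query-attention KV-cache product: 2 (keys + values) × layers × KV heads × head dimension × bytes per element. A sketch that reproduces both the per-token and full-context numbers:

```python
# KV-cache sizing for GLM-4.5 from the architecture numbers above.
LAYERS, KV_HEADS, HEAD_DIM, MAX_CTX = 92, 8, 128, 131_072

def kv_bytes_per_token(dtype_bytes: int) -> int:
    # Keys and values (factor 2), one slot per layer per KV head.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * dtype_bytes

fp16 = kv_bytes_per_token(2)
print(fp16 / 1000)               # per token in decimal KB: 376.832
print(fp16 * MAX_CTX / 2**30)    # full 131,072-token context in GiB: 46.0
```

With only 8 KV heads against 96 query heads, the cache is 12× smaller than a full multi-head cache would be at the same head dimension.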

⚙️ Model Configuration

Core Architecture

Vocabulary Size 151,552
Hidden Size 5,120
FFN Intermediate Size 12,288
Number of Layers 92
Attention Heads 96
KV Heads 8
Head Dimension 128
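The head counts imply a grouped-query layout in which 12 query heads share each KV head, and a query projection (96 × 128 = 12,288) that is wider than the hidden size. A dimension-arithmetic sketch only; the actual projection layout (fused vs. split matrices) is an implementation detail:

```python
# GQA projection dimensions implied by the table above.
HIDDEN, N_HEADS, N_KV_HEADS, HEAD_DIM = 5_120, 96, 8, 128

q_dim = N_HEADS * HEAD_DIM        # query projection output: 12,288
kv_dim = N_KV_HEADS * HEAD_DIM    # key and value projections: 1,024 each
group_size = N_HEADS // N_KV_HEADS  # 12 query heads per shared KV head
```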

Context & Position

Max Context Length 131,072
RoPE Base Frequency 1,000,000
RoPE Scaling Not set
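With no scaling set, position encoding reduces to plain RoPE over the 128-dimensional heads. A sketch of the conventional inverse-frequency schedule θᵢ = base^(−2i/d), assuming the standard formulation:

```python
import math

# RoPE inverse frequencies for base 1,000,000 and head dimension 128.
BASE, HEAD_DIM = 1_000_000.0, 128

# One frequency per rotated coordinate pair (64 pairs for a 128-dim head).
inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]

# The rotation angle applied at position p for pair i is p * inv_freq[i];
# a larger base stretches the low frequencies, supporting longer contexts.
```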

Attention Configuration

Attention Bias Yes
Attention Dropout 0%
Tied Embeddings No

Mixture of Experts

Expert FFN Size 1,536
Experts per Token 8
Expert Groups 1
Groups per Token 1
Shared Experts 1
Number of Experts 160
Routing Scale Factor 2.5
Dense Initial Layers 3
Normalize TopK Probabilities Yes
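The routing parameters above (top-8 of 160 experts, renormalized top-k probabilities, routed scaling factor 2.5) can be sketched as a softmax top-k router. This is a schematic; the production router's scoring function and any group-limited routing logic may differ:

```python
import math, random

N_EXPERTS, TOP_K, ROUTE_SCALE = 160, 8, 2.5

def route(logits):
    """Pick the top-8 of 160 experts and return their mixing weights."""
    # Softmax over all expert logits.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top-k experts by probability.
    top = sorted(range(N_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # "Normalize TopK Probabilities: Yes" -> renormalize over the kept set,
    # then apply the routed scaling factor of 2.5.
    norm = sum(probs[i] for i in top)
    return {i: ROUTE_SCALE * probs[i] / norm for i in top}

random.seed(0)
weights = route([random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)])
# The 8 weights sum to the routing scale factor, 2.5.
```

The single shared expert and the 3 dense initial layers sit outside this routing path and run for every token.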

Speculative Decoding

Next-N Prediction Layers 1
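A single next-N prediction (MTP) layer supports one-token-ahead drafting for speculative decoding: the extra head proposes a token cheaply, and one full forward pass verifies it. A schematic greedy-acceptance sketch, with `draft` and `model` as hypothetical stand-ins for the MTP head and the full model:

```python
def speculative_step(draft, model, ctx):
    """One step with a 1-token draft.

    `model(seq)` returns (prediction at seq[:-1], prediction at seq), so a
    single verify pass over ctx + [guess] scores both positions. If the
    model's own prediction at ctx matches the guess, two tokens are emitted
    per pass; otherwise one (greedy acceptance sketch).
    """
    guess = draft(ctx)
    pred_at_ctx, pred_after_guess = model(ctx + [guess])
    if pred_at_ctx == guess:
        return ctx + [guess, pred_after_guess]
    return ctx + [pred_at_ctx]
```

With only one draft layer, the best case is two tokens per full forward pass; deeper MTP stacks would draft further ahead.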

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
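Both configured operations are short enough to state exactly; a reference sketch of SiLU and RMSNorm with the configured epsilon (scalar/list form for clarity, whereas the model applies them over tensors):

```python
import math

EPS = 1e-05  # RMSNorm epsilon from the config

def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), the configured activation."""
    return x / (1.0 + math.exp(-x))

def rms_norm(xs, weight=None):
    """Root-mean-square normalization: x / sqrt(mean(x^2) + eps), scaled."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + EPS)
    w = weight if weight is not None else [1.0] * len(xs)
    return [wi * x / rms for wi, x in zip(w, xs)]
```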

Special Tokens

BOS Token ID Not set
Pad Token ID 151,329
EOS Token IDs 151,329; 151,336; 151,338
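With three end-of-sequence ids configured (one of which doubles as the pad token), a generation loop should stop on any of them. A minimal membership check:

```python
# Stop generation on any of the three configured EOS ids.
EOS_TOKEN_IDS = {151_329, 151_336, 151_338}
PAD_TOKEN_ID = 151_329  # shares an id with one of the EOS tokens

def is_stop(token_id: int) -> bool:
    return token_id in EOS_TOKEN_IDS
```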

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding