zai-org/GLM-4.5-Air

📊 Model Parameters

Total Parameters 106,852,245,504
Context Length 131,072
Hidden Size 4096
Layers 46
Attention Heads 96
KV Heads 8

💾 Memory Requirements

FP32 (Full) 398.06 GB
FP16 (Half) 199.03 GB
INT8 (Quantized) 99.51 GB
INT4 (Quantized) 49.76 GB
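The memory figures above follow directly from the parameter count: bytes per parameter times total parameters, reported in binary gigabytes (GiB, 2**30 bytes). A minimal sketch reproducing the table:

```python
# Estimate model weight memory from the parameter count in the table above.
# Note: "GB" in the table means GiB (2**30 bytes).
TOTAL_PARAMS = 106_852_245_504

def memory_gib(params: int, bytes_per_param: float) -> float:
    """Weight memory in GiB for a given per-parameter width."""
    return params * bytes_per_param / 2**30

for name, width in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {memory_gib(TOTAL_PARAMS, width):.2f} GB")
```

Running this reproduces the four figures above (398.06, 199.03, 99.51, 49.76).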

🔑 KV Cache (Inference)

Per Token (FP16) 188.42 KB
Max Context FP32 46.00 GB
Max Context FP16 23.00 GB
Max Context INT8 11.50 GB
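The KV cache figures follow from the attention shape: two tensors (K and V) per layer, each of shape [kv_heads, head_dim] per token. A sketch that reproduces the table (note the table mixes units: decimal KB for the per-token figure, binary GiB for the context totals):

```python
# Per-token KV cache size from the attention configuration above.
LAYERS, KV_HEADS, HEAD_DIM = 46, 8, 128
CONTEXT = 131_072

def kv_bytes_per_token(bytes_per_elem: float) -> float:
    """2 tensors (K and V) x layers x kv_heads x head_dim x element width."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem

fp16 = kv_bytes_per_token(2)
print(f"Per token (FP16): {fp16 / 1000:.2f} KB")                  # decimal KB
print(f"Max context (FP16): {fp16 * CONTEXT / 2**30:.2f} GB")     # binary GiB
```

This yields 188.42 KB per token and 23.00 GB at the full 131,072-token context, matching the table.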

⚙️ Model Configuration

Core Architecture

Vocabulary Size 151,552
Hidden Size 4,096
FFN Intermediate Size 10,944
Number of Layers 46
Attention Heads 96
KV Heads 8
Head Dimension 128

Context & Position

Max Context Length 131,072
RoPE Base Frequency 1,000,000
RoPE Scaling Not set
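The RoPE base of 1,000,000 (versus the common default of 10,000) stretches the rotary wavelengths, which is what supports the 131,072-token context without additional scaling. An illustrative sketch of how the inverse frequencies follow from the base and head dimension above (not the model's own code):

```python
# RoPE inverse frequencies for the config above: base 1,000,000,
# head dimension 128, no scaling applied. Illustrative sketch only.
BASE, HEAD_DIM = 1_000_000, 128

# One frequency per rotated pair of dimensions; the largest base gives
# the longest wavelength, i.e. the slowest-varying positional signal.
inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]
print(len(inv_freq), inv_freq[0], inv_freq[-1])
```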

Attention Configuration

Attention Bias Yes
Attention Dropout 0%
Tied Embeddings No

Mixture of Experts

Expert FFN Size 1,408
Experts per Token 8
Expert Groups 1
Groups per Token 1
Shared Experts 1
Number of Experts 128
Routing Scale Factor 1.0
Dense Initial Layers 1
Normalize TopK Probabilities Yes
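Per the MoE settings above, each token is routed to 8 of 128 experts and the selected routing weights are renormalized to sum to 1 ("Normalize TopK Probabilities: Yes"). A minimal sketch of that routing step under these settings (illustrative only, not the model's implementation):

```python
# Top-k expert routing with normalized probabilities, matching the
# config above: 128 experts, 8 selected per token, top-k renormalized.
import math

N_EXPERTS, TOP_K = 128, 8

def route(logits: list[float]) -> dict[int, float]:
    """Softmax over expert logits, keep the TOP_K largest, then
    renormalize so the selected weights sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(N_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

weights = route([0.01 * i for i in range(N_EXPERTS)])
print(sorted(weights))  # indices of the 8 selected experts
```

The shared expert is applied to every token in addition to the 8 routed experts, so 9 expert FFNs of intermediate size 1,408 are active per MoE layer.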

Speculative Decoding

Next-N Prediction Layers 1

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID Not set
Pad Token ID 151,329
EOS Token IDs 151,329 / 151,336 / 151,338

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding