zai-org/GLM-4.6

📊 Model Parameters

Total Parameters 352,797,814,784
Context Length 202,752
Hidden Size 5120
Layers 92
Attention Heads 96
KV Heads 8

💾 Memory Requirements

FP32 (Full) 1314.27 GB
FP16 (Half) 657.14 GB
INT8 (Quantized) 328.57 GB
INT4 (Quantized) 164.28 GB
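The figures above follow directly from the total parameter count times the bytes per weight at each precision; "GB" in this table means GiB (2^30 bytes). A minimal sketch:

```python
# Derive the weight-memory figures from the parameter count.
# Note: "GB" in the table above is GiB (2**30 bytes).
TOTAL_PARAMS = 352_797_814_784

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,  # 4-bit quantization: half a byte per weight
}

def weight_memory_gib(params: int, dtype: str) -> float:
    """Raw weight storage in GiB for a given precision."""
    return params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(TOTAL_PARAMS, dtype):.2f} GB")
```

This reproduces the table (1314.27, 657.14, 328.57, 164.28 GB) and covers weights only; KV cache and activations come on top.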

🔑 KV Cache (Inference)

Per Token (FP16) 376.83 KB
Max Context FP32 142.31 GB
Max Context FP16 71.16 GB
Max Context INT8 35.58 GB
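The KV-cache numbers come from the standard grouped-query-attention formula: two tensors (K and V) per layer, each of `kv_heads × head_dim` elements per token. The per-token figure uses decimal KB (1000 bytes); the max-context figures use GiB. A sketch from the config values above:

```python
# KV-cache sizing from the configuration values above.
LAYERS, KV_HEADS, HEAD_DIM, MAX_CTX = 92, 8, 128, 202_752

def kv_bytes_per_token(bytes_per_elem: float) -> float:
    # 2 tensors (K and V) per layer, each kv_heads * head_dim elements.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem

fp16_per_token = kv_bytes_per_token(2)  # 376,832 bytes = 376.83 KB (decimal)
full_ctx_gib = fp16_per_token * MAX_CTX / 2**30
print(f"{fp16_per_token / 1000:.2f} KB/token, {full_ctx_gib:.2f} GB at full context")
```

With only 8 KV heads against 96 query heads, the cache stays at ~71 GB in FP16 even at the full 202,752-token context.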

⚙️ Model Configuration

Core Architecture

Vocabulary Size 151,552
Hidden Size 5,120
FFN Intermediate Size 12,288
Number of Layers 92
Attention Heads 96
KV Heads 8
Head Dimension 128

Context & Position

Max Context Length 202,752
RoPE Base Frequency 1,000,000
RoPE Scaling Not set
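A base frequency of 1,000,000 (vs. the original RoPE default of 10,000) slows the rotation of the low-frequency dimensions, which is how long contexts are reached without extra scaling. An illustrative sketch of the standard inverse-frequency computation (the exact implementation lives in the model code):

```python
# Standard RoPE inverse frequencies: base**(-2i/d) for each rotary pair.
import math

ROPE_BASE = 1_000_000
HEAD_DIM = 128

inv_freq = [ROPE_BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]

# Wavelength (in tokens) of the slowest-rotating pair; with base 1e6 it
# comfortably exceeds the 202,752-token context window.
slowest_wavelength = 2 * math.pi / inv_freq[-1]
```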

Attention Configuration

Attention Bias Yes
Attention Dropout 0%
Tied Embeddings No

Mixture of Experts

Expert FFN Size 1,536
Experts per Token 8
Expert Groups 1
Groups per Token 1
Shared Experts 1
Number of Experts 160
Routing Scale Factor 2.5
Dense Initial Layers 3
Normalize TopK Probabilities Yes
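The routing parameters above describe top-8 selection out of 160 routed experts, with the selected probabilities renormalized (Normalize TopK Probabilities: Yes) and multiplied by the 2.5 scale factor; one shared expert is always active and the first 3 layers are dense. A simplified sketch of that routing step, assuming plain softmax scoring and ignoring the expert-group machinery (with 1 group it is a no-op):

```python
import math
import random

N_EXPERTS, TOP_K, SCALE = 160, 8, 2.5

def route(logits: list[float]) -> dict[int, float]:
    """Top-k routing with renormalization and scaling, mirroring the
    config (norm_topk_prob=True, routing scale 2.5). Illustrative only."""
    # Softmax over all routed experts (numerically stabilized).
    m = max(logits)
    probs = [math.exp(x - m) for x in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Select the top-k experts by probability.
    topk = sorted(range(N_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # Renormalize over the selected experts, then apply the routing scale.
    sel = sum(probs[i] for i in topk)
    return {i: SCALE * probs[i] / sel for i in topk}

random.seed(0)
weights = route([random.gauss(0, 1) for _ in range(N_EXPERTS)])
```

The renormalized weights of the 8 chosen experts always sum to the 2.5 scale factor; the shared expert's output is added unconditionally.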

Speculative Decoding

Next-N Prediction Layers 1

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID Not set
Pad Token ID 151,329
EOS Token IDs 151,329; 151,336; 151,338

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding