zai-org/GLM-4.7-Flash

📊 Model Parameters

Total Parameters 29,943,390,976
Context Length 202,752
Hidden Size 2048
Layers 47
Attention Heads 20
KV Heads 20

💾 Memory Requirements

FP32 (Full) 111.55 GB
FP16 (Half) 55.77 GB
INT8 (Quantized) 27.89 GB
INT4 (Quantized) 13.94 GB
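
The weight-memory figures above follow directly from the parameter count: bytes per parameter times total parameters, reported in GiB (1024³ bytes). A minimal sketch of that arithmetic in Python; the INT8/INT4 rows assume pure weight quantization with no overhead for scales or zero-points:

```python
# Rough weight-memory estimate from the parameter count above.
# Assumes uniform precision across all weights and ignores
# quantization overhead (scales, zero-points, outlier tensors).
TOTAL_PARAMS = 29_943_390_976

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

GIB = 1024 ** 3

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {TOTAL_PARAMS * nbytes / GIB:.2f} GB")
# FP32: 111.55 GB, FP16: 55.77 GB, INT8: 27.89 GB, INT4: 13.94 GB
```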

🔑 KV Cache (Inference)

Per Token (FP16) 240.64 KB
Max Context FP32 90.88 GB
Max Context FP16 45.44 GB
Max Context INT8 22.72 GB
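
The per-token figure matches a standard per-head KV cache: 2 (K and V) × layers × KV heads × head dimension × bytes per element. A small sketch of that arithmetic, assuming the per-token value is reported in decimal KB and the full-context totals in GiB, which is how the numbers above appear to be stated:

```python
# Per-token KV cache if K and V are stored per head at full width,
# which is what the 240.64 KB figure above corresponds to.
LAYERS, KV_HEADS, HEAD_DIM = 47, 20, 64
MAX_CONTEXT = 202_752

def kv_cache_bytes_per_token(bytes_per_elem: float) -> float:
    # The factor of 2 accounts for both the K and the V tensor.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem

per_token_fp16 = kv_cache_bytes_per_token(2)        # 240,640 bytes
full_context_fp16 = per_token_fp16 * MAX_CONTEXT    # bytes at 202,752 tokens

print(per_token_fp16 / 1000)         # 240.64 (KB per token)
print(full_context_fp16 / 1024**3)   # ~45.44 (GB at max context)
```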

⚙️ Model Configuration

Core Architecture

Vocabulary Size 154,880
Hidden Size 2,048
FFN Intermediate Size 10,240
Number of Layers 47
Attention Heads 20
Head Dimension 64
KV Heads 20

Context & Position

Max Context Length 202,752

Attention Configuration

Attention Bias No
Attention Dropout 0%
Tied Embeddings No

Multi-Head Latent Attention

KV LoRA Rank 512
Query LoRA Rank 768
QK RoPE Head Dimension 64
Value Head Dimension 256
QK Non-RoPE Head Dimension 192
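
With DeepSeek-style multi-head latent attention, inference can cache the compressed KV latent (the KV LoRA rank) plus the decoupled RoPE key dimension per layer instead of full per-head K/V. The sketch below estimates that per-token footprint under that assumption; the figures in the KV-cache section above correspond to the uncompressed per-head layout, so this is an illustration, not a value from the listing:

```python
# Per-token cache size if the MLA compressed latent is cached instead of
# full per-head K/V (DeepSeek-V3-style caching). This is an assumption
# about serving, not a figure taken from the listing above.
LAYERS = 47
KV_LORA_RANK = 512        # compressed KV latent per layer
QK_ROPE_HEAD_DIM = 64     # decoupled RoPE key, shared across heads

per_token_fp16 = LAYERS * (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * 2  # bytes
print(per_token_fp16 / 1000)   # ~54.14 KB vs 240.64 KB for per-head K/V
```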

Mixture of Experts

Expert FFN Size 1,536
Shared Experts 1
Number of Experts 64
Routing Scale Factor 1.8
Expert Groups 1
Groups per Token 1
Experts per Token 4
Normalize TopK Probabilities Yes
TopK Method noaux_tc
Dense Initial Layers 1
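
These routing fields determine how much of the network is active per token: each token goes through the 1 shared expert plus its top 4 of 64 routed experts, with the first layer kept dense. A rough sketch of the active expert parameters per MoE layer, assuming each expert is a SwiGLU-style FFN with gate, up, and down projections (that 3× factor is an assumption, not stated in the listing):

```python
# Rough active-expert parameter count per MoE layer. Assumes each expert
# is a SwiGLU FFN (gate + up + down projections); the 3x factor and the
# exact expert structure are assumptions.
HIDDEN_SIZE = 2048
EXPERT_FFN_SIZE = 1536
EXPERTS_PER_TOKEN = 4
SHARED_EXPERTS = 1

params_per_expert = 3 * HIDDEN_SIZE * EXPERT_FFN_SIZE
active_expert_params = (EXPERTS_PER_TOKEN + SHARED_EXPERTS) * params_per_expert
print(f"{params_per_expert:,} params per expert")        # 9,437,184
print(f"{active_expert_params:,} active per MoE layer")  # 47,185,920
```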

Speculative Decoding

Next-N Prediction Layers 1

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
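
These two entries correspond to the standard SiLU activation and RMSNorm definitions. A minimal NumPy sketch of both, using the listed epsilon:

```python
import numpy as np

def silu(x: np.ndarray) -> np.ndarray:
    # SiLU (swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-05) -> np.ndarray:
    # RMSNorm: scale by the inverse root-mean-square over the last dimension.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

x = np.array([[-1.0, 0.0, 2.0]])
print(silu(x))
print(rms_norm(x, np.ones(3)))
```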

Special Tokens

Pad Token ID 154,820
BOS Token ID 0
EOS Token IDs 154820, 154827, 154829

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding
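
Most of the values in this listing come straight from the model's configuration file. A hedged sketch of reading a few of them with Hugging Face transformers; the repo id is taken from the title above, and the attribute names shown are the standard ones, while the MoE/MLA fields likely follow DeepSeek-style naming, which is an assumption:

```python
from transformers import AutoConfig

# Repo id taken from the title above; trust_remote_code may be required
# for custom GLM architectures.
config = AutoConfig.from_pretrained("zai-org/GLM-4.7-Flash", trust_remote_code=True)

print(config.hidden_size)        # 2048
print(config.num_hidden_layers)  # 47
print(config.torch_dtype)        # bfloat16
```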