Model Architecture: Qwen/Qwen2.5-32B

📊 Model Parameters

Total Parameters 32,763,876,352

Context Length 131,072

Hidden Size 5120

Layers 64

Attention Heads 40

KV Heads 8

Weights FP32 (Full) 122.05 GiB

Weights FP16 (Half) 61.03 GiB

Weights INT8 (Quantized) 30.51 GiB

Weights INT4 (Quantized) 15.26 GiB
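
The weight sizes above are the parameter count multiplied by the bytes per parameter, expressed in binary gigabytes (1024^3 bytes). A minimal sketch of that arithmetic follows. The names TOTAL_PARAMS, BYTES_PER_PARAM, and weight_memory_gib are illustrative and not from the source, and the figures cover raw weight storage only, assuming no quantization scales or runtime overhead.

```python
# Raw weight storage = parameter count x bytes per parameter.
TOTAL_PARAMS = 32_763_876_352  # "Total Parameters" from the listing
GIB = 1024 ** 3                # binary gigabyte, which is what the listed sizes match

# Bytes per parameter for each precision (INT4 packs two parameters per byte).
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gib(params: int, bytes_per_param: float) -> float:
    """Return raw weight storage in GiB (assumption: no scales, no overhead)."""
    return params * bytes_per_param / GIB


weights_gib = {
    name: weight_memory_gib(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, size in weights_gib.items():
    print(f"{name}: {size:.2f} GiB")
```

Real quantized checkpoints are usually somewhat larger than the INT8 and INT4 figures, since they also store per-group scales and often keep some tensors in higher precision.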

KV Cache Per Token (FP16) 262.14 KB (262,144 bytes)

KV Cache at Max Context FP32 64.00 GiB

KV Cache at Max Context FP16 32.00 GiB

KV Cache at Max Context INT8 16.00 GiB
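
The per-token and max-context figures above are consistent with a key/value cache for a single sequence under grouped-query attention. A rough sketch, assuming the head dimension equals hidden size divided by attention heads (5120 / 40 = 128) and that each layer caches one key and one value vector per KV head. The constant and function names are illustrative.

```python
# Values taken from the listing.
HIDDEN_SIZE = 5120
NUM_LAYERS = 64
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8
MAX_CONTEXT = 131_072

# Assumption: head_dim = hidden_size / num_attention_heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128


def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of KV cache added per token, for one sequence."""
    # 2 tensors (K and V) per layer, each holding num_kv_heads * head_dim values.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


per_token_fp16 = kv_bytes_per_token(2)  # 262,144 bytes = 256 KiB

# Cache size for one sequence filling the whole context window, in GiB.
max_context_gib = {
    name: kv_bytes_per_token(size) * MAX_CONTEXT / 1024 ** 3
    for name, size in {"FP32": 4, "FP16": 2, "INT8": 1}.items()
}

print(per_token_fp16)
print(max_context_gib)
```

With 8 KV heads instead of 40, the cache is one fifth of what full multi-head attention would need at the same hidden size. The totals scale linearly with batch size.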

Vocabulary Size 152,064

Hidden Size 5,120

FFN Intermediate Size 27,648

Number of Layers 64

Attention Heads 40

KV Heads 8

Max Context Length 131,072

Uses Sliding Window No

Sliding Window Size Not set

Window Attention Layers 64

Layer Attention Types [64 items]

Attention Dropout 0%

Tied Embeddings No

Activation Function silu

RMSNorm Epsilon 1e-05

Pad Token ID Not set

BOS Token ID 151643

EOS Token ID 151643

Model Dtype bfloat16
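
The total parameter count can be reproduced from the configuration values alone. The sketch below assumes a Qwen2-style decoder layout: bias terms on the query, key, and value projections only, a gated FFN with three weight matrices, two RMSNorm weight vectors per layer plus a final norm, and a separate output head because embeddings are not tied. The dictionary keys follow Hugging Face config naming and are an assumption, as is the helper name count_parameters.

```python
# Configuration values from the listing (key names are an assumption).
config = {
    "vocab_size": 152_064,
    "hidden_size": 5_120,
    "intermediate_size": 27_648,
    "num_hidden_layers": 64,
    "num_attention_heads": 40,
    "num_key_value_heads": 8,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    """Count parameters under the assumed Qwen2-style layer layout."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim

    # Input embedding, plus a separate output head when embeddings are untied.
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h

    # Attention: Q (with bias), K and V (with bias), output projection (no bias).
    attn = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h
    # Gated FFN: gate, up, and down projections, no biases.
    mlp = 3 * h * cfg["intermediate_size"]
    # Two RMSNorm weight vectors per layer.
    norms = 2 * h

    per_layer = attn + mlp + norms
    final_norm = h
    return embed + lm_head + cfg["num_hidden_layers"] * per_layer + final_norm


total = count_parameters(config)
print(f"{total:,}")  # 32,763,876,352
```

Under these assumptions the result matches the listed Total Parameters exactly. The FFN accounts for roughly 83% of the total, and the two vocabulary matrices for about 5%.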

Layer Types:

Attention

MLP/FFN

Normalization

Embedding