Model Architecture: Qwen/Qwen1.5-72B

📊 Model Parameters

Total Parameters 72,287,920,128

Context Length 32,768

Hidden Size 8,192

Layers 80

Attention Heads 64

KV Heads 64

Model Weights Memory:

FP32 (Full) 269.29 GB

FP16 (Half) 134.65 GB

INT8 (Quantized) 67.32 GB

INT4 (Quantized) 33.66 GB
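
The four precision figures above follow from the total parameter count multiplied by the bytes stored per parameter. The sketch below reproduces them under two assumptions that the listing does not state: that "GB" means 2^30 bytes, and that the quantized formats carry no extra overhead for scales or zero points. The names `TOTAL_PARAMS`, `BYTES_PER_PARAM`, and `weight_memory_gib` are illustrative only.

```python
# Total parameter count as listed for Qwen/Qwen1.5-72B.
TOTAL_PARAMS = 72_287_920_128

# Bytes per parameter for each precision.
# Assumption: quantized formats store no extra scale or zero-point data.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

GIB = 2**30  # Assumption: the listing's "GB" is 2^30 bytes.


def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in units of 2^30 bytes."""
    return n_params * bytes_per_param / GIB


for precision, width in BYTES_PER_PARAM.items():
    print(f"{precision}: {weight_memory_gib(TOTAL_PARAMS, width):.2f} GB")
```

These figures cover the weights only. Activations, the KV cache, and framework overhead come on top.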

KV Cache Memory:

Per Token (FP16) 2.62 MB

Max Context FP32 160.00 GB

Max Context FP16 80.00 GB

Max Context INT8 40.00 GB
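
The per-token and max-context figures above match the size of the key/value cache: one key vector and one value vector per KV head, per layer, per token. The sketch below is a rough reconstruction, assuming the head dimension is the hidden size divided by the number of attention heads (8192 / 64 = 128). The arithmetic suggests the per-token figure is expressed in decimal megabytes (10^6 bytes), while the max-context figures use 2^30 bytes. Function and constant names are illustrative.

```python
# Values taken from the listing.
NUM_LAYERS = 80
NUM_ATTENTION_HEADS = 64
NUM_KV_HEADS = 64
HIDDEN_SIZE = 8192
MAX_CONTEXT = 32_768

# Assumption: head dimension = hidden size / attention heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of KV cache added by one token (keys and values, all layers)."""
    keys_and_values = 2
    return keys_and_values * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_cache_gib_at_max_context(bytes_per_value: int) -> float:
    """KV cache size for a full context window, in units of 2^30 bytes."""
    return kv_cache_bytes_per_token(bytes_per_value) * MAX_CONTEXT / 2**30


per_token_fp16 = kv_cache_bytes_per_token(2)
print(f"Per token (FP16): {per_token_fp16 / 1e6:.2f} MB")
for name, width in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
    print(f"Max context {name}: {kv_cache_gib_at_max_context(width):.2f} GB")
```

Because the KV head count equals the attention head count (64 and 64), the cache is not reduced by grouped-query attention, which is why it grows to tens of gigabytes for a single full-length sequence.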

Vocabulary Size 152,064

Hidden Size 8,192

FFN Intermediate Size 24,576

Number of Layers 80

Attention Heads 64

KV Heads 64

Max Context Length 32,768

Uses Sliding Window No

Sliding Window Size Not set

Window Attention Layers 28

Layer Attention Types [80 items]

Attention Dropout 0%

Tied Embeddings No

Activation Function silu

RMSNorm Epsilon 1e-05

Pad Token ID Not set

BOS Token ID 151643

EOS Token ID 151643

Model Dtype bfloat16

Layer Types:

Attention

MLP/FFN

Normalization

Embedding
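
The four layer types above can be tied back to the total parameter count. The sketch below is one possible breakdown, and it rests on assumptions the listing does not state: the query, key, and value projections carry bias terms while the output projection does not; the MLP is a gated block with three bias-free matrices; each layer has two RMSNorm weight vectors plus one final norm; and the output head is a separate matrix (consistent with "Tied Embeddings No"). Under these assumptions the sum reproduces the listed total of 72,287,920,128. The function name `count_parameters` is illustrative.

```python
# Values taken from the listing.
VOCAB_SIZE = 152_064
HIDDEN_SIZE = 8_192
FFN_SIZE = 24_576
NUM_LAYERS = 80


def count_parameters(vocab: int, hidden: int, ffn: int, layers: int,
                     tied_embeddings: bool = False) -> dict:
    """Return an assumed per-type parameter breakdown for a decoder-only model."""
    # Input embedding, plus a separate output head when embeddings are untied.
    embedding = vocab * hidden
    if not tied_embeddings:
        embedding += vocab * hidden

    # Assumption: Q, K, V are hidden x hidden with bias (KV heads == heads),
    # and the output projection is hidden x hidden without bias.
    attention_per_layer = 3 * (hidden * hidden + hidden) + hidden * hidden

    # Assumption: gated MLP with gate, up, and down matrices, no biases.
    mlp_per_layer = 3 * hidden * ffn

    # Assumption: two RMSNorm weight vectors per layer, one final norm.
    normalization = layers * 2 * hidden + hidden

    return {
        "Attention": layers * attention_per_layer,
        "MLP/FFN": layers * mlp_per_layer,
        "Normalization": normalization,
        "Embedding": embedding,
    }


breakdown = count_parameters(VOCAB_SIZE, HIDDEN_SIZE, FFN_SIZE, NUM_LAYERS)
for layer_type, n_params in breakdown.items():
    print(f"{layer_type}: {n_params:,}")
print(f"Total: {sum(breakdown.values()):,}")
```

In this breakdown the MLP/FFN blocks account for roughly two thirds of the parameters, with attention making up most of the remainder.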