Model Architecture: Qwen/Qwen1.5-0.5B

📊 Model Parameters

Total Parameters 619,570,176

Context Length 32,768

Hidden Size 1024

Layers 24

Attention Heads 16

KV Heads 16

Weights FP32 (Full) 2.31 GB

Weights FP16 (Half) 1.15 GB

Weights INT8 (Quantized) 590.9 MB

Weights INT4 (Quantized) 295.4 MB
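The weight-memory figures follow from the parameter count multiplied by the bytes stored per parameter. The sketch below is one way to reproduce them. It assumes raw weight storage only (no quantization scales, activations, or runtime overhead), and that the page reports binary units (1 GB = 2^30 bytes, 1 MB = 2^20 bytes), which is the convention that appears to match the listed values. The function and constant names are illustrative.

```python
# Parameter count copied from this page.
TOTAL_PARAMS = 619_570_176

# Bytes stored per parameter. INT4 packs two parameters into one byte.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_bytes(num_params: int, precision: str) -> float:
    """Raw weight storage in bytes (assumption: no scales or overhead)."""
    return num_params * BYTES_PER_PARAM[precision]


def gib(num_bytes: float) -> float:
    """Convert bytes to binary gigabytes (2**30 bytes)."""
    return num_bytes / 2**30


def mib(num_bytes: float) -> float:
    """Convert bytes to binary megabytes (2**20 bytes)."""
    return num_bytes / 2**20


if __name__ == "__main__":
    for precision in ("FP32", "FP16"):
        size = gib(weight_bytes(TOTAL_PARAMS, precision))
        print(f"{precision}: {size:.2f} GB")
    for precision in ("INT8", "INT4"):
        size = mib(weight_bytes(TOTAL_PARAMS, precision))
        print(f"{precision}: {size:.1f} MB")
```

Real quantized checkpoints are usually somewhat larger than the INT8 and INT4 estimates, because they also store per-group scales and often keep some tensors in higher precision.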

KV Cache Per Token (FP16) 98.30 KB

KV Cache at Max Context (FP32) 6.00 GB

KV Cache at Max Context (FP16) 3.00 GB

KV Cache at Max Context (INT8) 1.50 GB
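The per-token and max-context figures are consistent with the usual key/value cache estimate: one key vector and one value vector per KV head, per layer, per token. The sketch below assumes a head dimension of hidden size divided by attention heads (64 here), a single sequence, and no allocator or paging overhead. Under those assumptions the per-token value matches the page in decimal kilobytes (98,304 bytes), while the max-context values match in binary gigabytes. The names are illustrative.

```python
# Values copied from this page.
NUM_LAYERS = 24
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 16
HIDDEN_SIZE = 1024
MAX_CONTEXT = 32_768

# Assumption: head dimension is hidden size split evenly across heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_bytes_per_token(bytes_per_element: int) -> int:
    """Bytes of KV cache added per token: keys and values for every layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_element


def kv_bytes_at_context(num_tokens: int, bytes_per_element: int) -> int:
    """Bytes of KV cache for one sequence of num_tokens tokens."""
    return num_tokens * kv_bytes_per_token(bytes_per_element)


if __name__ == "__main__":
    print(f"Per token (FP16): {kv_bytes_per_token(2) / 1000:.2f} KB")
    for name, width in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
        total = kv_bytes_at_context(MAX_CONTEXT, width)
        print(f"Max context {name}: {total / 2**30:.2f} GB")
```

Because KV heads equal attention heads (16 and 16), the model does not use grouped-query attention, so the cache is not reduced relative to standard multi-head attention.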

Vocabulary Size 151,936

Hidden Size 1,024

FFN Intermediate Size 2,816

Number of Layers 24

Attention Heads 16

KV Heads 16

Max Context Length 32,768

Uses Sliding Window No

Sliding Window Size Not set

Window Attention Layers 21

Layer Attention Types [24 items]

Attention Dropout 0%

Tied Embeddings Yes

Activation Function silu

RMSNorm Epsilon 1e-06

Pad Token ID Not set

BOS Token ID 151643

EOS Token ID 151643

Model Dtype bfloat16
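The listed total of 619,570,176 parameters can be cross-checked against the configuration values. The sketch below assumes a Qwen2-style decoder layout, which is not stated on this page: query, key, and value projections with bias, an output projection without bias, a gated MLP with three bias-free matrices, two RMSNorm weights per layer, and one final RMSNorm. Under these assumptions the listed total is reproduced only when the input embedding and the output head are counted as separate matrices. Counting them once, as "Tied Embeddings Yes" would suggest, gives 463,987,712. All names are illustrative.

```python
# Values copied from this page.
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 1024
FFN_SIZE = 2816
NUM_LAYERS = 24
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 16


def count_parameters(count_output_head: bool) -> int:
    """Parameter count under an assumed Qwen2-style decoder layout."""
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS
    kv_size = NUM_KV_HEADS * head_dim

    # Attention: q/k/v projections with bias (assumption), o without bias.
    q_proj = HIDDEN_SIZE * HIDDEN_SIZE + HIDDEN_SIZE
    k_proj = HIDDEN_SIZE * kv_size + kv_size
    v_proj = HIDDEN_SIZE * kv_size + kv_size
    o_proj = HIDDEN_SIZE * HIDDEN_SIZE
    attention = q_proj + k_proj + v_proj + o_proj

    # Gated MLP: gate, up, and down matrices, no bias (assumption).
    mlp = 3 * HIDDEN_SIZE * FFN_SIZE

    # Two RMSNorm weight vectors per layer.
    norms = 2 * HIDDEN_SIZE

    embedding = VOCAB_SIZE * HIDDEN_SIZE
    total = embedding + NUM_LAYERS * (attention + mlp + norms) + HIDDEN_SIZE
    if count_output_head:
        total += VOCAB_SIZE * HIDDEN_SIZE
    return total


if __name__ == "__main__":
    print(f"Output head counted separately: {count_parameters(True):,}")
    print(f"Output head shared with embedding: {count_parameters(False):,}")
```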

Layer Types: Attention, MLP/FFN, Normalization, Embedding