Qwen/Qwen2-0.5B

📊 Model Parameters

Total Parameters 630,167,424
Context Length 131,072
Hidden Size 896
Layers 24
Attention Heads 14
KV Heads 2
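
The total above can be reproduced from the architecture fields in this card. A minimal sketch, assuming the standard Qwen2 layout (grouped-query attention with Q/K/V biases, gated SwiGLU MLP, two RMSNorms per layer) and counting the tied LM head as a separate matrix, which is evidently how this total is derived (unique weights are ~494M since the head shares the embedding):

```python
# Hypothetical reconstruction of the parameter count from the config values.
vocab, hidden, ffn, layers, heads, kv_heads = 151_936, 896, 4_864, 24, 14, 2
head_dim = hidden // heads  # 64

embed = vocab * hidden                               # token embedding (136,134,656)
q = hidden * hidden + hidden                         # q_proj weight + bias
kv = 2 * (hidden + 1) * kv_heads * head_dim          # k_proj and v_proj, each with bias
o = hidden * hidden                                  # o_proj (no bias in Qwen2)
mlp = 3 * hidden * ffn                               # gate/up/down projections
norms = 2 * hidden                                   # two RMSNorms per layer
per_layer = q + kv + o + mlp + norms                 # 14,912,384

total = embed + layers * per_layer + hidden + embed  # + final norm + LM head
print(f"{total:,}")                                  # 630,167,424
```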

💾 Memory Requirements

FP32 (Full) 2.35 GB
FP16 (Half) 1.17 GB
INT8 (Quantized) 601.0 MB
INT4 (Quantized) 300.5 MB
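
These figures cover weights only (no activations or KV cache) and use binary units (GiB/MiB) despite the GB/MB labels. A minimal sketch of the arithmetic:

```python
# Weight memory = parameter count × bytes per parameter.
params = 630_167_424
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    size = params * bits / 8                     # bytes
    if size >= 2**30:
        print(f"{name}: {size / 2**30:.2f} GB")  # 2.35 GB, 1.17 GB
    else:
        print(f"{name}: {size / 2**20:.1f} MB")  # 601.0 MB, 300.5 MB
```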

🔑 KV Cache (Inference)

Per Token (FP16) 12.29 KB
Max Context FP32 3.00 GB
Max Context FP16 1.50 GB
Max Context INT8 768.0 MB
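
Per-token cache size follows from 2 (K and V) × layers × KV heads × head dim × bytes per value; with only 2 KV heads (GQA), it stays small. A sketch matching the figures above (note the per-token figure uses decimal KB while the totals use binary units):

```python
# KV cache arithmetic for this config (head_dim = 896 / 14 = 64).
layers, kv_heads, head_dim, max_ctx = 24, 2, 64, 131_072
for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    per_token = 2 * layers * kv_heads * head_dim * nbytes  # bytes per token
    total = per_token * max_ctx                            # bytes at full context
    print(f"{name}: {per_token / 1e3:.2f} KB/token, "
          f"{total / 2**30:.2f} GB at max context")
# FP16 → 12.29 KB/token, 1.50 GB; FP32 → 3.00 GB; INT8 → 0.75 GB (768 MB)
```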

⚙️ Model Configuration

Core Architecture

Vocabulary Size 151,936
Hidden Size 896
FFN Intermediate Size 4,864
Number of Layers 24
Attention Heads 14
KV Heads 2

Context & Position

Max Context Length 131,072
Uses Sliding Window No
Sliding Window Size Not set
Window Attention Layers 24
RoPE Base Frequency 1,000,000 (see sketch below)
RoPE Scaling Not set
Layer Attention Types full attention × 24 (sliding window disabled, so every layer attends over the full context)
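
The high RoPE base (1,000,000 vs. the original 10,000) stretches the slowest rotary frequencies, which is what keeps positions distinguishable out to 131,072 tokens without any RoPE scaling. A sketch of the standard formula, `inv_freq[i] = base ** (-2i / head_dim)`:

```python
import math

# Rotary inverse frequencies for base 1e6 and head_dim 64 (standard RoPE).
base, head_dim = 1_000_000.0, 64
inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Wavelength of the slowest component, in tokens: comfortably beyond the
# 131,072-token context, so no position interpolation is needed.
print(f"longest wavelength ≈ {2 * math.pi / inv_freq[-1]:,.0f} tokens")  # ≈ 4,080,555
```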

Attention Configuration

Attention Dropout 0%
Tied Embeddings Yes

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06
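
Both operations are simple elementwise/row-wise transforms. A minimal PyTorch sketch (assumed framework, not taken from the model's own code): SiLU is `x * sigmoid(x)`, and RMSNorm rescales by the root mean square using the epsilon above:

```python
import torch

def silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x * torch.sigmoid(x)

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Divide by the root mean square over the hidden dim, then scale.
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

x = torch.randn(1, 4, 896)                 # (batch, seq, hidden)
print(rms_norm(x, torch.ones(896)).shape)  # torch.Size([1, 4, 896])
```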

Special Tokens

BOS Token ID 151,643
Pad Token ID Not set
EOS Token ID 151,643

Data Type

Model Dtype bfloat16
Layer Types (diagram legend): Attention, MLP/FFN, Normalization, Embedding
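
The configuration fields above can be cross-checked against the published config.json with Hugging Face transformers (requires network access on first run; the values printed are whatever the repo currently publishes):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")  # fetches config.json only
print(cfg.vocab_size, cfg.hidden_size, cfg.intermediate_size)                   # 151936 896 4864
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.num_key_value_heads)  # 24 14 2
print(cfg.rope_theta, cfg.max_position_embeddings, cfg.tie_word_embeddings)     # 1000000.0 131072 True
```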