Model Architecture: Qwen/Qwen1.5-14B

📊 Model Parameters

Total Parameters 14,167,290,880

Context Length 32,768

Hidden Size 5120

Layers 40

Attention Heads 40

KV Heads 40

Weights FP32 (Full) 52.78 GiB

Weights FP16 (Half) 26.39 GiB

Weights INT8 (Quantized) 13.19 GiB

Weights INT4 (Quantized) 6.60 GiB
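The weight-memory figures above follow from the parameter count multiplied by the storage width of each format, expressed in binary gigabytes (1024^3 bytes). The following is a minimal sketch of that arithmetic; the names `weight_memory_gib` and `BYTES_PER_PARAM` are illustrative, and the estimate ignores runtime overhead such as activations, quantization scales, and framework buffers.

```python
# Parameter count as listed above. Bytes per parameter are the usual
# storage widths for each format (INT4 packs two parameters per byte).
TOTAL_PARAMS = 14_167_290_880
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}
GIB = 1024 ** 3  # binary gigabyte


def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Return the raw weight footprint in GiB, ignoring runtime overhead."""
    return n_params * bytes_per_param / GIB


weight_gib = {
    name: weight_memory_gib(TOTAL_PARAMS, width)
    for name, width in BYTES_PER_PARAM.items()
}

for name, gib in weight_gib.items():
    print(f"{name}: {gib:.2f} GiB")
```

In decimal gigabytes (10^9 bytes) the same FP32 footprint would be about 56.67 GB, so it matters which unit a hardware budget is quoted in.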

KV Cache Per Token (FP16) 800.00 KiB (819,200 bytes)

KV Cache Max Context FP32 50.00 GiB

KV Cache Max Context FP16 25.00 GiB

KV Cache Max Context INT8 12.50 GiB
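The per-token and max-context figures above are consistent with a key/value cache that stores one key tensor and one value tensor per layer. The sketch below reproduces them under two assumptions not stated on this page: the head dimension equals hidden size divided by attention heads (128), and the cache covers a single sequence (batch size 1). The function names are illustrative.

```python
# Values taken from the listing above.
HIDDEN_SIZE = 5120
NUM_LAYERS = 40
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 40
MAX_CONTEXT = 32_768

# Assumption: head dimension = hidden size / attention heads (128 here).
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of cache added per token: K and V for every layer and KV head."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_cache_gib(context_len: int, bytes_per_value: int) -> float:
    """Cache size in GiB for one sequence of the given length."""
    return kv_bytes_per_token(bytes_per_value) * context_len / 1024 ** 3


per_token_fp16 = kv_bytes_per_token(2)
print(f"Per token (FP16): {per_token_fp16:,} bytes")
for name, width in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
    print(f"Max context {name}: {kv_cache_gib(MAX_CONTEXT, width):.2f} GiB")
```

Because the KV head count equals the attention head count, every head keeps its own keys and values. At full context the FP16 cache (25.00) is therefore almost as large as the FP16 weights (26.39).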

Vocabulary Size 152,064

Hidden Size 5,120

FFN Intermediate Size 13,696

Number of Layers 40

Attention Heads 40

KV Heads 40

Max Context Length 32,768

Uses Sliding Window No

Sliding Window Size Not set

Window Attention Layers 35

Layer Attention Types [40 items]

Attention Dropout 0%

Tied Embeddings No

Activation Function silu

RMSNorm Epsilon 1e-06

Pad Token ID Not set

BOS Token ID 151,643

EOS Token ID 151,643

Model Dtype bfloat16

Layer Types:

Attention

MLP/FFN

Normalization

Embedding
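The total parameter count can be cross-checked against the configuration values by summing the four layer types listed above. The sketch below is one way to do that. It assumes a layout this page does not state explicitly: bias terms on the query, key, and value projections only, a gated MLP with three bias-free matrices, and two RMSNorm weight vectors per layer plus one final norm. Under those assumptions the sum matches the listed total exactly.

```python
# Values taken from the listing above.
VOCAB_SIZE = 152_064
HIDDEN_SIZE = 5_120
FFN_SIZE = 13_696
NUM_LAYERS = 40
TIED_EMBEDDINGS = False


def count_parameters() -> dict:
    """Return a per-layer-type parameter breakdown (assumed layout)."""
    # Embedding: input table, plus a separate output projection when untied.
    embedding = VOCAB_SIZE * HIDDEN_SIZE * (1 if TIED_EMBEDDINGS else 2)
    # Attention: q, k, v, o are all hidden x hidden because KV heads equal
    # attention heads. Assumption: bias vectors on q, k, v only.
    attention = NUM_LAYERS * (4 * HIDDEN_SIZE * HIDDEN_SIZE + 3 * HIDDEN_SIZE)
    # MLP/FFN: gate, up, and down matrices. Assumption: no bias terms.
    mlp = NUM_LAYERS * 3 * HIDDEN_SIZE * FFN_SIZE
    # Normalization: two RMSNorm vectors per layer plus one final norm.
    normalization = (2 * NUM_LAYERS + 1) * HIDDEN_SIZE
    return {
        "Embedding": embedding,
        "Attention": attention,
        "MLP/FFN": mlp,
        "Normalization": normalization,
    }


counts = count_parameters()
total = sum(counts.values())

for layer_type, n in counts.items():
    print(f"{layer_type}: {n:,} ({n / total:.1%})")
print(f"Total: {total:,}")
```

Under these assumptions the MLP/FFN matrices hold roughly 59% of the parameters, attention about 30%, the embeddings about 11%, and normalization a negligible share.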