Model Architecture: Qwen/Qwen1.5-7B

📊 Model Parameters

Total Parameters 7,721,324,544

Context Length 32,768

Hidden Size 4,096

Layers 32

Attention Heads 32

KV Heads 32

FP32 (Full) 28.76 GB

FP16 (Half) 14.38 GB

INT8 (Quantized) 7.19 GB

INT4 (Quantized) 3.60 GB
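
The four memory figures above appear to follow from the total parameter count multiplied by the storage width of each precision. A minimal sketch of that arithmetic, assuming the reported "GB" values are binary gigabytes (2^30 bytes) and that INT4 packs two parameters per byte; the function and constant names are illustrative and not part of the source:

```python
# Estimate weight memory from the parameter count listed above.
# Assumption: "GB" in the listing means GiB (2**30 bytes).
GIB = 1024 ** 3
TOTAL_PARAMS = 7_721_324_544

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, width):.2f} GiB")
```

These estimates cover the weights only; activations, the KV cache, and any quantization metadata such as scales and zero points are extra.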

KV Cache Per Token (FP16) 524.29 KB

KV Cache at Max Context (FP32) 32.00 GB

KV Cache at Max Context (FP16) 16.00 GB

KV Cache at Max Context (INT8) 8.00 GB
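
The per-token and max-context figures above are consistent with a key/value cache estimate: one key and one value vector per layer per KV head for every token. A minimal sketch, assuming the head dimension is the hidden size divided by the number of attention heads (4096 / 32 = 128); the function name is hypothetical. Note that the per-token figure matches decimal kilobytes (524,288 bytes), while the max-context figures match binary gigabytes.

```python
# Estimate KV cache size from the architecture values listed in this document.
NUM_LAYERS = 32
NUM_KV_HEADS = 32
HIDDEN_SIZE = 4096
NUM_ATTENTION_HEADS = 32
MAX_CONTEXT = 32_768

# Assumption: head_dim = hidden_size / attention_heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS


def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int) -> int:
    """Bytes of cache per token: keys plus values, across all layers."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    for name, width in {"FP32": 4, "FP16": 2, "INT8": 1}.items():
        per_token = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, width)
        total_gib = per_token * MAX_CONTEXT / 1024 ** 3
        print(f"{name}: {per_token:,} bytes/token, {total_gib:.2f} GiB at max context")
```

Because KV Heads equals Attention Heads here, the cache gets no reduction from grouped-query attention; a model with fewer KV heads would shrink these numbers proportionally.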

Vocabulary Size 151,936

Hidden Size 4,096

FFN Intermediate Size 11,008

Number of Layers 32

Attention Heads 32

KV Heads 32

Max Context Length 32,768

Uses Sliding Window No

Sliding Window Size Not set

Window Attention Layers 28

Layer Attention Types [32 items]

Attention Dropout 0%

Tied Embeddings No

Activation Function silu

RMSNorm Epsilon 1e-06

Pad Token ID Not set

BOS Token ID 151643

EOS Token ID 151643

Model Dtype bfloat16

Layer Types:

Attention

MLP/FFN

Normalization

Embedding
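
The total parameter count can be cross-checked against the architecture values by summing the four layer types listed above. A minimal sketch under stated assumptions that are not given in the source: the MLP is gated with three bias-free projections (gate, up, down), the Q, K, and V projections carry biases while the output projection does not, each decoder layer has two RMSNorm weight vectors, and there is one final norm. The function name is hypothetical.

```python
# Cross-check the total parameter count against the architecture values.
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 4_096
FFN_SIZE = 11_008
NUM_LAYERS = 32
TIED_EMBEDDINGS = False


def count_parameters(vocab: int, hidden: int, ffn: int, layers: int,
                     tied: bool, qkv_bias: bool = True) -> dict:
    """Return a parameter count per layer type (assumed layout, see lead-in)."""
    # Input embedding, plus a separate output head when embeddings are not tied.
    embedding = vocab * hidden * (1 if tied else 2)
    # Q, K, V, O projections are hidden x hidden because KV heads == attention heads.
    attention = 4 * hidden * hidden + (3 * hidden if qkv_bias else 0)
    # Assumed gated MLP: gate, up, and down projections without biases.
    mlp = 3 * hidden * ffn
    # Two RMSNorm weight vectors per layer, plus one final norm.
    norms = layers * 2 * hidden + hidden
    return {
        "Attention": layers * attention,
        "MLP/FFN": layers * mlp,
        "Normalization": norms,
        "Embedding": embedding,
    }


if __name__ == "__main__":
    breakdown = count_parameters(VOCAB_SIZE, HIDDEN_SIZE, FFN_SIZE,
                                 NUM_LAYERS, TIED_EMBEDDINGS)
    for layer_type, count in breakdown.items():
        print(f"{layer_type}: {count:,}")
    print(f"Total: {sum(breakdown.values()):,}")
```

Under these assumptions the sum reproduces the listed Total Parameters value of 7,721,324,544, with the MLP/FFN blocks accounting for a little over half of the weights.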