mistralai/Ministral-8B-Instruct-2410

📊 Model Parameters

Total Parameters 8,019,808,256
Context Length 32,768
Hidden Size 4096
Layers 36
Attention Heads 32
KV Heads 8
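
The parameter total above can be reproduced from the architecture values listed under Model Configuration below. A minimal sketch of that arithmetic (plain Python, assuming the standard Mistral-style decoder layout: untied input/output embeddings, grouped-query attention projections, a gated SwiGLU FFN, and two RMSNorm weights per layer plus a final one):

```python
# Parameter-count sketch for Ministral-8B from its published config values.
# Assumes the standard Mistral-style decoder layout (untied embeddings,
# GQA attention projections, gated SwiGLU FFN, per-layer RMSNorm weights).

vocab_size   = 131_072
hidden       = 4_096
intermediate = 12_288
layers       = 36
heads        = 32
kv_heads     = 8
head_dim     = 128

embed   = vocab_size * hidden            # input embedding table
lm_head = vocab_size * hidden            # output projection (embeddings not tied)

attn = (hidden * heads * head_dim        # q_proj
        + hidden * kv_heads * head_dim   # k_proj
        + hidden * kv_heads * head_dim   # v_proj
        + heads * head_dim * hidden)     # o_proj

mlp = 3 * hidden * intermediate          # gate_proj, up_proj, down_proj

norms = 2 * hidden                       # pre-attention and pre-MLP RMSNorm weights

per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden + lm_head  # final RMSNorm adds `hidden`

print(f"{total:,}")  # 8,019,808,256
```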

💾 Memory Requirements

FP32 (Full) 29.88 GiB
FP16 (Half) 14.94 GiB
INT8 (Quantized) 7.47 GiB
INT4 (Quantized) 3.73 GiB
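
These figures are raw weight storage only (parameter count × bytes per parameter, reported in binary gibibytes); activations, the KV cache, and framework overhead come on top. A quick check of the arithmetic:

```python
# Weight-memory sketch: bytes per parameter at each precision, converted to GiB.
# Weights only; activations, KV cache, and runtime overhead are not included.

params = 8_019_808_256
bytes_per_param = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for dtype, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{dtype}: {gib:.2f} GiB")
# FP32: 29.88 GiB, FP16: 14.94 GiB, INT8: 7.47 GiB, INT4: 3.73 GiB
```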

🔑 KV Cache (Inference)

Per Token (FP16) 147.46 KB
Max Context FP32 9.00 GiB
Max Context FP16 4.50 GiB
Max Context INT8 2.25 GiB
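
The KV cache scales with context length rather than with weight size: each token stores one key and one value vector per layer, and with 8 KV heads (grouped-query attention) instead of 32, the cache is a quarter of what full multi-head attention would require. A sketch of the arithmetic behind the figures above:

```python
# KV-cache sketch: per-token and full-context cache size for Ministral-8B.
# 2 tensors (K and V) x layers x kv_heads x head_dim elements per token.

layers, kv_heads, head_dim = 36, 8, 128
context = 32_768

per_token_elems = 2 * layers * kv_heads * head_dim  # 73,728 elements

for dtype, nbytes in {"FP32": 4, "FP16": 2, "INT8": 1}.items():
    per_token = per_token_elems * nbytes
    full_ctx  = per_token * context
    print(f"{dtype}: {per_token / 1e3:.2f} KB/token, "
          f"{full_ctx / 1024**3:.2f} GiB at {context:,} tokens")
# FP16: 147.46 KB/token; full context: 9.00 GiB (FP32), 4.50 GiB (FP16), 2.25 GiB (INT8)
```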

⚙️ Model Configuration

Core Architecture

Vocabulary Size 131,072
Hidden Size 4,096
FFN Intermediate Size 12,288
Number of Layers 36
Attention Heads 32
Head Dimension 128
KV Heads 8

Context & Position

Max Context Length 32,768
Sliding Window Size 32,768
RoPE Base Frequency 100,000,000
Layer Attention Types [36 entries, one per layer]
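
The RoPE base here is unusually large (1e8, versus the 10,000 of the original rotary-embedding formulation), which slows how quickly the rotary frequencies decay across the 128-dimensional head and keeps positions distinguishable over the long context. A minimal sketch of how the base enters the standard RoPE inverse frequencies (generic rotary-embedding math, not code from any particular library):

```python
# Standard RoPE inverse frequencies for one attention head:
#   inv_freq[i] = base ** (-2i / head_dim), for i = 0 .. head_dim/2 - 1
# Position p rotates each (even, odd) dimension pair by angle p * inv_freq[i].

base, head_dim = 100_000_000.0, 128

inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

print(inv_freq[0])   # 1.0        -> fastest-rotating dimension pair
print(inv_freq[-1])  # ~1.33e-08  -> slowest-rotating pair (wavelength on the order of `base`)
```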

Attention Configuration

Tied Embeddings No
Attention Dropout 0%

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
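
The SiLU activation is used inside the gated (SwiGLU-style) feed-forward block common to the Mistral family, and RMSNorm replaces LayerNorm with a scale-only normalization. A short PyTorch sketch of both, using the dimensions and epsilon listed above (an illustrative re-implementation, not the model's own code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scale-only normalization: x / rms(x) * weight, with eps = 1e-05."""
    def __init__(self, dim: int = 4096, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFFN(nn.Module):
    """Gated feed-forward: down(silu(gate(x)) * up(x)), 4096 -> 12288 -> 4096."""
    def __init__(self, dim: int = 4096, hidden: int = 12288):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up   = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

x = torch.randn(1, 8, 4096)
print(SwiGLUFFN()(RMSNorm()(x)).shape)  # torch.Size([1, 8, 4096])
```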

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding
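
All of the configuration values above can be read programmatically from the model's config.json. A quick sketch using Hugging Face transformers; the field names are the standard Mistral config keys, and the repository may require accepting Mistral's license on the Hub before the file can be downloaded:

```python
from transformers import AutoConfig

# Reads config.json from the Hub (access to the repo may need to be granted first).
cfg = AutoConfig.from_pretrained("mistralai/Ministral-8B-Instruct-2410")

print(cfg.vocab_size)           # 131072
print(cfg.hidden_size)          # 4096
print(cfg.intermediate_size)    # 12288
print(cfg.num_hidden_layers)    # 36
print(cfg.num_attention_heads)  # 32
print(cfg.num_key_value_heads)  # 8
print(cfg.head_dim)             # 128
print(cfg.rope_theta)           # 100000000.0
print(cfg.rms_norm_eps)         # 1e-05
print(cfg.torch_dtype)          # torch.bfloat16
```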