mistralai/Mistral-Large-Instruct-2411

📊 Model Parameters

Total Parameters 122,610,069,504
Context Length 131,072
Hidden Size 12,288
Layers 88
Attention Heads 96
KV Heads 8
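
As a sanity check, the total parameter count can be reproduced from the configuration listed further down (untied embeddings, no biases, gated SiLU MLP, RMSNorm). A minimal sketch in Python, assuming the standard Mistral/Llama-style layer layout:

    # Architecture figures from this page (Mistral-Large-Instruct-2411)
    vocab, hidden, ffn = 32_768, 12_288, 28_672
    layers, heads, kv_heads, head_dim = 88, 96, 8, 128

    embed   = vocab * hidden                      # input embedding
    lm_head = vocab * hidden                      # output head (embeddings are not tied)
    attn    = (hidden * heads * head_dim * 2      # Wq and Wo projections
             + hidden * kv_heads * head_dim * 2)  # Wk and Wv projections (no biases)
    mlp     = hidden * ffn * 3                    # gate, up, down projections
    norms   = 2 * hidden                          # pre-attention and pre-MLP RMSNorm

    per_layer = attn + mlp + norms
    total = layers * per_layer + embed + lm_head + hidden  # + final RMSNorm

    print(f"{total:,}")  # 122,610,069,504 -- matches the figure above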

💾 Memory Requirements

FP32 (Full) 456.76 GiB
FP16 (Half) 228.38 GiB
INT8 (Quantized) 114.19 GiB
INT4 (Quantized) 57.09 GiB
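
These figures are the parameter count multiplied by the bytes per weight, expressed in GiB (1024³ bytes); quantization overhead such as scales and zero-points is not included. A quick check:

    params = 122_610_069_504
    GiB = 1024 ** 3

    for label, bytes_per_weight in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
        print(f"{label}: {params * bytes_per_weight / GiB:.2f} GiB")
    # FP32: 456.76 GiB, FP16: 228.38 GiB, INT8: 114.19 GiB, INT4: 57.09 GiB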

🔑 KV Cache (Inference)

Per Token (FP16) 360.45 KB
Max Context FP32 88.00 GiB
Max Context FP16 44.00 GiB
Max Context INT8 22.00 GiB
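
The per-token figure follows from the grouped-query attention layout: 2 tensors (K and V) × 88 layers × 8 KV heads × head dimension 128 × 2 bytes (FP16) = 360,448 bytes. A sketch of the full-context totals, assuming a plain dense cache with no paging or compression:

    layers, kv_heads, head_dim, context = 88, 8, 128, 131_072
    GiB = 1024 ** 3

    for label, bytes_per_value in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
        per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
        print(f"{label}: {per_token * context / GiB:.2f} GiB at full context")
    # FP32: 88.00 GiB, FP16: 44.00 GiB, INT8: 22.00 GiB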

⚙️ Model Configuration

Core Architecture

Vocabulary Size 32,768
Hidden Size 12,288
FFN Intermediate Size 28,672
Number of Layers 88
Attention Heads 96
Head Dimension 128
KV Heads 8

Context & Position

Max Context Length 131,072
Sliding Window Size Not set
RoPE Base Frequency 1,000,000.0

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype Not set
Layer Types: Attention, MLP/FFN, Normalization, Embedding
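
For reference, the values above map onto the usual Hugging Face configuration fields roughly as follows. This is a hand-written sketch, not the published config.json; the field names assume the standard MistralConfig schema:

    config = {
        "vocab_size": 32_768,
        "hidden_size": 12_288,
        "intermediate_size": 28_672,
        "num_hidden_layers": 88,
        "num_attention_heads": 96,
        "num_key_value_heads": 8,
        "head_dim": 128,
        "max_position_embeddings": 131_072,
        "sliding_window": None,       # not set
        "rope_theta": 1_000_000.0,
        "attention_dropout": 0.0,
        "tie_word_embeddings": False,
        "hidden_act": "silu",
        "rms_norm_eps": 1e-05,
        "bos_token_id": 1,
        "eos_token_id": 2,
        "pad_token_id": None,         # not set
    }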