Model Architecture: mistralai/Mistral-Large-3-675B-Base-2512

📊 Model Parameters

Total Parameters 722,158,138,880

Context Length 131,072

Hidden Size 4096

Layers 32

Attention Heads 32

KV Heads 8

Weights, FP32 (full precision) 2690.25 GiB

Weights, FP16 (half precision) 1345.12 GiB

Weights, INT8 (quantized) 672.56 GiB

Weights, INT4 (quantized) 336.28 GiB

KV Cache per Token (FP16) 128 KiB (131,072 bytes)

KV Cache at Max Context, FP32 32.00 GiB

KV Cache at Max Context, FP16 16.00 GiB

KV Cache at Max Context, INT8 8.00 GiB
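The weight and KV-cache figures above follow from simple arithmetic. The Python sketch below reproduces them under stated assumptions: 1 GiB = 2^30 bytes, weights-only storage (no activations or quantization scales), and a plain grouped-query KV cache that holds one key and one value vector per layer, per KV head and per token. The helper names are illustrative and not part of any library.

```python
# Memory estimates for the figures listed above (sketch, not an official tool).
# Assumptions: 1 GiB = 2**30 bytes; weights only (no activations or
# quantization scales); plain grouped-query KV cache with one K and one V
# vector per layer, per KV head, per token.

GIB = 2**30
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """GiB needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes one token adds to the KV cache (K and V, every layer and KV head)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_cache_gib(context_len: int, num_layers: int, num_kv_heads: int,
                 head_dim: int, bytes_per_elem: int) -> float:
    """GiB of KV cache for a full sequence of context_len tokens."""
    per_token = kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem)
    return context_len * per_token / GIB


if __name__ == "__main__":
    total_params = 722_158_138_880
    for dtype in BYTES_PER_PARAM:
        print(f"weights {dtype}: {weight_memory_gib(total_params, dtype):.2f} GiB")

    # Summary values: 32 layers, 8 KV heads, head dimension 128, FP16 (2 bytes).
    print(kv_bytes_per_token(32, 8, 128, 2))        # 131072
    print(kv_cache_gib(131_072, 32, 8, 128, 2))     # 16.0
    # Same formula with the 61 layers from the configuration list.
    print(kv_cache_gib(131_072, 61, 8, 128, 2))     # 30.5
```

The KV-cache figures scale linearly with layer count. With the summary's 32 layers, one token costs 131,072 bytes in FP16, and a full 131,072-token context needs 16 GiB. The configuration list gives 61 layers. Plugging that value into the same formula yields 30.5 GiB, still under the plain-GQA assumption.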

Vocabulary Size 131,072

Hidden Size 7,168

FFN Intermediate Size 14,336

Number of Layers 61

Attention Heads 32

Head Dimension 128

KV Heads 8

Max Context Length 131,072

Sliding Window Size 4,096

Attention Dropout 0%

Tied Embeddings No

Activation Function silu

RMSNorm Epsilon 1e-06

Pad Token ID Not set

BOS Token ID 1

EOS Token ID 2

Model Dtype Not set
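The next example makes the attention settings concrete. This hypothetical Python sketch shows how 32 query heads can share 8 KV heads and how a 4,096-token sliding window limits which keys a query can see. It rests on two assumptions: the heads are grouped contiguously, and the window counts the current token. Both conventions vary between implementations, and neither is taken from Mistral's code.

```python
# Hypothetical sketch of grouped-query attention (GQA) head sharing and a
# sliding-window causal mask, using the listed values. Assumptions: query
# heads are grouped contiguously, and the window counts the current token.

NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
SLIDING_WINDOW = 4096

# Projection widths implied by standard GQA (assumption, not from the config).
Q_WIDTH = NUM_HEADS * HEAD_DIM        # 4096
KV_WIDTH = NUM_KV_HEADS * HEAD_DIM    # 1024 for K, and again for V


def kv_head_for(query_head: int, num_heads: int = NUM_HEADS,
                num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Return the KV head that a given query head reads keys/values from."""
    group_size = num_heads // num_kv_heads  # 4 query heads per KV head here
    return query_head // group_size


def can_attend(query_pos: int, key_pos: int, window: int = SLIDING_WINDOW) -> bool:
    """Causal sliding window: see the current token and the window - 1 before it."""
    return 0 <= query_pos - key_pos < window


if __name__ == "__main__":
    print([kv_head_for(h) for h in range(8)])            # [0, 0, 0, 0, 1, 1, 1, 1]
    print(can_attend(5000, 905), can_attend(5000, 904))  # True False
    # Toy mask with window=3 ("x" = allowed); rows are query positions.
    for q in range(5):
        print("".join("x" if can_attend(q, k, window=3) else "." for k in range(5)))
```

The toy mask prints `x....`, `xx...`, `xxx..`, `.xxx.`, `..xxx`. Each row sees at most three positions, ending at itself. With four query heads per KV head, the KV cache stores only 8 heads' worth of keys and values per layer instead of 32.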

Layer Types:

Attention

MLP/FFN

Normalization

Embedding
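The Normalization and MLP/FFN layer types work together with the listed RMSNorm epsilon (1e-06) and silu activation. The pure-Python sketch below applies RMSNorm and a gated SiLU MLP to toy vectors. Two parts are assumptions for illustration: the gated (SwiGLU-style) wiring and the weight names `w_gate`, `w_up` and `w_down`. The listed configuration does not spell out how the MLP is wired.

```python
import math

RMS_NORM_EPS = 1e-6  # "RMSNorm Epsilon" from the list


def rms_norm(x: list[float], weight: list[float], eps: float = RMS_NORM_EPS) -> list[float]:
    """Divide x by its root-mean-square (plus eps), then scale per channel."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU / swish, v * sigmoid(v), written to avoid overflow in math.exp."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Row-major matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def gated_mlp(x: list[float], w_gate: list[list[float]], w_up: list[list[float]],
              w_down: list[list[float]]) -> list[float]:
    """Assumed SwiGLU-style block: w_down @ (silu(w_gate @ x) * (w_up @ x)).
    With the listed sizes, w_gate/w_up would be 14,336 x 7,168 and w_down 7,168 x 14,336."""
    hidden = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, -4.0]
    normed = rms_norm(x, weight=[1.0] * 4)
    print(sum(v * v for v in normed) / len(normed))  # very close to 1.0

    eye = [[1.0, 0.0], [0.0, 1.0]]  # identity weights for a toy 2-d example
    print(gated_mlp([1.0, 2.0], eye, eye, eye))  # [silu(1)*1, silu(2)*2]
```

The epsilon only matters when the activations are near zero. For ordinary inputs, the normalized vector has a mean square of almost exactly 1.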