mistralai/Devstral-2-123B-Instruct-2512

📊 Model Parameters

Total Parameters 125,025,988,608
Context Length 262,144
Hidden Size 12,288
Layers 88
Attention Heads 96
KV Heads 8
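
The listed total is consistent with a standard dense decoder using grouped-query attention, a SwiGLU-style MLP (gate/up/down projections), per-layer RMSNorm weights, and untied input/output embeddings. A minimal sketch of the count, assuming that layout:

```python
# Sketch: reproducing the total parameter count from the listed config.
# Assumes a dense decoder with GQA, SwiGLU MLP, RMSNorm, and
# untied input/output embeddings (Tied Embeddings: No).
hidden, layers, heads, kv_heads, head_dim = 12288, 88, 96, 8, 128
ffn, vocab = 28672, 131072

attn = hidden * heads * head_dim * 2        # Q and O projections
attn += hidden * kv_heads * head_dim * 2    # K and V projections (GQA)
mlp = 3 * hidden * ffn                      # gate, up, down
norms = 2 * hidden                          # pre-attn and pre-MLP RMSNorm
per_layer = attn + mlp + norms

total = layers * per_layer + 2 * vocab * hidden + hidden  # + final norm
print(f"{total:,}")  # 125,025,988,608 — matches the figure above
```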

💾 Memory Requirements

FP32 (Full) 465.76 GB
FP16 (Half) 232.88 GB
INT8 (Quantized) 116.44 GB
INT4 (Quantized) 58.22 GB
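
Weight memory is simply the parameter count times bytes per parameter; the "GB" figures above are binary gigabytes (GiB, 1024³ bytes). A quick check:

```python
# Sketch: weight memory = parameter count x bytes per parameter.
params = 125_025_988_608
for name, bytes_per in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {params * bytes_per / 1024**3:.2f} GB")
# FP32: 465.76 GB, FP16: 232.88 GB, INT8: 116.44 GB, INT4: 58.22 GB
```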

🔑 KV Cache (Inference)

Per Token (FP16) 360.45 KB
Max Context FP32 176.00 GB
Max Context FP16 88.00 GB
Max Context INT8 44.00 GB
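
With grouped-query attention, only the 8 KV heads are cached per layer, not all 96 query heads (12 query heads share each KV head), which is what keeps the cache this small. A sketch of the arithmetic, assuming FP16 keys and values:

```python
# Sketch: KV cache size under GQA. Per token, each layer stores
# K and V for the 8 KV heads only.
layers, kv_heads, head_dim, ctx = 88, 8, 128, 262_144
per_token = 2 * layers * kv_heads * head_dim * 2   # K+V, 2 bytes (FP16)
print(f"{per_token / 1000:.2f} KB/token")    # 360.45 KB (decimal KB)
print(f"{per_token * ctx / 1024**3:.2f} GB") # 88.00 GB at full context
# FP32 doubles this (176.00 GB); INT8 halves it (44.00 GB).
```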

⚙️ Model Configuration

Core Architecture

Vocabulary Size 131,072
Hidden Size 12,288
FFN Intermediate Size 28,672
Number of Layers 88
Attention Heads 96
Head Dimension 128
KV Heads 8

Context & Position

Max Context Length 262,144
Sliding Window Size Not set

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID 1
Pad Token ID 11
EOS Token ID 2

Data Type

Model Dtype bfloat16
Layer Types Attention, MLP/FFN, Normalization, Embedding
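
For reference, a hypothetical Hugging Face-style config assembled from the values above; the field names follow the usual Mistral convention and are assumptions, not a dump of the model's actual config.json:

```python
# Hypothetical config contents reconstructed from the spec above.
# Field names assume the standard Mistral convention; the real file may differ.
config = {
    "vocab_size": 131_072,
    "hidden_size": 12_288,
    "intermediate_size": 28_672,
    "num_hidden_layers": 88,
    "num_attention_heads": 96,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "max_position_embeddings": 262_144,
    "sliding_window": None,       # "Not set"
    "attention_dropout": 0.0,
    "tie_word_embeddings": False,
    "hidden_act": "silu",
    "rms_norm_eps": 1e-05,
    "bos_token_id": 1,
    "eos_token_id": 2,
    "pad_token_id": 11,
    "torch_dtype": "bfloat16",
}
```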