mistralai/Mistral-Large-Instruct-2411

📊 Model Parameters

Total Parameters 122,610,069,504
Context Length 131,072
Hidden Size 12,288
Layers 88
Attention Heads 96
KV Heads 8
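
As a sanity check, the total parameter count can be reproduced from the configuration listed further down (untied embeddings, no biases, gated SiLU MLP, RMSNorm). A minimal sketch in Python, assuming the standard Mistral/Llama-style layer layout:

    # Architecture figures from this page (Mistral-Large-Instruct-2411)
    vocab, hidden, ffn = 32_768, 12_288, 28_672
    layers, heads, kv_heads, head_dim = 88, 96, 8, 128

    embed   = vocab * hidden                      # input embedding
    lm_head = vocab * hidden                      # output head (embeddings are not tied)
    attn    = (hidden * heads * head_dim * 2      # Wq and Wo projections
             + hidden * kv_heads * head_dim * 2)  # Wk and Wv projections (no biases)
    mlp     = hidden * ffn * 3                    # gate, up, down projections
    norms   = 2 * hidden                          # pre-attention and pre-MLP RMSNorm

    per_layer = attn + mlp + norms
    total = layers * per_layer + embed + lm_head + hidden  # + final RMSNorm

    print(f"{total:,}")  # 122,610,069,504 -- matches the figure above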

💾 Memory Requirements

FP32 (Full) 456.76 GiB
FP16 (Half) 228.38 GiB
INT8 (Quantized) 114.19 GiB
INT4 (Quantized) 57.09 GiB
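
These figures are the parameter count multiplied by the bytes per weight, expressed in GiB (1024³ bytes); quantization overhead such as scales and zero-points is not included. A quick check:

    params = 122_610_069_504
    GiB = 1024 ** 3

    for label, bytes_per_weight in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
        print(f"{label}: {params * bytes_per_weight / GiB:.2f} GiB")
    # FP32: 456.76 GiB, FP16: 228.38 GiB, INT8: 114.19 GiB, INT4: 57.09 GiB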

🔑 KV Cache (Inference)

Per Token (FP16) 360.45 KB
Max Context FP32 88.00 GiB
Max Context FP16 44.00 GiB
Max Context INT8 22.00 GiB
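
The per-token figure follows from the grouped-query attention layout: 2 tensors (K and V) × 88 layers × 8 KV heads × head dimension 128 × 2 bytes (FP16) = 360,448 bytes. A sketch of the full-context totals, assuming a plain dense cache with no paging or compression:

    layers, kv_heads, head_dim, context = 88, 8, 128, 131_072
    GiB = 1024 ** 3

    for label, bytes_per_value in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
        per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
        print(f"{label}: {per_token * context / GiB:.2f} GiB at full context")
    # FP32: 88.00 GiB, FP16: 44.00 GiB, INT8: 22.00 GiB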

⚙️ Model Configuration

Core Architecture

Vocabulary Size 32,768
Hidden Size 12,288
FFN Intermediate Size 28,672
Number of Layers 88
Attention Heads 96
Head Dimension 128
KV Heads 8

Context & Position

Max Context Length 131,072
Sliding Window Size Not set
RoPE Base Frequency 1,000,000.0

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype Not set
Layer Types: Attention, MLP/FFN, Normalization, Embedding
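
For reference, the values above map onto the usual Hugging Face configuration fields roughly as follows. This is a hand-written sketch, not the published config.json; the field names assume the standard MistralConfig schema:

    config = {
        "vocab_size": 32_768,
        "hidden_size": 12_288,
        "intermediate_size": 28_672,
        "num_hidden_layers": 88,
        "num_attention_heads": 96,
        "num_key_value_heads": 8,
        "head_dim": 128,
        "max_position_embeddings": 131_072,
        "sliding_window": None,       # not set
        "rope_theta": 1_000_000.0,
        "attention_dropout": 0.0,
        "tie_word_embeddings": False,
        "hidden_act": "silu",
        "rms_norm_eps": 1e-05,
        "bos_token_id": 1,
        "eos_token_id": 2,
        "pad_token_id": None,         # not set
    }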