mistralai/Mistral-Nemo-Base-2407

📊 Model Parameters

Total Parameters 12,247,782,400
Context Length 131,072
Hidden Size 5120
Layers 40
Attention Heads 32
KV Heads 8

💾 Memory Requirements (weights only)

FP32 (Full) 45.63 GiB
FP16 (Half) 22.81 GiB
INT8 (Quantized) 11.41 GiB
INT4 (Quantized) 5.70 GiB
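
The figures above can be reproduced from the total parameter count. The sketch below assumes they cover the weights only (no activations, no KV cache, no framework overhead) and that the listed gigabytes are binary (2^30 bytes). The function and constant names are illustrative.

```python
# Weight-only memory estimate: parameter count times bytes per parameter.
TOTAL_PARAMS = 12_247_782_400  # "Total Parameters" from the listing above

# Bytes per parameter at each precision (INT4 packs two weights per byte).
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

GIB = 2**30  # assumption: the listed sizes use binary gigabytes


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, width):.2f} GiB")
```

Real quantized checkpoints usually store extra scales and zero points, so actual INT8 and INT4 files tend to be somewhat larger than this estimate.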

🔑 KV Cache (Inference)

Per Token (FP16) 163.84 KB (163,840 bytes = 160 KiB)
Max Context FP32 40.00 GiB
Max Context FP16 20.00 GiB
Max Context INT8 10.00 GiB
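
The KV cache figures follow from the layer count, KV head count, and head dimension listed in this card. A minimal sketch, assuming a single sequence (batch size 1), one key and one value vector per KV head per layer, and no sliding window. The helper names are illustrative.

```python
# KV cache size: 2 tensors (K and V) x layers x KV heads x head dim x bytes.
NUM_LAYERS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128
MAX_CONTEXT = 131_072

GIB = 2**30


def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of cache added by one token across all layers."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_cache_gib(num_tokens: int, bytes_per_value: int) -> float:
    """Cache size in GiB for one sequence of num_tokens tokens."""
    return num_tokens * kv_bytes_per_token(bytes_per_value) / GIB


print(kv_bytes_per_token(2))          # FP16 bytes per token
for name, width in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
    print(f"{name}: {kv_cache_gib(MAX_CONTEXT, width):.2f} GiB")
```

With 8 KV heads instead of 32, the cache is a quarter of what full multi-head attention would need at the same head dimension.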

⚙️ Model Configuration

Core Architecture

Vocabulary Size 131,072
Hidden Size 5,120
FFN Intermediate Size 14,336
Number of Layers 40
Attention Heads 32
Head Dimension 128
KV Heads 8
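
The total parameter count can be cross-checked against these values. The sketch below assumes a Llama/Mistral-style decoder layout that this card does not spell out: bias-free projections, a gated MLP with three weight matrices, two RMSNorm weights per layer plus one final norm, and separate input and output embeddings (the card lists Tied Embeddings as No). Note that heads x head dimension is 4,096, not the hidden size of 5,120.

```python
# Parameter count from the core architecture values (layout is an assumption).
VOCAB = 131_072
HIDDEN = 5_120
FFN = 14_336
LAYERS = 40
HEADS = 32
KV_HEADS = 8
HEAD_DIM = 128


def count_parameters() -> int:
    q_dim = HEADS * HEAD_DIM       # 4,096: query/output projection width
    kv_dim = KV_HEADS * HEAD_DIM   # 1,024: key and value projection width
    # q_proj + k_proj + v_proj + o_proj, all without biases
    attention = HIDDEN * q_dim + 2 * HIDDEN * kv_dim + q_dim * HIDDEN
    # gate, up, and down projections of the gated MLP
    mlp = 3 * HIDDEN * FFN
    # one RMSNorm weight vector before attention, one before the MLP
    norms = 2 * HIDDEN
    per_layer = attention + mlp + norms
    # untied input embedding and output head, plus the final norm
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


print(f"{count_parameters():,}")
```

Under these assumptions the result equals the Total Parameters figure listed at the top of the card.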

Context & Position

Max Context Length 131,072
Sliding Window Size Not set
RoPE Base Frequency 1000000.0
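
To illustrate what the RoPE base controls, the sketch below computes the per-pair rotation frequencies using the standard rotary embedding formula, inv_freq[i] = base^(-2i/d). It assumes the model applies that standard formula with no extra scaling, which this card does not state. The head dimension of 128 is taken from the Core Architecture values.

```python
import math

ROPE_BASE = 1_000_000.0
HEAD_DIM = 128  # from the Core Architecture values


def rope_inverse_frequencies(head_dim: int, base: float) -> list[float]:
    """One frequency per pair of dimensions: base ** (-2i / head_dim)."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def rotation_angles(position: int, inv_freq: list[float]) -> list[float]:
    """Angle (in radians) applied to each dimension pair at a position."""
    return [position * f for f in inv_freq]


inv_freq = rope_inverse_frequencies(HEAD_DIM, ROPE_BASE)
# Wavelength, in tokens, of the slowest-rotating dimension pair.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(len(inv_freq), longest_wavelength > 131_072)
```

A larger base slows the rotation of the low-frequency pairs, so their wavelength exceeds the 131,072-token context length.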

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
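
For reference, the two operations named here can be written in a few lines. This is a plain-Python sketch of the standard SiLU and RMSNorm definitions, not the model's actual implementation. The all-ones weight vector in the usage lines is a placeholder for the learned norm weights.

```python
import math

RMS_NORM_EPS = 1e-05  # "RMSNorm Epsilon" from the listing above


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written to avoid overflow in exp."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def rms_norm(values: list[float], weight: list[float],
             eps: float = RMS_NORM_EPS) -> list[float]:
    """Scale values by 1 / sqrt(mean(v^2) + eps), then by a per-dim weight."""
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(values, weight)]


normalized = rms_norm([3.0, 4.0], [1.0, 1.0])  # placeholder weights
print(silu(1.0), normalized)
```

The epsilon only matters when the input vector is close to zero, where it keeps the division well defined.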

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype bfloat16

Layer Types: Attention, MLP/FFN, Normalization, Embedding