Mistral - French AI startup known for efficient open models (founded 2023):
Mistral-7B-v0.1
Mistral-Nemo-Base-2407
Codestral-22B-v0.1
Mistral-Small-24B-Base-2501
Ministral-8B-Instruct-2410
Mistral-Large-Instruct-2411
Mixtral-8x7B-v0.1
Mixtral-8x22B-v0.1
Mamba-Codestral-7B-v0.1
Pixtral-12B-Base-2409
Voxtral-Mini-3B-2507
Voxtral-Small-24B-2507
Mistral-Small-3.2-24B-Instruct-2506
Ministral-3-3B-Base-2512
Ministral-3-8B-Base-2512
Ministral-3-14B-Base-2512
Mistral-Large-3-675B-Base-2512
Devstral-Small-2-24B-Instruct-2512
Devstral-2-123B-Instruct-2512
mistralai/Mixtral-8x22B-v0.1
📊 Model Parameters
Total Parameters
140,620,634,112
Context Length
65,536
Hidden Size
6,144
Layers
56
Attention Heads
48
KV Heads
8
💾 Memory Requirements
FP32 (Full)
523.85 GB
FP16 (Half)
261.93 GB
INT8 (Quantized)
130.96 GB
INT4 (Quantized)
65.48 GB
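The memory figures above follow from the parameter count multiplied by the bytes per parameter at each precision. A minimal sketch of that arithmetic, assuming the page's "GB" means 2**30 bytes (this assumption reproduces the listed values) and counting weights only, with no activations, KV cache, or quantization overhead:

```python
# Weight-only memory estimate: parameter count times bytes per parameter.
TOTAL_PARAMS = 140_620_634_112  # "Total Parameters" from this page

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

# Assumption: the page's "GB" figures use binary units (2**30 bytes).
BINARY_GB = 2**30


def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the weight storage size in units of 2**30 bytes."""
    return num_params * bytes_per_param / BINARY_GB


memory_gb = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, size in memory_gb.items():
    print(f"{name}: {size:.2f} GB")
```

Real deployments typically need more than this, since quantized formats store scales and zero points and the runtime also holds activations and the KV cache.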
🔑 KV Cache (Inference)
Per Token (FP16)
224 KB (assuming head dimension 128 = 6,144 / 48)
Max Context FP32
28,672.0 MB
Max Context FP16
14,336.0 MB
Max Context INT8
7,168.0 MB
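The KV cache size can be estimated from the configuration values on this page. The sketch below assumes that, because the head dimension is listed as "Not set", it falls back to hidden size divided by attention heads (6,144 / 48 = 128), and that sizes are reported in binary units (1 KB = 1,024 bytes). The function names are illustrative, not taken from any library.

```python
# KV cache estimate: each layer stores one key and one value vector
# per KV head for every token in the context.
NUM_LAYERS = 56
NUM_KV_HEADS = 8
HIDDEN_SIZE = 6_144
NUM_ATTENTION_HEADS = 48
MAX_CONTEXT = 65_536

# Assumption: head dimension is not set in the config, so derive it.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128

BYTES_PER_VALUE = {"FP32": 4, "FP16": 2, "INT8": 1}


def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of KV cache added by one token (factor 2 = key + value)."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_mb_at_max_context(bytes_per_value: int) -> float:
    """KV cache size for a full context window, in units of 2**20 bytes."""
    return kv_bytes_per_token(bytes_per_value) * MAX_CONTEXT / 2**20


per_token_fp16 = kv_bytes_per_token(BYTES_PER_VALUE["FP16"])
print(f"Per token (FP16): {per_token_fp16} B = {per_token_fp16 / 1024:.0f} KB")

for name, size in BYTES_PER_VALUE.items():
    print(f"Max context {name}: {kv_mb_at_max_context(size):,.1f} MB")
```

Because only 8 of the 48 heads carry keys and values (grouped-query attention), this estimate is one sixth of what full multi-head attention would need with the same head dimension.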
⚙️ Model Configuration
Core Architecture
Vocabulary Size
32,000
Hidden Size
6,144
FFN Intermediate Size
16,384
Number of Layers
56
Attention Heads
48
KV Heads
8
Head Dimension
Not set
Context & Position
Max Context Length
65,536
Sliding Window Size
Not set
RoPE Base Frequency
1,000,000
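To illustrate what the RoPE base frequency controls, the sketch below computes the per-pair rotation frequencies using the standard rotary embedding formula, theta ** (-2i / d). It assumes a head dimension of 128 (hidden size / attention heads), since the page lists the head dimension as "Not set"; the function name is illustrative.

```python
import math

ROPE_THETA = 1_000_000.0  # "RoPE Base Frequency" from this page
HEAD_DIM = 6_144 // 48    # assumption: hidden size / attention heads = 128


def rope_inverse_frequencies(head_dim: int, theta: float) -> list[float]:
    """One rotation frequency per pair of dimensions: theta ** (-2i / head_dim)."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


inv_freq = rope_inverse_frequencies(HEAD_DIM, ROPE_THETA)

# Wavelength in tokens: the number of positions one full rotation spans.
wavelengths = [2.0 * math.pi / f for f in inv_freq]

print(f"dimension pairs: {len(inv_freq)}")
print(f"shortest wavelength: {wavelengths[0]:.2f} tokens")
print(f"longest wavelength: {wavelengths[-1]:,.0f} tokens")
```

A larger base stretches the slowest rotations over more positions, which is one common way to support long context windows such as the 65,536 tokens listed here.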
Attention Configuration
Attention Dropout
0%
Tied Embeddings
No
Mixture of Experts
Experts per Token
2
Number of Experts
8
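The total parameter count can be reconstructed from the configuration, which also shows how many parameters are active per token when only 2 of 8 experts run. This is a sketch under assumptions not stated on the page: bias-free Q/K/V/O projections, three weight matrices per expert (gate, up, down), a bias-free router, two RMSNorm weight vectors per layer plus a final one, and a head dimension of 128.

```python
# Parameter count for a Mixtral-style mixture-of-experts decoder.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 6_144
FFN_SIZE = 16_384
NUM_LAYERS = 56
NUM_HEADS = 48
NUM_KV_HEADS = 8
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # assumption: 128


def attention_params() -> int:
    """Q and O use all heads; K and V use only the KV heads."""
    q_and_o = 2 * HIDDEN_SIZE * NUM_HEADS * HEAD_DIM
    k_and_v = 2 * HIDDEN_SIZE * NUM_KV_HEADS * HEAD_DIM
    return q_and_o + k_and_v


def expert_params() -> int:
    """Assumed gated FFN: gate, up, and down projection matrices."""
    return 3 * HIDDEN_SIZE * FFN_SIZE


def layer_params(experts_counted: int) -> int:
    router = HIDDEN_SIZE * NUM_EXPERTS
    norms = 2 * HIDDEN_SIZE  # pre-attention and pre-FFN RMSNorm weights
    return attention_params() + experts_counted * expert_params() + router + norms


def model_params(experts_counted: int) -> int:
    embeddings = 2 * VOCAB_SIZE * HIDDEN_SIZE  # input + output (untied)
    final_norm = HIDDEN_SIZE
    return NUM_LAYERS * layer_params(experts_counted) + embeddings + final_norm


total_params = model_params(NUM_EXPERTS)
active_params = model_params(EXPERTS_PER_TOKEN)

print(f"total:  {total_params:,}")
print(f"active: {active_params:,}")
```

Under these assumptions the total matches the 140,620,634,112 listed above, while roughly 39.2 billion parameters take part in each token's forward pass. All weights still have to be resident in memory, so the memory requirements depend on the total, not the active count.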
Activation & Normalization
Activation Function
silu
RMSNorm Epsilon
1e-05
Special Tokens
BOS Token ID
1
Pad Token ID
Not set
EOS Token ID
2
Data Type
Model Dtype
bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding