MiniMaxAI/MiniMax-M2.1

📊 Model Parameters

Total Parameters 228,689,748,992
Context Length 196,608
Hidden Size 3072
Layers 62
Attention Heads 48
KV Heads 8

💾 Memory Requirements

FP32 (Full) 851.94 GB
FP16 (Half) 425.97 GB
INT8 (Quantized) 212.98 GB
INT4 (Quantized) 106.49 GB
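The figures above follow directly from the parameter count: weights only, at 4, 2, 1, and 0.5 bytes per parameter, reported in binary gigabytes (GiB, labeled GB here). A minimal sketch of that arithmetic:

```python
# Rough weight-memory footprint from the parameter count above.
# Excludes activations, KV cache, and framework overhead.
TOTAL_PARAMS = 228_689_748_992

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

def weight_memory_gib(params: int, dtype: str) -> float:
    """Weight memory in binary GB (GiB) for a given storage dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(TOTAL_PARAMS, dtype):.2f} GB")
```

For example, FP16 gives 228,689,748,992 × 2 / 2³⁰ ≈ 425.97 GB, matching the table.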

🔑 KV Cache (Inference)

Per Token (FP16) 253.95 KB
Max Context FP32 93.00 GB
Max Context FP16 46.50 GB
Max Context INT8 23.25 GB
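The KV-cache numbers follow from the grouped-query attention shape: two tensors (K and V) per layer, each of size kv_heads × head_dim per token. Note the table mixes units: the per-token figure uses decimal KB (1,000 bytes), while the full-context figures use binary GB (GiB). A sketch:

```python
# Per-token KV cache: 2 tensors (K and V) x layers x kv_heads x head_dim,
# times bytes per element for the chosen dtype.
LAYERS, KV_HEADS, HEAD_DIM = 62, 8, 128
CONTEXT = 196_608

def kv_bytes_per_token(bytes_per_element: int = 2) -> int:  # 2 = FP16
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_element

per_token = kv_bytes_per_token()               # 253,952 B = 253.95 (decimal) KB
full_ctx_gib = per_token * CONTEXT / 2**30     # ~46.50 GiB at FP16
```

Halving or doubling the bytes-per-element recovers the INT8 and FP32 rows.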

⚙️ Model Configuration

Core Architecture

Vocabulary Size 200,064
Hidden Size 3,072
FFN Intermediate Size 1,536
Number of Layers 62
Attention Heads 48
KV Heads 8
Head Dimension 128
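The attention shapes these values imply can be sketched as below, assuming standard grouped-query attention (GQA) linear projections; the actual MiniMax-M2.1 implementation may differ in details such as norms or biases:

```python
# Projection shapes implied by the core-architecture table, assuming
# plain GQA: q_proj maps hidden -> heads * head_dim, while k_proj and
# v_proj map hidden -> kv_heads * head_dim.
HIDDEN, N_HEADS, KV_HEADS, HEAD_DIM = 3072, 48, 8, 128

group_size = N_HEADS // KV_HEADS   # query heads sharing each KV head: 6
q_out = N_HEADS * HEAD_DIM         # q_proj output width: 6144
kv_out = KV_HEADS * HEAD_DIM       # k_proj / v_proj output width: 1024
```

Note that q_out (6,144) exceeds the hidden size (3,072), so attention projects up rather than splitting the hidden state evenly across heads.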

Context & Position

Max Context Length 196,608

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Mixture of Experts

Experts per Token 8
Number of Experts 256
Router Scoring Function sigmoid
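Per the table, the router scores each of the 256 experts with a sigmoid (rather than a softmax over all experts) and activates the top 8 per token. A minimal pure-Python sketch of that routing step; the real router likely adds details not shown here (e.g. bias terms or load balancing):

```python
import math
import random

NUM_EXPERTS, TOP_K = 256, 8

def route(logits, top_k=TOP_K):
    """Sigmoid-scored top-k routing: score every expert independently,
    keep the top_k, and normalize their scores into mixing weights."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    chosen = ranked[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route(logits)   # 8 (expert_index, weight) pairs per token
```

Because sigmoid scores each expert independently, one expert's score does not suppress another's, which is the key behavioral difference from softmax routing.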

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06

Special Tokens

Pad Token ID Not set
BOS Token ID 200,034
EOS Token ID 200,020

Data Type

Model Dtype Not set
Layer Types:
Attention
MLP/FFN
Normalization
Embedding