deepseek-ai/DeepSeek-V2

📊 Model Parameters

Total Parameters 235,741,434,880
Context Length 163,840
Hidden Size 5120
Layers 60
Attention Heads 128
KV Heads 128

💾 Memory Requirements

FP32 (Full) 878.21 GB
FP16 (Half) 439.10 GB
INT8 (Quantized) 219.55 GB
INT4 (Quantized) 109.78 GB
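
The weight-memory figures above follow directly from the parameter count times bytes per parameter, reported in binary gigabytes (GiB). A minimal sketch of that arithmetic, covering weights only (activations, KV cache, and framework overhead add more in practice):

```python
# Rough weight-memory estimate: parameter count x bytes per parameter,
# reported in GiB (1024**3 bytes), which reproduces the figures above.
TOTAL_PARAMS = 235_741_434_880

def weight_memory_gib(params: int, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1024**3

for label, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{label}: {weight_memory_gib(TOTAL_PARAMS, bytes_per_param):.2f} GiB")
```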

🔑 KV Cache (Inference)

Per Token (FP16) 1.97 MB
Max Context FP32 600.00 GB
Max Context FP16 300.00 GB
Max Context INT8 150.00 GB
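
The per-token figure matches a naive multi-head KV cache estimate: K and V for every layer and KV head at the listed head dimension. MLA (see below) actually caches a much smaller compressed latent, so this is best read as an upper bound. A sketch of the arithmetic, assuming 2-byte FP16 elements:

```python
# Naive per-token KV cache: K and V, per layer, per KV head, per head dim.
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 60, 128, 64, 163_840

def kv_cache_bytes(seq_len: int, bytes_per_elem: int) -> int:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem * seq_len

print(kv_cache_bytes(1, 2) / 1e6, "MB per token (FP16)")           # ~1.97 MB
print(kv_cache_bytes(CONTEXT, 2) / 1024**3, "GiB at max context")  # ~300 GiB
```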

⚙️ Model Configuration

Core Architecture

Vocabulary Size 102,400
Hidden Size 5,120
FFN Intermediate Size 12,288
Number of Layers 60
Attention Heads 128
KV Heads 128
Head Dimension 64
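
These values mirror fields in the model's published Hugging Face config. A minimal sketch of reading them, assuming network access and trust_remote_code for the custom DeepSeek-V2 architecture:

```python
from transformers import AutoConfig

# Reads the corresponding fields from the published config; a sketch only.
cfg = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V2", trust_remote_code=True)
print(cfg.vocab_size)            # 102400
print(cfg.hidden_size)           # 5120
print(cfg.intermediate_size)     # 12288
print(cfg.num_hidden_layers)     # 60
print(cfg.num_attention_heads)   # 128
print(cfg.num_key_value_heads)   # 128
```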

Context & Position

Max Context Length 163,840

Attention Configuration

Attention Bias No
Attention Dropout 0%
MLP Bias No
Tied Embeddings No

Multi-Head Latent Attention

KV LoRA Rank 512
Query LoRA Rank 1,536
QK Non-RoPE Head Dimension 128
QK RoPE Head Dimension 64
Value Head Dimension 128
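
Under MLA, the cache per token per layer is the compressed KV latent (KV LoRA rank) plus the decoupled RoPE key dimension, rather than full per-head keys and values. A back-of-the-envelope sketch using the values above, assuming BF16 elements and ignoring runtime overhead:

```python
# MLA cache per token: compressed KV latent + decoupled RoPE key, per layer.
KV_LORA_RANK = 512
QK_ROPE_HEAD_DIM = 64
LAYERS = 60
BYTES = 2  # bf16

cached_per_token = LAYERS * (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * BYTES
print(cached_per_token / 1e3, "KB per token")  # ~69 KB, vs ~1.97 MB uncompressed
```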

Mixture of Experts

Dense Initial Layers 1
Expert Groups 8
Number of Experts 160
Shared Experts 2
Routing Scale Factor 16.0
Groups per Token 3
TopK Method group_limited_greedy
Normalize TopK Probabilities No
Experts per Token 6
Expert FFN Size 1,536
MoE Layer Frequency 1
Router Scoring Function softmax
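
The routing values above describe group-limited greedy top-k selection: softmax scores over the 160 routed experts, experts partitioned into 8 groups, each token restricted to its best 3 groups, 6 experts chosen within them, and the resulting weights scaled by 16.0 without renormalization. An illustrative PyTorch sketch, not the reference implementation:

```python
import torch

# Group-limited greedy routing sketch based on the values above.
N_EXPERTS, N_GROUPS, TOPK_GROUP, TOP_K, SCALE = 160, 8, 3, 6, 16.0

def route(router_logits: torch.Tensor):
    # router_logits: [num_tokens, N_EXPERTS]
    scores = router_logits.softmax(dim=-1)                        # softmax scoring
    per_group = scores.view(-1, N_GROUPS, N_EXPERTS // N_GROUPS)
    group_scores = per_group.max(dim=-1).values                   # best expert per group
    top_groups = group_scores.topk(TOPK_GROUP, dim=-1).indices    # keep 3 groups per token
    mask = torch.zeros_like(group_scores).scatter_(1, top_groups, 1.0)
    mask = mask.unsqueeze(-1).expand_as(per_group).reshape(-1, N_EXPERTS)
    weights, experts = (scores * mask).topk(TOP_K, dim=-1)        # top-6 experts overall
    return weights * SCALE, experts                               # no renormalization

weights, experts = route(torch.randn(4, N_EXPERTS))
print(experts.shape, weights.shape)  # torch.Size([4, 6]) torch.Size([4, 6])
```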

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06
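
A minimal sketch of RMSNorm with the listed epsilon and the SiLU activation, illustrative rather than the model's exact implementation:

```python
import torch

# RMSNorm: scale by the reciprocal root-mean-square of the hidden vector.
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    return (x.float() * torch.rsqrt(variance + eps)).to(x.dtype) * weight

def silu(x: torch.Tensor) -> torch.Tensor:
    return x * torch.sigmoid(x)  # SiLU / swish activation

h = torch.randn(2, 5120, dtype=torch.bfloat16)
print(rms_norm(h, torch.ones(5120, dtype=torch.bfloat16)).shape)  # torch.Size([2, 5120])
```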

Special Tokens

Pad Token ID Not set
BOS Token ID 100,000
EOS Token ID 100,001

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding