moonshotai/Kimi-K2-Thinking

📊 Model Parameters

Total Parameters 1,026,408,232,448
Context Length 262,144
Hidden Size 7168
Layers 61
Attention Heads 64
KV Heads 64

💾 Memory Requirements

FP32 (Full) 3823.67 GB
FP16 (Half) 1911.83 GB
INT8 (Quantized) 955.92 GB
INT4 (Quantized) 477.96 GB
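
These figures are straightforward bytes-per-parameter arithmetic. A minimal sketch (weights only, reported in GiB, ignoring activation and framework overhead):

```python
# Weight memory = parameter count x bytes per parameter, reported in GiB.
PARAMS = 1_026_408_232_448  # total parameters from the table above

for dtype, nbytes in {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}.items():
    print(f"{dtype}: {PARAMS * nbytes / 1024**3:,.2f} GB")
# FP32: 3,823.67 GB  FP16: 1,911.83 GB  INT8: 955.92 GB  INT4: 477.96 GB
```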

🔑 KV Cache (Inference)

Per Token (FP16) 1.75 MB
Max Context FP32 854.00 GB
Max Context FP16 427.00 GB
Max Context INT8 213.50 GB
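
These numbers appear to follow the standard full-attention estimate of 2 × layers × hidden_size × bytes per element (the per-token value in decimal MB, the max-context values in GiB); the actual MLA cache stores compressed latents and is much smaller. A sketch of that estimate:

```python
# Full-attention KV-cache estimate: K and V, one hidden-size vector each,
# per layer, per token. MLA in practice caches a compressed 512+64-dim latent instead.
LAYERS, HIDDEN, CONTEXT = 61, 7168, 262_144

def kv_bytes(bytes_per_elem, tokens):
    return 2 * LAYERS * HIDDEN * bytes_per_elem * tokens  # 2 = key + value

print(f"{kv_bytes(2, 1) / 1e6:.2f} MB/token")        # 1.75 MB (FP16)
print(f"{kv_bytes(4, CONTEXT) / 1024**3:.2f} GB")    # 854.00 GB (FP32, max context)
print(f"{kv_bytes(2, CONTEXT) / 1024**3:.2f} GB")    # 427.00 GB (FP16)
print(f"{kv_bytes(1, CONTEXT) / 1024**3:.2f} GB")    # 213.50 GB (INT8)
```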

⚙️ Model Configuration

Core Architecture

Vocabulary Size 163,840
Hidden Size 7,168
FFN Intermediate Size 18,432
Number of Layers 61
Attention Heads 64
KV Heads 64
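
These dimensions can be read back from the model's Hugging Face config; a hedged sketch assuming the usual DeepSeek-V3-style field names (the custom config class may name some of them differently):

```python
from transformers import AutoConfig

# Field names below are the conventional ones and are an assumption here.
cfg = AutoConfig.from_pretrained("moonshotai/Kimi-K2-Thinking", trust_remote_code=True)
for field in ("vocab_size", "hidden_size", "intermediate_size",
              "num_hidden_layers", "num_attention_heads", "num_key_value_heads"):
    print(field, getattr(cfg, field))
```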

Context & Position

Max Context Length 262,144
RoPE Base Frequency 50000.0
RoPE Scaling {...} (7 fields)
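
The base frequency sets the per-channel rotary frequencies in the usual RoPE way over the 64-dim RoPE slice of each head. A minimal sketch (the 7 scaling fields are not expanded here and would further rescale these frequencies):

```python
ROPE_BASE = 50000.0
QK_ROPE_HEAD_DIM = 64  # only this slice of each query/key head is rotated

# Channel pair i at position p is rotated by angle p * inv_freq[i].
inv_freq = [ROPE_BASE ** (-2 * i / QK_ROPE_HEAD_DIM)
            for i in range(QK_ROPE_HEAD_DIM // 2)]
print(inv_freq[0], inv_freq[-1])  # 1.0 ... ~2.8e-05
```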

Attention Configuration

Attention Bias No
Attention Dropout 0%
Tied Embeddings No

Multi-Head Latent Attention

KV LoRA Rank 512
Query LoRA Rank 1,536
QK RoPE Head Dimension 64
Value Head Dimension 128
QK Non-RoPE Head Dimension 128
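
Shape-wise, these ranks and head dimensions imply the following DeepSeek-V3-style latent-attention projections per layer (a shape-only sketch, not the exact implementation):

```python
HIDDEN, HEADS = 7168, 64
KV_LORA_RANK, Q_LORA_RANK = 512, 1536
QK_ROPE, QK_NOPE, V_DIM = 64, 128, 128

# (in_features, out_features) of each projection, assuming DeepSeek-V3-style MLA.
shapes = {
    "q_down":  (HIDDEN, Q_LORA_RANK),                       # hidden -> query latent
    "q_up":    (Q_LORA_RANK, HEADS * (QK_NOPE + QK_ROPE)),  # latent -> per-head queries
    "kv_down": (HIDDEN, KV_LORA_RANK + QK_ROPE),            # hidden -> KV latent + shared RoPE key
    "kv_up":   (KV_LORA_RANK, HEADS * (QK_NOPE + V_DIM)),   # latent -> per-head keys/values
    "output":  (HEADS * V_DIM, HIDDEN),                     # attention output projection
}
for name, (fan_in, fan_out) in shapes.items():
    print(f"{name}: {fan_in} -> {fan_out}")
```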

Mixture of Experts

Expert FFN Size 2,048
Shared Experts 1
Number of Experts 384
Routing Scale Factor 2.827
TopK Method noaux_tc
Expert Groups 1
Groups per Token 1
Experts per Token 8
MoE Layer Frequency 1
Dense Initial Layers 1
Normalize TopK Probabilities Yes
Router Scoring Function sigmoid
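
Taken together, these parameters describe a sigmoid-scored top-8-of-384 router with normalized weights, a 2.827 scale on the routed outputs, and one always-active shared expert. A simplified sketch (the noaux_tc selection bias and the group limits, trivial with a single group, are only noted in comments):

```python
import torch

N_EXPERTS, TOP_K, SCALE = 384, 8, 2.827

def route(hidden, router_weight):
    scores = torch.sigmoid(hidden @ router_weight)            # [tokens, 384]
    # noaux_tc adds a learned per-expert bias here, used only for selection.
    top_scores, top_idx = scores.topk(TOP_K, dim=-1)           # top 8 experts per token
    weights = top_scores / top_scores.sum(-1, keepdim=True)    # normalize_topk_prob = Yes
    return top_idx, weights * SCALE                            # routing scale factor
    # Layer output = shared-expert FFN + weighted sum of the 8 routed expert FFNs.

tokens = torch.randn(4, 7168)
router = torch.randn(7168, N_EXPERTS)
idx, w = route(tokens, router)
print(idx.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```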

Speculative Decoding

Next-N Prediction Layers 0

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
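
For completeness, the two functions named here have the standard definitions; a minimal sketch:

```python
import torch

def silu(x):
    return x * torch.sigmoid(x)  # silu(x) = x * sigmoid(x)

def rms_norm(x, weight, eps=1e-5):
    rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return weight * (x / rms)

x = torch.randn(2, 7168)
print(rms_norm(x, torch.ones(7168)).shape)  # torch.Size([2, 7168])
```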

Special Tokens

BOS Token ID 163,584
Pad Token ID 163,839
EOS Token ID 163,586

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding