Qwen/Qwen3-Next-80B-A3B-Thinking

📊 Model Parameters

Total Parameters 79,674,391,296
Context Length 262,144
Hidden Size 2,048
Layers 48
Attention Heads 16
KV Heads 2

💾 Memory Requirements

FP32 (Full) 296.81 GB
FP16 (Half) 148.41 GB
INT8 (Quantized) 74.20 GB
INT4 (Quantized) 37.10 GB
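
The figures above are consistent with multiplying the parameter count by the bytes per parameter and dividing by 2^30, so "GB" here appears to mean binary gigabytes (GiB). A minimal sketch of that arithmetic, assuming weights only (no activations, KV cache, or framework overhead); the function and constant names are illustrative, not part of any library:

```python
TOTAL_PARAMS = 79_674_391_296  # "Total Parameters" listed above

# Bytes per parameter for each precision listed above.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Weight-only memory in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, size):.2f} GiB")
```

Real deployments usually need headroom beyond these numbers, since quantized formats carry scale metadata and inference adds activation and cache memory.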

🔑 KV Cache (Inference)

Per Token (FP16) 98.30 KB
Max Context FP32 48.00 GB
Max Context FP16 24.00 GB
Max Context INT8 12.00 GB
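
The KV cache figures above can be reproduced from the layer count, KV head count, and head dimension listed under Core Architecture. The sketch below assumes that all 48 layers keep a standard key/value cache; the configuration lists a separate attention type per layer, so if some layers use a different attention type the real footprint could be smaller. It also suggests that the per-token figure is in decimal KB while the max-context figures are in GiB. Names are illustrative:

```python
NUM_LAYERS = 48
NUM_KV_HEADS = 2
HEAD_DIM = 256
MAX_CONTEXT = 262_144


def kv_bytes_per_token(bytes_per_value: int, num_layers: int = NUM_LAYERS) -> int:
    """Bytes of KV cache added per token, assuming every layer caches K and V."""
    # The factor 2 covers one key vector and one value vector per KV head.
    return 2 * num_layers * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_cache_gib(bytes_per_value: int, context_len: int = MAX_CONTEXT) -> float:
    """KV cache size in GiB for a single sequence of the given length."""
    return kv_bytes_per_token(bytes_per_value) * context_len / 2**30


print(f"Per token (FP16): {kv_bytes_per_token(2) / 1000:.2f} KB")
for name, width in {"FP32": 4, "FP16": 2, "INT8": 1}.items():
    print(f"Max context {name}: {kv_cache_gib(width):.2f} GiB")
```

These values are per sequence; serving several sequences at full context multiplies the cache accordingly.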

⚙️ Model Configuration

Core Architecture

Vocabulary Size 151,936
Hidden Size 2,048
FFN Intermediate Size 5,120
Number of Layers 48
Attention Heads 16
KV Heads 2
Head Dimension 256
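
Note that the head dimension is listed explicitly and is not simply hidden size divided by head count. A short sketch of what the listed values imply for grouped-query attention, assuming a conventional layout in which each head has its own slice of the projection output; the variable names are illustrative and the model's actual projection shapes may differ:

```python
HIDDEN_SIZE = 2048
NUM_ATTENTION_HEADS = 16
NUM_KV_HEADS = 2
HEAD_DIM = 256

# Combined width of all query heads and of all key (or value) heads.
query_width = NUM_ATTENTION_HEADS * HEAD_DIM
kv_width = NUM_KV_HEADS * HEAD_DIM

# Number of query heads that share each key/value head.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KV_HEADS

print(f"query width: {query_width} (hidden size is {HIDDEN_SIZE})")
print(f"key/value width: {kv_width}")
print(f"query heads per KV head: {queries_per_kv_head}")
```

With only 2 KV heads shared by 16 query heads, the cached key/value width is one eighth of the query width, which is what keeps the per-token cache small.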

Context & Position

Max Context Length 262,144
Layer Attention Types [48 items]
Uses Sliding Window No

Attention Configuration

Tied Embeddings No
Attention Bias No
Attention Dropout 0%

Mixture of Experts

MoE Layer Frequency 1
Expert FFN Size 512
Shared Expert FFN Size 512
Experts per Token 10
Number of Experts 512
Normalize TopK Probabilities Yes
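
To see how the expert settings relate to the total parameter count, here is a rough estimate of expert parameters. It assumes that every layer is an MoE layer (one reading of a layer frequency of 1), that each expert is a bias-free gated FFN with three weight matrices (gate, up, down), and that there is one shared expert per layer. None of these is confirmed by the listing, and attention, router, and embedding weights are left out:

```python
HIDDEN_SIZE = 2048
EXPERT_FFN_SIZE = 512
SHARED_EXPERT_FFN_SIZE = 512
NUM_EXPERTS = 512
EXPERTS_PER_TOKEN = 10
NUM_LAYERS = 48


def gated_ffn_params(hidden: int, intermediate: int) -> int:
    """Parameters in one gated FFN (assumed: gate, up, down; no biases)."""
    return 3 * hidden * intermediate


per_expert = gated_ffn_params(HIDDEN_SIZE, EXPERT_FFN_SIZE)
shared = gated_ffn_params(HIDDEN_SIZE, SHARED_EXPERT_FFN_SIZE)

# All routed experts across all layers (stored in memory).
routed_total = NUM_LAYERS * NUM_EXPERTS * per_expert

# Expert weights actually used for one token: top-k routed plus shared.
active_per_token = NUM_LAYERS * (EXPERTS_PER_TOKEN * per_expert + shared)

print(f"params per expert:        {per_expert:,}")
print(f"routed experts, total:    {routed_total:,}")
print(f"expert params per token:  {active_per_token:,}")
print(f"share of experts routed:  {EXPERTS_PER_TOKEN / NUM_EXPERTS:.2%}")
```

Under these assumptions the routed experts account for most of the 79.7B total, while only a small fraction of expert weights is used for any given token.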

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06

Special Tokens

Pad Token ID Not set
BOS Token ID 151643
EOS Token ID 151645

Data Type

Model Dtype bfloat16

Layer Types: Attention, MLP/FFN, Normalization, Embedding