openai/gpt-oss-20b

📊 Model Parameters

Total Parameters 20,914,757,184
Context Length 131,072
Hidden Size 2880
Layers 24
Attention Heads 64
KV Heads 8

💾 Memory Requirements

FP32 (Full) 77.91 GiB
FP16 (Half) 38.96 GiB
INT8 (Quantized) 19.48 GiB
INT4 (Quantized) 9.74 GiB
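These figures are weights-only estimates: parameter count × bytes per parameter, reported in binary gigabytes (GiB). A minimal sketch of the arithmetic (runtime use adds activations, KV cache, and framework overhead):

```python
# Weights-only memory estimate: parameter count x bytes per parameter.
PARAMS = 20_914_757_184  # total parameters from the table above

def weights_gib(num_params: int, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3

for name, width in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {weights_gib(PARAMS, width):.2f} GiB")
# FP32: 77.91 GiB / FP16: 38.96 GiB / INT8: 19.48 GiB / INT4: 9.74 GiB
```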

🔑 KV Cache (Inference)

Per Token (FP16) 49.15 KB
Max Context FP32 12.00 GiB
Max Context FP16 6.00 GiB
Max Context INT8 3.00 GiB
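The KV cache grows linearly with sequence length: 2 tensors (K and V) × layers × KV heads × head dimension per token, times the element width. A sketch using the configuration values above:

```python
# KV cache sizing from the configuration above.
LAYERS, KV_HEADS, HEAD_DIM, MAX_CTX = 24, 8, 64, 131_072

def kv_cache_bytes(seq_len: int, bytes_per_elem: float) -> float:
    """Bytes needed to cache K and V for seq_len tokens."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem * seq_len

print(kv_cache_bytes(1, 2))                  # 49152 bytes/token at FP16 (~49.15 KB)
print(kv_cache_bytes(MAX_CTX, 2) / 1024**3)  # 6.0 GiB at full context
```

Note that only the 8 KV heads enter this formula; grouped-query attention shares each cached K/V head across 64 / 8 = 8 query heads, an 8× cache saving over full multi-head attention.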

⚙️ Model Configuration

Core Architecture

Vocabulary Size 201,088
Hidden Size 2,880
FFN Intermediate Size 2,880
Number of Layers 24
Attention Heads 64
KV Heads 8
Head Dimension 64

Context & Position

Sliding Window Size 128
RoPE Base Frequency 150,000
RoPE Scaling yarn (factor: 32.0)
Layer Attention Types [24 items]
Max Context Length 131,072
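The RoPE inverse frequencies follow the standard formula base^(-2i/d) with base 150,000 and head dimension 64; YaRN then rescales these per frequency band (band boundaries are implementation details not shown here). Under the usual YaRN convention, a factor of 32.0 against the 131,072 context implies an original pre-extension context of 131,072 / 32 = 4,096 — an inference from the convention, not a value stated above.

```python
# Standard RoPE inverse frequencies; YaRN interpolation rescales these.
BASE, HEAD_DIM, FACTOR, MAX_CTX = 150_000, 64, 32.0, 131_072

inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]

print(len(inv_freq), inv_freq[0])  # 32 frequency pairs; index 0 rotates fastest
print(int(MAX_CTX / FACTOR))       # 4096: implied pre-YaRN context length
```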

Attention Configuration

Attention Dropout 0%
Attention Bias Yes
Tied Embeddings No

Mixture of Experts

Number of Experts 32
Experts per Token 4
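With 32 experts and 4 active per token, only 4/32 of the expert FFN weights participate in any given forward step. A generic top-k softmax router illustrates the selection; this is a common MoE scheme, not necessarily the exact router used in gpt-oss-20b:

```python
import math

NUM_EXPERTS, TOP_K = 32, 4

def route(logits, k=TOP_K):
    """Generic top-k MoE routing: keep the k highest-scoring experts and
    softmax-normalize their weights over just those k. (A common scheme;
    the actual gpt-oss router may differ in detail.)"""
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    m = max(logits[i] for i in top)                      # for numerical stability
    exp = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exp.values())
    return {i: exp[i] / z for i in top}                  # expert index -> weight

weights = route([(i * 37) % 32 / 10 for i in range(NUM_EXPERTS)])
print(len(weights), round(sum(weights.values()), 6))  # 4 experts, weights sum to 1.0
```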

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID Not set
Pad Token ID 199,999
EOS Token ID 200,002

Data Type

Model Dtype Not set
Layer Types: Attention, MLP/FFN, Normalization, Embedding