openai/gpt-oss-120b

📊 Model Parameters

Total Parameters 116,829,156,672
Context Length 131,072
Hidden Size 2880
Layers 36
Attention Heads 64
KV Heads 8

💾 Memory Requirements

FP32 (Full) 435.22 GB
FP16 (Half) 217.61 GB
INT8 (Quantized) 108.81 GB
INT4 (Quantized) 54.40 GB
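The weight-memory figures above follow directly from the total parameter count. A minimal sketch, assuming dense storage at each precision (the constants mirror the card; real checkpoints may pack weights differently):

```python
# Weight-memory estimate: parameter count x bytes per parameter,
# reported in GiB (2**30 bytes), matching the card's figures.
PARAMS = 116_829_156_672

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

def weight_memory_gib(params: int, bytes_per_param: float) -> float:
    """Raw weight size in GiB for a given per-parameter width."""
    return params * bytes_per_param / 2**30

for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(PARAMS, width):.2f} GB")
```

Running this reproduces the table (435.22, 217.61, 108.81, 54.40 GB), which confirms the card's "GB" figures are GiB-based.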

🔑 KV Cache (Inference)

Per Token (FP16) 73.73 KB
Max Context FP32 18.00 GB
Max Context FP16 9.00 GB
Max Context INT8 4.50 GB
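The KV-cache figures follow from the attention geometry (36 layers, 8 KV heads, head dimension 64). A sketch of the arithmetic, assuming every layer caches the full window; sliding-window layers would cache less in practice:

```python
# KV cache per token: 2 (key + value) x layers x KV heads x head dim x dtype bytes.
LAYERS, KV_HEADS, HEAD_DIM, MAX_CONTEXT = 36, 8, 64, 131_072

def kv_bytes_per_token(dtype_bytes: float) -> float:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * dtype_bytes

per_token_fp16 = kv_bytes_per_token(2)                  # 73,728 bytes ~= 73.73 KB
max_ctx_fp16_gib = per_token_fp16 * MAX_CONTEXT / 2**30  # 9.0 GiB at full context
```

Note the card mixes units: the per-token figure is decimal KB (73,728 bytes), while the max-context figures are GiB.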

⚙️ Model Configuration

Core Architecture

Vocabulary Size 201,088
Hidden Size 2,880
FFN Intermediate Size 2,880
Number of Layers 36
Attention Heads 64
KV Heads 8
Head Dimension 64
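Two derived facts are worth noting: the query projection (64 heads x 64 dims = 4,096) is wider than the hidden size (2,880), so the attention projections are non-square, and the 64:8 head ratio means 8 query heads share each KV head (grouped-query attention). A quick check:

```python
# Derived attention geometry from the core numbers above.
HIDDEN, HEADS, KV_HEADS, HEAD_DIM = 2880, 64, 8, 64

q_dim = HEADS * HEAD_DIM       # 4096: query width exceeds the hidden size
kv_dim = KV_HEADS * HEAD_DIM   # 512: shared key/value width (GQA)
gqa_group = HEADS // KV_HEADS  # 8 query heads per KV head
```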

Context & Position

Sliding Window Size 128
RoPE Base Frequency 150,000
RoPE Scaling YaRN (factor: 32.0)
Layer Attention Types [36 items]
Max Context Length 131,072
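The YaRN factor relates the pretraining context window to the extended maximum. A sketch of the relationship, assuming the listed factor scales an original window of `max_context / factor` (the 4,096 figure is derived here, not stated on the card):

```python
# YaRN context extension: extended window = original window x scaling factor.
FACTOR = 32.0
MAX_CONTEXT = 131_072

original_context = MAX_CONTEXT / FACTOR  # derived pretraining window
```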

Attention Configuration

Attention Dropout 0%
Attention Bias Yes
Tied Embeddings No

Mixture of Experts

Number of Experts 128
Experts per Token 4
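With 128 experts and 4 active per token, a router scores all experts and keeps only the top 4, so each token touches a small slice of the total parameters. A minimal sketch of top-k routing with these numbers; the function and variable names are illustrative, not from the gpt-oss codebase:

```python
import math

NUM_EXPERTS, TOP_K = 128, 4

def route(router_logits):
    """Select the top-k experts and softmax-normalize their weights."""
    top = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)[:TOP_K]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]  # stable softmax
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Each token's hidden state is then sent only to the selected experts,
# and their outputs are combined with the returned weights.
```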

Activation & Normalization

Activation Function SiLU
RMSNorm Epsilon 1e-05
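The epsilon above guards the RMSNorm denominator against division by zero. A minimal pure-Python sketch of RMSNorm using this epsilon, for illustration only:

```python
import math

EPS = 1e-05  # RMSNorm epsilon from the card

def rms_norm(x, weight):
    """y_i = x_i / sqrt(mean(x^2) + eps) * w_i"""
    mean_sq = sum(v * v for v in x) / len(x)
    inv = 1.0 / math.sqrt(mean_sq + EPS)
    return [v * inv * w for v, w in zip(x, weight)]
```

With unit weights, the output has a root-mean-square of approximately 1, which is the point of the normalization.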

Special Tokens

BOS Token ID Not set
Pad Token ID 199,999
EOS Token ID 200,002

Data Type

Model Dtype Not set