EleutherAI/gpt-neox-20b

📊 Model Parameters

Total Parameters 20,554,567,680
Context Length 2,048
Hidden Size 6144
Layers 44
Attention Heads 64
KV Heads 64
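
The headline count can be sanity-checked from the architecture fields above. The sketch below assumes the standard GPT-NeoX layout (untied input/output embeddings, fused QKV projection, 4×-hidden FFN) and omits biases and LayerNorm weights, so it lands about 0.02% under the exact total:

```python
# Back-of-the-envelope parameter count for GPT-NeoX-20B.
# Assumption: biases and LayerNorm parameters (a few million total) are omitted.
vocab, hidden, layers, ffn = 50_432, 6_144, 44, 24_576

embeddings = 2 * vocab * hidden   # input embedding + LM head (untied)
attention  = 4 * hidden * hidden  # QKV projections + output projection
mlp        = 2 * hidden * ffn     # up-projection + down-projection
per_layer  = attention + mlp

total = embeddings + layers * per_layer
print(f"{total:,}")  # 20,551,041,024 -- within ~0.02% of 20,554,567,680
```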

💾 Memory Requirements

FP32 (Full) 76.57 GiB
FP16 (Half) 38.29 GiB
INT8 (Quantized) 19.14 GiB
INT4 (Quantized) 9.57 GiB
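
These sizes are simply parameter count × bytes per weight, reported in binary gigabytes (GiB); the quantized rows ignore the scale/zero-point overhead that real INT8/INT4 formats add. A quick reproduction:

```python
params = 20_554_567_680
GIB = 1024 ** 3

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name}: {params * bytes_per_param / GIB:.2f} GiB")
# FP32: 76.57 GiB, FP16: 38.29 GiB, INT8: 19.14 GiB, INT4: 9.57 GiB
```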

🔑 KV Cache (Inference)

Per Token (FP16) 1.08 MB
Max Context FP32 4.12 GiB
Max Context FP16 2.06 GiB
Max Context INT8 1.03 GiB
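
With KV heads equal to attention heads (plain multi-head attention, no grouped-query sharing) and head_dim = 6144 / 64 = 96, the cache costs 2 × layers × hidden_size values per token. The sketch below reproduces the table, keeping its unit convention (per-token in decimal MB, totals in GiB):

```python
layers, kv_heads, head_dim, ctx = 44, 64, 96, 2_048  # head_dim = 6144 / 64

def kv_cache_bytes(tokens: int, bytes_per_value: float) -> float:
    # One K and one V value per layer, per KV head, per head dimension.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value

print(f"Per token (FP16): {kv_cache_bytes(1, 2) / 1e6:.2f} MB")  # 1.08 MB
for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    gib = kv_cache_bytes(ctx, nbytes) / 1024**3
    print(f"Max context {name}: {gib:.2f} GiB")  # 4.12 / 2.06 / 1.03
```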

⚙️ Model Configuration

Core Architecture

Vocabulary Size 50,432
Hidden Size 6,144
Number of Layers 44
Attention Heads 64
FFN Intermediate Size 24,576
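
These values can be read straight from the published config; the sketch below uses Hugging Face's AutoConfig and the GPTNeoXConfig field names:

```python
from transformers import AutoConfig

# Pull the published config to confirm the numbers above.
cfg = AutoConfig.from_pretrained("EleutherAI/gpt-neox-20b")
print(cfg.vocab_size)           # 50432
print(cfg.hidden_size)          # 6144
print(cfg.num_hidden_layers)    # 44
print(cfg.num_attention_heads)  # 64
print(cfg.intermediate_size)    # 24576
```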

Context & Position

Max Context Length 2,048
RoPE Base Frequency 10,000
RoPE Scaling Not set
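
RoPE with base 10,000 and no scaling means positions enter through fixed inverse frequencies. Below is a minimal NumPy sketch of the cos/sin tables; it assumes the setting described in the GPT-NeoX-20B paper, where rotary embeddings cover only the first 25% of each head's 96 dimensions (rotary_dim = 24):

```python
import numpy as np

base, head_dim, rotary_pct = 10_000.0, 96, 0.25  # rotary_pct per the paper
rotary_dim = int(head_dim * rotary_pct)          # 24 of 96 dims get rotated

# Fixed inverse frequencies, then per-position rotation angles.
inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
positions = np.arange(2_048)              # max context length
angles = np.outer(positions, inv_freq)    # shape (2048, rotary_dim // 2)
cos, sin = np.cos(angles), np.sin(angles) # tables used to rotate Q and K
```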

Attention Configuration

Tied Embeddings No
Attention Dropout 0%
Attention Bias Yes

Activation & Normalization

Activation Function gelu_fast
LayerNorm Epsilon 1e-05
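
gelu_fast is a tanh approximation of GELU; a scalar sketch matching the formula Hugging Face Transformers uses under that name:

```python
import math

def gelu_fast(x: float) -> float:
    # tanh approximation of GELU (0.7978845608 ~= sqrt(2 / pi)),
    # as implemented by the "gelu_fast" activation in HF Transformers.
    return 0.5 * x * (1.0 + math.tanh(x * 0.7978845608 * (1.0 + 0.044715 * x * x)))

print(gelu_fast(1.0))  # ~0.8412, close to exact GELU(1) ~0.8413
```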

Dropout (Training)

Hidden Dropout 0%

Special Tokens

BOS Token ID 0
Pad Token ID Not set
EOS Token ID 0
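
BOS and EOS share ID 0 (the <|endoftext|> token) and no pad token is defined, so batched inference code commonly reuses EOS for padding; a typical pattern (an application-level choice, not something the checkpoint mandates):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
# No pad token in the checkpoint; reuse EOS so padding works for batching.
tok.pad_token = tok.eos_token
print(tok.bos_token_id, tok.eos_token_id, tok.pad_token_id)  # 0 0 0
```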

Data Type

Model Dtype float16
Layer Types: Attention, MLP/FFN, Normalization, Embedding
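
The checkpoint ships in float16, so the weights alone need roughly 38 GiB; a minimal loading sketch (device_map="auto" assumes the accelerate package is installed and spreads layers across available devices):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the checkpoint's native dtype; ~38 GiB of weights means a
# single consumer GPU will not fit the model, hence device_map="auto".
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
)
```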