meta-llama/Llama-3.1-405B

📊 Model Parameters

Total Parameters 405,853,388,800
Context Length 131,072
Hidden Size 16384
Layers 126
Attention Heads 128
KV Heads 8
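
The total falls straight out of the architecture numbers. A minimal sanity check in plain Python (a sketch; the per-layer breakdown assumes the standard Llama layout: separate Q/K/V/O projections, a SwiGLU MLP, untied embeddings, and weight-only RMSNorms):

```python
# Parameter count for Llama-3.1-405B from the config values below.
vocab, hidden, ffn = 128_256, 16_384, 53_248
layers, kv_heads, head_dim = 126, 8, 128

embed = vocab * hidden                                         # input embeddings
lm_head = vocab * hidden                                       # output head (untied)
attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim  # Q,O + K,V (GQA)
mlp = 3 * hidden * ffn                                         # gate, up, down projections
norms = 2 * hidden                                             # two RMSNorm weights per layer

total = embed + lm_head + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total:,}")  # -> 405,853,388,800
```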

💾 Memory Requirements

FP32 (Full) 1511.92 GiB
FP16 (Half) 755.96 GiB
INT8 (Quantized) 377.98 GiB
INT4 (Quantized) 188.99 GiB
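
These figures are the raw weight footprint: parameter count times bytes per parameter, in binary gibibytes. A quick reproduction (a sketch; serving overhead such as activations and the KV cache comes on top):

```python
# Weight memory at each precision (GiB = 1024**3 bytes).
params = 405_853_388_800
bytes_per_param = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for name, nbytes in bytes_per_param.items():
    print(f"{name}: {params * nbytes / 1024**3:.2f} GiB")
# FP32: 1511.92  FP16: 755.96  INT8: 377.98  INT4: 188.99
```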

🔑 KV Cache (Inference)

Per Token (FP16) 504.00 KiB
Max Context FP32 126.00 GiB
Max Context FP16 63.00 GiB
Max Context INT8 31.50 GiB
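
The cache stores one K and one V vector of width kv_heads × head_dim per layer per token, so grouped-query attention (8 KV heads instead of 128) is what keeps these numbers manageable. The arithmetic, as a sketch:

```python
# KV-cache size: K and V tensors per layer, kv_heads * head_dim wide per token.
layers, kv_heads, head_dim = 126, 8, 128
ctx = 131_072

per_token_fp16 = 2 * layers * kv_heads * head_dim * 2  # 2 bytes per FP16 element
print(per_token_fp16 / 1024)                           # 504.0 KiB (516,096 bytes)
print(per_token_fp16 * ctx / 1024**3)                  # 63.0 GiB at max context
```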

⚙️ Model Configuration

Core Architecture

Vocabulary Size 128,256
Hidden Size 16,384
FFN Intermediate Size 53,248
Number of Layers 126
Attention Heads 128
KV Heads 8
Head Dimension 128

Context & Position

Max Context Length 131,072
RoPE Base Frequency 500,000.0
RoPE Scaling llama3 (factor: 8.0)
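
In a Hugging Face-style config.json this appears as a rope_scaling block (rendered below as a Python dict). Only the type and factor come from the table above; the remaining fields match what Meta publishes for Llama 3.1, but treat them as assumptions here:

```python
rope_theta = 500_000.0  # RoPE base frequency
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "original_max_position_embeddings": 8192,  # pre-extension context (assumed)
    "low_freq_factor": 1.0,                    # assumed from the published config
    "high_freq_factor": 4.0,                   # assumed from the published config
}
```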

Attention Configuration

Attention Bias No
Attention Dropout 0%
MLP Bias No
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
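
For concreteness, here is how the silu activation and the RMSNorm epsilon enter a Llama-style block. A minimal PyTorch sketch (function names are mine, not the model's module names):

```python
import torch
import torch.nn.functional as F

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: scale by reciprocal RMS; no mean subtraction, no bias.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: silu(x @ W_gate) gates (x @ W_up), then project down.
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down
```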

Special Tokens

BOS Token ID 128,000
Pad Token ID Not set
EOS Token ID 128,001

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding
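
To load the checkpoint in its native dtype with Hugging Face transformers (a sketch: device_map="auto" shards across whatever accelerators are visible, and BF16 still needs roughly the 756 GiB noted above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-405B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the checkpoint dtype above
    device_map="auto",           # shard layers across available devices
)
```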