meta-llama/Meta-Llama-3-70B

📊 Model Parameters

Total Parameters 70,553,706,496
Context Length 8,192
Hidden Size 8,192
Layers 80
Attention Heads 64
KV Heads 8

💾 Memory Requirements

FP32 (Full) 262.83 GB
FP16 (Half) 131.42 GB
INT8 (Quantized) 65.71 GB
INT4 (Quantized) 32.85 GB
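The figures above follow directly from the total parameter count. A minimal sketch, assuming the table's "GB" means GiB (1024³ bytes), which reproduces the numbers exactly:

```python
# Weight-memory estimate: parameters × bytes per parameter.
PARAMS = 70_553_706_496  # total parameter count from the card above

def weight_memory_gib(params: int, bits_per_param: int) -> float:
    """Memory needed to hold the weights at a given precision, in GiB."""
    return params * bits_per_param / 8 / 1024**3

for label, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS, bits):.2f} GB")
# FP32: 262.83 GB / FP16: 131.42 GB / INT8: 65.71 GB / INT4: 32.85 GB
```

Note these are weights only; activations and KV cache (next section) come on top.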

🔑 KV Cache (Inference)

Per Token (FP16) 327.68 KB
Max Context FP32 5.00 GB
Max Context FP16 2.50 GB
Max Context INT8 1.25 GB
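The KV-cache figures can be re-derived from the architecture: two tensors (K and V) per layer, and with grouped-query attention only the 8 KV heads are cached, not all 64 query heads. A sketch (per-token size appears to use decimal KB, the full-context sizes GiB):

```python
# KV cache = 2 (K and V) × layers × kv_heads × head_dim × bytes × tokens
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

def kv_cache_bytes(n_tokens: int, bytes_per_elem: int) -> int:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_elem * n_tokens

print(kv_cache_bytes(1, 2) / 1000, "KB")        # 327.68 KB per token at FP16
print(kv_cache_bytes(8192, 2) / 1024**3, "GiB") # 2.5 — full 8,192-token context
```

Without GQA (64 KV heads instead of 8) the cache would be 8× larger, i.e. 20 GiB at FP16 for the full context.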

⚙️ Model Configuration

Core Architecture

Vocabulary Size 128,256
Hidden Size 8,192
FFN Intermediate Size 28,672
Number of Layers 80
Attention Heads 64
KV Heads 8
Head Dimension 128
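The total of 70,553,706,496 parameters can be re-derived from these values. A sketch assuming the standard Llama layer layout (q/k/v/o attention projections with GQA, gate/up/down MLP, two RMSNorms per layer, untied lm_head, no biases):

```python
VOCAB, HIDDEN, FFN, LAYERS = 128_256, 8_192, 28_672, 80
HEADS, KV_HEADS, HEAD_DIM = 64, 8, 128

attn = (HIDDEN * HEADS * HEAD_DIM           # q_proj
        + 2 * HIDDEN * KV_HEADS * HEAD_DIM  # k_proj + v_proj (only 8 KV heads)
        + HEADS * HEAD_DIM * HIDDEN)        # o_proj
mlp = 3 * HIDDEN * FFN                      # gate, up, down projections
norms = 2 * HIDDEN                          # two RMSNorm weight vectors

total = (LAYERS * (attn + mlp + norms)
         + VOCAB * HIDDEN                   # input embeddings
         + VOCAB * HIDDEN                   # lm_head (embeddings not tied)
         + HIDDEN)                          # final RMSNorm
print(total)  # 70553706496 — matches the card exactly
```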

Context & Position

Max Context Length 8,192
RoPE Base Frequency 500,000.0
RoPE Scaling Not set
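RoPE rotates each pair of head dimensions at a frequency derived from the base above; the unusually high base of 500,000 (vs. 10,000 in Llama 2) slows the frequency decay across dimensions. A sketch of the standard inverse-frequency computation:

```python
# inv_freq[i] = base^(-2i / head_dim), one frequency per dimension pair
BASE, HEAD_DIM = 500_000.0, 128

inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]
print(len(inv_freq))  # 64 rotary frequency pairs per head
print(inv_freq[0])    # 1.0 — the fastest-rotating pair
```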

Attention Configuration

Attention Bias No
Attention Dropout 0%
MLP Bias No
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID 128,000
Pad Token ID Not set
EOS Token ID 128,001

Data Type

Model Dtype bfloat16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding