meta-llama/Meta-Llama-3-8B

📊 Model Parameters

Total Parameters 8,030,261,248
Context Length 8,192
Hidden Size 4096
Layers 32
Attention Heads 32
KV Heads 8

💾 Memory Requirements

FP32 (Full) 29.92 GB
FP16 (Half) 14.96 GB
INT8 (Quantized) 7.48 GB
INT4 (Quantized) 3.74 GB
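The figures above follow directly from the parameter count: bytes per parameter times total parameters, reported in GiB (labeled GB here). A minimal sketch of that arithmetic (weights only; activations, KV cache, and framework overhead are extra):

```python
# Weight-memory estimate from the parameter count alone.
TOTAL_PARAMS = 8_030_261_248

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,  # packed 4-bit weights
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30  # table values are GiB
    print(f"{dtype}: {gib:.2f} GB")
```

Real quantized checkpoints carry some extra overhead (scales, zero-points, unquantized embeddings), so the INT8/INT4 rows are lower bounds.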

🔑 KV Cache (Inference)

Per Token (FP16) 131.07 KB
Max Context FP32 2.00 GB
Max Context FP16 1.00 GB
Max Context INT8 512.0 MB
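These cache figures follow from the grouped-query-attention geometry (8 KV heads × 128 head dim across 32 layers, storing both K and V). A sketch of the computation; note the table mixes decimal KB for the per-token row with GiB for the totals:

```python
# KV cache size for grouped-query attention:
# per token = 2 (K and V) * layers * kv_heads * head_dim * bytes_per_elem
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 32, 8, 128, 8192

per_token_fp16 = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2  # bytes at FP16
print(per_token_fp16 / 1000)   # 131.072 -> the "131.07 KB" row (decimal KB)

max_ctx_fp16 = per_token_fp16 * CONTEXT
print(max_ctx_fp16 / 2**30)    # 1.0 -> the "1.00 GB" row (GiB)
```

With 32 query heads but only 8 KV heads, the cache is 4× smaller than it would be under standard multi-head attention.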

⚙️ Model Configuration

Core Architecture

Vocabulary Size 128,256
Hidden Size 4,096
FFN Intermediate Size 14,336
Number of Layers 32
Attention Heads 32
KV Heads 8
Head Dimension 128
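The head dimension and the GQA grouping are derived quantities, not free parameters; a quick consistency check on the numbers above:

```python
# Consistency checks on the core architecture numbers.
HIDDEN, HEADS, KV_HEADS = 4096, 32, 8

head_dim = HIDDEN // HEADS     # 128, matches the table
gqa_group = HEADS // KV_HEADS  # 4 query heads share each KV head
print(head_dim, gqa_group)
```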

Context & Position

Max Context Length 8,192
RoPE Base Frequency 500,000.0
RoPE Scaling Not set
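The RoPE base of 500,000 sets the rotation frequencies for each head-dimension pair via base^(-2i/d). A sketch of how those inverse frequencies are derived (no scaling is configured, so they are used as-is up to the 8,192-token context):

```python
# RoPE inverse frequencies from the base frequency above.
BASE, HEAD_DIM = 500000.0, 128

inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]
print(inv_freq[0])   # 1.0 — the fastest-rotating dimension pair
print(inv_freq[-1])  # slowest frequency, on the order of 1/BASE
```

The unusually large base (vs. the 10,000 used by earlier Llama models) stretches the slowest wavelengths, which helps positions stay distinguishable across the full context.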

Attention Configuration

Attention Bias No
Attention Dropout 0%
MLP Bias No
Tied Embeddings No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Special Tokens

BOS Token ID 128,000
Pad Token ID Not set
EOS Token ID 128,001

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding
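The fields above correspond to the checkpoint's Hugging Face `config.json`; a partial sketch (field names follow the standard `LlamaConfig` schema, values taken from the tables above):

```json
{
  "vocab_size": 128256,
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_hidden_layers": 32,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "max_position_embeddings": 8192,
  "rope_theta": 500000.0,
  "rope_scaling": null,
  "hidden_act": "silu",
  "rms_norm_eps": 1e-05,
  "attention_bias": false,
  "attention_dropout": 0.0,
  "mlp_bias": false,
  "tie_word_embeddings": false,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "torch_dtype": "bfloat16"
}
```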