microsoft/Phi-4

📊 Model Parameters

Total Parameters 14,659,507,200
Context Length 16,384
Hidden Size 5,120
Layers 40
Attention Heads 40
KV Heads 10

💾 Memory Requirements

FP32 (Full) 54.61 GiB
FP16 (Half) 27.31 GiB
INT8 (Quantized) 13.65 GiB
INT4 (Quantized) 6.83 GiB
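
The weight-memory figures above follow from the parameter count multiplied by the bytes stored per parameter. The sketch below reproduces them, assuming the totals cover weights only (no activations, KV cache, or quantization metadata) and that a gigabyte is taken as 2^30 bytes, which is the convention the listed values match. The names `BYTES_PER_PARAM` and `weight_memory_gib` are illustrative, not from any library.

```python
# Weight-only memory estimate for microsoft/Phi-4 at several precisions.
TOTAL_PARAMS = 14_659_507_200

# Bytes stored per parameter at each precision (INT4 packs two values per byte).
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

GIB = 1024 ** 3  # assumption: the page reports binary gigabytes


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / GIB


weight_memory = {
    name: round(weight_memory_gib(TOTAL_PARAMS, size), 2)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gib in weight_memory.items():
    print(f"{name}: {gib:.2f} GiB")
```

Real deployments typically need more than this: quantized formats carry scale and zero-point data, and inference adds activation buffers and the KV cache.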

🔑 KV Cache (Inference)

Per Token (FP16) 204.80 KB (204,800 bytes)
Max Context FP32 6.25 GiB
Max Context FP16 3.12 GiB
Max Context INT8 1.56 GiB
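
The KV cache sizes above can be derived from the layer count, the number of KV heads, and the per-head dimension. This is a minimal sketch, assuming the head dimension is Hidden Size / Attention Heads = 128 (it is not listed separately here) and a single sequence (batch size 1). The per-token figure matches decimal kilobytes, while the max-context figures match 2^30-byte gigabytes. Function names are illustrative.

```python
# KV cache size estimate for a single sequence.
NUM_LAYERS = 40
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 10
HIDDEN_SIZE = 5120
MAX_CONTEXT = 16_384

# Assumption: head dimension = hidden size / attention heads.
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 128


def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes cached per token: one key and one value vector per KV head per layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value


def kv_cache_gib(num_tokens: int, bytes_per_value: int) -> float:
    """Total cache size in GiB (2**30 bytes) for num_tokens cached positions."""
    return kv_bytes_per_token(bytes_per_value) * num_tokens / 1024 ** 3


per_token_fp16 = kv_bytes_per_token(2)
print(f"Per token (FP16): {per_token_fp16 / 1000:.2f} KB")
for name, size in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
    print(f"Max context {name}: {kv_cache_gib(MAX_CONTEXT, size):.3f} GiB")
```

Because only 10 of the 40 heads carry keys and values, the cache is a quarter of what full multi-head attention would need at the same hidden size.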

⚙️ Model Configuration

Core Architecture

Vocabulary Size 100,352
Hidden Size 5,120
FFN Intermediate Size 17,920
Number of Layers 40
Attention Heads 40
KV Heads 10
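
The total parameter count can be rebuilt from the core architecture values. The sketch below assumes a Llama-style decoder layout: grouped-query attention with a head dimension of 128, a gated MLP with three projection matrices, no bias terms, two RMSNorm weight vectors per layer plus one final norm, and untied input and output embeddings. The checkpoint may store some of these matrices fused, which does not change the count. All variable names are illustrative.

```python
# Reconstruct the parameter count from the listed architecture values.
VOCAB_SIZE = 100_352
HIDDEN_SIZE = 5_120
FFN_SIZE = 17_920
NUM_LAYERS = 40
NUM_HEADS = 40
NUM_KV_HEADS = 10

HEAD_DIM = HIDDEN_SIZE // NUM_HEADS   # assumption: 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM      # width of the key and value projections

# Attention: query and output projections are square; key and value are narrower.
attention = 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * KV_DIM

# Gated MLP: gate, up, and down projections (assumed, typical with SiLU).
mlp = 3 * HIDDEN_SIZE * FFN_SIZE

# Two RMSNorm weight vectors per layer (before attention and before the MLP).
norms = 2 * HIDDEN_SIZE

per_layer = attention + mlp + norms

# Untied embeddings: separate input embedding and output projection matrices.
embeddings = 2 * VOCAB_SIZE * HIDDEN_SIZE
final_norm = HIDDEN_SIZE

total_params = NUM_LAYERS * per_layer + embeddings + final_norm
print(f"{total_params:,}")
```

Under these assumptions the result equals the listed total of 14,659,507,200, with roughly 93% of the parameters in the 40 decoder layers and the rest in the two embedding matrices.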

Context & Position

Max Context Length 16,384
RoPE Base Frequency 250,000
RoPE Scaling Not set
Sliding Window Size Not set
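
To illustrate what the RoPE base frequency controls, here is a small sketch of the standard rotary-embedding frequency schedule, where dimension pair i rotates at base^(-2i/d) radians per position. It assumes a head dimension of 128 (Hidden Size / Attention Heads) and the unscaled formulation, consistent with RoPE scaling not being set. The helper names are hypothetical.

```python
# Standard RoPE inverse-frequency schedule (unscaled).
ROPE_BASE = 250_000.0
HEAD_DIM = 5120 // 40  # assumption: hidden size / attention heads = 128


def rope_inverse_frequencies(head_dim: int, base: float) -> list[float]:
    """One rotation rate per pair of dimensions, from fastest to slowest."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


inv_freq = rope_inverse_frequencies(HEAD_DIM, ROPE_BASE)


def rotation_angles(position: int) -> list[float]:
    """Angle (in radians) applied to each dimension pair at a token position."""
    return [position * f for f in inv_freq]


print(f"pairs: {len(inv_freq)}, fastest: {inv_freq[0]}, slowest: {inv_freq[-1]:.3e}")
```

A larger base lowers the slowest rotation rates, so distant positions remain distinguishable over a longer context.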

Attention Configuration

Attention Dropout 0%
Tied Embeddings No
Attention Bias No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
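
For reference, the two operations named above can be written in a few lines. This is a plain-Python sketch of the usual definitions, SiLU as x * sigmoid(x) and RMSNorm as division by the root mean square with epsilon added inside the square root; it is not taken from the model's implementation, and the function names are illustrative.

```python
import math

RMS_NORM_EPS = 1e-05


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def rms_norm(values: list[float], weight: list[float],
             eps: float = RMS_NORM_EPS) -> list[float]:
    """Scale values to unit root-mean-square, then apply a learned per-dimension weight."""
    mean_square = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale * w for v, w in zip(values, weight)]


normalized = rms_norm([3.0, 4.0], [1.0, 1.0])
print(silu(1.0), normalized)
```

The epsilon only matters when the input vector is close to zero, where it prevents division by a vanishing norm.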

Dropout (Training)

Residual Dropout 0%
Embedding Dropout 0%

Special Tokens

BOS Token ID 100,257
Pad Token ID 100,349
EOS Token ID 100,265

Data Type

Model Dtype bfloat16

Layer Types

Attention, MLP/FFN, Normalization, Embedding