microsoft/phi-2

📊 Model Parameters

Total Parameters 2,779,683,840
Context Length 2,048
Hidden Size 2,560
Layers 32
Attention Heads 32
KV Heads 32
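KV Heads equals Attention Heads, so Phi-2 uses standard multi-head attention rather than grouped-query or multi-query attention. The sketch below shows how the per-head width and the query-to-KV grouping follow from these three numbers. The function name is hypothetical.

```python
def attention_geometry(hidden_size: int = 2560, num_heads: int = 32, num_kv_heads: int = 32):
    """Return (head_dim, queries_per_kv_head, attention_kind) for a decoder config."""
    assert hidden_size % num_heads == 0, "hidden size must split evenly across heads"
    assert num_heads % num_kv_heads == 0, "query heads must split evenly across KV heads"
    head_dim = hidden_size // num_heads
    group = num_heads // num_kv_heads
    if group == 1:
        kind = "MHA"  # every query head has its own K/V head
    elif num_kv_heads == 1:
        kind = "MQA"  # all query heads share a single K/V head
    else:
        kind = "GQA"  # query heads share K/V heads in groups
    return head_dim, group, kind


print(attention_geometry())  # Phi-2 values from the table above
```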

💾 Memory Requirements (Weights Only)

FP32 (Full) 10.36 GiB
FP16 (Half) 5.18 GiB
INT8 (Quantized) 2.59 GiB
INT4 (Quantized) 1.29 GiB
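These figures are simply parameter count × bytes per parameter, expressed in GiB (2^30 bytes). They cover the weights alone: activations, the KV cache and framework overhead come on top, and real INT8/INT4 checkpoints also store quantization scales. A quick check of the arithmetic:

```python
PHI2_PARAMS = 2_779_683_840
BITS_PER_PARAM = {"FP32": 32, "FP16": 16, "INT8": 8, "INT4": 4}


def weight_gib(params: int, bits: int) -> float:
    """Raw weight storage in GiB, ignoring quantization metadata and runtime overhead."""
    return params * bits / 8 / 2**30


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: {weight_gib(PHI2_PARAMS, bits):.2f} GiB")
```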

🔑 KV Cache (Inference)

Per Token (FP16) 320 KiB (327,680 bytes)
Max Context (2,048 tokens) FP32 1.25 GiB
Max Context (2,048 tokens) FP16 640 MiB
Max Context (2,048 tokens) INT8 320 MiB
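Every layer caches one key vector and one value vector per KV head for each token. The per-token size is therefore 2 × layers × KV heads × head dim × bytes per element, with head dim = 2,560 / 32 = 80. The sketch below assumes a single sequence (batch size 1). Larger batches scale the result linearly.

```python
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 32,
                   head_dim: int = 80, bytes_per_elem: int = 2) -> int:
    """KV-cache size in bytes for one sequence; the factor 2 counts keys and values."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


print(kv_cache_bytes(1))                                  # bytes per token at FP16
print(kv_cache_bytes(2048) / 2**20, "MiB at FP16")
print(kv_cache_bytes(2048, bytes_per_elem=4) / 2**30, "GiB at FP32")
print(kv_cache_bytes(2048, bytes_per_elem=1) / 2**20, "MiB at INT8")
```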

⚙️ Model Configuration

Core Architecture

Vocabulary Size 51,200
Hidden Size 2,560
FFN Intermediate Size 10,240
Number of Layers 32
Attention Heads 32
KV Heads 32
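The total parameter count can be reproduced from these values. The sketch assumes the Phi-2 block layout used in Hugging Face transformers: biased Q/K/V and output projections, a biased two-matrix MLP, a single LayerNorm per block (attention and MLP run in parallel on the same normalized input), a final LayerNorm, and an untied LM head with a bias. Under those assumptions the result matches the 2,779,683,840 listed above exactly.

```python
def phi2_param_count(vocab: int = 51_200, hidden: int = 2_560, ffn: int = 10_240,
                     layers: int = 32, heads: int = 32, kv_heads: int = 32) -> int:
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attn = (hidden * hidden + hidden)          # q_proj (weight + bias)
    attn += 2 * (hidden * kv_dim + kv_dim)     # k_proj and v_proj
    attn += (hidden * hidden + hidden)         # output projection
    mlp = (hidden * ffn + ffn) + (ffn * hidden + hidden)  # fc1 and fc2
    norm = 2 * hidden                          # one LayerNorm (gain + bias) per block
    per_layer = attn + mlp + norm
    embed = vocab * hidden                     # token embedding table
    final_norm = 2 * hidden
    lm_head = hidden * vocab + vocab           # untied output head with bias
    return embed + layers * per_layer + final_norm + lm_head


print(f"{phi2_param_count():,}")
```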

Context & Position

Max Context Length 2,048
RoPE Base Frequency 10,000
RoPE Scaling Not set
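RoPE rotates pairs of query/key dimensions by an angle proportional to the token position, using frequencies base^(-2i/d). As a result, the query-key dot product depends only on the relative offset between positions. The card does not list the rotary dimension d, so this sketch takes d from the vector length. It also pairs adjacent dimensions (2i, 2i+1), whereas the Hugging Face implementation pairs dimension i with i + d/2. Both schemes are rotations with the same frequencies. The function names are hypothetical.

```python
import math


def rope_inv_freq(rotary_dim: int, base: float = 10000.0) -> list[float]:
    """Per-pair rotation frequencies base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2 * i / rotary_dim) for i in range(rotary_dim // 2)]


def apply_rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each adjacent pair (x[2i], x[2i+1]) by angle pos * inv_freq[i]."""
    assert len(x) % 2 == 0, "rotary dimension must be even"
    out = list(x)
    for i, freq in enumerate(rope_inv_freq(len(x), base)):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out
```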

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function gelu_new
LayerNorm Epsilon 1e-05
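`gelu_new` is the name Hugging Face transformers uses for the tanh approximation of GELU. The epsilon above is added to the variance inside the normalization. Phi-2 stores it as `layer_norm_eps` and applies it in standard (mean-subtracted) LayerNorm, which is what this scalar, list-based sketch implements. It is meant for illustration, not speed.

```python
import math


def gelu_new(x: float) -> float:
    """Tanh approximation of GELU: 0.5x(1 + tanh(sqrt(2/pi)(x + 0.044715x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def layer_norm(x: list[float], gamma: list[float], beta: list[float],
               eps: float = 1e-5) -> list[float]:
    """Normalize to zero mean / unit variance, then apply learned gain and bias."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased variance, as in LayerNorm
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]
```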

Dropout (Training)

Residual Dropout 10%
Embedding Dropout 0%

Special Tokens

BOS Token ID 50,256
Pad Token ID Not set
EOS Token ID 50,256
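No pad token is defined, so batched inference needs one. A common workaround is to reuse the EOS id (50,256) as the pad id and rely on the attention mask to hide the padding. The sketch below builds padded ids and the matching mask. Left padding is the usual choice for decoder-only generation. The input token ids are made up for illustration.

```python
def pad_batch(seqs: list[list[int]], pad_id: int = 50256, left: bool = False):
    """Pad token-id lists to equal length; mask is 1 for real tokens, 0 for padding."""
    max_len = max(len(s) for s in seqs)
    ids, mask = [], []
    for s in seqs:
        n = max_len - len(s)
        pad, ones, zeros = [pad_id] * n, [1] * len(s), [0] * n
        if left:
            ids.append(pad + s)
            mask.append(zeros + ones)
        else:
            ids.append(s + pad)
            mask.append(ones + zeros)
    return ids, mask
```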

Data Type

Model Dtype float16
Layer Types:
Attention
MLP/FFN
Normalization
Embedding