microsoft/Phi-3-mini-4k-instruct

📊 Model Parameters

Total Parameters 3,821,079,552
Context Length 4,096
Hidden Size 3,072
Layers 32
Attention Heads 32
KV Heads 32
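
The total above can be reproduced from the architecture values listed in the configuration section below. The following is a minimal sketch that assumes the standard Phi-3 layer layout (fused qkv_proj and gate_up_proj projections, bias-free linears, RMSNorm weights only, untied output head); it is an illustration, not an official breakdown.

```python
# Minimal sketch: recompute the parameter total from the configuration
# values on this page, assuming the standard Phi-3 layer layout.

vocab_size   = 32_064
hidden       = 3_072
intermediate = 8_192
layers       = 32

embed    = vocab_size * hidden            # token embeddings
qkv      = hidden * (3 * hidden)          # fused q/k/v projection (MHA: KV heads == heads)
o_proj   = hidden * hidden                # attention output projection
gate_up  = hidden * (2 * intermediate)    # fused gate + up projection
down     = intermediate * hidden          # down projection
norms    = 2 * hidden                     # input + post-attention RMSNorm weights
per_layer = qkv + o_proj + gate_up + down + norms

lm_head    = hidden * vocab_size          # untied output head
final_norm = hidden

total = embed + layers * per_layer + final_norm + lm_head
print(f"{total:,}")                       # 3,821,079,552 -- matches the total above
```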

💾 Memory Requirements

FP32 (Full) 14.23 GB
FP16 (Half) 7.12 GB
INT8 (Quantized) 3.56 GB
INT4 (Quantized) 1.78 GB
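
These figures follow directly from the parameter count: weights only, at the given number of bytes per parameter, reported in binary gibibytes (labelled "GB" above). A minimal sketch; runtime overhead such as activations, KV cache, and framework buffers is not included.

```python
# Minimal sketch: weight-only memory footprint per dtype,
# parameter_count * bytes_per_param, reported in binary GiB.

PARAMS = 3_821_079_552
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = PARAMS * nbytes / 2**30
    print(f"{dtype}: {gib:.2f} GB")
# FP32: 14.23 GB, FP16: 7.12 GB, INT8: 3.56 GB, INT4: 1.78 GB
```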

🔑 KV Cache (Inference)

Per Token (FP16) 393.22 KB
Max Context FP32 3.00 GB
Max Context FP16 1.50 GB
Max Context INT8 768.0 MB
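
The cache figures follow from 2 (keys and values) x layers x KV heads x head dimension elements per token. A minimal sketch reproducing them; note that the per-token figure above is reported in decimal KB while the max-context figures use binary units.

```python
# Minimal sketch: KV-cache sizing from the architecture values above.
# Per-token cache = 2 (K and V) * layers * kv_heads * head_dim * bytes.

layers, kv_heads, head_dim = 32, 32, 96     # head_dim = hidden / heads = 3072 / 32
context = 4_096
elems_per_token = 2 * layers * kv_heads * head_dim

per_token_fp16 = elems_per_token * 2        # bytes at 2 bytes/element
print(f"Per token (FP16): {per_token_fp16 / 1e3:.2f} KB")   # 393.22 KB

for name, nbytes in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    total = context * elems_per_token * nbytes
    print(f"Max context {name}: {total / 2**30:.2f} GB")     # 3.00 / 1.50 / 0.75 GB (= 768 MB)
```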

⚙️ Model Configuration

Core Architecture

Vocabulary Size 32,064
Hidden Size 3,072
FFN Intermediate Size 8,192
Number of Layers 32
Attention Heads 32
KV Heads 32

Context & Position

Max Context Length 4,096
RoPE Base Frequency 10000.0
RoPE Scaling Not set
Sliding Window Size 2,047

Attention Configuration

Attention Dropout 0%
Tied Embeddings No
Attention Bias No

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05

Dropout (Training)

Residual Dropout 0%
Embedding Dropout 0%

Special Tokens

BOS Token ID 1
Pad Token ID 32,000
EOS Token ID 32,000

Data Type

Model Dtype bfloat16
Layer Types Attention, MLP/FFN, Normalization, Embedding
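
All of the configuration values above can be read directly from the published config with Hugging Face transformers. A minimal sketch, assuming a recent transformers release with native Phi-3 support (older versions may need trust_remote_code=True); getattr guards fields that may not exist on every config class.

```python
# Minimal sketch: inspect the published configuration for this model.
# Requires `pip install transformers` and network access to the Hub.

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

for field in (
    "vocab_size", "hidden_size", "intermediate_size", "num_hidden_layers",
    "num_attention_heads", "num_key_value_heads", "max_position_embeddings",
    "rope_theta", "rope_scaling", "sliding_window", "attention_dropout",
    "hidden_act", "rms_norm_eps", "resid_pdrop", "embd_pdrop",
    "tie_word_embeddings", "bos_token_id", "eos_token_id", "pad_token_id",
    "torch_dtype",
):
    print(f"{field}: {getattr(cfg, field, 'n/a')}")
```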