microsoft/phi-1

📊 Model Parameters

Total Parameters 1,418,270,720
Context Length 2,048
Hidden Size 2,048
Layers 24
Attention Heads 32
KV Heads 32
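
The total can be rebuilt from the dimensions in this listing. The sketch below is not taken from the model's source code. It assumes that every linear layer carries a bias, that each block has a single LayerNorm with weight and bias, and that one final LayerNorm precedes the output head. The vocabulary size (51,200), FFN size (8,192) and untied embeddings come from the Model Configuration section.

```python
# Sketch: rebuild the total parameter count from the listed dimensions.
# Assumptions (not stated in the listing): all linear layers have a bias,
# each block has one LayerNorm (weight + bias), and the model ends with
# one final LayerNorm. The output head is untied from the embedding.
VOCAB = 51_200
HIDDEN = 2_048
FFN = 8_192
LAYERS = 24


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def count_parameters():
    embedding = VOCAB * HIDDEN                       # token embedding table
    attention = 4 * linear(HIDDEN, HIDDEN)           # Q, K, V and output projections
    mlp = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)  # up and down projections
    layer_norm = 2 * HIDDEN                          # weight + bias
    block = attention + mlp + layer_norm
    final_norm = 2 * HIDDEN
    lm_head = linear(HIDDEN, VOCAB)                  # untied output projection
    return embedding + LAYERS * block + final_norm + lm_head


print(f"{count_parameters():,}")
```

Under these assumptions the result equals the listed 1,418,270,720.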

💾 Memory Requirements

FP32 (Full) 5.28 GB
FP16 (Half) 2.64 GB
INT8 (Quantized) 1.32 GB
INT4 (Quantized) 676.3 MB
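
These figures follow from the parameter count multiplied by the bytes stored per parameter. The sketch below reproduces them, assuming weights only (no activations, optimizer state or quantization metadata) and assuming the "GB" and "MB" labels are binary units (2^30 and 2^20 bytes), which is what the listed values are consistent with. The function names are illustrative.

```python
# Sketch: weight memory = parameter count x bytes per parameter.
# Assumption: "GB"/"MB" in the listing are binary units (GiB/MiB).
TOTAL_PARAMS = 1_418_270_720
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_bytes(precision):
    """Bytes needed to hold the weights at the given precision."""
    return TOTAL_PARAMS * BYTES_PER_PARAM[precision]


def format_binary(num_bytes):
    """Format as GB when at least 2**30 bytes, otherwise as MB."""
    gib = num_bytes / 2**30
    if gib >= 1:
        return f"{gib:.2f} GB"
    return f"{num_bytes / 2**20:.1f} MB"


for precision in BYTES_PER_PARAM:
    print(precision, format_binary(weight_bytes(precision)))
```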

🔑 KV Cache (Inference)

Per Token (FP16) 196.61 KB (196,608 bytes)
Max Context FP32 768.0 MB
Max Context FP16 384.0 MB
Max Context INT8 192.0 MB
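
The cache stores one key and one value vector per layer for every token, so its size can be derived from the dimensions above. The sketch below assumes a head dimension of hidden size / attention heads = 64 and a batch size of 1. Note that the per-token figure matches decimal kilobytes (1,000 bytes), while the max-context figures match binary megabytes (2^20 bytes).

```python
# Sketch: KV cache size from the listed dimensions (batch size 1).
# Assumption: head_dim = hidden_size / attention_heads.
LAYERS = 24
KV_HEADS = 32
HEAD_DIM = 2_048 // 32  # 64
MAX_CONTEXT = 2_048


def kv_bytes_per_token(bytes_per_value):
    """Keys and values (factor 2) for every layer and KV head."""
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value


def kv_bytes_at_max_context(bytes_per_value):
    """Cache size once the full context window is filled."""
    return kv_bytes_per_token(bytes_per_value) * MAX_CONTEXT


print(kv_bytes_per_token(2))                # FP16 bytes per token
print(kv_bytes_at_max_context(2) / 2**20)   # FP16 at max context, in MiB
```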

⚙️ Model Configuration

Core Architecture

Vocabulary Size 51,200
Hidden Size 2,048
FFN Intermediate Size 8,192
Number of Layers 24
Attention Heads 32
KV Heads 32

Context & Position

Max Context Length 2,048
RoPE Base Frequency 10000.0
RoPE Scaling Not set

Attention Configuration

Attention Dropout 0%
Tied Embeddings No

Activation & Normalization

Activation Function gelu_new
LayerNorm Epsilon 1e-05
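
In Hugging Face Transformers, the name gelu_new refers to the tanh approximation of GELU. A minimal scalar sketch of that formula, using only the standard library, is shown below; real implementations apply it elementwise to tensors.

```python
import math


def gelu_new(x):
    """Tanh approximation of GELU for a single float."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Close to zero for large negative inputs, close to x for large positive ones.
for value in (-3.0, 0.0, 1.0, 3.0):
    print(value, gelu_new(value))
```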

Dropout (Training)

Residual Dropout 0%
Embedding Dropout 0%

Special Tokens

BOS Token ID Not set
Pad Token ID Not set
EOS Token ID Not set

Data Type

Model Dtype float32

Layer Types:
Attention
MLP/FFN
Normalization
Embedding