google/gemma-2-27b

📊 Model Parameters

Total Parameters 28,406,776,320
Context Length 8,192
Hidden Size 4608
Layers 46
Attention Heads 32
KV Heads 16
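
These headline values can be read directly from the model's Hugging Face config. A minimal sketch, assuming transformers is installed and you have access to the gated google/gemma-2-27b repo; the attribute names are the standard Gemma2Config fields.

```python
# Read the core architecture values from the Hugging Face config.
# Assumes `transformers` is installed and the gated repo is accessible.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("google/gemma-2-27b")

print(cfg.hidden_size)              # 4608
print(cfg.num_hidden_layers)        # 46
print(cfg.num_attention_heads)      # 32
print(cfg.num_key_value_heads)      # 16
print(cfg.max_position_embeddings)  # 8192
```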

💾 Memory Requirements

FP32 (Full) 105.82 GB
FP16 (Half) 52.91 GB
INT8 (Quantized) 26.46 GB
INT4 (Quantized) 13.23 GB
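
The weight-memory figures are simply the parameter count multiplied by the bytes per parameter at each precision, reported in binary gigabytes. A minimal sketch of that arithmetic using the total listed above:

```python
# Weight memory = parameter count x bytes per parameter, reported in GiB
# (the same binary convention the table above uses for "GB").
PARAMS = 28_406_776_320

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {PARAMS * nbytes / 2**30:.2f} GB")
# FP32: 105.82 GB, FP16: 52.91 GB, INT8: 26.46 GB, INT4: 13.23 GB
```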

🔑 KV Cache (Inference)

Per Token (FP16) 376.83 KB
Max Context FP32 5.75 GB
Max Context FP16 2.88 GB
Max Context INT8 1.44 GB
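
The KV cache stores one key and one value vector per layer for every token, each num_kv_heads × head_dim wide, so its size grows linearly with context length. A minimal sketch of the arithmetic behind these figures (per-token size in decimal KB, full-context totals in binary GB, matching the table):

```python
# KV cache: 2 tensors (K and V) per layer, num_kv_heads x head_dim values each.
NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM = 46, 16, 128
CONTEXT_LEN = 8_192

def kv_cache_bytes(num_tokens: int, bytes_per_value: float) -> float:
    """Total KV-cache size in bytes for `num_tokens` of context."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * num_tokens

print(kv_cache_bytes(1, 2) / 1e3)              # ~376.83 KB per token, FP16
print(kv_cache_bytes(CONTEXT_LEN, 4) / 2**30)  # ~5.75 GB at max context, FP32
print(kv_cache_bytes(CONTEXT_LEN, 2) / 2**30)  # ~2.88 GB at max context, FP16
print(kv_cache_bytes(CONTEXT_LEN, 1) / 2**30)  # ~1.44 GB at max context, INT8
```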

⚙️ Model Configuration

Core Architecture

Vocabulary Size 256,000
Hidden Size 4,608
FFN Intermediate Size 36,864
Number of Layers 46
Attention Heads 32
Head Dimension 128
KV Heads 16
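
As a sanity check, the parameter total can be rebuilt from these core values. A rough sketch under the usual Gemma 2 assumptions (no projection biases, four RMSNorms per decoder layer); note that the headline total above appears to count the tied output embedding a second time, so the unique-weight count comes out about 1.2B lower.

```python
# Back-of-the-envelope parameter count from the architecture values above.
VOCAB, HIDDEN, INTER = 256_000, 4_608, 36_864
LAYERS, HEADS, KV_HEADS, HEAD_DIM = 46, 32, 16, 128

embed = VOCAB * HIDDEN                          # token embedding
attn = 2 * HIDDEN * HEADS * HEAD_DIM \
     + 2 * HIDDEN * KV_HEADS * HEAD_DIM         # q/o plus k/v projections
mlp = 3 * HIDDEN * INTER                        # gate, up, down projections
norms = 4 * HIDDEN                              # per-layer RMSNorms
per_layer = attn + mlp + norms

unique = embed + LAYERS * per_layer + HIDDEN    # + final norm
print(f"{unique:,}")          # ~27.2B unique weights (embeddings tied)
print(f"{unique + embed:,}")  # ~28.4B if the output head is counted separately
```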

Context & Position

Max Context Length 8,192
Sliding Window Size 4,096
Layer Attention Types 46 entries (alternating sliding-window and global attention)

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 144
Output Softcapping 30.0
Attention Softcapping 50.0
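
Softcapping bounds logits smoothly instead of clipping them hard: divide by the cap, apply tanh, scale back up. A minimal sketch of that transform with the two caps listed above (50.0 for attention logits, 30.0 for the final output logits):

```python
import math

def softcap(logit: float, cap: float) -> float:
    """Smoothly bound a logit to (-cap, cap): cap * tanh(logit / cap)."""
    return cap * math.tanh(logit / cap)

print(softcap(100.0, 50.0))  # attention logits, capped near 50 -> ~48.2
print(softcap(100.0, 30.0))  # output logits, capped near 30 -> ~29.9
```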

Activation & Normalization

RMSNorm Epsilon 1e-06
Activation Function gelu_pytorch_tanh
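
gelu_pytorch_tanh is the tanh approximation of GELU (PyTorch's approximate="tanh" variant), used in the feed-forward blocks. A small reference sketch of the formula:

```python
import math

def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3)))."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(gelu_tanh(1.0))  # ~0.841
```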

Special Tokens

Pad Token ID 0
BOS Token ID 2
EOS Token ID 1

Data Type

Model Dtype float32
Layer Types: Attention, MLP/FFN, Normalization, Embedding