google/gemma-2-27b

📊 Model Parameters

Total Parameters 27,227,128,320
Context Length 8,192
Hidden Size 4608
Layers 46
Attention Heads 32
KV Heads 16
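
A minimal sketch of where that parameter count comes from, assuming the standard Gemma 2 layer layout (grouped-query attention, a gated MLP, four RMSNorms per layer, and tied input/output embeddings); the variable names are illustrative, not taken from any particular implementation:

```python
vocab_size = 256_000
hidden = 4_608
ffn = 36_864
layers = 46
q_heads, kv_heads, head_dim = 32, 16, 128

embedding = vocab_size * hidden                      # shared with the LM head (tied)
attn = (hidden * q_heads * head_dim                  # Q projection
        + 2 * hidden * kv_heads * head_dim           # K and V projections
        + q_heads * head_dim * hidden)               # O projection (no biases)
mlp = 3 * hidden * ffn                               # gate, up and down projections
norms = 4 * hidden                                   # RMSNorms around attention and MLP
per_layer = attn + mlp + norms

total = embedding + layers * per_layer + hidden      # + final RMSNorm
print(f"{total:,}")                                  # 27,227,128,320
```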

💾 Memory Requirements

FP32 (Full) 101.43 GB
FP16 (Half) 50.71 GB
INT8 (Quantized) 25.36 GB
INT4 (Quantized) 12.68 GB
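
The weight-memory figures above are simply parameter count times bytes per weight, with GB meaning GiB (1024^3 bytes); a sketch of the arithmetic, ignoring runtime overhead such as activations, the framework's own footprint, and quantization scales:

```python
params = 27_227_128_320
for name, bytes_per_param in [("FP32", 4), ("FP16/BF16", 2), ("INT8", 1), ("INT4", 0.5)]:
    print(f"{name:9s} {params * bytes_per_param / 1024**3:7.2f} GB")
```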

🔑 KV Cache (Inference)

Per Token (FP16) 376.83 KB
Max Context FP32 5.75 GB
Max Context FP16 2.88 GB
Max Context INT8 1.44 GB
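
Per-token cache size follows from storing one key and one value vector of size KV heads times head dimension for each of the 46 layers; a sketch of the arithmetic behind the numbers above:

```python
layers, kv_heads, head_dim = 46, 16, 128
context = 8_192

def kv_cache_bytes(num_tokens: int, bytes_per_elem: float) -> float:
    # The factor of 2 accounts for keys and values.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * num_tokens

print(kv_cache_bytes(1, 2))                      # 376,832 bytes per token at FP16
print(kv_cache_bytes(context, 2) / 1024**3)      # ~2.88 GB for the full 8,192-token context
print(kv_cache_bytes(context, 4) / 1024**3)      # ~5.75 GB at FP32
```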

⚙️ Model Configuration

Core Architecture

Vocabulary Size 256,000
Hidden Size 4,608
FFN Intermediate Size 36,864
Number of Layers 46
Attention Heads 32
Head Dimension 128
KV Heads 16
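
For reference, these core values map onto the Gemma2Config fields used by the Hugging Face transformers library roughly as follows (a partial sketch; field names are from recent transformers releases, so check them against your installed version):

```python
from transformers import Gemma2Config

config = Gemma2Config(
    vocab_size=256_000,
    hidden_size=4_608,
    intermediate_size=36_864,
    num_hidden_layers=46,
    num_attention_heads=32,
    num_key_value_heads=16,
    head_dim=128,
)
```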

Context & Position

Sliding Window Size 4,096
Max Context Length 8,192
RoPE Base Frequency 10000.0
Layer Attention Types alternating sliding-window / global attention (46 layers)
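
Gemma 2 alternates between local sliding-window attention (4,096-token window) and global attention over the full context from layer to layer, which is what the 46-entry attention-type list encodes; a sketch of the pattern and of standard RoPE frequencies with base 10000.0 (the even-layers-local ordering is an assumption here):

```python
# Assumed ordering: even-indexed layers use the sliding window, odd-indexed layers are global.
layer_types = ["sliding_window" if i % 2 == 0 else "global" for i in range(46)]

# Standard RoPE inverse frequencies for head_dim 128 and base 10000.0.
head_dim, rope_base = 128, 10_000.0
inv_freq = [rope_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
```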

Attention Configuration

Tied Embeddings Yes
Attention Bias No
Attention Dropout 0%
Query Pre-Attention Scalar 144
Output Softcapping 30.0
Attention Softcapping 50.0
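
Two of these values change how scores are computed compared to a vanilla transformer: query/key dot products are scaled by 1/sqrt(144) rather than 1/sqrt(head_dim), and both attention logits and final LM-head logits are soft-capped with a tanh instead of being hard-clipped; a minimal sketch:

```python
import math

def softcap(x: float, cap: float) -> float:
    # Soft-capping squashes a logit into (-cap, cap) instead of clipping it.
    return cap * math.tanh(x / cap)

q_dot_k = 300.0                                             # example raw dot product
attn_logit = softcap(q_dot_k / math.sqrt(144), cap=50.0)    # attention softcapping
lm_logit = softcap(12.0, cap=30.0)                          # output (final-logit) softcapping
```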

Activation & Normalization

Activation Function gelu_pytorch_tanh
RMSNorm Epsilon 1e-06
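
A framework-free sketch of the two operations these values parameterize (the exact weight parameterization inside the model may differ slightly):

```python
import math

def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    # RMSNorm: scale by the reciprocal root-mean-square; no mean subtraction.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]

def gelu_pytorch_tanh(x: float) -> float:
    # Tanh approximation of GELU, i.e. the "gelu_pytorch_tanh" activation.
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```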

Special Tokens

BOS Token ID 2
Pad Token ID 0
EOS Token ID 1

Data Type

Model Dtype float32
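
Since the checkpoint's declared dtype is float32, loading it in bfloat16 roughly halves weight memory (see the table above); a hedged loading sketch using the Hugging Face transformers API (adjust device placement and quantization to your hardware):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-27b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-27b",
    torch_dtype=torch.bfloat16,   # override the float32 default from the config
    device_map="auto",
)

# Special-token IDs as listed above: BOS 2, EOS 1, PAD 0.
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
```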