ByteDance-Seed/Seed-OSS-36B-Base

📊 Model Parameters

Total Parameters 36,151,104,512
Context Length 524,288
Hidden Size 5120
Layers 64
Attention Heads 80
KV Heads 8

💾 Memory Requirements

FP32 (Full) 134.67 GiB
FP16 (Half) 67.34 GiB
INT8 (Quantized) 33.67 GiB
INT4 (Quantized) 16.83 GiB
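
The figures above can be reproduced by multiplying the parameter count by the bytes stored per parameter and dividing by 2^30, which means they cover weights only (no activations, KV cache, or framework overhead). A minimal sketch of that arithmetic, where `weight_memory_gib` is a hypothetical helper name and INT4 is assumed to pack two weights per byte:

```python
TOTAL_PARAMS = 36_151_104_512  # total parameter count listed above

# Bytes stored per parameter for each precision.
# Assumption: INT4 packs two weights per byte, with no scale/zero-point overhead.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Weight storage in GiB (2**30 bytes); ignores activations and overhead."""
    return num_params * bytes_per_param / 2**30

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, nbytes):.2f} GiB")
```

Real quantized checkpoints usually carry extra per-group scales, so the INT8 and INT4 values are best read as lower bounds.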

🔑 KV Cache (Inference)

Per Token (FP16) 256 KiB (262,144 bytes)
Max Context FP32 256.00 GiB
Max Context FP16 128.00 GiB
Max Context INT8 64.00 GiB
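
These KV cache sizes are consistent with storing one key and one value vector per KV head, per layer, per token, using the layer count, KV head count, and head dimension listed in this document. A minimal sketch of the calculation for a single sequence; the function names and the `batch_size` parameter are illustrative and not part of the source:

```python
NUM_LAYERS = 64        # Layers
NUM_KV_HEADS = 8       # KV Heads (grouped-query attention)
HEAD_DIM = 128         # Head Dimension
MAX_CONTEXT = 524_288  # Context Length

def kv_bytes_per_token(bytes_per_value: int) -> int:
    """Bytes of cached K and V for one token, summed over all layers."""
    # Factor of 2 accounts for storing both the key and the value.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value

def kv_cache_gib(context_len: int, bytes_per_value: int, batch_size: int = 1) -> float:
    """Total KV cache in GiB (2**30 bytes) for `batch_size` sequences."""
    return batch_size * context_len * kv_bytes_per_token(bytes_per_value) / 2**30

print(kv_bytes_per_token(2))            # FP16 bytes per token
print(kv_cache_gib(MAX_CONTEXT, 2))     # FP16 cache at full context, in GiB
```

The cache grows linearly with both context length and batch size, so at long contexts it can exceed the FP16 weight footprint.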

⚙️ Model Configuration

Core Architecture

Vocabulary Size 155,136
Hidden Size 5,120
FFN Intermediate Size 27,648
Number of Layers 64
Attention Heads 80
KV Heads 8
Head Dimension 128

Context & Position

Max Context Length 524,288
RoPE Base Frequency 10,000,000
RoPE Scaling default (factor: ?)
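
To illustrate what the RoPE base frequency controls, here is a minimal sketch that assumes the standard rotary-embedding formulation, in which rotated pair i of a head uses inverse frequency base^(-2i/d). `rope_inv_freq` is a hypothetical helper name, and no scaling is applied because the scaling factor is not specified above:

```python
import math

ROPE_BASE = 10_000_000.0  # RoPE Base Frequency
HEAD_DIM = 128            # Head Dimension

def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Inverse frequency for each of the head_dim // 2 rotated pairs."""
    return [base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

inv_freq = rope_inv_freq(ROPE_BASE, HEAD_DIM)

# Wavelength, in tokens, of the slowest-rotating pair.
longest_wavelength = 2 * math.pi / inv_freq[-1]
print(len(inv_freq), inv_freq[0], longest_wavelength)
```

Under this formulation the longest wavelength works out to tens of millions of tokens, well beyond the 524,288-token context length.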

Attention Configuration

Attention Bias Yes
Attention Dropout 10.0%
MLP Bias No
Tied Embeddings No
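
The architecture values above are enough to rebuild the total parameter count. The sketch below is an accounting exercise, not the model's actual implementation, and it rests on assumptions not stated in this document: the attention bias applies to the Q, K, and V projections but not to the output projection, the MLP is a gated layout with gate, up, and down matrices, and each layer has two RMSNorm weight vectors plus one final norm. Under those assumptions the count matches the listed total of 36,151,104,512.

```python
VOCAB, HIDDEN, FFN = 155_136, 5_120, 27_648
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 64, 80, 8, 128

def count_parameters() -> int:
    q_width = Q_HEADS * HEAD_DIM    # 10,240, wider than the hidden size
    kv_width = KV_HEADS * HEAD_DIM  # 1,024

    # Attention: Q/K/V projection weights plus their biases.
    attn = HIDDEN * (q_width + 2 * kv_width) + (q_width + 2 * kv_width)
    # Output projection, assumed to have no bias.
    attn += q_width * HIDDEN
    # MLP: gate, up, and down projections without bias (gated layout assumed).
    mlp = 3 * HIDDEN * FFN
    # Two RMSNorm weight vectors per layer (assumed).
    norms = 2 * HIDDEN

    per_layer = attn + mlp + norms
    # Untied embeddings: separate input embedding and output head.
    embeddings = 2 * VOCAB * HIDDEN
    # The trailing HIDDEN term is the assumed final RMSNorm.
    return LAYERS * per_layer + embeddings + HIDDEN

print(f"{count_parameters():,}")
```

Note that 80 heads of dimension 128 give a query width of 10,240, twice the hidden size, so the head dimension here is set independently rather than derived as hidden size divided by head count.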

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06

Dropout (Training)

Residual Dropout 10.0%

Special Tokens

BOS Token ID 0
Pad Token ID 1
EOS Token ID 2

Data Type

Model Dtype bfloat16

Layer Types: Attention, MLP/FFN, Normalization, Embedding