stepfun-ai/Step-3.5-Flash

📊 Model Parameters

Total Parameters 10,304,724,992
Context Length 262,144
Hidden Size 4096
Layers 45
Attention Heads 64
KV Heads 64

💾 Memory Requirements (weights only)

FP32 (Full) 38.39 GiB
FP16 (Half) 19.19 GiB
INT8 (Quantized) 9.60 GiB
INT4 (Quantized) 4.80 GiB
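The figures above can be reproduced as parameter count × bytes per parameter, divided by 2^30 (so they are GiB, not decimal GB). The minimal sketch below assumes they cover raw weights only: no activations, KV cache, framework overhead, or the scale/zero-point tensors that real INT8/INT4 schemes add. The helper name `weight_memory_gib` is illustrative.

```python
# Weight-only memory estimate: parameters x bytes per parameter, in GiB (2**30 bytes).
TOTAL_PARAMS = 10_304_724_992  # "Total Parameters" from the table above

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,  # two 4-bit weights packed per byte; quantization metadata ignored
}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Raw weight footprint in GiB (no activations, KV cache, or runtime overhead)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(f"{dtype}: {weight_memory_gib(TOTAL_PARAMS, nbytes):.2f} GiB")
```

Budget real deployments above these numbers, since the KV cache and activations come on top.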

🔑 KV Cache (Inference)

Per Token (FP16) 1.47 MB (1,474,560 bytes)
Max Context FP32 720.00 GiB
Max Context FP16 360.00 GiB
Max Context INT8 180.00 GiB
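These KV cache figures match a simple formula: 2 (key and value) × 45 layers × 64 KV heads × 128 head dimension × bytes per element, per token. That formula assumes every layer caches K and V for every token in the context. If some layers use the 512-token sliding window listed under Context & Position, they only need to keep the most recent window, so treat the max-context numbers as an upper bound. The sketch below reproduces the table under that full-attention assumption; the function names are illustrative.

```python
NUM_LAYERS = 45
NUM_KV_HEADS = 64
HEAD_DIM = 128
MAX_CONTEXT = 262_144


def kv_bytes_per_token(bytes_per_elem: float) -> float:
    # Factor 2: one key vector and one value vector per KV head, per layer.
    # Assumes all layers cache every token (sliding-window layers not modeled).
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem


def kv_cache_gib(num_tokens: int, bytes_per_elem: float) -> float:
    """Total KV cache in GiB (2**30 bytes) for num_tokens cached tokens."""
    return num_tokens * kv_bytes_per_token(bytes_per_elem) / 2**30


if __name__ == "__main__":
    print(f"per token (FP16): {kv_bytes_per_token(2) / 1e6:.2f} MB")
    for name, nbytes in (("FP32", 4), ("FP16", 2), ("INT8", 1)):
        print(f"max context {name}: {kv_cache_gib(MAX_CONTEXT, nbytes):.2f} GiB")
```

Note the mixed units in the table: the per-token value is in decimal megabytes (1,474,560 / 10^6), while the max-context values are in GiB.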

⚙️ Model Configuration

Core Architecture

Hidden Size 4,096
FFN Intermediate Size 11,264
Attention Heads 64
Number of Layers 45
Vocabulary Size 128,896
Head Dimension 128

Context & Position

RoPE Base Frequency 48 per-layer values: the pattern [5000000.0, 10000.0, 10000.0, 10000.0] repeated 12 times
Max Context Length 262,144
Layer Attention Types 48 entries, three more than the 45 layers (values not shown)
Sliding Window Size 512 tokens
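The per-layer RoPE bases and the sliding window can be illustrated with a small NumPy sketch. It uses the standard RoPE frequencies base^(-2i/d) with d = 128 and the common "rotate half" layout; that layout, and the exact off-by-one convention of the window, are assumptions, since implementations differ. Which layers actually use the sliding window is in the unlisted Layer Attention Types field, so the sketch does not pair bases with attention types.

```python
import numpy as np

HEAD_DIM = 128
SLIDING_WINDOW = 512
# Per-layer RoPE bases from the config: one 5e6 entry then three 1e4 entries, 12 times.
ROPE_THETA = [5_000_000.0, 10_000.0, 10_000.0, 10_000.0] * 12


def rope_inv_freq(base: float, head_dim: int = HEAD_DIM) -> np.ndarray:
    # Standard RoPE frequencies: base ** (-2i / d) for i = 0 .. d/2 - 1.
    return base ** (-np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)


def apply_rope(x: np.ndarray, positions: np.ndarray, base: float) -> np.ndarray:
    """Rotate (seq, head_dim) query/key vectors; 'rotate half' layout is an assumption."""
    inv_freq = rope_inv_freq(base, x.shape[-1])
    angles = np.outer(positions, inv_freq)               # (seq, head_dim / 2)
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)  # (seq, head_dim)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    half = x.shape[-1] // 2
    rotated = np.concatenate([-x[..., half:], x[..., :half]], axis=-1)
    return x * cos + rotated * sin


def sliding_window_causal_mask(seq_len: int, window: int = SLIDING_WINDOW) -> np.ndarray:
    """True where query i may attend to key j: causal (j <= i) and at most `window`
    keys back, counting the current token."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)


if __name__ == "__main__":
    for base in (10_000.0, 5_000_000.0):
        longest = 2 * np.pi / rope_inv_freq(base)[-1]
        print(f"base={base:>12,.0f}: longest RoPE wavelength ~ {longest:,.0f} tokens")
```

With d = 128, the slowest-rotating pair has a wavelength of roughly 54,000 tokens at base 10,000 and roughly 25 million tokens at base 5,000,000, so only the larger base keeps every frequency below one full rotation across the 262,144-token context. A sliding-window layer's KV cache never needs more than `window` tokens.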

Mixture of Experts

Expert FFN Size 1,280
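Only the per-expert FFN width (1,280) is listed here; the number of experts, the top-k routing value, any shared experts, and the gating function are not. The toy sketch below therefore shows a generic top-k routed MoE layer with SwiGLU experts, where the expert count, `top_k`, the softmax-over-selected-experts gating, and the SwiGLU form are all assumptions for illustration. Under the SwiGLU assumption, each expert would hold 3 × 4,096 × 1,280 = 15,728,640 weights.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def moe_forward(x, router_w, experts, top_k):
    """Top-k routed MoE over token vectors x of shape (tokens, hidden).

    router_w: (hidden, num_experts) router matrix (hypothetical).
    experts:  list of (w_gate, w_up, w_down) SwiGLU weights with shapes
              (hidden, ffn), (hidden, ffn), (ffn, hidden); ffn would be 1,280 here.
    """
    logits = x @ router_w                                # (tokens, num_experts)
    top = np.argsort(-logits, axis=-1)[:, :top_k]        # k highest-scoring experts per token
    top_logits = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)           # softmax over the selected experts only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            w_gate, w_up, w_down = experts[top[t, slot]]
            h = silu(x[t] @ w_gate) * (x[t] @ w_up)      # (ffn,)
            out[t] += gates[t, slot] * (h @ w_down)      # weighted sum of expert outputs
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, ffn, num_experts = 16, 8, 4                  # toy sizes, not the model's
    experts = [
        tuple(rng.standard_normal(s) * 0.1 for s in [(hidden, ffn), (hidden, ffn), (ffn, hidden)])
        for _ in range(num_experts)
    ]
    x = rng.standard_normal((3, hidden))
    router = rng.standard_normal((hidden, num_experts))
    print(moe_forward(x, router, experts, top_k=2).shape)
```

Only `top_k` experts run per token, which is why an MoE layer's per-token compute tracks the expert FFN size rather than the total number of expert weights.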

Speculative Decoding

Next-N Prediction Layers 3
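The config lists 3 next-N prediction layers under speculative decoding, but not how their drafts are verified. The sketch below shows generic greedy draft verification, which is one common scheme and not necessarily what this model's serving stack does: draft tokens are accepted while they match the main model's greedy choice, and the main model's own token replaces the first mismatch. `target_next` is a hypothetical stand-in for one greedy step of the main model; real implementations score all draft positions in a single forward pass.

```python
from typing import Callable, List


def verify_draft(prefix: List[int], draft: List[int],
                 target_next: Callable[[List[int]], int]) -> List[int]:
    """Greedy speculative verification.

    Accept draft tokens while they equal the target model's argmax; at the first
    mismatch, emit the target's token instead and stop. If every draft token is
    accepted, also emit one extra target token.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # correction from the target model
            return accepted
        accepted.append(tok)
    accepted.append(target_next(prefix + accepted))  # bonus token after full acceptance
    return accepted


if __name__ == "__main__":
    # Hypothetical target model: always predicts "previous token + 1" (mod 10).
    target = lambda seq: (seq[-1] + 1) % 10
    print(verify_draft([1], [2, 3, 5], target))
```

Each verification round emits at least one token, so output never falls behind plain decoding; accepted drafts are pure speed-up.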

Activation & Normalization

RMSNorm Epsilon 1e-05
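RMSNorm scales each hidden vector by the reciprocal of its root-mean-square, with epsilon added inside the square root to avoid division by zero. A minimal NumPy sketch using the epsilon above; computing in float32 mirrors common practice for bfloat16 checkpoints and is an assumption here.

```python
import numpy as np

RMS_NORM_EPS = 1e-5


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = RMS_NORM_EPS) -> np.ndarray:
    """y = x / sqrt(mean(x**2) + eps) * weight over the last (hidden) axis.
    No mean subtraction and no bias, unlike LayerNorm."""
    xf = x.astype(np.float32)  # compute the statistic in float32
    rms = np.sqrt(np.mean(xf ** 2, axis=-1, keepdims=True) + eps)
    return (xf / rms) * weight
```

An all-zero input maps to zeros rather than NaN because of `eps`.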

Special Tokens

EOS Token IDs 1, 2, 128007
BOS Token ID 0
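Because three token IDs are listed as EOS, generation should stop on any of them, not just one. Hugging Face `transformers` generation configs, for example, accept a list for `eos_token_id`. The sketch below shows the same stopping rule applied to a finished token stream; which ID marks the end of a chat turn is not stated here, so all three are treated alike. `truncate_at_eos` is an illustrative helper.

```python
from typing import Iterable, List

EOS_TOKEN_IDS = frozenset({1, 2, 128007})
BOS_TOKEN_ID = 0


def truncate_at_eos(token_ids: Iterable[int], eos_ids=EOS_TOKEN_IDS) -> List[int]:
    """Return tokens up to (not including) the first token that is any EOS ID."""
    out: List[int] = []
    for tok in token_ids:
        if tok in eos_ids:
            break
        out.append(tok)
    return out
```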

Data Type

Model Dtype bfloat16
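bfloat16 keeps float32's 8-bit exponent (same range) but only 7 mantissa bits, so a bfloat16 value is simply the top 16 bits of a float32. At 2 bytes per value it matches the FP16 memory row above. A minimal NumPy sketch of float32 → bfloat16 conversion with round-to-nearest-even; NaN payloads are not handled specially, and the function names are illustrative.

```python
import numpy as np


def float32_to_bfloat16_bits(x) -> np.ndarray:
    """Round float32 values to bfloat16 (round-to-nearest-even), as raw uint16 bits."""
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    # Add 0x7FFF plus the lowest kept bit so exact ties round to an even mantissa.
    bias = np.uint32(0x7FFF) + ((bits >> np.uint32(16)) & np.uint32(1))
    return ((bits + bias) >> np.uint32(16)).astype(np.uint16)


def bfloat16_bits_to_float32(b) -> np.ndarray:
    # Widening back is exact: shift the 16 bits into the high half of a float32.
    widened = np.atleast_1d(np.asarray(b, dtype=np.uint16)).astype(np.uint32)
    return (widened << np.uint32(16)).view(np.float32)


if __name__ == "__main__":
    b = float32_to_bfloat16_bits(3.14159)
    print(hex(int(b[0])), bfloat16_bits_to_float32(b)[0])
```

The round trip shows the precision cost: 3.14159 comes back as 3.140625.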
Layer Types:
Attention
MLP/FFN
Normalization
Embedding