rednote-hilab/dots.llm1.base

📊 Model Parameters

Total Parameters 142,774,373,888
Context Length 32,768
Hidden Size 4096
Layers 62
Attention Heads 32
KV Heads 32

💾 Memory Requirements

FP32 (Full) 531.88 GiB
FP16 (Half) 265.94 GiB
INT8 (Quantized) 132.97 GiB
INT4 (Quantized) 66.48 GiB
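
These weight footprints are simply the parameter count multiplied by the bytes per parameter, reported in GiB (1024³ bytes). A minimal sketch of that arithmetic, using the totals listed above:

```python
# Weight memory = parameter count x bytes per parameter, reported in GiB.
TOTAL_PARAMS = 142_774_373_888

BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {TOTAL_PARAMS * nbytes / 1024**3:.2f} GiB")
```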

🔑 KV Cache (Inference)

Per Token (FP16) 1.02 MB
Max Context FP32 62.00 GiB
Max Context FP16 31.00 GiB
Max Context INT8 15.50 GiB
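
These cache sizes follow from the attention shape above: two tensors (key and value) per layer, one head_dim-wide vector per KV head per token. A minimal sketch of that arithmetic, assuming head_dim = hidden size / attention heads = 128:

```python
# KV-cache footprint, assuming head_dim = 4096 / 32 = 128.
LAYERS = 62
KV_HEADS = 32
HEAD_DIM = 128
MAX_CONTEXT = 32_768

def kv_cache_bytes(tokens: int, bytes_per_value: float) -> float:
    # Factor of 2 covers the separate key and value tensors.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value * tokens

print(f"Per token (FP16): {kv_cache_bytes(1, 2) / 1e6:.2f} MB")
print(f"Max context FP32: {kv_cache_bytes(MAX_CONTEXT, 4) / 1024**3:.2f} GiB")
print(f"Max context FP16: {kv_cache_bytes(MAX_CONTEXT, 2) / 1024**3:.2f} GiB")
print(f"Max context INT8: {kv_cache_bytes(MAX_CONTEXT, 1) / 1024**3:.2f} GiB")
```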

⚙️ Model Configuration

Core Architecture

Vocabulary Size 152,064
Hidden Size 4,096
FFN Intermediate Size 10,944
Number of Layers 62
Attention Heads 32
KV Heads 32
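
These fields can be checked from the checkpoint's configuration without downloading any weights. A minimal sketch, assuming the repository ships custom architecture code (hence trust_remote_code) and that the attribute names follow the usual transformers conventions:

```python
from transformers import AutoConfig

# Loads only the config file; no weights are downloaded.
config = AutoConfig.from_pretrained(
    "rednote-hilab/dots.llm1.base",
    trust_remote_code=True,  # assumption: custom architecture code on the Hub
)

# Attribute names below follow common transformers conventions (assumptions).
print(config.vocab_size)           # expected: 152064
print(config.hidden_size)          # expected: 4096
print(config.num_hidden_layers)    # expected: 62
print(config.num_attention_heads)  # expected: 32
print(config.num_key_value_heads)  # expected: 32
```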

Context & Position

Max Context Length 32,768
RoPE Base Frequency 10,000,000
RoPE Scaling Not set
Sliding Window Size Not set
Window Attention Layers 62
Layer Attention Types [62 items]
Uses Sliding Window No
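
With no RoPE scaling and no sliding window, positions are encoded with plain rotary embeddings at base 10,000,000. A minimal sketch of the standard RoPE frequency computation under that assumption (head_dim = 128 is inferred from the attention shape above):

```python
import torch

# Standard RoPE inverse frequencies for base theta = 10,000,000, head_dim = 128.
ROPE_BASE = 10_000_000
HEAD_DIM = 128
MAX_CONTEXT = 32_768

inv_freq = 1.0 / (ROPE_BASE ** (torch.arange(0, HEAD_DIM, 2, dtype=torch.float32) / HEAD_DIM))
positions = torch.arange(MAX_CONTEXT, dtype=torch.float32)

# One rotation angle per (position, frequency) pair across the full context.
angles = torch.outer(positions, inv_freq)  # shape: (32768, 64)
print(angles.shape)
```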

Attention Configuration

Attention Bias No
Attention Dropout 0%
Tied Embeddings No

Mixture of Experts

Expert FFN Size 1,408
Shared Experts 2
Number of Experts 128
Experts per Token 6
Dense Initial Layers 1
Normalize TopK Probabilities Yes
Expert Groups 1
Groups per Token 1
Routing Scale Factor 2.5
MoE Layer Frequency 1
Router Scoring Function sigmoid
TopK Method noaux_tc
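
Per the settings above, each token is routed to 6 of the 128 routed experts by a sigmoid-scored router, the selected weights are renormalized and rescaled by 2.5, and the 2 shared experts run on every token. The sketch below is an illustrative router under those settings, not the model's actual implementation (the noaux_tc top-k method and expert grouping are omitted):

```python
import torch

# Illustrative sigmoid-scored top-k router matching the settings listed above.
NUM_EXPERTS = 128     # routed experts
TOP_K = 6             # experts per token
ROUTING_SCALE = 2.5   # routing scale factor

def route(router_logits: torch.Tensor):
    """router_logits: (tokens, NUM_EXPERTS) -> (expert indices, gate weights)."""
    scores = torch.sigmoid(router_logits)                     # sigmoid scoring function
    topk_scores, topk_idx = scores.topk(TOP_K, dim=-1)        # pick 6 experts per token
    gates = topk_scores / topk_scores.sum(-1, keepdim=True)   # normalize top-k probabilities
    return topk_idx, gates * ROUTING_SCALE                    # apply routing scale factor

idx, gates = route(torch.randn(4, NUM_EXPERTS))  # 4 example tokens
print(idx.shape, gates.shape)  # torch.Size([4, 6]) torch.Size([4, 6])
# The 2 shared experts are applied to every token in addition to the 6 routed ones.
```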

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
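
Both components have standard definitions; a minimal sketch using SiLU and RMSNorm with epsilon 1e-05 (illustrative, not the model's actual modules):

```python
import torch

def silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU (swish): x * sigmoid(x)
    return x * torch.sigmoid(x)

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-05) -> torch.Tensor:
    # RMSNorm: rescale by the reciprocal root-mean-square over the last dimension.
    return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps) * weight

x = torch.randn(2, 4096)
print(silu(x).shape, rms_norm(x, torch.ones(4096)).shape)
```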

Special Tokens

BOS Token ID Not set
Pad Token ID Not set
EOS Token ID 151643

Data Type

Model Dtype bfloat16
Layer Types Attention, MLP/FFN, Normalization, Embedding
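
A minimal loading sketch consistent with the dtype and special tokens listed above, assuming trust_remote_code is needed for the custom architecture; note that the bfloat16 weights alone occupy roughly 266 GiB, so device_map="auto" on a multi-GPU setup is assumed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "rednote-hilab/dots.llm1.base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the checkpoint dtype above
    device_map="auto",           # bfloat16 weights alone are ~266 GiB
    trust_remote_code=True,      # assumption: custom architecture code on the Hub
)

print(tokenizer.eos_token_id)    # expected: 151643 (BOS and pad tokens are not set)
```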