ai-sage/GigaChat3-10B-A1.8B

📊 Model Parameters

Total Parameters 67,295,638,016
Context Length 262,144
Hidden Size 1,536
Layers 26
Attention Heads 32
KV Heads 32

💾 Memory Requirements

FP32 (Full) 250.70 GB
FP16 (Half) 125.35 GB
INT8 (Quantized) 62.67 GB
INT4 (Quantized) 31.34 GB
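
These footprints are straight arithmetic: total parameters × bytes per parameter, reported in binary gigabytes (2³⁰ bytes, labeled GB above). A minimal sketch that reproduces the table:

```python
# Weight memory = total parameters x bytes per parameter, reported in
# binary gigabytes (2**30 bytes, labeled "GB" in the table above).
TOTAL_PARAMS = 67_295_638_016

BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {TOTAL_PARAMS * nbytes / 2**30:.2f} GB")
# FP32: 250.70 GB, FP16: 125.35 GB, INT8: 62.67 GB, INT4: 31.34 GB
```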

🔑 KV Cache (Inference)

Per Token (FP16) 212.99 KB
Max Context FP32 104.00 GB
Max Context FP16 52.00 GB
Max Context INT8 26.00 GB
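
The per-token figure follows the standard uncompressed KV layout, 2 (K and V) × layers × KV heads × head dimension × bytes per value; since the model uses multi-head latent attention (see below), the cache actually materialized at inference can be much smaller, so read these as an upper bound. A sketch of the arithmetic:

```python
# Per-token KV cache under the standard (uncompressed) layout:
# K and V, per layer, per KV head, per head dimension.
LAYERS, KV_HEADS, HEAD_DIM = 26, 32, 64
MAX_CONTEXT = 262_144

def kv_bytes_per_token(bytes_per_value: int) -> int:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * bytes_per_value  # 2 = K + V

print(kv_bytes_per_token(2) / 1000)                 # 212.99 KB (FP16)
print(kv_bytes_per_token(2) * MAX_CONTEXT / 2**30)  # 52.00 GB at max context
```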

⚙️ Model Configuration

Core Architecture

Vocabulary Size 128,256
Hidden Size 1,536
FFN Intermediate Size 8,960
Number of Layers 26
Attention Heads 32
Head Dimension 64
KV Heads 32

Context & Position

Max Context Length 262,144

Attention Configuration

Attention Bias No
Attention Dropout 0%
Tied Embeddings No

Multi-Head Latent Attention

KV LoRA Rank 512
Query LoRA Rank Not set
QK RoPE Head Dimension 64
Value Head Dimension 192
QK Non-RoPE Head Dimension 128
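
These fields match a DeepSeek-V3-style MLA, in which keys and values are projected into a rank-512 latent and only that latent plus the 64-dimensional decoupled RoPE key are cached per layer. Assuming that caching scheme carries over here, the per-token cache is far smaller than the uncompressed figure above:

```python
# DeepSeek-V3-style MLA caches the compressed KV latent plus the decoupled
# RoPE key per layer instead of full per-head K/V tensors.
# Assumption: GigaChat3 uses the same caching scheme these fields imply.
KV_LORA_RANK, QK_ROPE_HEAD_DIM, LAYERS = 512, 64, 26

values_per_token = (KV_LORA_RANK + QK_ROPE_HEAD_DIM) * LAYERS  # 14,976
print(values_per_token * 2 / 1000)  # ~29.95 KB/token in BF16, vs. 212.99 KB
                                    # for the uncompressed layout above
```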

Mixture of Experts

Expert FFN Size 1,280
Shared Experts 1
Number of Experts 64
Routing Scale Factor 1
Expert Groups 1
Groups per Token 1
Experts per Token 4
Dense Initial Layers 1
Normalize TopK Probabilities Yes
TopK Method noaux_tc
MoE Layer Frequency 1
Router Scoring Function sigmoid
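
Together these fields describe a DeepSeek-V3-style router: sigmoid scores over 64 routed experts, top-4 selection with renormalized weights, one always-active shared expert, and aux-loss-free balancing (noaux_tc) that biases only the selection step. A minimal sketch of that routing math, with the balancing bias left as a placeholder:

```python
import numpy as np

# Minimal sketch of the routing these fields describe: sigmoid scores over 64
# routed experts, top-4 selection, normalized top-k weights. "noaux_tc"
# (DeepSeek-V3-style aux-loss-free balancing) selects experts on
# bias-corrected scores; e_bias is a placeholder for that learned balancing
# term, not a real GigaChat3 tensor name.
N_EXPERTS, TOP_K, ROUTE_SCALE = 64, 4, 1.0

def route(router_logits: np.ndarray, e_bias: np.ndarray):
    scores = 1.0 / (1.0 + np.exp(-router_logits))    # sigmoid scoring function
    chosen = np.argsort(scores + e_bias)[-TOP_K:]    # top-k on biased scores
    weights = scores[chosen] / scores[chosen].sum()  # Normalize TopK: Yes
    return chosen, ROUTE_SCALE * weights             # Routing Scale Factor: 1

rng = np.random.default_rng(0)
experts, weights = route(rng.normal(size=N_EXPERTS), np.zeros(N_EXPERTS))
# output = shared_expert(x) + sum of w * expert_i(x) over the chosen four
```

Only the shared expert and the four selected experts run for each token; the A1.8B suffix in the model name refers to this active-parameter count.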

Speculative Decoding

Next-N Prediction Layers 1

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-06
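
For reference, these correspond to silu(x) = x · sigmoid(x) and RMSNorm with ε = 1e-06; a minimal sketch:

```python
import numpy as np

# Reference definitions for the two fields above; `gain` is the learned
# per-channel RMSNorm weight.
EPS = 1e-06

def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))  # silu(x) = x * sigmoid(x)

def rms_norm(x: np.ndarray, gain: np.ndarray) -> np.ndarray:
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + EPS)
    return (x / rms) * gain
```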

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype bfloat16
Layer Types: Attention, MLP/FFN, Normalization, Embedding