GigaChat: GigaChat3-10B-A1.8B (ai-sage/GigaChat3-10B-A1.8B)

📊 Model Parameters
Total Parameters: 67,295,638,016
Context Length: 262,144
Hidden Size: 1,536
Layers: 26
Attention Heads: 32
KV Heads: 32
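
These fields map directly onto the repo's Hugging Face config. A minimal sketch for reading them programmatically, assuming a DeepSeek-V3-style config class (the MLA and MoE fields further down suggest that family; the attribute names are assumptions, not confirmed by this page):

```python
# Sketch: read the numbers above straight from the Hugging Face config.
# Attribute names assume a DeepSeek-V3-style config; verify against the repo.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("ai-sage/GigaChat3-10B-A1.8B", trust_remote_code=True)

print(cfg.vocab_size)               # expected: 128256
print(cfg.hidden_size)              # expected: 1536
print(cfg.num_hidden_layers)        # expected: 26
print(cfg.num_attention_heads)      # expected: 32
print(cfg.max_position_embeddings)  # expected: 262144
```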

💾 Memory Requirements (weights)
FP32 (full precision): 250.70 GB
FP16 (half precision): 125.35 GB
INT8 (quantized): 62.67 GB
INT4 (quantized): 31.34 GB
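
Each figure is the total parameter count times the bytes per weight for that format, reported against 2^30-byte units despite the "GB" label. A quick check using the page's own parameter count:

```python
# Weight memory per dtype: parameter count x bytes per weight, divided by
# 2**30 (the table's "GB" values match GiB).
TOTAL_PARAMS = 67_295_638_016

BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30
    print(f"{dtype}: {gib:.2f} GB")  # FP32 -> 250.70, FP16 -> 125.35, ...
```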

🔑 KV Cache (Inference)
Per Token (FP16): 212.99 KB
Max Context, FP32: 104.00 GB
Max Context, FP16: 52.00 GB
Max Context, INT8: 26.00 GB
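
The per-token figure follows the standard (non-latent) K-plus-V formula from the head geometry above. A sketch reproducing it, plus, as an assumption, the much smaller cache MLA permits when only the compressed latent and RoPE key are stored per layer:

```python
# Per-token KV cache from the head geometry above (K and V, all layers).
LAYERS, KV_HEADS, HEAD_DIM = 26, 32, 64
FP16_BYTES, CONTEXT = 2, 262_144

per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * FP16_BYTES
print(per_token / 1000)             # 212.99 KB (decimal kB, as in the table)
print(per_token * CONTEXT / 2**30)  # 52.00 GB at FP16 over the full context

# MLA variant (assumption): cache only the compressed KV latent plus the
# shared RoPE key per layer, i.e. (kv_lora_rank + qk_rope_head_dim) values.
mla_per_token = LAYERS * (512 + 64) * FP16_BYTES
print(mla_per_token / 1000)         # ~29.95 KB per token
```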

⚙️ Model Configuration

Core Architecture
Vocabulary Size: 128,256
Hidden Size: 1,536
FFN Intermediate Size: 8,960
Number of Layers: 26
Attention Heads: 32
Head Dimension: 64
KV Heads: 32

Context & Position
Max Context Length: 262,144

Attention Configuration
Attention Bias: No
Attention Dropout: 0%
Tied Embeddings: No

Multi-Head Latent Attention (MLA)
KV LoRA Rank: 512
Query LoRA Rank: Not set
QK RoPE Head Dimension: 64
QK Non-RoPE Head Dimension: 128
Value Head Dimension: 192
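
A small dimension-bookkeeping sketch for this block, assuming DeepSeek-V3-style MLA naming: each query/key head concatenates a non-RoPE part with a RoPE part, and keys/values are up-projected from the 512-dim compressed latent:

```python
# MLA dimension bookkeeping (DeepSeek-V3-style naming is an assumption).
qk_nope_head_dim = 128   # non-RoPE part of each query/key head
qk_rope_head_dim = 64    # RoPE part; position encoding applies to this slice
v_head_dim       = 192
kv_lora_rank     = 512   # width of the compressed KV latent
num_heads        = 32

qk_head_dim = qk_nope_head_dim + qk_rope_head_dim  # 192 per head
assert qk_head_dim == v_head_dim                   # matches the table above

# Per layer: hidden state -> 512-dim latent; latent -> per-head K_nope and V;
# the 64-dim RoPE key is computed separately from the hidden state.
print(num_heads * qk_head_dim)  # 6144: total query/key width across heads
print(num_heads * v_head_dim)   # 6144: total value width across heads
```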

Mixture of Experts
Expert FFN Size: 1,280
Shared Experts: 1
Number of Experts: 64
Experts per Token: 4
Routing Scale Factor: 1
Expert Groups: 1
Groups per Token: 1
Dense Initial Layers: 1
MoE Layer Frequency: 1
Router Scoring Function: sigmoid
Top-K Method: noaux_tc
Normalize Top-K Probabilities: Yes
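
A minimal sketch of this routing recipe: sigmoid scores over 64 experts, top-4 selection, normalized gate weights, scale factor 1, and one always-on shared expert. The noaux_tc method is modeled here, as an assumption, by a learned per-expert bias that influences selection only (DeepSeek-V3 style):

```python
# Sigmoid top-k routing sketch; e_bias stands in for the noaux_tc per-expert
# correction term, which affects expert selection but not the gate weights.
import numpy as np

def route(logits: np.ndarray, e_bias: np.ndarray, top_k: int = 4,
          scale: float = 1.0) -> tuple[np.ndarray, np.ndarray]:
    scores = 1.0 / (1.0 + np.exp(-logits))         # sigmoid scoring function
    chosen = np.argsort(scores + e_bias)[-top_k:]  # bias used for selection only
    weights = scores[chosen]
    weights = scale * weights / weights.sum()      # normalize top-k probabilities
    return chosen, weights

rng = np.random.default_rng(0)
experts, gates = route(rng.normal(size=64), np.zeros(64))
print(experts, gates.round(3))  # 4 routed experts; the shared expert always runs
```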

Speculative Decoding
Next-N Prediction Layers: 1
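
A single next-N layer is typically used as a multi-token-prediction (MTP) draft head. A greedy sketch of the resulting speculative step; the function names are illustrative, not this repo's API:

```python
# Greedy speculative step with a single draft token (Next-N = 1). One main
# forward over ctx + [draft] scores both positions: when the main model agrees
# with the draft, two tokens are emitted per full forward pass.
def spec_step(main_logits_fn, mtp_draft_fn, ctx: list[int]) -> list[int]:
    draft = mtp_draft_fn(ctx)                    # cheap MTP head drafts 1 token
    logits = main_logits_fn(ctx + [draft])       # one full-model forward
    verify = int(logits[len(ctx) - 1].argmax())  # main model's token after ctx
    if verify == draft:                          # accepted: keep draft plus the
        bonus = int(logits[len(ctx)].argmax())   # token predicted after it
        return ctx + [draft, bonus]
    return ctx + [verify]                        # rejected: fall back to main
```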

Activation & Normalization
Activation Function: silu
RMSNorm Epsilon: 1e-06
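
The two settings above in one small reference sketch: RMSNorm with eps = 1e-6, and a SiLU-gated FFN (the gated form is the usual pairing for "silu" in this model family, assumed here):

```python
# Minimal RMSNorm and SiLU-gated FFN reference, matching the config above.
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))  # x * sigmoid(x)

def gated_ffn(x, w_gate, w_up, w_down):
    # hidden 1536 -> intermediate 8960 (dense) or 1280 (per expert) -> 1536
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```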

Special Tokens
BOS Token ID: 1
EOS Token ID: 2
Pad Token ID: Not set

Data Type
Model Dtype: bfloat16
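
Pulling the page together, a loading sketch in the shipped dtype; trust_remote_code is assumed to be needed for the custom architecture, so check the model card before running:

```python
# Loading sketch in the shipped bfloat16 dtype.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ai-sage/GigaChat3-10B-A1.8B"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

out = model.generate(**tok("Hello!", return_tensors="pt").to(model.device),
                     max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```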

Layer Types: Embedding, Attention, MLP/FFN, Normalization