Expert FFN Size: 2,048
Shared Experts: 1
Number of Experts: 384
Routing Scale Factor: 2.827
TopK Method: noaux_tc
Expert Groups: 1
Groups per Token: 1
Experts per Token: 8
MoE Layer Frequency: 1
Dense Initial Layers: 1
Normalize TopK Probabilities: Yes
Router Scoring Function: sigmoid
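
To make the routing settings concrete, here is a minimal sketch of how one token might be routed under them: sigmoid scores over 384 experts, the top 8 kept, their scores normalized to sum to 1, then multiplied by 2.827. The function name `route_token` and the `selection_bias` argument are hypothetical. In particular, treating `noaux_tc` as "a per-expert bias that affects selection only" is an assumption drawn from publicly available MoE implementations, not something the table states. The shared expert and the FFN computation itself are left out.

```python
import math
import random

# Values copied from the configuration table.
NUM_EXPERTS = 384
EXPERTS_PER_TOKEN = 8
ROUTING_SCALE_FACTOR = 2.827
NORMALIZE_TOPK = True


def route_token(logits, selection_bias=None):
    """Return (expert_indices, gate_weights) for a single token.

    `logits` holds one router logit per expert. `selection_bias` is a
    hypothetical per-expert offset used only when choosing experts; it is
    an assumption about what "noaux_tc" implies.
    """
    if len(logits) != NUM_EXPERTS:
        raise ValueError("expected one logit per expert")
    if selection_bias is None:
        selection_bias = [0.0] * NUM_EXPERTS

    # Router scoring function: sigmoid, applied independently per expert.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]

    # With one expert group and one group per token, selection reduces to
    # a plain top-k over all experts.
    ranked = sorted(
        range(NUM_EXPERTS),
        key=lambda i: scores[i] + selection_bias[i],
        reverse=True,
    )
    chosen = ranked[:EXPERTS_PER_TOKEN]

    # Gate weights use the unbiased scores of the chosen experts.
    weights = [scores[i] for i in chosen]
    if NORMALIZE_TOPK:
        total = sum(weights)
        weights = [w / total for w in weights]

    # Apply the routing scale factor after normalization.
    weights = [w * ROUTING_SCALE_FACTOR for w in weights]
    return chosen, weights


# Example with synthetic logits (not real model outputs).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts, weights = route_token(logits)
```

Because the top-8 scores are normalized before scaling, the returned gate weights sum to the routing scale factor (2.827) rather than to 1.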