Dense Initial Layers: 1
Expert Groups: 8
Number of Experts: 160
Shared Experts: 2
Routing Scale Factor: 16.0
Groups per Token: 3
TopK Method: group_limited_greedy
Normalize TopK Probabilities: No
Experts per Token: 6
Expert FFN Size: 1,536
MoE Layer Frequency: 1
Router Scoring Function: softmax
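
To make the routing settings above concrete, here is a minimal pure-Python sketch of how a group-limited greedy router could combine them for a single token. It is an illustration under stated assumptions, not the model's actual implementation: the helper names `softmax` and `route_token` are hypothetical, the 160 experts are assumed to be split into 8 contiguous groups of 20, a group is assumed to be ranked by its best expert score, and the routing scale factor is assumed to multiply the unnormalized top-k weights.

```python
import math
import random

# Values copied from the table above.
N_EXPERTS = 160
N_GROUPS = 8
GROUPS_PER_TOKEN = 3
EXPERTS_PER_TOKEN = 6
ROUTING_SCALE = 16.0
NORMALIZE_TOPK = False


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(logits):
    """Return (expert_ids, weights) for one token's router logits.

    Hypothetical helper. Assumes contiguous, equal-sized expert groups and
    that each group is ranked by the highest score among its experts.
    """
    assert len(logits) == N_EXPERTS
    scores = softmax(logits)
    group_size = N_EXPERTS // N_GROUPS

    # Step 1: rank groups by their best expert and keep the top few.
    group_best = [
        max(scores[g * group_size:(g + 1) * group_size])
        for g in range(N_GROUPS)
    ]
    ranked_groups = sorted(
        range(N_GROUPS), key=lambda g: group_best[g], reverse=True
    )
    kept = set(ranked_groups[:GROUPS_PER_TOKEN])

    # Step 2: pick the top-k experts, restricted to the kept groups.
    candidates = [e for e in range(N_EXPERTS) if e // group_size in kept]
    top = sorted(candidates, key=lambda e: scores[e], reverse=True)
    top = top[:EXPERTS_PER_TOKEN]

    # Step 3: optionally renormalize, then apply the scale (assumed order).
    weights = [scores[e] for e in top]
    if NORMALIZE_TOPK:
        total = sum(weights)
        weights = [w / total for w in weights]
    weights = [w * ROUTING_SCALE for w in weights]
    return top, weights


# Example call with random router logits for one token.
random.seed(0)
experts, weights = route_token(
    [random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)]
)
```

With these settings each token is sent to 6 routed experts drawn from at most 3 of the 8 groups. Because the top-k probabilities are not renormalized, the selected weights sum to less than the scale factor of 16.0 rather than to exactly 1. The 2 shared experts are not part of this selection; they are presumably applied to every token regardless of the router's choice.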