meituan-longcat/LongCat-Flash-Chat

📊 Model Parameters

Total Parameters 560,664,958,976
Context Length 131,072
Hidden Size 6144
Layers 28
Attention Heads 64
KV Heads 64

💾 Memory Requirements

FP32 (Full) 2088.64 GB
FP16 (Half) 1044.32 GB
INT8 (Quantized) 522.16 GB
INT4 (Quantized) 261.08 GB
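
These totals follow directly from the parameter count above and the per-parameter width of each dtype (GB here meaning GiB, i.e. 2³⁰ bytes). A minimal sketch reproducing the table:

```python
# Sketch: raw weight-storage footprint from parameter count and dtype width.
# Ignores runtime overhead (activations, optimizer state, quantization scales).
TOTAL_PARAMS = 560_664_958_976

BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_memory_gb(params: int, dtype: str) -> float:
    """Weight storage in GiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):.2f} GB")
```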

🔑 KV Cache (Inference)

Per Token (FP16) 458.75 KB
Max Context FP32 112.00 GB
Max Context FP16 56.00 GB
Max Context INT8 28.00 GB
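
The per-token figure appears to follow the standard dense-attention formula, 2 (K and V) × layers × KV heads × head dim × bytes per value; note it is quoted in decimal KB while the max-context rows use GiB. Since this model uses Multi-Head Latent Attention (see below), the cache actually stored at inference can be much smaller, so treat these as upper bounds. A sketch reproducing the table under that assumption:

```python
# Sketch: KV-cache sizing from the configuration above.
# Assumes the uncompressed 2 x layers x kv_heads x head_dim layout;
# MLA's latent cache at inference would be far smaller.
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 28, 64, 64, 131_072

def kv_bytes_per_token(dtype_bytes: int) -> int:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * dtype_bytes

fp16 = kv_bytes_per_token(2)
print(fp16 / 1000)             # ~458.75 decimal KB per token, as in the table
print(fp16 * CONTEXT / 2**30)  # GiB at full 131,072-token context
```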

⚙️ Model Configuration

Core Architecture

Vocabulary Size 131,072
Hidden Size 6,144
FFN Intermediate Size 12,288
Number of Layers 28
Attention Heads 64
Head Dimension 64
KV Heads 64

Context & Position

Max Context Length 131,072
RoPE Base Frequency 10,000,000.0
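
The large base frequency (10⁷ versus the common 10⁴) slows the rotation of high-index frequency pairs, which supports position discrimination across the 131,072-token window. A sketch of the inverse-frequency schedule, assuming the standard RoPE formulation and the 64-dim rotary slice listed under Multi-Head Latent Attention below:

```python
# Sketch: RoPE inverse frequencies for the 64-dim rotary portion,
# with the base frequency of 1e7 from the configuration above.
ROPE_BASE = 10_000_000.0
ROTARY_DIM = 64  # QK RoPE head dimension

# One frequency per dimension pair; the angle applied to pair i at
# position p is p * inv_freq[i], so a larger base flattens the decay.
inv_freq = [ROPE_BASE ** (-2 * i / ROTARY_DIM) for i in range(ROTARY_DIM // 2)]

print(inv_freq[0], inv_freq[-1])  # fastest and slowest rotating pairs
```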

Attention Configuration

Attention Bias No
Attention Dropout 0%
Tied Embeddings No

Multi-Head Latent Attention

KV LoRA Rank 512
Query LoRA Rank 1,536
QK RoPE Head Dimension 64
Value Head Dimension 128
QK Non-RoPE Head Dimension 128
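
A sketch of the projection shapes these ranks imply, assuming the DeepSeek-style MLA layout (the weight names here are illustrative, not the checkpoint's actual parameter names):

```python
# Sketch: tensor shapes implied by the MLA configuration above
# (shapes only, no weights; DeepSeek-V2-style layout assumed).
HIDDEN, HEADS = 6144, 64
KV_LORA_RANK, Q_LORA_RANK = 512, 1536
QK_ROPE_DIM, QK_NOPE_DIM, V_HEAD_DIM = 64, 128, 128

qk_head_dim = QK_ROPE_DIM + QK_NOPE_DIM  # 192 per head (rope + non-rope parts)

# Low-rank down-projections compress the hidden state before the
# per-head up-projections expand it again:
w_dq  = (HIDDEN, Q_LORA_RANK)                  # query down-projection
w_uq  = (Q_LORA_RANK, HEADS * qk_head_dim)     # query up-projection
w_dkv = (HIDDEN, KV_LORA_RANK + QK_ROPE_DIM)   # shared KV latent (+ rope key)
w_ukv = (KV_LORA_RANK, HEADS * (QK_NOPE_DIM + V_HEAD_DIM))  # K/V up-projection

# Only the KV latent plus the shared rope key need caching per token
# per layer, rather than full per-head K and V tensors.
cached_dims_per_layer = KV_LORA_RANK + QK_ROPE_DIM
print(cached_dims_per_layer)
```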

Mixture of Experts

Expert FFN Size 2,048
Number of Experts 512
Routing Scale Factor 6.0
Experts per Token 12
Normalize TopK Probabilities No
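
A sketch of what these routing settings mean for a single token: 12 of 512 experts are selected, and because top-k probabilities are not renormalized, the raw softmax values are kept and multiplied by the 6.0 scale factor. The router below is a plain softmax top-k and is an assumption for illustration, not the model's actual implementation:

```python
# Sketch: top-k expert routing under the settings above.
# Illustrative only; the real router's details may differ.
import math, random

N_EXPERTS, TOP_K, ROUTE_SCALE = 512, 12, 6.0

def route(logits: list) -> list:
    """Return (expert_id, gate_weight) pairs for one token."""
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]           # softmax over all 512 experts
    top = sorted(range(N_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    # "Normalize TopK Probabilities: No" -> keep the raw softmax values
    # rather than renormalizing the 12 picks to sum to 1, then scale.
    return [(i, ROUTE_SCALE * probs[i]) for i in top]

random.seed(0)
picks = route([random.gauss(0, 1) for _ in range(N_EXPERTS)])
print(len(picks))  # 12 experts selected for this token
```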

Activation & Normalization

Activation Function silu
RMSNorm Epsilon 1e-05
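
Both are standard components: SiLU (x·sigmoid(x)) as the FFN activation and RMSNorm with the epsilon above for numerical stability. A self-contained per-vector sketch:

```python
# Sketch: the two functions named above, in plain Python.
import math

EPS = 1e-05  # RMSNorm epsilon from the configuration

def silu(x: float) -> float:
    """SiLU / swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def rmsnorm(xs: list, weight: list) -> list:
    """Scale a vector to unit root-mean-square, then apply learned gains."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + EPS)
    return [w * x / rms for x, w in zip(xs, weight)]

print(silu(1.0), rmsnorm([3.0, 4.0], [1.0, 1.0]))
```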

Special Tokens

BOS Token ID 1
Pad Token ID Not set
EOS Token ID 2

Data Type

Model Dtype Not set

Layer Types

Attention
MLP/FFN
Normalization
Embedding