mistralai/Mamba-Codestral-7B-v0.1

📊 Model Parameters

Total Parameters 7,285,403,648
Context Length 2,048 (config value; the recurrent Mamba2 architecture imposes no hard attention-window limit)
Hidden Size 4,096
Layers 64
Attention Heads 0 (attention-free architecture)
KV Heads 0 (no KV cache; see below)

💾 Memory Requirements

FP32 (Full) 27.14 GiB
FP16 (Half) 13.57 GiB
INT8 (Quantized) 6.79 GiB
INT4 (Quantized) 3.39 GiB
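
These figures are simply parameter count × bytes per weight, reported in binary GiB. A minimal sketch of the arithmetic, using the parameter count from the table above:

```python
# Weight-memory estimate from parameter count and dtype width.
# 7,285,403,648 is the total from the table above; byte widths are standard.

PARAMS = 7_285_403_648

BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}

for dtype, width in BYTES_PER_PARAM.items():
    gib = PARAMS * width / 2**30  # binary gibibytes, matching the table
    print(f"{dtype}: {gib:.2f} GiB")
```

Running this reproduces 27.14 / 13.57 / 6.79 / 3.39 GiB, which is how the table values were obtained (weights only; activations and runtime buffers are extra).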

🔑 KV Cache (Inference)

Per Token (FP16) 0 B
Max Context FP32 0.0 MB
Max Context FP16 0.0 MB
Max Context INT8 0.0 MB

The zeros are expected: Mamba2 layers keep a constant-size recurrent state instead of a per-token key/value cache, so cache memory does not grow with sequence length.
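
For reference, the standard transformer estimate is per_token = 2 × layers × kv_heads × head_dim × bytes_per_element; with zero KV heads every term collapses to zero. A sketch of that formula:

```python
# Standard transformer KV-cache formula; with 0 KV heads (attention-free
# Mamba2) it evaluates to zero, matching the table above.

def kv_cache_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    # 2x accounts for the separate key and value tensors per layer
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

print(kv_cache_per_token(n_layers=64, n_kv_heads=0, head_dim=64))  # 0 bytes
```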

⚙️ Model Configuration

Core Architecture

Vocabulary Size 32,768
Hidden Size 4,096
Number of Layers 64
SSM Heads 128 (Mamba2 mixer heads, not attention heads)
Head Dimension 64
Inner (Expanded) Size 8,192 (hidden size × expand factor 2)
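
These values can be cross-checked against the published configuration. A sketch, assuming a transformers release with Mamba2 support and a transformers-compatible config in the Hub repo:

```python
# Cross-check the architecture figures above against the Hub config.
# Assumes transformers >= 4.44 (Mamba2 support) and network access.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mamba-Codestral-7B-v0.1")
print(cfg.vocab_size, cfg.hidden_size, cfg.num_hidden_layers)  # 32768 4096 64
```

Note the internal consistency: 4,096 hidden × expand factor 2 = 8,192 inner, and 8,192 / 64 head dimension = 128 heads.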

Embedding Configuration

Tied Embeddings No

Activation & Normalization

RMSNorm Epsilon 1e-05
Activation Function silu
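
Minimal reference implementations of the two pieces named above, with the epsilon from the table; shapes and naming are illustrative, not the model's actual modules:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor,
             eps: float = 1e-05) -> torch.Tensor:
    # Scale by the reciprocal root-mean-square of the last dimension,
    # then apply a learned per-channel gain (no mean subtraction, no bias).
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

def silu(x: torch.Tensor) -> torch.Tensor:
    # SiLU (swish): x * sigmoid(x)
    return x * torch.sigmoid(x)
```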

Special Tokens

BOS Token ID 0
EOS Token ID 0
Pad Token ID 0

Data Type

Model Dtype bfloat16
Layer Types:
Mamba2 SSM Mixer (in place of attention/MLP blocks)
Normalization (RMSNorm)
Embedding
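
A loading sketch in the model's native bfloat16, assuming a recent transformers release with Mamba2 support, the tokenizer files on the Hub, and roughly 14 GiB of free accelerator memory for the half-precision weights (per the table above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mamba-Codestral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the checkpoint dtype above
    device_map="auto",
)

# Short code-completion smoke test.
inputs = tok("def fib(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```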