Getting Started with Base Models
This page provides information and resources for exploring and using base large language models (LLMs).
Using a Hosted Base Model
The Llama 3.1 405B-parameter base model is hosted on hyperbolic.xyz, which offers a playground for text continuation: Llama 3.1 405B Base Model
Mixtral 8x7B v0.1 is hosted on together.ai, which offers a playground for text continuation: Mixtral-8x7B-v0.1
Software for Looming
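Obsidian is a note-taking application that can be used for "looming", that is, generating several continuations at each point in a text and exploring them as a branching tree. The loomsidian plugin adds this capability, but may need to be installed manually.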
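Looming treats generation as a tree: each sampled continuation becomes a branch, and the text from the root down to any node is the prompt for the next sample. The following is a minimal Python sketch of that data structure only; `LoomNode`, `branch`, and `full_text` are hypothetical names and are not part of loomsidian or any other loom tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoomNode:
    """One node in a loom tree: a piece of text plus the branches sampled after it."""
    text: str
    children: "list[LoomNode]" = field(default_factory=list)
    # Excluded from repr/eq so printing or comparing a node does not recurse back up the tree.
    parent: Optional["LoomNode"] = field(default=None, repr=False, compare=False)

    def branch(self, continuation: str) -> "LoomNode":
        """Attach a sampled continuation as a new child and return it."""
        child = LoomNode(continuation, parent=self)
        self.children.append(child)
        return child

    def full_text(self) -> str:
        """Concatenate text from the root down to this node (the prompt for the next sample)."""
        parts = []
        node: Optional[LoomNode] = self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "".join(reversed(parts))
```

For example, `LoomNode("Once upon a time").branch(", a dragon").branch(" slept.").full_text()` returns `"Once upon a time, a dragon slept."`, while sibling branches of the root stay available for further exploration.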
Running Your Own Base Model
You can run a base model locally using llama.cpp. Follow these steps:
- Download and build llama.cpp for CPU.
- Download a quantized model such as the ~800 MB Llama-3.2-1B.Q4_K_M.gguf.
- Launch llama-server with the downloaded model.
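The launch step can be scripted. The Python sketch below assumes a CMake build that places the binary at `./build/bin/llama-server` and uses the standard `-m` (model path) and `--port` flags. It polls llama-server's `/health` endpoint, which returns 200 once the model is loaded. The helper names are hypothetical.

```python
import subprocess  # used to start the server, see the usage note below
import time
import urllib.request


def llama_server_cmd(model_path: str, port: int = 8080,
                     binary: str = "./build/bin/llama-server") -> list[str]:
    """Build the argv for llama-server (binary path is an assumption about the build layout)."""
    return [binary, "-m", model_path, "--port", str(port)]


def wait_until_ready(port: int = 8080, timeout_s: float = 120.0) -> bool:
    """Poll /health until it answers 200 or the timeout passes; return whether it came up."""
    deadline = time.monotonic() + timeout_s
    url = f"http://localhost:{port}/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused (not started yet) or HTTP 503 (model still loading).
            pass
        time.sleep(1.0)
    return False
```

For example, start the server with `subprocess.Popen(llama_server_cmd("Llama-3.2-1B.Q4_K_M.gguf"))`, then call `wait_until_ready()` before sending any requests.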
Getting Continuations from an API Endpoint
To get text continuations from an OpenAI-compatible completions endpoint, use a request like the following:
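# jq JSON-escapes $PROMPT, so quotes, spaces, and newlines in the prompt are safe
# 8080 is llama-server's default port; change it if you passed --port
curl -X POST \
"http://localhost:8080/v1/completions" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg prompt "$PROMPT" '{
"model": "llama-3.2-1B.Q4_K_M.gguf",
"prompt": $prompt,
"max_tokens": 150,
"temperature": 0.7,
"n": 1
}')"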
llama-server provides this endpoint, but only supports n=1. For multiple completions per request, use vLLM.
API reference: OpenAI Completions
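The same request can be sent from Python, which avoids shell-quoting problems with the prompt. The following is a minimal standard-library sketch. It assumes llama-server on its default port 8080 and the OpenAI-style response shape (`choices[i]["text"]`). The helper names are hypothetical. Because llama-server returns one completion per request, `sample_continuations` sends one request per sample.

```python
import json
import urllib.request
from typing import Optional


def build_completion_request(prompt: str, max_tokens: int = 150,
                             temperature: float = 0.7,
                             model: str = "llama-3.2-1B.Q4_K_M.gguf") -> bytes:
    """Encode an OpenAI-style /v1/completions body; json.dumps handles all escaping."""
    body = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": 1,  # llama-server only supports n=1
    }
    return json.dumps(body).encode("utf-8")


def completion_texts(response: dict) -> list[str]:
    """Pull the generated text out of each choice in a completions response."""
    return [choice["text"] for choice in response["choices"]]


def sample_continuations(prompt: str, k: int,
                         base_url: str = "http://localhost:8080",
                         api_key: Optional[str] = None) -> list[str]:
    """Collect k continuations from an n=1-only server by sending k separate requests."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    texts: list[str] = []
    for _ in range(k):
        req = urllib.request.Request(
            f"{base_url}/v1/completions",
            data=build_completion_request(prompt),
            headers=headers,
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=120) as resp:
            texts.extend(completion_texts(json.load(resp)))
    return texts
```

Against vLLM, you could instead set `"n"` to the number of samples you want and make a single request. The client above keeps `n=1` so it works against both servers.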
Information About Base Models
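LLMs are first pretrained, producing a base model, which can then be fine-tuned (for example, into a chat assistant). Base models can be harder to work with, since they continue text rather than follow instructions, but they offer advantages that tuned models lack, such as less mode collapse. Some base models are annealed, meaning the final stage of pretraining runs with a decaying learning rate on a curated, high-quality data mix. This can make them behave more like tuned models and less useful as base models.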
Known Good Base Models
- code-davinci-002 (sunsetted by OpenAI)
- gpt-4-base (private researcher access only)
- GPT-2 (openly available)
- Gemma 2 (cursed)
Known Annealed Models
- The Llama models
- Some Gemini models
Papers
Llama
- LLaMA: Open and Efficient Foundation Language Models
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- The Llama 3 Herd of Models
- Llama 3 model card
- Llama 3 model card on GitHub
Gemini
Mistral
DeepSeek
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- DeepSeek-V3 Technical Report
Qwen
Methods for Determining if a Base Model is Annealed
- Check for mode collapse by prompting for random numbers and seeing how concentrated the answers are: Mysteries of Mode Collapse
- Inspect logits on, e.g., assistant-style question-and-answer conversations (thanks to GGB).
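Both checks can be scored with a few lines of Python. This is a minimal sketch that assumes you have already collected many sampled continuations of a prompt such as "Here is a random number between 1 and 100:". The prompt, the sampling settings, and the function names are illustrative assumptions.

A model that is not mode-collapsed should spread its answers widely, giving many distinct values and high entropy. A collapsed one piles onto a few favorites, giving a high `top_share`. For the logit check, `prob_mass` sums the probability that a top-logprobs dict assigns to tokens you choose, such as typical assistant openers. The dict maps each token to its log probability, as in OpenAI-style `logprobs` output. No particular thresholds are implied.

```python
import math
import re
from collections import Counter


def parse_numbers(samples: list[str]) -> list[int]:
    """Take the first integer in each continuation; skip samples that contain none."""
    nums = []
    for s in samples:
        m = re.search(r"-?\d+", s)
        if m:
            nums.append(int(m.group()))
    return nums


def collapse_stats(values: list[int]) -> dict[str, float]:
    """Summarize how concentrated a set of sampled 'random' numbers is."""
    if not values:
        raise ValueError("need at least one parsed sample")
    counts = Counter(values)
    n = len(values)
    probs = [c / n for c in counts.values()]
    return {
        "distinct": float(len(counts)),                        # number of different answers
        "top_share": max(probs),                               # fraction taken by the favorite
        "entropy_bits": -sum(p * math.log2(p) for p in probs),  # Shannon entropy of answers
    }


def prob_mass(top_logprobs: dict[str, float], tokens: set[str]) -> float:
    """Total probability the model assigns to the chosen candidate tokens at one position."""
    return sum(math.exp(lp) for tok, lp in top_logprobs.items() if tok in tokens)
```

Compare the numbers across models rather than reading them in isolation. For example, run the same prompt and the same number of samples against a known-good base model and against a suspected annealed one.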