LLM Base Model Exploration

Getting Started with Base Models

This page provides information and resources for exploring and using base large language models (LLMs).

Using a Hosted Base Model

The Llama 3.1 405B-parameter base model is hosted on hyperbolic.xyz, which offers a playground for text continuation: Llama 3.1 405B Base Model

Mixtral 8x7B v0.1 is hosted on together.ai, which offers a playground for text continuation: Mixtral-8x7B-v0.1

Software for Looming

Obsidian is a note-taking application that can be used for "looming" (interactively branching and exploring base-model continuations). The loomsidian plugin adds this functionality, but may require manual installation.

Running Your Own Base Model

You can run a base model locally using llama.cpp. Follow these steps:

  1. Download and build llama.cpp for CPU.
  2. Download a quantized model such as the 800MB Llama-3.2-1B.Q4_K_M.gguf.
  3. Launch the llama-server.
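On a Linux or macOS machine, the three steps above might look like the following sketch (the build commands and server flags match current llama.cpp; the model download URL is left as a placeholder, and port 8080 is an example):

```shell
# 1. Download and build llama.cpp for CPU
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# 2. Download a quantized model (fill in a GGUF quantization URL,
#    e.g. from a Hugging Face GGUF repo for Llama-3.2-1B)
curl -L -o Llama-3.2-1B.Q4_K_M.gguf "<URL of a Q4_K_M GGUF file>"

# 3. Launch llama-server (port 8080 is an example)
./build/bin/llama-server -m Llama-3.2-1B.Q4_K_M.gguf --port 8080
```

Once the server is running, it exposes an OpenAI-compatible API on the chosen port.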

Getting Continuations from an API Endpoint

To get text continuations from the llama-server's OpenAI-compatible API endpoint, use a request like the following:

curl -X POST \
  "http://localhost:port/v1/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1B.Q4_K_M.gguf",
    "prompt": "'"$PROMPT"'",
    "max_tokens": 150,
    "temperature": 0.7,
    "n": 1
  }'

The llama-server provides this endpoint but only supports n=1 (one completion per request). To sample multiple completions per prompt, use a server such as vLLM.
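The shell interpolation above ("'"$PROMPT"'") breaks if the prompt contains quotes or other JSON-special characters. A small Python sketch avoids this by letting json.dumps do the escaping; the port number and the API-key handling here are assumptions (llama-server does not require a key by default):

```python
import json
from urllib import request

def build_completion_request(prompt, model="llama-3.2-1B.Q4_K_M.gguf",
                             max_tokens=150, temperature=0.7):
    """Build the JSON body for a /v1/completions request.

    json.dumps escapes quotes and newlines in the prompt, which the
    raw shell interpolation in the curl example does not handle.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": 1,
    })

def get_completion(prompt, base_url="http://localhost:8080", api_key="none"):
    # Port 8080 is an example; pass whatever port llama-server uses.
    req = request.Request(
        base_url + "/v1/completions",
        data=build_completion_request(prompt).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + api_key},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

The response format follows the OpenAI Completions schema, so the continuation text is read from choices[0].text.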

API reference: OpenAI Completions

Information About Base Models

LLMs are pretrained on large text corpora to produce a base model, which can then be fine-tuned (for example, into an instruction-following assistant). Base models can be more difficult to work with than fine-tuned models, but offer unique advantages for open-ended text continuation. Some base models are annealed (trained on curated, instruction-like data late in pretraining), which makes them behave less like pure base models and less useful for direct use.

Known Good Base Models

Known Annealed Models

Papers

Llama

Gemini

Mistral

DeepSeek

Qwen

Methods for Determining if a Base Model is Annealed

Useful Reading