Skip to main content

Configuration

Environment variables, storage backends, inference settings, auth, and client config.

Server environment variables

Set these in your .env file, docker-compose.ymlenvironment section, or your deployment platform's secret store.

Core

VariableDefaultDescription
DATABASE_URL(required)PostgreSQL connection string — postgresql+asyncpg://user:pass@host/db
SECRET_KEY(required)JWT signing secret — generate with: openssl rand -hex 32
REDIS_URLredis://localhost:6379Redis connection string
ACCESS_TOKEN_EXPIRE_MINUTES10080Session token lifetime in minutes (default: 7 days)
ALLOWED_ORIGINS*CORS allowed origins — comma-separated list or * for all

Object storage (R2 / MinIO)

All variables are prefixed R2_. Despite the name, any S3-compatible store works (AWS S3, MinIO, Backblaze B2, Cloudflare R2).

VariableDefaultDescription
R2_ENDPOINT_URLhttp://minio:9000S3-compatible endpoint URL
R2_ACCESS_KEY_IDminioadminAccess key / key ID
R2_SECRET_ACCESS_KEYminioadminSecret key
R2_BUCKETinferllamaBucket name — must already exist
R2_PUBLIC_URLPublic CDN URL prefix (optional; enables direct browser downloads)
CHUNK_SIZE_BYTES268435456Chunk size for content-addressed splitting (256 MB default)
bash
# Cloudflare R2 example
R2_ENDPOINT_URL=https://abc123.r2.cloudflarestorage.com
R2_ACCESS_KEY_ID=your_key_id
R2_SECRET_ACCESS_KEY=your_secret
R2_BUCKET=inferllama-prod
R2_PUBLIC_URL=https://models.inferllama.com

Rate limits

VariableDefaultDescription
RATE_LIMIT_REQUESTS100Max requests per window per user
RATE_LIMIT_WINDOW_SECONDS60Rate limit window in seconds
UPLOAD_RATE_LIMIT_GB_PER_DAY10Max upload bytes per user per day (GB)

Registration

VariableDefaultDescription
REGISTRATION_OPENtrueAllow new user registration — set false to invite-only
DEFAULT_QUOTA_GB10Storage quota for new users in GB

Tracker environment variables

VariableDefaultDescription
PORT3001Port the tracker HTTP server listens on
ANNOUNCE_URLhttp://tracker:3001/announceWebTorrent announce URL embedded in .torrent files
R2_*(same as server)Object storage config for storing .torrent files

CLI client config

The CLI stores its configuration in ~/.inferllama/config.json:

json
{
  "server_url": "http://localhost:8000",
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9…"
}

Updated automatically by inferllama login.

Directory structure

PathContents
~/.inferllama/config.jsonRegistry URL and auth token
~/.inferllama/models/Assembled GGUF files ready for inference
~/.inferllama/chunks/Content-addressed 256 MB chunks (cache)
~/.inferllama/bin/llama-server binary (if not installed via brew)

Disk management

Chunks and models accumulate over time. To reclaim disk space:

bash
# Remove a model and its exclusive chunks
inferllama rm qwen2-0.5b-instruct

# Prune all chunks not referenced by any local model
inferllama cache prune

# Show disk usage by model
inferllama list --size

Tip

Shared chunks (used by multiple models) are never deleted by rm. They are only removed when the last model referencing them is deleted.

Inference configuration

inferllama run and inferllama serve launchllama-server under the hood. These flags control its behavior:

CLI flagllama-server paramDefaultDescription
--port--port8080Port for the inference server
--ctx--ctx-size4096Context window in tokens
--temperaturen/a (per-request)0.7Default sampling temperature
--max-tokensn/a (per-request)unlimitedMax tokens per completion
--systemn/a (per-session)System prompt prepended to every message

GPU / Metal acceleration

llama-server auto-detects Metal (macOS) and CUDA (Linux/Windows). If you installed via Homebrew, Metal is already enabled. For CUDA:

bash
# Install the CUDA-enabled build
inferllama setup --force --cuda

# Verify GPU layers are loaded (look for "ggml_cuda_init" in output)
inferllama run qwen2-0.5b-instruct 2>&1 | head -20

Full .env example

bash
# ── Core ─────────────────────────────────────
DATABASE_URL=postgresql+asyncpg://postgres:password@postgres:5432/inferllama
SECRET_KEY=replace-me-with-32-hex-chars
REDIS_URL=redis://redis:6379

# ── Storage (production — Cloudflare R2) ─────
R2_ENDPOINT_URL=https://abc123.r2.cloudflarestorage.com
R2_ACCESS_KEY_ID=your_key
R2_SECRET_ACCESS_KEY=your_secret
R2_BUCKET=inferllama-prod
R2_PUBLIC_URL=https://models.inferllama.com

# ── Storage (dev — local MinIO) ─────────────
# R2_ENDPOINT_URL=http://minio:9000
# R2_ACCESS_KEY_ID=minioadmin
# R2_SECRET_ACCESS_KEY=minioadmin
# R2_BUCKET=inferllama

# ── Auth & registration ───────────────────────
REGISTRATION_OPEN=true
DEFAULT_QUOTA_GB=10
ACCESS_TOKEN_EXPIRE_MINUTES=10080

# ── Rate limiting ─────────────────────────────
RATE_LIMIT_REQUESTS=100
RATE_LIMIT_WINDOW_SECONDS=60
UPLOAD_RATE_LIMIT_GB_PER_DAY=10

# ── Next.js web ───────────────────────────────
NEXT_PUBLIC_API_URL=http://localhost:8000
NEXT_PUBLIC_DAEMON_URL=http://localhost:11434