Documentation

Everything you need to ship

Everything you need to get started with InferLlama — install the CLI, run models locally, and use the REST API.

Quickstart

Install the CLI, run your first model, and start chatting in under 5 minutes.

CLI Reference

Full reference for every inferllama command: pull, run, serve, push, login, and more.

API Reference

OpenAI-compatible REST API: endpoints, authentication, and request/response schemas.

Source code and issue tracker on GitHub.