Documentation
Everything you need to ship
Everything you need to get started with InferLlama — install the CLI, run models locally, and use the REST API.
Start here
Quickstart
Install the CLI, run your first model, and start chatting in under 5 minutes.
Read guide
CLI Reference
Full reference for every inferllama command — pull, run, serve, push, login, and more.
Read guide
OpenAI-compatible
API Reference
OpenAI-compatible REST API — endpoints, authentication, request/response schemas.
Read guide
Quick reference
Source code and issue tracker on GitHub →