Ollama
Ollama allows you to run Large Language Models (LLMs) locally on your server.
Personal Recommendation: llama-swap
I personally use llama-swap instead of Ollama. It offers better performance, direct HuggingFace support, and finer control over models. However, Ollama is easier for beginners.
Installation
Add the following template to your `docker-compose.yml` and then run `ei23 dc`.
Template
```yaml
ollama:
  image: ollama/ollama
  container_name: ollama
  restart: unless-stopped
  environment:
    - OLLAMA_HOST=0.0.0.0
    - OLLAMA_ORIGINS=moz-extension://*
  ports:
    - 11434:11434
  # deploy:
  #   resources:
  #     reservations:
  #       devices:
  #         - driver: nvidia
  #           count: all
  #           capabilities: [gpu]
  volumes:
    - ./volumes/ollama:/root/.ollama
```
Download Models
After startup, you can download models:
```shell
# Via terminal
docker exec ollama ollama pull llama3
docker exec ollama ollama pull mistral
docker exec ollama ollama pull codellama

# Or via API
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
```
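The pull endpoint streams its progress as newline-delimited JSON status objects. As a minimal sketch (the helper name is made up, and the sample lines are an abridged illustration of the response format), such a stream can be parsed like this:

```python
import json

def parse_pull_stream(lines):
    """Collect the 'status' field from newline-delimited JSON lines
    as streamed by Ollama's /api/pull endpoint."""
    statuses = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        statuses.append(event.get("status", ""))
    return statuses

# Abridged example of what the pull endpoint streams back
sample = [
    '{"status": "pulling manifest"}',
    '{"status": "downloading", "digest": "sha256:abc", "total": 100, "completed": 50}',
    '{"status": "success"}',
]
print(parse_pull_stream(sample))  # ['pulling manifest', 'downloading', 'success']
```

A `"status": "success"` line signals that the model is fully downloaded and ready to use.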
Recommended Models
| Model | Size | RAM | Description |
|---|---|---|---|
| llama3 | 4.7GB | 8GB | General tasks, good compromise |
| llama3:70b | 40GB | 64GB | Very powerful, needs lots of RAM |
| mistral | 4.1GB | 8GB | Good for European languages |
| codellama | 3.8GB | 8GB | Optimized for code generation |
| phi3 | 2.2GB | 4GB | Compact, efficient model |
| gemma:2b | 1.4GB | 2GB | Very lightweight, for weak hardware |
GPU Recommendation
A GPU is recommended for fast inference. Uncomment the deploy section for NVIDIA GPUs.
Chat Interface
In combination with Open WebUI, you get a convenient browser interface.
Set the `OLLAMA_BASE_URL` environment variable in the Open WebUI container so it can reach the Ollama API.
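As an illustration (the service name, image tag, and published port are assumptions for a typical setup, not part of this guide's template), a matching compose snippet could look like:

```yaml
open-webui:
  image: ghcr.io/open-webui/open-webui:main
  container_name: open-webui
  restart: unless-stopped
  environment:
    # Reaches the ollama service from the template above via the compose network
    - OLLAMA_BASE_URL=http://ollama:11434
  ports:
    - 3000:8080
```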
Ollama vs. llama-swap
| Feature | Ollama | llama-swap |
|---|---|---|
| Simplicity | ✅ Very easy | ⚠️ Some configuration needed |
| Performance | Good | ✅ Better (llama.cpp) |
| HuggingFace | ❌ Own format | ✅ GGUF directly |
| Configuration | Limited | ✅ Detailed |
| Swapping | No | ✅ Automatic |
| Model Management | ✅ ollama pull | Manual download |
Recommendation
- Ollama: Ideal for beginners who want to start quickly
- llama-swap: Ideal for advanced users who want maximum performance and control
API
Ollama provides a REST API:
```shell
# Send a prompt (generate endpoint)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

# List available models
curl http://localhost:11434/api/tags
```
Ollama also exposes an OpenAI-compatible API under `/v1`, so it can be used with n8n, NodeRED, and other tools that expect the OpenAI interface.
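As a sketch of the OpenAI-compatible usage (the helper function is hypothetical; the `/v1/chat/completions` path is Ollama's OpenAI-compatibility endpoint, and the host/port follow this guide's template):

```python
import json

def build_chat_request(model, prompt, host="http://localhost:11434"):
    """Build the URL and JSON body for an OpenAI-style chat request
    against Ollama's /v1 compatibility endpoint."""
    url = f"{host}/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, json.dumps(payload)

url, body = build_chat_request("llama3", "Why is the sky blue?")
print(url)  # http://localhost:11434/v1/chat/completions
# Send with any HTTP client, e.g.:
#   curl <url> -H "Content-Type: application/json" -d '<body>'
```

Because the request shape matches the OpenAI API, clients such as n8n can be pointed at this URL with only the base address changed.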
Notes
- Models are stored in `./volumes/ollama/`
- The API is available at `http://[IP]:11434`
- For Home Assistant integration, use the Ollama integration
- Models can also be downloaded via the API