Skip to content

Ollama

Ollama allows you to run Large Language Models (LLMs) locally on your server.

Personal Recommendation: llama-swap

I personally use llama-swap instead of Ollama. It offers better performance, direct HuggingFace support, and finer control over models. However, Ollama is easier for beginners.

Installation

Add the following template to your docker-compose.yml and then run ei23 dc.

Template

  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_ORIGINS=moz-extension://*'
    ports:
      - 11434:11434
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
    volumes:
      - ./volumes/ollama:/root/.ollama

Download Models

After startup, you can download models:

# Via terminal
docker exec ollama ollama pull llama3
docker exec ollama ollama pull mistral
docker exec ollama ollama pull codellama

# Or via API
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
Model Size RAM Description
llama3 4.7GB 8GB General tasks, good compromise
llama3:70b 40GB 64GB Very powerful, needs lots of RAM
mistral 4.1GB 8GB Good for European languages
codellama 3.8GB 8GB Optimized for code generation
phi3 2.2GB 4GB Compact, efficient model
gemma:2b 1.4GB 2GB Very lightweight, for weak hardware

GPU Recommendation

A GPU is recommended for fast inference. Uncomment the deploy section for NVIDIA GPUs.

Chat Interface

In combination with Open WebUI, you get a convenient browser interface.

Set the environment variable in Open WebUI:

environment:
  - OLLAMA_BASE_URL=http://ollama:11434

Ollama vs. llama-swap

Feature Ollama llama-swap
Simplicity ✅ Very easy ⚠️ Some configuration needed
Performance Good ✅ Better (llama.cpp)
HuggingFace ❌ Own format ✅ GGUF directly
Configuration Limited ✅ Detailed
Swapping No ✅ Automatic
Model Management ollama pull Manual download

Recommendation

  • Ollama: Ideal for beginners who want to start quickly
  • llama-swap: Ideal for advanced users who want maximum performance and control

API

Ollama provides a REST API:

# Send chat request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'

# List available models
curl http://localhost:11434/api/tags

The API is compatible with the OpenAI API interface and can be used with n8n, NodeRED, and other tools.

Notes

  • Models are stored in ./volumes/ollama/
  • The API is available at http://[IP]:11434
  • For Home Assistant integration, use the Ollama integration
  • Models can also be downloaded via the API

Further Information