# Claude/configure llama local ai 011CV5dQJn6VYpDcCabnExXf #1104

New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
**Open** · 37-AN wants to merge 3 commits into `airweave-ai:main` from `37-AN:claude/configure-llama-local-ai-011CV5dQJn6VYpDcCabnExXf`

Changes from 2 commits. New file (+222 lines):

# Quick Start: Local Llama with Ollama

## What Was Done

Your Airweave system has been configured to use **local Llama models** via Ollama instead of cloud AI providers. This gives you:

✅ **Zero cost** - No API charges
✅ **Complete privacy** - Data never leaves your servers
✅ **Offline capable** - No internet required
✅ **Full control** - Choose your own models

## 🚀 Quick Start (3 Commands)

### 1. Start Services with Ollama

```bash
cd docker
docker-compose --profile local-llm up -d
```

This starts all Airweave services plus Ollama.

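If you want to confirm everything came up, here is a quick sanity check with standard Docker commands (the container name `airweave-ollama` is taken from the commands later in this guide):

```bash
# List services started under the local-llm profile
docker-compose --profile local-llm ps

# Confirm the Ollama container is up and its API answers
docker ps --filter "name=airweave-ollama"
curl http://localhost:11434/api/tags
```
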
### 2. Pull the AI Models

```bash
# Pull LLM model (~40GB, takes 10-30 minutes)
docker exec -it airweave-ollama ollama pull llama3.3:70b

# Pull embedding model (~300MB, takes 1-2 minutes)
docker exec -it airweave-ollama ollama pull nomic-embed-text
```

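The 70B model needs roughly 40GB of free disk in Docker's storage area; if a pull stalls or fails, these standard checks help (the `/var/lib/docker` path assumes a default Linux install):

```bash
# Free space where Docker keeps volumes (default Linux location)
df -h /var/lib/docker

# Models pulled so far, with their sizes
docker exec -it airweave-ollama ollama list
```
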
**For testing or lower-end hardware**, use smaller models:

```bash
docker exec -it airweave-ollama ollama pull llama3.2:3b
docker exec -it airweave-ollama ollama pull nomic-embed-text
```

### 3. Verify Setup

```bash
# Check models are installed
docker exec -it airweave-ollama ollama list

# Test generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3:70b",
  "prompt": "Say hello!",
  "stream": false
}'

# Check Airweave logs
docker logs airweave-backend | grep -i ollama
```

You should see:
```
[OllamaProvider] Connected to Ollama at http://ollama:11434
[OllamaProvider] Using LLM model: llama3.3:70b
[OllamaProvider] Using embedding model: nomic-embed-text
```

## ✅ That's It!

Your Airweave system is now using local AI. All search queries, embeddings, and AI operations will use your local Llama models.

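To spot-check that embeddings are also served locally, you can call Ollama's embeddings endpoint directly; this is a minimal sketch against the standard Ollama HTTP API, independent of Airweave:

```bash
# Ask the local embedding model for a vector
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Airweave local AI test"
}'
# Expect a JSON response with an "embedding" array (768 floats for nomic-embed-text)
```
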
## 📊 System Requirements

**Minimum (for testing):**
- CPU: 4+ cores
- RAM: 8GB
- Disk: 50GB free
- Model: llama3.2:3b

**Recommended (for production):**
- CPU: 8+ cores
- RAM: 16GB+
- GPU: NVIDIA RTX 3090 or better (24GB VRAM)
- Disk: 100GB+ free
- Model: llama3.3:70b

## 🔧 Configuration Files Changed

| File | What Changed |
|------|--------------|
| `docker/docker-compose.yml` | Added Ollama service with `local-llm` profile |
| `backend/airweave/search/providers/ollama.py` | New provider implementation |
| `backend/airweave/search/factory.py` | Integrated Ollama provider |
| `backend/airweave/search/defaults.yml` | Set Ollama as primary provider |
| `backend/airweave/core/config.py` | Added `OLLAMA_BASE_URL` setting |
| `.env.example` | Added Ollama configuration |

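For reference, a minimal `.env` sketch for the new setting; the in-network URL matches the log line shown in step 3, while the localhost variant is an assumption for running the backend outside Docker:

```bash
# .env - point the Airweave backend at the Ollama service
OLLAMA_BASE_URL=http://ollama:11434

# If the backend runs on the host instead of inside Docker Compose:
# OLLAMA_BASE_URL=http://localhost:11434
```
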
## 🎯 Provider Priority

Airweave tries providers in this order:

1. **Ollama (local)** ← your local models
2. Cerebras (cloud) ← fallback if Ollama is down
3. Groq (cloud) ← second fallback
4. OpenAI (cloud) ← final fallback

To force **local-only** mode, remove the cloud API keys from `.env`:
```bash
# Comment these out in .env
# OPENAI_API_KEY=...
# GROQ_API_KEY=...
# CEREBRAS_API_KEY=...
```

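After removing the keys, the backend logs are the quickest way to confirm which provider actually handled requests (log wording assumed to match the sample output in step 3):

```bash
# Should show OllamaProvider activity and no cloud fallbacks
docker logs airweave-backend 2>&1 | grep -iE "ollama|cerebras|groq|openai"
```
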
## 🖥️ GPU Acceleration (Optional)

For roughly 10x faster inference, enable GPU support:

1. Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

2. Uncomment the GPU section in `docker/docker-compose.yml`:
```yaml
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```

3. Restart:
```bash
docker-compose --profile local-llm down
docker-compose --profile local-llm up -d
```

4. Verify:
```bash
docker exec -it airweave-ollama nvidia-smi
```

## 📖 Model Options

### Text Generation (LLM)

| Model | Size | RAM Needed | Best For |
|-------|------|------------|----------|
| `llama3.2:3b` | 4GB | 8GB | Testing, low-end hardware |
| `llama3.1:8b` | 8GB | 16GB | Balanced performance |
| `llama3.3:70b` | 40GB | 64GB or 24GB VRAM | Production quality |

### Embeddings

| Model | Dimensions | Best For |
|-------|------------|----------|
| `nomic-embed-text` | 768 | General purpose (default) |
| `all-minilm` | 384 | Faster, less accurate |
| `mxbai-embed-large` | 1024 | More accurate |

Note that the embedding models produce vectors of different dimensions, so switching embedding models generally means re-indexing existing data.

Change models by editing `backend/airweave/search/defaults.yml`:

```yaml
ollama:
  llm:
    name: "llama3.2:3b"       # Change this
  embedding:
    name: "nomic-embed-text"  # Change this
```

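After editing `defaults.yml`, the new model also has to exist inside Ollama, and the backend needs to reload its config; a hedged follow-up (the `backend` service name is an assumption based on the container names used above):

```bash
# Pull the newly configured model so Ollama can serve it
docker exec -it airweave-ollama ollama pull llama3.2:3b

# Restart the backend so it re-reads defaults.yml
docker-compose --profile local-llm restart backend
```
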
## 🐛 Troubleshooting

**"Connection refused"**
```bash
# Check that Ollama is running
docker ps | grep ollama

# Restart if needed
docker-compose --profile local-llm restart ollama
```

**"Model not found"**
```bash
# Pull the model
docker exec -it airweave-ollama ollama pull llama3.3:70b
```

**Slow responses**
- Use a smaller model (`llama3.2:3b`)
- Enable GPU acceleration (see above)
- Check CPU/RAM usage with `docker stats` (see the sketch after this list)

**Out of memory**
- Use a smaller model
- Increase the Docker memory limit
- Add more RAM or use a GPU

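For the last two issues, a quick look at per-container resource usage usually pinpoints the problem (standard Docker tooling; the Ollama environment variable is an assumption to verify against the Ollama docs):

```bash
# One-shot CPU/memory snapshot per container
docker stats --no-stream

# Optionally limit how many models Ollama keeps loaded at once,
# e.g. under the ollama service in docker-compose.yml:
#   environment:
#     - OLLAMA_MAX_LOADED_MODELS=1
```
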
## 📚 Full Documentation

See `docs/LOCAL_LLAMA_SETUP.md` for:
- Detailed architecture explanation
- Advanced configuration
- Production deployment guide
- Security & privacy considerations
- Cost comparison vs cloud AI
- Performance benchmarks

## 🎉 Summary

You now have a **fully local AI system** running in Docker!

**What happens now:**
- All embeddings → local Llama models
- All text generation → local Llama models
- All reranking → local Llama models
- Zero API costs ✅
- Complete data privacy ✅
- Offline capable ✅

**Next steps:**
1. Start using Airweave normally - it will automatically use local AI
2. Monitor performance with `docker logs -f airweave-backend`
3. Tune models in `defaults.yml` based on your needs
4. Enable GPU for faster inference (optional)

Questions? Check `docs/LOCAL_LLAMA_SETUP.md` or the [Ollama docs](https://ollama.ai/).

**Prompt for AI agents**

```text
Address the following comment on backend/airweave/search/factory.py at line 687:

This change introduces a new Ollama provider, but `.cursor/rules/search-module.mdc` still
documents only Cerebras, OpenAI, Groq, and Cohere. Please update the Cursor rule to include
Ollama's capabilities, configuration keys (e.g., `OLLAMA_BASE_URL`), and operation preferences
so Cursor users get accurate guidance. (Based on your team's feedback about verifying Cursor
rules per provider.)

@@ -678,6 +680,11 @@ def _init_all_providers_for_operation(
+    ctx.logger.debug(
+        f"[Factory] Attempting to initialize OllamaProvider for {operation_name}"
+    )
+    provider = OllamaProvider(base_url=api_key, model_spec=model_spec, ctx=ctx)
     if provider:
```