4 changes: 4 additions & 0 deletions .env.example
@@ -34,6 +34,10 @@ OPENAI_API_KEY=
ANTHROPIC_API_KEY=
MISTRAL_API_KEY=

# Local AI Configuration (optional)
# Ollama base URL for local Llama models
OLLAMA_BASE_URL=http://localhost:11434

# Development Settings
LOCAL_DEVELOPMENT=false
LOCAL_CURSOR_DEVELOPMENT=false
222 changes: 222 additions & 0 deletions QUICK_START_OLLAMA.md
@@ -0,0 +1,222 @@
# Quick Start: Local Llama with Ollama

## What Was Done

Your Airweave system has been configured to use **local Llama models** via Ollama instead of cloud AI providers. This gives you:

✅ **Zero cost** - No API charges
✅ **Complete privacy** - Data never leaves your servers
✅ **Offline capable** - No internet required
✅ **Full control** - Choose your own models

## 🚀 Quick Start (3 Commands)

### 1. Start Services with Ollama

```bash
cd docker
docker-compose --profile local-llm up -d
```

This starts all Airweave services + Ollama.
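Before moving on, you can confirm the profile was picked up and the Ollama container is running (Compose profiles require Docker Compose 1.28 or newer; the container name below is the one used elsewhere in this guide):

```bash
# List the services enabled by the local-llm profile; "ollama" should appear
docker-compose --profile local-llm config --services

# Confirm the Ollama container is up
docker ps --filter "name=airweave-ollama"
```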

### 2. Pull the AI Models

```bash
# Pull LLM model (~40GB, takes 10-30 minutes)
docker exec -it airweave-ollama ollama pull llama3.3:70b

# Pull embedding model (~300MB, takes 1-2 minutes)
docker exec -it airweave-ollama ollama pull nomic-embed-text
```

**For testing or lower-spec hardware**, use smaller models:
```bash
docker exec -it airweave-ollama ollama pull llama3.2:3b
docker exec -it airweave-ollama ollama pull nomic-embed-text
```

### 3. Verify Setup

```bash
# Check models are installed
docker exec -it airweave-ollama ollama list

# Test generation
curl http://localhost:11434/api/generate -d '{
"model": "llama3.3:70b",
"prompt": "Say hello!",
"stream": false
}'

# Check Airweave logs
docker logs airweave-backend | grep -i ollama
```

You should see:
```
[OllamaProvider] Connected to Ollama at http://ollama:11434
[OllamaProvider] Using LLM model: llama3.3:70b
[OllamaProvider] Using embedding model: nomic-embed-text
```
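As an extra check, you can call Ollama's embeddings endpoint directly; this exercises the embedding model without going through Airweave (endpoint and payload follow the standard Ollama REST API):

```bash
# Should return a JSON object with an "embedding" array (768 floats for nomic-embed-text)
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Airweave local embedding test"
}'
```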

## ✅ That's It!

Your Airweave system is now using local AI. All search queries, embeddings, and AI operations will use your local Llama models.

## 📊 System Requirements

**Minimum (for testing):**
- CPU: 4+ cores
- RAM: 8GB
- Disk: 50GB free
- Model: llama3.2:3b

**Recommended (for production):**
- CPU: 8+ cores
- RAM: 16GB+
- GPU: NVIDIA RTX 3090 or better (24GB VRAM)
- Disk: 100GB+ free
- Model: llama3.3:70b
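A quick way to check your host against these numbers (standard Linux tools; on macOS or Windows, check the resource limits in Docker Desktop instead):

```bash
nproc                  # CPU cores
free -h                # total and available RAM
df -h /var/lib/docker  # free disk where Docker stores images and volumes (default path on Linux)
nvidia-smi             # GPU model and VRAM, if an NVIDIA card is present
```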

## 🔧 Configuration Files Changed

| File | What Changed |
|------|--------------|
| `docker/docker-compose.yml` | Added Ollama service with `local-llm` profile |
| `backend/airweave/search/providers/ollama.py` | New provider implementation |
| `backend/airweave/search/factory.py` | Integrated Ollama provider |
| `backend/airweave/search/defaults.yml` | Set Ollama as primary provider |
| `backend/airweave/core/config.py` | Added `OLLAMA_BASE_URL` setting |
| `.env.example` | Added Ollama configuration |
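To confirm the backend actually picked up the new setting, you can inspect its environment after a restart (container name matches the logs command used earlier; the expected value assumes the default Docker networking, where Ollama is reachable as `ollama`):

```bash
docker exec airweave-backend env | grep OLLAMA
# Expected when running via docker-compose: OLLAMA_BASE_URL=http://ollama:11434
```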

## 🎯 Provider Priority

Airweave will try providers in this order:

1. **Ollama (local)** ← Your local models
2. Cerebras (cloud) ← Fallback if Ollama is down
3. Groq (cloud) ← Second fallback
4. OpenAI (cloud) ← Final fallback

To force **local-only** mode, comment out (or remove) the cloud API keys in `.env`:
```bash
# Comment these out in .env
# OPENAI_API_KEY=...
# GROQ_API_KEY=...
# CEREBRAS_API_KEY=...
```
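To double-check that no cloud keys are still active before restarting, a quick grep works (this assumes your keys live in `.env` at the repository root):

```bash
# Prints any uncommented cloud keys; no output means you are running local-only
grep -E '^(OPENAI_API_KEY|GROQ_API_KEY|CEREBRAS_API_KEY|COHERE_API_KEY)=' .env
```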

## 🖥️ GPU Acceleration (Optional)

For significantly faster inference (often around 10x on large models), enable GPU support:

1. Install [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

2. Uncomment GPU section in `docker/docker-compose.yml`:
```yaml
ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```

3. Restart:
```bash
docker-compose --profile local-llm down
docker-compose --profile local-llm up -d
```

4. Verify:
```bash
docker exec -it airweave-ollama nvidia-smi
```
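Once a model has been loaded (for example by the test generation from step 3 of the quick start), recent Ollama versions can also report whether it is running on the GPU:

```bash
docker exec -it airweave-ollama ollama ps
# The PROCESSOR column should read "100% GPU" when acceleration is active
```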

## 📖 Model Options

### Text Generation (LLM)

| Model | Size | RAM Needed | Best For |
|-------|------|------------|----------|
| `llama3.2:3b` | 4GB | 8GB | Testing, low hardware |
| `llama3.1:8b` | 8GB | 16GB | Balanced performance |
| `llama3.3:70b` | 40GB | 64GB or 24GB VRAM | Production quality |

### Embeddings

| Model | Dimensions | Best For |
|-------|------------|----------|
| `nomic-embed-text` | 768 | General purpose (default) |
| `all-minilm` | 384 | Faster, less accurate |
| `mxbai-embed-large` | 1024 | More accurate |

Change models by editing `backend/airweave/search/defaults.yml`:

```yaml
ollama:
  llm:
    name: "llama3.2:3b"        # Change this
  embedding:
    name: "nomic-embed-text"   # Change this
```
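After changing a model name, pull the new model and restart the backend so it reloads `defaults.yml` (the `backend` service name is an assumption based on this compose setup; adjust it to match your `docker-compose.yml`):

```bash
docker exec -it airweave-ollama ollama pull llama3.2:3b
docker-compose --profile local-llm restart backend
```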

## 🐛 Troubleshooting

**"Connection refused"**
```bash
# Check Ollama is running
docker ps | grep ollama

# Restart if needed
docker-compose --profile local-llm restart ollama
```
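If Ollama is up but the backend still reports connection errors, test the connection from inside the backend container (this uses the `http://ollama:11434` address shown in the startup logs and assumes `curl` is available in the backend image):

```bash
docker exec airweave-backend curl -s http://ollama:11434/api/tags
# A JSON list of installed models means container-to-container networking is fine
```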

**"Model not found"**
```bash
# Pull the model
docker exec -it airweave-ollama ollama pull llama3.3:70b
```

**Slow responses**
- Use smaller model (`llama3.2:3b`)
- Enable GPU acceleration (see above)
- Check CPU/RAM usage: `docker stats`

**Out of memory**
- Use smaller model
- Increase Docker memory limit
- Add more RAM or use GPU

## 📚 Full Documentation

See `docs/LOCAL_LLAMA_SETUP.md` for:
- Detailed architecture explanation
- Advanced configuration
- Production deployment guide
- Security & privacy considerations
- Cost comparison vs cloud AI
- Performance benchmarks

## 🎉 Summary

You now have a **fully local AI system** running in Docker!

**What happens now:**
- All embeddings → Local Llama models
- All text generation → Local Llama models
- All reranking → Local Llama models
- Zero API costs ✅
- Complete data privacy ✅
- Offline capable ✅

**Next steps:**
1. Start using Airweave normally - it will automatically use local AI
2. Monitor performance with `docker logs -f airweave-backend`
3. Tune models in `defaults.yml` based on your needs
4. Enable GPU for faster inference (optional)

Questions? Check `docs/LOCAL_LLAMA_SETUP.md` or the [Ollama docs](https://ollama.ai/).
1 change: 1 addition & 0 deletions backend/airweave/core/config.py
@@ -127,6 +127,7 @@ class Settings(BaseSettings):
GROQ_API_KEY: Optional[str] = None
COHERE_API_KEY: Optional[str] = None
CEREBRAS_API_KEY: Optional[str] = None
OLLAMA_BASE_URL: Optional[str] = None
AZURE_KEYVAULT_NAME: Optional[str] = None

# Temporal configuration
39 changes: 39 additions & 0 deletions backend/airweave/search/defaults.yml
@@ -75,11 +75,30 @@ provider_models:
max_tokens_per_doc: 8192
max_documents: 1000

  ollama:
    llm:
      name: "llama3.3:70b"
      tokenizer: "cl100k_base"
      context_window: 128000
    embedding:
      name: "nomic-embed-text"
      tokenizer: "cl100k_base"
      dimensions: 768
      max_tokens: 8192
    rerank:
      name: "llama3.3:70b"
      tokenizer: "cl100k_base"
      context_window: 128000

# Operation preferences - which provider and which models to use
# Format: provider: {llm: model_key, embedding: model_key, rerank: model_key}
operation_preferences:
  query_expansion:
    order:
      - provider: ollama
        llm: llm
        embedding: null
        rerank: null
      - provider: cerebras
        llm: llm
        embedding: null
@@ -95,6 +114,10 @@ operation_preferences:

  query_interpretation:
    order:
      - provider: ollama
        llm: llm
        embedding: null
        rerank: null
      - provider: cerebras
        llm: llm
        embedding: null
@@ -110,13 +133,21 @@ operation_preferences:

  embed_query:
    order:
      - provider: ollama
        llm: null
        embedding: embedding
        rerank: null
      - provider: openai
        llm: null
        embedding: embedding
        rerank: null

  reranking:
    order:
      - provider: ollama
        llm: null
        embedding: null
        rerank: rerank
      - provider: cohere
        llm: null
        embedding: null
@@ -132,6 +163,10 @@ operation_preferences:

  generate_answer:
    order:
      - provider: ollama
        llm: llm
        embedding: null
        rerank: null
      - provider: cerebras
        llm: llm
        embedding: null
@@ -147,6 +182,10 @@ operation_preferences:

  federated_search:
    order:
      - provider: ollama
        llm: llm
        embedding: null
        rerank: null
      - provider: cerebras
        llm: llm
        embedding: null
7 changes: 7 additions & 0 deletions backend/airweave/search/factory.py
@@ -36,6 +36,7 @@
from airweave.search.providers.cerebras import CerebrasProvider
from airweave.search.providers.cohere import CohereProvider
from airweave.search.providers.groq import GroqProvider
from airweave.search.providers.ollama import OllamaProvider
from airweave.search.providers.openai import OpenAIProvider
from airweave.search.providers.schemas import (
EmbeddingModelConfig,
@@ -370,6 +371,7 @@ def _get_available_api_keys(self) -> Dict[str, Optional[str]]:
"groq": getattr(settings, "GROQ_API_KEY", None),
"openai": getattr(settings, "OPENAI_API_KEY", None),
"cohere": getattr(settings, "COHERE_API_KEY", None),
"ollama": getattr(settings, "OLLAMA_BASE_URL", None),
}

def _create_provider_for_each_operation(
@@ -678,6 +680,11 @@ def _init_all_providers_for_operation(
f"[Factory] Attempting to initialize CohereProvider for {operation_name}"
)
provider = CohereProvider(api_key=api_key, model_spec=model_spec, ctx=ctx)
elif provider_name == "ollama":
ctx.logger.debug(
f"[Factory] Attempting to initialize OllamaProvider for {operation_name}"
)
provider = OllamaProvider(base_url=api_key, model_spec=model_spec, ctx=ctx)
**@cubic-dev-ai** (bot, Contributor) commented on Nov 13, 2025:

> Rule violated: **Check for Cursor Rules Drift**. This change introduces a new Ollama provider, but `.cursor/rules/search-module.mdc` still documents only Cerebras, OpenAI, Groq, and Cohere. Please update the Cursor rule to include Ollama's capabilities, configuration keys (e.g., `OLLAMA_BASE_URL`), and operation preferences so Cursor users get accurate guidance. (Based on your team's feedback about verifying Cursor rules per provider.)
if provider:
initialized_providers.append(provider)