@YMahut YMahut commented Nov 30, 2025

Summary

Add Weaviate as a new search engine option, bringing semantic/vector search capabilities to Wiki.js.

Features

  • Hybrid search: Combined BM25 keyword + semantic vector search with configurable alpha balance
  • Multiple search modes: hybrid, bm25, nearText
  • Incremental rebuild: Hash-based change detection, only reindexes modified pages
  • Orphan cleanup: Automatically removes deleted pages from index after rebuild
  • Rate limit handling: Exponential backoff with configurable batch delays
  • Result caching: Configurable TTL with cluster-wide invalidation via WIKI.events
  • Result highlighting: Query terms highlighted with <mark> tags
  • Search analytics: Track top searches and zero-result queries (in-memory)
  • Health monitoring: Periodic health checks with metrics endpoint
  • Configurable field boosting: Title, description, tags weights
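
The rate-limit handling listed above can be illustrated with a small retry helper. This is a minimal sketch, not the module's code: `withBackoff`, `backoffDelay`, the default delays, and the assumption that a rate-limit error carries `status === 429` are all hypothetical and do not come from this PR.

```javascript
// Hypothetical sketch of exponential backoff; names and defaults are illustrative only.

// Delay doubles with each attempt (base, 2x, 4x, ...) and is capped at maxDelayMs.
function backoffDelay (attempt, baseDelayMs, maxDelayMs) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt)
}

function defaultSleep (ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}

// Retries `operation` (an async function) when it fails with a rate-limit error.
async function withBackoff (operation, options = {}) {
  const {
    maxRetries = 5,
    baseDelayMs = 500,
    maxDelayMs = 30000,
    sleep = defaultSleep
  } = options
  let attempt = 0
  for (;;) {
    try {
      return await operation()
    } catch (err) {
      // Assumption: rate-limit errors expose an HTTP-like status of 429.
      const retryable = Boolean(err) && err.status === 429
      if (!retryable || attempt >= maxRetries) throw err
      await sleep(backoffDelay(attempt, baseDelayMs, maxDelayMs))
      attempt++
    }
  }
}
```

A batch indexer could wrap each batch upload as `await withBackoff(() => uploadBatch(batch))`, where `uploadBatch` is whatever function sends one batch to Weaviate. Capping the delay keeps a long rebuild from stalling indefinitely on a single batch.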

Requirements

  • Weaviate 1.32+
  • The Weaviate class must be created in advance with a vectorizer configured; the module does not create the schema
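
Because the class has to exist before the module is enabled, it may help to see what creating one could look like. The sketch below posts a class definition to Weaviate's REST schema endpoint (`POST /v1/schema`). The class name `WikiPage`, the property names, and the `text2vec-transformers` vectorizer are assumptions for illustration; this description does not state which class name or properties the module expects.

```javascript
// Hypothetical sketch: the class name, properties, and vectorizer are placeholders,
// not the schema this module requires.
function buildClassDefinition (className, vectorizer) {
  return {
    class: className,
    vectorizer,
    properties: [
      { name: 'title', dataType: ['text'] },
      { name: 'description', dataType: ['text'] },
      { name: 'content', dataType: ['text'] },
      { name: 'tags', dataType: ['text[]'] },
      { name: 'path', dataType: ['text'] }
    ]
  }
}

// Sends the definition to Weaviate. Requires a runtime with global fetch (Node.js 18+).
async function createClass (baseUrl, definition, apiKey) {
  const headers = { 'Content-Type': 'application/json' }
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`
  const res = await fetch(`${baseUrl}/v1/schema`, {
    method: 'POST',
    headers,
    body: JSON.stringify(definition)
  })
  if (!res.ok) throw new Error(`Schema creation failed: HTTP ${res.status}`)
  return res.json()
}
```

For example, `createClass('http://localhost:8080', buildClassDefinition('WikiPage', 'text2vec-transformers'))` would target a local instance on Weaviate's default HTTP port.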

Configuration options

| Category | Settings |
|---|---|
| Connection | host, httpPort, grpcPort, httpSecure, grpcSecure, skipTLSVerify, apiKey, timeout |
| Search | searchType, alpha, searchLimit, cacheTtl, boostTitle, boostDescription, boostTags |
| Indexing | batchSize, batchDelayMs, maxBatchBytes, forceFullRebuild |
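
To make the options above concrete, here is a sketch of a configuration object with a few basic sanity checks. The option names come from the table, but every value is a placeholder and not a default from this PR, and `validateSearchConfig` is a hypothetical helper. The one firm constraint relied on is Weaviate's own: `alpha` ranges from 0 (pure BM25) to 1 (pure vector search).

```javascript
// Placeholder values for illustration; they are not the module's defaults.
const exampleConfig = {
  host: 'localhost',
  httpPort: 8080,
  grpcPort: 50051,
  searchType: 'hybrid',
  alpha: 0.5,
  batchSize: 100,
  batchDelayMs: 250
}

const SEARCH_TYPES = ['hybrid', 'bm25', 'nearText']

// Hypothetical helper: returns a list of human-readable problems (empty when valid).
function validateSearchConfig (config) {
  const errors = []
  if (!SEARCH_TYPES.includes(config.searchType)) {
    errors.push(`searchType must be one of: ${SEARCH_TYPES.join(', ')}`)
  }
  if (typeof config.alpha !== 'number' || config.alpha < 0 || config.alpha > 1) {
    errors.push('alpha must be a number between 0 and 1')
  }
  if (!Number.isInteger(config.batchSize) || config.batchSize < 1) {
    errors.push('batchSize must be a positive integer')
  }
  if (!Number.isInteger(config.batchDelayMs) || config.batchDelayMs < 0) {
    errors.push('batchDelayMs must be a non-negative integer')
  }
  return errors
}
```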

Why Weaviate?

  • Open source vector database with hybrid search
  • Supports multiple vectorizers (OpenAI, Cohere, HuggingFace, local transformers)
  • Scales horizontally for large wikis
  • Active community and enterprise support available

Test plan

  • Configure Weaviate connection in Administration > Search
  • Run "Rebuild Index" and verify pages are indexed
  • Test search with various queries (keyword, semantic, hybrid)
  • Verify incremental rebuild only updates changed pages
  • Test page create/update/delete operations reflect in search
  • Verify cache invalidation works across cluster nodes
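
The incremental-rebuild and orphan-cleanup checks above both come down to comparing a per-page content hash against what the index already holds. The following is a minimal sketch of that comparison, assuming SHA-256 over title, description, and content. The PR does not say which hash or fields the module uses, and `hashPage` and `planRebuild` are hypothetical names.

```javascript
const crypto = require('crypto')

// Assumption: a page's identity for indexing purposes is its title, description, and content.
function hashPage (page) {
  return crypto
    .createHash('sha256')
    .update([page.title, page.description, page.content].join('\u0000'))
    .digest('hex')
}

// pages: array of { id, title, description, content } from the wiki.
// indexedHashes: Map of page id -> hash currently stored in the search index.
function planRebuild (pages, indexedHashes) {
  const toIndex = []
  const seen = new Set()
  for (const page of pages) {
    seen.add(page.id)
    const hash = hashPage(page)
    // New pages have no stored hash; changed pages have a different one.
    if (indexedHashes.get(page.id) !== hash) toIndex.push({ id: page.id, hash })
  }
  // Anything indexed but no longer present in the wiki is an orphan.
  const orphans = [...indexedHashes.keys()].filter(id => !seen.has(id))
  return { toIndex, orphans }
}
```

When testing manually, editing one page and rerunning "Rebuild Index" should correspond to a plan with a single entry in `toIndex` and no orphans.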

Related

Add semantic search capabilities using Weaviate vector database.

Features:
- Hybrid search (BM25 + semantic vectors)
- Multiple search modes: hybrid, bm25, nearText
- Incremental rebuild with hash-based change detection
- Orphan page cleanup after rebuild
- Rate limit handling with exponential backoff
- Configurable batch indexing (size, delay, max bytes)
- Result caching with cluster-wide invalidation
- Search analytics (top searches, zero-result tracking)
- Health monitoring and metrics

Requirements:
- Weaviate 1.32+
- Pre-configured Weaviate class with vectorizer

Closes #XXXX
@auto-assign auto-assign bot requested a review from NGPixel November 30, 2025 10:28