Skip to content

Conversation

@Mahanteshambi
Copy link

@Mahanteshambi Mahanteshambi commented Nov 30, 2025

Summary

This PR implements a user-selectable document conversion model feature for Google Drive sources. Users can now choose between "OpenAI", "Mistral", or "Local" (MarkItDown) when configuring a source.

Changes

  • Frontend: Added a dropdown to SourceConfigView to select the conversion model.
  • Backend:
    • Updated GoogleDriveConfig to include the model field with UI hints.
    • Refactored SmartConverter to accept and prioritize a preferred_model argument.
    • Updated EntityPipeline to pass the user's selection from the source config to the converter.

Summary by cubic

Added a user-selectable document conversion model for Google Drive and introduced a SmartConverter that supports MarkItDown (local), OpenAI, and Mistral for better control and quality.

  • New Features

    • Model dropdown in SourceConfigView to choose "openai", "mistral", or "local".
    • GoogleDriveConfig includes a model field with enum and UI hints; backend config fields now support options and ui_component.
    • SmartConverter honors the preferred model and falls back by availability; MarkItDown can use OpenAI when a key is set. EntityPipeline passes the preference to batch conversions.
    • PDF/DOCX/PPTX/IMG routes updated to go through SmartConverter.
    • Added Weaviate destination, local docker-compose, and inspection scripts for testing.
  • Migration

    • Set OPENAI_API_KEY or MISTRAL_API_KEY to enable those models; "local" works without keys.
    • Install markitdown (and openai if using OpenAI) for conversion.
    • Use docker-compose.weaviate.yml to run Weaviate locally if testing the new destination.

Written for commit 908fb64. Summary will update automatically on new commits.

Implements user-selectable document conversion models (OpenAI, Mistral, Local) for Google Drive sources.

- Adds model selection dropdown to SourceConfigView

- Updates GoogleDriveConfig to include model field with UI hints

- Refactors SmartConverter to respect preferred_model

- Updates EntityPipeline to pass user preference during sync
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 15 files

Prompt for AI agents (all 3 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/airweave/platform/destinations/weaviate.py">

<violation number="1" location="backend/airweave/platform/destinations/weaviate.py:52">
P2: Collection names are declared to be alphanumeric only, yet the code inserts an underscore (Airweave_&lt;id&gt;), so collection creation will be rejected by Weaviate.</violation>

<violation number="2" location="backend/airweave/platform/destinations/weaviate.py:58">
P1: Port parsing treats the scheme separator as a port, so URLs without an explicit port (e.g., https://host) crash with ValueError and the destination can never be created.</violation>
</file>

<file name="backend/airweave/platform/converters/smart_converter.py">

<violation number="1" location="backend/airweave/platform/converters/smart_converter.py:58">
P1: Selecting the “Local” conversion model still uses OpenAI whenever an OPENAI_API_KEY is configured, so the new preference cannot disable remote processing.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

# Weaviate collection names must start with a capital letter and be alphanumeric
# We'll use a prefix + sanitized collection ID
sanitized_id = str(collection_id).replace("-", "")
instance.collection_name = f"Airweave_{sanitized_id}"
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Collection names are declared to be alphanumeric only, yet the code inserts an underscore (Airweave_), so collection creation will be rejected by Weaviate.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/airweave/platform/destinations/weaviate.py, line 52:

<comment>Collection names are declared to be alphanumeric only, yet the code inserts an underscore (Airweave_&lt;id&gt;), so collection creation will be rejected by Weaviate.</comment>

<file context>
@@ -0,0 +1,174 @@
+        # Weaviate collection names must start with a capital letter and be alphanumeric
+        # We&#39;ll use a prefix + sanitized collection ID
+        sanitized_id = str(collection_id).replace(&quot;-&quot;, &quot;&quot;)
+        instance.collection_name = f&quot;Airweave_{sanitized_id}&quot;
+
+        if credentials:
</file context>
Suggested change
instance.collection_name = f"Airweave_{sanitized_id}"
+ instance.collection_name = f"Airweave{sanitized_id}"
Fix with Cubic

instance.client = weaviate.use_async_with_params(
client=weaviate.connect.ConnectionParams.from_params(
http_host=credentials.cluster_url.replace("http://", "").replace("https://", "").split(":")[0],
http_port=int(credentials.cluster_url.split(":")[-1]) if ":" in credentials.cluster_url else 8080,
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Port parsing treats the scheme separator as a port, so URLs without an explicit port (e.g., https://host) crash with ValueError and the destination can never be created.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/airweave/platform/destinations/weaviate.py, line 58:

<comment>Port parsing treats the scheme separator as a port, so URLs without an explicit port (e.g., https://host) crash with ValueError and the destination can never be created.</comment>

<file context>
@@ -0,0 +1,174 @@
+            instance.client = weaviate.use_async_with_params(
+                client=weaviate.connect.ConnectionParams.from_params(
+                    http_host=credentials.cluster_url.replace(&quot;http://&quot;, &quot;&quot;).replace(&quot;https://&quot;, &quot;&quot;).split(&quot;:&quot;)[0],
+                    http_port=int(credentials.cluster_url.split(&quot;:&quot;)[-1]) if &quot;:&quot; in credentials.cluster_url else 8080,
+                    http_secure=credentials.cluster_url.startswith(&quot;https&quot;),
+                    grpc_host=credentials.cluster_url.replace(&quot;http://&quot;, &quot;&quot;).replace(&quot;https://&quot;, &quot;&quot;).split(&quot;:&quot;)[0],
</file context>
Fix with Cubic

# 3. Local Strategy (Fallback)
# Always available via MarkItDown (it falls back to local if no LLM client)
try:
self.local_converter = MarkItDownConverter()
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Selecting the “Local” conversion model still uses OpenAI whenever an OPENAI_API_KEY is configured, so the new preference cannot disable remote processing.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/airweave/platform/converters/smart_converter.py, line 58:

<comment>Selecting the “Local” conversion model still uses OpenAI whenever an OPENAI_API_KEY is configured, so the new preference cannot disable remote processing.</comment>

<file context>
@@ -0,0 +1,132 @@
+        # 3. Local Strategy (Fallback)
+        # Always available via MarkItDown (it falls back to local if no LLM client)
+        try:
+            self.local_converter = MarkItDownConverter()
+            logger.info(&quot;SmartConverter: Local strategy available&quot;)
+        except Exception as e:
</file context>
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant