-
Notifications
You must be signed in to change notification settings - Fork 641
Add Neo4j Graph Database Connector #1136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add Neo4j Graph Database Connector #1136
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 issues found across 6 files
Prompt for AI agents (all 2 issues)
Understand the root cause of the following 2 issues and fix them.
<file name="backend/tests/unit/platform/sources/test_neo4j.py">
<violation number="1" location="backend/tests/unit/platform/sources/test_neo4j.py:368">
`test_neo4j_entity_class_creation` should assert that reserved properties use the normalized field name (e.g., `name_field`) instead of the raw `name`, otherwise the test cannot catch collisions.</violation>
<violation number="2" location="backend/tests/unit/platform/sources/test_neo4j.py:419">
The full-sync test should assert that exactly four entities were yielded to match the documented expectation, otherwise missing labels go undetected.</violation>
</file>
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
|
|
||
| # Verify we got entities from all labels | ||
| # 2 Person + 1 Company + 1 Product = 4 entities | ||
| assert len(entities) >= 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full-sync test should assert that exactly four entities were yielded to match the documented expectation, otherwise missing labels go undetected.
Prompt for AI agents
Address the following comment on backend/tests/unit/platform/sources/test_neo4j.py at line 419:
<comment>The full-sync test should assert that exactly four entities were yielded to match the documented expectation, otherwise missing labels go undetected.</comment>
<file context>
@@ -0,0 +1,539 @@
+
+ # Verify we got entities from all labels
+ # 2 Person + 1 Company + 1 Product = 4 entities
+ assert len(entities) >= 3
+
+ # Verify entity properties
</file context>
| assert len(entities) >= 3 | |
| assert len(entities) == 4 |
| assert entity_class is not None | ||
| assert hasattr(entity_class, "model_fields") | ||
| assert "id_" in entity_class.model_fields # 'id' normalized to 'id_' | ||
| assert "name" in entity_class.model_fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_neo4j_entity_class_creation should assert that reserved properties use the normalized field name (e.g., name_field) instead of the raw name, otherwise the test cannot catch collisions.
Prompt for AI agents
Address the following comment on backend/tests/unit/platform/sources/test_neo4j.py at line 368:
<comment>`test_neo4j_entity_class_creation` should assert that reserved properties use the normalized field name (e.g., `name_field`) instead of the raw `name`, otherwise the test cannot catch collisions.</comment>
<file context>
@@ -0,0 +1,539 @@
+ assert entity_class is not None
+ assert hasattr(entity_class, "model_fields")
+ assert "id_" in entity_class.model_fields # 'id' normalized to 'id_'
+ assert "name" in entity_class.model_fields
+ assert "email" in entity_class.model_fields
+ assert "age" in entity_class.model_fields
</file context>
| assert "name" in entity_class.model_fields | |
| assert "name_field" in entity_class.model_fields |
Issue
Issue #1117 requested integration with Neo4j, a leading graph database platform. The request was minimal ("neo4j as a source would also be helpful"), without specifying requirements, use cases, or what data should be synced.
Problem Identification
After analyzing the request and Neo4j's capabilities, I identified the following requirements:
Data to Index: Neo4j stores graph data with two primary components:
Authentication: Neo4j uses URI + username/password authentication:
neo4j://localhost:7687)Key Use Cases:
Technical Challenges:
Solution Design
I designed a solution following Airweave's existing PostgreSQL connector pattern, as Neo4j is also a database:
Architecture Decisions
Dynamic Entity Creation:
PolymorphicEntity(like PostgreSQL) for dynamic label-based entitiesSchema Discovery:
CALL db.labels()to discover all node labelsConfiguration:
Streaming Strategy:
Entity ID Generation:
id,uuid,guid)Error Handling:
Implementation
Components Delivered
1. Authentication Configuration
Enhanced
Neo4jAuthConfigwith proper field validation:min_length=1constraint2. Source Configuration
Simple
Neo4jConfigclass:SourceConfigfor now3. Neo4j Source Connector
Built comprehensive connector (
backend/airweave/platform/sources/neo4j.py, 492 lines):db.labels()procedurePolymorphicEntitycreation per label4. Comprehensive Testing
Wrote 15 unit tests (
backend/tests/unit/platform/sources/test_neo4j.py, 539 lines):All tests use exact mock responses matching Neo4j's driver API format with custom
MockNodeandMockResultclasses.5. Frontend Integration
Testing Strategy
Similar to n8n, comprehensive mocking approach:
MockNodeclass mimicking Neo4j's node structureMockResultclass for async iteration of query resultsAsyncGraphDatabase.driver()and session managementResults
Test Coverage
Fixes : #1117
Summary by cubic
Adds a Neo4j connector to sync graph nodes by label into dynamic entities with type mapping and efficient streaming. Enables quick indexing and search of Neo4j data; addresses #1117.
New Features
Testing
Written for commit 4d1f3f5. Summary will update automatically on new commits.