This project provides a robust and scalable Change Data Capture (CDC) system for synchronizing data between two PostgreSQL databases in near real-time. It is designed to be highly configurable and easy to deploy, making it ideal for a variety of use cases, including data replication, ETL pipelines, and database migrations.
- Real-time Data Sync: Captures and applies changes as they happen.
- Declarative Configuration: Define your sync jobs using a simple YAML file.
- Schema-Aware: Automatically detects your database schema to simplify configuration.
- Extensible Architecture: The system is composed of a CLI, a Control Plane, and an Agent, which can be scaled and customized independently.
- LLM-Powered: Leverages Large Language Models for advanced features like conversational configuration edits and automatic schema drift handling.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Docker
- Go (version 1.19 or later)
- Node.js (version 14 or later)
- Python (version 3.9 or later) with `pip`
- Clone the repository:

  ```sh
  git clone git@github.com:mintunitish/repgine.git
  cd repgine
  ```

- Install Python dependencies:

  ```sh
  pip install -r cli/requirements.txt
  ```

- Install Node.js dependencies:

  ```sh
  npm install --prefix control/api
  npm install --prefix control/ui
  ```

- Set up environment variables: the `init` command requires the `SOURCE_DSN` environment variable to be set to your source PostgreSQL database connection string.

  ```sh
  export SOURCE_DSN="postgresql://user:password@host:port/dbname"
  ```
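Before pointing `init` at a database, it can help to sanity-check the DSN. Here is a minimal sketch using only Python's standard library; the `check_dsn` helper is illustrative and not part of the CLI:

```python
from urllib.parse import urlparse

def check_dsn(dsn: str) -> dict:
    """Split a postgresql:// DSN into its parts and flag an unexpected scheme."""
    parsed = urlparse(dsn)
    if parsed.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    return {
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,
        "dbname": parsed.path.lstrip("/"),
    }

parts = check_dsn("postgresql://user:password@localhost:5432/dbname")
print(parts)  # {'user': 'user', 'host': 'localhost', 'port': 5432, 'dbname': 'dbname'}
```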
The primary way to interact with the system is through the repgine CLI.
Generate a sync.yaml file by introspecting your source database.
```sh
python cli/main.py init
```

This will create a `sync.yaml` file in your project root, pre-populated with the tables and columns from your source database.
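Conceptually, `init` maps the introspected schema into this file. The sketch below illustrates the idea with a hypothetical helper that groups `(schema, table, column)` rows, such as those a query against `information_schema.columns` would return, into the config structure; the names here are illustrative, not the CLI's actual internals:

```python
def build_config(rows, source_dsn, target_dsn):
    """Group (schema, table, column) rows into the sync.yaml structure."""
    tables = {}
    for schema, table, column in rows:
        tables.setdefault(f"{schema}.{table}", []).append(column)
    return {
        "version": "1.0",
        "source_dsn": source_dsn,
        "target_dsn": target_dsn,
        "tables": [
            {"table_name": name, "columns": cols} for name, cols in tables.items()
        ],
    }

rows = [
    ("public", "users", "id"),
    ("public", "users", "email"),
    ("public", "orders", "id"),
]
config = build_config(rows, "postgresql://u:p@h:5432/src", "postgresql://u:p@h:5432/dst")
print(config["tables"][0])  # {'table_name': 'public.users', 'columns': ['id', 'email']}
```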
Open the generated sync.yaml and customize it to your needs. You can select which tables to sync, rename columns, or apply transformations.
```yaml
version: "1.0"
source_dsn: "postgresql://user:pass@host:port/sourcedb"
target_dsn: "postgresql://user:pass@host:port/targetdb"
tables:
  - table_name: "public.users"
    columns:
      - "id"
      - "name"
      - "email"
  - table_name: "public.orders"
    # This table will be excluded
    sync: false
```

Check your `sync.yaml` file for correctness against the schema:
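A hand-rolled structural check of a config like the one above could look like the sketch below. This is illustrative only; the actual `validate` command also checks the configuration against your database schema:

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config looks OK."""
    errors = []
    for field in ("version", "source_dsn", "target_dsn", "tables"):
        if field not in config:
            errors.append(f"missing required field: {field}")
    for i, table in enumerate(config.get("tables", [])):
        if "table_name" not in table:
            errors.append(f"tables[{i}]: missing table_name")
        # Excluded tables (sync: false) need no column list.
        if table.get("sync", True) and not table.get("columns"):
            errors.append(f"tables[{i}]: no columns selected")
    return errors

config = {
    "version": "1.0",
    "source_dsn": "postgresql://u:p@h:5432/src",
    "target_dsn": "postgresql://u:p@h:5432/dst",
    "tables": [
        {"table_name": "public.users", "columns": ["id", "name", "email"]},
        {"table_name": "public.orders", "sync": False},
    ],
}
print(validate_config(config))  # []
```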
```sh
python cli/main.py validate sync.yaml
```

Start the control plane and deploy the agent to begin the synchronization process:
```sh
# Start the control plane API
node control/api/index.js &

# Deploy the sync job
python cli/main.py deploy
```

Check the status and metrics of your running sync job:
```sh
python cli/main.py status
```

The system is composed of three core components that work together to provide a seamless data synchronization experience.
```
+-----------------+      +-----------------+      +---------------+
|       CLI       |----->|  Control Plane  |<-----|     Agent     |
|    (main.py)    |      |    (API & UI)   |      | (reader, etc) |
+-----------------+      +-----------------+      +---------------+
```
- CLI (`cli`): A Python-based command-line interface for initializing configurations, validating schemas, and deploying sync jobs.
- Control Plane (`control`): A Node.js application that provides a central API for managing and monitoring sync jobs. It also includes a minimal UI for observability.
- Agent (`agent`): A high-performance Go agent that performs the heavy lifting. It connects to the source database, reads the Write-Ahead Log (WAL) using `pgoutput`, transforms the data according to the `sync.yaml` configuration, and applies it to the target database.
The Streamer is a key part of the agent responsible for applying changes from the source database to the target database. It reads `WALMessage` events (INSERT, UPDATE, DELETE) and constructs the appropriate SQL statements to execute against the target.

The Streamer is configured via the `sync.yaml` file, which defines which tables to watch and how to handle them. The Streamer specifically uses the `tables` and `pk` fields to construct its SQL queries.
Here is an example of a sync.yaml configuration:
```yaml
tables:
  users:
    name: users
    pk: id
  products:
    name: products
    pk: product_id
```

In this example, the Streamer will apply changes to the `users` and `products` tables, using the `id` and `product_id` columns as the primary keys for UPDATE and DELETE operations, respectively.
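To make the `pk` usage concrete, here is a hedged sketch of how a change event could be turned into a parameterized statement using the configured primary key. The real Streamer is written in Go; this Python version, along with the event shape and function name, is purely illustrative:

```python
def event_to_sql(event: dict, tables: dict):
    """Build a parameterized SQL statement for an INSERT/UPDATE/DELETE event."""
    cfg = tables[event["table"]]
    name, pk = cfg["name"], cfg["pk"]
    cols = event.get("values", {})
    if event["op"] == "INSERT":
        placeholders = ", ".join(["%s"] * len(cols))
        sql = f'INSERT INTO {name} ({", ".join(cols)}) VALUES ({placeholders})'
        return sql, list(cols.values())
    if event["op"] == "UPDATE":
        assignments = ", ".join(f"{c} = %s" for c in cols)
        # The primary key from the config pins the WHERE clause.
        return f"UPDATE {name} SET {assignments} WHERE {pk} = %s", list(cols.values()) + [event["key"]]
    if event["op"] == "DELETE":
        return f"DELETE FROM {name} WHERE {pk} = %s", [event["key"]]
    raise ValueError(f"unknown op: {event['op']}")

tables = {"users": {"name": "users", "pk": "id"}}
sql, params = event_to_sql({"op": "DELETE", "table": "users", "key": 42}, tables)
print(sql)     # DELETE FROM users WHERE id = %s
print(params)  # [42]
```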
The Streamer includes both unit and integration tests.
- Unit Tests: These tests use mocking to verify the SQL generation logic without requiring a live database.

  ```sh
  go test ./agent/stream/...
  ```

- Integration Tests: These tests require live source and target database connections and verify the end-to-end data flow. To run them, set the `SOURCE_DB_DSN` and `TARGET_DB_DSN` environment variables:

  ```sh
  export SOURCE_DB_DSN="postgresql://user:pass@host:port/sourcedb"
  export TARGET_DB_DSN="postgresql://user:pass@host:port/targetdb"
  go test -v -run TestStreamer_Integration ./agent/stream
  ```
To set up a development environment, ensure you have the prerequisites installed, then follow the installation steps.
The project includes unit and integration tests for each module.
- CLI Tests:

  ```sh
  pytest tests/cli/
  ```

- Control Plane API Tests:

  ```sh
  npm test --prefix control/api
  ```

- Agent Tests:

  ```sh
  go test ./agent/...
  ```
A Dockerfile is provided to build and run the agent in a containerized environment.
```sh
docker build -t repgine-agent .
docker run -e SOURCE_DSN="..." -e TARGET_DSN="..." repgine-agent
```

Contributions are welcome! Please feel free to submit a pull request.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Repgine © 2025 by Nitish Kumar is licensed under CC BY-SA 4.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/