🦉 Data Versioning and ML Experiments
Updated Dec 9, 2025 - Python
Refine high-quality datasets and visual AI models
No-code LLM platform for launching APIs and ETL pipelines that structure unstructured documents
Neo4j graph construction from unstructured data using LLMs
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
A system for agentic LLM-powered data processing and ETL
Handles all kinds of unstructured data, with applications such as reverse image search, audio search, molecular search, video analysis, question-answering systems, and NLP.
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
Nomic Developer API SDK
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
ContextGem: Effortless LLM extraction from documents
A multi-modal vector database that supports upserts and vector queries over structured and unstructured data using unified, MySQL-compatible SQL, while meeting high-concurrency and ultra-low-latency requirements.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Get clean data from tricky documents, powered by vision-language models ⚡
AI-Powered Data Processing: Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code.
A curated list of resources on Document Understanding (DU)
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
Interactively explore unstructured datasets from your dataframe.
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
Curate better data for LLMs