Releases · ThoughtWorksInc/daffy

27 Nov 06:35

vertti

v1.0.0

5de2ddb

v1.0.0 Latest

Stable Release

Daffy 1.0.0 marks the first stable release. The public API (df_in, df_out, df_log) is now considered stable and follows semantic versioning.

Changed

Development status upgraded from Beta to Production/Stable
Updated documentation to reflect current tooling (uv, ruff, pyrefly)

Fixed

Improved error handling for invalid regex patterns in column specifications
Better error messages when parameter extraction fails
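
The regex fix above can be illustrated with a small sketch. It is not Daffy's actual code: the helper names `compile_column_pattern` and `matching_columns` are hypothetical, and the exact exception type and message Daffy raises are not stated in these notes. The idea is to catch the low-level `re.error` and re-raise it with the offending pattern attached.

```python
from __future__ import annotations

import re


def compile_column_pattern(pattern: str) -> re.Pattern:
    """Compile a column-name regex, adding context if the pattern is invalid.

    Hypothetical helper for illustration; not part of Daffy's API.
    """
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Name the offending pattern so the caller can fix the column spec.
        raise ValueError(
            f"Invalid regex pattern {pattern!r} in column specification: {exc}"
        ) from exc


def matching_columns(pattern: str, columns: list[str]) -> list[str]:
    """Return the column names that fully match the given regex."""
    compiled = compile_column_pattern(pattern)
    return [name for name in columns if compiled.fullmatch(name)]
```

With this shape, a typo such as an unclosed group produces an error that names the pattern instead of a bare `re.error` raised from deep inside the validation code.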

Internal

Extracted duplicate row validation logic into shared helper function
Added docstrings to public-facing utility functions

API Stability

As of 1.0.0, Daffy follows semantic versioning:

Major versions (2.0, 3.0) may contain breaking changes
Minor versions (1.1, 1.2) add features without breaking changes
Patch versions (1.0.1, 1.0.2) contain bug fixes only
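
As a rough illustration of the rules above, the sketch below classifies an upgrade between two version strings. The function names are hypothetical, and it assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffixes.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def classify_upgrade(current: str, candidate: str) -> str:
    """Describe what an upgrade from `current` to `candidate` may contain."""
    cur, new = parse_version(current), parse_version(candidate)
    if new <= cur:
        return "not an upgrade"
    if new[0] > cur[0]:
        return "major: may contain breaking changes"
    if new[1] > cur[1]:
        return "minor: new features, no breaking changes"
    return "patch: bug fixes only"
```

In practice this usually means pinning the dependency to the current major version, for example with a requirement specifier such as `daffy>=1.0,<2.0`.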

26 Nov 08:34

vertti

v0.19.0

fe37e64

v0.19.0

What's Changed

Early termination optimization (v0.19.0) - 71-124x speedup for error cases by @vertti in #43
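
The release note does not describe the mechanism, so the following is only a sketch of what early termination can look like in general, not Daffy's implementation. It assumes validation stops once a fixed number of errors has been collected; the names `validate_rows`, `check`, and `max_errors` are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Optional


def validate_rows(
    rows: Iterable[Any],
    check: Callable[[Any], Optional[str]],
    max_errors: int = 5,
) -> list[tuple[int, str]]:
    """Validate rows one at a time, stopping once `max_errors` are found.

    `check` returns an error message for a bad row, or None for a good one.
    """
    errors: list[tuple[int, str]] = []
    for index, row in enumerate(rows):
        message = check(row)
        if message is not None:
            errors.append((index, message))
            if len(errors) >= max_errors:
                # Stop scanning: the remaining rows are never checked.
                break
    return errors
```

When most rows are invalid, the loop touches only the first few rows instead of the whole dataset, which is where a large speedup for error cases would come from. Fully valid input still requires a complete pass.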

Contributors

vertti

23 Nov 14:33

vertti

v0.18.0

84ae5fd

v0.18.0

Major Performance Improvements

2x faster row validation - Optimized DataFrame conversion and validation pipeline
- 767K rows/sec on simple validation (was 400K)
- 165K rows/sec on complex bioinformatics data (32 columns, 5% missing values)
- Switched from batch TypeAdapter validation to optimized row-by-row validation with fast DataFrame iteration
- Use itertuples() for efficient row access while preserving None values

Critical Bug Fix

Fixed NaN handling for Optional fields - NaN values in Optional fields are now properly converted to None
- Previous implementation failed validation on legitimate missing data
- Now correctly handles nullable numeric fields in pandas DataFrames
- Converts numeric columns with NaN to object dtype to preserve None values
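
pandas represents missing numeric values as NaN, which is a float and therefore not accepted where a validator expects either a number or None. The sketch below shows the conversion on plain dictionaries, one per row. It is an illustration of the idea only: the helper names are hypothetical, and Daffy's own fix works at the DataFrame level as described above.

```python
from __future__ import annotations

import math
from typing import Any


def nan_to_none(value: Any) -> Any:
    """Return None for a float NaN, and the value unchanged otherwise."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def clean_row(row: dict[str, Any]) -> dict[str, Any]:
    """Replace NaN with None in every field of a row."""
    return {key: nan_to_none(value) for key, value in row.items()}
```

After this step, an Optional numeric field holding missing data reaches the validator as None rather than as NaN.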

New Features

Bioinformatics benchmark (scripts/benchmark_bioinformatics.py) - Realistic validation testing
- 32-column feature store schema modeling cancer research data
- Gene expression, clinical measurements, patient demographics, mutations, outcomes
- Mixed types: floats, ints, strings, bools, Optional fields, Literal enums
- Missing data patterns (~5%) typical in real-world datasets
- Cross-field validation (e.g., disease-free survival ≤ follow-up time)
Performance benchmarking suite - Compare Daffy against competing libraries
- Test multiple scenarios (simple, medium complexity)
- Multiple dataset sizes (1k, 10k, 100k rows)
- Compare pandas vs polars implementations
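
The cross-field rule mentioned in the benchmark description (disease-free survival ≤ follow-up time) can be sketched as follows. This uses a standard-library dataclass as a stand-in for the benchmark's actual schema, which is not shown in these notes; the class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientOutcome:
    """Hypothetical row model with one cross-field constraint."""

    follow_up_months: float
    disease_free_survival_months: Optional[float] = None

    def __post_init__(self) -> None:
        dfs = self.disease_free_survival_months
        # A missing value is allowed; a present value must not exceed follow-up.
        if dfs is not None and dfs > self.follow_up_months:
            raise ValueError(
                "disease_free_survival_months "
                f"({dfs}) exceeds follow_up_months ({self.follow_up_months})"
            )
```

A check like this cannot be expressed per column, because it compares two fields of the same row. That is why it belongs to row-level validation.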

Internal Improvements

Removed unused internal functions (_pandas_to_records_fast, _iterate_dataframe_with_index)
Simplified error formatting (removed dead code branches)
Improved test coverage to 98.34%
Better handling of edge cases in validation error reporting

23 Nov 07:24

vertti

v0.17.0

7deff0d

v0.17.0

What's Changed

Add optional row-level validation with Pydantic

26 Oct 13:41

vertti

v0.16.1

aa61643

v0.16.1

Internal refactoring: extracted DataFrame type handling to dedicated module for better code organization and maintainability

26 Oct 13:32

vertti

v0.16.0

4a0fe2f

v0.16.0

Removed Pandas and Polars from required dependencies. Daffy no longer pulls in Polars if your project uses only Pandas, and vice versa. All combinations are supported dynamically and require no changes from existing users.
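
One common way to support optional dependencies like this is to check which packages are installed without importing them. The sketch below uses the standard library's `importlib.util.find_spec` for that. It is an assumption about the general technique, not a description of Daffy's internals, and `available_backends` is a hypothetical name.

```python
from __future__ import annotations

import importlib.util
from typing import Iterable


def available_backends(
    candidates: Iterable[str] = ("pandas", "polars"),
) -> list[str]:
    """Return the candidate packages that are installed.

    find_spec only locates a top-level package; it does not import it,
    so the check stays cheap even for large libraries.
    """
    return [
        name for name in candidates
        if importlib.util.find_spec(name) is not None
    ]
```

A library can then import only the backends that are present and raise a clear error if none is installed.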

Testing & CI

Added comprehensive CI testing for all dependency combinations
New test suite validates optional dependency behavior
Manual testing script for developers (scripts/test_isolated_deps.py)
Updated CI to test pandas-only, polars-only, both, and none scenarios

26 Oct 13:26

vertti

v0.15.0

4b139e2

v0.15.0

Exception messages now include function names to improve debugging
- Input validation: "Missing columns: ['Col'] in function 'my_func' parameter 'param'. Got columns: ['Other']"
Return value validation messages now clearly state "return value" instead of just showing function name
- Output validation: "Missing columns: ['Col'] in function 'my_func' return value. Got columns: ['Other']"
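
The two message formats quoted above can be reproduced with a small formatting helper. This sketch only mirrors the message text from these notes; the function name and its parameters are hypothetical, and it is not Daffy's implementation.

```python
from __future__ import annotations

from typing import Optional


def missing_columns_message(
    missing: list[str],
    got: list[str],
    func_name: str,
    param_name: Optional[str] = None,
    is_return_value: bool = False,
) -> str:
    """Build a missing-columns message that says where the check failed."""
    if is_return_value:
        where = f"function '{func_name}' return value"
    elif param_name is not None:
        where = f"function '{func_name}' parameter '{param_name}'"
    else:
        where = f"function '{func_name}'"
    return f"Missing columns: {missing} in {where}. Got columns: {got}"
```

Naming the function and parameter matters once several decorated functions run in one pipeline, because the message alone then identifies which check failed.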

26 Oct 13:25

vertti

v0.14.2

118a491

v0.14.2

Internal code quality improvements

26 Oct 13:21

vertti

v0.14.1

15cd40b

v0.14.1

Internal code quality improvements

26 Oct 13:16

vertti

v0.14.0

6b734cc

v0.14.0

Improve df_in error messages to include parameter names
