Releases: ThoughtWorksInc/daffy
v1.0.0
Stable Release
Daffy 1.0.0 marks the first stable release. The public API (`df_in`, `df_out`, `df_log`) is now considered stable and follows semantic versioning.
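As a rough illustration of what a `df_in`-style column check does (this is a dependency-free sketch, not Daffy's actual implementation — the real library is used as `@df_in(columns=[...])` on functions taking pandas or polars DataFrames):

```python
import functools

# Sketch of a column-requirement decorator. Any object with a
# `.columns` attribute works here, so the example needs no pandas.
def require_columns(columns):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            missing = [c for c in columns if c not in df.columns]
            if missing:
                raise AssertionError(f"Missing columns: {missing}")
            return func(df, *args, **kwargs)
        return wrapper
    return decorator
```

The decorator factory pattern mirrors how Daffy's decorators are parameterized with a column specification at decoration time.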
Changed
Development status upgraded from Beta to Production/Stable
Updated documentation to reflect current tooling (uv, ruff, pyrefly)
Fixed
Improved error handling for invalid regex patterns in column specifications
Better error messages when parameter extraction fails
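Validating a regex pattern up front is one way to surface a clear error instead of a raw `re.error` deep inside validation; a minimal sketch of that idea (illustrative, not Daffy's actual code):

```python
import re

# Hypothetical helper: compile a regex column specification eagerly
# so an invalid pattern fails with a descriptive message.
def compile_column_pattern(pattern: str) -> re.Pattern:
    try:
        return re.compile(pattern)
    except re.error as exc:
        raise ValueError(
            f"Invalid regex in column specification: {pattern!r} ({exc})"
        ) from exc
```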
Internal
Extracted duplicate row validation logic into shared helper function
Added docstrings to public-facing utility functions
API Stability
As of 1.0.0, Daffy follows semantic versioning:
Major versions (2.0, 3.0) may contain breaking changes
Minor versions (1.1, 1.2) add features without breaking changes
Patch versions (1.0.1, 1.0.2) contain bug fixes only
v0.19.0
v0.18.0
Major Performance Improvements
- 2x faster row validation - Optimized DataFrame conversion and validation pipeline
- 767K rows/sec on simple validation (was 400K)
- 165K rows/sec on complex bioinformatics data (32 columns, 5% missing values)
- Changed from batch TypeAdapter validation to optimized row-by-row with fast DataFrame iteration
- Use `itertuples()` for efficient row access while preserving None values
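The `itertuples()` approach described above can be sketched as follows (an illustration of the technique, not Daffy's internal code):

```python
import pandas as pd

df = pd.DataFrame({"gene": ["TP53", "BRCA1"], "expr": [2.5, None]})

# itertuples() yields namedtuples without per-cell Python overhead;
# pd.isna() maps the NaN that pandas stores for missing floats back
# to None so downstream validators see real missing values.
records = [
    {col: (None if pd.isna(val) else val) for col, val in row._asdict().items()}
    for row in df.itertuples(index=False)
]
```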
Critical Bug Fix
- Fixed NaN handling for Optional fields - NaN values in Optional fields are now properly converted to None
- Previous implementation failed validation on legitimate missing data
- Now correctly handles nullable numeric fields in pandas DataFrames
- Converts numeric columns with NaN to object dtype to preserve None values
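The NaN-to-None conversion can be illustrated like this (an assumption-laden sketch of the behavior the notes describe, not Daffy's actual code):

```python
import pandas as pd

s = pd.Series([1.0, float("nan")])

# float64 columns cannot hold None (it silently becomes NaN), but
# object dtype can, so casting first lets missing values survive
# as real None for Optional-field validation.
converted = s.astype(object)
converted[s.isna()] = None
```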
New Features
- Bioinformatics benchmark (`scripts/benchmark_bioinformatics.py`) - Realistic validation testing
- 32-column feature store schema modeling cancer research data
- Gene expression, clinical measurements, patient demographics, mutations, outcomes
- Mixed types: floats, ints, strings, bools, Optional fields, Literal enums
- Missing data patterns (~5%) typical in real-world datasets
- Cross-field validation (e.g., disease-free survival ≤ follow-up time)
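A cross-field rule like the one above can be expressed as a record that validates itself on construction; a minimal dependency-free sketch (field names are hypothetical):

```python
from dataclasses import dataclass

# Hypothetical record mirroring the rule "disease-free survival
# must not exceed follow-up time" from the benchmark schema.
@dataclass
class OutcomeRecord:
    followup_months: float
    dfs_months: float

    def __post_init__(self) -> None:
        if self.dfs_months > self.followup_months:
            raise ValueError("disease-free survival exceeds follow-up time")
```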
- Performance benchmarking suite - Compare Daffy against competing libraries
- Test multiple scenarios (simple, medium complexity)
- Multiple dataset sizes (1k, 10k, 100k rows)
- Compare pandas vs polars implementations
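A throughput figure like "767K rows/sec" comes from timing a validator over a known row count; a hypothetical helper in the spirit of the suite (the name and shape are assumptions, not the suite's actual API):

```python
import time

# Time a per-row validator over a list of rows and report rows/sec.
def rows_per_sec(validate, rows):
    start = time.perf_counter()
    for row in rows:
        validate(row)
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")
```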
Internal Improvements
- Removed unused internal functions (`_pandas_to_records_fast`, `_iterate_dataframe_with_index`)
- Simplified error formatting (removed dead code branches)
- Improved test coverage to 98.34%
- Better handling of edge cases in validation error reporting
v0.17.0
v0.16.1
v0.16.0
- Removed Pandas and Polars from required dependencies. Daffy will no longer pull in Polars if your project uses only Pandas, and vice versa. All combinations are supported dynamically and require no changes from existing users.
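The usual way to support optional backends is a guarded import; a sketch of that pattern (illustrative, not Daffy's actual module layout):

```python
# Probe each optional backend at import time; isinstance checks
# only run against backends that are actually installed.
try:
    import pandas as pd
    HAS_PANDAS = True
except ImportError:
    HAS_PANDAS = False

try:
    import polars as pl
    HAS_POLARS = True
except ImportError:
    HAS_POLARS = False

def is_supported_dataframe(obj) -> bool:
    """Return True if obj is a DataFrame from an installed backend."""
    if HAS_PANDAS and isinstance(obj, pd.DataFrame):
        return True
    if HAS_POLARS and isinstance(obj, pl.DataFrame):
        return True
    return False
```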
Testing & CI
- Added comprehensive CI testing for all dependency combinations
- New test suite validates optional dependency behavior
- Manual testing script for developers (`scripts/test_isolated_deps.py`)
- Updated CI to test pandas-only, polars-only, both, and none scenarios
v0.15.0
- Exception messages now include function names to improve debugging
- Input validation: `"Missing columns: ['Col'] in function 'my_func' parameter 'param'. Got columns: ['Other']"`
- Return value validation messages now clearly state "return value" instead of just showing function name
- Output validation: `"Missing columns: ['Col'] in function 'my_func' return value. Got columns: ['Other']"`
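A formatter producing messages of this shape could look like the following (a hypothetical helper matching the documented message format, not Daffy's actual code):

```python
# Build an error message that names the function and the location
# (a parameter or the return value) where validation failed.
def missing_columns_message(missing, func_name, location, got):
    return (
        f"Missing columns: {missing} in function '{func_name}' "
        f"{location}. Got columns: {got}"
    )
```

Including the function name and location makes failures in pipelines with many decorated functions much easier to trace.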