Releases: ThoughtWorksInc/daffy

v1.0.0

27 Nov 06:35
5de2ddb

Stable Release

Daffy 1.0.0 marks the first stable release. The public API (df_in, df_out, df_log) is now considered stable and follows semantic versioning.

Changed

Development status upgraded from Beta to Production/Stable
Updated documentation to reflect current tooling (uv, ruff, pyrefly)

Fixed

Improved error handling for invalid regex patterns in column specifications
Better error messages when parameter extraction fails
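
The regex fix is easier to picture with a small example. Below is a minimal sketch of validating a regex column specification up front and turning a bare `re.error` into a message that names the offending spec. It assumes specs are written as `"r/<pattern>/"` and that a pattern must match a whole column name (`fullmatch`). Both conventions, and the helper names `compile_column_pattern` and `matching_columns`, are hypothetical here and are not taken from Daffy's code.

```python
from __future__ import annotations

import re


def compile_column_pattern(spec: str) -> re.Pattern[str] | None:
    """Return a compiled regex for an 'r/<pattern>/' spec, or None for a literal name.

    The 'r/<pattern>/' convention is an assumption of this sketch.
    """
    if not (spec.startswith("r/") and spec.endswith("/") and len(spec) > 3):
        return None  # treat the spec as a literal column name
    body = spec[2:-1]
    try:
        return re.compile(body)
    except re.error as exc:
        # Re-raise with the original spec so the user can locate the bad pattern.
        raise ValueError(
            f"Invalid regex pattern in column specification {spec!r}: {exc}"
        ) from exc


def matching_columns(spec: str, columns: list[str]) -> list[str]:
    """Return the columns selected by a literal name or a regex spec."""
    pattern = compile_column_pattern(spec)
    if pattern is None:
        return [spec] if spec in columns else []
    # Assumption: a regex spec must match the whole column name.
    return [name for name in columns if pattern.fullmatch(name)]
```

For example, `matching_columns(r"r/Price_\d+/", ["Price_1", "Price_20", "Brand"])` selects the two `Price_*` columns, while a malformed spec such as `"r/Price_[/"` raises a `ValueError` that quotes the spec instead of a bare `re.error`.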

Internal

Extracted duplicate row validation logic into shared helper function
Added docstrings to public-facing utility functions

API Stability

As of 1.0.0, Daffy follows semantic versioning:

Major versions (2.0, 3.0) may contain breaking changes
Minor versions (1.1, 1.2) add features without breaking changes
Patch versions (1.0.1, 1.0.2) contain bug fixes only
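
To make these rules concrete, here is a small sketch that classifies a version change and decides whether an upgrade should be safe for existing callers. It assumes full `MAJOR.MINOR.PATCH` strings (for example `1.0.2`), and it treats pre-1.0 releases as carrying no compatibility promise, since the guarantee above starts at 1.0.0. The helper names are hypothetical and are not part of Daffy.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, int, int]:
    """Split a full MAJOR.MINOR.PATCH string such as "1.0.2" into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a change from old to new."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"  # may contain breaking changes
    if n[1] != o[1]:
        return "minor"  # new features, no breaking changes
    if n[2] != o[2]:
        return "patch"  # bug fixes only
    return "none"


def is_compatible_upgrade(old: str, new: str) -> bool:
    """True if moving from old to new should not break callers of the public API."""
    o, n = parse_version(old), parse_version(new)
    # Pre-1.0 releases carry no compatibility promise; within one major
    # version, any newer release should be backwards compatible.
    return o[0] >= 1 and n[0] == o[0] and n >= o
```

For example, `bump_kind("1.0.0", "1.1.0")` returns `"minor"`, and `is_compatible_upgrade("1.0.0", "2.0.0")` returns `False` because a major bump may break callers.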

v0.19.0

26 Nov 08:34
fe37e64

What's Changed

  • Early termination optimization - 71-124x speedup when validation fails, by @vertti in #43
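
The general idea behind early termination can be sketched as a row loop that stops once it has collected enough failures to report, instead of validating every remaining row. This is an illustrative sketch only. The `max_errors` cap, the `validate_rows` helper, and the `non_negative_price` rule are hypothetical, and the actual mechanism in #43 may differ.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable


def validate_rows(
    rows: Iterable[dict[str, Any]],
    validate: Callable[[dict[str, Any]], None],
    max_errors: int = 5,
) -> list[tuple[int, str]]:
    """Validate rows one at a time and return (row_index, message) pairs.

    Stops as soon as max_errors failures have been collected.
    """
    errors: list[tuple[int, str]] = []
    for index, row in enumerate(rows):
        try:
            validate(row)
        except ValueError as exc:
            errors.append((index, str(exc)))
            if len(errors) >= max_errors:
                break  # early termination: the remaining rows are never checked
    return errors


def non_negative_price(row: dict[str, Any]) -> None:
    # Hypothetical row rule used only to demonstrate validate_rows.
    if row["price"] < 0:
        raise ValueError(f"price must be >= 0, got {row['price']}")
```

When every row is invalid, the cost of the loop is bounded by `max_errors` rather than by the number of rows. This is why this kind of optimization pays off most when validation fails.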

v0.18.0

23 Nov 14:33
84ae5fd

Major Performance Improvements

  • 2x faster row validation - Optimized DataFrame conversion and validation pipeline
    • 767K rows/sec on simple validation (was 400K)
    • 165K rows/sec on complex bioinformatics data (32 columns, 5% missing values)
    • Switched from batch TypeAdapter validation to optimized row-by-row validation with fast DataFrame iteration
    • Uses itertuples() for efficient row access while preserving None values

Critical Bug Fix

  • Fixed NaN handling for Optional fields - NaN values in Optional fields are now properly converted to None
    • The previous implementation rejected legitimate missing data
    • Now correctly handles nullable numeric fields in pandas DataFrames
    • Converts numeric columns with NaN to object dtype to preserve None values
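
A record-level version of the NaN fix can be sketched as follows. The release itself converts affected numeric columns to object dtype. This sketch swaps in a simpler per-record substitution to show the effect: NaN, pandas' missing-value marker in numeric columns, becomes `None`, which is the value an `Optional` field expects. The helper name is hypothetical.

```python
from __future__ import annotations

import math
from typing import Any


def nan_to_none(record: dict[str, Any]) -> dict[str, Any]:
    """Replace float NaN values with None so Optional fields can accept them."""
    return {
        # numpy.float64 subclasses float, so NaNs read out of pandas columns are caught too.
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }
```

For example, `nan_to_none({"age": float("nan"), "name": "x"})` yields `{"age": None, "name": "x"}`, leaving non-missing values untouched.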

New Features

  • Bioinformatics benchmark (scripts/benchmark_bioinformatics.py) - Realistic validation testing

    • 32-column feature store schema modeling cancer research data
    • Gene expression, clinical measurements, patient demographics, mutations, outcomes
    • Mixed types: floats, ints, strings, bools, Optional fields, Literal enums
    • Missing data patterns (~5%) typical in real-world datasets
    • Cross-field validation (e.g., disease-free survival ≤ follow-up time)
  • Performance benchmarking suite - Compare Daffy against competing libraries

    • Test multiple scenarios (simple, medium complexity)
    • Multiple dataset sizes (1k, 10k, 100k rows)
    • Compare pandas vs polars implementations
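
The cross-field rule mentioned in the benchmark schema (disease-free survival ≤ follow-up time) can be sketched with a standard-library dataclass. The benchmark presumably expresses this with Pydantic. The dataclass is swapped in here only to keep the example dependency-free, and the class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientOutcome:
    # Hypothetical subset of the benchmark schema; field names are illustrative.
    follow_up_months: float
    disease_free_survival_months: Optional[float] = None  # may be missing

    def __post_init__(self) -> None:
        # Cross-field check: a missing value is allowed, otherwise it must
        # not exceed the total follow-up time.
        dfs = self.disease_free_survival_months
        if dfs is not None and dfs > self.follow_up_months:
            raise ValueError(
                f"disease-free survival ({dfs}) exceeds follow-up time "
                f"({self.follow_up_months})"
            )
```

`PatientOutcome(24.0, 12.0)` and `PatientOutcome(24.0)` are accepted, while `PatientOutcome(12.0, 24.0)` raises a `ValueError`.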

Internal Improvements

  • Removed unused internal functions (_pandas_to_records_fast, _iterate_dataframe_with_index)
  • Simplified error formatting (removed dead code branches)
  • Improved test coverage to 98.34%
  • Better handling of edge cases in validation error reporting

v0.17.0

23 Nov 07:24
7deff0d

What's Changed

  • Added optional row-level validation with Pydantic

v0.16.1

26 Oct 13:41

  • Internal refactoring: extracted DataFrame type handling to dedicated module for better code organization and maintainability

v0.16.0

26 Oct 13:32
4a0fe2f

  • Removed pandas and Polars from the required dependencies. Daffy no longer pulls in Polars if your project uses only
    pandas, and vice versa. All combinations are supported dynamically and require no changes for existing users.
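
One common way to support optional DataFrame libraries without importing them is sketched below. `find_spec` reports whether a library is installed without importing it. A type check can consult `sys.modules`, because an object cannot be a pandas DataFrame unless pandas has already been imported. The function names are hypothetical, and Daffy's own detection logic may differ.

```python
from __future__ import annotations

import importlib.util
import sys
from typing import Any


def available_backends() -> list[str]:
    """Return the DataFrame libraries installed here, without importing them."""
    return [name for name in ("pandas", "polars") if importlib.util.find_spec(name) is not None]


def is_pandas_dataframe(obj: Any) -> bool:
    # If pandas was never imported, obj cannot be a pandas DataFrame,
    # so there is no need to import (or even install) pandas to answer.
    pd = sys.modules.get("pandas")
    return pd is not None and isinstance(obj, pd.DataFrame)


def is_polars_dataframe(obj: Any) -> bool:
    # Same reasoning for Polars.
    pl = sys.modules.get("polars")
    return pl is not None and isinstance(obj, pl.DataFrame)
```

This design keeps a pandas-only project from ever paying the import cost of Polars, and vice versa.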

Testing & CI

  • Added comprehensive CI testing for all dependency combinations
  • New test suite validates optional dependency behavior
  • Manual testing script for developers (scripts/test_isolated_deps.py)
  • Updated CI to test pandas-only, polars-only, both, and none scenarios

v0.15.0

26 Oct 13:26

  • Exception messages now include function names to improve debugging
    • Input validation: "Missing columns: ['Col'] in function 'my_func' parameter 'param'. Got columns: ['Other']"
  • Return value validation messages now state "return value" explicitly instead of showing only the function name
    • Output validation: "Missing columns: ['Col'] in function 'my_func' return value. Got columns: ['Other']"
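
The two message formats quoted above can be reproduced by a single formatting helper, sketched here. The helper name and signature are hypothetical and are not Daffy's internal code. Only the output strings come from these release notes.

```python
from __future__ import annotations


def missing_columns_message(
    missing: list[str], func_name: str, got: list[str], param_name: str | None = None
) -> str:
    """Format a missing-columns error for a parameter or, if param_name is None, a return value."""
    location = f"parameter '{param_name}'" if param_name is not None else "return value"
    return f"Missing columns: {missing} in function '{func_name}' {location}. Got columns: {got}"
```

Calling it with `param_name="param"` produces the input-validation message above, and omitting `param_name` produces the return-value message.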

v0.14.2

26 Oct 13:25
118a491

  • Internal code quality improvements

v0.14.1

26 Oct 13:21
15cd40b

  • Internal code quality improvements

v0.14.0

26 Oct 13:16
6b734cc

  • Improved df_in error messages to include parameter names
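
A decorator can report parameter names because `inspect.Signature.bind` maps positional and keyword arguments onto the function's parameter names. The sketch below is a hypothetical, df_in-style decorator (`require_columns`, not Daffy's API). It uses plain dicts in place of DataFrames so the example has no dependencies. Raising `ValueError` is a choice made for this sketch.

```python
from __future__ import annotations

import functools
import inspect
from typing import Any, Callable


def require_columns(param: str, columns: list[str]) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical df_in-style decorator: check that one named argument has the given columns."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Bind arguments to parameter names so the check finds the
            # argument whether it was passed positionally or by keyword.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            got = list(bound.arguments[param])  # a dict stands in for a DataFrame here
            missing = [name for name in columns if name not in got]
            if missing:
                raise ValueError(
                    f"Missing columns: {missing} in function '{func.__name__}' "
                    f"parameter '{param}'. Got columns: {got}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@require_columns("cars", ["Brand", "Price"])
def average_price(cars: dict[str, list[Any]]) -> float:
    prices = cars["Price"]
    return sum(prices) / len(prices)
```

Calling `average_price(cars={"Brand": ["a"]})` raises an error that names both the function and the `cars` parameter, which is the kind of detail this release added to df_in messages.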