add configurable write_mode to BigQueryIOManager #32998
Open
+215
−6
Summary & Motivation
Adds a configurable write_mode option to BigQueryIOManager to support schema evolution and different write strategies for non-partitioned tables.
Currently, the IO manager always issues a TRUNCATE before writing to a non-partitioned table, which deletes the rows but preserves the existing schema. Because the old schema stays in place, writes fail when the DataFrame schema changes (e.g., a column is added or removed).
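As a rough illustration of the intended behavior, the sketch below maps each write mode to the cleanup statement described in this PR. The function name `cleanup_statement`, its parameters, and the `partition_filter` argument are hypothetical and chosen for this example only; they are not the identifiers used in the actual diff.

```python
from typing import Optional

# The three modes named in this PR; "truncate" is assumed to stay the default
# for backward compatibility.
VALID_WRITE_MODES = ("truncate", "replace", "append")


def cleanup_statement(
    table: str,
    write_mode: str = "truncate",
    partition_filter: Optional[str] = None,  # hypothetical: WHERE clause for a partitioned write
) -> Optional[str]:
    """Return the SQL to run before writing, or None if no cleanup is needed."""
    if write_mode not in VALID_WRITE_MODES:
        raise ValueError(f"Unsupported write_mode: {write_mode!r}")

    # Partitioned tables ignore write_mode and keep the existing DELETE FROM path.
    if partition_filter is not None:
        return f"DELETE FROM {table} WHERE {partition_filter}"

    if write_mode == "truncate":
        # Removes rows but keeps the old schema.
        return f"TRUNCATE TABLE {table}"
    if write_mode == "replace":
        # Drops the table so the next write can recreate it with the new schema.
        return f"DROP TABLE IF EXISTS {table}"
    # "append": leave existing rows untouched.
    return None
```

Under these assumptions, only "replace" discards the old schema, which is why it is the mode that helps when DataFrame columns change.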
How I Tested These Changes
Unit Testing: Created a new test file dagster_gcp_tests/bigquery_tests/test_bigquery_write_modes.py using unittest.mock.
Verified that write_mode="truncate" triggers a TRUNCATE TABLE query.
Verified that write_mode="replace" triggers a DROP TABLE IF EXISTS query.
Verified that write_mode="append" triggers no cleanup query.
Verified that partitioned tables ignore the write mode and use the legacy DELETE FROM logic.
Added a factory test to ensure the BigQueryClient is initialized with the correct default write_mode for backward compatibility.
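The checks above follow a common unittest.mock pattern: pass a mock connection, run the cleanup step, and assert on the query it received. The sketch below shows that pattern against a hypothetical `run_cleanup` helper written for this example; it is not the code under test in test_bigquery_write_modes.py, and the `connection.query` call is only assumed to mirror how the IO manager issues SQL.

```python
from unittest.mock import Mock


def run_cleanup(connection, table, write_mode="truncate", partition_filter=None):
    """Hypothetical stand-in for the IO manager's pre-write cleanup step."""
    if partition_filter is not None:
        # Partitioned tables ignore write_mode.
        connection.query(f"DELETE FROM {table} WHERE {partition_filter}")
    elif write_mode == "truncate":
        connection.query(f"TRUNCATE TABLE {table}")
    elif write_mode == "replace":
        connection.query(f"DROP TABLE IF EXISTS {table}")
    # "append" issues no cleanup query.


def test_replace_drops_table():
    conn = Mock()
    run_cleanup(conn, "ds.t", write_mode="replace")
    conn.query.assert_called_once_with("DROP TABLE IF EXISTS ds.t")


def test_append_issues_no_query():
    conn = Mock()
    run_cleanup(conn, "ds.t", write_mode="append")
    conn.query.assert_not_called()


def test_partitioned_ignores_write_mode():
    conn = Mock()
    run_cleanup(conn, "ds.t", write_mode="replace", partition_filter="d = '2024-01-01'")
    conn.query.assert_called_once_with("DELETE FROM ds.t WHERE d = '2024-01-01'")
```

Asserting on the exact SQL string keeps the tests free of any real BigQuery dependency, at the cost of coupling them to the statement text.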
Changelog
feat(gcp): Add write_mode configuration to BigQueryIOManager (supports truncate, replace, append) to allow schema evolution and append-only workflows.