Composite cv chaining #11540

ianballou · 2025-10-27T17:52:43Z

Note: this PR contains some code generated by Claude Code.

Note: this PR requires theforeman/foreman-tasks#786 and Dynflow/dynflow#446.

Note, as of writing, this PR requires the following patch on top of the Dynflow one:

diff --git a/lib/dynflow/persistence_adapters/sequel.rb b/lib/dynflow/persistence_adapters/sequel.rb
index b36298b..502aadf 100644
--- a/lib/dynflow/persistence_adapters/sequel.rb
+++ b/lib/dynflow/persistence_adapters/sequel.rb
@@ -146,14 +146,22 @@ module Dynflow
 
       def find_ready_delayed_plans(time)
         table_name = :delayed
+        # Find delayed plans where ALL dependencies (if any) are either non-existent or stopped
+        # We use NOT EXISTS to ensure no dependency is in a non-stopped state
         table(table_name)
-          .left_join(TABLES[:execution_plan_dependency], execution_plan_uuid: :execution_plan_uuid)
-          .left_join(TABLES[:execution_plan], uuid: :blocked_by_uuid)
           .where(::Sequel.lit('start_at IS NULL OR (start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?))', time, time))
-          .where(::Sequel[{ state: nil }] | ::Sequel[{ state: 'stopped' }])
           .where(:frozen => false)
+          .where(::Sequel.lit(
+            "NOT EXISTS (
+              SELECT 1
+              FROM #{TABLES[:execution_plan_dependency]} dep
+              LEFT JOIN #{TABLES[:execution_plan]} ep ON dep.blocked_by_uuid = ep.uuid
+              WHERE dep.execution_plan_uuid = #{TABLES[table_name]}.execution_plan_uuid
+              AND ep.state IS NOT NULL
+              AND ep.state != 'stopped'
+            )"
+          ))
           .order_by(:start_at)
-          .select_all(TABLES[table_name])
           .all
           .map { |plan| load_data(plan, table_name) }
       end

Note: everything in config/initializers/foreman_tasks_chaining.rb should actually exist in ForemanTasks. For this prototype I'm defining it here to make the test setup easier.

What are the changes introduced in this pull request?

Implements Dynflow task chaining to improve composite content view auto-publish.

Imagine a composite content view (CCV) with 3 components, A, B, and C.

If A and B are publishing at the same time, and A finishes first, CCV will start publishing while B is still publishing. This is bad because version B.latest exists but is partially-created. So, this PR 'chains' the CCV auto-publish to any publishing children.

There is yet another problem: even if the CCV publish is chained to A and B publishing, two CCV publishes will be triggered because auto-publish is triggered for each child. This doesn't make sense when A and B are publishing together though - there should only be one publish - any more are extra. So, this PR also causes composite publishes to be skipped if the composite has a scheduled publish already waiting.

If a CCV publish is running and then A publishing triggers another CCV publish, then the existing code will:

Create an auto-publish event on an event queue
Attempt to publish via that event, but fail
Try to publish 5 minutes later, and so on

Considerations taken when implementing this change?

The 'when to skip vs when to use polling' is up for debate. At the moment I don't see any chances for content to be missed, but please consider it. Could auto-publish be skipped when it shouldn't have been?

What are the testing steps for this pull request?

Try normal CV publishing
Try normal CCV publishing
Create a CV that publishes slowly and some that publish quickly; add them to a CCV with auto-publish
a. Don't forget to set the CCV to always use the latest version of the components
Publish the slow CV, then publish the fast CVs
See that a publish task is scheduled but not running before the slow CV finishes
See that the composite publish waits until the slow CV is done to run
See that only a single composite publish is done
Test the existing polling logic by triggering a composite CV auto-publish while one is already running for the same CCV.

Summary by Sourcery

Implement Dynflow-based task chaining for composite content view auto-publishing, preventing race conditions by skipping duplicates, scheduling retries, and ensuring a single coordinated publish per composite.

New Features:

Chain composite content view auto-publish tasks so they wait for all component publishes before running
Introduce retry events to requeue composite publishes when a publish is already running

Enhancements:

Skip duplicate composite publishes when one is already scheduled
Propagate execution plan IDs through publish actions to coordinate chaining
Provide private helpers to detect publish status and locate sibling component tasks

Tests:

Add comprehensive tests for immediate vs chained auto-publish, duplicate-skipping, retry scheduling, and lock conflict handling

Chores:

Monkey-patch ForemanTasks.chain to support Dynflow execution plan chaining

Summary by Sourcery

Add Dynflow task chaining to composite content view auto-publish to coordinate component publishes, avoid race conditions, and eliminate duplicate publishes. Detect existing scheduled or running composite publishes to skip or reschedule accordingly. Propagate the triggering component version ID through the publish action and event metadata. Include comprehensive tests covering immediate vs chained triggers, skip/schedule logic, lock conflicts, and task discovery.

New Features:

Introduce Dynflow chaining for composite content view auto-publish to wait on sibling component publishes before triggering the composite publish
Skip scheduling duplicate composite publishes when one is already scheduled
Schedule event-based retry of composite publishes when a publish is already running

Enhancements:

Add composite_publish_status to detect scheduled or running composite publishes
Implement helper methods to find sibling component and running composite publish tasks
Propagate triggered_by_id through Publish action plan and AutoPublishCompositeView event metadata

Tests:

Add tests for auto_publish_composites behavior: immediate vs chained triggers, skipping duplicates, scheduling retries, lock conflict handling, and sibling task discovery

ianballou · 2025-10-29T15:59:23Z

After a meeting with Adam who owns the related Dynflow implementation, we have these results:

Adam will implement some basic chaining UI info in the Dynflow PR and add error handing for failed dependency tasks + polish.
-> If a parent chained task has a dependency that fails, the parent task will be taken off the schedule. This will likely be the default behavior, but can be optionally turned off.
ForemanTasks PR needs opening, likely by Ian.
We've verified that the logic being introduced to Dynflow makes sense and is safe.

sourcery-ai

Hey there - I've reviewed your changes - here's some feedback:

Consider extracting the large auto_publish_composites! method into a dedicated service or command object to simplify the model and make the chaining logic easier to maintain and test.
The current approach to detecting scheduled and running tasks pulls full task sets into Ruby and filters them—consider pushing the composite CV filtering down into the database query (e.g. more restrictive WHERE clauses) for better performance.
In the Sequel adapter patch, switch from embedding a raw NOT EXISTS SQL literal to using Sequel’s built-in DSL for EXISTS/NOT EXISTS to improve readability and guard against quoting issues.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider extracting the large `auto_publish_composites!` method into a dedicated service or command object to simplify the model and make the chaining logic easier to maintain and test.
- The current approach to detecting scheduled and running tasks pulls full task sets into Ruby and filters them—consider pushing the composite CV filtering down into the database query (e.g. more restrictive WHERE clauses) for better performance.
- In the Sequel adapter patch, switch from embedding a raw NOT EXISTS SQL literal to using Sequel’s built-in DSL for EXISTS/NOT EXISTS to improve readability and guard against quoting issues.

## Individual Comments

### Comment 1
<location> `app/models/katello/content_view_version.rb:427-433` </location>
<code_context>
+    # @param task [ForemanTasks::Task] The task to check
+    # @param composite_cv [Katello::ContentView] The composite content view
+    # @return [Boolean] true if task is for this composite CV
+    def scheduled_task_for_composite?(task, composite_cv)
+      delayed_plan = ForemanTasks.dynflow.world.persistence.load_delayed_plan(task.external_id)
+      args = delayed_plan.args
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Rescue block in scheduled_task_for_composite? may mask unexpected errors.

Rescuing StandardError will cause all errors to return false, potentially hiding unrelated issues. Rescue only the exceptions you expect to handle.

```suggestion
    def scheduled_task_for_composite?(task, composite_cv)
      begin
        delayed_plan = ForemanTasks.dynflow.world.persistence.load_delayed_plan(task.external_id)
        args = delayed_plan.args
        args.first.is_a?(::Katello::ContentView) && args.first.id == composite_cv.id
      rescue ForemanTasks::Dynflow::PersistenceError, NoMethodError
        false
      end
    end
```
</issue_to_address>

### Comment 2
<location> `app/models/katello/content_view_version.rb:439` </location>
<code_context>
+    # @param composite_cv [Katello::ContentView] The composite content view
+    # @param sibling_task_ids [Array<String>] Task IDs to wait for
+    # @param description [String] Description for the publish task
+    def trigger_composite_publish(composite_cv, sibling_task_ids, description)
+      if sibling_task_ids.any?
+        trigger_chained_composite_publish(composite_cv, sibling_task_ids, description)
</code_context>

<issue_to_address>
**suggestion:** Consider logging the exception object for StandardError.

Logging the full exception, including its backtrace, will provide more context for troubleshooting.
</issue_to_address>

### Comment 3
<location> `test/models/events/auto_publish_composite_view_test.rb:33` </location>
<code_context>
       end

       def test_run_with_lock_error
+        metadata = { description: "Auto Publish - Test", version_id: component_version.id }
+
</code_context>

<issue_to_address>
**suggestion (testing):** Suggestion to add a test for generic error handling in event run method.

Please add a test that raises a generic exception in the event run method to confirm proper error logging and handling.

Suggested implementation:

```ruby
      def test_run_with_generic_error
        metadata = { description: "Auto Publish - Test", version_id: component_version.id }

        event = AutoPublishCompositeView.new(composite_view.id) do |instance|
          instance.metadata = metadata
        end

        # Stub a method called during run to raise a generic error
        event.stubs(:publish).raises(StandardError.new("Generic error"))

        # Expect error to be logged, not raised
        Rails.logger.expects(:error).with(regexp_matches(/Generic error/))

        assert_nothing_raised do
          event.run
        end
      end

      def test_run_with_publish
        metadata = { description: "Auto Publish - Test", version_id: component_version.id }

```

- If the `publish` method is not the one called during `run`, replace it with the correct method that should be stubbed to raise the error.
- Ensure that the event's `run` method actually logs errors using `Rails.logger.error`. If it does not, you may need to add error logging in the implementation.
- If you use a different logger or error handling mechanism, adjust the expectation accordingly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

app/models/katello/content_view_version.rb

sjha4 · 2025-11-04T18:33:44Z

app/models/katello/content_view_version.rb

+            # A composite publish is currently running - we need to schedule another one
+            # after it finishes to include this new component version
+            Rails.logger.info("Composite CV #{composite_cv.name} publish running, scheduling event for retry")
+            schedule_auto_publish_event(composite_cv, description)


Trying to follow the code path..Do we need this event if we are triggering composite publish right after this case block?

There should be a nil case here that runs trigger_composite_publish. We don't need to trigger another composite publish. In the case that there is one running, trigger_composite_publish will fail on the lock error, so we can just skip it.

qcjames53

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

qcjames53 · 2025-11-04T22:08:28Z

app/models/katello/content_view_version.rb

+    end
+
+    # Schedule an event to retry composite publish after current one finishes
+    # @param composite_cv [Katello::ContentView] The composite content view


@jeremylenz used to yell at me for putting these @param comments in my PRs. Is this acceptable now? I'm going to do it again if so.

I'm on the fence, I like seeing info, but it stands out since not all methods will have em. I certainly wouldn't add them if I had written it instead of Claude. I'll probably remove them.

Looks like Claude is using https://yardoc.org/ formatting. Anyway, I don't think we need it since we don't use YARD.

ianballou · 2025-11-05T22:35:57Z

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

See this comment here: Dynflow/dynflow#446 (comment)

"(TODO) Error handling so that a chained task is removed from the schedule if a dependency fails (optional, but on by default)"

If B fails, we don't want the composite to publish, the user will need to go in and investigate the situation. I think an outdated CCV in that case is a good thing - it's better than the publish happening and potentially including garbage from the failed B. What do you think?

qcjames53 · 2025-11-06T14:06:34Z

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

See this comment here: Dynflow/dynflow#446 (comment)

"(TODO) Error handling so that a chained task is removed from the schedule if a dependency fails (optional, but on by default)"

If B fails, we don't want the composite to publish, the user will need to go in and investigate the situation. I think an outdated CCV in that case is a good thing - it's better than the publish happening and potentially including garbage from the failed B. What do you think?

That makes sense to me. In fact it's the entire point of this PR. Is there a secret third option where after all child components either pass or fail, we publish only the working components? Or would that require reworking the CCV way too much?

I'm imagining a list of auto-updated components "to_be_published" and a list of "published_successfully" and once the former list empties itself it includes everything on list 2 for the autopublish? Or is this dumb?

qcjames53 · 2025-11-06T14:08:10Z

Oh also I want to confirm this only updates components set to auto-publish. If there's a newer version of C available, will A and B updating also cause C to update? This was an issue in a bug earlier this year.

ianballou · 2025-11-06T17:08:17Z

That makes sense to me. In fact it's the entire point of this PR. Is there a secret third option where after all child components either pass or fail, we publish only the working components? Or would that require reworking the CCV way too much?
I'm imagining a list of auto-updated components "to_be_published" and a list of "published_successfully" and once the former list empties itself it includes everything on list 2 for the autopublish? Or is this dumb?

That might be a bit complicated since the composite publish is already scheduled by this point - its dynflow inputs would need to be edited, or the scheduled task would need to be replaced by a newer one.

I think it's possible though & worthy of an incremental improvement on this feature. Dynflow could potentially add a feature where the scheduled task gets 'hot swapped' with a new definition. I think we discussed this in a meeting in fact with @adamruzicka and @sjha4 .

We'd want to consider the UX impacts of allowing these failures to be more transparent versus halting the entire process to have the user go in and inspect.

Oh also I want to confirm this only updates components set to auto-publish. If there's a newer version of C available, will A and B updating also cause C to update? This was an issue in a bug earlier this year.

This feature does indeed only affect auto-publish items where, on the composite, the content view is set to be always at the latest version. Worth a test to double check though!

ianballou · 2025-11-17T20:11:15Z

Here is a hackmd with more information about how this change works: https://hackmd.io/Ab6mGkicTVKURKYlL5fwjw?both

sjha4 · 2025-12-02T18:47:46Z

app/models/katello/content_view_version.rb

+
+      args = delayed_plan.args
+      args.first.is_a?(::Katello::ContentView) && args.first.id == composite_cv.id
+    rescue NoMethodError, TypeError, Dynflow::Error


Suggested change

rescue NoMethodError, TypeError, Dynflow::Error

rescue NoMethodError, TypeError, Dynflow::Error => e

* All composite publishes now go through event system (not just :running case) * Event handler performs coordination and chaining * Pass calling_task_id through event metadata for sibling exclusion

pr-processor bot added the Not yet reviewed label Oct 27, 2025

ianballou mentioned this pull request Oct 27, 2025

Add execution plan chaining Dynflow/dynflow#446

Open

ianballou force-pushed the composite-cv-chaining branch from 47c7477 to 973cb03 Compare November 3, 2025 20:40

ianballou marked this pull request as ready for review November 3, 2025 20:54

sourcery-ai bot reviewed Nov 3, 2025

View reviewed changes

app/models/katello/content_view_version.rb Show resolved Hide resolved

app/models/katello/content_view_version.rb Outdated Show resolved Hide resolved

pr-processor bot removed the Not yet reviewed label Nov 3, 2025

sjha4 reviewed Nov 4, 2025

View reviewed changes

qcjames53 reviewed Nov 5, 2025

View reviewed changes

ianballou force-pushed the composite-cv-chaining branch 4 times, most recently from 53b2f26 to 746c701 Compare November 5, 2025 21:58

ianballou force-pushed the composite-cv-chaining branch from 746c701 to 0dd3524 Compare November 5, 2025 22:40

ianballou force-pushed the composite-cv-chaining branch from 0dd3524 to 545590b Compare November 6, 2025 17:20

ianballou force-pushed the composite-cv-chaining branch 2 times, most recently from 48386bd to c0a9243 Compare November 17, 2025 19:48

ianballou force-pushed the composite-cv-chaining branch from fe6acea to 42da244 Compare November 21, 2025 20:42

sjha4 reviewed Dec 2, 2025

View reviewed changes

ianballou added 6 commits December 2, 2025 20:51

Fixes #38856 - chain composite CV publishes to wait on children

a38d4a6

Refs #38856 - do not run ccv publish unnecessary times

f95740f

Refs #38856 - properly look up other scheduled CCV tasks

6b8b115

Refs #38856 - refactor CCV chaining logic

2e3a789

Refs #38856 - use event polling to chain running CCV publishes.

89fbaaa

Refs #38856 - address rubocop concerns.

383b737

ianballou added 2 commits December 2, 2025 20:51

Refs #38856 - remove ForemanTasks patch for chaining

f020e7a

Refs #38856 - simplify auto-publish to always use events

30b3b75

* All composite publishes now go through event system (not just :running case) * Event handler performs coordination and chaining * Pass calling_task_id through event metadata for sibling exclusion

ianballou force-pushed the composite-cv-chaining branch from 42da244 to 4799c95 Compare December 2, 2025 21:13

Refs #38856 - address rubocop concerns again.

4489865

ianballou force-pushed the composite-cv-chaining branch from 4799c95 to 4489865 Compare December 2, 2025 21:21

	rescue NoMethodError, TypeError, Dynflow::Error
	rescue NoMethodError, TypeError, Dynflow::Error => e

Composite cv chaining #11540

Are you sure you want to change the base?

Composite cv chaining #11540

Conversation

ianballou commented Oct 27, 2025 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What are the changes introduced in this pull request?

Considerations taken when implementing this change?

What are the testing steps for this pull request?

Summary by Sourcery

Summary by Sourcery

Uh oh!

ianballou commented Oct 29, 2025

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sjha4 Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

ianballou Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

qcjames53 left a comment

Choose a reason for hiding this comment

Uh oh!

qcjames53 Nov 4, 2025

Choose a reason for hiding this comment

Uh oh!

ianballou Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

ianballou Nov 5, 2025

Choose a reason for hiding this comment

Uh oh!

ianballou commented Nov 5, 2025

Uh oh!

qcjames53 commented Nov 6, 2025

Uh oh!

qcjames53 commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ianballou commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ianballou commented Nov 17, 2025

Uh oh!

sjha4 Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ianballou commented Oct 27, 2025 •

edited by sourcery-ai bot

Loading

qcjames53 commented Nov 6, 2025 •

edited

Loading

ianballou commented Nov 6, 2025 •

edited

Loading