Skip to content

Conversation

@ianballou
Copy link
Member

@ianballou ianballou commented Oct 27, 2025

Note: this PR contains some code generated by Claude Code.

Note: this PR requires theforeman/foreman-tasks#786 and Dynflow/dynflow#446.

Note, as of writing, this PR requires the following patch on top of the Dynflow one:

diff --git a/lib/dynflow/persistence_adapters/sequel.rb b/lib/dynflow/persistence_adapters/sequel.rb
index b36298b..502aadf 100644
--- a/lib/dynflow/persistence_adapters/sequel.rb
+++ b/lib/dynflow/persistence_adapters/sequel.rb
@@ -146,14 +146,22 @@ module Dynflow
 
       def find_ready_delayed_plans(time)
         table_name = :delayed
+        # Find delayed plans where ALL dependencies (if any) are either non-existent or stopped
+        # We use NOT EXISTS to ensure no dependency is in a non-stopped state
         table(table_name)
-          .left_join(TABLES[:execution_plan_dependency], execution_plan_uuid: :execution_plan_uuid)
-          .left_join(TABLES[:execution_plan], uuid: :blocked_by_uuid)
           .where(::Sequel.lit('start_at IS NULL OR (start_at <= ? OR (start_before IS NOT NULL AND start_before <= ?))', time, time))
-          .where(::Sequel[{ state: nil }] | ::Sequel[{ state: 'stopped' }])
           .where(:frozen => false)
+          .where(::Sequel.lit(
+            "NOT EXISTS (
+              SELECT 1
+              FROM #{TABLES[:execution_plan_dependency]} dep
+              LEFT JOIN #{TABLES[:execution_plan]} ep ON dep.blocked_by_uuid = ep.uuid
+              WHERE dep.execution_plan_uuid = #{TABLES[table_name]}.execution_plan_uuid
+              AND ep.state IS NOT NULL
+              AND ep.state != 'stopped'
+            )"
+          ))
           .order_by(:start_at)
-          .select_all(TABLES[table_name])
           .all
           .map { |plan| load_data(plan, table_name) }
       end

Note: everything in config/initializers/foreman_tasks_chaining.rb should actually exist in ForemanTasks. For this prototype I'm defining it here to make the test setup easier.

What are the changes introduced in this pull request?

Implements Dynflow task chaining to improve composite content view auto-publish.

Imagine a composite content view (CCV) with 3 components, A, B, and C.

If A and B are publishing at the same time, and A finishes first, CCV will start publishing while B is still publishing. This is bad because version B.latest exists but is partially-created. So, this PR 'chains' the CCV auto-publish to any publishing children.

There is yet another problem: even if the CCV publish is chained to A and B publishing, two CCV publishes will be triggered because auto-publish is triggered for each child. This doesn't make sense when A and B are publishing together though - there should only be one publish - any more are extra. So, this PR also causes composite publishes to be skipped if the composite has a scheduled publish already waiting.

If a CCV publish is running and then A publishing triggers another CCV publish, then the existing code will:

  1. Create an auto-publish event on an event queue
  2. Attempt to publish via that event, but fail
  3. Try to publish 5 minutes later, and so on

Considerations taken when implementing this change?

The 'when to skip vs when to use polling' is up for debate. At the moment I don't see any chances for content to be missed, but please consider it. Could auto-publish be skipped when it shouldn't have been?

What are the testing steps for this pull request?

  1. Try normal CV publishing
  2. Try normal CCV publishing
  3. Create a CV that publishes slowly and some that publish quickly; add them to a CCV with auto-publish
    a. Don't forget to set the CCV to always use the latest version of the components
  4. Publish the slow CV, then publish the fast CVs
  5. See that a publish task is scheduled but not running before the slow CV finishes
  6. See that the composite publish waits until the slow CV is done to run
  7. See that only a single composite publish is done
  8. Test the existing polling logic by triggering a composite CV auto-publish while one is already running for the same CCV.

Summary by Sourcery

Implement Dynflow-based task chaining for composite content view auto-publishing, preventing race conditions by skipping duplicates, scheduling retries, and ensuring a single coordinated publish per composite.

New Features:

  • Chain composite content view auto-publish tasks so they wait for all component publishes before running
  • Introduce retry events to requeue composite publishes when a publish is already running

Enhancements:

  • Skip duplicate composite publishes when one is already scheduled
  • Propagate execution plan IDs through publish actions to coordinate chaining
  • Provide private helpers to detect publish status and locate sibling component tasks

Tests:

  • Add comprehensive tests for immediate vs chained auto-publish, duplicate-skipping, retry scheduling, and lock conflict handling

Chores:

  • Monkey-patch ForemanTasks.chain to support Dynflow execution plan chaining

Summary by Sourcery

Add Dynflow task chaining to composite content view auto-publish to coordinate component publishes, avoid race conditions, and eliminate duplicate publishes. Detect existing scheduled or running composite publishes to skip or reschedule accordingly. Propagate the triggering component version ID through the publish action and event metadata. Include comprehensive tests covering immediate vs chained triggers, skip/schedule logic, lock conflicts, and task discovery.

New Features:

  • Introduce Dynflow chaining for composite content view auto-publish to wait on sibling component publishes before triggering the composite publish
  • Skip scheduling duplicate composite publishes when one is already scheduled
  • Schedule event-based retry of composite publishes when a publish is already running

Enhancements:

  • Add composite_publish_status to detect scheduled or running composite publishes
  • Implement helper methods to find sibling component and running composite publish tasks
  • Propagate triggered_by_id through Publish action plan and AutoPublishCompositeView event metadata

Tests:

  • Add tests for auto_publish_composites behavior: immediate vs chained triggers, skipping duplicates, scheduling retries, lock conflict handling, and sibling task discovery

@ianballou
Copy link
Member Author

After a meeting with Adam who owns the related Dynflow implementation, we have these results:

  1. Adam will implement some basic chaining UI info in the Dynflow PR and add error handing for failed dependency tasks + polish.
    -> If a parent chained task has a dependency that fails, the parent task will be taken off the schedule. This will likely be the default behavior, but can be optionally turned off.

  2. ForemanTasks PR needs opening, likely by Ian.

  3. We've verified that the logic being introduced to Dynflow makes sense and is safe.

@ianballou ianballou force-pushed the composite-cv-chaining branch from 47c7477 to 973cb03 Compare November 3, 2025 20:40
@ianballou ianballou marked this pull request as ready for review November 3, 2025 20:54
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider extracting the large auto_publish_composites! method into a dedicated service or command object to simplify the model and make the chaining logic easier to maintain and test.
  • The current approach to detecting scheduled and running tasks pulls full task sets into Ruby and filters them—consider pushing the composite CV filtering down into the database query (e.g. more restrictive WHERE clauses) for better performance.
  • In the Sequel adapter patch, switch from embedding a raw NOT EXISTS SQL literal to using Sequel’s built-in DSL for EXISTS/NOT EXISTS to improve readability and guard against quoting issues.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the large `auto_publish_composites!` method into a dedicated service or command object to simplify the model and make the chaining logic easier to maintain and test.
- The current approach to detecting scheduled and running tasks pulls full task sets into Ruby and filters them—consider pushing the composite CV filtering down into the database query (e.g. more restrictive WHERE clauses) for better performance.
- In the Sequel adapter patch, switch from embedding a raw NOT EXISTS SQL literal to using Sequel’s built-in DSL for EXISTS/NOT EXISTS to improve readability and guard against quoting issues.

## Individual Comments

### Comment 1
<location> `app/models/katello/content_view_version.rb:427-433` </location>
<code_context>
+    # @param task [ForemanTasks::Task] The task to check
+    # @param composite_cv [Katello::ContentView] The composite content view
+    # @return [Boolean] true if task is for this composite CV
+    def scheduled_task_for_composite?(task, composite_cv)
+      delayed_plan = ForemanTasks.dynflow.world.persistence.load_delayed_plan(task.external_id)
+      args = delayed_plan.args
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Rescue block in scheduled_task_for_composite? may mask unexpected errors.

Rescuing StandardError will cause all errors to return false, potentially hiding unrelated issues. Rescue only the exceptions you expect to handle.

```suggestion
    def scheduled_task_for_composite?(task, composite_cv)
      begin
        delayed_plan = ForemanTasks.dynflow.world.persistence.load_delayed_plan(task.external_id)
        args = delayed_plan.args
        args.first.is_a?(::Katello::ContentView) && args.first.id == composite_cv.id
      rescue ForemanTasks::Dynflow::PersistenceError, NoMethodError
        false
      end
    end
```
</issue_to_address>

### Comment 2
<location> `app/models/katello/content_view_version.rb:439` </location>
<code_context>
+    # @param composite_cv [Katello::ContentView] The composite content view
+    # @param sibling_task_ids [Array<String>] Task IDs to wait for
+    # @param description [String] Description for the publish task
+    def trigger_composite_publish(composite_cv, sibling_task_ids, description)
+      if sibling_task_ids.any?
+        trigger_chained_composite_publish(composite_cv, sibling_task_ids, description)
</code_context>

<issue_to_address>
**suggestion:** Consider logging the exception object for StandardError.

Logging the full exception, including its backtrace, will provide more context for troubleshooting.
</issue_to_address>

### Comment 3
<location> `test/models/events/auto_publish_composite_view_test.rb:33` </location>
<code_context>
       end

       def test_run_with_lock_error
+        metadata = { description: "Auto Publish - Test", version_id: component_version.id }
+
</code_context>

<issue_to_address>
**suggestion (testing):** Suggestion to add a test for generic error handling in event run method.

Please add a test that raises a generic exception in the event run method to confirm proper error logging and handling.

Suggested implementation:

```ruby
      def test_run_with_generic_error
        metadata = { description: "Auto Publish - Test", version_id: component_version.id }

        event = AutoPublishCompositeView.new(composite_view.id) do |instance|
          instance.metadata = metadata
        end

        # Stub a method called during run to raise a generic error
        event.stubs(:publish).raises(StandardError.new("Generic error"))

        # Expect error to be logged, not raised
        Rails.logger.expects(:error).with(regexp_matches(/Generic error/))

        assert_nothing_raised do
          event.run
        end
      end

      def test_run_with_publish
        metadata = { description: "Auto Publish - Test", version_id: component_version.id }

```

- If the `publish` method is not the one called during `run`, replace it with the correct method that should be stubbed to raise the error.
- Ensure that the event's `run` method actually logs errors using `Rails.logger.error`. If it does not, you may need to add error logging in the implementation.
- If you use a different logger or error handling mechanism, adjust the expectation accordingly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

# A composite publish is currently running - we need to schedule another one
# after it finishes to include this new component version
Rails.logger.info("Composite CV #{composite_cv.name} publish running, scheduling event for retry")
schedule_auto_publish_event(composite_cv, description)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to follow the code path..Do we need this event if we are triggering composite publish right after this case block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a nil case here that runs trigger_composite_publish. We don't need to trigger another composite publish. In the case that there is one running, trigger_composite_publish will fail on the lock error, so we can just skip it.

Copy link
Contributor

@qcjames53 qcjames53 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

end

# Schedule an event to retry composite publish after current one finishes
# @param composite_cv [Katello::ContentView] The composite content view
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeremylenz used to yell at me for putting these @param comments in my PRs. Is this acceptable now? I'm going to do it again if so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence, I like seeing info, but it stands out since not all methods will have em. I certainly wouldn't add them if I had written it instead of Claude. I'll probably remove them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Claude is using https://yardoc.org/ formatting. Anyway, I don't think we need it since we don't use YARD.

@ianballou ianballou force-pushed the composite-cv-chaining branch 4 times, most recently from 53b2f26 to 746c701 Compare November 5, 2025 21:58
@ianballou
Copy link
Member Author

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

See this comment here: Dynflow/dynflow#446 (comment)

"(TODO) Error handling so that a chained task is removed from the schedule if a dependency fails (optional, but on by default)"

If B fails, we don't want the composite to publish, the user will need to go in and investigate the situation. I think an outdated CCV in that case is a good thing - it's better than the publish happening and potentially including garbage from the failed B. What do you think?

@ianballou ianballou force-pushed the composite-cv-chaining branch from 746c701 to 0dd3524 Compare November 5, 2025 22:40
@qcjames53
Copy link
Contributor

I'm still wrapping my head around the chaining but I wanted to throw my hat in for this scenario: What happens if a user publishes components A and B in a CCV, and A finishes, but B fails. It seems like the CCV won't be updated with A's new content like in the old system. Wouldn't this leave us in a broken state with an outdated CCV?

See this comment here: Dynflow/dynflow#446 (comment)

"(TODO) Error handling so that a chained task is removed from the schedule if a dependency fails (optional, but on by default)"

If B fails, we don't want the composite to publish, the user will need to go in and investigate the situation. I think an outdated CCV in that case is a good thing - it's better than the publish happening and potentially including garbage from the failed B. What do you think?

That makes sense to me. In fact it's the entire point of this PR. Is there a secret third option where after all child components either pass or fail, we publish only the working components? Or would that require reworking the CCV way too much?

I'm imagining a list of auto-updated components "to_be_published" and a list of "published_successfully" and once the former list empties itself it includes everything on list 2 for the autopublish? Or is this dumb?

@qcjames53
Copy link
Contributor

qcjames53 commented Nov 6, 2025

Oh also I want to confirm this only updates components set to auto-publish. If there's a newer version of C available, will A and B updating also cause C to update? This was an issue in a bug earlier this year.

@ianballou
Copy link
Member Author

ianballou commented Nov 6, 2025

That makes sense to me. In fact it's the entire point of this PR. Is there a secret third option where after all child components either pass or fail, we publish only the working components? Or would that require reworking the CCV way too much?
I'm imagining a list of auto-updated components "to_be_published" and a list of "published_successfully" and once the former list empties itself it includes everything on list 2 for the autopublish? Or is this dumb?

That might be a bit complicated since the composite publish is already scheduled by this point - its dynflow inputs would need to be edited, or the scheduled task would need to be replaced by a newer one.

I think it's possible though & worthy of an incremental improvement on this feature. Dynflow could potentially add a feature where the scheduled task gets 'hot swapped' with a new definition. I think we discussed this in a meeting in fact with @adamruzicka and @sjha4 .

We'd want to consider the UX impacts of allowing these failures to be more transparent versus halting the entire process to have the user go in and inspect.

Oh also I want to confirm this only updates components set to auto-publish. If there's a newer version of C available, will A and B updating also cause C to update? This was an issue in a bug earlier this year.

This feature does indeed only affect auto-publish items where, on the composite, the content view is set to be always at the latest version. Worth a test to double check though!

@ianballou ianballou force-pushed the composite-cv-chaining branch from 0dd3524 to 545590b Compare November 6, 2025 17:20
@ianballou ianballou force-pushed the composite-cv-chaining branch 2 times, most recently from 48386bd to c0a9243 Compare November 17, 2025 19:48
@ianballou
Copy link
Member Author

Here is a hackmd with more information about how this change works: https://hackmd.io/Ab6mGkicTVKURKYlL5fwjw?both

@ianballou ianballou force-pushed the composite-cv-chaining branch from fe6acea to 42da244 Compare November 21, 2025 20:42

args = delayed_plan.args
args.first.is_a?(::Katello::ContentView) && args.first.id == composite_cv.id
rescue NoMethodError, TypeError, Dynflow::Error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rescue NoMethodError, TypeError, Dynflow::Error
rescue NoMethodError, TypeError, Dynflow::Error => e

* All composite publishes now go through event system (not just :running case)
* Event handler performs coordination and chaining
* Pass calling_task_id through event metadata for sibling exclusion
@ianballou ianballou force-pushed the composite-cv-chaining branch from 42da244 to 4799c95 Compare December 2, 2025 21:13
@ianballou ianballou force-pushed the composite-cv-chaining branch from 4799c95 to 4489865 Compare December 2, 2025 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants