Skip to content

Conversation

@elfosardo
Copy link
Member

@elfosardo elfosardo commented Nov 20, 2025

When deleting a host in inspecting or cleaning state, it would get stuck because Ironic rejects power changes while a target provision state is active. This fix detects these states and sends an abort command first, allowing deletion to proceed immediately instead of waiting for inspection/cleaning to complete.

Assisted-By: Claude Sonnet 4.5 (commercial license)

Closes: #2478

@metal3-io-bot metal3-io-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 20, 2025
@elfosardo elfosardo force-pushed the abort-inspection-delete branch from 163362a to 1a00f2a Compare November 20, 2025 17:02
@elfosardo
Copy link
Member Author

/cc @dtantsur @lentzi90 @Rozzii

ironicNode,
nodes.ProvisionStateOpts{Target: nodes.TargetAbort},
)
case nodes.CleanWait, nodes.Cleaning:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only abort manual cleaning (here you can actually check TargetProvisionState) or automated cleaning if the user has disabled it in the meantime (I'm not sure if you can check Node.AutomatedClean, BMO probably won't be able to update it during cleaning; you may want to get this information all the way from the controller).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, makes sense
maybe check ironicNode AutomatedClean?
I'll propose a change, see if that could work

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ironicNode.AutomatedClean may be outdated if BMO is unable to set its value (I don't remember if it's updateable during *WAIT states).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmmmmm what if we abort only manual cleaning and let automated cleaning finish ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with limiting the scope for now, but we really need to allow aborting automated cleaning by disabling it through BMH.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before reducing the scope, I'll propose another implementation to include the "automated cleaning disabled" scenario
the only way I can think about is passing AutomatedCleaningMode from BMH spec, changing the Provisioner interface
it should not have any external impact

@elfosardo elfosardo force-pushed the abort-inspection-delete branch from 1a00f2a to a441a64 Compare December 1, 2025 10:33
@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 1, 2025
@elfosardo elfosardo force-pushed the abort-inspection-delete branch 2 times, most recently from c8d6e9c to 655cc16 Compare December 1, 2025 10:38
Copy link
Member

@lentzi90 lentzi90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! It has been a pain for too long...

@elfosardo elfosardo force-pushed the abort-inspection-delete branch 2 times, most recently from 53064b8 to d98d9c0 Compare December 1, 2025 15:07
@elfosardo
Copy link
Member Author

/retest
weird failure, this change should not influence the failing part

@tuminoid
Copy link
Member

tuminoid commented Dec 2, 2025

/retest

@elfosardo elfosardo force-pushed the abort-inspection-delete branch from d98d9c0 to ccc5ccd Compare December 2, 2025 15:23
Copy link
Member

@Rozzii Rozzii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Rozzii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2025
Copy link
Member

@lentzi90 lentzi90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Adding hold in case I missed somethings still being discussed. Feel free to drop it if all was resolved already.
/hold

@metal3-io-bot metal3-io-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Dec 3, 2025
if err != nil || result.Dirty {
return result, err
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies in advance if I am mistaken here, but once p.abortInspectionOrCleaning returns successfully, if shouldnt go into the next if block, correct? L1700-L1730?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to apologize!
that's correct, only when the node is NOT in inspection/cleaning state, we will go through the next if statement (the actual power off logic)
we return early in case of abort command sent or if we're waiting for inspection or cleaning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return result at L1732?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean instead of return operationComplete() ?
we could, they both work, and are basically identical
it's just a style improvement, probably returning result is a bit more clean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it, it's still an improvement :)

When deleting a host in inspecting or cleaning state, it would get
stuck because Ironic rejects power changes while a target provision
state is active. This fix detects these states and sends an abort
command first, allowing deletion to proceed immediately instead of
waiting for inspection/cleaning to complete.

Assistec-By: Claude Sonnet 4.5 (commercial license)
Signed-off-by: Riccardo Pittau <[email protected]>
@elfosardo elfosardo force-pushed the abort-inspection-delete branch from ccc5ccd to 4489951 Compare December 5, 2025 14:37
@metal3-io-bot metal3-io-bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@elfosardo
Copy link
Member Author

/unhold

@metal3-io-bot metal3-io-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2025
@MahnoorAsghar
Copy link
Contributor

/lgtm

@metal3-io-bot
Copy link
Contributor

@MahnoorAsghar: adding LGTM is restricted to approvers and reviewers in OWNERS files.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Member

@Rozzii Rozzii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @kashifest
I see this has been approved and lgtm-d a lot, let's merge it.

Copy link
Member

@lentzi90 lentzi90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@metal3-io-bot metal3-io-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 8, 2025
@metal3-io-bot metal3-io-bot merged commit 8ab202a into metal3-io:main Dec 8, 2025
18 checks passed
@metal3-io-bot metal3-io-bot added this to the BMO - v0.12 milestone Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BMO cannot abort inspection when deletion is requested

7 participants