-
Notifications
You must be signed in to change notification settings - Fork 290
🐛 Abort inspection/cleaning before powering off during deletion #2796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Abort inspection/cleaning before powering off during deletion #2796
Conversation
163362a to
1a00f2a
Compare
pkg/provisioner/ironic/ironic.go
Outdated
| ironicNode, | ||
| nodes.ProvisionStateOpts{Target: nodes.TargetAbort}, | ||
| ) | ||
| case nodes.CleanWait, nodes.Cleaning: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only abort manual cleaning (here you can actually check TargetProvisionState) or automated cleaning if the user has disabled it in the meantime (I'm not sure if you can check Node.AutomatedClean, BMO probably won't be able to update it during cleaning; you may want to get this information all the way from the controller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, makes sense
maybe check ironicNode AutomatedClean?
I'll propose a change, see if that could work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ironicNode.AutomatedClean may be outdated if BMO is unable to set its value (I don't remember if it's updateable during *WAIT states).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmmmm what if we abort only manual cleaning and let automated cleaning finish ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with limiting the scope for now, but we really need to allow aborting automated cleaning by disabling it through BMH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
before reducing the scope, I'll propose another implementation to include the "automated cleaning disabled" scenario
the only way I can think about is passing AutomatedCleaningMode from BMH spec, changing the Provisioner interface
it should not have any external impact
1a00f2a to
a441a64
Compare
c8d6e9c to
655cc16
Compare
lentzi90
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! It has been a pain for too long...
53064b8 to
d98d9c0
Compare
|
/retest |
|
/retest |
d98d9c0 to
ccc5ccd
Compare
Rozzii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Rozzii The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
lentzi90
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Adding hold in case I missed somethings still being discussed. Feel free to drop it if all was resolved already.
/hold
| if err != nil || result.Dirty { | ||
| return result, err | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies in advance if I am mistaken here, but once p.abortInspectionOrCleaning returns successfully, if shouldnt go into the next if block, correct? L1700-L1730?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to apologize!
that's correct, only when the node is NOT in inspection/cleaning state, we will go through the next if statement (the actual power off logic)
we return early in case of abort command sent or if we're waiting for inspection or cleaning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to return result at L1732?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean instead of return operationComplete() ?
we could, they both work, and are basically identical
it's just a style improvement, probably returning result is a bit more clean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it, it's still an improvement :)
When deleting a host in inspecting or cleaning state, it would get stuck because Ironic rejects power changes while a target provision state is active. This fix detects these states and sends an abort command first, allowing deletion to proceed immediately instead of waiting for inspection/cleaning to complete. Assistec-By: Claude Sonnet 4.5 (commercial license) Signed-off-by: Riccardo Pittau <[email protected]>
ccc5ccd to
4489951
Compare
|
/unhold |
|
/lgtm |
|
@MahnoorAsghar: adding LGTM is restricted to approvers and reviewers in OWNERS files. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Rozzii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @kashifest
I see this has been approved and lgtm-d a lot, let's merge it.
lentzi90
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
When deleting a host in inspecting or cleaning state, it would get stuck because Ironic rejects power changes while a target provision state is active. This fix detects these states and sends an abort command first, allowing deletion to proceed immediately instead of waiting for inspection/cleaning to complete.
Assisted-By: Claude Sonnet 4.5 (commercial license)
Closes: #2478