Release v0.4.1
This is a hotfix release addressing bugs discovered in v0.4.0. We recommend all users running v0.4.0 upgrade to v0.4.1.
🐛 Bug Fixes
Fault Quarantine Uncordoning Issue
Fixed: Resolved a critical issue where the fault quarantine module's node annotations map could become stale, preventing proper uncordoning of nodes. This fix ensures that manual uncordon operations and automated recovery workflows function correctly.
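For a node that is still cordoned after upgrading, a manual check and uncordon might look like the sketch below. The helper name `uncordon_if_cordoned` and the node name in the usage comment are illustrative and not part of NVSentinel. The sketch uses only standard `kubectl` commands and assumes your kubeconfig points at the affected cluster.

```shell
# Hypothetical helper (not part of NVSentinel): uncordon a node only if it is
# currently marked unschedulable.
uncordon_if_cordoned() {
  node="$1"
  # .spec.unschedulable is "true" on a cordoned node and empty otherwise.
  state="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}')"
  if [ "$state" = "true" ]; then
    kubectl uncordon "$node"
  else
    echo "node $node is already schedulable"
  fi
}

# Usage (node name is a placeholder):
#   uncordon_if_cordoned gpu-node-01
```

Checking the state first keeps the helper safe to run repeatedly against nodes that were never cordoned.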
Event Exporter Package Publishing
Fixed: Corrected the event exporter package publishing configuration, ensuring the event exporter component is properly included in releases and can be deployed as expected.
CRI-O Runtime Support
Fixed: Added the ability to unset the runtime class as a workaround for CRI-O environments where the default runtime class configuration may cause deployment issues. This improves compatibility with different container runtime configurations.
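These notes do not name the chart value that controls the runtime class, so a reasonable first step is to look it up in the chart's default values. The sketch below is one way to do that. The helper name `list_runtimeclass_values` is hypothetical, and it assumes a Helm version with OCI registry support.

```shell
# Hypothetical helper (not part of NVSentinel): list chart values whose name
# mentions "runtimeclass", so the exact key can be confirmed before it is
# overridden. Prints matching lines prefixed with their line numbers.
list_runtimeclass_values() {
  chart="${1:-oci://ghcr.io/nvidia/nvsentinel}"
  version="${2:-v0.4.1}"
  helm show values "$chart" --version "$version" | grep -i -n 'runtimeclass'
}

# Usage:
#   list_runtimeclass_values
```

Once the key is known, Helm treats a value of `null` passed through `--set` as removing that key's default. Whether this chart expects `null`, an empty string, or a dedicated option is something to confirm in its values file.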
🔄 Upgrade Instructions
To upgrade from v0.4.0:
helm upgrade nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v0.4.1 \
  --namespace nvsentinel \
  --reuse-values
To install v0.4.1:
helm install nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v0.4.1 \
  --namespace nvsentinel \
  --create-namespace
🙏 Acknowledgments
This hotfix release includes contributions from:
Thank you for the quick turnaround on these critical fixes!
📦 What's Included
All 15 container images from v0.4.0 with the above bug fixes applied.
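To see which images a running installation actually uses, the distinct image references can be listed from the pods in the release namespace. This is a minimal sketch: the helper name `list_running_images` is hypothetical, and the default namespace is taken from the Helm commands in these notes. It covers regular containers only (not init containers), and components that are disabled will not appear, so the count may be lower than 15.

```shell
# Hypothetical helper (not part of NVSentinel): print the distinct container
# images used by pods in a namespace, one per line.
list_running_images() {
  namespace="${1:-nvsentinel}"
  kubectl get pods --namespace "$namespace" \
    -o jsonpath='{.items[*].spec.containers[*].image}' |
    tr -s '[:space:]' '\n' | sort -u
}

# Usage:
#   list_running_images
```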
🔗 Resources
- GitHub Repository: https://github.com/NVIDIA/NVSentinel
- Container Registry: ghcr.io/nvidia/nvsentinel
- Documentation: See /docs directory in repository
- Issue Tracker: https://github.com/NVIDIA/NVSentinel/issues
- Discussions: https://github.com/NVIDIA/NVSentinel/discussions
⚠️ Known Limitations
- This is an experimental/preview release; use caution in production environments
- Some features are disabled by default and must be explicitly enabled
- Manual intervention may still be required for certain complex failure scenarios