Release v0.4.1

Released by @github-actions on 25 Nov 11:11 · tag v0.4.1 · commit a96ba5e · 38 commits to main since this release

This is a hotfix release addressing bugs discovered in v0.4.0. We recommend all users running v0.4.0 upgrade to v0.4.1.

🐛 Bug Fixes

Fault Quarantine Uncordoning Issue

Fixed: The fault quarantine module's node annotations map could become stale, which prevented nodes from being uncordoned. Manual uncordon operations and automated recovery workflows now function correctly.

Event Exporter Package Publishing

Fixed: Corrected the event exporter's package publishing configuration so that the component is included in releases and can be deployed as expected.

CRI-O Runtime Support

Fixed: Added the ability to unset runtimeclass as a workaround for CRI-O environments where the default runtime class configuration may cause deployment issues. This improves compatibility with different container runtime configurations.

🔄 Upgrade Instructions

To upgrade from v0.4.0:

helm upgrade nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v0.4.1 \
  --namespace nvsentinel \
  --reuse-values

To install v0.4.1:

helm install nvsentinel oci://ghcr.io/nvidia/nvsentinel \
  --version v0.4.1 \
  --namespace nvsentinel \
  --create-namespace
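
After either command completes, it can help to confirm that the release is at the expected version and that its pods are healthy. The following is a minimal sketch, not part of the official instructions: the function name verify_release is hypothetical, and the defaults assume the release name and namespace used in the commands above.

```shell
# Hypothetical helper: check a Helm release after an upgrade or install.
# Usage: verify_release [release] [namespace]
# Both arguments default to "nvsentinel", matching the commands above.
verify_release() {
  release="${1:-nvsentinel}"
  namespace="${2:-nvsentinel}"

  # List releases in the namespace; the CHART column shows the deployed version.
  helm list --namespace "$namespace" || return 1

  # List the pods so you can confirm they reach the Running state.
  kubectl get pods --namespace "$namespace"
}
```

Running verify_release with no arguments checks the nvsentinel release in the nvsentinel namespace; pass a different release name and namespace if your installation uses other values.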

🙏 Acknowledgments

Thank you to everyone who contributed to this hotfix release for the quick turnaround on these critical fixes!

📦 What's Included

This release includes all 15 container images from v0.4.0, with the above bug fixes applied.

⚠️ Known Limitations

  • This is an experimental/preview release - use caution in production environments
  • Some features are disabled by default and must be explicitly enabled
  • Manual intervention may still be required for certain complex failure scenarios