-
Notifications
You must be signed in to change notification settings - Fork 693
Fix YOLO-NAS model loading by patching SuperGradients URLs #6624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
SuperGradients pretrained model weights were originally hosted at https://sghub.deci.ai, but this domain no longer resolves via DNS. The weights have been migrated to AWS S3 at https://sg-hub-nv.s3.amazonaws.com. This patch automatically updates the URLs in the SuperGradients pretrained models registry before loading YOLO-NAS models, ensuring compatibility with the new hosting location. Fixes loading failures with error: `<urlopen error [Errno -2] Name or service not known>` Tested with fresh downloads of all three YOLO-NAS variants (S, M, L). See: Deci-AI/super-gradients#2064
WalkthroughAdded a private function Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant TorchYoloNasModel
participant _patch_super_gradients_urls
participant SuperGradients
participant PatchedLoader
Caller->>TorchYoloNasModel: _load_model()
activate TorchYoloNasModel
TorchYoloNasModel->>_patch_super_gradients_urls: invoke()
activate _patch_super_gradients_urls
_patch_super_gradients_urls->>SuperGradients: import registry / MODEL_URLS
_patch_super_gradients_urls->>SuperGradients: replace old domain URLs -> AWS S3
_patch_super_gradients_urls->>SuperGradients: replace/wrap load_pretrained_weights -> PatchedLoader
_patch_super_gradients_urls-->>TorchYoloNasModel: return (patched, logged)
deactivate _patch_super_gradients_urls
TorchYoloNasModel->>PatchedLoader: request checkpoint load (uses patched URLs/loader)
PatchedLoader->>SuperGradients: fetch weights (old or new URL handled)
PatchedLoader-->>TorchYoloNasModel: return weights
TorchYoloNasModel-->>Caller: return loaded model
deactivate TorchYoloNasModel
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)
53-54: Consider narrowing the exception types caught.The broad
Exceptioncatch is flagged by static analysis (BLE001). While defensive error handling is appropriate here to avoid breaking model loading, consider catching more specific exceptions to avoid masking unexpected errors:- except Exception as e: + except (ImportError, AttributeError, KeyError, TypeError) as e: logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/super_gradients.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
fiftyone/utils/super_gradients.py
53-53: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: e2e / test-e2e
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: lint / eslint
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: build
- GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (2)
fiftyone/utils/super_gradients.py (2)
120-128: LGTM!The patch function is correctly placed before the
super_gradients.training.models.get()call. While it runs unconditionally (even whenconfig.modelis pre-loaded), the overhead is minimal and the idempotent nature of the patching makes repeated calls safe.
39-48: I need to verify whetherPRETRAINED_NUM_CLASSESis the correct registry to patch. Let me investigate the super-gradients library structure and what this dictionary actually contains.
<function_calls>
<function_calls>
#!/bin/bashSearch for super_gradients imports and PRETRAINED_NUM_CLASSES usage in the codebase
rg -n "PRETRAINED_NUM_CLASSES|PRETRAINED_WEIGHTS" --type py -C 3
</function_calls>
</function_calls><function_calls>
#!/bin/bashFind super_gradients references in the codebase to understand import structure
rg -n "super_gradients" --type py -C 2 | head -100
</function_calls><function_calls>
#!/bin/bashRead the file under review to see the full context
cat -n fiftyone/utils/super_gradients.py | head -80
</function_calls><function_calls>
super-gradients PRETRAINED_NUM_CLASSES dictionary structure
</function_calls>
|
Hi @CharlesCNorton , thanks again for the PR! I have one question about the for key, value in pretrained_models.PRETRAINED_NUM_CLASSES.items():But I think we should use: for key, value in pretrained_models.MODEL_URLS.items():Is there a specific reason we iterate over the number of classes instead of the model URLs? Here are my logs: PRETRAINED_NUM_CLASSES: MODEL_URLS (sample): Also, which Super-Gradients version are you using? Just want to make sure we’re aligned with the correct version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/utils/super_gradients.py (2)
53-54: Refine exception handling to be more specific.The current implementation catches all exceptions, which the static analyzer correctly flags. Consider catching specific exceptions to better diagnose potential issues while still maintaining defensive behavior.
Apply this diff to catch more specific exceptions:
- except Exception as e: - logger.warning(f"Failed to patch super-gradients URLs: {e}") + except (ImportError, AttributeError, TypeError) as e: + logger.warning(f"Failed to patch super-gradients URLs: {e}")
121-121: Consider one-time patching for minor efficiency improvement.The patching function is called every time a model is loaded. While the overhead is minimal (subsequent calls just iterate and find no matches), you could optimize this with a module-level flag to patch only once.
Apply this diff to add one-time patching:
+_urls_patched = False + def _patch_super_gradients_urls(): """Patches super-gradients pretrained model URLs to use the current hosting location. SuperGradients model weights were originally hosted at https://sghub.deci.ai but have been migrated to https://sg-hub-nv.s3.amazonaws.com. This function updates the URLs in the pretrained models registry to point to the new location. See: https://github.com/Deci-AI/super-gradients/issues/2064 """ + global _urls_patched + if _urls_patched: + return + try: import super_gradients.training.pretrained_models as pretrained_models if hasattr(pretrained_models, "MODEL_URLS"): old_domain = "https://sghub.deci.ai" new_domain = "https://sg-hub-nv.s3.amazonaws.com" patched_count = 0 for key, value in pretrained_models.MODEL_URLS.items(): if isinstance(value, str) and value.startswith(old_domain): new_url = value.replace(old_domain, new_domain) pretrained_models.MODEL_URLS[key] = new_url patched_count += 1 logger.debug(f"Patched URL for {key}: {value} -> {new_url}") if patched_count > 0: logger.info(f"Patched {patched_count} super-gradients model URLs to use AWS S3") + + _urls_patched = True except (ImportError, AttributeError, TypeError) as e: logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/super_gradients.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py
53-53: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-app
- GitHub Check: e2e / test-e2e
- GitHub Check: lint / eslint
- GitHub Check: build / build
- GitHub Check: build
🔇 Additional comments (3)
fiftyone/utils/super_gradients.py (3)
27-55: Well implemented URL patching logic.The function correctly uses
pretrained_models.MODEL_URLSto iterate over model URLs and replace the deprecated domain with the new AWS S3 location. The defensivehasattrcheck and logging are good practices.
121-121: Correct placement of the patching call.Calling
_patch_super_gradients_urls()at the start of_load_modelensures URLs are patched before any model loading occurs, which is the right approach.
39-49: The original review comment is malformed and incomplete—it ends with `` and shell script instructions that are not part of a proper review comment.The code itself is well-designed for compatibility: the
hasattrcheck on line 39 ensuresMODEL_URLSexists before accessing it, and the entire operation is wrapped in a try-except block (lines 36–54) that gracefully handles any errors. This defensive approach already addresses version compatibility concerns across different SuperGradients versions.Likely an incorrect or invalid review comment.
The MODEL_URLS patch alone is insufficient - checkpoint_utils.py:1592
has hardcoded domain splitting for filename extraction:
url.split("https://sghub.deci.ai/models/")[1]
This fails with IndexError after URLs are patched to AWS S3.
Fix by replacing load_pretrained_weights with a version using urlparse
for domain-agnostic filename extraction. Patches both checkpoint_utils
and model_factory since model_factory imports the function directly.
Tested with fresh download from AWS S3 and verified detection on COCO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)
27-55: URL dict patching is correct; consider using the lazy import and narrowing exceptionsUsing
pretrained_models.MODEL_URLShere is the right registry to touch and matches whatcheckpoint_utilsimports, and thestartswith(old_domain)check makes the patch idempotent across repeated calls.Two small robustness tweaks you might consider:
- Instead of a direct
import super_gradients.training.pretrained_models as pretrained_models, use the existing lazy import (pretrained_models = super_gradients.training.pretrained_models) or callfou.ensure_package("super-gradients")first, so the patch still runs on first use even ifsuper-gradientswasn’t imported earlier via the lazy stub.- Replace the broad
except Exceptionwith more targeted exceptions (e.g.,ImportError,AttributeError) so unexpected issues in this block don’t get silently swallowed.Please double‑check against the
super-gradientsversion(s) you support thattraining.pretrained_models.MODEL_URLSis still the canonical URL mapping; if not, we should gate this on feature detection or version.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fiftyone/utils/super_gradients.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/utils/super_gradients.py (1)
fiftyone/utils/torch.py (2)
get_local_rank(2682-2697)device(814-816)
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py
53-53: Do not catch blind exception: Exception
(BLE001)
76-76: Abstract raise to an inner function
(TRY301)
114-114: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: lint / eslint
- GitHub Check: e2e / test-e2e
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: build
|
My bad @AdonaiVera, you're right! Switched to MODEL_URLS and I'm using super-gradients 3.7.1 - I also found we need to patch load_pretrained_weights itself because it has hardcoded domain splitting that breaks after the URL swap. Should work fine now. |
Summary
Fixes YOLO-NAS model loading failures caused by SuperGradients weights being migrated from the deprecated
sghub.deci.aidomain to AWS S3.Problem
The domain
https://sghub.deci.aino longer resolves via DNS, causing all YOLO-NAS model loads to fail with:This has been affecting CI runs for 11+ days.
Solution
Added a URL patching function that automatically updates SuperGradients pretrained model URLs before loading YOLO-NAS models. The patch replaces the old domain with the new AWS S3 location:
https://sg-hub-nv.s3.amazonaws.comTesting
References
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.