
Conversation

@CharlesCNorton
Member

@CharlesCNorton CharlesCNorton commented Nov 26, 2025

Summary

Fixes YOLO-NAS model loading failures caused by SuperGradients weights being migrated from the deprecated sghub.deci.ai domain to AWS S3.

Problem

The domain https://sghub.deci.ai no longer resolves via DNS, causing all YOLO-NAS model loads to fail with:

`<urlopen error [Errno -2] Name or service not known>`

This has been affecting CI runs for 11+ days.
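
A minimal way to reproduce the failure (and, after the fix, verify the downloads) using the standard SuperGradients API; this snippet is illustrative and not part of the PR:

    from super_gradients.training import models

    # Before the fix, each call raised:
    #   <urlopen error [Errno -2] Name or service not known>
    # because the pretrained-weights URLs pointed at the dead sghub.deci.ai domain
    for variant in ("yolo_nas_s", "yolo_nas_m", "yolo_nas_l"):
        model = models.get(variant, pretrained_weights="coco")
        print(f"{variant} loaded OK")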

Solution

Added a URL patching function that automatically updates SuperGradients pretrained model URLs before loading YOLO-NAS models. The patch replaces the old domain with the new AWS S3 location: https://sg-hub-nv.s3.amazonaws.com
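
A condensed sketch of the approach (the full helper, _patch_super_gradients_urls(), appears in the review diffs below; this version assumes the MODEL_URLS registry in super_gradients.training.pretrained_models):

    import super_gradients.training.pretrained_models as pretrained_models

    OLD = "https://sghub.deci.ai"
    NEW = "https://sg-hub-nv.s3.amazonaws.com"

    # rewrite every registry entry that still points at the dead domain;
    # the startswith() guard makes repeated calls idempotent
    for key, url in pretrained_models.MODEL_URLS.items():
        if isinstance(url, str) and url.startswith(OLD):
            pretrained_models.MODEL_URLS[key] = url.replace(OLD, NEW)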

Testing

  • Tested fresh downloads of all three YOLO-NAS variants (S, M, L)
  • All models load successfully and perform inference
  • Verified weights download from the new AWS S3 location

References

Summary by CodeRabbit

  • Bug Fixes
    • Fixed an issue where SuperGradients pretrained models could fail to load due to outdated host references; pretrained weights now resolve to the updated AWS S3 hosting location, improving reliability when loading models.


SuperGradients pretrained model weights were originally hosted at
https://sghub.deci.ai, but this domain no longer resolves via DNS.
The weights have been migrated to AWS S3 at https://sg-hub-nv.s3.amazonaws.com.

This patch automatically updates the URLs in the SuperGradients pretrained
models registry before loading YOLO-NAS models, ensuring compatibility with
the new hosting location.

Fixes loading failures with error:
`<urlopen error [Errno -2] Name or service not known>`

Tested with fresh downloads of all three YOLO-NAS variants (S, M, L).

See: Deci-AI/super-gradients#2064
@CharlesCNorton CharlesCNorton requested a review from a team as a code owner November 26, 2025 19:50
@CharlesCNorton CharlesCNorton self-assigned this Nov 26, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

Added a private function _patch_super_gradients_urls() that rewrites SuperGradients pretrained model URLs from the old domain to an AWS S3 host and patches the checkpoint loading utility (load_pretrained_weights) to accept both URL formats. The function is called at the start of TorchYoloNasModel._load_model() and logs or warns on exceptions without raising them.

Changes

  • SuperGradients URL & loader patching (fiftyone/utils/super_gradients.py):
    Added _patch_super_gradients_urls(), which patches MODEL_URLS entries (replacing the old domain with the AWS S3 host), wraps/replaces load_pretrained_weights with a patched loader that handles both old and new URL formats, logs patched entries and patch counts, and guards exceptions. Invoked at the start of TorchYoloNasModel._load_model() so patching runs before model loading.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant TorchYoloNasModel
    participant _patch_super_gradients_urls
    participant SuperGradients
    participant PatchedLoader

    Caller->>TorchYoloNasModel: _load_model()
    activate TorchYoloNasModel
    TorchYoloNasModel->>_patch_super_gradients_urls: invoke()
    activate _patch_super_gradients_urls
    _patch_super_gradients_urls->>SuperGradients: import registry / MODEL_URLS
    _patch_super_gradients_urls->>SuperGradients: replace old domain URLs -> AWS S3
    _patch_super_gradients_urls->>SuperGradients: replace/wrap load_pretrained_weights -> PatchedLoader
    _patch_super_gradients_urls-->>TorchYoloNasModel: return (patched, logged)
    deactivate _patch_super_gradients_urls
    TorchYoloNasModel->>PatchedLoader: request checkpoint load (uses patched URLs/loader)
    PatchedLoader->>SuperGradients: fetch weights (old or new URL handled)
    PatchedLoader-->>TorchYoloNasModel: return weights
    TorchYoloNasModel-->>Caller: return loaded model
    deactivate TorchYoloNasModel

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect URL replacement logic for correct matching, edge cases, and idempotence.
  • Verify the patched load_pretrained_weights preserves original behavior (signatures, error handling) and is robust across SG versions.
  • Confirm invocation placement in _load_model() doesn't run unnecessarily for pre-loaded model instances and that logging/exception handling is appropriate.

Poem

🐰 I nibbled through strings at break of dawn,
Swapped old hosts for S3 on the lawn,
Models now fetch from paths anew,
I patched the loader and hopped on through,
A tiny rabbit, code and carrot drawn 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately summarizes the main change (fixing YOLO-NAS model loading by patching SuperGradients URLs), which aligns with the core objective of the PR.
  • Description check ✅ Passed: The PR description includes most required sections: a clear summary of the problem, the solution approach, and testing details. However, it lacks the formal Release Notes section required by the template.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)

53-54: Consider narrowing the exception types caught.

The broad Exception catch is flagged by static analysis (BLE001). While defensive error handling is appropriate here to avoid breaking model loading, consider catching more specific exceptions to avoid masking unexpected errors:

-    except Exception as e:
+    except (ImportError, AttributeError, KeyError, TypeError) as e:
         logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 867c633 and 9a8a3cc.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: lint / eslint
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: build
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (2)
fiftyone/utils/super_gradients.py (2)

120-128: LGTM!

The patch function is correctly placed before the super_gradients.training.models.get() call. While it runs unconditionally (even when config.model is pre-loaded), the overhead is minimal and the idempotent nature of the patching makes repeated calls safe.


39-48: I need to verify whether PRETRAINED_NUM_CLASSES is the correct registry to patch. Let me investigate the super-gradients library structure and what this dictionary actually contains.

[The remainder of this comment is malformed: it trails off into raw, unexecuted tool-call residue, namely shell scripts that search the codebase for PRETRAINED_NUM_CLASSES/MODEL_URLS usage and print fiftyone/utils/super_gradients.py, plus a dangling web search for the PRETRAINED_NUM_CLASSES dictionary structure. A later bot comment flags this comment as incomplete.]

@AdonaiVera
Contributor

Hi @CharlesCNorton, thanks again for the PR!

I have one question about PRETRAINED_NUM_CLASSES.
Right now the loop is:

for key, value in pretrained_models.PRETRAINED_NUM_CLASSES.items():

But I think we should use:

for key, value in pretrained_models.MODEL_URLS.items():

Is there a specific reason we iterate over the number of classes instead of the model URLs?
When I tested it, the model name could not be found using PRETRAINED_NUM_CLASSES, but after switching to MODEL_URLS the replacement worked correctly.

Here are my logs:

PRETRAINED_NUM_CLASSES:
{'imagenet': 1000, 'imagenet21k': 21843, 'coco_segmentation_subclass': 21, 'cityscapes': 19, 'coco': 80, 'coco_pose': 17, 'cifar10': 10}
These are dataset names and class counts.

MODEL_URLS (sample):
{'regnetY800_imagenet': 'https://sghub.deci.ai/models/regnetY800_imagenet.pth', ...}
These contain the actual model keys we need to match.
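
(For reference, a quick, illustrative way to reproduce these logs:)

    import super_gradients.training.pretrained_models as pm

    print(pm.PRETRAINED_NUM_CLASSES)        # dataset name -> class count
    print(list(pm.MODEL_URLS.items())[:2])  # model key -> weights URL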

Also, which Super-Gradients version are you using?

Just want to make sure we’re aligned with the correct version.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
fiftyone/utils/super_gradients.py (2)

53-54: Refine exception handling to be more specific.

The current implementation catches all exceptions, which the static analyzer correctly flags. Consider catching specific exceptions to better diagnose potential issues while still maintaining defensive behavior.

Apply this diff to catch more specific exceptions:

-    except Exception as e:
-        logger.warning(f"Failed to patch super-gradients URLs: {e}")
+    except (ImportError, AttributeError, TypeError) as e:
+        logger.warning(f"Failed to patch super-gradients URLs: {e}")

121-121: Consider one-time patching for minor efficiency improvement.

The patching function is called every time a model is loaded. While the overhead is minimal (subsequent calls just iterate and find no matches), you could optimize this with a module-level flag to patch only once.

Apply this diff to add one-time patching:

+_urls_patched = False
+
 def _patch_super_gradients_urls():
     """Patches super-gradients pretrained model URLs to use the current hosting location.
 
     SuperGradients model weights were originally hosted at https://sghub.deci.ai
     but have been migrated to https://sg-hub-nv.s3.amazonaws.com. This function
     updates the URLs in the pretrained models registry to point to the new location.
 
     See: https://github.com/Deci-AI/super-gradients/issues/2064
     """
+    global _urls_patched
+    if _urls_patched:
+        return
+    
     try:
         import super_gradients.training.pretrained_models as pretrained_models
 
         if hasattr(pretrained_models, "MODEL_URLS"):
             old_domain = "https://sghub.deci.ai"
             new_domain = "https://sg-hub-nv.s3.amazonaws.com"
             patched_count = 0
 
             for key, value in pretrained_models.MODEL_URLS.items():
                 if isinstance(value, str) and value.startswith(old_domain):
                     new_url = value.replace(old_domain, new_domain)
                     pretrained_models.MODEL_URLS[key] = new_url
                     patched_count += 1
                     logger.debug(f"Patched URL for {key}: {value} -> {new_url}")
 
             if patched_count > 0:
                 logger.info(f"Patched {patched_count} super-gradients model URLs to use AWS S3")
+            
+            _urls_patched = True
     except (ImportError, AttributeError, TypeError) as e:
         logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a8a3cc and e039d47.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (3)
fiftyone/utils/super_gradients.py (3)

27-55: Well implemented URL patching logic.

The function correctly uses pretrained_models.MODEL_URLS to iterate over model URLs and replace the deprecated domain with the new AWS S3 location. The defensive hasattr check and logging are good practices.


121-121: Correct placement of the patching call.

Calling _patch_super_gradients_urls() at the start of _load_model ensures URLs are patched before any model loading occurs, which is the right approach.


39-49: The original review comment is malformed and incomplete—it ends with `` and shell script instructions that are not part of a proper review comment.

The code itself is well-designed for compatibility: the hasattr check on line 39 ensures MODEL_URLS exists before accessing it, and the entire operation is wrapped in a try-except block (lines 36–54) that gracefully handles any errors. This defensive approach already addresses version compatibility concerns across different SuperGradients versions.

Likely an incorrect or invalid review comment.

The MODEL_URLS patch alone is insufficient - checkpoint_utils.py:1592
has hardcoded domain splitting for filename extraction:

    url.split("https://sghub.deci.ai/models/")[1]

This fails with IndexError after URLs are patched to AWS S3.

Fix by replacing load_pretrained_weights with a version using urlparse
for domain-agnostic filename extraction. Patches both checkpoint_utils
and model_factory since model_factory imports the function directly.

Tested with fresh download from AWS S3 and verified detection on COCO.
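
A sketch of the domain-agnostic extraction described in this commit message (the helper name is hypothetical, and the exact wrapper shape may differ from the merged code):

    import os
    from urllib.parse import urlparse

    def _filename_from_url(url):
        # works for both old and new hosts, unlike the hardcoded
        # url.split("https://sghub.deci.ai/models/")[1]
        return os.path.basename(urlparse(url).path)

    print(_filename_from_url("https://sghub.deci.ai/models/regnetY800_imagenet.pth"))
    print(_filename_from_url("https://sg-hub-nv.s3.amazonaws.com/models/regnetY800_imagenet.pth"))
    # both print: regnetY800_imagenet.pth

A patched load_pretrained_weights built around such a helper must be rebound on both super_gradients.training.utils.checkpoint_utils and model_factory, since model_factory imports the function directly and would otherwise keep calling the broken original.
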
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)

27-55: URL dict patching is correct; consider using the lazy import and narrowing exceptions

Using pretrained_models.MODEL_URLS here is the right registry to touch and matches what checkpoint_utils imports, and the startswith(old_domain) check makes the patch idempotent across repeated calls.

Two small robustness tweaks you might consider:

  • Instead of a direct import super_gradients.training.pretrained_models as pretrained_models, use the existing lazy import (pretrained_models = super_gradients.training.pretrained_models) or call fou.ensure_package("super-gradients") first, so the patch still runs on first use even if super-gradients wasn’t imported earlier via the lazy stub.
  • Replace the broad except Exception with more targeted exceptions (e.g., ImportError, AttributeError) so unexpected issues in this block don’t get silently swallowed.

Please double‑check against the super-gradients version(s) you support that training.pretrained_models.MODEL_URLS is still the canonical URL mapping; if not, we should gate this on feature detection or version.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e039d47 and d9a63df.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/utils/super_gradients.py (1)
fiftyone/utils/torch.py (2)
  • get_local_rank (2682-2697)
  • device (814-816)
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)


76-76: Abstract raise to an inner function

(TRY301)


114-114: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: build

@CharlesCNorton
Member Author

My bad @AdonaiVera, you're right! I switched to MODEL_URLS, and I'm using super-gradients 3.7.1. I also found we need to patch load_pretrained_weights itself, because it has hardcoded domain splitting that breaks after the URL swap. Should work fine now.
