
Conversation

@CharlesCNorton
Member

@CharlesCNorton CharlesCNorton commented Nov 26, 2025

Summary

Fixes YOLO-NAS model loading failures caused by SuperGradients weights being migrated from the deprecated sghub.deci.ai domain to AWS S3.

Problem

The domain https://sghub.deci.ai no longer resolves via DNS, causing all YOLO-NAS model loads to fail with:

`<urlopen error [Errno -2] Name or service not known>`

This has been affecting CI runs for 11+ days.
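
A minimal way to reproduce the failure (and, after the fix, verify the downloads) using the standard SuperGradients API; this snippet is illustrative and not part of the PR:

    from super_gradients.training import models

    # Before the fix, each call raised:
    #   <urlopen error [Errno -2] Name or service not known>
    # because the pretrained-weights URLs pointed at the dead sghub.deci.ai domain
    for variant in ("yolo_nas_s", "yolo_nas_m", "yolo_nas_l"):
        model = models.get(variant, pretrained_weights="coco")
        print(f"{variant} loaded OK")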

Solution

Added a URL patching function that automatically updates SuperGradients pretrained model URLs before loading YOLO-NAS models. The patch replaces the old domain with the new AWS S3 location: https://sg-hub-nv.s3.amazonaws.com
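
A condensed sketch of the approach (the full helper, _patch_super_gradients_urls(), appears in the review diffs below; this version assumes the MODEL_URLS registry in super_gradients.training.pretrained_models):

    import super_gradients.training.pretrained_models as pretrained_models

    OLD = "https://sghub.deci.ai"
    NEW = "https://sg-hub-nv.s3.amazonaws.com"

    # rewrite every registry entry that still points at the dead domain;
    # the startswith() guard makes repeated calls idempotent
    for key, url in pretrained_models.MODEL_URLS.items():
        if isinstance(url, str) and url.startswith(OLD):
            pretrained_models.MODEL_URLS[key] = url.replace(OLD, NEW)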

Testing

  • Tested fresh downloads of all three YOLO-NAS variants (S, M, L)
  • All models load successfully and perform inference
  • Verified weights download from the new AWS S3 location

References

Summary by CodeRabbit

  • Bug Fixes
    • Fixed an issue where SuperGradients pretrained models could fail to load due to outdated host references; pretrained weights now resolve to the updated AWS S3 hosting location, improving reliability when loading models.


SuperGradients pretrained model weights were originally hosted at
https://sghub.deci.ai, but this domain no longer resolves via DNS.
The weights have been migrated to AWS S3 at https://sg-hub-nv.s3.amazonaws.com.

This patch automatically updates the URLs in the SuperGradients pretrained
models registry before loading YOLO-NAS models, ensuring compatibility with
the new hosting location.

Fixes loading failures with error:
`<urlopen error [Errno -2] Name or service not known>`

Tested with fresh downloads of all three YOLO-NAS variants (S, M, L).

See: Deci-AI/super-gradients#2064
@CharlesCNorton CharlesCNorton requested a review from a team as a code owner November 26, 2025 19:50
@CharlesCNorton CharlesCNorton self-assigned this Nov 26, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 26, 2025

Walkthrough

Added a private function _patch_super_gradients_urls() that rewrites SuperGradients pretrained model URLs from the old domain to an AWS S3 host and patches the checkpoint loading utility (load_pretrained_weights) to accept both URL formats. The function is called at the start of TorchYoloNasModel._load_model() and logs or warns on exceptions without raising them.

Changes

  • SuperGradients URL & loader patching (fiftyone/utils/super_gradients.py):
    Added _patch_super_gradients_urls(), which patches MODEL_URLS entries (replacing the old domain with the AWS S3 host), wraps/replaces load_pretrained_weights with a patched loader that handles both old and new URL formats, logs patched entries and patch counts, and guards exceptions. Invoked at the start of TorchYoloNasModel._load_model() so patching runs before model loading.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant TorchYoloNasModel
    participant _patch_super_gradients_urls
    participant SuperGradients
    participant PatchedLoader

    Caller->>TorchYoloNasModel: _load_model()
    activate TorchYoloNasModel
    TorchYoloNasModel->>_patch_super_gradients_urls: invoke()
    activate _patch_super_gradients_urls
    _patch_super_gradients_urls->>SuperGradients: import registry / MODEL_URLS
    _patch_super_gradients_urls->>SuperGradients: replace old domain URLs -> AWS S3
    _patch_super_gradients_urls->>SuperGradients: replace/wrap load_pretrained_weights -> PatchedLoader
    _patch_super_gradients_urls-->>TorchYoloNasModel: return (patched, logged)
    deactivate _patch_super_gradients_urls
    TorchYoloNasModel->>PatchedLoader: request checkpoint load (uses patched URLs/loader)
    PatchedLoader->>SuperGradients: fetch weights (old or new URL handled)
    PatchedLoader-->>TorchYoloNasModel: return weights
    TorchYoloNasModel-->>Caller: return loaded model
    deactivate TorchYoloNasModel

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect URL replacement logic for correct matching, edge cases, and idempotence.
  • Verify the patched load_pretrained_weights preserves original behavior (signatures, error handling) and is robust across SG versions.
  • Confirm invocation placement in _load_model() doesn't run unnecessarily for pre-loaded model instances and that logging/exception handling is appropriate.

Poem

🐰 I nibbled through strings at break of dawn,
Swapped old hosts for S3 on the lawn,
Models now fetch from paths anew,
I patched the loader and hopped on through,
A tiny rabbit, code and carrot drawn 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title accurately summarizes the main change (fixing YOLO-NAS model loading by patching SuperGradients URLs), which aligns with the core objective of the PR.
  • Description check ✅ Passed: The PR description includes most required sections: a clear summary of the problem, the solution approach, and testing details. However, it lacks the formal Release Notes section required by the template.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)

53-54: Consider narrowing the exception types caught.

The broad Exception catch is flagged by static analysis (BLE001). While defensive error handling is appropriate here to avoid breaking model loading, consider catching more specific exceptions to avoid masking unexpected errors:

-    except Exception as e:
+    except (ImportError, AttributeError, KeyError, TypeError) as e:
         logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 867c633 and 9a8a3cc.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.5)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: lint / eslint
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: build
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (2)
fiftyone/utils/super_gradients.py (2)

120-128: LGTM!

The patch function is correctly placed before the super_gradients.training.models.get() call. While it runs unconditionally (even when config.model is pre-loaded), the overhead is minimal and the idempotent nature of the patching makes repeated calls safe.


39-48: I need to verify whether PRETRAINED_NUM_CLASSES is the correct registry to patch. Let me investigate the super-gradients library structure and what this dictionary actually contains.

[The remainder of this comment is malformed: it trails off into raw, unexecuted tool-call residue, namely shell scripts that search the codebase for PRETRAINED_NUM_CLASSES/MODEL_URLS usage and print fiftyone/utils/super_gradients.py, plus a dangling web search for the PRETRAINED_NUM_CLASSES dictionary structure. A later bot comment flags this comment as incomplete.]

@AdonaiVera
Contributor

Hi @CharlesCNorton, thanks again for the PR!

I have one question about PRETRAINED_NUM_CLASSES.
Right now the loop is:

for key, value in pretrained_models.PRETRAINED_NUM_CLASSES.items():

But I think we should use:

for key, value in pretrained_models.MODEL_URLS.items():

Is there a specific reason we iterate over the number of classes instead of the model URLs?
When I tested it, the model name could not be found using PRETRAINED_NUM_CLASSES, but after switching to MODEL_URLS the replacement worked correctly.

Here are my logs:

PRETRAINED_NUM_CLASSES:
{'imagenet': 1000, 'imagenet21k': 21843, 'coco_segmentation_subclass': 21, 'cityscapes': 19, 'coco': 80, 'coco_pose': 17, 'cifar10': 10}
These are dataset names and class counts.

MODEL_URLS (sample):
{'regnetY800_imagenet': 'https://sghub.deci.ai/models/regnetY800_imagenet.pth', ...}
These contain the actual model keys we need to match.
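
(For reference, a quick, illustrative way to reproduce these logs:)

    import super_gradients.training.pretrained_models as pm

    print(pm.PRETRAINED_NUM_CLASSES)        # dataset name -> class count
    print(list(pm.MODEL_URLS.items())[:2])  # model key -> weights URL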

Also, which Super-Gradients version are you using?

Just want to make sure we’re aligned with the correct version.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
fiftyone/utils/super_gradients.py (2)

53-54: Refine exception handling to be more specific.

The current implementation catches all exceptions, which the static analyzer correctly flags. Consider catching specific exceptions to better diagnose potential issues while still maintaining defensive behavior.

Apply this diff to catch more specific exceptions:

-    except Exception as e:
-        logger.warning(f"Failed to patch super-gradients URLs: {e}")
+    except (ImportError, AttributeError, TypeError) as e:
+        logger.warning(f"Failed to patch super-gradients URLs: {e}")

121-121: Consider one-time patching for minor efficiency improvement.

The patching function is called every time a model is loaded. While the overhead is minimal (subsequent calls just iterate and find no matches), you could optimize this with a module-level flag to patch only once.

Apply this diff to add one-time patching:

+_urls_patched = False
+
 def _patch_super_gradients_urls():
     """Patches super-gradients pretrained model URLs to use the current hosting location.
 
     SuperGradients model weights were originally hosted at https://sghub.deci.ai
     but have been migrated to https://sg-hub-nv.s3.amazonaws.com. This function
     updates the URLs in the pretrained models registry to point to the new location.
 
     See: https://github.com/Deci-AI/super-gradients/issues/2064
     """
+    global _urls_patched
+    if _urls_patched:
+        return
+    
     try:
         import super_gradients.training.pretrained_models as pretrained_models
 
         if hasattr(pretrained_models, "MODEL_URLS"):
             old_domain = "https://sghub.deci.ai"
             new_domain = "https://sg-hub-nv.s3.amazonaws.com"
             patched_count = 0
 
             for key, value in pretrained_models.MODEL_URLS.items():
                 if isinstance(value, str) and value.startswith(old_domain):
                     new_url = value.replace(old_domain, new_domain)
                     pretrained_models.MODEL_URLS[key] = new_url
                     patched_count += 1
                     logger.debug(f"Patched URL for {key}: {value} -> {new_url}")
 
             if patched_count > 0:
                 logger.info(f"Patched {patched_count} super-gradients model URLs to use AWS S3")
+            
+            _urls_patched = True
     except (ImportError, AttributeError, TypeError) as e:
         logger.warning(f"Failed to patch super-gradients URLs: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9a8a3cc and e039d47.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-app
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (3)
fiftyone/utils/super_gradients.py (3)

27-55: Well implemented URL patching logic.

The function correctly uses pretrained_models.MODEL_URLS to iterate over model URLs and replace the deprecated domain with the new AWS S3 location. The defensive hasattr check and logging are good practices.


121-121: Correct placement of the patching call.

Calling _patch_super_gradients_urls() at the start of _load_model ensures URLs are patched before any model loading occurs, which is the right approach.


39-49: The original review comment is malformed and incomplete—it ends with `` and shell script instructions that are not part of a proper review comment.

The code itself is well-designed for compatibility: the hasattr check on line 39 ensures MODEL_URLS exists before accessing it, and the entire operation is wrapped in a try-except block (lines 36–54) that gracefully handles any errors. This defensive approach already addresses version compatibility concerns across different SuperGradients versions.

Likely an incorrect or invalid review comment.

The MODEL_URLS patch alone is insufficient - checkpoint_utils.py:1592
has hardcoded domain splitting for filename extraction:

    url.split("https://sghub.deci.ai/models/")[1]

This fails with IndexError after URLs are patched to AWS S3.

Fix by replacing load_pretrained_weights with a version using urlparse
for domain-agnostic filename extraction. Patches both checkpoint_utils
and model_factory since model_factory imports the function directly.

Tested with fresh download from AWS S3 and verified detection on COCO.
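
A sketch of the domain-agnostic extraction described in this commit message (the helper name is hypothetical, and the exact wrapper shape may differ from the merged code):

    import os
    from urllib.parse import urlparse

    def _filename_from_url(url):
        # works for both old and new hosts, unlike the hardcoded
        # url.split("https://sghub.deci.ai/models/")[1]
        return os.path.basename(urlparse(url).path)

    print(_filename_from_url("https://sghub.deci.ai/models/regnetY800_imagenet.pth"))
    print(_filename_from_url("https://sg-hub-nv.s3.amazonaws.com/models/regnetY800_imagenet.pth"))
    # both print: regnetY800_imagenet.pth

A patched load_pretrained_weights built around such a helper must be rebound on both super_gradients.training.utils.checkpoint_utils and model_factory, since model_factory imports the function directly and would otherwise keep calling the broken original.
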
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
fiftyone/utils/super_gradients.py (1)

27-55: URL dict patching is correct; consider using the lazy import and narrowing exceptions

Using pretrained_models.MODEL_URLS here is the right registry to touch and matches what checkpoint_utils imports, and the startswith(old_domain) check makes the patch idempotent across repeated calls.

Two small robustness tweaks you might consider:

  • Instead of a direct import super_gradients.training.pretrained_models as pretrained_models, use the existing lazy import (pretrained_models = super_gradients.training.pretrained_models) or call fou.ensure_package("super-gradients") first, so the patch still runs on first use even if super-gradients wasn’t imported earlier via the lazy stub.
  • Replace the broad except Exception with more targeted exceptions (e.g., ImportError, AttributeError) so unexpected issues in this block don’t get silently swallowed.

Please double‑check against the super-gradients version(s) you support that training.pretrained_models.MODEL_URLS is still the canonical URL mapping; if not, we should gate this on feature detection or version.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e039d47 and d9a63df.

📒 Files selected for processing (1)
  • fiftyone/utils/super_gradients.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/utils/super_gradients.py (1)
fiftyone/utils/torch.py (2)
  • get_local_rank (2682-2697)
  • device (814-816)
🪛 Ruff (0.14.7)
fiftyone/utils/super_gradients.py

53-53: Do not catch blind exception: Exception

(BLE001)


76-76: Abstract raise to an inner function

(TRY301)


114-114: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: test / test-app
  • GitHub Check: build / build
  • GitHub Check: build

@CharlesCNorton
Member Author

My bad @AdonaiVera, you're right! I switched to MODEL_URLS, and I'm using super-gradients 3.7.1. I also found we need to patch load_pretrained_weights itself, because it has hardcoded domain splitting that breaks after the URL swap. Should work fine now.
