Implement FiftyOne plugin for zero-shot coreset selection with z-score panel #1
Overview
This PR implements a complete FiftyOne plugin for zero-shot coreset selection, providing an interactive z-score panel for visualizing sample importance in unlabeled datasets. The plugin enables users to identify the most valuable samples for annotation by combining redundancy and coverage metrics computed in an embedding space.
Implementation
The plugin implements the zero-shot coreset selection algorithm from arXiv:2411.15349 and follows the same architectural pattern as the image-deduplication-plugin, but outputs z-scores instead of deduplication results.
Core Components
Operators:
ComputeZScores: Computes z-scores for all samples in a dataset based on their embeddings
zscore, zscore_redundancy, zscore_coverage
ZScoresPanel: Interactive panel displaying z-score statistics
Algorithm:
zscore = coverage_z - redundancy_z
Samples with high z-scores are ideal coreset candidates as they are both unique and representative.
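The combination step above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual code: it assumes the raw per-sample coverage and redundancy values have already been computed from the embeddings, and the helper names `standardize` and `combine_scores` are hypothetical. It uses only the Python standard library.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-score a list of floats; a constant list maps to all zeros."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def combine_scores(coverage, redundancy):
    """Return (zscore, redundancy_z, coverage_z) for per-sample raw scores.

    Assumption: `coverage` and `redundancy` are equal-length lists holding one
    raw value per sample; how they are derived from embeddings is out of scope.
    """
    if len(coverage) != len(redundancy):
        raise ValueError("coverage and redundancy must have the same length")
    coverage_z = standardize(coverage)
    redundancy_z = standardize(redundancy)
    # zscore = coverage_z - redundancy_z, as stated in the description
    zscore = [c - r for c, r in zip(coverage_z, redundancy_z)]
    return zscore, redundancy_z, coverage_z


# Toy data: sample 0 is representative and unique, sample 2 is the opposite.
coverage = [0.9, 0.5, 0.1]
redundancy = [0.1, 0.5, 0.9]
zscore, redundancy_z, coverage_z = combine_scores(coverage, redundancy)
best = max(range(len(zscore)), key=zscore.__getitem__)
```

Standardizing each metric before subtracting puts coverage and redundancy on the same scale, so neither dominates the combined score simply because of its raw range.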
Usage Example
Files Added
fiftyone.yml, __init__.py, coreset.py
README.md, QUICKSTART.md, ARCHITECTURE.md, CONTRIBUTING.md, SUMMARY.md
tests.py (all tests passing ✅), examples.py
requirements.txt, setup.py, LICENSE, .gitignore
Testing
Comprehensive test suite validates:
All tests pass successfully.
Documentation
The plugin includes extensive documentation:
Benefits
This plugin enables efficient coreset selection for:
Samples are selected to be both diverse (low redundancy) and representative (high coverage), so that a small selection still covers the dataset well.
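Once every sample has a combined z-score, picking a coreset of a given size amounts to taking the highest-scoring samples. The sketch below illustrates that step under stated assumptions: the scores are held in a plain dict keyed by sample ID, and `select_coreset`, `budget`, and the sample IDs are hypothetical names, not part of the plugin.

```python
import heapq


def select_coreset(zscores, budget):
    """Return the IDs of the `budget` highest-scoring samples, best first.

    Assumption: `zscores` maps a sample ID to its combined z-score.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # Iterating a dict yields its keys; rank them by their stored score.
    return heapq.nlargest(budget, zscores, key=zscores.get)


zscores = {"img_a": 2.4, "img_b": 0.0, "img_c": -2.4, "img_d": 1.1}
picked = select_coreset(zscores, budget=2)
```

If the budget exceeds the number of samples, `heapq.nlargest` simply returns all IDs in descending score order.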
Warning
Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
https://api.github.com/repos/jacobmarks/image-deduplication-plugin/git/trees/main
curl -s REDACTED (http block)
If you need me to access, download, or install something from one of these locations, you can either: