-
Notifications
You must be signed in to change notification settings - Fork 48
[RFC] Pipeline-generator #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
425a818 to
8af04f0
Compare
567217e to
5c7a71a
Compare
| from ..pipeline_config import PipelineGeneratorConfig | ||
|
|
||
|
|
||
| def should_run_step(test_step: TestStep, config: PipelineGeneratorConfig) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would love to see if we can separate test selection logic for "pipeline" and "step"
|
I'm sorry but this PR is too big to review in 1 go. Can we walk through the design together & break this down to multiple PRs? |
5c7a71a to
ba64a8f
Compare
Signed-off-by: Reza Barazesh <[email protected]> Simplify folder structure Signed-off-by: Reza Barazesh <[email protected]> Less magic strings Signed-off-by: Reza Barazesh <[email protected]> Update readme Signed-off-by: Reza Barazesh <[email protected]> Deduplicate CI and fastcheck Update readme Lint and formatting Minimal version
8091c94 to
e542cce
Compare
|
This PR was updated to significantly reduce the complexity and SLOC |
khluu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I tried to review but this is still too long & way too verbose. Let me see what I can do to make a few jobs more declarative & come up with a design for this pipeline generator.
Also, we should have more unit tests, not just integration test, and we should not just compare the result to jinja. When this is put in use, jinja will be deprecated anyway so the logic won't be updated across 2 generators.
| @property | ||
| def container_image_cu118(self): | ||
| """Get the CUDA 11.8 container image.""" | ||
| return f"public.ecr.aws/q9t5s3a7/vllm-ci-{self._get_repo_suffix()}-repo:$BUILDKITE_COMMIT-cu118" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should be able to reuse a constant var for public.ecr.aws/q9t5s3a7 here and have vllm-ci-test-repo or vllm-ci-postmerge.. to be a different constant, not by suffix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason is we don't always rely on suffix
| """Get repository suffix based on branch (postmerge for main, test otherwise).""" | ||
| return "postmerge" if self.branch == "main" else "test" | ||
|
|
||
| @property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like all of these property funcs can be merged in one func get_container_image(..) so that when you change the logic, it's easier when everything is in 1 place
|
|
||
| # ECR and Images | ||
| VLLM_ECR_URL = "public.ecr.aws/q9t5s3a7" | ||
| VLLM_ECR_REPO = f"{VLLM_ECR_URL}/vllm-ci-test-repo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can just be vllm-ci-test-repo
| BUILD_KEY_AMD = "amd-build" | ||
| BUILD_KEY_TORCH_NIGHTLY = "image-build-torch-nightly" | ||
|
|
||
| # Agent Queues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we convert this to an enum class?
| # ============================================================================== | ||
|
|
||
|
|
||
| def generate_hardware_tests(config) -> List[Dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole function is too verbose.. Step should be its own class/abstraction.
This continues the work on the pipeline-generator. The new code will act as a full drop-in replacement for the jinja templates.
Currently the test-pipeline and fastcheck have been tested.
For more details read the README.MD files
This helps us in a few ways:
1-Python is easier to write and review
2-We added a lot of unit tests so changes in ci-infra are less likely to break the entire CI
3-Supports both “CI” and “Fastcheck” mode currently
4-The code was written with github actions in mind so it should be able to help with GHA migration
Testing
We have some unit tests that run as part of the CI. But more importantly we have two integration tests:
buildkite/pipeline_generator/tests/test_integration_comprehensive.py
buildkite/pipeline_generator/tests/test_integration_fastcheck.py
These tests will compare the generated yaml trees between jinja and pipeline-generator and ensure they are 100% equal under various scenarios.