Skip to content

Conversation

@rzabarazesh
Copy link
Collaborator

@rzabarazesh rzabarazesh commented Oct 17, 2025

This continues the work on the pipeline-generator. The new code will act as a full drop-in replacement for the jinja templates.
Currently the test-pipeline and fastcheck have been tested.

For more details read the README.MD files

This helps us in a few ways:
1-Python is easier to write and review
2-We added a lot of unit tests so changes in ci-infra are less likely to break the entire CI
3-Supports both “CI” and “Fastcheck” mode currently
4-The code was written with github actions in mind so it should be able to help with GHA migration


Testing

We have some unit tests that run as part of the CI. But more importantly we have two integration tests:

buildkite/pipeline_generator/tests/test_integration_comprehensive.py
buildkite/pipeline_generator/tests/test_integration_fastcheck.py

These tests will compare the generated yaml trees between jinja and pipeline-generator and ensure they are 100% equal under various scenarios.

@rzabarazesh rzabarazesh force-pushed the rzabarazesh/pipeline-generator branch 3 times, most recently from 425a818 to 8af04f0 Compare October 17, 2025 20:55
@rzabarazesh rzabarazesh force-pushed the rzabarazesh/pipeline-generator branch from 567217e to 5c7a71a Compare October 21, 2025 20:55
from ..pipeline_config import PipelineGeneratorConfig


def should_run_step(test_step: TestStep, config: PipelineGeneratorConfig) -> bool:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would love to see if we can separate test selection logic for "pipeline" and "step"

@khluu
Copy link
Collaborator

khluu commented Oct 28, 2025

I'm sorry but this PR is too big to review in 1 go. Can we walk through the design together & break this down to multiple PRs?

@rzabarazesh rzabarazesh force-pushed the rzabarazesh/pipeline-generator branch from 5c7a71a to ba64a8f Compare October 31, 2025 17:32
Signed-off-by: Reza Barazesh <[email protected]>

Simplify folder structure

Signed-off-by: Reza Barazesh <[email protected]>

Less magic strings

Signed-off-by: Reza Barazesh <[email protected]>

Update readme

Signed-off-by: Reza Barazesh <[email protected]>

Deduplicate CI and fastcheck

Update readme

Lint and formatting

Minimal version
@rzabarazesh rzabarazesh force-pushed the rzabarazesh/pipeline-generator branch from 8091c94 to e542cce Compare October 31, 2025 21:41
@rzabarazesh
Copy link
Collaborator Author

This PR was updated to significantly reduce the complexity and SLOC

@rzabarazesh rzabarazesh requested a review from khluu November 5, 2025 07:03
Copy link
Collaborator

@khluu khluu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I tried to review but this is still too long & way too verbose. Let me see what I can do to make a few jobs more declarative & come up with a design for this pipeline generator.
Also, we should have more unit tests, not just integration test, and we should not just compare the result to jinja. When this is put in use, jinja will be deprecated anyway so the logic won't be updated across 2 generators.

@property
def container_image_cu118(self):
"""Get the CUDA 11.8 container image."""
return f"public.ecr.aws/q9t5s3a7/vllm-ci-{self._get_repo_suffix()}-repo:$BUILDKITE_COMMIT-cu118"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should be able to reuse a constant var for public.ecr.aws/q9t5s3a7 here and have vllm-ci-test-repo or vllm-ci-postmerge.. to be a different constant, not by suffix

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason is we don't always rely on suffix

"""Get repository suffix based on branch (postmerge for main, test otherwise)."""
return "postmerge" if self.branch == "main" else "test"

@property
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like all of these property funcs can be merged in one func get_container_image(..) so that when you change the logic, it's easier when everything is in 1 place


# ECR and Images
VLLM_ECR_URL = "public.ecr.aws/q9t5s3a7"
VLLM_ECR_REPO = f"{VLLM_ECR_URL}/vllm-ci-test-repo"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can just be vllm-ci-test-repo

BUILD_KEY_AMD = "amd-build"
BUILD_KEY_TORCH_NIGHTLY = "image-build-torch-nightly"

# Agent Queues
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we convert this to an enum class?

# ==============================================================================


def generate_hardware_tests(config) -> List[Dict[str, Any]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole function is too verbose.. Step should be its own class/abstraction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants