Carlota/baseline pipeline #659
base: main
Conversation
modules/python/clusterloader2/slo/config/load-config-hyperion.yaml
run_id=$(Build.BuildId)-$(System.JobId)
echo "Run ID: $run_id"
echo "##vso[task.setvariable variable=RUN_ID]$run_id"
displayName: "Set unique Run ID before publish"
@sumanthreddy29 This pipeline is to be run on a warm cluster. Removing this step would cause an error by creating a blob with a non-unique name.
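For reference, a minimal sketch of the full step being discussed, as it might appear in the Azure DevOps pipeline YAML. The `bash` step type is an assumption; only the script lines and `displayName` come from the diff:

```yaml
# Sketch of the Run ID step (Azure DevOps YAML). $(Build.BuildId) and
# $(System.JobId) are predefined pipeline variables; combining them gives
# a per-job-unique ID so blob names stay unique on a warm cluster.
# The "bash" step type is an assumption, not taken from the diff.
- bash: |
    run_id=$(Build.BuildId)-$(System.JobId)
    echo "Run ID: $run_id"
    echo "##vso[task.setvariable variable=RUN_ID]$run_id"
  displayName: "Set unique Run ID before publish"
```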
@@ -0,0 +1,111 @@
name: load-config
Can we rename it to something else? We already have a load-config file in slo.
In the pipeline we pass "cl2_config_file: .yaml", so this could cause issues.
All other frameworks have a load-config file, so I think we should follow the convention
# Config options for test parameters
{{$nodesPerNamespace := DefaultParam .CL2_NODES_PER_NAMESPACE 100}}
{{$podsPerNode := DefaultParam .CL2_PODS_PER_NODE 50}}
Can we please change it to 40? I know you are passing it from the pipeline too, but all our testing is with 40 pods per node, so keeping that default for your configs makes more sense.
I have removed this param
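For context on the exchange above: `DefaultParam` in clusterloader2 templates uses the second argument only when the corresponding variable is not supplied, so the pipeline value and the template default can diverge. A hedged sketch of an overrides file (clusterloader2's `--testoverrides` mechanism; the values shown are illustrative):

```yaml
# Hypothetical overrides.yaml passed via --testoverrides=overrides.yaml.
# If CL2_PODS_PER_NODE is omitted here, the template falls back to its
# DefaultParam value (50 in the diff; 40 was the reviewer's suggestion).
CL2_NODES_PER_NAMESPACE: 100
CL2_PODS_PER_NODE: 40
```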
# Feature gates
{{$podStartupLatencyThreshold := DefaultParam .CL2_POD_STARTUP_LATENCY_THRESHOLD "15s"}}
{{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE true}}
{{$PROMETHEUS_SCRAPE_KUBE_PROXY := DefaultParam .PROMETHEUS_SCRAPE_KUBE_PROXY false}}
Is this parameter used in any test case? If not, please delete it and the corresponding if/else block below that relies on it.
Done
{{$ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES false}}
{{$NETWORK_LATENCY_THRESHOLD := DefaultParam .CL2_NETWORK_LATENCY_THRESHOLD "0s"}}
{{$PROBE_MEASUREMENTS_PING_SLEEP_DURATION := DefaultParam .CL2_PROBE_MEASUREMENTS_PING_SLEEP_DURATION "1s"}}
{{$ENABLE_IN_CLUSTER_NETWORK_LATENCY := DefaultParam .CL2_ENABLE_IN_CLUSTER_NETWORK_LATENCY true}}
Same for this parameter. If it's not used, delete it.
Done
# Probe measurements shared parameter
{{$PROBE_MEASUREMENTS_CHECK_PROBES_READY_TIMEOUT := DefaultParam .CL2_PROBE_MEASUREMENTS_CHECK_PROBES_READY_TIMEOUT "15m"}}
{{$ENABLE_TERMINATED_WATCHES_MEASUREMENT := DefaultParam .CL2_ENABLE_TERMINATED_WATCHES_MEASUREMENT true}}
This parameter is true by default and is not set anywhere in the pipeline. Can it be deleted and the corresponding measurement added to the file without an if/else block?
Removed
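If the guard is dropped as suggested, the measurement would appear unconditionally in the test file. A hedged sketch following clusterloader2's GenericPrometheusQuery measurement shape (identifier, unit, and field structure are assumptions; only the query comes from this PR's diff):

```yaml
# Hedged sketch: terminated-watches measurement without the
# {{if $ENABLE_TERMINATED_WATCHES_MEASUREMENT}} guard. Field values
# other than the query are assumptions, not the exact PR content.
- Identifier: TerminatedWatches
  Method: GenericPrometheusQuery
  Params:
    action: start
    metricName: Terminated watches
    unit: count
    queries:
    - name: Terminated watches
      query: sum(increase(apiserver_terminated_watchers_total[%v:])) by (resource)
```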
@carlotaarvela is this PR ready for review? CC: @alexcastilio @agrawaliti @sumanthreddy29
dae5f03 to c1589dc
@pavneeta Yes, this PR is ready for review
- name: ENV_VAR
  value: a
Is this being used?
Removed
unit: s
queries:
- name: Perc99
  query: histogram_quantile(0.99, sum(rate(cilium_service_implementation_delay_bucket[%v:])) by (le))
Have you checked whether sum(rate(cilium_service_implementation_delay_bucket[%v:])) by (le) is really what you want? Perhaps you want sum_over_time(rate(<metric>)[<some interval to get rate>])[%v:]), similar to the CPU and Memory metrics?
I do agree that sum_over_time would be useful to identify trends; however, the metrics you pointed out were added by the Cilium team 3+ months ago to benchmark Cilium performance on the slo framework, and I just replicated them here to separate the logic for these tests.
Since we have been comparing the metrics collected in this test with the cilium metrics framework, I think we should collect metrics the same way, so we are comparing apples with apples.
Besides that, we want to establish SLOs for these metrics.
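To make the two query shapes in this exchange concrete (both drawn from the thread; `%v:` is the measurement-window placeholder substituted by the framework, and the rewritten form is one plausible reading of the reviewer's sketch, not an exact quote):

```yaml
# Shape used in this PR (and in the Cilium team's metrics framework):
# a single p99 computed from bucket rates over the whole window.
asWritten: histogram_quantile(0.99, sum(rate(cilium_service_implementation_delay_bucket[%v:])) by (le))

# Shape suggested in the review, analogous to the CPU/Memory metrics:
# a rate subquery aggregated over the window with sum_over_time.
# The inner 1m interval is an illustrative placeholder.
suggested: sum_over_time(rate(cilium_service_implementation_delay_bucket[1m])[%v:])
```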
  query: histogram_quantile(0.99, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))
- name: Perc95
  query: histogram_quantile(0.95, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))
- name: Perc50
  query: histogram_quantile(0.50, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))
Same question here and for other metrics
- resource
queries:
- name: Terminated watches
  query: sum(increase(apiserver_terminated_watchers_total[%v:])) by (resource)
Same question about metrics here.
We want a pipeline that schedules test runs and record benchmarking metrics.
Added cilium-hyperion-project topology
Updated the slo framework to allow a NO_OF_NAMESPACES param so it is customizable for test runs, and added NO_OF_NAMESPACES and SMALL_GROUP_SIZE (number of replicas per deployment) to the execute step
Add cilium-project-hyperion-baseline.yml pipeline and load-config
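A hedged sketch of how the parameters from the description above might be forwarded on the execute step (step template name, parameter layout, and values are assumptions; only NO_OF_NAMESPACES and SMALL_GROUP_SIZE come from the PR description):

```yaml
# Hypothetical excerpt of the pipeline's execute step, forwarding
# the two new knobs. Names marked below are illustrative.
- template: execute.yml            # template name is an assumption
  parameters:
    cl2_config_file: load-config.yaml
    NO_OF_NAMESPACES: 10           # illustrative: namespaces per test run
    SMALL_GROUP_SIZE: 5            # illustrative: replicas per deployment
```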