Carlota/baseline pipeline #659

carlotaarvela · 2025-05-19T16:08:27Z

We want a pipeline that schedules test runs and records benchmarking metrics.

Add cilium-hyperion-project topology
Update the slo framework to accept a NO_OF_NAMESPACES param so it is customizable for test runs, and pass NO_OF_NAMESPACES and SMALL_GROUP_SIZE (number of replicas per deployment) to the execute step
Add cilium-project-hyperion-baseline.yml pipeline and load-config
Pipeline

modules/python/clusterloader2/slo/config/load-config-hyperion.yaml

carlotaarvela · 2025-05-20T15:31:05Z

steps/topology/cilium-hyperion-project/collect-clusterloader2.yml

+    run_id=$(Build.BuildId)-$(System.JobId)
+    echo "Run ID: $run_id"
+    echo "##vso[task.setvariable variable=RUN_ID]$run_id"
+  displayName: "Set unique Run ID before publish"


@sumanthreddy29 This pipeline is to be run on a warm cluster. Removing this step will cause an error, because the blob would be created with a non-unique name.
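
For readers following the thread, here is a minimal Python sketch of what the step above computes. It assumes the agent exposes the predefined variables `Build.BuildId` and `System.JobId` as the environment variables `BUILD_BUILDID` and `SYSTEM_JOBID`; the helper names `make_run_id` and `set_variable_command` are hypothetical and not part of this PR.

```python
import os


def make_run_id(env=os.environ):
    """Combine build ID and job ID so reruns on the same warm cluster get distinct names."""
    build_id = env["BUILD_BUILDID"]  # assumed env form of $(Build.BuildId)
    job_id = env["SYSTEM_JOBID"]     # assumed env form of $(System.JobId)
    return f"{build_id}-{job_id}"


def set_variable_command(name, value):
    """Format the logging command that the step echoes to set a pipeline variable."""
    return f"##vso[task.setvariable variable={name}]{value}"


if __name__ == "__main__":
    # Illustrative values only; a real agent supplies these.
    fake_env = {"BUILD_BUILDID": "1234", "SYSTEM_JOBID": "abcd"}
    run_id = make_run_id(fake_env)
    print(set_variable_command("RUN_ID", run_id))
```

Because the job ID differs between jobs of the same build, two publishes from one build should not collide on the blob name.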

agrawaliti · 2025-05-29T16:55:40Z

modules/python/clusterloader2/hyperion/config/load-config.yaml

@@ -0,0 +1,111 @@
+name: load-config


Can we rename it to something else? We already have a load-config file in slo.
In the pipeline we pass " cl2_config_file: .yaml", so this could cause issues.

All other frameworks have a load-config file, so I think we should follow the convention

agrawaliti · 2025-05-29T16:56:49Z

modules/python/clusterloader2/hyperion/config/load-config.yaml

+
+# Config options for test parameters
+{{$nodesPerNamespace := DefaultParam .CL2_NODES_PER_NAMESPACE 100}}
+{{$podsPerNode := DefaultParam .CL2_PODS_PER_NODE 50}}


Can we please change it to 40? I know you are passing it from the pipeline too, but all our testing is with 40 pods per node, so keeping that default in your configs makes more sense.

I have removed this param
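
As context for the default-value discussion, this is a small Python sketch of how I understand `DefaultParam` to behave in the config templates: the pipeline-supplied value wins, and the literal in the template is used only when nothing is passed. The function `default_param` and the `overrides` dict are hypothetical stand-ins, not clusterloader2 code.

```python
def default_param(overrides, name, default):
    """Return the override for `name` if one was supplied, otherwise the default."""
    value = overrides.get(name)
    return default if value is None else value


if __name__ == "__main__":
    # No override passed: the template default applies.
    print(default_param({}, "CL2_PODS_PER_NODE", 40))
    # Override passed from the pipeline: the default is ignored.
    print(default_param({"CL2_PODS_PER_NODE": 50}, "CL2_PODS_PER_NODE", 40))
```

Under that reading, the default only matters for runs that do not set the parameter, which is why aligning it with the usual 40 pods per node was suggested.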

alexcastilio · 2025-05-30T09:45:04Z

modules/python/clusterloader2/hyperion/config/modules/measurements.yaml

+# Feature gates
+{{$podStartupLatencyThreshold := DefaultParam .CL2_POD_STARTUP_LATENCY_THRESHOLD "15s"}}
+{{$ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_API_CALL_PROMETHEUS_SIMPLE true}}
+{{$PROMETHEUS_SCRAPE_KUBE_PROXY := DefaultParam .PROMETHEUS_SCRAPE_KUBE_PROXY false}}


Is this parameter used in any test case? If not, please delete it and the corresponding if/else block below that relies on it.

alexcastilio · 2025-05-30T09:45:59Z

modules/python/clusterloader2/hyperion/config/modules/measurements.yaml

+{{$ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES := DefaultParam .CL2_ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES false}}
+{{$NETWORK_LATENCY_THRESHOLD := DefaultParam .CL2_NETWORK_LATENCY_THRESHOLD "0s"}}
+{{$PROBE_MEASUREMENTS_PING_SLEEP_DURATION := DefaultParam .CL2_PROBE_MEASUREMENTS_PING_SLEEP_DURATION "1s"}}
+{{$ENABLE_IN_CLUSTER_NETWORK_LATENCY := DefaultParam .CL2_ENABLE_IN_CLUSTER_NETWORK_LATENCY true}}


Same for this parameter. If it's not used, delete it.

alexcastilio · 2025-05-30T09:48:05Z

modules/python/clusterloader2/hyperion/config/modules/measurements.yaml

+
+# Probe measurements shared parameter
+{{$PROBE_MEASUREMENTS_CHECK_PROBES_READY_TIMEOUT := DefaultParam .CL2_PROBE_MEASUREMENTS_CHECK_PROBES_READY_TIMEOUT "15m"}}
+{{$ENABLE_TERMINATED_WATCHES_MEASUREMENT := DefaultParam .CL2_ENABLE_TERMINATED_WATCHES_MEASUREMENT true}}


This parameter is true by default and is not set anywhere in the pipeline. Can it be deleted and the corresponding measurement be added to the file without the if/else block?

pavneeta · 2025-06-09T17:33:45Z

@carlotaarvela is this PR ready for review ? CC: @alexcastilio @agrawaliti @sumanthreddy29

carlotaarvela · 2025-06-10T14:59:32Z

@pavneeta Yes, this PR is ready for review
I have discussed the changes requested by Iti, and Alex will do a final review today or tomorrow.

alexcastilio · 2025-06-11T08:14:13Z

modules/python/clusterloader2/hyperion/config/deployment_template.yaml

+        - name: ENV_VAR
+          value: a


Is this being used?

alexcastilio · 2025-06-11T08:24:53Z

modules/python/clusterloader2/hyperion/config/modules/cilium-measurements.yaml

+        unit: s
+        queries:
+        - name: Perc99
+          query: histogram_quantile(0.99, sum(rate(cilium_service_implementation_delay_bucket[%v:])) by (le))


Have you checked whether sum(rate(cilium_service_implementation_delay_bucket[%v:])) by (le) is really what you want? Perhaps you want sum_over_time(rate(<metric>)[<some interval to get rate>])[%v:]), similar to the CPU and memory metrics?

I agree that sum_over_time would be useful for identifying trends. However, the metrics you pointed out were added by the Cilium team 3+ months ago to benchmark Cilium performance on the slo framework; I just replicated them here to separate the logic for these tests.
Since we have been comparing the metrics collected in this test with the Cilium metrics framework, I think we should collect the metrics the same way, so we are comparing apples with apples.
Besides that, we want to establish SLOs for these metrics.
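
To make the quantity under discussion concrete, here is a rough Python sketch of how a quantile is estimated from cumulative histogram buckets, which is the role `histogram_quantile` plays in the queries above. It assumes the input is already the per-`le` sum of bucket rates over the test window; the bucket values are invented for illustration, and the function is a simplified approximation (linear interpolation inside the matching bucket), not Prometheus's implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from (upper_bound, cumulative_count) pairs.

    The bucket list must include the +Inf bucket and have a non-zero total.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    lower = 0.0 if index == 0 else buckets[index - 1][0]
    previous = 0.0 if index == 0 else buckets[index - 1][1]
    # Interpolate linearly within the bucket.
    return lower + (upper - lower) * (rank - previous) / (count - previous)


if __name__ == "__main__":
    # Invented delay buckets in seconds: (le, cumulative rate).
    example = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
    for q in (0.50, 0.95, 0.99):
        print(q, histogram_quantile(q, example))
```

The result is a single percentile over the whole window, which fits the goal of setting SLO thresholds; a sum_over_time style query would instead aggregate a series of rate samples.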

alexcastilio · 2025-06-11T08:25:30Z

modules/python/clusterloader2/hyperion/config/modules/cilium-measurements.yaml

+          query: histogram_quantile(0.99, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))
+        - name: Perc95
+          query: histogram_quantile(0.95, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))
+        - name: Perc50
+          query: histogram_quantile(0.50, sum(rate(cilium_policy_implementation_delay_bucket[%v:])) by (le))


Same question here and for other metrics

alexcastilio · 2025-06-11T08:32:07Z

modules/python/clusterloader2/hyperion/config/modules/measurements.yaml

+          - resource
+          queries:
+          - name: Terminated watches
+            query: sum(increase(apiserver_terminated_watchers_total[%v:])) by (resource)


Same question about metrics here.

carlotaarvela marked this pull request as ready for review May 19, 2025 16:20

carlotaarvela requested review from alyssa1303, anson627, rafael-mendes-pereira and sumanthreddy29 as code owners May 19, 2025 16:20

carlotaarvela requested a review from agrawaliti May 19, 2025 16:20

alexcastilio reviewed May 20, 2025

carlotaarvela commented May 20, 2025

carlotaarvela requested a review from alexcastilio May 29, 2025 16:40

agrawaliti reviewed May 29, 2025

alexcastilio reviewed May 30, 2025

Add hyperion benchmark tests

c1589dc

carlotaarvela force-pushed the carlota/baseline-pipeline branch from dae5f03 to c1589dc Compare June 10, 2025 11:47

carlotaarvela added 2 commits June 10, 2025 12:51

Merge remote-tracking branch 'origin/main' into carlota/baseline-pipeline

cc4cb4e

cleanup

6ba300b

alexcastilio reviewed Jun 11, 2025

carlotaarvela and others added 2 commits June 11, 2025 18:40

Remove unnecessary environment variable definition from deployment template

26f3942

Merge branch 'main' into carlota/baseline-pipeline

7b63973

carlotaarvela requested a review from alexcastilio June 13, 2025 14:46

5 participants