
feat: h2 pod scheduling rate benchmark #1131

Open
vittoriasalim wants to merge 33 commits into v2 from vitto/v2

Conversation

@vittoriasalim (Contributor)

As per the title: adds a pod scheduling rate benchmark.

@vittoriasalim vittoriasalim marked this pull request as ready for review April 17, 2026 05:04
}""")

createClusterScript = """
az cloud update --endpoint-resource-manager https://eastus2euap.management.azure.com/
Contributor

I don't think we need to set the ARM endpoint anymore.

SUBSCRIPTION_ID,
CL2_POOL,
"Standard_D8S_v4",
4,
Contributor

I don't think you need 4 VMs for CL2?

Comment on lines +101 to +102
- Identifier: HighThroughputPodStartupLatency
Method: PodStartupLatency
Contributor

Since we use kwok nodes, do we still want to measure startup latency?

backoffLimit: 0 # Don't retry failed CL2 runs.
template:
spec:
initContainers:
Collaborator

Is it for debugging?

@@ -0,0 +1,51 @@
apiVersion: v1
Collaborator

why not merge cl2-config.yaml into cl2.yaml?
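If they were merged, Kubernetes tooling accepts multiple documents in one file separated by `---`, so the ConfigMap and the Job could live in a single cl2.yaml. A sketch with placeholder names (the real metadata and image come from this PR, not from here):

```yaml
# Hypothetical merged cl2.yaml: ConfigMap and Job as two YAML documents
# in one file, applied together with: kubectl apply -f cl2.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cl2-config            # placeholder name
  namespace: clusterloader2
data:
  config.yaml: |
    # CL2 test config goes here
---
apiVersion: batch/v1
kind: Job
metadata:
  name: cl2
  namespace: clusterloader2
spec:
  completions: 1
  backoffLimit: 0             # don't retry failed CL2 runs
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cl2
          image: cl2:latest   # placeholder image
```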

volumeMounts:
- mountPath: /override
name: cl2-override
- mountPath: /root/perf-tests/clusterloader2/testing/load/cl2-config.yaml
Collaborator

Could you avoid using the same file name for /root/perf-tests/clusterloader2/testing/load/cl2-config.yaml and kcl/ccp_team/hyperscale_pod_scheduling/cl2-config.yaml? It's easy to get confused.

operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 900
- key: "kwok.x-k8s.io/node"
Contributor

I don't see a taint kwok.x-k8s.io/node, can this be removed?
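For context, kwok example setups sometimes create fake nodes that carry a taint like the one below; whether the toleration is dead config depends on how this PR's node template is written. A sketch of such a node fragment, not this PR's actual template:

```yaml
# Hypothetical kwok fake-node fragment. If the benchmark's node template
# omits this taint, the corresponding toleration can be removed.
spec:
  taints:
    - key: kwok.x-k8s.io/node
      value: fake
      effect: NoSchedule
```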

"node-lease-duration-seconds": "100"
"pod-play-stage-parallelism": "500"
}),
azure.AzCli(
Contributor

I don't think you need AzCli if your script only uses kubectl; a Bash step should do it. AzCli involves getting credentials and doing az login, which isn't necessary since the pipeline has already pulled the cluster config. The same applies to some other steps later in the pipeline.

name: cl2-simple-deployment
namespace: clusterloader2
data:
simple-deployment.yaml: |
Collaborator

You don't need another simple-deployment.yaml; CL2 already has one: https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/load/modules/scheduler-throughput/simple-deployment.yaml

You can use the RUN_ON_ARM_NODES override to allow it to run on kwok nodes. See:

value: arm64 # This is a hack to allow Cl2 pods to run on Kwok nodes.
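For reference, CL2 overrides are typically supplied as a YAML file of key/value pairs passed to clusterloader2 via its --testoverrides flag. Assuming the upstream simple-deployment module honors RUN_ON_ARM_NODES, the override file would look roughly like:

```yaml
# Hypothetical overrides.yaml, passed as:
#   clusterloader2 ... --testoverrides=overrides.yaml
RUN_ON_ARM_NODES: true
```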

"SkipLinuxAzSecPack": "true"
},
"properties": {
"controlPlaneScalingProfile": {"scalingSize": "H4"},
Contributor

It is hardcoded as H4, while the PR title says H2.

NODE_COUNT = 2000
KWOK_NODES_PER_CONTROLLER = 100
KWOK_POOL = "kwokpool"
KWOK_POOL_VM_SIZE = "Standard_D8_v3"
Contributor

Do we need 8 cores? Would a Standard_D4_v3 do well for us?

name: cl2
namespace: clusterloader2
spec:
completions: 1
Collaborator

why only one?
