Add Job workload support to CRUD benchmarking framework by engineeredcurlz · Pull Request #1133 · Azure/telescope

engineeredcurlz · 2026-04-14T19:27:57Z

Summary

Implements the jobs workload, the third and final planned workload type
(deployment, statefulset, jobs), in the CRUD benchmarking framework.

Changes

workload_templates/job.yml: New Kubernetes manifest template for Jobs
using the batch/v1 API. No Service is required because Jobs are
run-to-completion workloads. Uses restartPolicy: Never and a
JOB_COMPLETIONS placeholder. No parallelism field is set, so Kubernetes
applies its default of 1 and runs pods one at a time until the
completion count is reached.

node_pool_crud.py: New create_job() method following the same loop
pattern as create_deployment. Waits on the complete condition instead of
available/ready because Jobs terminate after completion. No
wait_for_pods_ready call, since pods exit once the Job finishes.

main.py: Added a jobs subparser with --node-pool-name,
--number-of-jobs, --completions, and --manifest-dir arguments.
Added elif command == "jobs" routing in handle_workload_operations.

steps/engine/crud/k8s/execute.yml: Added a jobs script block that
calls python3 main.py jobs with the appropriate CLI flags. Added
number_of_jobs and completions parameters.

steps/topology/k8s-crud-gpu/execute-crud.yml: Wires number_of_jobs
and completions through to the engine step.
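
As a rough illustration of the main.py change, the jobs subparser and its routing could look like the sketch below. The four flag names come from the description above; the argument types, defaults, required settings, the build_parser helper, and the stub return strings are assumptions for illustration and are not taken from the PR. Only the jobs branch is shown.

```python
import argparse


def build_parser():
    """Build a parser with a jobs subcommand (hypothetical helper, not from the PR)."""
    parser = argparse.ArgumentParser(description="CRUD benchmark workloads (sketch)")
    subparsers = parser.add_subparsers(dest="command", required=True)

    jobs = subparsers.add_parser("jobs", help="Create Kubernetes Jobs")
    jobs.add_argument("--node-pool-name", required=True)
    # Types and defaults below are assumptions.
    jobs.add_argument("--number-of-jobs", type=int, default=1)
    jobs.add_argument("--completions", type=int, default=1)
    jobs.add_argument("--manifest-dir", default="workload_templates")
    return parser


def handle_workload_operations(args):
    """Route the parsed command; the real function would call NodePoolCRUD methods."""
    command = args.command
    if command == "jobs":
        # Stub: stands in for the call to create_job().
        return f"create_job x{args.number_of_jobs} (completions={args.completions})"
    return f"unknown command: {command}"


args = build_parser().parse_args(
    ["jobs", "--node-pool-name", "gpupool", "--number-of-jobs", "3", "--completions", "2"]
)
print(handle_workload_operations(args))
```

argparse converts the dashed flags to underscore attributes (args.number_of_jobs, args.node_pool_name), which is why the pipeline parameters can use underscores while the CLI uses hyphens.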

Tests (to be added)

4 unit tests to be added to test_azure_node_pool_crud.py:

test_create_job_success
test_create_job_failure
test_create_job_no_client
test_create_job_partial_success
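
The partial-success case can be sketched against a stand-in loop. The create_jobs function below is a hypothetical simplification of the create_job() loop pattern (apply each manifest, wait for the complete condition, count failures); its signature, job naming, and return value are assumptions, and the real test would presumably mock the Kubernetes client used by the CRUD class instead.

```python
from unittest.mock import MagicMock


def create_jobs(apply_manifest, wait_for_complete, number_of_jobs):
    """Hypothetical stand-in for the create_job() loop; returns (succeeded, failed)."""
    succeeded = failed = 0
    for index in range(1, number_of_jobs + 1):
        job_name = f"job-{index}"  # naming scheme is an assumption
        try:
            apply_manifest(job_name)
            # Jobs are checked against the complete condition, not available/ready.
            if wait_for_complete(job_name):
                succeeded += 1
            else:
                failed += 1
        except Exception:
            # One failing Job does not stop the remaining iterations.
            failed += 1
    return succeeded, failed


def test_create_job_partial_success():
    apply_manifest = MagicMock()
    # The second Job never reaches the complete condition.
    wait_for_complete = MagicMock(side_effect=[True, False, True])
    result = create_jobs(apply_manifest, wait_for_complete, number_of_jobs=3)
    assert result == (2, 1)
    assert apply_manifest.call_count == 3


test_create_job_partial_success()
```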

Dependencies

This branch is based on test-refactor and depends on it being merged
before this can merge to main. It is independent of
dipowell/crud-statefulset.

yaml.safe_load_all() enters an infinite loop when passed a MagicMock object because PyYAML detects the .read attribute and treats it as a file-like stream, then loops forever waiting to buffer enough bytes (len(MagicMock()) returns 0 by default).

Fix by setting create_template.return_value to a valid YAML string in the three create_deployment tests, so yaml.safe_load_all receives a real string and parses it via the non-blocking code path.

Affected tests:
- test_create_deployment_success
- test_create_deployment_failure
- test_create_deployment_partial_success
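
The hang described above can be reproduced without PyYAML. The sketch below uses buffer_until, a simplified stand-in for a stream reader's buffering loop (it is not PyYAML's actual code, and the max_reads cap is added only so the example terminates), to show why a MagicMock never satisfies either exit condition while a real stream does.

```python
import io
from unittest.mock import MagicMock


def buffer_until(stream, needed=2, max_reads=5):
    """Simplified stand-in for a reader's buffering loop (not PyYAML's actual code).

    Returns "done" when enough data is buffered or EOF is seen, and "stuck"
    when max_reads is hit, which an uncapped loop would never escape.
    """
    buffered = 0
    eof = False
    reads = 0
    while not eof and buffered < needed:
        if reads == max_reads:
            return "stuck"
        data = stream.read(4096)
        reads += 1
        buffered += len(data)  # len(MagicMock()) is 0, so this never grows
        if not data:           # a MagicMock is truthy, so EOF is never detected
            eof = True
    return "done"


mock_result = buffer_until(MagicMock())
real_result = buffer_until(io.StringIO("kind: Deployment\n"))
print(mock_result, real_result)
```

Returning a real YAML string from the mocked template call sidesteps the loop entirely, because a string is parsed directly instead of being read as a stream.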

begin_create_or_update() returns an LROPoller that was being discarded, allowing execution to continue while Azure still had an operation in-progress. Subsequent scale/delete calls were then rejected with OperationNotAllowed. Fix by calling poller.result() in scale_node_pool and _progressive_scale to block until Azure fully completes each operation before proceeding.
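
A minimal sketch of the blocking pattern, assuming the scale path goes through an agent_pools operations object with begin_create_or_update(). The scale_and_wait helper, its parameter names, and the resource names are hypothetical; a MagicMock stands in for the Azure SDK client so the example runs offline.

```python
from unittest.mock import MagicMock


def scale_and_wait(agent_pools, resource_group, cluster_name, pool_name, parameters, timeout=None):
    """Start an agent pool update and block until it finishes (hypothetical helper)."""
    poller = agent_pools.begin_create_or_update(
        resource_group, cluster_name, pool_name, parameters
    )
    # Without this call the poller is discarded while the operation is still running.
    return poller.result(timeout)


# Stand-in for the SDK's agent_pools operations object, so the sketch runs offline.
fake_agent_pools = MagicMock()
fake_agent_pools.begin_create_or_update.return_value.result.return_value = "Succeeded"

outcome = scale_and_wait(fake_agent_pools, "my-rg", "my-cluster", "gpupool", {"count": 3})
print(outcome)
```

Blocking on result() trades some wall-clock time for ordering: each scale or delete call only starts once the previous long-running operation has finished.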

nginx-container was hardcoded in the deployment template and in the create_deployment method.
- add label_selector to parameters
- replace nginx-container in deployment.yaml (label_value)
- derive label_value from selector
- pass label_selector directly
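
One way to derive label_value from the selector is sketched below. It assumes the selector is a single key=value pair such as app=nginx-container; the helper name and the error handling are hypothetical and not taken from the commit.

```python
def label_value_from_selector(label_selector):
    """Return the value part of a single key=value selector (hypothetical helper)."""
    key, sep, value = label_selector.partition("=")
    if not sep or not key or not value:
        raise ValueError(f"expected key=value selector, got {label_selector!r}")
    return value


print(label_value_from_selector("app=nginx-container"))
```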

engineeredcurlz and others added 30 commits September 26, 2025 17:16

initial update

984bfb2

Merge branch 'main' into test-refactor

aeab336

wip: add create_deployment function to crud

032f59b

add import for handle_workload_operation function

bb4bca8

add test for success

47caf3c

change operation name

c435e36

update operation name in test

11be2fc

add test for failure

5e73464

add exception test

8530049

Merge branch 'main' into test-refactor

b724860

Linting error: removed elif and else

4e7a73b

Merge branch 'test-refactor' of https://github.com/Azure/telescope in…

69ce1e1

…to test-refactor

fixed the spacing

7a8dad4

removed extra spaces

5e52b71

Add deployment_name for consistency and to reference later

364c264

verify deployment using wait condition

0bc8275

Add logging for manifest and to wait for deployment - debug

6604ac0

add logger for deployment success

3e6beea

verify pods are available in deployment

ae99b54

add failure count

6623cb2

add logger to verify deployment

8916a99

add unit test for create_deployment method

a4273ac

ran lint

bf6143a

Add test for deployment partial success

712939e

Add test for multiple deployments

b6f248c

Add test for progressive scaling failure

e0d8037

Add test in node_pool_crud for returns false early exit

e6feced

Add test in node_pool_crud for scale up fails but continues to scale …

e3d142d

…down

Add test for node_pool_crud for scale down fails operation continues

769573e

Add test in node_pool_crud for deployment partial success

ce8197e

engineeredcurlz added 30 commits March 17, 2026 15:58

add correct indentation

d9ac29f

iterate multi-doc YAML generator when applying deployment manifests

e6ccf1f

refactor: separate deploy workloads into its own pipeline step

eae0409

fix: execute k8s workload operations displayname

3a6a9b0

Merge branch 'main' into test-refactor

d0b6578

fix: replace hardcoded timeout with self.step_timeout in create_deplo…

6b7a448

…yment

refactor: convert f-string logger calls to %-style in create_deployment

8e3445c

feat: remove hardcoding add namespace parameter

7522f1a

fix: remove --deployment-name CLI

65fdd00

fix: use hyphen for --number-of-deployments

b0be1b1

fix: return error on unknown workload command

c5d01be

revert: restore original docstring line wrapping

946dea9

Merge branch 'main' into test-refactor

e349428

Merge branch 'main' into test-refactor

1ed3985

revert: testing complete, revert back to original pipeline

e3d7174

fix: use dashes to match argparse argument in main

ef0f393

fix: remove --deployment-name cli (main.py + execute.yml)

008c8f6

Merge branch 'main' into test-refactor

80dc153

feat: add job workload template

3d317a9

feat: add create_job method to NodePoolCRUD

08e87c3

Implements Job workload creation following the same loop pattern as create_deployment

feat: add jobs command to main.py

a3914b3

add elif routing branch to call create_job and subparser

feat: add jobs command and engine step

92e1783

add jobs command through execute.yml and script block that calls jobs in main.py

feat: add number_of_jobs and completions to k8s-crud-gpu topology

00ddcdb

test: add test_create_job_success unit test

1341b3f

test: add test_create_job_failure unit test

637e94b

test: add test_create_job_partial_success unit test

3d8c568

Test: Add pipeline test yaml for build

8f45b22

engineeredcurlz wants to merge 79 commits into main from dipowell/crud-jobs


2 participants
