Add Job workload support to CRUD benchmarking framework #1133
Draft
engineeredcurlz wants to merge 79 commits into main from
…to test-refactor
yaml.safe_load_all() enters an infinite loop when passed a MagicMock object because PyYAML detects the .read attribute and treats it as a file-like stream, then loops forever waiting to buffer enough bytes (len(MagicMock()) returns 0 by default).

Fix by setting create_template.return_value to a valid YAML string in the three create_deployment tests, so yaml.safe_load_all receives a real string and parses it via the non-blocking code path.

Affected tests:
- test_create_deployment_success
- test_create_deployment_failure
- test_create_deployment_partial_success
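A minimal sketch of the two mock properties behind the hang and of the shape of the fix, using only the standard library. The name create_template comes from the commit message; the YAML string and the "deployment.yml" argument are placeholders assumed for illustration, not the project's actual template or call.

```python
from unittest.mock import MagicMock

# A bare MagicMock auto-creates any attribute on access, so it appears to
# have a .read() method, and its default __len__ returns 0.
bare = MagicMock()
looks_like_stream = hasattr(bare, "read")
default_length = len(bare)

# Shape of the fix: the mocked create_template returns a real YAML string,
# so whatever consumes it receives a str instead of another MagicMock.
# (Placeholder manifest, assumed for illustration only.)
create_template = MagicMock()
create_template.return_value = "apiVersion: apps/v1\nkind: Deployment\n"

rendered = create_template("deployment.yml")
print(looks_like_stream, default_length, isinstance(rendered, str))
```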
begin_create_or_update() returns an LROPoller that was being discarded, allowing execution to continue while Azure still had an operation in progress. Subsequent scale/delete calls were then rejected with OperationNotAllowed. Fix by calling poller.result() in scale_node_pool and _progressive_scale to block until Azure fully completes each operation before proceeding.
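The blocking pattern can be sketched with stand-in classes so it runs without Azure credentials. FakePoller and FakeAgentPools are hypothetical stubs, and their simplified begin_create_or_update(pool_name, node_count) signature is an assumption made for brevity; the real Azure SDK call takes different arguments. Only the "keep the poller and call result()" step reflects the commit.

```python
class FakePoller:
    """Hypothetical stand-in for the LROPoller returned by the Azure SDK."""

    def __init__(self, log, operation):
        self._log = log
        self._operation = operation

    def result(self):
        # The real poller blocks here until the long-running operation ends.
        self._log.append(f"{self._operation}: finished")
        return self._operation


class FakeAgentPools:
    """Hypothetical stand-in for the agent pool operations client."""

    def __init__(self, log):
        self._log = log

    def begin_create_or_update(self, pool_name, node_count):
        operation = f"scale {pool_name} to {node_count}"
        self._log.append(f"{operation}: started")
        return FakePoller(self._log, operation)


def scale_node_pool(agent_pools, pool_name, node_count):
    # Keep the poller instead of discarding it, then block on result()
    # so the next scale or delete call cannot overlap this operation.
    poller = agent_pools.begin_create_or_update(pool_name, node_count)
    return poller.result()


log = []
pools = FakeAgentPools(log)
for count in (2, 4):
    scale_node_pool(pools, "gpupool", count)
print(log)
```

Each "started" entry is followed by its own "finished" entry before the next operation begins, which is the ordering the fix is meant to guarantee.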
nginx-container was hardcoded in the deployment template and in the create deployment method.
- add label_selector to parameters
- replace nginx-container in deployment.yaml with label_value
- derive label_value from selector
- pass label_selector directly
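One way to derive label_value from the selector, as a sketch only. The helper name label_value_from_selector is hypothetical, and it assumes the selector is a single key=value pair such as app=nginx-container; set-based or comma-separated selectors are not handled.

```python
def label_value_from_selector(label_selector: str) -> str:
    """Return the value part of a single 'key=value' label selector."""
    key, sep, value = label_selector.partition("=")
    key, value = key.strip(), value.strip()
    if not sep or not key or not value:
        raise ValueError(f"expected 'key=value', got {label_selector!r}")
    return value


print(label_value_from_selector("app=nginx-container"))
```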
Implements Job workload creation following the same loop pattern as create_deployment
add jobs subparser and elif routing branch that calls create_job
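A rough sketch of the subparser and routing branch. The four flag names and the handle_workload_operations and create_job names come from the PR summary; the argument types, defaults, the create_job keyword arguments, and the StubNodePoolCrud class are assumptions for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Flag names follow the PR summary; types and defaults are assumed.
    jobs = subparsers.add_parser("jobs", help="create Job workloads")
    jobs.add_argument("--node-pool-name", required=True)
    jobs.add_argument("--number-of-jobs", type=int, default=1)
    jobs.add_argument("--completions", type=int, default=1)
    jobs.add_argument("--manifest-dir", default=None)
    return parser


class StubNodePoolCrud:
    """Hypothetical stand-in that records the call instead of using a cluster."""

    def create_job(self, **kwargs):
        return kwargs


def handle_workload_operations(node_pool_crud, args):
    command = args.command
    if command == "deployment":
        raise NotImplementedError("omitted from this sketch")
    elif command == "jobs":
        return node_pool_crud.create_job(
            node_pool_name=args.node_pool_name,
            number_of_jobs=args.number_of_jobs,
            completions=args.completions,
            manifest_dir=args.manifest_dir,
        )
    raise ValueError(f"unknown command: {command}")


args = build_parser().parse_args(
    ["jobs", "--node-pool-name", "gpupool", "--number-of-jobs", "3", "--completions", "5"]
)
result = handle_workload_operations(StubNodePoolCrud(), args)
print(result)
```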
add jobs command to execute.yml with a script block that calls jobs in main.py
Summary

Implements the jobs workload type as the third and final planned workload method (deployment, statefulset, jobs) in the CRUD benchmarking framework.

Changes

- workload_templates/job.yml: New Kubernetes manifest template for Jobs using the batch/v1 API. No Service is required because Jobs are run-to-completion workloads. Uses restartPolicy: Never and a JOB_COMPLETIONS placeholder. No parallelism field is set, so Kubernetes applies its default of 1 and pods run one at a time.
- node_pool_crud.py: New create_job() method following the same loop pattern as create_deployment. Uses the complete condition instead of available/ready since Jobs terminate after completion. No wait_for_pods_ready call since pods exit after the job finishes.
- main.py: Added jobs subparser with --node-pool-name, --number-of-jobs, --completions, and --manifest-dir arguments. Added elif command == "jobs" routing in handle_workload_operations.
- steps/engine/crud/k8s/execute.yml: Added jobs script block that calls python3 main.py jobs with the appropriate CLI flags. Added number_of_jobs and completions parameters.
- steps/topology/k8s-crud-gpu/execute-crud.yml: Wires number_of_jobs and completions through to the engine step.

Tests (to be added)
Four unit tests to be added to test_azure_node_pool_crud.py:

- test_create_job_success
- test_create_job_failure
- test_create_job_no_client
- test_create_job_partial_success

Dependencies

This branch is based on test-refactor and depends on it being merged before this can merge to main. It is independent of dipowell/crud-statefulset.
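For reference, the Job template and placeholder substitution described under Changes might look roughly like the sketch below. Only batch/v1, restartPolicy: Never, the JOB_COMPLETIONS placeholder, and the absence of a parallelism field come from the summary; the JOB_NAME placeholder, the container name, image, and command, and the render_job helper are assumptions, and the real job.yml may differ.

```python
# Assumed template text; the actual workload_templates/job.yml is not shown in this PR page.
JOB_TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: JOB_NAME
spec:
  completions: JOB_COMPLETIONS
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox
          command: ["sh", "-c", "echo done"]
"""


def render_job(name: str, completions: int) -> str:
    """Fill the placeholders with plain string substitution."""
    if completions < 1:
        raise ValueError("completions must be at least 1")
    return (
        JOB_TEMPLATE
        .replace("JOB_NAME", name)
        .replace("JOB_COMPLETIONS", str(completions))
    )


# One manifest per job, mirroring a per-workload creation loop.
manifests = [render_job(f"crud-job-{i}", completions=3) for i in range(2)]
print(manifests[0])
```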