fix(core): enable fault-tolerant execution and incremental result persistence #424
Rishabh-git10 wants to merge 2 commits into
…sistence Signed-off-by: Rishabh Dewangan <107680241+Rishabh-git10@users.noreply.github.com>
Welcome @Rishabh-git10! It looks like this is your first PR to kubeedge/ianvs 🎉
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: Rishabh-git10
The full list of commands accepted by this bot can be found here.
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Code Review
This pull request introduces incremental saving of test results and enhances fault tolerance by allowing the benchmark to continue after individual test case failures. Feedback highlights a significant performance bottleneck and potential data loss issue caused by frequent I/O operations in the saving logic. Additionally, it is recommended to wrap the incremental save callback in a try-except block to ensure that persistence failures do not interrupt the overall execution.
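The try-except recommendation above can be illustrated with a small wrapper. This is a minimal sketch, not the code in this PR: `guarded` is a hypothetical helper name, and the wrapped callback stands in for whatever save function the benchmark actually passes around.

```python
import logging

logger = logging.getLogger(__name__)


def guarded(callback):
    """Wrap a save callback so its exceptions are logged, not propagated.

    Hypothetical helper for illustration; returns True when the wrapped
    callback succeeded and False when it raised.
    """
    def wrapper(*args, **kwargs):
        try:
            callback(*args, **kwargs)
            return True
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("incremental save failed; continuing execution")
            return False
    return wrapper
```

With a wrapper like this, a disk or permission error during persistence is recorded in the log while the remaining test cases still run.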
…hed save with fault-tolerant execution Signed-off-by: Rishabh Dewangan <107680241+Rishabh-git10@users.noreply.github.com>
What type of PR is this?
/kind bug
What this PR does / why we need it:
Enables fault-tolerant execution and incremental data persistence to prevent data loss during multi-configuration benchmarks.
Previously, a downstream configuration failure raised a fatal `RuntimeError`, terminating the execution loop and discarding all prior successful results held in memory.

Changes:
- Updated `run_testcases` in `testcasecontroller.py` to catch, log, and isolate individual test exceptions, allowing the loop to continue safely.
- Added an `incremental_save_cb` callback in `benchmarkingjob.py`, connected to the `Rank` module's concatenation logic. Successful results are now written to disk immediately after each test completes.

Which issue(s) this PR fixes:
Fixes #423
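
The two changes described in this PR can be sketched in isolation as follows. This is an illustrative sketch under stated assumptions, not the actual ianvs implementation: `run_all`, `append_result`, the `(name, callable)` test case shape, and the CSV layout are all hypothetical, and the real code routes saving through the `Rank` module instead.

```python
import csv
import logging
import os

logger = logging.getLogger(__name__)


def append_result(path, row):
    """Append one result row to a CSV file, writing the header on first use."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)


def run_all(testcases, save_cb):
    """Run each (name, callable) pair; one failure does not stop the rest.

    save_cb(name, result) is called right after each successful run, so
    earlier results are already on disk if a later test case fails.
    """
    succeeded, failed = [], []
    for name, run in testcases:
        try:
            result = run()
            save_cb(name, result)
        except Exception:
            # Log the traceback and move on to the next test case.
            logger.exception("test case %s failed; continuing", name)
            failed.append(name)
            continue
        succeeded.append(name)
    return succeeded, failed
```

In this sketch a persistence error is treated like a test failure for that one case. Appending a row per result keeps each write small, which is one way to limit the I/O cost of saving after every test.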