fix(core): enable fault-tolerant execution and incremental result persistence#424

Open
Rishabh-git10 wants to merge 2 commits into
kubeedge:mainfrom
Rishabh-git10:fix/intermediate-result-persistence

@Rishabh-git10

What type of PR is this?
/kind bug

What this PR does / why we need it:
Enables fault-tolerant execution and incremental data persistence to prevent data loss during multi-configuration benchmarks.

Previously, a failure in a downstream configuration raised a fatal RuntimeError that terminated the execution loop, so every successful result collected up to that point was lost from memory without being written to disk.

Changes:

  1. Exception Isolation: Modified run_testcases in testcasecontroller.py to catch, log, and isolate individual test exceptions, allowing the loop to continue safely.
  2. Incremental Persistence: Introduced an incremental_save_cb callback in benchmarkingjob.py connected to the Rank module's concatenation logic. Successful results are now written to disk immediately after each test completes.
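
The two changes above can be pictured with a minimal sketch. This is not the code from this PR: `run_testcases` and `incremental_save_cb` are named in the description, but the signatures, the `run_one` helper, and the CSV-appending `make_csv_saver` are assumptions made purely for illustration. The actual change routes saving through the Rank module.

```python
import csv
import logging
from pathlib import Path

LOGGER = logging.getLogger(__name__)


def run_testcases(testcases, run_one, incremental_save_cb=None):
    """Run every test case, isolating failures from one another.

    Hypothetical signature: `run_one` executes a single test case and
    returns a result dict; `incremental_save_cb` persists one result.
    """
    succeeded, failed = [], []
    for name in testcases:
        try:
            result = run_one(name)
        except Exception as err:
            # Isolate the failure: log it, record it, keep looping.
            LOGGER.error("testcase %s failed: %s", name, err)
            failed.append((name, err))
            continue
        succeeded.append(result)
        if incremental_save_cb is not None:
            # Persist right away so a later crash cannot lose this result.
            incremental_save_cb(result)
    return succeeded, failed


def make_csv_saver(path):
    """Return a callback that appends one result row to a CSV file."""
    path = Path(path)

    def save(result):
        write_header = not path.exists()
        with path.open("a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=sorted(result))
            if write_header:
                writer.writeheader()
            writer.writerow(result)

    return save
```

In this sketch a failing test case ends up in `failed` while the results before and after it are still returned and already on disk.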

Which issue(s) this PR fixes:
Fixes #423

…sistence

Signed-off-by: Rishabh Dewangan <107680241+Rishabh-git10@users.noreply.github.com>
@kubeedge-bot kubeedge-bot added the kind/bug Categorizes issue or PR as related to a bug. label May 10, 2026
@kubeedge-bot kubeedge-bot requested review from MooreZheng and hsj576 May 10, 2026 18:24
@kubeedge-bot
Collaborator

Welcome @Rishabh-git10! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Rishabh-git10
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 10, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces incremental saving of test results and enhances fault tolerance by allowing the benchmark to continue after individual test case failures. Feedback highlights a significant performance bottleneck and potential data loss issue caused by frequent I/O operations in the saving logic. Additionally, it is recommended to wrap the incremental save callback in a try-except block to ensure that persistence failures do not interrupt the overall execution.
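
As a rough illustration of the two review points, the sketch below wraps a save callback so that persistence errors are logged instead of raised, and buffers results to reduce how often the disk is touched. The names `guard_save` and `BufferedSaver` and the `batch_size` parameter are hypothetical, and the review does not prescribe buffering specifically; it is shown here only as one common way to cut write frequency.

```python
import logging

LOGGER = logging.getLogger(__name__)


def guard_save(save_cb):
    """Wrap a save callback so its exceptions are logged, not raised."""

    def guarded(result):
        try:
            save_cb(result)
            return True
        except Exception as err:
            # A failed save must not abort the remaining test cases.
            LOGGER.warning("incremental save failed: %s", err)
            return False

    return guarded


class BufferedSaver:
    """Collect results and hand them to `write_batch` in groups.

    `write_batch` is a hypothetical callable that persists a list of
    results in one I/O operation.
    """

    def __init__(self, write_batch, batch_size=5):
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._pending = []

    def __call__(self, result):
        self._pending.append(result)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write any buffered results; keep them if the write raises."""
        if self._pending:
            self._write_batch(list(self._pending))
            self._pending.clear()
```

Buffering trades durability for speed: anything still in the buffer is lost if the process dies before `flush()` runs, so the batch size should stay small and `flush()` should be called once the loop finishes.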

Comment thread core/cmd/obj/benchmarkingjob.py
Comment thread core/testcasecontroller/testcasecontroller.py Outdated
…hed save with fault-tolerant execution

Signed-off-by: Rishabh Dewangan <107680241+Rishabh-git10@users.noreply.github.com>

Labels

kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[BUG] Framework Crash on Downstream Configuration Fails to Persist Prior Successful Results

2 participants