fix: resolve stale label corruption in _get_train_dataset#418

Open
dev-aditya-hub wants to merge 3 commits into
kubeedge:mainfrom
dev-aditya-hub:fix/incremental-learning-stale-label

Conversation

@dev-aditya-hub

Summary

_get_train_dataset in IncrementalLearning had three issues that this PR addresses:

  • Unbound variable on first miss: label was only assigned inside if len(index[0]) == 1, so if the very first hard example had no match in data_labels.x, Python raised UnboundLocalError, which surfaced only as a generic RuntimeError("pipeline runs failed") with no actionable message.
  • Silent label corruption on subsequent misses: if any hard example after the first had no match, label retained the value from the previous loop iteration, so the wrong label for that image was written to the training file with no error and no log. Training then proceeded on corrupted supervision, producing invalid benchmark metrics.
  • O(n) numpy scan per hard example: np.where(data_labels.x == old) did a full array scan on every iteration instead of building an index once.
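The first two failure modes can be reproduced outside the project with a small sketch. This is not the original function: write_labels_buggy, the sample paths, and the use of plain lists in place of the numpy arrays data_labels.x and data_labels.y are all assumptions made for illustration. Only the control flow (assigning label inside the if and using it after) mirrors the pattern described above.

```python
def write_labels_buggy(hard_examples, xs, ys):
    """Hypothetical stand-in for the old loop: `label` is assigned only on a unique match."""
    lines = []
    for old, new in hard_examples:
        # Stand-in for np.where(data_labels.x == old): a full scan per hard example.
        matches = [i for i, x in enumerate(xs) if x == old]
        if len(matches) == 1:
            label = ys[matches[0]]
        # On a miss, `label` is either stale (later miss) or unbound (first miss).
        lines.append(f"{new} {label}")
    return lines


xs = ["a.jpg", "b.jpg"]  # illustrative paths, not from the PR
ys = ["a.txt", "b.txt"]  # illustrative labels, not from the PR

# A miss after a hit silently reuses the previous example's label.
stale = write_labels_buggy(
    [("a.jpg", "new/a.jpg"), ("missing.jpg", "new/missing.jpg")], xs, ys
)

# A miss on the very first example raises UnboundLocalError.
try:
    write_labels_buggy([("missing.jpg", "new/missing.jpg")], xs, ys)
    first_miss_error = None
except UnboundLocalError as exc:
    first_miss_error = exc
```

In this sketch, stale ends with the entry "new/missing.jpg a.txt", which pairs the unmatched image with the label of the previous example. That is the silent corruption described in the second bullet.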

The developer had suppressed pylint's E0606 ("possibly-used-before-assignment") warning on this function rather than fixing the underlying logic.

Fix

Build a label_index dict (path → label) once from data_labels.x and data_labels.y. For each hard example, look up the label in O(1) via the dict. If a path is not found, raise a RuntimeError immediately with a message that names the missing path and the index file — giving the user an actionable diagnosis (path normalization mismatch, missing entry) instead of a silent wrong result or an opaque crash.

# Before
if len(index[0]) == 1:
    label = data_labels.y[index[0][0]]
file.write(f"{new} {label}\n")   # label could be stale or unbound

# After
label_index = {path: lbl for path, lbl in zip(data_labels.x, data_labels.y)}
if old not in label_index:
    raise RuntimeError(f"Hard example '{old}' has no matching label in '{data_label_file}'. "
                       f"This would write a stale or unbound label, corrupting training data.")
file.write(f"{new} {label_index[old]}\n")
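For readers who want to try the approach in isolation, here is a minimal self-contained sketch of the dict-based lookup. The function name write_train_index, its parameters, and the default file name are hypothetical; the real method reads data_labels and data_label_file from its own context, which is not shown in this PR description.

```python
import io


def write_train_index(hard_examples, xs, ys, out, data_label_file="train_index.txt"):
    """Write "<new_path> <label>" lines, failing loudly on any unmatched path.

    All names here are illustrative; `data_label_file` is used only in the error text.
    """
    # Build the path -> label map once instead of scanning per hard example.
    label_index = dict(zip(xs, ys))
    for old, new in hard_examples:
        if old not in label_index:
            raise RuntimeError(
                f"Hard example '{old}' has no matching label in '{data_label_file}'."
            )
        out.write(f"{new} {label_index[old]}\n")


buf = io.StringIO()
write_train_index([("a.jpg", "new/a.jpg")], ["a.jpg", "b.jpg"], ["a.txt", "b.txt"], buf)

# An unmatched path now stops the run instead of writing a wrong label.
try:
    write_train_index([("missing.jpg", "new/missing.jpg")], ["a.jpg"], ["a.txt"], io.StringIO())
    missing_error = None
except RuntimeError as exc:
    missing_error = exc
```

One behavioral difference is worth checking during review: the old code accepted a label only when exactly one entry matched (len(index[0]) == 1), whereas a dict built with zip keeps the last label for a duplicated path. If duplicate paths can occur in the index file, they may need an explicit check.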

Files Changed

  • core/testcasecontroller/algorithm/paradigm/incremental_learning/incremental_learning.py — rewrite _get_train_dataset to use a dict lookup with explicit error on missing path; remove pylint: disable=E0606 suppression that was masking the bug

Add .github/workflows/spell-check.yml running codespell on every push
and pull_request, and a .codespellrc limiting the check to core/.

Fix all typos caught by the new check:
- Deafult -> Default (log.py)
- caculate -> calculate (federated_class_incremental_learning.py)
- enviroment -> environment (simulation.py, simulation_system_admin.py)
- segmantation -> segmentation (dataset.py)

Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
- "os" -> "of" in RuntimeError message (check_host_cpu)
- rename destory_simulation_environment -> destroy_simulation_environment
- fix docstring: "build" -> "destroy" for destroy_simulation_environment

Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dev-aditya-hub
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot requested review from MooreZheng and hsj576 May 8, 2026 08:38
@kubeedge-bot kubeedge-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request fixes various typos in docstrings, logs, and function names across the codebase and adds a .codespellrc configuration. It also refactors the _get_train_dataset method in incremental_learning.py to use a dictionary for efficient label lookups and adds error handling for missing labels. Feedback was provided to improve the grammar and pluralization of a log message in simulation_system_admin.py.

  if build_simulation_env_ret.returncode == 0:
      LOGGER.info(
-         "Congratulation! The simulation enviroment build successful!")
+         "Congratulation! The simulation environment build successful!")

medium

The log message contains a typo ('Congratulation' should be plural) and the phrasing 'build successful' is grammatically incomplete. Consider using 'build was successful' or 'built successfully'.

Suggested change
-         "Congratulation! The simulation environment build successful!")
+         "Congratulations! The simulation environment build was successful!")

…aset

Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
@dev-aditya-hub dev-aditya-hub force-pushed the fix/incremental-learning-stale-label branch from db3d571 to 638ea25 Compare May 8, 2026 08:42