
fix: skip hard examples with no unique label match in incremental learning#415

Open
dev-aditya-hub wants to merge 1 commit into kubeedge:main from dev-aditya-hub:fix/incremental-learning-label-bleed



dev-aditya-hub commented on May 6, 2026

Summary

  • IncrementalLearning._get_train_dataset() assigned label inside an if len(index[0]) == 1: block but called file.write(f"{new} {label}\n") unconditionally, outside that block
  • When np.where() does not find exactly one matching path for a hard example, label still holds the value from the most recently matched sample, so the wrong label is silently written to the training file; if the very first hard example is unmatched, label was never assigned and the write raises UnboundLocalError
  • Fixed by inverting the guard to if len(index[0]) != 1: continue, so unmatched samples are skipped and label is only assigned and written on the valid path, matching the intent of the surrounding logic
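The failure mode is easiest to see in isolation. The following is a minimal sketch, not the actual ianvs code: the helpers build_lines_buggy and build_lines_fixed, and the assumption that label-file paths and labels are held in parallel NumPy arrays, are hypothetical and only mirror the before/after pattern described above.

```python
import numpy as np


def build_lines_buggy(paths, labels, hard_examples):
    """Hypothetical helper mirroring the pre-fix pattern: label is
    assigned conditionally but written unconditionally."""
    lines = []
    for sample in hard_examples:
        # np.where on a 1-D boolean mask returns a 1-tuple of index arrays
        index = np.where(paths == sample)
        if len(index[0]) == 1:
            label = labels[index[0][0]]
        # Bug: this runs even without a unique match, so `label` is either
        # left over from an earlier iteration or not bound at all yet.
        lines.append(f"{sample} {label}")
    return lines


def build_lines_fixed(paths, labels, hard_examples):
    """Hypothetical helper mirroring the post-fix pattern."""
    lines = []
    for sample in hard_examples:
        index = np.where(paths == sample)
        if len(index[0]) != 1:
            continue  # zero or several matches: nothing trustworthy to write
        label = labels[index[0][0]]
        lines.append(f"{sample} {label}")
    return lines


paths = np.array(["img/a.png", "img/b.png"])
labels = np.array(["cat", "dog"])

# Stale label: "img/x.png" has no match but inherits "cat".
print(build_lines_buggy(paths, labels, ["img/a.png", "img/x.png"]))
# ['img/a.png cat', 'img/x.png cat']

# Unmatched sample first: `label` was never bound.
try:
    build_lines_buggy(paths, labels, ["img/x.png", "img/a.png"])
except UnboundLocalError as err:
    print("UnboundLocalError:", err)

print(build_lines_fixed(paths, labels, ["img/x.png", "img/a.png", "img/b.png"]))
# ['img/a.png cat', 'img/b.png dog']  (img/x.png is skipped)
```

Skipping with continue keeps label confined to the branch that has proven a unique match, so nothing from a previous iteration can leak into the write.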

Test plan

  • Verify hard examples with no label match are skipped rather than written with a stale label
  • Verify a benchmark run with incremental learning completes without UnboundLocalError on the first unmatched hard example
  • Verify the generated training file contains only correctly matched and labeled entries
  • Verify incremental learning metrics are stable across repeated runs on the same dataset
  • CI passes on all platforms

Summary by CodeRabbit

Bug Fixes
Corrected label assignment in hard-example training dataset construction, preventing stale labels from prior loop iterations from being silently written to the training file and corrupting incremental model training.

Tests
Added a regression test validating that unmatched hard examples are skipped and that matched ones are written with the correct label to the training dataset file.
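The regression test itself is not reproduced on this page. A pytest-style sketch of what such a test could check is below, assuming the fixed loop writes one "<path> <label>" line per uniquely matched hard example; write_train_file and the file layout are hypothetical stand-ins, not the test added in this PR.

```python
import os
import tempfile

import numpy as np


def write_train_file(out_path, paths, labels, hard_examples):
    """Hypothetical stand-in for the fixed write loop: one
    "<sample> <label>" line per hard example with exactly one match."""
    with open(out_path, "w", encoding="utf-8") as file:
        for sample in hard_examples:
            index = np.where(paths == sample)
            if len(index[0]) != 1:
                continue  # skip unmatched and ambiguous samples
            label = labels[index[0][0]]
            file.write(f"{sample} {label}\n")


def test_unmatched_hard_examples_are_skipped():
    paths = np.array(["img/a.png", "img/b.png", "img/dup.png", "img/dup.png"])
    labels = np.array(["cat", "dog", "bird", "fish"])
    # Unmatched first (the old UnboundLocalError case), an unmatched sample
    # after a valid match (the old stale-label case), and a duplicated path.
    hard_examples = ["img/x.png", "img/a.png", "img/y.png", "img/dup.png", "img/b.png"]

    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "train.txt")
        write_train_file(out_path, paths, labels, hard_examples)
        with open(out_path, encoding="utf-8") as file:
            lines = file.read().splitlines()

    # Only uniquely matched samples appear, each with its own label.
    assert lines == ["img/a.png cat", "img/b.png dog"]


if __name__ == "__main__":
    test_unmatched_hard_examples_are_skipped()
    print("ok")
```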

Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>

kubeedge-bot (Collaborator)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dev-aditya-hub
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kubeedge-bot requested review from MooreZheng and hsj576 on May 6, 2026, 18:41
kubeedge-bot (Collaborator)

Welcome @dev-aditya-hub! It looks like this is your first PR to kubeedge/ianvs 🎉

kubeedge-bot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on May 6, 2026

gemini-code-assist (bot) left a comment


Code Review

The pull request updates the _get_train_dataset method in incremental_learning.py to skip hard examples that do not have a unique match in the dataset, while also removing unnecessary pylint disable comments. The reviewer suggests adding a warning log when multiple matches are encountered to help identify potential dataset inconsistencies.

Comment on lines +156 to +157
if len(index[0]) != 1:
    continue

Priority: medium

The current logic skips hard examples if multiple matches are found in the label file (len(index[0]) != 1). While this aligns with the PR's goal of ensuring unique matches, it might be beneficial to log a warning when multiple matches are found, as this could indicate inconsistencies or duplicates in the dataset that the user should be aware of.
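One way to act on this suggestion is to tell the zero-match and multi-match cases apart before skipping. A minimal sketch using the standard logging module follows; the match_label helper and the logger name are hypothetical and not part of this PR.

```python
import logging

import numpy as np

# Hypothetical logger name for this sketch.
logger = logging.getLogger("incremental_learning_sketch")


def match_label(paths, labels, sample):
    """Return the label for `sample` if exactly one path matches, else None.

    Duplicate paths are logged as a warning (they may point to an
    inconsistent label file); a missing path is only logged at debug level.
    """
    index = np.where(paths == sample)[0]
    if len(index) == 1:
        return labels[index[0]]
    if len(index) > 1:
        logger.warning(
            "Skipping hard example %s: %d entries in the label file match it",
            sample, len(index),
        )
    else:
        logger.debug("Skipping hard example %s: no entry in the label file", sample)
    return None


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    paths = np.array(["img/a.png", "img/dup.png", "img/dup.png"])
    labels = np.array(["cat", "bird", "fish"])
    for sample in ["img/a.png", "img/dup.png", "img/x.png"]:
        label = match_label(paths, labels, sample)
        if label is None:
            continue  # same skip behaviour as the PR's guard
        print(sample, label)  # only "img/a.png cat" is printed
```

Keeping the unmatched case at debug level avoids flooding the log when many hard examples are legitimately absent, while duplicates, which are more likely a data problem, stay visible.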

dev-aditya-hub (Author)

/assign @MooreZheng


3 participants