fix: resolve stale label corruption in _get_train_dataset#418
fix: resolve stale label corruption in _get_train_dataset#418dev-aditya-hub wants to merge 3 commits into
Conversation
Add .github/workflows/spell-check.yml running codespell on every push and pull_request, and a .codespellrc limiting the check to core/. Fix all typos caught by the new check: - Deafult -> Default (log.py) - caculate -> calculate (federated_class_incremental_learning.py) - enviroment -> environment (simulation.py, simulation_system_admin.py) - segmantation -> segmentation (dataset.py) Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
- "os" -> "of" in RuntimeError message (check_host_cpu) - rename destory_simulation_environment -> destroy_simulation_environment - fix docstring: "build" -> "destroy" for destroy_simulation_environment Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dev-aditya-hub The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request fixes various typos in docstrings, logs, and function names across the codebase and adds a .codespellrc configuration. It also refactors the _get_train_dataset method in incremental_learning.py to use a dictionary for efficient label lookups and adds error handling for missing labels. Feedback was provided to improve the grammar and pluralization of a log message in simulation_system_admin.py.
| if build_simulation_env_ret.returncode == 0: | ||
| LOGGER.info( | ||
| "Congratulation! The simulation enviroment build successful!") | ||
| "Congratulation! The simulation environment build successful!") |
There was a problem hiding this comment.
The log message contains a typo ('Congratulation' should be plural) and the phrasing 'build successful' is grammatically incomplete. Consider using 'build was successful' or 'built successfully'.
| "Congratulation! The simulation environment build successful!") | |
| "Congratulations! The simulation environment build was successful!") |
…aset Signed-off-by: dev-aditya-hub <premjadhvar95@gmail.com>
db3d571 to
638ea25
Compare
Summary
_get_train_datasetinIncrementalLearninghad two issues that this PR addresses:labelwas only assigned insideif len(index[0]) == 1, so if the very first hard example had no match indata_labels.x, Python raisedUnboundLocalErrorburied inside a genericRuntimeError("pipeline runs failed")with no actionable messagelabelsilently retained the value from the previous loop iteration, writing the wrong label for that image to the training file with no error, no log, and no indication anything was wrong; training then proceeded on corrupted supervision signal, producing invalid benchmark metricsnp.where(data_labels.x == old)did a full array scan on every iteration instead of onceThe developer had suppressed pylint's
E0606("possibly-used-before-assignment") warning on this function rather than fixing the underlying logic.Fix
Build a
label_indexdict (path → label) once fromdata_labels.xanddata_labels.y. For each hard example, look up the label in O(1) via the dict. If a path is not found, raise aRuntimeErrorimmediately with a message that names the missing path and the index file — giving the user an actionable diagnosis (path normalization mismatch, missing entry) instead of a silent wrong result or an opaque crash.Files Changed
core/testcasecontroller/algorithm/paradigm/incremental_learning/incremental_learning.py— rewrite_get_train_datasetto use a dict lookup with explicit error on missing path; removepylint: disable=E0606suppression that was masking the bug