Refactor using tqdm and pathlib#206
Conversation
|
@CiSong10 thanks for this. Would you please realign you branch with the new python-ci adjustments I made to master? Also please see this autogenerated review: I reviewed the diff and the intent is generally good — modernizing path handling (pathlib), adding progress bars (tqdm), cleaning up string formatting and JSON/file handling. I found several places that may cause bugs or break other code; below I list the issues grouped by file with suggested fixes. General / cross-cutting
detectree2/models/evaluation.py
detectree2/models/outputs.py
detectree2/models/predict.py
detectree2/preprocessing/tiling.py
Other stylistic/robustness notes
Summary of highest-priority fixes
|
|
Hi, I have merged up-to-date changes from master into this PR. I humanly reviewed the AI-generated reviews and there's not much to change. Conflicts resolved, etc. etc. Please check this PR for merging. Thanks. |
|
The failing job is due to a mypy type-checking error at line 224 in detectree2/models/outputs.py: error: Unsupported target for indexed assignment ("Collection[str]") [index] Line 224 is attempting an indexed assignment on a type that does not support it—likely a type annotated as Collection rather than a mutable sequence like list. The problematic code is: geofile["features"][i] = updated_feature Or, in your context, you might do something like: some_collection[index] = value Solution: "features": List[Feature] and be sure that everywhere features is created and used, you keep it as a list (not as a generic collection or other immutable sequence). If a variable is annotated as Collection[str], you cannot assign to indices (e.g., collection[0] = 'foo'). Change the variable's annotation to list[str] or List[str]: from typing import List features: List[str] = [] Go to the definition or location at line 224 and adjust the types as shown above. For more context, refer to the file at this ref: detectree2/models/outputs.py@24d614f06f4a8dff35d829bcee8b0945dcadb1bd. Summary:
After making these changes, the mypy error will be resolved. |
|
@CiSong10 please see guidance on programming style and pre-commit hooks: https://patball1.github.io/detectree2/contributing.html#programming-style Code Formatting (isort/flake8) Failures:
Once you’ve:
Your CI action should pass both dependency installation and linting checks. |
- Including predict_on_data(), project_to_geojson(), stitch_crowns(), etc. - Simplify file path: `Path(file_roots[num[i]]` is a string of tile stem, its `.suffix` will return an empty string. Simply tile name stem + .geojson is good. - Update get_tree_dicts and get_filenmes function, accept Path as argument - Explicitly annotate feature as Feature class
7717f3c to
f7541bc
Compare
f7541bc to
e2d32fa
Compare
|
@PatBall1 Happy New Year. I have fixed the import sorting issues and the Detectree2 CI / build check was passed successfully. The Detectree docker CI test / build was cancelled ("The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s"). I don't know how to fix this. Please take a look. Thanks. |
|
Happy new year @CiSong10 ! And thank you for your contributions. Apologies, but I am overloaded at the moment and am not able to review right now but will aim to do so in a week or so. |
PatBall1
left a comment
There was a problem hiding this comment.
AI assisted review
🎉 Overall Assessment
This is a well-executed modernization and cleanup PR with consistent improvements across the codebase. The changes follow best practices and improve code quality significantly.
✅ Strengths
1. Path Handling Modernization
- Excellent migration from string concatenation (
os.path.join, string operations) topathlib.Path - Consistent use of
/operator for path joining - Proper use of
.stem,.suffix,.nameinstead of string manipulation
Examples:
# Before
filename = file. replace(".geojson", "")
filename_split = filename.split("_")
# After
file = Path(file)
filename_split = file.stem.split("_")2. Type Hints
- Good additions like
str | Pathfor function parameters - Return type annotations (
-> None) - Makes the API much clearer
3. Code Simplification
- Removed unnecessary
continuestatements in else blocks - Consolidated variable declarations (
total_tps = total_fps = total_fns = 0) - Removed redundant parentheses in arithmetic operations
4. Better User Experience
- Added
tqdmprogress bars throughout - great for long-running operations - Formatted output with
.format()for cleaner display - Commented out debug print statements instead of removing them
5. String Formatting
- Good use of f-strings replacing older
.format()and concatenation - More readable error messages
🔍 Issues & Concerns
1. ⚠️ CRITICAL: Unintentional Change
detectree2/models/train.py line 456:
-self.early_stop = False
+self.early_stop = TrueThis appears to be an accidental change that will fundamentally alter training behavior by enabling early stopping by default. This should likely be reverted unless it was intentional.
2. Breaking Change in File Iteration
detectree2/models/outputs.py lines 71-72:
-for file in entries:
- if ". json" in file:
+for file in directory.iterdir():
+ if file.suffix != ".json":
+ continueThe logic changed from if ". json" in file to if file.suffix != ".json". This is more correct, but it's a behavioral change:
- Old: matches files with ". json" anywhere in the name (e.g., "my. json.backup")
- New: only matches files ending in ".json"
This is an improvement, but should be noted in release notes.
3. cv2.RETR_TREE → cv2.RETR_EXTERNAL Change
detectree2/models/outputs.py line 40:
-contours, _ = cv2.findContours(masked_arr, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
+contours, _ = cv2.findContours(masked_arr, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)This changes the contour retrieval mode:
RETR_TREE: retrieves all contours and creates a full hierarchyRETR_EXTERNAL: retrieves only extreme outer contours
This could affect polygon detection results for masks with holes or nested contours. Was this intentional?
4. Error Message Improvement Needed
detectree2/models/outputs.py line 313:
-raise FileNotFoundError("No geojson files found in folder.")
+raise FileNotFoundError(f"No geojson files found in {crowns_path}.")Good improvement, but the variable name is crowns_path not folder - nice consistency.
5. Removed Code Without Comment
detectree2/models/outputs.py lines 53, 142-143:
-[x, y, w, h] = cv2.boundingRect(masked_arr)
+# [x, y, w, h] = cv2.boundingRect(masked_arr)This code is commented out but never used afterward. If it's dead code, it should be removed entirely. If it might be needed, a comment explaining why it's kept would help.
6. Inconsistent File Reading Pattern
Some places use file.open(), others use open(file). While both work, consistency would be better:
with file.open() as f: # New pathlib style
with open(file_path) as f: # Traditional style7. Division by Zero Check
detectree2/models/evaluation.py lines 426-427:
+if union_area == 0:
+ continueGood addition! But should this log a warning? Silently skipping might hide issues.
🐛 Potential Bugs
1. Path Conversion in cv2/rasterio calls
Several places now pass Path objects to functions expecting strings:
img = cv2.imread(file_name) # file_name might be a Path objectWhile this usually works, it's not guaranteed in older versions. Consider:
img = cv2.imread(str(file_name))Actually, I see you did this in some places (line 751-752 in train.py) but not others (line 1143).
2. Glob Pattern Change
detectree2/models/outputs.py line 71:
-entries = os.listdir(directory)
+for file in directory.iterdir():The old code would have worked with string paths, but new code requires Path. Make sure all callers are updated.
💡 Suggestions
1. Consistent Path Handling
Add this pattern at the start of functions:
def function_name(directory: str | Path):
directory = Path(directory) # ✅ You're already doing this!
# ... rest of codeYou're already doing this in most places - great!
2. Consider Context Managers
The rasterio changes are good:
with rasterio.open(tifpath) as data:
epsg = data.crs.to_epsg()But ensure all file operations use context managers (you're mostly doing this already).
3. Add Docstring Updates
Some docstrings still reference old parameter types. For example, some still say str when they now accept str | Path.
You updated some (lines 640-641 in evaluation.py) - consider updating all of them.
📝 Minor Nitpicks
-
Typo Fix: Line 383 in
outputs.py- "clearn_crowns" → "clean_crowns" ✅ (good catch!) -
Import Cleanup: Removed unused
globimport ✅ -
Removed unused
osimport in predict.py ✅ -
Formatting: Some trailing spaces might be removed (line 238 in outputs.py shows extra spaces)
🎯 Recommendations
Must Address:
- Revert or explain the
self.early_stop = Truechange - Ensure cv2.RETR_EXTERNAL change is intentional and document it
- Make Path→str conversion consistent for cv2.imread calls
Should Address:
- Document the breaking change in file filtering behavior
- Consider logging when
union_area == 0instead of silent continue - Remove fully dead code (commented bounding rect calculation)
Nice to Have:
- Update all docstrings to reflect
str | Pathtypes - Add release notes documenting behavioral changes
- Correct flake8 style formatting
✨ Summary
This is a high-quality refactoring PR that modernizes the codebase significantly. The main concerns are:
- One likely accidental change (
early_stop = True) - One behavioral change that needs verification (
RETR_EXTERNAL) - Some inconsistencies in Path→str conversion
Once those items are addressed, this would be an excellent merge! The code is much more maintainable and pythonic after these changes.
**Recommendation: Request changes for the critical items, then approve. ** 👍
|
Thank you for the AI-generated review. I’ve made the requested changes and added my responses to the comments below. Please let me know if it's ok or I need to make further changes. Response to "Issues & Concerns"
Response to "Potential Bugs":
Response to "Suggestions"
|
PatBall1
left a comment
There was a problem hiding this comment.
Overview of Changes
The PR modernizes the codebase with better practices across multiple modules:
Positive Changes ✅
-
Path handling modernization
- Converts string-based path operations to
pathlib.Paththroughout - More robust and cross-platform compatible
- Examples:
Path(directory) / filenameinstead ofdirectory + "/" + file
- Converts string-based path operations to
-
Progress indicators
- Adds
tqdmprogress bars for long-running operations - Much better user experience during predictions, stitching, and evaluation
- Adds
-
Code simplification
- Removes unnecessary variables and intermediate steps
- Uses f-strings consistently instead of
.format()in some places - List comprehensions instead of manual loops
-
Bug fixes
- Adds check for
union_area == 0to prevent division by zero (line 426 in evaluation.py) - Changes
cv2.RETR_TREEtocv2.RETR_EXTERNALin contour finding (outputs. py) - better for polygon extraction
- Adds check for
-
Better error messages
- More informative error in
stitch_crowns:f"No geojson files found in {crowns_path}. "
- More informative error in
Potential Issues ⚠️
-
Mixed formatting styles (lines 615, 712 in evaluation.py)
- Some places use
"{:. 2f}".format(prec), others use f-strings - Should be consistent (prefer f-strings:
f"{prec:.2f}")
- Some places use
-
Type hints inconsistency
- Good:
str | Pathtype hints added - Missing: Return type hints not always added (some functions still lack them)
- Good:
-
Minor logic issue in outputs.py (line 633)
if file. suffix != "json":
Should be:
if file.suffix != ".json": # Missing the dot!
-
Debugging comments
- Many print statements commented out but left in code
- Should either remove or use proper logging
-
Indentation changes in evaluation.py
- Lines 672-680 show significant indentation fixes, suggesting previous code had incorrect nesting
- Good fix, but worth double-checking the logic flow
Specific Concerns
outputs.py line 40: Change from cv2.RETR_TREE to cv2.RETR_EXTERNAL
RETR_TREEretrieves all contours and creates full hierarchyRETR_EXTERNALonly retrieves extreme outer contours- This could change behavior if there are nested polygons, but is likely correct for crown polygons
train.py line 1260:
return str(latest_model_file)This returns just the filename, not the full path (removed os.path.join). This looks like a bug - should be:
return str(Path(output_dir) / latest_model_file)Recommendations
- Fix the bug in train.py line 1260 - critical issue
- Fix missing dot in outputs.py line 633 - will cause filtering to fail
- Standardize on f-strings throughout instead of
.format() - Remove commented print statements or convert to proper logging
- Add more type hints for return values
- Test the contour retrieval change to ensure polygon extraction still works correctly
Summary
This is generally a good modernization PR that improves code quality, but has 2 bugs that need fixing before merging:
- Path construction bug in
get_latest_model_path - Missing dot in suffix comparison in
clean_predictions
The other changes look solid and will improve maintainability.
detectree2/models/outputs.py
Outdated
| pred_fold = Path(directory) | ||
|
|
||
| for file in pred_fold.iterdir(): | ||
| if file.suffix != "json": |
There was a problem hiding this comment.
| if file.suffix != "json": | |
| if file.suffix != ".json": |
|
|
||
| # Return the full path to the latest model file | ||
| return os.path.join(output_dir, latest_model_file) | ||
| return str(latest_model_file) |
There was a problem hiding this comment.
| return str(latest_model_file) | |
| return str(Path(output_dir) / latest_model_file) |
Comprehensive Code Review - PR #206Critical Bugs 🐛
Flake8 Issues to Fixevaluation.py# Line 426: Remove trailing whitespace
if union_area == 0:
continue # Remove space after 'continue'
# Lines 675-677, 684-686, 694: Fix indentation (E127)
all_test_feats = initialise_feats2(tile_directory, file. name,
lidar_img, area_threshold,
conf_threshold, border_filter,
tile_width, tile_origin, epsg)
pred_file = "Prediction_" + file.stem + "_eval.geojson"
all_pred_feats = initialise_feats2(pred_directory, pred_file,
lidar_img, area_threshold,
conf_threshold, border_filter,
tile_width, tile_origin, epsg)
tps, fps, fns = positives_test(all_test_feats, all_pred_feats,
IoU_threshold, min_height, max_height)outputs.py# Line 8: Remove unused import
# Delete: import os
# Line 94: Fix spacing (E272)
with file.open() as prediction_file: # Remove extra space before 'as'
# Lines 220, 225, 333, 618: Remove trailing whitespace and whitespace on blank lines
# Line 413: Remove extra blank line (E303)
# Lines 627-628: Fix indentation
rescaled_coords = [
[crown_coords[i], crown_coords[i + 1]]
for i in range(0, len(crown_coords), 2)
]predict.py# Line 26: Fix spacing around equals (E252)
eval: bool = False, # Add spaces around =train.py# Line 381: Fix comment formatting (E265) and line length (E501)
# Split the long comment into multiple lines
# Line 399: Fix indentation (E126)
# Line 410: Fix comment spacing (E262)
# Add space after #
# Lines 549, 552: Break long lines (E501)
# Lines 1080-1082, 1205: Fix line breaks and indentation (W504, E128)tiling.pyMultiple issues - many pre-existing to this PR. Focus on ones introduced by changes: # Lines 676, 679, 683, 690, 695, 699, 701, 705, 713, 719, 723:
# Remove whitespace from blank lines (W293)
# Line 727: Remove trailing whitespace (W291)
# Line 895, 900: Remove whitespace from blank lines (W293)Suggested Fixed Code Snippetsoutputs.py - Complete fixes# Line 8: Remove os import
from pathlib import Path # Keep this
# import os # DELETE THIS LINE
# Line 94:
with file.open() as prediction_file: # Fix spacing
# Line 220:
"coordinates": [moved_coords],
}, # Remove trailing space
}
# Line 225:
# Remove whitespace from blank line
# Line 627-628:
rescaled_coords = [
[crown_coords[i], crown_coords[i + 1]]
for i in range(0, len(crown_coords), 2)
]predict.py - Line 26def predict_on_data(
directory: str | Path = "./",
out_folder: str | Path = "predictions",
predictor=DefaultPredictor,
eval: bool = False, # FIX: Add spaces around =
save: bool = True,
num_predictions=0,
) -> None:evaluation.py - Line 426if union_area == 0:
continue # FIX: Remove trailing whitespaceevaluation.py - Lines 674-687 (indentation fix)all_test_feats = initialise_feats2(
tile_directory, file.name,
lidar_img, area_threshold,
conf_threshold, border_filter,
tile_width, tile_origin, epsg)
new_heights = get_heights(all_test_feats, min_height, max_height)
heights. extend(new_heights)
pred_file = "Prediction_" + file.stem + "_eval.geojson"
all_pred_feats = initialise_feats2(
pred_directory, pred_file,
lidar_img, area_threshold,
conf_threshold, border_filter,
tile_width, tile_origin, epsg)train.py - Line 1260 (CRITICAL BUG)def get_latest_model_path(output_dir: str) -> str:
"""Get the path to the latest model file in the output directory."""
# Regular expression to match model files with the pattern "model_X.pth"
model_pattern = re.compile(r"model_(\d+)\.pth")
# Find all files that match the pattern and extract their indices
model_files = []
output_path = Path(output_dir) # ADD THIS
for f in output_path.iterdir():
match = model_pattern.match(f.name)
if match:
model_files.append((f, int(match.group(1))))
if not model_files:
raise FileNotFoundError(f"No model files found in {output_dir}")
# Sort by index and get the latest model file
latest_model_file = max(model_files, key=lambda x: x[1])[0]
# Return the full path to the latest model file
return str(latest_model_file) # FIX: latest_model_file is already a Path object from iterdir()Summary of Required Changes
Priority Action ItemsMUST FIX (blocking):
SHOULD FIX (quality):
|
|
Hi, I have made edits according to the latest change request. The two "critical bugs" are fixed. Other format issues are fixed as well, although some of them are pre-existing, not related to my PR, and I believe should be addressed separately. |
96e1354 to
5160ded
Compare
|
@CiSong10 thank you! I really appreciate the help. Sorry for being so demanding! I'll review shortly. |
Rewrite some functions using tqdm process bar and pathlib. Make other small code improvements as well.