Skip to content

Refactor using tqdm and pathlib#206

Merged
PatBall1 merged 6 commits intoPatBall1:masterfrom
CiSong10:refactor/tqdm-pathlib
Jan 15, 2026
Merged

Refactor using tqdm and pathlib#206
PatBall1 merged 6 commits intoPatBall1:masterfrom
CiSong10:refactor/tqdm-pathlib

Conversation

@CiSong10
Copy link
Copy Markdown
Contributor

Rewrite some functions using tqdm process bar and pathlib. Make other small code improvements as well.

@PatBall1
Copy link
Copy Markdown
Owner

@CiSong10 thanks for this. Would you please realign you branch with the new python-ci adjustments I made to master?

Also please see this autogenerated review:

I reviewed the diff and the intent is generally good — modernizing path handling (pathlib), adding progress bars (tqdm), cleaning up string formatting and JSON/file handling. I found several places that may cause bugs or break other code; below I list the issues grouped by file with suggested fixes.

General / cross-cutting

  • Many edits switch from str paths to pathlib.Path. Make sure each module imports Path (from pathlib import Path) at the top. I did not see new imports for Path in the diffs you showed — add them where missing.
  • Similarly, tqdm is used in multiple files but I only see it imported in predict.py. Add "from tqdm import tqdm" in outputs.py and any other modules using tqdm.
  • Changing functions to accept Path objects is fine, but ensure all call sites across the codebase pass Path or str consistently. Either accept both (Path(file)) or explicitly convert at function entry (as you did in a few places).

detectree2/models/evaluation.py

  • initialise_feats2: you renamed the parameter file -> filename. That is fine for positional calls, but any external callers passing keyword file=... will break. Also you changed the GeoFeature constructor call: GeoFeature(filename, ...) (previously GeoFeature(file,...)). Confirm GeoFeature expects the filename string (not full path) — or convert to str earlier.
  • site_f1_score2: tile_width appears to be referenced when calling initialise_feats2, but I see the earlier tile_width computation commented out:
    • You do: for file in test_directory.iterdir(): ... tile_origin = get_tile_origin(file) epsg = get_epsg(file) then call initialise_feats2(..., tile_width, tile_origin, epsg) — but tile_width is not defined in this scope (the commented line "# tile_width = get_tile_width(file) * scaling[0]" suggests it used to be set). This will raise a NameError. Restore or recompute tile_width before use.
  • site_f1_score2: you converted test_directory to Path and iterate over Path objects; you then pass file.name to initialise_feats2 which expects the filename (you changed that), OK — but ensure initialise_feats2 uses that name consistently (it now converts to Path inside).
  • site_f1_score2: printing inside the try/except — if a ZeroDivisionError is raised, the except prints but then the function still returns prec, rec, f1_score. Those variables may be undefined if the exception happened before they were set -> this can raise an UnboundLocalError. Ensure you return sensible defaults (e.g., 0, 0, 0) from the except block or re-raise.
  • get_tile_width / get_epsg / get_tile_origin: you now call file = Path(file) and file.stem.split("") which is fine for both str and Path. Confirm callers passing things like "Prediction*_eval.geojson" still work (they pass file.name or file.stem in some places).

detectree2/models/outputs.py

  • Missing imports: this file uses Path, tqdm, and may rely on CRS/rasterio already imported — ensure Path and tqdm are imported at top. I did not see tqdm import added here.
  • to_eval_geojson: you set directory = Path(directory) but the function signature still allows directory=None; calling with None will raise. Either keep signature non-optional or add a guard.
  • to_eval_geojson: you assign img_dict["filename"] = file.name (string) then later use file.open() — you still use the Path, so that's fine. Ensure later code that expects filename to be a path uses file (Path) rather than img_dict["filename"] (string).
  • to_eval_geojson: you use f-strings for EPSG ("f'urn:ogc:...'") — good.
  • project_to_geojson:
    • You use tqdm but outputs.py doesn't show importing it. Add import.
    • The new pattern opens the raster with "with rasterio.open(tifpath) as data:" and then sets raster_transform = data.transform. That variable is used later in transforming coords; OK.
    • entries = [file for file in Path(pred_fold).iterdir() if file.suffix == ".json"] — make sure pred_fold exists or handle it.
    • You changed output file writing to use Path.open/write_text (good). Confirm permission/encoding expectations.
  • filename_geoinfo: now uses Path(filename).stem and maps last 5 parts to int — OK but will raise ValueError if filename doesn't include 5 underscore-separated parts or parts are non-int. This mirrors previous behavior but keep in mind.
  • stitch_crowns:
    • You use tqdm here too; ensure tqdm imported.
    • You switched glob("geojson") -> glob(".geojson") (good).
    • You changed the FileNotFoundError message to include folder path — good.
    • box_filter(file, shift) now receives a Path. Confirm box_filter accepts Path or convert to str.

detectree2/models/predict.py

  • Imports: you added "from tqdm import tqdm" which is good. Ensure Path is imported (I don't see it in the snippet) if you're using Path in annotations and in code.
  • predict_on_data signature now allows Path for directory/out_folder — OK.
  • You replaced the outer loop with a tqdm loop. I would confirm the loop body correctly handles both png and tif cases (I saw the png branch; ensure other branches (tif/raster) remain intact).
  • When saving JSON you switched to output_file.write_text(json.dumps(evaluations)). That works, but be aware this writes compact JSON with no indentation; if you prefer pretty JSON, use json.dump with indent or write_text(json.dumps(..., indent=2)).
  • instances_to_coco_json is called with str(file_name) — correct because some functions expect path strings.
  • Ensure predictor outputs["instances"] exists (unchanged from before).

detectree2/preprocessing/tiling.py

  • You replaced os.makedirs(out_path, exist_ok=True) with out_path.mkdir(parents=True, exist_ok=True) — good. Ensure out_path is a Path.
  • record_classes: same mkdir change, OK.
  • to_traintest_folders: you replaced os.path.exists(tiles_dir) with tiles_dir.exists(), but tiles_dir is a Path and that is fine. However you then raise IOError without message; consider raising FileNotFoundError with message for clarity.

Other stylistic/robustness notes

  • Mixed use of Path and str: be careful passing Path objects into functions that expect strings (e.g., some third-party libs or old helper functions). If you move to Path everywhere, prefer converting at the function entrance (file = Path(file)) — which you did in a few places; do that consistently.
  • Logging: you added prints and tqdm; consider using logging so users can control verbosity.
  • Tests: run the test suite or a quick integration smoke test for these flows: predict_on_data, project_to_geojson, to_eval_geojson, site_f1_score2 and stitch_crowns. The undefined tile_width and missing imports are the highest-risk problems to surface quickly.
  • Type hints: you added some type unions (str | Path). Consider importing from typing for Python <3.10 compatibility, or ensure project runs on 3.10+.

Summary of highest-priority fixes

  1. Define tile_width before use in site_f1_score2 (currently NameError).
  2. Add missing imports:
    • from pathlib import Path in modules where Path is used but not imported.
    • from tqdm import tqdm in outputs.py and any other file using tqdm.
  3. Ensure initialise_feats2 signature/parameter rename is compatible with all callers or revert to original name or accept both.
  4. Fix returning potentially undefined variables in site_f1_score2 except block (return defaults or re-raise).
  5. Run integration tests to catch Path vs str mismatches (e.g., GeoFeature(), box_filter(), gpd.read_file and any other function that may expect strings).

@CiSong10
Copy link
Copy Markdown
Contributor Author

Hi, I have merged up-to-date changes from master into this PR. I humanly reviewed the AI-generated reviews and there's not much to change. Conflicts resolved, etc. etc. Please check this PR for merging. Thanks.

@PatBall1
Copy link
Copy Markdown
Owner

The failing job is due to a mypy type-checking error at line 224 in detectree2/models/outputs.py:

error: Unsupported target for indexed assignment ("Collection[str]") [index]

Line 224 is attempting an indexed assignment on a type that does not support it—likely a type annotated as Collection rather than a mutable sequence like list. The problematic code is:

geofile["features"][i] = updated_feature

Or, in your context, you might do something like:

some_collection[index] = value

Solution:
If you need the collection to be writable by index, declare it as a list, not as a generic Collection. Update the type hint for features in GeoFile from:

"features": List[Feature]

and be sure that everywhere features is created and used, you keep it as a list (not as a generic collection or other immutable sequence).

If a variable is annotated as Collection[str], you cannot assign to indices (e.g., collection[0] = 'foo'). Change the variable's annotation to list[str] or List[str]:

from typing import List

features: List[str] = []
features[0] = "foo" # now valid

Go to the definition or location at line 224 and adjust the types as shown above. For more context, refer to the file at this ref: detectree2/models/outputs.py@24d614f06f4a8dff35d829bcee8b0945dcadb1bd.

Summary:

  • Change type annotations from Collection[...] to list[...] or List[...].
  • Ensure the actual variable is a list at runtime.
  • Do not index-assign into Collection types.

After making these changes, the mypy error will be resolved.

@PatBall1
Copy link
Copy Markdown
Owner

PatBall1 commented Nov 20, 2025

@CiSong10 please see guidance on programming style and pre-commit hooks: https://patball1.github.io/detectree2/contributing.html#programming-style

Code Formatting (isort/flake8) Failures:

  • The following files have "imports incorrectly sorted and/or formatted":

  • Solution: Run the following commands locally to fix import ordering:

    isort detectree2/models/train.py
    isort detectree2/models/predict.py

    Add and commit the changes.

  • Additionally, you can auto-fix many flake8 issues using:

    autoflake --in-place --remove-all-unused-imports detectree2/models/train.py
    autoflake --in-place --remove-all-unused-imports detectree2/models/predict.py

Once you’ve:

  • Corrected the import order and formatting in the code

Your CI action should pass both dependency installation and linting checks.

- Including predict_on_data(), project_to_geojson(), stitch_crowns(), etc.
- Simplify file path: `Path(file_roots[num[i]]` is a string of tile stem, its `.suffix` will return an empty string.
Simply tile name stem + .geojson is good.
- Update get_tree_dicts and get_filenmes function, accept Path as argument
- Explicitly annotate feature as Feature class
@CiSong10 CiSong10 force-pushed the refactor/tqdm-pathlib branch from 7717f3c to f7541bc Compare January 1, 2026 21:47
@CiSong10 CiSong10 force-pushed the refactor/tqdm-pathlib branch from f7541bc to e2d32fa Compare January 1, 2026 21:52
@CiSong10
Copy link
Copy Markdown
Contributor Author

CiSong10 commented Jan 2, 2026

@PatBall1 Happy New Year. I have fixed the import sorting issues and the Detectree2 CI / build check was passed successfully. The Detectree docker CI test / build was cancelled ("The job has exceeded the maximum execution time while awaiting a runner for 24h0m0s"). I don't know how to fix this. Please take a look. Thanks.

@PatBall1
Copy link
Copy Markdown
Owner

PatBall1 commented Jan 6, 2026

Happy new year @CiSong10 ! And thank you for your contributions. Apologies, but I am overloaded at the moment and am not able to review right now but will aim to do so in a week or so.

Copy link
Copy Markdown
Owner

@PatBall1 PatBall1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI assisted review

🎉 Overall Assessment

This is a well-executed modernization and cleanup PR with consistent improvements across the codebase. The changes follow best practices and improve code quality significantly.


✅ Strengths

1. Path Handling Modernization

  • Excellent migration from string concatenation (os.path.join, string operations) to pathlib.Path
  • Consistent use of / operator for path joining
  • Proper use of .stem, .suffix, .name instead of string manipulation

Examples:

# Before
filename = file. replace(".geojson", "")
filename_split = filename.split("_")

# After
file = Path(file)
filename_split = file.stem.split("_")

2. Type Hints

  • Good additions like str | Path for function parameters
  • Return type annotations (-> None)
  • Makes the API much clearer

3. Code Simplification

  • Removed unnecessary continue statements in else blocks
  • Consolidated variable declarations (total_tps = total_fps = total_fns = 0)
  • Removed redundant parentheses in arithmetic operations

4. Better User Experience

  • Added tqdm progress bars throughout - great for long-running operations
  • Formatted output with .format() for cleaner display
  • Commented out debug print statements instead of removing them

5. String Formatting

  • Good use of f-strings replacing older .format() and concatenation
  • More readable error messages

🔍 Issues & Concerns

1. ⚠️ CRITICAL: Unintentional Change

detectree2/models/train.py line 456:

-self.early_stop = False
+self.early_stop = True

This appears to be an accidental change that will fundamentally alter training behavior by enabling early stopping by default. This should likely be reverted unless it was intentional.

2. Breaking Change in File Iteration

detectree2/models/outputs.py lines 71-72:

-for file in entries:
-    if ". json" in file:
+for file in directory.iterdir():
+    if file.suffix != ".json":
+        continue

The logic changed from if ". json" in file to if file.suffix != ".json". This is more correct, but it's a behavioral change:

  • Old: matches files with ". json" anywhere in the name (e.g., "my. json.backup")
  • New: only matches files ending in ".json"

This is an improvement, but should be noted in release notes.

3. cv2.RETR_TREE → cv2.RETR_EXTERNAL Change

detectree2/models/outputs.py line 40:

-contours, _ = cv2.findContours(masked_arr, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
+contours, _ = cv2.findContours(masked_arr, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

This changes the contour retrieval mode:

  • RETR_TREE: retrieves all contours and creates a full hierarchy
  • RETR_EXTERNAL: retrieves only extreme outer contours

This could affect polygon detection results for masks with holes or nested contours. Was this intentional?

4. Error Message Improvement Needed

detectree2/models/outputs.py line 313:

-raise FileNotFoundError("No geojson files found in folder.")
+raise FileNotFoundError(f"No geojson files found in {crowns_path}.")

Good improvement, but the variable name is crowns_path not folder - nice consistency.

5. Removed Code Without Comment

detectree2/models/outputs.py lines 53, 142-143:

-[x, y, w, h] = cv2.boundingRect(masked_arr)
+# [x, y, w, h] = cv2.boundingRect(masked_arr)

This code is commented out but never used afterward. If it's dead code, it should be removed entirely. If it might be needed, a comment explaining why it's kept would help.

6. Inconsistent File Reading Pattern

Some places use file.open(), others use open(file). While both work, consistency would be better:

with file.open() as f:        # New pathlib style
with open(file_path) as f:    # Traditional style

7. Division by Zero Check

detectree2/models/evaluation.py lines 426-427:

+if union_area == 0:
+    continue

Good addition! But should this log a warning? Silently skipping might hide issues.


🐛 Potential Bugs

1. Path Conversion in cv2/rasterio calls

Several places now pass Path objects to functions expecting strings:

img = cv2.imread(file_name)  # file_name might be a Path object

While this usually works, it's not guaranteed in older versions. Consider:

img = cv2.imread(str(file_name))

Actually, I see you did this in some places (line 751-752 in train.py) but not others (line 1143).

2. Glob Pattern Change

detectree2/models/outputs.py line 71:

-entries = os.listdir(directory)
+for file in directory.iterdir():

The old code would have worked with string paths, but new code requires Path. Make sure all callers are updated.


💡 Suggestions

1. Consistent Path Handling

Add this pattern at the start of functions:

def function_name(directory: str | Path):
    directory = Path(directory)  # ✅ You're already doing this! 
    # ... rest of code

You're already doing this in most places - great!

2. Consider Context Managers

The rasterio changes are good:

with rasterio.open(tifpath) as data:
    epsg = data.crs.to_epsg()

But ensure all file operations use context managers (you're mostly doing this already).

3. Add Docstring Updates

Some docstrings still reference old parameter types. For example, some still say str when they now accept str | Path.

You updated some (lines 640-641 in evaluation.py) - consider updating all of them.


📝 Minor Nitpicks

  1. Typo Fix: Line 383 in outputs.py - "clearn_crowns" → "clean_crowns" ✅ (good catch!)

  2. Import Cleanup: Removed unused glob import ✅

  3. Removed unused os import in predict.py ✅

  4. Formatting: Some trailing spaces might be removed (line 238 in outputs.py shows extra spaces)


🎯 Recommendations

Must Address:

  1. Revert or explain the self.early_stop = True change
  2. Ensure cv2.RETR_EXTERNAL change is intentional and document it
  3. Make Path→str conversion consistent for cv2.imread calls

Should Address:

  1. Document the breaking change in file filtering behavior
  2. Consider logging when union_area == 0 instead of silent continue
  3. Remove fully dead code (commented bounding rect calculation)

Nice to Have:

  1. Update all docstrings to reflect str | Path types
  2. Add release notes documenting behavioral changes
  3. Correct flake8 style formatting

✨ Summary

This is a high-quality refactoring PR that modernizes the codebase significantly. The main concerns are:

  1. One likely accidental change (early_stop = True)
  2. One behavioral change that needs verification (RETR_EXTERNAL)
  3. Some inconsistencies in Path→str conversion

Once those items are addressed, this would be an excellent merge! The code is much more maintainable and pythonic after these changes.

**Recommendation: Request changes for the critical items, then approve. ** 👍

@CiSong10
Copy link
Copy Markdown
Contributor Author

CiSong10 commented Jan 7, 2026

Thank you for the AI-generated review. I’ve made the requested changes and added my responses to the comments below. Please let me know if it's ok or I need to make further changes.

Response to "Issues & Concerns"

  1. My bad. Corrected it.
  2. No action required.
  3. cv2.RETR_TREEcv2.RETR_EXTERNAL. I think RETR_EXTERNAL is better because it can avoid holes or nested tree canopy contours, which are not ideal output. It should also be slightly more time-efficient than RETR_TREE. That said, I’m happy to switch back if you prefer.
  4. No action required
  5. This line is not used in the function. I commented it out in case it’s needed in the future.
  6. Agree. I found this inconsistency in the clean_predictions() function and refactored it using pathlib.
  7. I chose not to log a warning to avoid overly noisy output, but I can add one if you think it’s useful.

Response to "Potential Bugs":

  1. Apologies for my oversight. Fixed it.
  2. No action required.

Response to "Suggestions"

  1. No action required
  2. No action required
  3. Made a few updates to the docstring.

Copy link
Copy Markdown
Owner

@PatBall1 PatBall1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overview of Changes

The PR modernizes the codebase with better practices across multiple modules:

Positive Changes

  1. Path handling modernization

    • Converts string-based path operations to pathlib.Path throughout
    • More robust and cross-platform compatible
    • Examples: Path(directory) / filename instead of directory + "/" + file
  2. Progress indicators

    • Adds tqdm progress bars for long-running operations
    • Much better user experience during predictions, stitching, and evaluation
  3. Code simplification

    • Removes unnecessary variables and intermediate steps
    • Uses f-strings consistently instead of .format() in some places
    • List comprehensions instead of manual loops
  4. Bug fixes

    • Adds check for union_area == 0 to prevent division by zero (line 426 in evaluation.py)
    • Changes cv2.RETR_TREE to cv2.RETR_EXTERNAL in contour finding (outputs. py) - better for polygon extraction
  5. Better error messages

    • More informative error in stitch_crowns: f"No geojson files found in {crowns_path}. "

Potential Issues ⚠️

  1. Mixed formatting styles (lines 615, 712 in evaluation.py)

    • Some places use "{:. 2f}".format(prec), others use f-strings
    • Should be consistent (prefer f-strings: f"{prec:.2f}")
  2. Type hints inconsistency

    • Good: str | Path type hints added
    • Missing: Return type hints not always added (some functions still lack them)
  3. Minor logic issue in outputs.py (line 633)

    if file. suffix != "json":

    Should be:

    if file.suffix != ".json":  # Missing the dot! 
  4. Debugging comments

    • Many print statements commented out but left in code
    • Should either remove or use proper logging
  5. Indentation changes in evaluation.py

    • Lines 672-680 show significant indentation fixes, suggesting previous code had incorrect nesting
    • Good fix, but worth double-checking the logic flow

Specific Concerns

outputs.py line 40: Change from cv2.RETR_TREE to cv2.RETR_EXTERNAL

  • RETR_TREE retrieves all contours and creates full hierarchy
  • RETR_EXTERNAL only retrieves extreme outer contours
  • This could change behavior if there are nested polygons, but is likely correct for crown polygons

train.py line 1260:

return str(latest_model_file)

This returns just the filename, not the full path (removed os.path.join). This looks like a bug - should be:

return str(Path(output_dir) / latest_model_file)

Recommendations

  1. Fix the bug in train.py line 1260 - critical issue
  2. Fix missing dot in outputs.py line 633 - will cause filtering to fail
  3. Standardize on f-strings throughout instead of .format()
  4. Remove commented print statements or convert to proper logging
  5. Add more type hints for return values
  6. Test the contour retrieval change to ensure polygon extraction still works correctly

Summary

This is generally a good modernization PR that improves code quality, but has 2 bugs that need fixing before merging:

  1. Path construction bug in get_latest_model_path
  2. Missing dot in suffix comparison in clean_predictions

The other changes look solid and will improve maintainability.

pred_fold = Path(directory)

for file in pred_fold.iterdir():
if file.suffix != "json":
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if file.suffix != "json":
if file.suffix != ".json":


# Return the full path to the latest model file
return os.path.join(output_dir, latest_model_file)
return str(latest_model_file)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return str(latest_model_file)
return str(Path(output_dir) / latest_model_file)

@PatBall1
Copy link
Copy Markdown
Owner

PatBall1 commented Jan 15, 2026

Comprehensive Code Review - PR #206

Critical Bugs 🐛

  1. train. py line 1260 - Path construction bug

    return str(latest_model_file)

    Should be:

    return str(Path(output_dir) / latest_model_file)
  2. outputs. py line 633 - Missing dot in suffix comparison

    if file. suffix != "json":

    Should be:

    if file.suffix != ".json":

Flake8 Issues to Fix

evaluation.py

# Line 426: Remove trailing whitespace
if union_area == 0:
    continue  # Remove space after 'continue'

# Lines 675-677, 684-686, 694:  Fix indentation (E127)
all_test_feats = initialise_feats2(tile_directory, file. name,
                                   lidar_img, area_threshold,
                                   conf_threshold, border_filter,
                                   tile_width, tile_origin, epsg)

pred_file = "Prediction_" + file.stem + "_eval.geojson"
all_pred_feats = initialise_feats2(pred_directory, pred_file,
                                   lidar_img, area_threshold,
                                   conf_threshold, border_filter,
                                   tile_width, tile_origin, epsg)

tps, fps, fns = positives_test(all_test_feats, all_pred_feats,
                                IoU_threshold, min_height, max_height)

outputs.py

# Line 8: Remove unused import
# Delete:  import os

# Line 94: Fix spacing (E272)
with file.open() as prediction_file:  # Remove extra space before 'as'

# Lines 220, 225, 333, 618:  Remove trailing whitespace and whitespace on blank lines

# Line 413: Remove extra blank line (E303)

# Lines 627-628: Fix indentation
rescaled_coords = [
    [crown_coords[i], crown_coords[i + 1]]
    for i in range(0, len(crown_coords), 2)
]

predict.py

# Line 26: Fix spacing around equals (E252)
eval:  bool = False,  # Add spaces around =

train.py

# Line 381: Fix comment formatting (E265) and line length (E501)
# Split the long comment into multiple lines

# Line 399: Fix indentation (E126)

# Line 410: Fix comment spacing (E262)
# Add space after #

# Lines 549, 552:  Break long lines (E501)

# Lines 1080-1082, 1205:  Fix line breaks and indentation (W504, E128)

tiling.py

Multiple issues - many pre-existing to this PR. Focus on ones introduced by changes:

# Lines 676, 679, 683, 690, 695, 699, 701, 705, 713, 719, 723: 
# Remove whitespace from blank lines (W293)

# Line 727: Remove trailing whitespace (W291)

# Line 895, 900: Remove whitespace from blank lines (W293)

Suggested Fixed Code Snippets

outputs.py - Complete fixes

# Line 8: Remove os import
from pathlib import Path  # Keep this
# import os  # DELETE THIS LINE

# Line 94:
with file.open() as prediction_file:  # Fix spacing

# Line 220:
                "coordinates": [moved_coords],
            },  # Remove trailing space
        }

# Line 225:
                                        # Remove whitespace from blank line

# Line 627-628:
            rescaled_coords = [
                [crown_coords[i], crown_coords[i + 1]]
                for i in range(0, len(crown_coords), 2)
            ]

predict.py - Line 26

def predict_on_data(
    directory: str | Path = "./",
    out_folder: str | Path = "predictions",
    predictor=DefaultPredictor,
    eval: bool = False,  # FIX: Add spaces around =
    save:  bool = True,
    num_predictions=0,
) -> None:

evaluation.py - Line 426

if union_area == 0:
    continue  # FIX: Remove trailing whitespace

evaluation.py - Lines 674-687 (indentation fix)

all_test_feats = initialise_feats2(
    tile_directory, file.name,
    lidar_img, area_threshold,
    conf_threshold, border_filter,
    tile_width, tile_origin, epsg)

new_heights = get_heights(all_test_feats, min_height, max_height)
heights. extend(new_heights)

pred_file = "Prediction_" + file.stem + "_eval.geojson"
all_pred_feats = initialise_feats2(
    pred_directory, pred_file,
    lidar_img, area_threshold,
    conf_threshold, border_filter,
    tile_width, tile_origin, epsg)

train.py - Line 1260 (CRITICAL BUG)

def get_latest_model_path(output_dir: str) -> str:
    """Get the path to the latest model file in the output directory."""
    # Regular expression to match model files with the pattern "model_X.pth"
    model_pattern = re.compile(r"model_(\d+)\.pth")

    # Find all files that match the pattern and extract their indices
    model_files = []
    output_path = Path(output_dir)  # ADD THIS
    for f in output_path.iterdir():
        match = model_pattern.match(f.name)
        if match:
            model_files.append((f, int(match.group(1))))

    if not model_files:
        raise FileNotFoundError(f"No model files found in {output_dir}")

    # Sort by index and get the latest model file
    latest_model_file = max(model_files, key=lambda x:  x[1])[0]

    # Return the full path to the latest model file
    return str(latest_model_file)  # FIX: latest_model_file is already a Path object from iterdir()

Summary of Required Changes

File Critical Bugs Flake8 Issues
train.py 1 ~10
outputs.py 1 ~10
evaluation.py 0 ~8
predict.py 0 2
tiling. py 0 ~44 (mostly pre-existing)

Priority Action Items

MUST FIX (blocking):

  1. outputs.py:633 - Add missing dot: .suffix != ".json"
  2. train.py:1260 - Fix path construction
  3. outputs.py:8 - Remove unused os import
  4. predict.py:26 - Fix spacing: eval: bool = False
  5. evaluation.py:426 - Remove trailing whitespace
  6. ✅ All E127 indentation issues in evaluation.py and outputs.py
  7. ✅ Remove all W293 (blank line whitespace) and W291 (trailing whitespace)

SHOULD FIX (quality):

  • Fix remaining E501 line-too-long issues (though many are pre-existing)
  • Address comment formatting (E261, E262, E265)
  • The C901 complexity warnings are pre-existing and can be addressed separately

@CiSong10 CiSong10 requested a review from PatBall1 January 15, 2026 16:09
@CiSong10
Copy link
Copy Markdown
Contributor Author

CiSong10 commented Jan 15, 2026

Hi, I have made edits according to the latest change request. The two "critical bugs" are fixed. Other format issues are fixed as well, although some of them are pre-existing, not related to my PR, and I believe should be addressed separately.

@CiSong10 CiSong10 force-pushed the refactor/tqdm-pathlib branch from 96e1354 to 5160ded Compare January 15, 2026 16:46
@PatBall1
Copy link
Copy Markdown
Owner

@CiSong10 thank you! I really appreciate the help. Sorry for being so demanding! I'll review shortly.

@PatBall1 PatBall1 merged commit bf7c083 into PatBall1:master Jan 15, 2026
1 of 2 checks passed
@PatBall1 PatBall1 mentioned this pull request Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants