Dear developer,
Thank you for the updates of the clean_crowns function and the post_clean function. I was excited to test it but unfortunately, I ran into a significant over-cleaning issue I'd like to flag.
The screenshot shows the result on one of my datasets. Yellow polygons are the original uncleaned crowns, red are the crowns after clean_crowns, and blue are crowns added back by post_clean. As you see, there are still huge gaps even after post_clean converges.
Root cause
I believe this is due to the "transitivity problem" of the union-find cluster strategy in clean_crowns. Consider three crowns A, B, C arranged in a line:
- A overlaps B, B overlaps C, while A and C does not overlap.
- A and B are merged, B and C are merged. A, B, C will be in one cluster.
- Say A has the highest confidence score, C will be removed even though it does not overlap A at all.
Proposed fix
Rather than keeping one crown per cluster, I'd suggest a greedy NMS (non-maximum suppression) approach that resolves conflicts locally and directly. Crowns are sorted by confidence score descending; a crown is kept if it doesn't directly conflict with any already-kept crown, and only its direct conflicts are suppressed.
def clean_crowns():
# 1. Filter
...
# 2. Spatial join for candidate pairs
...
# 3. Build a conflict graph: edges between crowns that truly conflict:
# (high IoU OR containment).
from collections import defaultdict
conflicts = defaultdict(set) # crown_idx -> set of conflicting crown_idxs
for _, row in tqdm(
join.iterrows(),
...
):
i = row.name
j = row["index_right"]
if i >= j:
continue # Process each pair once
geom_i = crowns.at[i, "geometry"]
geom_j = crowns.at[j, "geometry"]
intersection_area = geom_i.intersection(geom_j).area
# IoU check
iou_val = calc_iou(geom_i, geom_j)
is_conflict = iou_val > iou_threshold
# Containment check
if not is_conflict and containment_threshold is not None:
min_area = min(geom_i.area, geom_j.area)
if min_area > 0 and (intersection_area / min_area) > containment_threshold:
is_conflict = True
if is_conflict:
conflicts[i].add(j)
conflicts[j].add(i)
# 4. Greedy NMS: sort by confidence descending,
# keep a crown if it doesn't conflict with any already-kept crown.
sorted_indices = crowns[field].sort_values(ascending=False).index.tolist()
kept = set()
removed = set()
for i in sorted_indices:
if i in removed:
continue
kept.add(i)
# Only remove direct conflicts, not transitive ones
for j in conflicts[i]:
removed.add(j)
cleaned_crowns = crowns.loc[sorted(kept)].copy()
return gpd.GeoDataFrame(cleaned_crowns, crs=crowns.crs).reset_index(drop=True)
The result is shown below. I think this gives a much better balance between retaining high-confidence predictions and maintaining coverage of the tree canopy. If doing it right, it isn't even necessary to run post_clean, or only one interation of post_clean is enough.
In #192, I also explored a slightly different strategy for the containment case: comparing the average confidence score of the smaller (contained) crown against the larger (container) crown before deciding which to keep. This produces slightly different results from the greedy NMS approach above, while a single-shot NMS is easier and more straightforward. I'd love to hear your thoughts on this.
Thank you for you attention to this issue.
Dear developer,
Thank you for the updates of the
clean_crownsfunction and thepost_cleanfunction. I was excited to test it but unfortunately, I ran into a significant over-cleaning issue I'd like to flag.The screenshot shows the result on one of my datasets. Yellow polygons are the original uncleaned crowns, red are the crowns after
clean_crowns, and blue are crowns added back bypost_clean. As you see, there are still huge gaps even afterpost_cleanconverges.Root cause
I believe this is due to the "transitivity problem" of the union-find cluster strategy in
clean_crowns. Consider three crowns A, B, C arranged in a line:Proposed fix
Rather than keeping one crown per cluster, I'd suggest a greedy NMS (non-maximum suppression) approach that resolves conflicts locally and directly. Crowns are sorted by confidence score descending; a crown is kept if it doesn't directly conflict with any already-kept crown, and only its direct conflicts are suppressed.
The result is shown below. I think this gives a much better balance between retaining high-confidence predictions and maintaining coverage of the tree canopy. If doing it right, it isn't even necessary to run
post_clean, or only one interation ofpost_cleanis enough.In #192, I also explored a slightly different strategy for the containment case: comparing the average confidence score of the smaller (contained) crown against the larger (container) crown before deciding which to keep. This produces slightly different results from the greedy NMS approach above, while a single-shot NMS is easier and more straightforward. I'd love to hear your thoughts on this.
Thank you for you attention to this issue.