Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .jules/bolt.md
Original file line number Diff line number Diff line change
Expand Up @@ -89,3 +89,7 @@
## 2026-05-18 - Jaccard Similarity Optimization via Set Arithmetic
**Learning:** In retrieval loops calculating Jaccard similarity (e.g. RAG), explicitly building a union set `A.union(B)` is expensive due to memory allocation and population.
**Action:** Use the inclusion-exclusion principle $|A \cup B| = |A| + |B| - |A \cap B|$ to calculate union size in O(1) arithmetic time after calculating the intersection. Pre-calculate $|B|$ (token count) to further reduce overhead. Use `isdisjoint()` for fast early-exit.

## 2025-02-18 - SQLAlchemy Multiple Aggregations vs Standard GROUP BY
**Learning:** When calculating aggregates over categorical columns (e.g., status counts, confirmation types) in SQLAlchemy, using standard `GROUP BY` queries (e.g., `db.query(Model.type, func.count(Model.id)).group_by(Model.type)`) is measurably faster and scales better than stringing together multiple `func.sum(case(...))` statements, reducing database parse time and scan overhead.
**Action:** When auditing ORM queries for performance, replace multiple conditional aggregations (`func.sum(case)`) with standard `GROUP BY` patterns combined with Python-side extraction where categories are cleanly divisible.
18 changes: 12 additions & 6 deletions backend/closure_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -139,14 +139,20 @@ def check_and_finalize_closure(grievance_id: int, db: Session) -> dict:
).scalar()

# Get all confirmation counts in a single query instead of multiple round-trips
from sqlalchemy import case
# Optimized: Standard GROUP BY is measurably faster than multiple func.sum(case(...)) aggregations
stats = db.query(
func.sum(case((ClosureConfirmation.confirmation_type == 'confirmed', 1), else_=0)).label('confirmed'),
func.sum(case((ClosureConfirmation.confirmation_type == 'disputed', 1), else_=0)).label('disputed')
).filter(ClosureConfirmation.grievance_id == grievance_id).first()
ClosureConfirmation.confirmation_type,
func.count(ClosureConfirmation.id)
).filter(ClosureConfirmation.grievance_id == grievance_id).group_by(ClosureConfirmation.confirmation_type).all()

confirmations_count = stats.confirmed or 0
disputes_count = stats.disputed or 0
confirmations_count = 0
disputes_count = 0

for ctype, count in stats:
if ctype == 'confirmed':
confirmations_count = count
elif ctype == 'disputed':
disputes_count = count
Comment on lines +151 to +155
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle potential enum values for confirmation_type.

The code directly compares ctype against string literals, but other files in this PR defensively handle enum values (see backend/routers/admin.py lines 60-61 and backend/routers/utility.py lines 73-74). For consistency and correctness, apply the same pattern here.

🔧 Proposed fix to handle enum values
 for ctype, count in stats:
-    if ctype == 'confirmed':
+    ctype_val = ctype.value if hasattr(ctype, 'value') else ctype
+    if ctype_val == 'confirmed':
         confirmations_count = count
-    elif ctype == 'disputed':
+    elif ctype_val == 'disputed':
         disputes_count = count
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for ctype, count in stats:
if ctype == 'confirmed':
confirmations_count = count
elif ctype == 'disputed':
disputes_count = count
for ctype, count in stats:
ctype_val = ctype.value if hasattr(ctype, 'value') else ctype
if ctype_val == 'confirmed':
confirmations_count = count
elif ctype_val == 'disputed':
disputes_count = count
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/closure_service.py` around lines 151 - 155, The loop over stats
compares ctype to string literals but doesn't handle enum values; inside the for
loop (where confirmations_count and disputes_count are set) coerce ctype to its
string value before comparisons—e.g., replace direct comparisons with something
like normalized_type = getattr(ctype, "value", ctype) or check isinstance(ctype,
Enum) and use ctype.value—then compare normalized_type == "confirmed" /
"disputed" and assign confirmations_count and disputes_count accordingly.


required_confirmations = max(1, int(total_followers * ClosureService.CONFIRMATION_THRESHOLD))

Expand Down
31 changes: 22 additions & 9 deletions backend/routers/admin.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,17 +42,30 @@ def get_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):
def get_system_stats(db: Session = Depends(get_db)):
"""
Get system-wide user statistics.
Optimized: Uses a single aggregate query to calculate multiple metrics simultaneously,
reducing database round-trips and scan overhead.
Optimized: Standard GROUP BY is measurably faster than multiple func.sum(case(...)) aggregations.
"""
stats = db.query(
func.count(User.id).label("total"),
func.sum(case((User.role == UserRole.ADMIN, 1), else_=0)).label("admins"),
func.sum(case((User.is_active.is_(True), 1), else_=0)).label("active")
).first()
User.role,
User.is_active,
func.count(User.id)
).group_by(User.role, User.is_active).all()

total_users = 0
admin_count = 0
active_users = 0

for role, is_active, count in stats:
total_users += count

role_val = role.value if hasattr(role, 'value') else role
if role_val == 'admin':
admin_count += count

if is_active:
active_users += count

return {
"total_users": stats.total or 0,
"admin_count": int(stats.admins or 0),
"active_users": int(stats.active or 0),
"total_users": total_users,
"admin_count": admin_count,
"active_users": active_users,
}
40 changes: 28 additions & 12 deletions backend/routers/field_officer.py
Original file line number Diff line number Diff line change
Expand Up @@ -437,22 +437,38 @@ def get_visit_statistics(db: Session = Depends(get_db)):
if cached_json:
return Response(content=cached_json, media_type="application/json")

# Optimized: Use a single aggregate query to fetch multiple statistics in one database roundtrip
stats = db.query(
func.count(FieldOfficerVisit.id).label('total'),
func.sum(case((FieldOfficerVisit.verified_at.isnot(None), 1), else_=0)).label('verified'),
func.sum(case((FieldOfficerVisit.within_geofence == True, 1), else_=0)).label('within_geofence'),
func.sum(case((FieldOfficerVisit.within_geofence == False, 1), else_=0)).label('outside_geofence'),
# Optimized: Standard GROUP BY is measurably faster than multiple func.sum(case(...)) aggregations.
groups = db.query(
FieldOfficerVisit.verified_at.isnot(None).label('is_verified'),
FieldOfficerVisit.within_geofence,
func.count(FieldOfficerVisit.id)
).group_by(
FieldOfficerVisit.verified_at.isnot(None),
FieldOfficerVisit.within_geofence
).all()

# Second query for global aggregates
global_stats = db.query(
func.count(func.distinct(FieldOfficerVisit.officer_email)).label('unique_officers'),
func.avg(FieldOfficerVisit.distance_from_site).label('avg_distance')
).first()
Comment on lines +450 to 454

total_visits = stats.total or 0
verified_visits = int(stats.verified or 0)
within_geofence_count = int(stats.within_geofence or 0)
outside_geofence_count = int(stats.outside_geofence or 0)
unique_officers = stats.unique_officers or 0
average_distance = stats.avg_distance
total_visits = 0
verified_visits = 0
within_geofence_count = 0
outside_geofence_count = 0

for is_ver, in_geo, count in groups:
total_visits += count
if is_ver:
verified_visits += count
if in_geo is True:
within_geofence_count += count
elif in_geo is False:
outside_geofence_count += count

unique_officers = global_stats.unique_officers or 0
average_distance = global_stats.avg_distance

# Round to 2 decimals if not None
if average_distance is not None:
Expand Down
18 changes: 12 additions & 6 deletions backend/routers/grievances.py
Original file line number Diff line number Diff line change
Expand Up @@ -435,14 +435,20 @@ def get_closure_status(
).scalar()

# Get all confirmation counts in a single query instead of multiple round-trips
from sqlalchemy import case
# Optimized: Standard GROUP BY is measurably faster than multiple func.sum(case(...)) aggregations
stats = db.query(
func.sum(case((ClosureConfirmation.confirmation_type == 'confirmed', 1), else_=0)).label('confirmed'),
func.sum(case((ClosureConfirmation.confirmation_type == 'disputed', 1), else_=0)).label('disputed')
).filter(ClosureConfirmation.grievance_id == grievance_id).first()
ClosureConfirmation.confirmation_type,
func.count(ClosureConfirmation.id)
).filter(ClosureConfirmation.grievance_id == grievance_id).group_by(ClosureConfirmation.confirmation_type).all()

confirmations_count = 0
disputes_count = 0

confirmations_count = stats.confirmed or 0
disputes_count = stats.disputed or 0
for ctype, count in stats:
if ctype == 'confirmed':
confirmations_count = count
elif ctype == 'disputed':
disputes_count = count
Comment on lines +447 to +451
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle potential enum values for confirmation_type.

The code directly compares ctype against string literals without checking if it's an enum. This is the same issue flagged in backend/closure_service.py lines 151-155. For consistency with backend/routers/admin.py and backend/routers/utility.py, apply defensive enum handling.

🔧 Proposed fix to handle enum values
 for ctype, count in stats:
-    if ctype == 'confirmed':
+    ctype_val = ctype.value if hasattr(ctype, 'value') else ctype
+    if ctype_val == 'confirmed':
         confirmations_count = count
-    elif ctype == 'disputed':
+    elif ctype_val == 'disputed':
         disputes_count = count
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for ctype, count in stats:
if ctype == 'confirmed':
confirmations_count = count
elif ctype == 'disputed':
disputes_count = count
for ctype, count in stats:
ctype_val = ctype.value if hasattr(ctype, 'value') else ctype
if ctype_val == 'confirmed':
confirmations_count = count
elif ctype_val == 'disputed':
disputes_count = count
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routers/grievances.py` around lines 447 - 451, The loop over stats in
routers/grievances.py assumes ctype is a string; make it resilient to enum
values by normalizing ctype before comparisons (e.g. if ctype has a .value or
.name use that, or cast to str) and then compare against 'confirmed' and
'disputed'; update the block that sets confirmations_count and disputes_count to
derive a normalized_key from ctype (handling enum members and plain strings) and
use normalized_key == 'confirmed' / 'disputed' to assign confirmations_count and
disputes_count.


required_confirmations = max(1, int(total_followers * ClosureService.CONFIRMATION_THRESHOLD))

Expand Down
22 changes: 13 additions & 9 deletions backend/routers/utility.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,25 +53,29 @@ def get_stats(db: Session = Depends(get_db)):
if cached_stats:
return Response(content=cached_stats, media_type="application/json")

# Optimized: Single aggregate query for both category breakdowns and system-wide totals
# Optimized: Standard GROUP BY is measurably faster than multiple func.sum(case(...)) aggregations
# This eliminates a redundant database roundtrip
cat_counts = db.query(
cat_status_counts = db.query(
Issue.category,
func.count(Issue.id).label("total"),
func.sum(case((Issue.status.in_(['resolved', 'verified']), 1), else_=0)).label("resolved")
).group_by(Issue.category).all()
Issue.status,
func.count(Issue.id)
).group_by(Issue.category, Issue.status).all()

total = 0
resolved = 0
issues_by_category = {}

for cat, cat_total, cat_resolved in cat_counts:
for cat, status, count in cat_status_counts:
# Sum up system-wide totals
total += cat_total or 0
resolved += int(cat_resolved or 0)
total += count

# Handle resolving logic based on string or enum value
status_val = status.value if hasattr(status, 'value') else status
if status_val in ['resolved', 'verified']:
resolved += count

# Build category breakdown
issues_by_category[cat] = cat_total or 0
issues_by_category[cat] = issues_by_category.get(cat, 0) + count

# Pending is everything else
pending = total - resolved
Expand Down
96 changes: 0 additions & 96 deletions backend/tests/benchmark_closure_status.py

This file was deleted.

Loading