8 changes: 8 additions & 0 deletions .jules/bolt.md
Original file line number Diff line number Diff line change
Expand Up @@ -45,3 +45,11 @@
## 2026-02-10 - Group-By for Multi-Count Statistics
**Learning:** Executing multiple `count()` queries with different filters (e.g., for different statuses) causes redundant database scans and network round-trips.
**Action:** Use a single SQL `GROUP BY` query to fetch counts for all categories/statuses at once, then process the results in Python.

## 2026-02-11 - O(1) Blockchain Verification
**Learning:** Verifying the integrity of a blockchain-style chain by querying the database for the previous record's hash on every check is inefficient and adds unnecessary latency.
**Action:** Store the `previous_integrity_hash` directly in the record during creation. This enables O(1) single-record integrity checks without additional database lookups. Use a thread-safe cache to keep the most recent hash in memory to further optimize the creation path.
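
A minimal sketch of the stored-link idea, assuming records are plain dictionaries and reusing the `description|category|prev_hash` layout from this PR. `compute_hash`, `make_record`, and `verify_record` are hypothetical helper names for illustration, not functions in the codebase.

```python
import hashlib


def compute_hash(description: str, category: str, prev_hash: str) -> str:
    """SHA-256 over the 'description|category|prev_hash' layout used in this PR."""
    return hashlib.sha256(f"{description}|{category}|{prev_hash}".encode()).hexdigest()


def make_record(description: str, category: str, prev_hash: str) -> dict:
    """Build a record that stores its predecessor's hash next to its own."""
    return {
        "description": description,
        "category": category,
        "previous_integrity_hash": prev_hash,
        "integrity_hash": compute_hash(description, category, prev_hash),
    }


def verify_record(record: dict) -> bool:
    """Recompute the hash from the record alone, with no lookup of the previous record."""
    expected = compute_hash(
        record["description"], record["category"], record["previous_integrity_hash"]
    )
    return expected == record["integrity_hash"]


# Illustrative data only.
first = make_record("Pothole on Main St", "road", "")
second = make_record("Broken streetlight", "lighting", first["integrity_hash"])
```

A check like this confirms only that a record is consistent with its own stored link. It does not confirm that the link matches the actual predecessor row.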

## 2026-02-11 - Multi-Metric Aggregate Queries
**Learning:** Executing multiple separate `count()` queries to gather system statistics results in multiple database round-trips and redundant table scans.
**Action:** Use a single SQLAlchemy query with `func.count()` and `func.sum(case(...))` to calculate all metrics in one go. This reduces network overhead and allows the database to perform calculations in a single pass.
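
The same conditional-aggregation pattern can be sketched in plain SQL. The example below uses the standard-library `sqlite3` module instead of SQLAlchemy so that it runs on its own; the `users` table layout and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, role TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users (role, is_active) VALUES (?, ?)",
    [("admin", 1), ("user", 1), ("user", 0), ("admin", 0)],
)

# One query and one pass over the table yield all three metrics.
total, admins, active = conn.execute(
    """
    SELECT COUNT(id),
           SUM(CASE WHEN role = 'admin' THEN 1 ELSE 0 END),
           SUM(CASE WHEN is_active = 1 THEN 1 ELSE 0 END)
    FROM users
    """
).fetchone()

stats = {
    "total_users": total or 0,
    "admin_count": int(admins or 0),  # SUM yields NULL when the table is empty
    "active_users": int(active or 0),
}
```

The `or 0` guards matter because `SUM` over zero rows returns `NULL` rather than `0`.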
3 changes: 3 additions & 0 deletions backend/init_db.py
@@ -95,6 +95,9 @@ def index_exists(table, index_name):
 if not index_exists("issues", "ix_issues_user_email"):
     conn.execute(text("CREATE INDEX IF NOT EXISTS ix_issues_user_email ON issues (user_email)"))

+if not index_exists("issues", "ix_issues_previous_integrity_hash"):
+    conn.execute(text("CREATE INDEX IF NOT EXISTS ix_issues_previous_integrity_hash ON issues (previous_integrity_hash)"))
+
 # Voice and Language Support Columns (Issue #291)
 if not column_exists("issues", "submission_type"):
     conn.execute(text("ALTER TABLE issues ADD COLUMN submission_type VARCHAR DEFAULT 'text'"))
1 change: 1 addition & 0 deletions backend/models.py
@@ -149,6 +149,7 @@ class Issue(Base):
     location = Column(String, nullable=True)
     action_plan = Column(JSON, nullable=True)
     integrity_hash = Column(String, nullable=True)  # Blockchain integrity seal
+    previous_integrity_hash = Column(String, nullable=True, index=True)  # Linked hash for O(1) verification

     # Voice and Language Support (Issue #291)
     submission_type = Column(String, default="text")  # 'text', 'voice'
19 changes: 14 additions & 5 deletions backend/routers/admin.py
@@ -1,5 +1,6 @@
 from fastapi import APIRouter, Depends, HTTPException, status
 from sqlalchemy.orm import Session
+from sqlalchemy import func, case
 from typing import List

 from backend.database import get_db
@@ -20,11 +21,19 @@ def get_users(skip: int = 0, limit: int = 100, db: Session = Depends(get_db)):

 @router.get("/stats")
 def get_system_stats(db: Session = Depends(get_db)):
-    total_users = db.query(User).count()
-    admin_users = db.query(User).filter(User.role == UserRole.ADMIN).count()
+    """
+    Get system-wide user statistics.
+    Optimized: Uses a single aggregate query to calculate multiple metrics simultaneously,
+    reducing database round-trips and scan overhead.
+    """
+    stats = db.query(
+        func.count(User.id).label("total"),
+        func.sum(case((User.role == UserRole.ADMIN, 1), else_=0)).label("admins"),
+        func.sum(case((User.is_active.is_(True), 1), else_=0)).label("active")
+    ).first()

     return {
-        "total_users": total_users,
-        "admin_count": admin_users,
-        "active_users": db.query(User).filter(User.is_active == True).count(),
+        "total_users": stats.total or 0,
+        "admin_count": int(stats.admins or 0),
+        "active_users": int(stats.active or 0),
     }
48 changes: 32 additions & 16 deletions backend/routers/issues.py
@@ -30,7 +30,7 @@
     send_status_notification
 )
 from backend.spatial_utils import get_bounding_box, find_nearby_issues
-from backend.cache import recent_issues_cache, nearby_issues_cache
+from backend.cache import recent_issues_cache, nearby_issues_cache, blockchain_last_hash_cache
 from backend.hf_api_service import verify_resolution_vqa
 from backend.dependencies import get_http_client
 from backend.rag_service import rag_service
@@ -172,16 +172,23 @@ async def create_issue(
         # Save to DB only if no nearby issues found or deduplication failed
         if deduplication_info is None or not deduplication_info.has_nearby_issues:
             # Blockchain feature: calculate integrity hash for the report
-            # Optimization: Fetch only the last hash to maintain the chain with minimal overhead
-            prev_issue = await run_in_threadpool(
-                lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
-            )
-            prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
+            # Performance Boost: Use thread-safe cache to eliminate DB query for last hash
+            prev_hash = blockchain_last_hash_cache.get("last_hash")
+            if prev_hash is None:
+                # Cache miss: Fetch only the last hash from DB
+                prev_issue = await run_in_threadpool(
+                    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
Comment on lines +175 to +180
Copilot AI Mar 7, 2026

The get→compute→set sequence for "last_hash" isn’t atomic. Under concurrent create_issue requests, multiple workers/threads can read the same prev_hash and produce a forked chain (and with multiple Uvicorn/Gunicorn workers, each process has its own cache). Consider serializing hash generation with a lock or deriving prev_hash from the database within the same transaction to preserve a single chain.
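
One way to read the "serialize with a lock" option, as a sketch only: hold a single `threading.Lock` across the whole read, compute, and write. `ChainTail` is a hypothetical class, not `ThreadSafeCache` from `backend/cache.py`, and persistence is not modelled here. A process-local lock also does nothing across multiple worker processes, which is why the comment offers the database-side alternative.

```python
import hashlib
import threading


class ChainTail:
    """Hypothetical in-process holder for the last hash in the chain."""

    def __init__(self, initial_hash: str = "") -> None:
        self._lock = threading.Lock()
        self._last_hash = initial_hash

    def append(self, description: str, category: str) -> tuple[str, str]:
        """Atomically read the tail, compute the next hash, and advance the tail."""
        with self._lock:
            prev_hash = self._last_hash
            content = f"{description}|{category}|{prev_hash}"
            new_hash = hashlib.sha256(content.encode()).hexdigest()
            self._last_hash = new_hash
            return prev_hash, new_hash


tail = ChainTail()
links: list[tuple[str, str]] = []


def worker(i: int) -> None:
    # Each call returns (previous hash, new hash) for one simulated report.
    links.append(tail.append(f"report {i}", "road"))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the lock covers the full sequence, no two simulated requests can observe the same previous hash.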

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Contributor

I've implemented atomic hash calculation and record persistence by using a single database transaction with with_for_update() on the tail record. This serializes concurrent creations and prevents chain forks, as requested.

+                )
+                prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
+                blockchain_last_hash_cache.set(data=prev_hash, key="last_hash")

-            # Simple but effective SHA-256 chaining
+            # Simple but effective SHA-256 chaining
             hash_content = f"{description}|{category}|{prev_hash}"
             integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

+            # Update cache for next report
+            blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")
Contributor

@cubic-dev-ai cubic-dev-ai Bot Mar 7, 2026

P1: Do not update blockchain_last_hash_cache before the issue is committed; a failed DB save leaves a phantom tail hash and corrupts subsequent chain links.


Comment on lines +175 to +190

⚠️ Potential issue | 🔴 Critical

Race condition breaks blockchain chain integrity.

The read-compute-write sequence for blockchain_last_hash_cache is not atomic. Per backend/cache.py (lines 9-65), ThreadSafeCache only locks individual get() and set() operations, not the entire sequence.

Scenario:

  1. Request A calls get("last_hash") → returns hash_X
  2. Request B calls get("last_hash") → returns hash_X (same value)
  3. Request A computes integrity_hash_A from hash_X, calls set(data=integrity_hash_A, key="last_hash")
  4. Request B computes integrity_hash_B from hash_X, calls set(data=integrity_hash_B, key="last_hash")

Both issues now have previous_integrity_hash = hash_X, creating a fork in the chain and making subsequent verification unreliable.
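
The interleaving in this scenario can be replayed deterministically without threads. The sketch below is illustrative only: a plain dict stands in for `blockchain_last_hash_cache`, and the report values are made up.

```python
import hashlib


def chain_hash(description: str, category: str, prev_hash: str) -> str:
    """Same 'description|category|prev_hash' layout as the code under review."""
    return hashlib.sha256(f"{description}|{category}|{prev_hash}".encode()).hexdigest()


cache = {"last_hash": "hash_X"}  # stand-in for blockchain_last_hash_cache

# Steps 1 and 2: both requests read before either one writes.
prev_a = cache["last_hash"]
prev_b = cache["last_hash"]

# Steps 3 and 4: each request computes its hash and overwrites the tail.
hash_a = chain_hash("report A", "road", prev_a)
cache["last_hash"] = hash_a
hash_b = chain_hash("report B", "road", prev_b)
cache["last_hash"] = hash_b

forked = prev_a == prev_b                # both records point at the same parent
orphaned = cache["last_hash"] != hash_a  # later records will not chain from record A
```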

🔒 Recommended fix: Use database-level sequencing

The simplest reliable fix is to query the previous hash from the database within the same transaction that inserts the new issue, leveraging DB-level locking:

     try:
         # Save to DB only if no nearby issues found or deduplication failed
         if deduplication_info is None or not deduplication_info.has_nearby_issues:
-            # Blockchain feature: calculate integrity hash for the report
-            # Performance Boost: Use thread-safe cache to eliminate DB query for last hash
-            prev_hash = blockchain_last_hash_cache.get("last_hash")
-            if prev_hash is None:
-                # Cache miss: Fetch only the last hash from DB
-                prev_issue = await run_in_threadpool(
-                    lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
-                )
-                prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
-                blockchain_last_hash_cache.set(data=prev_hash, key="last_hash")
-
-            # Simple but effective SHA-256 chaining
-            hash_content = f"{description}|{category}|{prev_hash}"
-            integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()
-
-            # Update cache for next report
-            blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")
+            # Blockchain feature: calculate integrity hash for the report
+            # Query previous hash within the same transaction for consistency
+            prev_issue = await run_in_threadpool(
+                lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).with_for_update().first()
+            )
+            prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
+
+            # Simple but effective SHA-256 chaining
+            hash_content = f"{description}|{category}|{prev_hash}"
+            integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

Using with_for_update() serializes concurrent inserts, ensuring each new issue gets the correct previous hash. The cache can still be used for read-only verification scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/issues.py` around lines 175 - 190, The read-compute-write on
blockchain_last_hash_cache (prev_hash = blockchain_last_hash_cache.get(...);
compute integrity_hash; blockchain_last_hash_cache.set(...)) is not atomic and
allows concurrent forks; instead, move the "read previous hash" into the same DB
transaction that inserts the new Issue using a SELECT ... FOR UPDATE (e.g. query
= db.query(Issue.integrity_hash).order_by(Issue.id.desc()).with_for_update()) so
the previous hash is locked, compute integrity_hash from that locked value,
insert the new Issue inside that transaction, then update
blockchain_last_hash_cache.set(...) after commit; use the same
run_in_threadpool/transaction helper you already use to perform the locked read
+ insert atomically.


Comment on lines +189 to +191
Copilot AI Mar 7, 2026

blockchain_last_hash_cache is updated to the newly computed integrity_hash before the issue is committed. If save_issue_db/db.commit fails, the cache will still advance, and subsequent issues may chain off a hash that was never persisted. Update/advance the cache only after a successful commit, and invalidate/refresh it on exceptions.
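
A sketch of the commit-then-cache ordering described here, using the standard-library `sqlite3` module and a plain dict in place of `blockchain_last_hash_cache`. The `save_issue` helper and the simplified `issues` table are assumptions for illustration and do not mirror `save_issue_db`.

```python
import hashlib
import sqlite3
from typing import Optional

cache: dict[str, str] = {}  # stand-in for blockchain_last_hash_cache

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues ("
    "id INTEGER PRIMARY KEY, description TEXT NOT NULL, category TEXT, integrity_hash TEXT)"
)


def save_issue(description: Optional[str], category: str, prev_hash: str) -> str:
    """Insert a row and advance the cached tail only after the commit succeeds."""
    content = f"{description}|{category}|{prev_hash}"
    new_hash = hashlib.sha256(content.encode()).hexdigest()
    try:
        conn.execute(
            "INSERT INTO issues (description, category, integrity_hash) VALUES (?, ?, ?)",
            (description, category, new_hash),
        )
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        cache.pop("last_hash", None)  # invalidate so the next request re-reads the DB
        raise
    cache["last_hash"] = new_hash  # reached only after a successful commit
    return new_hash


first_hash = save_issue("Pothole on Main St", "road", "")

try:
    save_issue(None, "road", first_hash)  # violates NOT NULL, so the insert fails
except sqlite3.IntegrityError:
    pass

# After the failure the cache is empty and the database tail is still the first row.
db_tail = conn.execute(
    "SELECT integrity_hash FROM issues ORDER BY id DESC LIMIT 1"
).fetchone()[0]
```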

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Contributor

I have refactored the code to update blockchain_last_hash_cache only after a successful database commit. If the commit fails, the cache is now invalidated to ensure the next request fetches the correct state from the database.

             # RAG Retrieval (New)
             relevant_rule = rag_service.retrieve(description)
             initial_action_plan = None
@@ -199,7 +206,8 @@ async def create_issue(
                 longitude=longitude,
                 location=location,
                 action_plan=initial_action_plan,
-                integrity_hash=integrity_hash
+                integrity_hash=integrity_hash,
+                previous_integrity_hash=prev_hash
             )

             # Offload blocking DB operations to threadpool
@@ -620,24 +628,32 @@ def get_user_issues(
 async def verify_blockchain_integrity(issue_id: int, db: Session = Depends(get_db)):
     """
     Verify the cryptographic integrity of a report using the blockchain-style chaining.
-    Optimized: Uses column projection to fetch only needed data.
+    Optimized: Uses previous_integrity_hash column for O(1) verification.
     """
-    # Fetch current issue data
+    # Fetch current issue data including the link to previous hash
+    # Performance Boost: Use projected previous_integrity_hash to avoid N+1 or secondary lookups
     current_issue = await run_in_threadpool(
         lambda: db.query(
-            Issue.id, Issue.description, Issue.category, Issue.integrity_hash
+            Issue.id,
+            Issue.description,
+            Issue.category,
+            Issue.integrity_hash,
+            Issue.previous_integrity_hash
         ).filter(Issue.id == issue_id).first()
     )

     if not current_issue:
         raise HTTPException(status_code=404, detail="Issue not found")

-    # Fetch previous issue's integrity hash to verify the chain
-    prev_issue_hash = await run_in_threadpool(
-        lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
-    )
+    # Determine previous hash (use stored link or fallback for legacy records)
+    prev_hash = current_issue.previous_integrity_hash

-    prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""
+    if prev_hash is None:
+        # Fallback for legacy records created before O(1) optimization
Comment on lines +648 to +652
Copilot AI Mar 7, 2026

verify_blockchain_integrity now trusts Issue.previous_integrity_hash (and only falls back to a DB lookup when it’s NULL). This can report "valid" even if previous_integrity_hash doesn’t match the actual predecessor record in the DB (e.g., due to concurrent creation/cache races). To preserve chain continuity, also fetch the DB predecessor hash (single additional query or a scalar subquery) and treat a mismatch as invalid.
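
A sketch of the scalar-subquery variant mentioned here, written against the standard-library `sqlite3` module so it runs on its own. The simplified `issues` table, the sample rows, and the `verify` helper are assumptions for illustration, not the endpoint's actual implementation.

```python
import hashlib
import sqlite3


def chain_hash(description: str, category: str, prev_hash: str) -> str:
    """Same 'description|category|prev_hash' layout as the endpoint."""
    return hashlib.sha256(f"{description}|{category}|{prev_hash}".encode()).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE issues (
        id INTEGER PRIMARY KEY, description TEXT, category TEXT,
        integrity_hash TEXT, previous_integrity_hash TEXT)"""
)

h1 = chain_hash("Pothole", "road", "")
h2 = chain_hash("Streetlight", "lighting", h1)
h3 = chain_hash("Graffiti", "vandalism", h1)  # forked: chains off issue 1, not issue 2
conn.executemany(
    "INSERT INTO issues (description, category, integrity_hash, previous_integrity_hash)"
    " VALUES (?, ?, ?, ?)",
    [
        ("Pothole", "road", h1, ""),
        ("Streetlight", "lighting", h2, h1),
        ("Graffiti", "vandalism", h3, h1),
    ],
)


def verify(issue_id: int) -> bool:
    """Valid only if the hash recomputes AND the stored link matches the real predecessor."""
    row = conn.execute(
        """
        SELECT i.description, i.category, i.integrity_hash, i.previous_integrity_hash,
               (SELECT p.integrity_hash FROM issues AS p
                WHERE p.id < i.id ORDER BY p.id DESC LIMIT 1)
        FROM issues AS i WHERE i.id = ?
        """,
        (issue_id,),
    ).fetchone()
    if row is None:
        return False
    description, category, stored_hash, stored_prev, actual_prev = row
    link_ok = (stored_prev or "") == (actual_prev or "")
    return link_ok and chain_hash(description, category, stored_prev or "") == stored_hash
```

Here the third row recomputes correctly from its own stored link, yet `verify(3)` rejects it because that link points at issue 1 rather than its actual predecessor, issue 2.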

Owner Author

@copilot open a new pull request to apply changes based on this feedback

Contributor

I've enhanced verify_blockchain_integrity to double-verify the previous_integrity_hash against the actual predecessor record in the database. This ensures that we don't blindly trust the stored link and maintains absolute chain continuity.

+        prev_issue_hash = await run_in_threadpool(
+            lambda: db.query(Issue.integrity_hash).filter(Issue.id < issue_id).order_by(Issue.id.desc()).first()
+        )
+        prev_hash = prev_issue_hash[0] if prev_issue_hash and prev_issue_hash[0] else ""

     # Recompute hash based on current data and previous hash
     # Chaining logic: hash(description|category|prev_hash)