Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 8 additions & 8 deletions backend/resolution_proof_service.py
Original file line number Diff line number Diff line change
Expand Up @@ -464,13 +464,13 @@ def verify_evidence(grievance_id: int, db: Session) -> Dict[str, Any]:
Returns:
Verification result dictionary
"""
# Optimized: Use .count() and .first() to avoid loading all historical evidence
# records into memory, reducing O(N) database transfer and memory overhead.
evidence_count = db.query(ResolutionEvidence).filter(
# Optimized: Evaluate .first() prior to .count() to enable early exit
# when no evidence exists, reducing database round-trips.
evidence = db.query(ResolutionEvidence).filter(
ResolutionEvidence.grievance_id == grievance_id
).count()
).order_by(ResolutionEvidence.id.desc()).first()

if evidence_count == 0:
if not evidence:
Comment on lines +467 to +473
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the “no evidence found” path was doing two DB round-trips and is now reduced to one. Both before and after this change, the no-evidence path executes a single query (previously COUNT(*), now LIMIT 1). The evidence-present path still does two queries (latest evidence + count). Please update the PR description (or adjust implementation) so the stated performance impact matches reality.

Copilot uses AI. Check for mistakes.
return {
"grievance_id": grievance_id,
"is_verified": False,
Expand All @@ -483,10 +483,10 @@ def verify_evidence(grievance_id: int, db: Session) -> Dict[str, Any]:
"message": "No resolution evidence found for this grievance"
}

# Use the most recent evidence
evidence = db.query(ResolutionEvidence).filter(
# Total count is still needed for the response
evidence_count = db.query(ResolutionEvidence).filter(
ResolutionEvidence.grievance_id == grievance_id
).order_by(ResolutionEvidence.id.desc()).first()
).count()
Comment on lines 467 to +489
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization note here is misleading: the function still performs two queries when evidence exists (.first() plus .count()), and the previous version already short-circuited to a single query in the “no evidence” case (via .count() == 0). If the intent is to reduce database round-trips, consider fetching the latest record and the total count in a single query (e.g., a window function / subquery), or update the comment to reflect that the change mainly swaps an O(1) LIMIT 1 lookup for COUNT(*) on the empty-evidence path rather than reducing query count overall.

Copilot uses AI. Check for mistakes.

# Re-verify the server signature
bundle_str = json.dumps(evidence.metadata_bundle, sort_keys=True)
Expand Down
24 changes: 23 additions & 1 deletion backend/routers/voice.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,12 @@
import logging
import os
import uuid
import hashlib
from datetime import datetime, timezone

from backend.database import get_db
from backend.models import Issue
from backend.cache import blockchain_last_hash_cache
from backend.schemas import (
VoiceTranscriptionResponse,
TextTranslationRequest,
Expand Down Expand Up @@ -254,6 +256,21 @@ async def submit_voice_issue(
# Store relative path for portability
relative_audio_path = os.path.join("data", "audio_recordings", audio_filename)

# Blockchain feature: calculate integrity hash for the report
# Performance Boost: Use thread-safe cache to eliminate DB query for last hash
prev_hash = blockchain_last_hash_cache.get("last_hash")
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0: Race condition breaks blockchain integrity chain when concurrent requests both read the same previous hash, creating a fork instead of a linear chain. Use database-level locking (e.g., SELECT FOR UPDATE) or a distributed lock to ensure only one request can fetch and chain from the last hash at a time.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/voice.py, line 261:

<comment>Race condition breaks blockchain integrity chain when concurrent requests both read the same previous hash, creating a fork instead of a linear chain. Use database-level locking (e.g., SELECT FOR UPDATE) or a distributed lock to ensure only one request can fetch and chain from the last hash at a time.</comment>

<file context>
@@ -254,6 +256,21 @@ async def submit_voice_issue(
         
+        # Blockchain feature: calculate integrity hash for the report
+        # Performance Boost: Use thread-safe cache to eliminate DB query for last hash
+        prev_hash = blockchain_last_hash_cache.get("last_hash")
+        if prev_hash is None:
+            # Cache miss: Fetch only the last hash from DB
</file context>
Fix with Cubic

if prev_hash is None:
# Cache miss: Fetch only the last hash from DB
prev_issue = await run_in_threadpool(
lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
)
Comment on lines +264 to +266
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Is a SQLAlchemy ORM Session safe to use from a different thread than the one it was created in?

💡 Result:

No, a SQLAlchemy ORM Session is not safe to use from a different thread than the one it was created in. The Session is designed to be used in a non-concurrent fashion, meaning only in one thread at a time. Sharing a Session instance across threads without synchronization can lead to race conditions and data corruption. Official SQLAlchemy documentation explicitly states: "The Session is intended to be used in a non-concurrent fashion, that is, a particular instance of Session should be used in only one thread or task at a time." The recommended pattern is "Session per thread", where each thread creates and uses its own Session instance. For multi-threaded applications, use scoped_session which provides thread-local Sessions via threading.local, ensuring each thread gets its own Session. Alternatively, create a new Session in each thread using sessionmaker or context managers. This guidance is consistent across SQLAlchemy 2.0 and 2.1 documentation and community sources.

Citations:


🏁 Script executed:

# First, let's see the file and the specific lines
cat -n backend/routers/voice.py | sed -n '260,270p'

Repository: RohanExploit/VishwaGuru

Length of output: 739


🏁 Script executed:

# Check the imports and function signature to understand the context
cat -n backend/routers/voice.py | head -50

Repository: RohanExploit/VishwaGuru

Length of output: 1997


🏁 Script executed:

# Find the function containing line 264
cat -n backend/routers/voice.py | sed -n '240,280p'

Repository: RohanExploit/VishwaGuru

Length of output: 2342


🏁 Script executed:

# Find the function signature that contains line 264
cat -n backend/routers/voice.py | sed -n '180,240p'

Repository: RohanExploit/VishwaGuru

Length of output: 2822


🏁 Script executed:

# Find the function signature
cat -n backend/routers/voice.py | sed -n '160,190p'

Repository: RohanExploit/VishwaGuru

Length of output: 1656


🏁 Script executed:

# Let me also check how get_db is defined to understand db creation
cat -n backend/database.py | head -100

Repository: RohanExploit/VishwaGuru

Length of output: 1327


Keep this Session on one thread.

Line 264-266 runs a synchronous SQLAlchemy Session inside run_in_threadpool. The db Session is created on the request thread (via FastAPI's dependency injection) but executed on a worker thread. ORM sessions are thread-confined and cannot safely be shared across threads, which can cause intermittent failures under load. Either query with this session on the request thread or migrate the route to AsyncSession end-to-end.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/voice.py` around lines 264 - 266, The code runs a synchronous
SQLAlchemy Session created as the FastAPI dependency (db) inside
run_in_threadpool when computing prev_issue (lambda using
db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()), which is
unsafe because Session objects are not thread-safe; instead, either perform that
query directly on the request thread (remove run_in_threadpool and call
db.query(...)...first() inline) or convert the route to use an AsyncSession
end-to-end and replace the synchronous query with an async query (e.g., use
AsyncSession methods) so the Session is not accessed from a worker thread.

prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
blockchain_last_hash_cache.set(data=prev_hash, key="last_hash")
Comment on lines +262 to +268
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Skip NULL hashes when warming the cache.

Issue.integrity_hash is nullable, so on a cache miss this can select the newest legacy/unsealed issue and collapse prev_hash back to "". That silently starts a new chain even if older sealed issues already exist. The fallback query should only consider rows with a non-null integrity_hash.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routers/voice.py` around lines 262 - 268, On cache miss in the block
that sets prev_hash, change the DB query so it only considers rows where
Issue.integrity_hash is not NULL (e.g., add a filter like Issue.integrity_hash
!= None / .isnot(None) to the run_in_threadpool query) before ordering by
id.desc() and taking first(); then compute prev_hash from that filtered
prev_issue and set blockchain_last_hash_cache as before so legacy/unsealed rows
with NULL hashes are skipped when warming the cache.


# Simple but effective SHA-256 chaining
hash_content = f"{final_description}|{issue_category.value}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

# Create issue in database
reference_id = generate_reference_id()

Expand All @@ -273,12 +290,17 @@ async def submit_voice_issue(
original_text=original_text,
transcription_confidence=voice_result.get('confidence', 0.0),
manual_correction_applied=manual_correction_applied,
audio_file_path=relative_audio_path # Store relative path
audio_file_path=relative_audio_path, # Store relative path
integrity_hash=integrity_hash,
previous_integrity_hash=prev_hash
)

db.add(new_issue)
db.commit()
db.refresh(new_issue)

# Update cache for next report AFTER successful DB commit
blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")
Comment on lines 259 to +303
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description mentions an added integration test tests/test_voice_blockchain_bolt.py, but that file is not present in the repository. Since this change adds security-critical integrity chaining for voice submissions, please add/adjust tests to cover submit_voice_issue producing integrity_hash/previous_integrity_hash consistent with the existing issue blockchain tests (and ideally validate cache behavior on successive submissions).

Copilot uses AI. Check for mistakes.

logger.info(f"Voice issue created: ID={new_issue.id}, Language={voice_result.get('source_language')}, Confidence={voice_result.get('confidence')}")

Expand Down
Loading