Fix Cache Poisoning in Blockchain Integrity Check#632
Conversation
Move `blockchain_last_hash_cache.set()` and `visit_last_hash_cache.set()` after the `db.commit()` logic in `backend/routers/issues.py` and `backend/routers/field_officer.py`. This ensures that the cache is not poisoned with hashes of records that failed to insert into the database, preserving the blockchain integrity.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
❌ Deploy Preview for fixmybharat failed. Why did it fail? →
|
🙏 Thank you for your contribution, @RohanExploit!PR Details:
Quality Checklist:
Review Process:
Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughCache writes for the blockchain Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR fixes a cache-poisoning bug in the “blockchain” integrity-hash chaining logic by ensuring the “last hash” caches are only advanced after the corresponding database write has succeeded, preventing the cache from getting ahead of persisted state.
Changes:
- Move
blockchain_last_hash_cacheupdate to after successful issue persistence increate_issue. - Move
visit_last_hash_cacheupdate to after successful field officer visit persistence inofficer_check_in. - Add a
run_tests.shhelper script for setting up a venv and running backend tests.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
backend/routers/issues.py |
Delays integrity-hash cache advancement until after the issue is saved/committed. |
backend/routers/field_officer.py |
Delays visit-hash cache advancement until after the visit is saved/committed. |
run_tests.sh |
Adds a simple local test runner script (venv + pytest). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Offload blocking DB operations to threadpool | ||
| await run_in_threadpool(save_issue_db, db, new_issue) | ||
|
|
||
| # Update cache for next report AFTER successful DB commit | ||
| blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash") |
There was a problem hiding this comment.
The cache update is now gated on save_issue_db completing successfully, but save_issue_db does both commit() and refresh(). If commit() succeeds and refresh() later raises (e.g., transient DB disconnect), the row is already persisted but the cache won’t be updated, leaving a stale last_hash that can cause the next integrity hash to chain from the wrong predecessor until TTL expiry. Consider updating the cache immediately after a confirmed commit (or restructuring save_issue_db / error handling so cache update happens once commit succeeds even if refresh/response building fails).
| db.refresh(new_visit) | ||
|
|
||
| # Update cache for next visit AFTER successful DB commit | ||
| visit_last_hash_cache.set(data=visit_hash, key="last_hash") | ||
|
|
There was a problem hiding this comment.
Cache update happens after db.refresh(new_visit). If commit() succeeds but refresh() fails (separate DB round-trip), the visit row is persisted but the cache won’t advance, leaving last_hash stale and potentially breaking the visit hash chain for subsequent check-ins until the cache expires. Consider updating the cache immediately after a successful commit (and handling refresh errors separately) so the cache reflects the DB state once the transaction is committed.
| db.refresh(new_visit) | |
| # Update cache for next visit AFTER successful DB commit | |
| visit_last_hash_cache.set(data=visit_hash, key="last_hash") | |
| # Update cache as soon as the transaction is committed so the | |
| # hash chain state stays aligned with persisted DB state even if | |
| # the subsequent refresh round-trip fails. | |
| visit_last_hash_cache.set(data=visit_hash, key="last_hash") | |
| try: | |
| db.refresh(new_visit) | |
| except Exception: | |
| logger.exception( | |
| "Visit committed but failed to refresh FieldOfficerVisit; cache already updated" | |
| ) | |
| raise HTTPException( | |
| status_code=500, | |
| detail="Visit was saved, but the server failed to load the saved visit details." | |
| ) |
| python -m venv venv | ||
| source venv/bin/activate | ||
| pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock |
There was a problem hiding this comment.
run_tests.sh uses python/pip without verifying they exist or stopping on failure. In this repo other scripts check for python3 (e.g. deploy-backend.sh) and gate subsequent steps on command exit codes. Consider switching to python3 -m venv / python3 -m pip and adding set -e (or explicit exit-code checks) so dependency install failures don’t get masked by later pytest output.
| python -m venv venv | |
| source venv/bin/activate | |
| pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock | |
| set -e | |
| python3 -m venv venv | |
| source venv/bin/activate | |
| python3 -m pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routers/field_officer.py`:
- Around line 148-150: The cache update prevents cache-poisoning but doesn't
prevent two concurrent check-ins from reading the same prev_hash and both
committing children, so serialize the chain linking: when persisting a visit in
the handler that computes visit_hash and reads prev_hash from
visit_last_hash_cache, perform an atomic compare-and-set (or DB-backed
conditional update) that only sets visit_last_hash_cache to visit_hash if its
current value equals the prev_hash you read; if the CAS fails, abort/refresh
prev_hash and retry the visit persist (or return a conflict) to avoid forks. Use
the existing visit_last_hash_cache.set/get primitives or add an atomic method
(e.g., compare_and_set) and update the route logic that writes visit_hash to
rely on that atomic operation.
In `@backend/routers/issues.py`:
- Around line 213-214: Concurrent issue creations still compute integrity_hash
from a stale prev_hash; fix by performing predecessor selection and the insert
inside a database transaction with serialization/locking so only one writer can
read/update the chain head at a time. Modify the /issues write path (the handler
that computes integrity_hash and calls session.add/commit) to: begin a
transaction, select the chain head row with a FOR UPDATE (or lock the single row
that stores last_hash) or use a SERIALIZABLE transaction isolation level,
compute integrity_hash using that locked value, insert the new link, commit, and
only after successful commit call
blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash"). Ensure you
use the existing DB session/commit methods in the handler (replace the current
compute-then-commit flow around integrity_hash and session.commit) so
predecessor selection and insert are atomic.
In `@run_tests.sh`:
- Around line 1-5: Add shell strictness and explicit failure checks to the test
runner: enable "set -euo pipefail" (and optionally IFS=$'\n\t') at the top of
run_tests.sh so any command (python -m venv, source venv/bin/activate, pip
install) failing will abort; after "source venv/bin/activate" verify activation
succeeded (e.g., check that $VIRTUAL_ENV is non-empty) and exit with a clear
error if not; ensure pip install and pytest invocations are not run in a
subshell that masks errors so their non-zero exit codes propagate (keep
TELEGRAM_BOT_TOKEN=... pytest invocation as-is but let the script strict flags
cause CI to fail fast on errors).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8d4844d1-5143-47c9-85e4-68aba2e73de2
📒 Files selected for processing (3)
backend/routers/field_officer.pybackend/routers/issues.pyrun_tests.sh
| # Update cache for next visit AFTER successful DB commit | ||
| visit_last_hash_cache.set(data=visit_hash, key="last_hash") | ||
|
|
There was a problem hiding this comment.
Good cache-poisoning fix, but concurrent check-ins can still break chain continuity.
At Line 148, updating cache post-commit is correct, but two concurrent requests can still read the same prev_hash before either commit and both persist children of the same parent hash. Because visit_last_hash_cache is a shared singleton (backend/cache.py:176-183), this can fork the integrity chain.
Suggested direction (serialize write-path chain linking)
- prev_hash = visit_last_hash_cache.get("last_hash")
- if prev_hash is None:
- prev_visit = db.query(FieldOfficerVisit.visit_hash).order_by(FieldOfficerVisit.id.desc()).first()
- prev_hash = prev_visit[0] if prev_visit and prev_visit[0] else ""
- visit_last_hash_cache.set(data=prev_hash, key="last_hash")
+ # For write path, derive predecessor in the same DB transaction and lock ordering.
+ prev_visit = (
+ db.query(FieldOfficerVisit.visit_hash)
+ .order_by(FieldOfficerVisit.id.desc())
+ .with_for_update()
+ .first()
+ )
+ prev_hash = prev_visit[0] if prev_visit and prev_visit[0] else ""
...
db.add(new_visit)
db.commit()
visit_last_hash_cache.set(data=visit_hash, key="last_hash")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/routers/field_officer.py` around lines 148 - 150, The cache update
prevents cache-poisoning but doesn't prevent two concurrent check-ins from
reading the same prev_hash and both committing children, so serialize the chain
linking: when persisting a visit in the handler that computes visit_hash and
reads prev_hash from visit_last_hash_cache, perform an atomic compare-and-set
(or DB-backed conditional update) that only sets visit_last_hash_cache to
visit_hash if its current value equals the prev_hash you read; if the CAS fails,
abort/refresh prev_hash and retry the visit persist (or return a conflict) to
avoid forks. Use the existing visit_last_hash_cache.set/get primitives or add an
atomic method (e.g., compare_and_set) and update the route logic that writes
visit_hash to rely on that atomic operation.
| # Update cache for next report AFTER successful DB commit | ||
| blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash") |
There was a problem hiding this comment.
Post-commit cache update helps, but integrity_hash generation is still race-prone.
At Line 213, cache update timing is fixed, but concurrent /issues creations can still compute integrity_hash from the same stale prev_hash and commit divergent chain links. The write path needs transactional serialization for predecessor selection + insert.
Suggested direction (transactional predecessor selection)
- prev_hash = blockchain_last_hash_cache.get("last_hash")
- if prev_hash is None:
- prev_issue = await run_in_threadpool(
- lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).first()
- )
- prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
- blockchain_last_hash_cache.set(data=prev_hash, key="last_hash")
+ # In write path, read predecessor from DB inside the same transaction with locking.
+ prev_issue = await run_in_threadpool(
+ lambda: db.query(Issue.integrity_hash).order_by(Issue.id.desc()).with_for_update().first()
+ )
+ prev_hash = prev_issue[0] if prev_issue and prev_issue[0] else ""
...
await run_in_threadpool(save_issue_db, db, new_issue)
blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/routers/issues.py` around lines 213 - 214, Concurrent issue creations
still compute integrity_hash from a stale prev_hash; fix by performing
predecessor selection and the insert inside a database transaction with
serialization/locking so only one writer can read/update the chain head at a
time. Modify the /issues write path (the handler that computes integrity_hash
and calls session.add/commit) to: begin a transaction, select the chain head row
with a FOR UPDATE (or lock the single row that stores last_hash) or use a
SERIALIZABLE transaction isolation level, compute integrity_hash using that
locked value, insert the new link, commit, and only after successful commit call
blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash"). Ensure you
use the existing DB session/commit methods in the handler (replace the current
compute-then-commit flow around integrity_hash and session.commit) so
predecessor selection and insert are atomic.
| #!/bin/bash | ||
| python -m venv venv | ||
| source venv/bin/activate | ||
| pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock | ||
| TELEGRAM_BOT_TOKEN=test PYTHONPATH=. pytest backend/tests/ |
There was a problem hiding this comment.
Make the test runner fail fast to avoid misleading CI results.
Without strict shell flags, setup failures can be masked and the script may continue into pytest with a broken environment (e.g., failed venv activation or partial install).
Suggested hardening patch
#!/bin/bash
+set -euo pipefail
+
python -m venv venv
source venv/bin/activate
-pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock
+python -m pip install --upgrade pip
+python -m pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock
TELEGRAM_BOT_TOKEN=test PYTHONPATH=. pytest backend/tests/🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 3-3: Not following: venv/bin/activate was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@run_tests.sh` around lines 1 - 5, Add shell strictness and explicit failure
checks to the test runner: enable "set -euo pipefail" (and optionally
IFS=$'\n\t') at the top of run_tests.sh so any command (python -m venv, source
venv/bin/activate, pip install) failing will abort; after "source
venv/bin/activate" verify activation succeeded (e.g., check that $VIRTUAL_ENV is
non-empty) and exit with a clear error if not; ensure pip install and pytest
invocations are not run in a subshell that masks errors so their non-zero exit
codes propagate (keep TELEGRAM_BOT_TOKEN=... pytest invocation as-is but let the
script strict flags cause CI to fail fast on errors).
There was a problem hiding this comment.
3 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/routers/field_officer.py">
<violation number="1" location="backend/routers/field_officer.py:149">
P1: Cache update is placed after `db.refresh()`, so a refresh failure after a successful commit can leave the hash cache stale and break subsequent chain linkage.</violation>
</file>
<file name="backend/routers/issues.py">
<violation number="1" location="backend/routers/issues.py:214">
P2: Do not fail the request if post-commit cache update fails; this can return 500 after a successful DB write.</violation>
</file>
<file name="run_tests.sh">
<violation number="1" location="run_tests.sh:2">
P3: Without `set -e`, failures in `python -m venv` or `pip install` will be silently ignored and pytest will run in a broken environment, producing misleading CI results. Add `set -e` (or `set -euo pipefail`) and consider using `python3`/`python3 -m pip` for consistency with other project scripts (e.g., `deploy-backend.sh`).</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| db.refresh(new_visit) | ||
|
|
||
| # Update cache for next visit AFTER successful DB commit | ||
| visit_last_hash_cache.set(data=visit_hash, key="last_hash") |
There was a problem hiding this comment.
P1: Cache update is placed after db.refresh(), so a refresh failure after a successful commit can leave the hash cache stale and break subsequent chain linkage.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/field_officer.py, line 149:
<comment>Cache update is placed after `db.refresh()`, so a refresh failure after a successful commit can leave the hash cache stale and break subsequent chain linkage.</comment>
<file context>
@@ -148,6 +145,9 @@ def officer_check_in(request: OfficerCheckInRequest, db: Session = Depends(get_d
db.refresh(new_visit)
+ # Update cache for next visit AFTER successful DB commit
+ visit_last_hash_cache.set(data=visit_hash, key="last_hash")
+
logger.info(
</file context>
| await run_in_threadpool(save_issue_db, db, new_issue) | ||
|
|
||
| # Update cache for next report AFTER successful DB commit | ||
| blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash") |
There was a problem hiding this comment.
P2: Do not fail the request if post-commit cache update fails; this can return 500 after a successful DB write.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/issues.py, line 214:
<comment>Do not fail the request if post-commit cache update fails; this can return 500 after a successful DB write.</comment>
<file context>
@@ -212,6 +209,9 @@ async def create_issue(
await run_in_threadpool(save_issue_db, db, new_issue)
+
+ # Update cache for next report AFTER successful DB commit
+ blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash")
else:
# Don't create new issue, just return deduplication info
</file context>
| blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash") | |
| try: | |
| blockchain_last_hash_cache.set(data=integrity_hash, key="last_hash") | |
| except Exception as cache_error: | |
| logger.error(f"Error updating blockchain hash cache: {cache_error}", exc_info=True) |
| @@ -0,0 +1,5 @@ | |||
| #!/bin/bash | |||
| python -m venv venv | |||
There was a problem hiding this comment.
P3: Without set -e, failures in python -m venv or pip install will be silently ignored and pytest will run in a broken environment, producing misleading CI results. Add set -e (or set -euo pipefail) and consider using python3/python3 -m pip for consistency with other project scripts (e.g., deploy-backend.sh).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At run_tests.sh, line 2:
<comment>Without `set -e`, failures in `python -m venv` or `pip install` will be silently ignored and pytest will run in a broken environment, producing misleading CI results. Add `set -e` (or `set -euo pipefail`) and consider using `python3`/`python3 -m pip` for consistency with other project scripts (e.g., `deploy-backend.sh`).</comment>
<file context>
@@ -0,0 +1,5 @@
+#!/bin/bash
+python -m venv venv
+source venv/bin/activate
+pip install -r backend/requirements.txt pytest pytest-asyncio pytest-mock
</file context>
Adds `npm install` to the `netlify.toml` build command to ensure dependencies are properly installed when building the frontend from a subdirectory, resolving the Netlify CI check failures.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@netlify.toml`:
- Line 2: Root netlify config is being ignored because base = "frontend" directs
Netlify to use the nested config; open the nested netlify.toml (the one
referenced by base) and remove the duplicate command entries, consolidating them
to a single command = "npm ci && npm run build", then remove the overridden
command and publish keys from the root netlify.toml so there is no
redundant/ignored configuration; ensure only one build command exists in the
frontend netlify.toml to avoid undefined behavior and keep base = "frontend"
unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
There was a problem hiding this comment.
2 issues found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="netlify.toml">
<violation number="1" location="netlify.toml:2">
P1: This `command` change is ineffective. Because `base = "frontend"` is set on line 4, Netlify resolves configuration from `frontend/netlify.toml` (which exists in this repo), completely overriding the root file's `command` and `publish` values. To actually change the build command, update `frontend/netlify.toml` instead.</violation>
<violation number="2" location="netlify.toml:2">
P2: Use `npm ci` instead of `npm install` in the Netlify build command to keep deploy builds reproducible and aligned with lockfile-based installs.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Deleted `frontend/netlify.toml` and migrated its `[build.environment]`
variables (`NODE_VERSION="20"`, `CI="false"`) into the root `netlify.toml`.
Having multiple configurations was confusing the Netlify build system and
causing deployment failures ("Pages changed", "Header rules", "Redirect rules").
🔍 Quality Reminder |
Because `base = "frontend"` implies `netlify` resolves the `publish` directory and runs commands strictly within the `frontend/` directory, prepending `frontend/` or `cd frontend` creates invalid pathing (e.g., `frontend/frontend/dist`). Reverting the commands to relative paths ensures the CI check runner can correctly find the generated artifacts.
Fixes a logical bug where the application's integrity hash caches were being updated prior to successful database commits. This addresses the "cache poisoning" issue. If a database insert failed, the cache would maintain a hash that didn't exist in the database, breaking the hash chain for all future reports.
PR created automatically by Jules for task 9188716175601989125 started by @RohanExploit
Summary by cubic
Fixes cache poisoning by updating integrity hash caches only after successful DB commits. Unifies
netlify.tomland uses relative paths underbase = "frontend"to fix CI builds.Bug Fixes
visit_last_hash_cache.set()andblockchain_last_hash_cache.set()afterdb.commit()inbackend/routers/field_officer.pyandbackend/routers/issues.py.frontend/netlify.toml, migrate env vars to rootnetlify.toml, setnpm install && npm run build, and use relative paths (publish = "dist") underbase = "frontend"to avoidfrontend/frontend/disterrors.New Features
run_tests.shto create a venv and run backend tests withpytest,pytest-asyncio, andpytest-mock.Written for commit e9c9a07. Summary will update on new commits.
Summary by CodeRabbit
Bug Fixes
Chores