Skip to content

fix: replace peek() with SHA-256 hash to prevent cache poisoning (AC-2)#678

Open
3em0 wants to merge 1 commit intozilliztech:mainfrom
3em0:fix/ac2-peek-cache-poisoning
Open

fix: replace peek() with SHA-256 hash to prevent cache poisoning (AC-2)#678
3em0 wants to merge 1 commit intozilliztech:mainfrom
3em0:fix/ac2-peek-cache-poisoning

Conversation

@3em0
Copy link

@3em0 3em0 commented Mar 25, 2026

Summary

  • Security fix: BufferedReader.peek() only returns the first ~8192 bytes (buffer size) of a file. Cache keys generated from peek() output are vulnerable to collision — two files sharing the same header but different content produce identical cache keys, enabling cache poisoning and information disclosure.
  • Replace peek() with streaming SHA-256 hash of full file content in three affected functions: get_file_bytes(), get_input_str(), and get_image_question()
  • File pointer is reset via seek(0) after hashing so downstream LLM calls still read the complete file
  • Also fixes a resource leak in get_image_question() (open() without close)

Affected Functions

Function Used By Old Behavior New Behavior
get_file_bytes() OpenAI audio adapter file.peek() → ~8KB sha256(file.read()) → full content hash
get_input_str() Replicate adapter str(image.peek()) + question sha256(image) + question
get_image_question() MiniGPT4 adapter str(open(img).peek()) + question sha256(img) + question

Attack Scenario

  1. Attacker sends img_A (legitimate) + question → answer cached with key derived from first 8KB
  2. Attacker constructs img_B with identical first 8KB but completely different content
  3. Attacker sends img_B + same question → cache returns img_A's answer (LLM never called)

See tests/security/AC-2_cache_poisoning_via_peek.md for the full vulnerability report with CVSS scoring.

Test plan

  • 17/17 unit tests passing (tests/unit_tests/processor/test_pre.py)
  • New tests verify: no collision between files with same header, identical files still match, file pointer reset after hashing, file path input works
  • Verified fix against PoC scripts (tests/poc_ac2_peek_collision.py, tests/poc_ac2_e2e_poisoning.py)
  • Reviewers: verify existing adapter tests still pass with the new hash-based keys

🤖 Generated with Claude Code

@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 3em0
To complete the pull request process, please assign cxie after the PR has been reviewed.
You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot
Copy link
Collaborator

Welcome @3em0! It looks like this is your first PR to zilliztech/GPTCache 🎉

…AC-2)

BufferedReader.peek() only returns the first ~8192 bytes of a file,
making it trivial to construct different images/files that produce
identical cache keys. This enables cache poisoning where an attacker's
query returns another user's cached answer.

Replace peek() with streaming SHA-256 hash of the full file content
in get_file_bytes(), get_input_str(), and get_image_question().
The file pointer is reset after hashing so downstream LLM calls
can still read the complete file.

Also fixes a resource leak in get_image_question() (open() without close).

Signed-off-by: 3em0 <3em0@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@3em0 3em0 force-pushed the fix/ac2-peek-cache-poisoning branch from e901655 to 8fde3d2 Compare March 25, 2026 04:50
@mergify mergify bot added dco-passed and removed needs-dco labels Mar 25, 2026
@3em0
Copy link
Author

3em0 commented Mar 25, 2026

@xiaofan-luan hi! how it is going?

@3em0
Copy link
Author

3em0 commented Mar 26, 2026

/assign @cxie

Hi @cxie and @xiaofan-luan, DCO is fixed and the PR is ready for review.
When you have a moment, could you please take a look? If it looks good, an /approve would help move it forward. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants