fix: replace peek() with SHA-256 hash to prevent cache poisoning (AC-2) by 3em0 · Pull Request #678 · zilliztech/GPTCache

3em0 · 2026-03-25T04:35:04Z

Summary

Security fix: BufferedReader.peek() only returns the first ~8192 bytes (buffer size) of a file. Cache keys generated from peek() output are vulnerable to collision — two files sharing the same header but different content produce identical cache keys, enabling cache poisoning and information disclosure.
Replace peek() with streaming SHA-256 hash of full file content in three affected functions: get_file_bytes(), get_input_str(), and get_image_question()
File pointer is reset via seek(0) after hashing so downstream LLM calls still read the complete file
Also fixes a resource leak in get_image_question() (open() without close)

Affected Functions

Function	Used By	Old Behavior	New Behavior
`get_file_bytes()`	OpenAI audio adapter	`file.peek()` → ~8KB	`sha256(file.read())` → full content hash
`get_input_str()`	Replicate adapter	`str(image.peek()) + question`	`sha256(image) + question`
`get_image_question()`	MiniGPT4 adapter	`str(open(img).peek()) + question`	`sha256(img) + question`

Attack Scenario

Attacker sends img_A (legitimate) + question → answer cached with key derived from first 8KB
Attacker constructs img_B with identical first 8KB but completely different content
Attacker sends img_B + same question → cache returns img_A's answer (LLM never called)

See tests/security/AC-2_cache_poisoning_via_peek.md for the full vulnerability report with CVSS scoring.

Test plan

17/17 unit tests passing (tests/unit_tests/processor/test_pre.py)
New tests verify: no collision between files with same header, identical files still match, file pointer reset after hashing, file path input works
Verified fix against PoC scripts (tests/poc_ac2_peek_collision.py, tests/poc_ac2_e2e_poisoning.py)
Reviewers: verify existing adapter tests still pass with the new hash-based keys

🤖 Generated with Claude Code

sre-ci-robot · 2026-03-25T04:35:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: 3em0
To complete the pull request process, please assign cxie after the PR has been reviewed.
You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sre-ci-robot · 2026-03-25T04:35:15Z

Welcome @3em0! It looks like this is your first PR to zilliztech/GPTCache 🎉

…AC-2) BufferedReader.peek() only returns the first ~8192 bytes of a file, making it trivial to construct different images/files that produce identical cache keys. This enables cache poisoning where an attacker's query returns another user's cached answer. Replace peek() with streaming SHA-256 hash of the full file content in get_file_bytes(), get_input_str(), and get_image_question(). The file pointer is reset after hashing so downstream LLM calls can still read the complete file. Also fixes a resource leak in get_image_question() (open() without close). Signed-off-by: 3em0 <3em0@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

3em0 · 2026-03-25T05:15:00Z

@xiaofan-luan hi！ how it is going?

3em0 · 2026-03-26T03:41:19Z

/assign @cxie

Hi @cxie and @xiaofan-luan, DCO is fixed and the PR is ready for review.
When you have a moment, could you please take a look? If it looks good, an /approve would help move it forward. Thanks!

sre-ci-robot requested review from cxie and xiaofan-luan March 25, 2026 04:35

sre-ci-robot added the size/XL label Mar 25, 2026

mergify bot added the needs-dco label Mar 25, 2026

3em0 force-pushed the fix/ac2-peek-cache-poisoning branch from e901655 to 8fde3d2 Compare March 25, 2026 04:50

mergify bot added dco-passed and removed needs-dco labels Mar 25, 2026

sre-ci-robot assigned cxie Mar 26, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: replace peek() with SHA-256 hash to prevent cache poisoning (AC-2)#678

fix: replace peek() with SHA-256 hash to prevent cache poisoning (AC-2)#678
3em0 wants to merge 1 commit intozilliztech:mainfrom
3em0:fix/ac2-peek-cache-poisoning

3em0 commented Mar 25, 2026

Uh oh!

sre-ci-robot commented Mar 25, 2026

Uh oh!

sre-ci-robot commented Mar 25, 2026

Uh oh!

3em0 commented Mar 25, 2026

Uh oh!

3em0 commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

3em0 commented Mar 25, 2026

Summary

Affected Functions

Attack Scenario

Test plan

Uh oh!

sre-ci-robot commented Mar 25, 2026

Uh oh!

sre-ci-robot commented Mar 25, 2026

Uh oh!

3em0 commented Mar 25, 2026

Uh oh!

3em0 commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants