Skip to content

feat(decryption): streaming decryption with cipher-agnostic BufferedDecryptingStream and DelayedAuthDecryptingStream#150

Merged
texastony merged 33 commits into
stagingfrom
tonyknap/feat-buffered-decryption-aes-gcm
Apr 2, 2026
Merged

feat(decryption): streaming decryption with cipher-agnostic BufferedDecryptingStream and DelayedAuthDecryptingStream#150
texastony merged 33 commits into
stagingfrom
tonyknap/feat-buffered-decryption-aes-gcm

Conversation

@texastony

@texastony texastony commented Mar 9, 2026

Copy link
Copy Markdown

Changes

Streaming decryptors accept cipher objects

  • BufferedDecryptingStream and DelayedAuthDecryptingStream now take a decryptor object and tag_length instead of raw key/nonce
  • Supports all algorithm suites: AES-GCM, key-committing AES-GCM (HKDF), and AES-CBC
  • Optional unpadder parameter for CBC PKCS7 padding removal

Algorithm suite dispatch returns streams

  • GetEncryptedObjectPipeline.decrypt() returns a streaming decryptor for all algorithm suites instead of eagerly decrypting in memory
  • CBC path creates a cipher decryptor + PKCS7 unpadder
  • GCM path creates a GCM decryptor with tag_length=16
  • Key-committing GCM path performs HKDF derivation, commitment verification, then returns a GCM decryptor with AAD

Merged with staging

  • Integrates key commitment policy, commitment policy enforcement, algorithm suite configuration, CBC decryption, and V3 decryption from staging
  • All tests updated to pass enable_delayed_authentication and call .read() on stream results

Use AlgorithmSuite properties for tag length and block size

  • Add cipher_tag_length_bytes and cipher_block_size_bytes properties to AlgorithmSuite
  • Replace hardcoded GCM_TAG_LENGTH constant and PKCS7(128) with algorithm suite properties
  • tag_length is now required (no default) on DelayedAuthDecryptingStream
  • Remove dead code: _decrypt_cbc_content()
  • Make _make_decrypting_stream and _decrypt_kc_gcm_content static methods

Update: address kessplas review comments (a6fb136)

  • Make instruction_suffix and enable_delayed_authentication positional (required) args on decrypt()
  • Move duvet annotation from enable_delayed_authentication None-check to BufferedDecryptingStream return
  • Hardcode CBC path to always stream (enable_delayed_authentication=True) — no auth tag, matches Java behavior
  • Move duvet annotations from _decrypt_kc_gcm_content to _decrypt_kc_gcm_streaming
  • Remove unused _decrypt_kc_gcm_content method
  • Fix DelayedAuthDecryptingStream CBC unpadding: peek to detect stream exhaustion + incremental unpadder
  • Add 6 CBC unit tests for both stream types (roundtrip, chunked reads, finalize, padding)
  • Add delayed authentication mode integration test with duvet citation

Update: stream class refactor + comprehensive tests + minimum read size

Split stream classes by cipher (8d4dbdc, 4991e1b)

  • Split DelayedAuthDecryptingStream into DelayedAuthCBCDecryptingStream and DelayedAuthGCMDecryptingStream
  • Rename BufferedDecryptingStreamBufferedDecryptingGCMStream (GCM does not need unpadder)
  • CBC path directly returns DelayedAuthCBCDecryptingStream (no longer routed through GCM factory)
  • Rename _make_decrypting_stream_make_decrypting_gcm_stream

Enforce minimum read size on DelayedAuthGCMDecryptingStream (dad00c2)

  • read(amt) raises S3EncryptionClientError if 0 < amt < tag_length + 1
  • Prevents empty intermediate reads caused by the rolling GCM tag buffer
  • Matches Java S3EC behavior (empty ByteBuffer downstream) but adapted for Python's read() contract
  • Design doc: streaming.md

Comprehensive unit tests (07dbee4, 4a35e88)

  • 61 new unit tests across all stream classes
  • TestDelayedAuthCBCDecryption: read-after-finalized, readable, close, context manager, wrong key, empty ciphertext
  • TestBufferedDecryptingGCMStream: full/partial reads, tell, readable, readinto, close, wrong key, tampered ciphertext, idempotent decrypt
  • TestDelayedAuthGCMDecryption: full/chunked reads, finalized, readable, close, wrong key, tampered tag, small/large data
  • TestEdgeCasePlaintextLengths: parameterized across [0, 1, 8, 15, 16, 17, 31, 32, 33, 47, 48, 49] for all stream types
  • test_read_too_small_raises_error for minimum read size guard

Streaming integration tests (07dbee4)

  • New test_i_s3_encryption_streaming.py with 17 tests
  • Parametrized across AES_GCM and KC_GCM algorithm suites
  • Covers: buffered roundtrip, partial reads, delayed-auth roundtrip, chunked reads, both-modes-same-plaintext, empty body, 1 MB large object, CBC delayed-auth

Test results

  • 177 unit tests passing
  • 57 integration tests passing, 1 skipped

Introduce BufferedDecryptingStream that wraps the S3 StreamingBody and
decrypts lazily on first read. No plaintext is released until the entire
ciphertext is read and the GCM auth tag is verified, matching the Java
S3EC's BufferedCipherSubscriber behavior.

- Add stream.py with BufferedDecryptingStream (read, iter_chunks, close)
- Pipeline returns BufferedDecryptingStream instead of decrypted bytes
- Event handler passes stream directly as parsed["Body"]
@texastony texastony requested a review from kessplas March 9, 2026 16:22
…ations

Add enable_delayed_authentication field to S3EncryptionClientConfig,
defaulting to False. Includes duvet specification citations from
client.md#enable-delayed-authentication.
…eaming

Add DelayedAuthDecryptingStream that releases plaintext incrementally
via AES-GCM cipher.update() before tag verification. The GCM tag (last
16 bytes) is held back and verified on stream exhaustion via
finalize_with_tag(). Matches Java S3EC CipherSubscriber pattern.
… handler

- Pipeline.decrypt() accepts enable_delayed_authentication param and
  returns DelayedAuthDecryptingStream when True, BufferedDecryptingStream
  when False. Raises error if param is None (must be explicitly set).
- Event handler passes config flag to pipeline.
- Remove duplicated defaults from pipeline params — config is single
  source of truth.
- Update unit tests to pass instruction_suffix explicitly.
…delayed auth

Replace individual delayed-auth tests with pytest.mark.parametrize
covering both buffered and delayed-auth modes across ascii, empty,
unicode, utf-8, latin-1, binary data, and no-body cases.
@texastony texastony changed the title feat(decryption): add buffered streaming decryption for AES-GCM feat(decryption): add buffered and delayed-auth streaming decryption for AES-GCM Mar 9, 2026
@texastony texastony force-pushed the tonyknap/feat-buffered-decryption-aes-gcm branch from 0809734 to 257fda6 Compare March 9, 2026 19:12
…n citations

Add unit tests that verify the behavioral contract of both stream modes:
- DelayedAuthDecryptingStream releases plaintext before GCM tag verification
- BufferedDecryptingStream withholds all plaintext until tag is verified

Includes duvet type=test citations for enable-delayed-authentication spec.
@texastony texastony force-pushed the tonyknap/feat-buffered-decryption-aes-gcm branch from 257fda6 to 622b5a1 Compare March 9, 2026 20:04
…layed-auth streaming

- Add 50 MB V2 delayed-auth streaming decryption test against static object
- Add 50 MB V3 test (skipped, V3 not yet implemented)
- Add 61 GiB V2/V3 placeholder tests marked @pytest.mark.large (skipped,
  static objects not yet created)
- Parametrize existing instruction file tests with buffered/delayed-auth modes
…h plaintext

- Fix pipeline decrypt() docstring to reflect both return types
- Add assertion that full delayed-auth stream output matches expected plaintext
- Remove 61 GiB V2/V3 placeholder tests (static objects not yet created)
- Remove large pytest mark registration (no longer used)
@texastony texastony marked this pull request as draft March 18, 2026 21:37
BufferedDecryptingStream and DelayedAuthDecryptingStream now take a
decryptor object and tag_length instead of raw key/nonce. This makes
them reusable across algorithm suites (GCM, key-committing GCM, CBC).
… algorithm suite dispatch

- Merge staging's key commitment, commitment policy, algorithm suite
  config, CBC decryption, and V3 decryption into the streaming branch
- Algorithm suite dispatch now returns streaming decryptors for all paths
  (GCM, key-committing GCM, CBC) instead of eager in-memory decryption
- Add unpadder support to streams for CBC PKCS7 padding removal
- Update all tests to pass enable_delayed_authentication and use .read()
  on stream results
@texastony texastony changed the title feat(decryption): add buffered and delayed-auth streaming decryption for AES-GCM feat(decryption): streaming decryption with cipher-agnostic BufferedDecryptingStream and DelayedAuthDecryptingStream Mar 19, 2026
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread test/integration/test_i_s3_encryption_instruction_file.py Outdated
- Add cipher_tag_length_bytes and cipher_block_size_bytes properties to AlgorithmSuite
- Replace hardcoded GCM_TAG_LENGTH and PKCS7(128) with algorithm suite properties
- Remove dead code: _decrypt_cbc_content()
- Make _make_decrypting_stream and _decrypt_kc_gcm_content static methods
- Remove GCM_TAG_LENGTH constant from stream.py
- Make tag_length required (no default) on DelayedAuthDecryptingStream
@texastony texastony marked this pull request as ready for review March 19, 2026 18:58

@kessplas kessplas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking changes for the dead method/ghost Duvet annotations.

Comment thread src/s3_encryption/__init__.py
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py
Comment thread src/s3_encryption/stream.py Outdated
- Make instruction_suffix and enable_delayed_authentication positional args
- Move duvet annotation to BufferedDecryptingStream return
- Hardcode CBC to always stream (no auth tag, matches Java behavior)
- Move duvet annotations from _decrypt_kc_gcm_content to _decrypt_kc_gcm_streaming
- Remove unused _decrypt_kc_gcm_content method
- Fix DelayedAuthDecryptingStream CBC unpadding (peek + incremental unpadder)
- Add CBC unit tests for both stream types (roundtrip, chunked, finalize, padding)
- Add delayed authentication mode integration test with duvet citation
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/stream.py
Comment thread test/test_decryption.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
Split DelayedAuthDecryptingStream into DelayedAuthCBCDecryptingStream
and DelayedAuthGCMDecryptingStream. CBC and GCM are mutually exclusive
paths — CBC uses an unpadder with no auth tag, GCM uses a rolling tag
buffer with no padding — so the single-class design carried impossible
field combinations and conditional branching in read().

All three stream classes (BufferedDecryptingStream and the two new
delayed-auth classes) now extend botocore's StreamingBody with
@define(slots=False), inheriting iter_chunks, iter_lines, __iter__,
and __next__ for free.

Updated _make_decrypting_stream dispatch in pipelines.py and test
constructors in test_stream.py.
# Delayed-Auth Streams: Empty Read Behavior

## Problem

DelayedAuthGCMDecryptingStream.read(amt) can return b"" mid-stream before
the stream is exhausted. This happens when the read size is small relative
to the GCM tag length (16 bytes) — the stream can't distinguish ciphertext
from the trailing auth tag until it has accumulated more than tag_length bytes.

Example with 20 bytes of ciphertext+tag, read(7):

1. read(7) → 7 bytes buffered, <= 16 → returns b""
2. read(7) → 14 bytes buffered, <= 16 → returns b""
3. read(7) → 20 bytes total, splits ciphertext/tag → returns plaintext

In Python, read() returning b"" conventionally signals EOF. This breaks
common patterns like: while chunk := stream.read(7)

DelayedAuthCBCDecryptingStream does not have this issue — CBC cipher.update()
always produces output when given input.

## Java Behavior

Java's CipherSubscriber (the delayed-auth equivalent) does the same thing.
When cipher.update() returns null/empty, it explicitly sends an empty
ByteBuffer downstream. This is fine in Java's reactive streams model where
empty emissions are normal signaling. In Python's read() API, it's surprising.

## Options Considered

1. Keep as-is, document it — match Java semantics.
2. Loop internally in read() — more Pythonic, but violates io.py Reader.read
   contract: "If size is specified, at most size items will be read."
3. Require minimum read size (chosen) — raise if amt < tag_length + 1.

## ESDK-Python Comparison

ESDK-Python's StreamDecryptor never has this problem because it decrypts at
the frame level. Each frame has its own IV and tag, so authentication is
per-frame. S3EC operates on a single non-framed GCM ciphertext where the tag
is simply appended — the stream must separate tag from ciphertext on the fly.

@kessplas kessplas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion,
we can simplify the stream implementation quite a bit,
because we know the length of the object from the GetObject response.
Then we don't need to do the lookahead stuff.

Comment thread src/s3_encryption/stream.py Outdated
Comment thread src/s3_encryption/stream.py Outdated
- Refactor DelayedAuthGCMDecryptingStream to use content_length instead
  of rolling tag buffer and peek-ahead
- Add ContentLength validation in on_get_object_after_call
- Pass content_length through pipeline to all stream constructors
- Rename stream classes: BufferedDecryptingGCMStream → GCMBufferedDecryptingStream,
  DelayedAuthCBCDecryptingStream → CBCDecryptingStream,
  DelayedAuthGCMDecryptingStream → GCMDelayedAuthDecryptingStream
- Track _amount_read for progress in all three streams
- Remove minimum read size restriction
@texastony texastony marked this pull request as draft March 26, 2026 23:23
- GCMBufferedDecryptingStream.__enter__ returns self for consistent
  context manager behavior across all stream classes
- GCMDelayedAuthDecryptingStream raises on content_length < tag_length
- Clarify content_length comment as ciphertext content length
@texastony texastony marked this pull request as ready for review March 26, 2026 23:53
Comment thread src/s3_encryption/__init__.py
Comment thread src/s3_encryption/stream.py Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to do this when we know the content length.

Comment thread src/s3_encryption/stream.py Outdated
self._decrypt()
return self._raw_stream.tell()

def __enter__(self): # noqa: D105

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this relates to using the stream in a with block we should test that too.
Ideally, both a unit test that can assert close is called upon exit,
and an integration test that shows this pattern is functional.

Comment thread src/s3_encryption/stream.py Outdated
def __attrs_post_init__(self): # noqa: D105
# By passing in content_length, and updating _amount_read in read(),
# we support the super's normal progression.
# However, we do not support the super's _verify_content_length.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not support calling _verify_content_length?
This is a problem as we can't just assume this won't get called.
We need to implement this correctly.

Comment thread src/s3_encryption/stream.py Outdated
_tag_length: int = field()
# _content_length intentionally collides with super's _content_length
_content_length: int = field()
_ciphertext_remaining: int = field(init=False)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be clear if this means ciphertext as in "the ciphertext object" (meaning ciphertext + tag) or ciphertext as in "the ciphertext" (meaning just ciphertext and no tag) since it is ambiguous and critically important. Ideally change the variable name or at least leave a comment.

Personally especially in terms of lengths I interpret "ciphertext" as ciphertext + tag, and "plaintext" as the original content length (which is also the length of the ciphertext without the tag).

This is particularly relevant for eg implementing _verify_content_length and whether or not we overwrite the content length of the plaintext stream returned from GetObject.

Let's discuss further offline.

Comment thread src/s3_encryption/stream.py Outdated
return b""
self._finalized = True
try:
tag = self._body.read(self._tag_length)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the theme of ciphertext vs plaintext length, we need to update self._amount_read here as well since that is tracking ciphertext + tag length.

@texastony texastony marked this pull request as draft March 27, 2026 19:44
- Add decryptor.py with AesCbcDecryptor and AesGcmDecryptor
- AesGcmDecryptor uses rolling tail buffer to hold back GCM tag bytes
- Rename streams to BufferedDecryptingStream and DecryptingStream
- Remove CBCDecryptingStream (CBC now uses DecryptingStream)
- Streams are cipher-agnostic, delegating all crypto to Decryptor
- DecryptingStream loops on empty update() results to avoid
  returning spurious empty bytes mid-stream
- Override all StreamingBody methods explicitly on both stream classes
… iterator tests

- Add buffered_decrypt.py with one_shot_decrypt() returning plain StreamingBody
- Remove BufferedDecryptingStream class from stream.py
- Add docstrings to DecryptingStream iterator methods
- Wire _verify_content_length into _finalize to catch truncated HTTP responses
- Add unit tests for iter_chunks, iter_lines, __iter__, __next__, readinto, readlines
- raise S3EncryptionClientSecurityError for CBC/GCM decryption failures
- remove magic number tolerance in _verify_content_length
- fix type annotations (object -> StreamingBody)
- fix __iter__/__next__ inconsistency in DecryptingStream
- Remove unnecessary __del__ and set_socket_timeout overrides
- Update test assertions to expect SecurityError
@texastony texastony marked this pull request as ready for review March 31, 2026 19:30
@texastony texastony merged commit c461909 into staging Apr 2, 2026
5 checks passed
@texastony texastony deleted the tonyknap/feat-buffered-decryption-aes-gcm branch April 2, 2026 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants