Skip to content

feat(metadata): Add Instruction File Support#140

Merged
texastony merged 27 commits into
stagingfrom
tonyknapp/instruction-files
Mar 4, 2026
Merged

feat(metadata): Add Instruction File Support#140
texastony merged 27 commits into
stagingfrom
tonyknapp/instruction-files

Conversation

@texastony

@texastony texastony commented Feb 16, 2026

Copy link
Copy Markdown

Instruction File Support

Adds support for reading encrypted objects that store content metadata in S3 instruction files, enabling cross-language compatibility with Java, Ruby, PHP, Go, and .NET S3 Encryption Client implementations.

What

Instruction files are companion S3 objects (e.g., my-object.instruction) that store encryption metadata separately from the encrypted object's S3 metadata. This is used by some S3EC implementations and is required for V3 format objects where the encrypted data key is stored separately from content metadata.

Changes

Metadata & Version Detection

  • V3 format support in ObjectMetadata with compressed metadata keys
  • is_v1_format(), is_v2_format(), is_v3_format() detection methods
  • should_use_instruction_file() to determine when instruction file fallback is needed
  • has_exclusive_key_collision() to detect conflicting version keys

Decrypt Pipeline

  • Refactored decrypt() to use version detection methods
  • Extracted _decrypt_v1(), _decrypt_v2(), _decrypt_v3() for per-version logic
  • Automatic instruction file fallback when object metadata matches no known format
  • V1/V2 validation that all metadata is present in instruction file

Instruction File Handling

  • instruction_file.py module with JSON parsing and S3EC key validation
  • Thread-local instruction_file_mode flag to prevent recursive decryption
  • Event handler intercepts instruction file responses and places parsed metadata
  • Configurable instruction_file_suffix on S3EncryptionClientConfig (default: .instruction)

Infrastructure

  • S3ECStaticTestObjectsBucket CDK resource for cross-language test objects

Specification Coverage

7 duvet citations with type=implementation and type=test annotations covering:

  • Custom and default instruction file suffix behavior
  • JSON serialization requirements
  • V1/V2 full metadata storage in instruction files
  • V3 split metadata between object metadata and instruction file
  • Instruction file fallback when no format matches

Testing

Test Status
Unit tests (43) ✅ Passing
V2 instruction file (integration) ✅ Passing
V2 custom suffix (integration) ✅ Passing
Invalid instruction file (integration) ✅ Passing
V1 instruction file (integration) ⏭️ Skipped — CBC decryption not yet implemented
V3 instruction file (integration) ⏭️ Skipped — V3 decryption not yet implemented
V3 custom suffix (integration) ⏭️ Skipped — V3 decryption not yet implemented

Review Notes

  • Ruby and PHP do not set x-amz-crypto-instr-file marker metadata, so the strict marker check was removed for cross-language compatibility (per kessplas review feedback).
  • Static test objects in s3ec-static-test-objects bucket were created by Java S3EC V4 (and V1 for the V1 test object).

Base automatically changed from tonyknapp/duvet-v3 to staging February 18, 2026 17:36
- Add V3 metadata field constants (x-amz-c, x-amz-3, x-amz-w, etc.)
- Add V3 field attributes to ObjectMetadata class
- Update from_dict() and to_dict() to handle V3 fields
- Add is_v1_format(), is_v2_format(), is_v3_format() methods
- Add has_exclusive_key_collision() to detect version key conflicts

This enables the client to recognize and parse V3 format encrypted
objects, which use compressed metadata keys.
- Add test_from_dict_v3_fields() to verify V3 field parsing
- Add test_to_dict_v3_fields() to verify V3 field serialization
- Add test_is_v1_format() to verify V1 format detection
- Add test_is_v2_format() to verify V2 format detection
- Add test_is_v3_format() to verify V3 format detection
- Add test_has_exclusive_key_collision() to verify collision detection

All tests verify correct behavior including exclusion of other
version keys and proper collision detection across V1/V2/V3.
Remove trailing whitespace to comply with Black formatting rules.
- Add s3_client field to GetEncryptedObjectPipeline for instruction file support
- Refactor decrypt() to use is_v1_format(), is_v2_format(), is_v3_format()
- Extract version-specific logic into _decrypt_v1(), _decrypt_v2(), _decrypt_v3()
- Separate materials preparation from decryption operation
- Add return type annotations to _decrypt methods

In metadata.py:
- Add is_v3_in_object_metadata() to detect V3 with instruction file
- Add should_use_instruction_file() to determine when to fetch instruction file

This refactoring prepares the pipeline for instruction file support by
making version detection explicit and separating concerns.
- Add test_decrypt_v1_from_instruction_file
- Add test_decrypt_v2_from_instruction_file
- Add test_decrypt_v3_from_instruction_file
- Add json import to pipelines.py
- Tests verify instruction file fetching works correctly
- Pass s3_client to GetEncryptedObjectPipeline constructor
- Pass bucket and key from params to pipeline.decrypt()
- Required for instruction file support
@texastony texastony force-pushed the tonyknapp/instruction-files branch from 0497224 to 3d9ad73 Compare February 19, 2026 20:09
This reverts commit fda6240.
- Add S3EC_INTERNAL_PLAINTEXT_MODE constant for internal plaintext fetching
- Implement plaintext mode check in on_get_object_after_call to skip decryption
- Create instruction_file.py module with strict validation:
  - Validates instruction file is valid JSON
  - Ensures only S3EC metadata keys are present
  - Verifies x-amz-crypto-instr-file marker in response metadata
  - Fetches instruction files in plaintext mode to prevent recursive decryption
- Update TODO comments to reference plaintext_mode refactoring

This enables secure instruction file fetching without attempting to decrypt
the instruction file itself, preventing infinite recursion.
@texastony texastony changed the title feat(metadata): Add V3 format support to ObjectMetadata feat(metadata): Add Instruction File Support Feb 23, 2026
…f separate client

- Remove instruction_file_client parameter from S3EncryptionClient
- Implement thread-local plaintext_mode flag for instruction file fetching
- Expose plugin context on wrapped client for instruction file access
- Event handler validates instruction files and replaces body stream
- Update GetEncryptedObjectPipeline to use s3_client instead of instruction_file_client
- Refactor fetch_instruction_file to set/clear plaintext mode flag
- Split instruction file logic into parse_instruction_file and fetch_instruction_file
- Update integration and unit tests to remove instruction_file_client parameter
- Add test stub for invalid instruction file parsing

This enables instruction file fetching without requiring a separate S3 client,
preventing infinite recursion by using thread-local context instead of passing
parameters through boto3's validation layer.
- Add s3ec-static-test-objects bucket to CDK stack
- Grant object-level permissions (GetObject, PutObject, DeleteObject)
- Grant bucket-level permissions (ListBucket)
- Accessible by GitHub Actions and ToolsDevelopment role
- Supports static test objects created by Java S3EC for Python compatibility testing
- Update bucket to s3ec-static-test-objects
- Update KMS key to S3ECTestServerKMSKey (a3889cd9-99eb-4138-a93a-aea9d52ec2ef)
- Use static V2 test object: static-v2-instruction-file-from-java-v4
- Use static V3 test object: static-v3-instruction-file-from-java-v4
- V2 test active and expects exact content match
- V3 test skipped until V3 decryption implemented
- Add V1 instruction file test (skipped until CBC decryption implemented)
  - Uses static-v1-instruction-file-from-java-v1 test object
  - Enables legacy wrapping algorithms for V1 format
- Add negative test for invalid instruction file
  - Uses NEGATIVE-static-v2-instruction-file-test-from-java-v4 test object
  - Expects S3EncryptionClientError when instruction file is invalid
  - Prints error message for debugging
- Replace skipped unit test stub with full integration test
- Event handler now parses instruction file and places metadata in Metadata field
- Clear Body when in plaintext mode (instruction files shouldn't return body content)
- fetch_instruction_file returns metadata from response Metadata field
- Add validation that instruction file metadata is not empty and contains S3EC keys
- Update unit tests to mock new behavior (metadata in Metadata, empty Body)
- All 37 unit tests passing
… tests

- Fix: Check x-amz-crypto-instr-file marker in S3 object metadata, not parsed JSON body
- Add 5 unit tests for on_get_object_after_call instruction file behavior
  - Test successful parsing and metadata update
  - Test missing marker error
  - Test invalid JSON error
  - Test non-dict JSON error
  - Test invalid keys error
- All 42 unit tests passing
Comment thread src/s3_encryption/__init__.py Outdated
Comment thread src/s3_encryption/__init__.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread test/integration/test_i_s3_encryption_instruction_file.py
Comment thread test/test_s3_encryption_client_plugin.py
Comment thread test/test_pipelines.py Outdated
Comment thread test/test_pipelines.py Outdated
Comment thread test/test_pipelines.py Outdated
…nd improve error handling

- Extract instruction file processing logic into process_instruction_file method
- Add docstring for process_instruction_file method
- Add error check for plaintext mode in put_object (should never happen)
- Move VALID_S3EC_METADATA_KEYS to metadata.py for better organization
- Add error if plugin context not available when fetching instruction file
- Remove _fetch_instruction_file wrapper method, call fetch_instruction_file directly
- Rename test file: test_on_get_object_after_call.py -> test_s3_encryption_client_plugin.py
- Add TODO comments for V1 CBC and V3 decryption in integration tests
- Add x-amz-meta-x-amz-unencrypted-content-length to V1 test metadata
- Fix linting: line length, missing docstring, remove commented code
@texastony texastony marked this pull request as ready for review February 27, 2026 20:00
- Add 7 Duvet citations across instruction_file.py, pipelines.py, and metadata.py
- instruction_file.py: JSON serialization, default suffix, custom suffix support
- pipelines.py: instruction file fallback, V1/V2 metadata storage
- metadata.py: V3 instruction file split between object metadata and instruction file
Comment thread src/s3_encryption/__init__.py Outdated

@kessplas kessplas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for the strict metadata key check

Comment thread src/s3_encryption/__init__.py Outdated
if hasattr(self._plugin._context, "encryption_context"):
delattr(self._plugin._context, "encryption_context")
# Clean up thread-local storage;
# do not clean up the client as it is not thread local only

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

musing: this makes we wonder if we do want to use a weakref here...
i think it's fine to not use one,
since it is reasonable for the wrapped client to stay in scope when there are still in-flight requests.

Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated

return self.cmm.decrypt_materials(dec_materials)

def _decrypt_v1(self, metadata, encryption_context) -> DecryptionMaterials:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some of the code is duplicated between v1/v2, but i don't mind this approach since it makes it obvious they are different formats

Comment thread src/s3_encryption/pipelines.py Outdated
Comment thread src/s3_encryption/pipelines.py
Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated
Comment thread src/s3_encryption/instruction_file.py Outdated
- Add instruction_file_suffix config field with duvet citations
- Pass instruction_suffix through to decrypt pipeline
- Build instruction key in pipeline instead of instruction_file.py
- Remove suffix param from fetch_instruction_file (receives full key)
- Add V1/V2 object metadata validation check for instruction files
- Move duvet citations to appropriate locations
- Fix instruction_suffix -> instruction_file_suffix typo in __init__.py
…erage

- Add test for should_use_instruction_file with duvet citation
- Add test for custom instruction_file_suffix with duvet citation
- Annotate existing tests with duvet citations for default suffix,
  JSON serialization, V1/V2 instruction file, and V3 instruction file
- All 7 instruction file spec citations now have type=test annotations
- Remove strict x-amz-crypto-instr-file marker check for Ruby/PHP compat
- Rename plaintext_mode to instruction_file_mode
- Remove marker validation test
- Update docstrings to reflect relaxed validation
Comment thread src/s3_encryption/instruction_file.py Outdated

@kessplas kessplas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be extremely groovy if you added an integration test for custom-suffix as well (in the static-test-objects bucket).

Otherwise the code looks good to me.

… suffix integration tests

- Replace type=citation with type=implementation across all 7 duvet annotations
- Add V2 custom suffix integration test (active)
- Add V3 custom suffix integration test (skipped, V3 not yet implemented)
- Copy instruction files to .custom-suffix-instruction keys in S3
@texastony texastony merged commit f8a6c4c into staging Mar 4, 2026
5 checks passed
@texastony texastony deleted the tonyknapp/instruction-files branch March 4, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants