Skip to content

feat(aws-sm-vault): detect SSO/credential errors and prepend remediation hint (#228)#126

Merged
stack72 merged 1 commit intomainfrom
worktree-streamed-zooming-wirth
May 6, 2026
Merged

feat(aws-sm-vault): detect SSO/credential errors and prepend remediation hint (#228)#126
stack72 merged 1 commit intomainfrom
worktree-streamed-zooming-wirth

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented May 6, 2026

Summary

Follow-up to swamp-club#226 (PRs #120/#121 covered s3-datastore and gcs-datastore). The aws-sm vault wraps the AWS SDK and surfaces the same CredentialsProviderError discoverability issue when SSO sessions expire mid-secret-fetch.

  • New vault/aws-sm/extensions/vaults/aws_sm_errors.tsAwsSmOperationError (preserves SDK error.name) + pure helpers classifyAwsCredentialError / deriveAwsErrorCode / formatAwsCredentialHint + wrapAwsSmError. Mirrors the s3 pattern with vault-flavoured wording ("Vault session expired:" / "Vault credentials rejected by AWS:") and points at aws sso login --profile <P>.
  • aws_sm.ts wraps every .send() call. The put() create-on-missing fallback migrates from error instanceof ResourceNotFoundException to error.name === "ResourceNotFoundException" — functionally equivalent, removes the SDK class import.
  • Inline SDK pin bumped @3.1010.0@3.1024.0 so it agrees with deno.json.
  • Manifest 2026.04.22.22026.05.06.1.

Empirical grounding (probe at @3.1024.0, 2026-05-06)

Two SDK shapes were verified against a local mock before writing the assertions:

  • Pre-flight credential failure surfaces as name === "CredentialsProviderError" directly on the outer error with no cause chain. The older swamp-club#226 stack trace had cause.name === "_CredentialsProviderError". Both shapes are covered: the direct case is the canonical 3.1024.0 path; the cause-walk + leading-underscore-strip in deriveAwsErrorCode remains as defensive coverage for minified SDK builds.
  • HTTP 400 with body {} produces err.name="Unknown" and err.message="UnknownError". wrapAwsSmError mirrors the s3 wrapper's code !== "Unknown" and message !== "UnknownError" filters so the wrapped message stays clean (no leaked "Unknown" / "UnknownError" strings).

Test plan

  • deno check extensions/vaults/aws_sm.ts extensions/vaults/aws_sm_errors.ts from vault/aws-sm/
  • deno lint extensions/vaults/
  • deno fmt --check extensions/vaults/
  • deno test --allow-env --allow-net --allow-sys extensions/vaults/37 passed, 0 failed
    • 24 in new aws_sm_errors_test.ts (helpers + direct wrapAwsSmError for both pre-flight shapes + noise-filter negative test + AWS_PROFILE embed)
    • 13 in aws_sm_test.ts (6 new behavioural cases: Vault session expired: prefix on get, Vault credentials rejected by AWS: on 403, propagation through list and put, put create-on-missing regression guard for the instanceof → name migration, malformed-body defense)
  • deno install --frozen clean (lockfile matches the bumped pin)
  • assertVaultExportConformance and assertVaultConformance still green

Notes

  • withMockAws now scrubs AWS_PROFILE alongside the other AWS_* env vars so a developer's shell env can't poison hint assertions (the wrapper reads AWS_PROFILE at wrap time).
  • No CI matrix changes — aws-sm is already in .github/workflows/ci.yml lines 58 and 90.
  • SDK pin bump is two minor versions; probe verified the credential-error and malformed-body shapes match expectations at @3.1024.0. A changelog scan beyond the credential-error path was not done — flagging for review awareness.
  • AWS Secrets Manager uses AWS JSON 1.1 (not XML), so installErrorBodyCapture middleware from s3_client.ts does not apply and is intentionally omitted.

Refs: swamp-club#228, swamp-club#226

🤖 Generated with Claude Code

…ion hint (#228)

Follow-up to swamp-club#226 (PR #120/#121 covered s3-datastore and
gcs-datastore). The aws-sm vault wraps the AWS SDK and surfaced the
same CredentialsProviderError discoverability issue when SSO sessions
expire mid-secret-fetch.

Adds vault/aws-sm/extensions/vaults/aws_sm_errors.ts containing:

  - AwsSmOperationError: error wrapper that preserves the SDK error's
    `name` (so error.name === "ResourceNotFoundException" still works)
    and exposes httpStatusCode, code, and requestId.
  - classifyAwsCredentialError, deriveAwsErrorCode,
    formatAwsCredentialHint: pure helpers mirroring the s3 pattern.
  - wrapAwsSmError: front-loads "Vault session expired:" or "Vault
    credentials rejected by AWS:" with `aws sso login --profile <P>`
    pointing at the next action.

aws_sm.ts wraps every .send() call and routes errors through
wrapAwsSmError. The put() create-on-missing fallback migrates from
`error instanceof ResourceNotFoundException` to
`error.name === "ResourceNotFoundException"`, which is functionally
equivalent and removes the SDK class import.

Empirical grounding (probe at @aws-sdk/client-secrets-manager@3.1024.0,
2026-05-06):

  - Pre-flight credential failure surfaces as
    `name === "CredentialsProviderError"` directly on the outer error
    with no cause chain. The cause-walk + leading-underscore-strip in
    deriveAwsErrorCode remains as defensive coverage for minified SDK
    builds and the older #226 stack-trace pattern.
  - HTTP 400 with a body lacking `__type` produces err.name="Unknown"
    and err.message="UnknownError". wrapAwsSmError mirrors the s3
    wrapper's `code !== "Unknown"` and `message !== "UnknownError"`
    filters so the wrapped output stays clean.

Both shapes are covered by direct wrapAwsSmError tests in
aws_sm_errors_test.ts (24 tests). aws_sm_test.ts adds 6 behavioural
tests covering all four operations, the put create-on-missing
regression guard for the instanceof → name migration, and the
malformed-body defense.

Bumps the inline SDK pin in aws_sm.ts from @3.1010.0 → @3.1024.0 so
the new aws_sm_errors.ts and aws_sm.ts agree (matches deno.json's
import map).

Manifest: 2026.04.22.2 → 2026.05.06.1.

Refs: swamp-club#228, swamp-club#226

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. **Stale entries in ** — The lock file still contains both and entries. Since already had before this PR, the old entries were already stale; the inline-import bump in this PR didn't clean them up. passes regardless (all current deps are locked), so this is cosmetic, but running a fresh and committing the result would keep the lockfile tidy.

  2. Duplicate comment block () — The explanation for appears twice in a row. The first instance (lines 189-191) can be dropped; the second (lines 193-195) is the better-worded one and is sufficient.

  3. Slightly misleading test name"aws-sm vault: list propagates the wrapper on credential error (covers all four ops)" — the parenthetical implies this single test covers four operations, but it only exercises . The collection of tests in the section covers all four ops, so either drop the parenthetical or move it to a section comment.


Overall the implementation is clean and well-scoped. The error-classification helpers are pure functions with thorough unit tests, the migration from to is correctly handled with a regression guard, env-var teardown in uses as required, and is in place for all SDK-client tests. The approach mirrors the established s3/gcs pattern faithfully.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Stale lock entries — deno.lock still contains both @aws-sdk/client-secrets-manager@3.1010.0 and @3.1024.0 entries. Since deno.json already had @3.1024.0 before this PR, the old entries were already stale; the inline-import bump did not clean them up. deno install --frozen passes regardless (all current deps are locked), so this is cosmetic, but running a fresh deno install and committing the result would keep the lockfile tidy.

  2. Duplicate sanitizeResources comment block (aws_sm_test.ts:189-195) — The explanation for why sanitizeResources: false is needed appears twice in a row. The first block (lines 189-191) can be dropped; the second (lines 193-195) is the better-worded one and sufficient.

  3. Slightly misleading test name — "aws-sm vault: list propagates the wrapper on credential error (covers all four ops)" — the parenthetical implies this single test covers four operations, but it only exercises list(). The section as a whole covers all four SDK operations; either drop the parenthetical or move it to a comment above the block.


Overall the implementation is clean and well-scoped. Error-classification helpers are pure functions with thorough unit tests, the put() migration from instanceof ResourceNotFoundException to name === "ResourceNotFoundException" is correctly handled with a regression guard, env-var teardown in withMockAws uses a finally block as required, and sanitizeResources: false is in place for all SDK-client tests. The approach mirrors the established s3/gcs pattern faithfully.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

I traced every code path in the new error module and the modified aws_sm.ts, examined edge cases in the helpers, checked for security surfaces, and compared against the s3 pattern. This is solid work.

Critical / High

None.

Medium

  1. put() create-on-missing relies on wrapped.name, credential classification relies on wrapped.code — different source properties (aws_sm.ts:98-100, aws_sm_errors.ts:207,180).
    AwsSmOperationError.name is set from e.name (the raw SDK error name), while .code is derived via deriveAwsErrorCode which prefers e.Code over e.name. For a JSON-protocol service like Secrets Manager, e.name IS the error code and e.Code is typically unset, so both properties agree today. But if a future SDK version (after the pin is bumped) starts populating e.Code with the error type while setting e.name to a generic class name, the create-on-missing check (wrapped.name === "ResourceNotFoundException") would silently fail — put() would rethrow instead of falling through to CreateSecret.
    Mitigated by: the SDK pin at @3.1024.0, the empirical probe notes, and the regression test at line 368 of aws_sm_test.ts. No action needed now, but worth re-verifying on the next SDK bump.

Low

  1. deriveAwsErrorCode walks only one cause level (aws_sm_errors.ts:127). If a future SDK build nests the real error class at e.cause.cause or deeper, the classifier won't find it. Empirically verified at the pinned version; the one-level walk is reasonable defensive coverage.

  2. formatAwsCredentialHint embeds awsProfile into a suggested shell command without escaping shell metacharacters (aws_sm_errors.ts:149). A profile name like `whoami` or $(cmd) would produce a copy-pasteable command that could execute code. Since AWS_PROFILE is set by the user's own environment (not external input), this isn't exploitable by an attacker — but if this pattern is ever reused with external input, it would need sanitization.

  3. Cross-file env-var race under --parallel: aws_sm_errors_test.ts and aws_sm_test.ts both mutate AWS_PROFILE globally. If Deno runs both files in parallel (not the default), concurrent mutations could interfere. Within each file, tests are sequential, so this only matters with an explicit --parallel flag.

Verdict

PASS — No blocking issues. The error module is well-decomposed with pure helpers, the wrapping is consistent with the s3 pattern, and the test coverage is thorough (24 unit + 13 behavioral tests, including regression guards for the instanceofname migration and Unknown/UnknownError noise filters). The three env vars in withMockAws that use truthiness checks for restore (originalEndpoint, originalKey, originalSecret) predate this PR — the new originalProfile correctly uses !== undefined.

@stack72 stack72 merged commit 726ee97 into main May 6, 2026
35 checks passed
@stack72 stack72 deleted the worktree-streamed-zooming-wirth branch May 6, 2026 22:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant