Skip to content

OSIS-165: cap getCredentials access-denied recovery to a single retry#162

Closed
anurag4DSB wants to merge 1 commit into
improvement/OSIS-163-error-boundary-log-oncefrom
bugfix/OSIS-165-cap-getcredentials-recursion
Closed

OSIS-165: cap getCredentials access-denied recovery to a single retry#162
anurag4DSB wants to merge 1 commit into
improvement/OSIS-163-error-boundary-log-oncefrom
bugfix/OSIS-165-cap-getcredentials-recursion

Conversation

@anurag4DSB

Copy link
Copy Markdown
Contributor

Intent: why does this change exist?

A persistent Vault ACCESS_DENIED on assume-role made ScalityTenantSession.getCredentials recreate the role and then call itself again, with no bound, so a Vault misconfiguration could recurse forever and risk a StackOverflowError. OSIS-162 preserved this behavior as-is; OSIS-165 caps it.

System impact: what's affected, including downstream?

Only the assume-role credential path used by getIamClient / getS3Client in ScalityTenantSession. Callers that previously could hang/overflow on a misconfigured role now get a fast, clear VaultServiceException carrying the original AccessDenied error code.

Preserved behavior: what explicitly stays the same?

The happy path is unchanged: a first-time AccessDenied still triggers setupAssumeRole (resolving the account name via getAccount) followed by a successful retry. Non-AccessDenied Vault errors are still rethrown immediately without any recovery.

Intended change: what's different after this PR?

The unbounded self-recursion is replaced by a one-shot retry: recover once, retry once, and if AccessDenied persists rethrow the existing VaultServiceException instead of recursing again. The retry/recovery routine (setupAssumeRole) now runs at most once.

Verification: how do we know this worked, or how would we know if it didn't?

Added stopsRetryingWhenAccessDeniedPersistsAfterRecreatingRole which feeds a persistent AccessDenied and asserts a thrown VaultServiceException, setupAssumeRole invoked exactly once, and exactly two assume-role attempts (no infinite loop). Ran :osis-core:test (all 12 ScalityTenantSessionTest cases pass) and jacocoTestCoverageVerification (passes, stays >=75%).

@claude

claude Bot commented Jun 25, 2026

Copy link
Copy Markdown

LGTM — the unbounded recursion is correctly replaced with a one-shot retry: recover once, retry once, rethrow on persistent failure. The two-arg setupAssumeRole is synchronous (only the OsisTenant overload is @async), so the retry correctly waits for role recreation to finish. Test coverage for the persistent-failure path is solid.

Review by Claude Code

@anurag4DSB

Copy link
Copy Markdown
Contributor Author

Superseded by #168, which re-points the same single-retry cap at the original getCredentials on main (the recursion bug exists in production code, so the refactor is not needed to fix it). Closing this one.

@anurag4DSB anurag4DSB closed this Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant