Skip to content

Add support for SAML#19

Open
jlegate wants to merge 6 commits intomainfrom
jl/saml
Open

Add support for SAML#19
jlegate wants to merge 6 commits intomainfrom
jl/saml

Conversation

@jlegate
Copy link

@jlegate jlegate commented Mar 11, 2026

PR Description

Based on the official client changes, this incorporates similar support for SAML while retaining previous behavior.

TESTED: Ran client locally, validated user auth works, validated SAML works. Verified credential process is successful.

PR Checklist

  • New automated tests have been written to the extent possible.
  • The code has been checked for structural/syntactic validity.
    • AMI/application: a build was performed
    • terraform changes: "terraform plan" checked on every affected environment
  • (If applicable) the code has been manually tested on our infrastructure.
    • AMI/application: deployed an a test or dev environment
    • terraform changes: applied to test or dev environment
    • script: run against test or dev environment
  • Likely failure points and new functionality have been identified and tested manually.
    Examples:
    • Application manually run in a way that triggers any new branches
    • AMI logged into and changes verified from login shell
  • Pull request description includes a description of all the manual steps performed to accomplish the above.

To provide feedback on this template, visit https://docs.google.com/document/d/1YfTv7Amyop5G_8w1c2GJ_Mu-70L0KkZHhm9f9umDi3U/edit

jlegate added 5 commits March 11, 2026 12:04
Adds internal/saml package implementing a full SAML SP-initiated flow:
- Loads and validates IDP metadata from a URL or local file
- Binds a local HTTP server on port 8400 (registered with the IDP as
  the ACS URL) to receive the SAML assertion from the browser
- Forwards the assertion to Kion's /api/v1/saml/callback, extracts the
  SSO code, and exchanges it for a bearer token via /api/v2/login/sso-provider
- Opens the system browser via github.com/pkg/browser; falls back to
  printing the URL if the browser cannot be opened

Extends internal/client with:
- IDMSTypeSAML constant (idms_type_id == 3)
- JSON struct tags and TypeID field on IDMS to support type detection
- NewWithToken constructor for clients authenticating with a bearer token
config: add SAMLTokenConfig to cache the short-lived Kion bearer token
obtained after SAML authentication (~10 min lifetime, stored in
~/.config/kion/saml-token.yml).

util: add newSAMLClient which loads the cached token (with a 30s grace
period for re-auth) or triggers a fresh browser-based SAML flow. Move
the SAML check to the top of NewClient so a leftover key.yml does not
cause a spurious app-api-key-duration error for SAML users.

setup: drive the password vs SAML setup path from the selected IDMS
type (idms_type_id == 3) rather than an upfront auth-method question.
SAML setup optionally creates an App API Key (triggering a browser auth
to do so) and asks about auto-rotation and key duration, mirroring the
password setup flow. The SP issuer is derived automatically as
https://<host>/api/v1/saml/auth.
key create: replace hardcoded keyring/username flow with util.NewClient
so SAML users are not prompted for credentials that don't exist.

login: return a clear error for SAML users; browser-based auth needs no
stored credentials.

logout: clear the cached SAML token (~/.config/kion/saml-token.yml)
instead of attempting to delete a non-existent keyring entry.

credential-process: make username optional (cfg.String instead of
cfg.StringErr) since it is only used as part of the on-disk cache key
and SAML users have no username in their config.
Previously SAML always took precedence over a configured app API key.
Now NewClient uses the app API key if it exists and has not expired,
only falling through to SAML (or password) auth when the key is absent
or expired. This also fixes the rotation guard so it only fires on a
still-valid key rather than an already-expired one.
Calls out specific SAML related configuration options and clarifies when logins/prompts may occur
Copy link

@dacut dacut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions/comments...

timeout := 60 * time.Second
if printURL {
fmt.Printf("Please open the following URL in your browser to authenticate:\n\n%s\n\n", authURL)
timeout = 180 * time.Second
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this a constant at the top of the file? Also, any special magic behind 3 minutes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are re-setting the value when we print the URL, it shouldn't be a const. The extra time is for the user to be able to copy/paste and bring up a browser window. The actual times were what kion-cli also used.

Thoughts on something like this?

	timeout := func() time.Duration {
		if printURL {
			fmt.Printf("Please open the following URL in your browser to authenticate:\n\n%s\n\n", authURL)
			return 3 * 60 * time.Second
		}
		browser.Stdout = os.Stderr
		if err := browser.OpenURL(authURL); err != nil {
			fmt.Printf("Could not open browser automatically. Please open the following URL:\n\n%s\n\n", authURL)
			return 3 * 60 * time.Second
		}
		return 60 * time.Second
	}()

	timer := time.NewTimer(timeout)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant the 60 * timeSecond and 180 * time.Second parts; should they be defined somewhere as defaultTimeout and defaultSAMLTimeout?

}

// Kion SAML tokens are valid for 10 minutes.
expires := time.Now().Add(10 * time.Minute)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the prior value of time.Now().

if err != nil {
return fmt.Errorf("SAML authentication failed: %w", err)
}
kion := client.NewWithToken(host, token, time.Now().Add(10*time.Minute))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change all occurrences of time.Now() to time.Now().UTC() so we're not dealing with DST issues and local time zones?

Copy link
Author

@jlegate jlegate Mar 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not against this change, but I don't know that it would make a functional difference in the operation. All of the places where we use time.Now() should be self-referential. The source string is ISO 8601 in the response (cf apiV1getAppAPIKey). I've changed them to all use UTC

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere, one of these gets persisted out to ~/.config/kion/saml-token.yaml:

expires: 2026-03-16T11:20:34.236699-07:00

The CLI still has a bug somewhere where it doesn't respect the expiration date of credentials in the credentials cache, and part of me once thought it might be due to a TZ issue.

return nil, err
}
// Use the app API key if it has not yet expired.
if time.Now().Before(expiry) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to UTC and store this in a variable so we can reuse it ...

return
}

ssoCodeRegexp := regexp.MustCompile(`code=(.+)">`)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before doing this, we need to validate the SAML token against the public key in the metadata. Did I miss where we're doing this somewhere?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're letting KION/Cloudtamer itself do that validation. The callback is just passing it through to the real cloudtamer. Do you feel we should do it ourselves as well?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is Kion sending us the key? Is it another SAML handshake? If so, in theory we should be validating that the SAML matches Kion's private key.

But now that I think about this more, maybe that's not how we're communicating with Kion, and it's actually:

  • We generate a SAML request
  • IDM forces us to log in if needed, sends the response to CloudTamer
  • CloudTamer just generates an API key and sends it to us. (How? Redirect back to localhost?)

If so, we don't need any more validation.

Could potentially protect against some edge-of-expiry-window race conditions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants