Skip to content

do not merge - vsphere client mgmt#10581

Draft
jcpowermac wants to merge 3 commits into
openshift:mainfrom
jcpowermac:hermes-test
Draft

do not merge - vsphere client mgmt#10581
jcpowermac wants to merge 3 commits into
openshift:mainfrom
jcpowermac:hermes-test

Conversation

@jcpowermac
Copy link
Copy Markdown
Contributor

@jcpowermac jcpowermac commented May 29, 2026

Messing around with Hermes, vLLM and https://huggingface.co/RedHatAI/Qwen3.6-35B-A3B-NVFP4

model: "RedHatAI/Qwen3.6-35B-A3B-NVFP4"
enforce-eager: false
enable-prefix-caching: true
gpu-memory-utilization: 0.88
kv-cache-dtype: fp8
language-model-only: true
max-model-len: auto
max-num-seqs: 1
reasoning-parser: qwen3
tensor-parallel-size: 2
tool-call-parser: "qwen3_coder"
trust-remote-code: true
enable-auto-tool-choice: true
enable-expert-parallel: true
enable-chunked-prefill: true
speculative-config: '{"method":"mtp","num_speculative_tokens":1}'
max-num-batched-tokens: 8192
moe-backend: flashinfer_cutlass

Summary by CodeRabbit

  • New Features

    • Configurable vSphere session options (timeout, TLS/insecure toggle, structured logging).
  • Refactor

    • Unified session-based vSphere connection management with deterministic caching to avoid redundant logins.
    • Centralized and improved session cleanup/logout for more reliable connection teardown.
    • Deprecated legacy direct client-creation API in favor of the new session approach.
  • Tests

    • Added comprehensive session/cache and option validation tests.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Walkthrough

This PR introduces a unified vSphere Session abstraction that centralizes SOAP/REST client creation, caching, and cleanup, deprecates the legacy CreateVSphereClients wrapper, updates consumers to use NewSession, and adds comprehensive session tests.

Changes

vSphere Session Abstraction

Layer / File(s) Summary
Session types and configuration
pkg/asset/installconfig/vsphere/session.go
Session holds vSphere connection state with SessionOptions (timeout default 60s, insecure toggle, logger via functional options) and hostname redaction helper.
Session caching and lifecycle management
pkg/asset/installconfig/vsphere/session.go
SessionCacheKey (truncated SHA-256) and a mutex-guarded cache reuse/evict logic; NewSession creates or reuses sessions with a context timeout and returns an idempotent cleanup closure; Session.Close() logs out REST then SOAP via sync.Once.
Session accessor methods
pkg/asset/installconfig/vsphere/session.go
Exported getters: Vim25Client(), RestClient(), CNSClient(), PBMClient(), Finder(), Server().
Legacy client factory deprecation
pkg/asset/installconfig/vsphere/client.go
CreateVSphereClients marked deprecated and rewritten as a thin wrapper around NewSession; removed unused direct-connection imports.
Consumer migration to NewSession
pkg/asset/installconfig/vsphere/validation.go, pkg/asset/installconfig/vsphere/vsphere.go
Replaces CreateVSphereClients with NewSession, wires managers/clients from session getters, and returns the session cleanup closure.
Destroy client lifecycle update
pkg/destroy/vsphere/client.go
Client now stores *vsphere.Session; NewClient creates a session and derives Vim25/REST/CNS clients; Logout closes the session directly and NewClient closes session on CNS init failure.
Session comprehensive test coverage
pkg/asset/installconfig/vsphere/session_test.go
Tests cache-key format/determinism/sensitivity, NewSession error paths and timeout, Session lifecycle and idempotent close, redactHostname, and SessionOptions defaults/overrides.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 12 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 23.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'do not merge - vsphere client mgmt' is vague and uses the prefix 'do not merge', which indicates a draft/WIP status rather than describing the actual change. Replace with a clear, descriptive title that summarizes the main change, such as 'Refactor vSphere client management to use NewSession wrapper' or similar.
Test Structure And Quality ❓ Inconclusive Custom check requests Ginkgo test review, but PR contains only standard Go tests using testify/assert. Check is not applicable to this PR. The custom check specifies "Review Ginkgo test code" but session_test.go uses standard Go testing with testify/assert, not Ginkgo (no Describe/It/BeforeEach blocks). The check criteria are not applicable.
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 31 new tests in session_test.go use static test names with no dynamic content, timestamps, UUIDs, or interpolation. Standard Go testing package used, not Ginkgo.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests added. PR only includes unit tests using standard Go testing framework, not Ginkgo patterns (It, Describe, Context, When). Check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests found. Only standard Go unit tests in session_test.go; check is not applicable to this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR contains only vSphere client management refactoring. No Kubernetes deployment manifests, pod specs, or scheduling constraints are introduced or modified.
Ote Binary Stdout Contract ✅ Passed No stdout writes detected in process-level code; all logging uses logrus which defaults to stderr. No process-level entry points in modified files.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds no Ginkgo e2e tests, only implementation code and standard Go unit tests, making this check not applicable.
No-Weak-Crypto ✅ Passed No weak cryptography found; only SHA-256 is used for cache keys and log redaction, with no MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB, custom crypto, or non-constant-time comparisons detected.
Container-Privileges ✅ Passed PR contains only Go source code refactoring vSphere client management; no Kubernetes/container manifests or privileged container settings present.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data exposed in logs. Hostnames are consistently redacted using SHA-256 hashing before logging. Usernames and passwords are never logged anywhere in the code.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@jcpowermac
Copy link
Copy Markdown
Contributor Author

@coderabbitat please review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

@jcpowermac Sure, I'll review the changes now!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@jcpowermac
Copy link
Copy Markdown
Contributor Author

/test e2e-vsphere-ovn

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 148-149: The log call s.logger.WithField("vcenter",
s.server).Debug("Closing vSphere session") (and the similar occurrences at the
other spots) is emitting raw vCenter hostnames; replace the raw s.server value
with a redacted or hashed representation before logging. Implement or call a
helper (e.g., redactVCenter or hashHostname) that deterministically masks the
hostname (or returns a one-way hash) and use that helper in all
logger.WithField("vcenter", ...) calls in session.go (refer to the
s.logger.WithField calls around the Closing vSphere session line and the other
occurrences noted) so logs never contain the plain hostname but still allow
correlating entries.
- Around line 105-109: The cache may return a previously closed session because
Close() doesn't remove entries from sessions; update the lookup logic (where
sessionsMu, sessions and sess are used — e.g., the NewSession/cache-hit paths at
the shown locations) to detect a closed/dead sess after retrieving it from
sessions, delete sessions[key] and continue to create a fresh session instead of
returning it; in practice, after reading sess from sessions, check its
closed/invalid state (use the session object's existing IsClosed/Valid/State
check or equivalent), if closed then delete(sessions, key) while holding
sessionsMu, unlock and proceed to create and store a new session; apply the same
eviction/check logic to the other similar cache-return sites referenced (the
blocks at the other indicated locations).
- Around line 123-126: The race path drops the newly created sess when the
second cache check finds an existing session, leaking authenticated clients;
update the cache-return branch so that when you detect an existing session (the
block checking sessions[key] -> s), you call sess.Close() (the just-created
session variable) before returning the cached s, while still unlocking
sessionsMu and returning func() { s.Close() }, nil; ensure you only close the
newly created sess (not the cached s) to avoid double-closing.

In `@pkg/destroy/vsphere/client.go`:
- Around line 70-73: When creating the CNS client, ensure the opened session is
closed if cns.NewClient fails: in the block where cns.NewClient(context.TODO(),
sess.Vim25Client()) is called, call sess.Close() (or the appropriate session
cleanup method on the sess object) before returning the error so the session is
not leaked; update the error path in the function that initializes cnsClient to
close sess on failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 04893483-3d6d-4abe-b149-b4222be98a12

📥 Commits

Reviewing files that changed from the base of the PR and between ae2ae05 and 2801673.

📒 Files selected for processing (3)
  • pkg/asset/installconfig/vsphere/client.go
  • pkg/asset/installconfig/vsphere/session.go
  • pkg/destroy/vsphere/client.go

Comment thread pkg/asset/installconfig/vsphere/session.go
Comment thread pkg/asset/installconfig/vsphere/session.go
Comment thread pkg/asset/installconfig/vsphere/session.go Outdated
Comment thread pkg/destroy/vsphere/client.go
Redact vCenter hostnames from structured logs using a deterministic
SHA-256 hash to avoid exposing raw hostnames. Fixes the session cache
stale-entry return (closed sessions from cache are now evicted),
the double-check race that leaked newly created sessions, and the
CNS client error path that leaked its session.
@jcpowermac
Copy link
Copy Markdown
Contributor Author

@coderabbitat why do you keep not reviewing, please review!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

@jcpowermac Apologies for the delay! I'll kick off a full review of the changes right away. Note that this PR is currently in draft status — CodeRabbit's automatic reviews may be configured to skip draft PRs. If you'd like automatic reviews on draft PRs in the future, you can enable that in the CodeRabbit settings.

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
pkg/asset/installconfig/vsphere/session.go (1)

190-194: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Error() calling once.Do prevents Close() from running.

If Error() is called before Close(), the empty once.Do(func() {}) at line 192 marks the sync.Once as done, and Close() will never execute its cleanup logic. This can leak authenticated sessions.

The comment "ensure once guard is initialized" is incorrect—sync.Once doesn't need initialization; calling Do with an empty func consumes the one-time execution.

Proposed fix — use a separate closed flag
 type Session struct {
 	server   string
 	govmomi  *govmomi.Client
 	vim25    *vim25.Client
 	rest     *rest.Client
 	cns      *cns.Client
 	pbm      *pbm.Client
 	finder   *find.Finder
 	logger   logrus.FieldLogger
 	once     sync.Once
 	closeErr error
+	closed   bool
 }

 // Close releases all resources held by the session.
 func (s *Session) Close() {
 	s.once.Do(func() {
 		// ... existing cleanup logic ...
 		s.closeErr = firstErr
+		s.closed = true
 		s.logger.WithField("vcenter", redactHostname(s.server)).Debug("vSphere session closed")
 	})
 }

 // Error returns any error that occurred during Close.
 func (s *Session) Error() error {
-	s.once.Do(func() {}) // ensure once guard is initialized
+	if !s.closed {
+		return nil // Not yet closed
+	}
 	return s.closeErr
 }

Or, if callers need to distinguish "not closed" from "closed with no error", consider returning (error, bool) or checking s.closed explicitly in the cache eviction logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 190 - 194, The
Error() method must not call s.once.Do(...) because that prematurely marks the
once guard and prevents Close() from running; remove the empty once.Do call from
Error() and instead track closed state explicitly (e.g., add a s.closed bool set
to true inside Close() after running the once-protected cleanup, protected by
the existing mutex or an atomic), then have Error() return s.closeErr (and/or
provide a variant returning (error,bool) if callers need to know whether Close()
ran). Ensure Close() still uses s.once.Do to perform cleanup and sets s.closed
and s.closeErr inside that Do block so Error() can observe accurate state.
🧹 Nitpick comments (1)
pkg/destroy/vsphere/client.go (1)

70-74: 💤 Low value

Consider reusing the session's CNS client.

The Session already creates a CNS client internally (optional, logged at Debug on failure). Creating another here duplicates the work. If the session's CNS client suffices, use sess.CNSClient() with a nil check.

However, since destroy requires a CNS client while the session treats it as optional, the current approach ensures the error surfaces. This is acceptable if intentional.

Optional: reuse session's CNS client
-	cnsClient, err := cns.NewClient(context.TODO(), sess.Vim25Client())
-	if err != nil {
-		sess.Close()
-		return nil, err
+	cnsClient := sess.CNSClient()
+	if cnsClient == nil {
+		// Session's optional CNS creation failed; retry explicitly.
+		var err error
+		cnsClient, err = cns.NewClient(context.TODO(), sess.Vim25Client())
+		if err != nil {
+			sess.Close()
+			return nil, err
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/destroy/vsphere/client.go` around lines 70 - 74, Replace the
unconditional creation of a new CNS client with reuse of the session's CNS
client: check sess.CNSClient() first and if non-nil assign it to cnsClient;
otherwise fall back to calling cns.NewClient(context.TODO(), sess.Vim25Client())
and preserve the current error handling (sess.Close() and return nil, err) to
surface failures. Use the existing variables cnsClient and sess and the methods
sess.CNSClient() and cns.NewClient(...) to implement this conditional reuse.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 112-124: The mutex sessionsMu is being unlocked twice when a
cached session has sess.Error() != nil: the code calls sessionsMu.Unlock()
inside that branch and then falls through to the subsequent sessionsMu.Unlock(),
causing a panic. Fix by ensuring only one unlock path when evicting a closed
session — remove the inner Unlock after delete(sessions, key) and let the single
Unlock after the if-block run, or restructure to return immediately after
eviction; reference sessionsMu, sessions, sess.Error(), delete(sessions, key)
and the existing return path that returns sess with func() { sess.Close() } to
guide where to change behavior.

---

Duplicate comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 190-194: The Error() method must not call s.once.Do(...) because
that prematurely marks the once guard and prevents Close() from running; remove
the empty once.Do call from Error() and instead track closed state explicitly
(e.g., add a s.closed bool set to true inside Close() after running the
once-protected cleanup, protected by the existing mutex or an atomic), then have
Error() return s.closeErr (and/or provide a variant returning (error,bool) if
callers need to know whether Close() ran). Ensure Close() still uses s.once.Do
to perform cleanup and sets s.closed and s.closeErr inside that Do block so
Error() can observe accurate state.

---

Nitpick comments:
In `@pkg/destroy/vsphere/client.go`:
- Around line 70-74: Replace the unconditional creation of a new CNS client with
reuse of the session's CNS client: check sess.CNSClient() first and if non-nil
assign it to cnsClient; otherwise fall back to calling
cns.NewClient(context.TODO(), sess.Vim25Client()) and preserve the current error
handling (sess.Close() and return nil, err) to surface failures. Use the
existing variables cnsClient and sess and the methods sess.CNSClient() and
cns.NewClient(...) to implement this conditional reuse.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc98e816-70a8-43a8-87a1-78630e304011

📥 Commits

Reviewing files that changed from the base of the PR and between ae2ae05 and 364c615.

📒 Files selected for processing (3)
  • pkg/asset/installconfig/vsphere/client.go
  • pkg/asset/installconfig/vsphere/session.go
  • pkg/destroy/vsphere/client.go

Comment thread pkg/asset/installconfig/vsphere/session.go
@jcpowermac
Copy link
Copy Markdown
Contributor Author

@coderabbitai please review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

@jcpowermac Sure! Triggering a full review of the PR now. Note that since this PR is in draft status, automatic reviews may be skipped by default — you can enable draft PR reviews in the CodeRabbit settings if needed.

✅ Action performed

Full review finished.

- Migrate validation.go:getVCenterClient() and vsphere.go:getClients()
  to use NewSession instead of deprecated CreateVSphereClients
- Simplify Session.Error() by removing unnecessary sync.Once guard
- Fix optional CNS/PBM client creation to use context.Background()
  instead of timeout context to prevent premature cancellation
- Add 33 unit tests for Session covering caching, idempotent close,
  error propagation, logging, and cache key properties
- Remove unused defaultTimeout constant from destroy/vsphere/client.go
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jcpowermac for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jcpowermac
Copy link
Copy Markdown
Contributor Author

@coderabbitai please review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

@jcpowermac Sure! Triggering a full review of the PR now.

Note: This PR is currently in draft status — CodeRabbit's automatic reviews may be configured to skip draft PRs by default. If you'd like automatic reviews on future draft PRs, you can enable that in the CodeRabbit settings.


🧠 Learnings used
Learnt from: CR
Repo: openshift/installer PR: 0
File: cluster-api/providers/vsphere/vendor/github.com/prometheus/procfs/CONTRIBUTING.md:0-0
Timestamp: 2026-06-05T11:46:51.522Z
Learning: Applies to cluster-api/providers/vsphere/vendor/github.com/prometheus/procfs/**/*_test.go : Add tests relevant to fixed bugs or new features
✅ Action performed

Full review finished.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
pkg/asset/installconfig/vsphere/session.go (2)

114-124: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Remove the extra unlock in the eviction path.

When sess.Error() != nil, Line 117 unlocks sessionsMu and control still falls through to Line 124, which unlocks it again. That panics with sync: unlock of unlocked mutex.

Minimal fix
 	if sess, ok := sessions[key]; ok {
 		if sess.Error() != nil {
 			// Cached session is closed — evict it so a fresh one is created.
 			delete(sessions, key)
-			sessionsMu.Unlock()
 			// Fall through to create a new session.
 		} else {
 			sessionsMu.Unlock()
 			return sess, func() { sess.Close() }, nil
 		}
 	}
 	sessionsMu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 114 - 124, The code
double-unlocks the mutex sessionsMu when a cached session is evicted: inside the
sess.Error() != nil branch you call sessionsMu.Unlock() then fall through to the
outer sessionsMu.Unlock(), causing a panic; remove the inner unlock so that
after delete(sessions, key) control falls through and the single outer
sessionsMu.Unlock() releases the lock. Specifically edit the block around
sess.Error() != nil (the delete(sessions, key) path) in session.go so only the
outer sessionsMu.Unlock() is executed.

112-121: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Cache hits hand out a shared session with a real closer, so callers can invalidate each other.

Every NewSession caller gets a cleanup function that calls Close(), including cache hits. If two call sites reuse the same cached Session, the first cleanup logs out the shared SOAP/REST clients while the other caller still holds them. The sess.Error() check does not make this safe either, because a clean Close() leaves closeErr == nil. This needs lease/reference-count semantics, or Close() must evict/mark the session closed before any later cache hit can reuse it.

Also applies to: 147-147

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 112 - 121, The
cached session is being handed out with a real closer so multiple callers can
call sess.Close() and race to logout shared clients; update the cache to use
lease/reference-count semantics instead of returning the raw Close: change the
sessions map entries to a small struct (e.g., containing the Session, an int
refCount, and a closed flag), increment refCount on cache hit in NewSession, and
return a cleanup func that decrements refCount and only calls the underlying
sess.Close() (and removes the entry from sessions) when refCount reaches zero;
additionally ensure that when initiating an actual Close you mark the entry as
closed/evicted before performing network logout so no later cache hit reuses a
session (adjust logic around sessionsMu, NewSession, sess.Close(), and
sess.Error() accordingly; same fix where cache hits are handled at the other
occurrence).
🧹 Nitpick comments (3)
pkg/asset/installconfig/vsphere/session_test.go (2)

341-364: 💤 Low value

Consider verifying redacted hostname format.

The test checks that the logged vcenter field is not the plaintext hostname ("https://vcenter.example.com"), but it doesn't verify that it matches the expected redacted format (32-character hex string from redactHostname). While TestSessionRedactHostname validates the helper function separately, adding a format assertion here would make this test more robust.

♻️ Optional enhancement to verify redacted format
 	// Check that the logged hostname is redacted
 	found := false
 	for _, entry := range hook.AllEntries() {
 		if field, ok := entry.Data["vcenter"]; ok {
 			vcenter, ok := field.(string)
-			if ok && vcenter != "https://vcenter.example.com" {
+			if ok && vcenter != "https://vcenter.example.com" && len(vcenter) == 32 {
+				// Also verify it's hex format
+				isHex := true
+				for _, c := range vcenter {
+					if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
+						isHex = false
+						break
+					}
+				}
+				if isHex {
+					found = true
+					break
+				}
+			} else if ok && vcenter != "https://vcenter.example.com" {
 				found = true
 				break
 			}
 		}
 	}
 	assert.True(t, found, "should have logged redacted vcenter hostname")

Or more simply, just assert the length:

 	// Check that the logged hostname is redacted
 	found := false
 	for _, entry := range hook.AllEntries() {
 		if field, ok := entry.Data["vcenter"]; ok {
 			vcenter, ok := field.(string)
-			if ok && vcenter != "https://vcenter.example.com" {
+			if ok && vcenter != "https://vcenter.example.com" && len(vcenter) == 32 {
 				found = true
 				break
 			}
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session_test.go` around lines 341 - 364, The
test TestSessionCloseLogsRedactedHostname only ensures the logged vcenter is not
the plaintext; update it to assert the logged value matches the expected
redacted format by using the redactHostname helper or a 32‑char hex check: after
retrieving entry.Data["vcenter"] in the loop, compute expected :=
redactHostname(sess.server) (or validate that the string length is 32 and
contains only hex characters) and assert equality/format; reference
Session.Close, TestSessionCloseLogsRedactedHostname, redactHostname and the
hook.AllEntries loop to locate where to add the additional assertion.

200-223: ⚡ Quick win

Use functional option pattern to test option functions.

The tests TestSessionOptionsWithTimeout, TestSessionOptionsWithInsecure, and TestSessionOptionsWithLogger directly mutate struct fields after calling applyDefaults(), but they don't invoke the actual functional option functions (WithTimeout(), WithInsecure(), WithLogger()). This means these tests don't verify that the option functions correctly configure a SessionOptions instance.

♻️ Proposed refactor to test functional options

Replace the three tests with versions that use the functional option pattern:

 func TestSessionOptionsWithTimeout(t *testing.T) {
-	opts := &SessionOptions{}
-	opts.applyDefaults()
-	opts.Timeout = 30 * time.Second
+	opt := WithTimeout(30 * time.Second)
+	opts := &SessionOptions{}
+	opt(opts)
 
 	assert.Equal(t, 30*time.Second, opts.Timeout)
 }
 
 func TestSessionOptionsWithInsecure(t *testing.T) {
-	opts := &SessionOptions{}
-	opts.applyDefaults()
-	opts.Insecure = true
+	opt := WithInsecure(true)
+	opts := &SessionOptions{}
+	opt(opts)
 
 	assert.True(t, opts.Insecure)
 }
 
 func TestSessionOptionsWithLogger(t *testing.T) {
 	logger := logrus.New()
-	opts := &SessionOptions{}
-	opts.applyDefaults()
-	opts.Logger = logger
+	opt := WithLogger(logger)
+	opts := &SessionOptions{}
+	opt(opts)
 
 	assert.Equal(t, logger, opts.Logger)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session_test.go` around lines 200 - 223, The
tests mutate SessionOptions fields directly instead of exercising the functional
option helpers; update TestSessionOptionsWithTimeout,
TestSessionOptionsWithInsecure, and TestSessionOptionsWithLogger to call the
option functions (WithTimeout, WithInsecure, WithLogger) against a
defaults-populated opts (e.g. opts := &SessionOptions{}; opts.applyDefaults();
WithTimeout(30*time.Second)(opts)) rather than assigning
opts.Timeout/opts.Insecure/opts.Logger directly, then assert the values remain
as expected.
pkg/asset/installconfig/vsphere/client.go (1)

51-55: ⚡ Quick win

Use public accessor methods instead of private fields.

Line 55 accesses private session fields directly, while other consumers (validation.go, vsphere.go) use the public Vim25Client() and RestClient() methods. This breaks encapsulation.

♻️ Use accessor methods for consistency
-	return sess.vim25, sess.rest, cleanup, nil
+	return sess.Vim25Client(), sess.RestClient(), cleanup, nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/client.go` around lines 51 - 55, Replace
direct private field access on the session with its public accessors: instead of
returning sess.vim25 and sess.rest from the NewSession result, call
sess.Vim25Client() and sess.RestClient() (the public methods used elsewhere) and
return those values along with cleanup and nil error so encapsulation is
preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/asset/installconfig/vsphere/session_test.go`:
- Around line 81-89: The test TestNewSessionTimeout mixes two timeout sources
(context.WithTimeout and the WithTimeout option) which confuses intent; update
the test to only exercise one behavior: either remove the context timeout and
assert NewSession with WithTimeout(1*time.Millisecond) returns nil session and
an error, or split into two tests—one that uses context.WithTimeout to trigger
context cancellation and another that passes WithTimeout to trigger the
option-based timeout—referencing TestNewSessionTimeout, NewSession, WithTimeout
and context.WithTimeout to locate and adjust the code accordingly.

In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 78-80: SessionCacheKey currently joins server, username, password
with "|" which is ambiguous; change it to hash length-prefixed fields instead:
build the input to sha256 by writing each field as a length (e.g. 4-byte
big-endian or varint) followed by the raw field bytes for server, then username,
then password, then compute sha256 and return the first 16 bytes hex as before.
Update the SessionCacheKey function to construct the length-prefixed byte
sequence (instead of fmt.Sprintf) before hashing to eliminate delimiter
collision.
- Around line 167-177: Session.Close currently calls
s.rest.Logout(context.Background()) and s.govmomi.Logout(context.Background())
which can block; change Close to use a bounded context derived from the session
timeout (SessionOptions.Timeout) or a stored timeout field on Session: create a
context.WithTimeout(ctx, timeout) before calling s.rest.Logout(...) and
s.govmomi.Logout(...), cancel the context when done, and fall back to a sensible
default if no timeout is set; update the Session struct to store the timeout
from SessionOptions if needed and ensure both Logout calls use the same bounded
context in Session.Close().

---

Duplicate comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 114-124: The code double-unlocks the mutex sessionsMu when a
cached session is evicted: inside the sess.Error() != nil branch you call
sessionsMu.Unlock() then fall through to the outer sessionsMu.Unlock(), causing
a panic; remove the inner unlock so that after delete(sessions, key) control
falls through and the single outer sessionsMu.Unlock() releases the lock.
Specifically edit the block around sess.Error() != nil (the delete(sessions,
key) path) in session.go so only the outer sessionsMu.Unlock() is executed.
- Around line 112-121: The cached session is being handed out with a real closer
so multiple callers can call sess.Close() and race to logout shared clients;
update the cache to use lease/reference-count semantics instead of returning the
raw Close: change the sessions map entries to a small struct (e.g., containing
the Session, an int refCount, and a closed flag), increment refCount on cache
hit in NewSession, and return a cleanup func that decrements refCount and only
calls the underlying sess.Close() (and removes the entry from sessions) when
refCount reaches zero; additionally ensure that when initiating an actual Close
you mark the entry as closed/evicted before performing network logout so no
later cache hit reuses a session (adjust logic around sessionsMu, NewSession,
sess.Close(), and sess.Error() accordingly; same fix where cache hits are
handled at the other occurrence).

---

Nitpick comments:
In `@pkg/asset/installconfig/vsphere/client.go`:
- Around line 51-55: Replace direct private field access on the session with its
public accessors: instead of returning sess.vim25 and sess.rest from the
NewSession result, call sess.Vim25Client() and sess.RestClient() (the public
methods used elsewhere) and return those values along with cleanup and nil error
so encapsulation is preserved.

In `@pkg/asset/installconfig/vsphere/session_test.go`:
- Around line 341-364: The test TestSessionCloseLogsRedactedHostname only
ensures the logged vcenter is not the plaintext; update it to assert the logged
value matches the expected redacted format by using the redactHostname helper or
a 32‑char hex check: after retrieving entry.Data["vcenter"] in the loop, compute
expected := redactHostname(sess.server) (or validate that the string length is
32 and contains only hex characters) and assert equality/format; reference
Session.Close, TestSessionCloseLogsRedactedHostname, redactHostname and the
hook.AllEntries loop to locate where to add the additional assertion.
- Around line 200-223: The tests mutate SessionOptions fields directly instead
of exercising the functional option helpers; update
TestSessionOptionsWithTimeout, TestSessionOptionsWithInsecure, and
TestSessionOptionsWithLogger to call the option functions (WithTimeout,
WithInsecure, WithLogger) against a defaults-populated opts (e.g. opts :=
&SessionOptions{}; opts.applyDefaults(); WithTimeout(30*time.Second)(opts))
rather than assigning opts.Timeout/opts.Insecure/opts.Logger directly, then
assert the values remain as expected.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9998e465-4308-4009-8932-46368d310dee

📥 Commits

Reviewing files that changed from the base of the PR and between 35013a5 and 7b2fb6e.

📒 Files selected for processing (6)
  • pkg/asset/installconfig/vsphere/client.go
  • pkg/asset/installconfig/vsphere/session.go
  • pkg/asset/installconfig/vsphere/session_test.go
  • pkg/asset/installconfig/vsphere/validation.go
  • pkg/asset/installconfig/vsphere/vsphere.go
  • pkg/destroy/vsphere/client.go

Comment on lines +81 to +89
func TestNewSessionTimeout(t *testing.T) {
logger, _ := test.NewNullLogger()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
defer cancel()

sess, _, err := NewSession(ctx, "https://nonexistent.example.com:80", "user", "pass", WithTimeout(1*time.Millisecond), WithLogger(logger))
assert.Nil(t, sess, "session should be nil on timeout")
assert.Error(t, err, "should return error on timeout")
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify timeout test or split into separate cases.

The test uses both context.WithTimeout(context.Background(), 10*time.Millisecond) and WithTimeout(1*time.Millisecond), introducing two timeout sources. While the effective timeout will be the minimum (1ms), this makes the test harder to understand and doesn't clearly document which timeout behavior is being validated.

Consider either:

  • Removing the context timeout and only testing the WithTimeout option
  • Splitting into two separate tests: one for context cancellation and one for option timeout
♻️ Proposed refactor to test only option timeout
 func TestNewSessionTimeout(t *testing.T) {
 	logger, _ := test.NewNullLogger()
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
-	defer cancel()
 
-	sess, _, err := NewSession(ctx, "https://nonexistent.example.com:80", "user", "pass", WithTimeout(1*time.Millisecond), WithLogger(logger))
+	sess, _, err := NewSession(context.Background(), "https://nonexistent.example.com:80", "user", "pass", WithTimeout(1*time.Millisecond), WithLogger(logger))
 	assert.Nil(t, sess, "session should be nil on timeout")
 	assert.Error(t, err, "should return error on timeout")
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestNewSessionTimeout(t *testing.T) {
logger, _ := test.NewNullLogger()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
defer cancel()
sess, _, err := NewSession(ctx, "https://nonexistent.example.com:80", "user", "pass", WithTimeout(1*time.Millisecond), WithLogger(logger))
assert.Nil(t, sess, "session should be nil on timeout")
assert.Error(t, err, "should return error on timeout")
}
func TestNewSessionTimeout(t *testing.T) {
logger, _ := test.NewNullLogger()
sess, _, err := NewSession(context.Background(), "https://nonexistent.example.com:80", "user", "pass", WithTimeout(1*time.Millisecond), WithLogger(logger))
assert.Nil(t, sess, "session should be nil on timeout")
assert.Error(t, err, "should return error on timeout")
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session_test.go` around lines 81 - 89, The
test TestNewSessionTimeout mixes two timeout sources (context.WithTimeout and
the WithTimeout option) which confuses intent; update the test to only exercise
one behavior: either remove the context timeout and assert NewSession with
WithTimeout(1*time.Millisecond) returns nil session and an error, or split into
two tests—one that uses context.WithTimeout to trigger context cancellation and
another that passes WithTimeout to trigger the option-based timeout—referencing
TestNewSessionTimeout, NewSession, WithTimeout and context.WithTimeout to locate
and adjust the code accordingly.

Comment on lines +78 to +80
func SessionCacheKey(server, username, password string) string {
hash := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%s", server, username, password)))
return fmt.Sprintf("%x", hash[:16])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Encode the cache-key inputs unambiguously.

fmt.Sprintf("%s|%s|%s", ...) is not injective: values containing | can produce the same preimage string for different (server, username, password) tuples. That can make unrelated credentials reuse the same cached session. Hash length-prefixed fields instead of a delimiter-joined string.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 78 - 80,
SessionCacheKey currently joins server, username, password with "|" which is
ambiguous; change it to hash length-prefixed fields instead: build the input to
sha256 by writing each field as a length (e.g. 4-byte big-endian or varint)
followed by the raw field bytes for server, then username, then password, then
compute sha256 and return the first 16 bytes hex as before. Update the
SessionCacheKey function to construct the length-prefixed byte sequence (instead
of fmt.Sprintf) before hashing to eliminate delimiter collision.

Comment on lines +167 to +177
if s.rest != nil {
if err := s.rest.Logout(context.Background()); err != nil {
s.logger.WithError(err).Warn("Failed to logout REST client")
if firstErr == nil {
firstErr = err
}
}
}

if s.govmomi != nil {
if err := s.govmomi.Logout(context.Background()); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and show the relevant code in session.go
FILE="pkg/asset/installconfig/vsphere/session.go"
if [ ! -f "$FILE" ]; then
  echo "Missing file: $FILE"
  exit 1
fi

nl -ba "$FILE" | sed -n '130,220p'

# Search for Close() and any existing timeout/context usage in the vsphere session code
rg -n "func \\(.*\\) Close\\(|Logout\\(context|WithTimeout\\(|Background\\(\\)" "$FILE"

# Show definitions of any session structs fields related to timeouts or contexts
rg -n "type .*Session|struct .*Session|timeout|ctx|context\\.Context" "$FILE"

Repository: openshift/installer

Length of output: 107


🏁 Script executed:

#!/bin/bash
set -euo pipefail

FILE="pkg/asset/installconfig/vsphere/session.go"
[ -f "$FILE" ] || { echo "Missing file: $FILE"; exit 1; }

# Show the relevant region with line numbers
cat -n "$FILE" | sed -n '130,220p'

# Find Close() and Logout usage plus any existing timeouts/context usage nearby
rg -n "func \\(.*\\) Close\\(|Logout\\(context\\.|Logout\\(|WithTimeout\\(|context\\.Background\\(" "$FILE"

# Inspect the Logout method signatures/types referenced in this file (if any)
rg -n "Logout\\(" "$FILE"

# Check whether session struct already carries a timeout or context
rg -n "type .*Session|struct .*Session|timeout|ctx\\b|context\\.Context" "$FILE"

Repository: openshift/installer

Length of output: 4817


🏁 Script executed:

#!/bin/bash
set -euo pipefail
FILE="pkg/asset/installconfig/vsphere/session.go"

# Print the top portion: Session/SessionOptions structs and timeout option definitions
cat -n "$FILE" | sed -n '1,140p'

# Print the rest around NewSession and any other context usage
cat -n "$FILE" | sed -n '140,280p'

Repository: openshift/installer

Length of output: 9840


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for Logout signatures in vendor (govmomi/vim25 and vapi/rest)
rg -n "func \\(.*\\) Logout\\(ctx context\\.Context\\)" vendor/github.com/vmware/govmomi -S || true
rg -n "Logout\\(ctx context\\.Context\\)" vendor/github.com/vmware/govmomi -S | head -n 50

# Also check for rest client Logout signature
rg -n "Logout\\(ctx context\\.Context\\)" vendor/github.com/vmware/govmomi/vapi/rest -S || true

# Show session.go Session struct fields to see if timeout is stored
rg -n "type Session struct \\{|type SessionOptions struct \\{|Timeout" pkg/asset/installconfig/vsphere/session.go

Repository: openshift/installer

Length of output: 1249


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show relevant parts around Logout implementations in vendored govmomi
for f in \
  vendor/github.com/vmware/govmomi/client.go \
  vendor/github.com/vmware/govmomi/vapi/rest/client.go \
  vendor/github.com/vmware/govmomi/session/manager.go
do
  echo "==== $f ===="
  # print around the first Logout occurrence
  start=$(rg -n "func \\(.*\\) Logout\\(ctx context\\.Context\\)" "$f" | head -n 1 | cut -d: -f1)
  if [ -z "${start:-}" ]; then
    echo "No Logout(ctx) found in $f"
    continue
  fi
  # show ~80 lines around it
  from=$((start-20))
  to=$((start+120))
  if [ $from -lt 1 ]; then from=1; fi
  sed -n "${from},${to}p" "$f" | nl -ba | sed 's/^ *[0-9]*\t//'
done

Repository: openshift/installer

Length of output: 178


🏁 Script executed:

#!/bin/bash
set -euo pipefail

for f in \
  vendor/github.com/vmware/govmomi/client.go \
  vendor/github.com/vmware/govmomi/vapi/rest/client.go \
  vendor/github.com/vmware/govmomi/session/manager.go
do
  echo "==== $f ===="
  # Find the first Logout(ctx ...) in the file
  line=$(rg -n "func \\(.*\\) Logout\\(ctx context\\.Context\\)" "$f" | head -n 1 | cut -d: -f1 || true)
  if [ -z "${line:-}" ]; then
    echo "No Logout(ctx) found"
    continue
  fi
  from=$((line-20))
  to=$((line+160))
  if [ $from -lt 1 ]; then from=1; fi
  cat -n "$f" | sed -n "${from},${to}p"
  echo
done

Repository: openshift/installer

Length of output: 8535


Bound vCenter logout RPCs with a timeout in Session.Close() (avoid context.Background()).
pkg/asset/installconfig/vsphere/session.go calls s.rest.Logout(context.Background()) and s.govmomi.Logout(context.Background()); govmomi’s REST/SOAP logout APIs accept a context.Context and use it for the underlying network RPC. SessionOptions.Timeout is currently only applied to the initial connection, so Close() can still block indefinitely on a broken connection—store a timeout on the Session (from SessionOptions) or derive a short context.WithTimeout locally and use it for both logout calls.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 167 - 177,
Session.Close currently calls s.rest.Logout(context.Background()) and
s.govmomi.Logout(context.Background()) which can block; change Close to use a
bounded context derived from the session timeout (SessionOptions.Timeout) or a
stored timeout field on Session: create a context.WithTimeout(ctx, timeout)
before calling s.rest.Logout(...) and s.govmomi.Logout(...), cancel the context
when done, and fall back to a sensible default if no timeout is set; update the
Session struct to store the timeout from SessionOptions if needed and ensure
both Logout calls use the same bounded context in Session.Close().

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
pkg/asset/installconfig/vsphere/session.go (4)

167-177: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid unbounded context.Background() on vSphere RPCs.

Logout and optional CNS/PBM initialization calls bypass the session timeout and can block indefinitely on network issues. Use a bounded context (derived from session options or caller context) for these external calls.

#!/bin/bash
# Verify unbounded context usage in session lifecycle paths
rg -n "Logout\(context\.Background\(\)|cns\.NewClient\(context\.Background\(\)|pbm\.NewClient\(context\.Background\(\)" pkg/asset/installconfig/vsphere/session.go

As per coding guidelines: context.Context for cancellation and timeouts.

Also applies to: 220-220, 227-233

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 167 - 177, The
Logout and client initialization calls (s.rest.Logout, s.govmomi.Logout,
cns.NewClient, pbm.NewClient) are using context.Background() which can block
indefinitely; change these to use a bounded context (e.g., derive from a
session-level context or session options timeout/cancel function) and pass that
context into the calls so RPCs observe cancellation/timeouts. Locate the calls
in session.go (the s.rest.Logout and s.govmomi.Logout blocks and any
cns.NewClient/pbm.NewClient usages) and replace context.Background() with a
context derived from the caller/session (with a timeout or from an existing ctx
parameter) and propagate errors unchanged.

114-115: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t use Error() as the cache “closed” signal.

A successful Close() leaves closeErr == nil, so closed sessions remain cache-valid and can be returned as reusable sessions. Keep close errors separate from lifecycle state and track explicit closed state for eviction.

Also applies to: 190-193

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 114 - 115, Currently
the code uses sess.Error() to decide whether a cached session is "closed" —
change this to track explicit lifecycle state instead: add an explicit closed
boolean (e.g., cachedSession.closed) on the session wrapper and set that flag
when Close() is called (leave closeErr as a separate error value), then replace
any checks of sess.Error() (references around sess.Error() at the existing
checks and the other occurrence mentioned) with checks of the explicit closed
flag so a successfully-closed session (closeErr == nil) is not treated as
invalid and sessions with actual close failures can still expose their closeErr
for logging or handling.

78-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make SessionCacheKey encoding unambiguous.

Delimiter-joining ("%s|%s|%s") is not injective when inputs contain |, so different credential tuples can map to the same cache key and accidentally share a session.

Proposed fix
+import (
+	"bytes"
+	"encoding/binary"
+)
+
 func SessionCacheKey(server, username, password string) string {
-	hash := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%s", server, username, password)))
+	var buf bytes.Buffer
+	writeField := func(v string) {
+		var lenBuf [binary.MaxVarintLen64]byte
+		n := binary.PutUvarint(lenBuf[:], uint64(len(v)))
+		buf.Write(lenBuf[:n])
+		buf.WriteString(v)
+	}
+	writeField(server)
+	writeField(username)
+	writeField(password)
+	hash := sha256.Sum256(buf.Bytes())
 	return fmt.Sprintf("%x", hash[:16])
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 78 - 80,
SessionCacheKey currently concatenates server, username, password with "|" which
is ambiguous if fields contain "|"; change SessionCacheKey to serialize the
three strings unambiguously before hashing (e.g., use a canonical serialization
such as length-prefixed fields or json.Marshal of a struct
{Server,Username,Password string}), then compute the sha256 and return the first
16 bytes hex as before; update the implementation in SessionCacheKey to use that
serialization so different credential tuples cannot collide.

112-124: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Fix the double-unlock path in NewSession.

When a cached session exists and sess.Error() != nil, the mutex is unlocked in-branch and then unlocked again at the end of the block, which can panic (sync: unlock of unlocked mutex).

Proposed fix
 	sessionsMu.Lock()
 	if sess, ok := sessions[key]; ok {
 		if sess.Error() != nil {
 			// Cached session is closed — evict it so a fresh one is created.
 			delete(sessions, key)
-			sessionsMu.Unlock()
 			// Fall through to create a new session.
 		} else {
 			sessionsMu.Unlock()
 			return sess, func() { sess.Close() }, nil
 		}
 	}
 	sessionsMu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 112 - 124, The mutex
is being unlocked twice when a cached session exists but sess.Error() != nil in
NewSession: you delete(sessions, key) and call sessionsMu.Unlock() inside that
branch, then reach the trailing sessionsMu.Unlock() again causing a panic. Fix
by restructuring the conditional so the mutex is unlocked exactly once: when
sess.Error() != nil, delete the entry and unlock then immediately proceed to
create a new session (e.g., return to the outer create path or use an if/else so
the final sessionsMu.Unlock() isn't executed for that branch). Reference
symbols: NewSession, sessionsMu, sessions, sess.Error(), delete(sessions, key).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 121-122: The cleanup closure returned on cache hits always calls
sess.Close(), which allows one caller to close a shared cached session while
others still use it; change the cache to maintain a reference count per cached
session (e.g., a struct with sess and refCount) and update the acquire path to
increment refCount when returning a cached session and the cleanup closure to
decrement refCount and only call sess.Close() when refCount reaches zero; update
any places that return "sess, func() { sess.Close() }, nil" (the factory that
currently returns sess and the cleanup closure) to use the new acquire/release
semantics so both cache hits and misses follow the same ref-counted cleanup
behavior.

---

Duplicate comments:
In `@pkg/asset/installconfig/vsphere/session.go`:
- Around line 167-177: The Logout and client initialization calls
(s.rest.Logout, s.govmomi.Logout, cns.NewClient, pbm.NewClient) are using
context.Background() which can block indefinitely; change these to use a bounded
context (e.g., derive from a session-level context or session options
timeout/cancel function) and pass that context into the calls so RPCs observe
cancellation/timeouts. Locate the calls in session.go (the s.rest.Logout and
s.govmomi.Logout blocks and any cns.NewClient/pbm.NewClient usages) and replace
context.Background() with a context derived from the caller/session (with a
timeout or from an existing ctx parameter) and propagate errors unchanged.
- Around line 114-115: Currently the code uses sess.Error() to decide whether a
cached session is "closed" — change this to track explicit lifecycle state
instead: add an explicit closed boolean (e.g., cachedSession.closed) on the
session wrapper and set that flag when Close() is called (leave closeErr as a
separate error value), then replace any checks of sess.Error() (references
around sess.Error() at the existing checks and the other occurrence mentioned)
with checks of the explicit closed flag so a successfully-closed session
(closeErr == nil) is not treated as invalid and sessions with actual close
failures can still expose their closeErr for logging or handling.
- Around line 78-80: SessionCacheKey currently concatenates server, username,
password with "|" which is ambiguous if fields contain "|"; change
SessionCacheKey to serialize the three strings unambiguously before hashing
(e.g., use a canonical serialization such as length-prefixed fields or
json.Marshal of a struct {Server,Username,Password string}), then compute the
sha256 and return the first 16 bytes hex as before; update the implementation in
SessionCacheKey to use that serialization so different credential tuples cannot
collide.
- Around line 112-124: The mutex is being unlocked twice when a cached session
exists but sess.Error() != nil in NewSession: you delete(sessions, key) and call
sessionsMu.Unlock() inside that branch, then reach the trailing
sessionsMu.Unlock() again causing a panic. Fix by restructuring the conditional
so the mutex is unlocked exactly once: when sess.Error() != nil, delete the
entry and unlock then immediately proceed to create a new session (e.g., return
to the outer create path or use an if/else so the final sessionsMu.Unlock()
isn't executed for that branch). Reference symbols: NewSession, sessionsMu,
sessions, sess.Error(), delete(sessions, key).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a35d0f12-227b-4e6d-b731-33adf1a50dab

📥 Commits

Reviewing files that changed from the base of the PR and between 35013a5 and 7b2fb6e.

📒 Files selected for processing (6)
  • pkg/asset/installconfig/vsphere/client.go
  • pkg/asset/installconfig/vsphere/session.go
  • pkg/asset/installconfig/vsphere/session_test.go
  • pkg/asset/installconfig/vsphere/validation.go
  • pkg/asset/installconfig/vsphere/vsphere.go
  • pkg/destroy/vsphere/client.go

Comment on lines +121 to +122
return sess, func() { sess.Close() }, nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Shared cached sessions need reference-counted cleanup.

Each caller gets a cleanup function that always calls Close(). On cache hits, one caller can close the shared session while another caller is still using it. This needs ref tracking (increment on acquire, decrement on cleanup, close on zero).

Also applies to: 142-147

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/asset/installconfig/vsphere/session.go` around lines 121 - 122, The
cleanup closure returned on cache hits always calls sess.Close(), which allows
one caller to close a shared cached session while others still use it; change
the cache to maintain a reference count per cached session (e.g., a struct with
sess and refCount) and update the acquire path to increment refCount when
returning a cached session and the cleanup closure to decrement refCount and
only call sess.Close() when refCount reaches zero; update any places that return
"sess, func() { sess.Close() }, nil" (the factory that currently returns sess
and the cleanup closure) to use the new acquire/release semantics so both cache
hits and misses follow the same ref-counted cleanup behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant