
Add pre/post lease hooks#140

Open
kirkbrauer wants to merge 47 commits into main from add-hooks

Conversation

@kirkbrauer
Member

@kirkbrauer kirkbrauer commented Jan 24, 2026

Summary

Implements hook scripts, configurable in the exporter config, that run j commands within a temporary lease environment. This PR combines changes across the Python, Controller, and Protocol components.

Environment Variables

The following environment variables are exposed to hook scripts:

  • JUMPSTARTER_HOST - Path to the Unix socket for j CLI access
  • JMP_DRIVERS_ALLOW=UNSAFE - Allows all drivers for local access
  • LEASE_NAME - The name of the current lease
  • CLIENT_NAME - Name of the client that acquired the lease
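A hook executor would typically merge these variables over the inherited process environment before spawning the script. A minimal sketch (the helper name and signature are assumptions for illustration, not the actual implementation):

```python
import os

def build_hook_env(socket_path: str, lease_name: str, client_name: str) -> dict:
    """Merge the hook-specific variables over the inherited environment."""
    env = dict(os.environ)
    env.update({
        "JUMPSTARTER_HOST": socket_path,  # Unix socket path for j CLI access
        "JMP_DRIVERS_ALLOW": "UNSAFE",    # allow all drivers for local access
        "LEASE_NAME": lease_name,
        "CLIENT_NAME": client_name,
    })
    return env
```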

Example Configuration

apiVersion: jumpstarter.dev/v1alpha1
kind: ExporterConfig
metadata:
  namespace: default
  name: test-exporter
endpoint: grpc.jumpstarter.100.123.60.47.nip.io:8082
hooks:
  beforeLease:
    script: |
      echo "Starting lease setup for ${CLIENT_NAME}"
      j power on
      echo "Pre-lease setup completed!"
    timeout: 120
    onFailure: exit
  afterLease:
    script: |
      echo "Starting lease cleanup for ${CLIENT_NAME}"
      j power off
      echo "Post-lease cleanup completed!"
    timeout: 60
    onFailure: warn
tls:
  ca: ''
  insecure: true
token: abc123
export:
  power:
    type: "jumpstarter_driver_power.driver.MockPower"
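The hooks section above maps naturally onto a pair of small config models. A dataclass sketch (field names and the 300-second default are assumptions; the real models are the pydantic classes HookConfigV1Alpha1 and HookInstanceConfigV1Alpha1):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HookInstance:
    script: str
    timeout: int = 300          # seconds; assumed default
    on_failure: str = "exit"    # "exit", "warn", or "endLease"

@dataclass
class Hooks:
    before_lease: Optional[HookInstance] = None
    after_lease: Optional[HookInstance] = None

def parse_hooks(raw: dict) -> Hooks:
    """Parse the hooks: mapping of an ExporterConfig document."""
    def one(d):
        if d is None:
            return None
        return HookInstance(
            script=d["script"],
            timeout=d.get("timeout", 300),
            on_failure=d.get("onFailure", "exit"),
        )
    return Hooks(before_lease=one(raw.get("beforeLease")),
                 after_lease=one(raw.get("afterLease")))
```

Parsing the hooks block from the example config above would yield a before-lease hook with a 120-second timeout and an after-lease hook that only warns on failure.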

Changes

Python (python/)

  • Hook executor with before/after lease lifecycle support
  • Configurable timeouts and failure modes (exit, warn)
  • Exporter status reporting and streaming
  • EndSession support for graceful session termination
  • Typed Protobuf/gRPC stubs generation
  • Status field exposed in CLI (jmp admin get exporter)
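The timeout and failure-mode behavior listed above can be sketched as a single synchronous helper (illustrative only; the real HookExecutor is async, builds the lease environment, and streams script output through the session logger):

```python
import os
import subprocess

def run_hook(script: str, timeout: int, on_failure: str = "exit") -> bool:
    """Run a hook script with a timeout and a configurable failure mode.

    Returns True on success. With on_failure="warn" the caller is expected
    to log a warning and continue; with "exit" a failure is raised.
    """
    try:
        result = subprocess.run(["sh", "-c", script],
                                env=os.environ.copy(),
                                timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False  # a timed-out hook counts as a failure
    if not ok and on_failure == "exit":
        raise RuntimeError("hook failed and onFailure is 'exit'")
    return ok
```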

Controller (controller/)

  • Full ExporterStatus field support with state transitions
  • Status updates reflected from exporters in real-time
  • EndSession API implementation
  • Exporter-initiated lease release support
  • CRDs managed via Helm chart templates

Protocol (protocol/)

  • EndSession API for keeping connections open after lease ends
  • release_lease field for exporter-initiated lease release
  • status_version and previous_status fields for transition tracking

Related PRs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added exporter status tracking and display with multiple operational states
    • Introduced lifecycle hooks (before/after lease) for custom script execution during lease operations
    • Added status monitoring for non-blocking lease readiness checks
    • Enhanced logging with source-based routing and filtering
  • Bug Fixes

    • Improved exporter connection handling with retry logic and status validation
  • Tests

    • Added comprehensive end-to-end tests for hook functionality and status management

@coderabbitai

Contributor

coderabbitai bot commented Jan 24, 2026

📝 Walkthrough

Walkthrough

This PR introduces comprehensive exporter lifecycle management through beforeLease/afterLease hooks, exporter status reporting and monitoring, log source tagging, and extended gRPC capabilities. It adds StatusMonitor for non-blocking status polling, HookExecutor for hook execution, LeaseContext for lease-scoped resources, ReportStatus endpoint, EndSession RPC, and corresponding Python/Go client updates across multiple packages.

Changes

Cohort / File(s) Summary
Protocol Definitions
protocol/proto/jumpstarter/v1/jumpstarter.proto
Adds ReportStatus RPC, EndSession RPC/messages, release_lease field to ReportStatusRequest, status_version and previous_status fields to GetStatusResponse, and optional source field to LogStreamResponse.
CRD Definitions
controller/deploy/helm/jumpstarter/crds/jumpstarter.dev_*.yaml, controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/*
Adds new CRDs (Client, ExporterAccessPolicy, Exporter, Lease) and updates Exporter CRD with exporterStatus/statusMessage fields, printer columns, and status schema.
Controller API
controller/api/v1alpha1/exporter_*.go
Adds ExporterStatusValue and StatusMessage fields to ExporterStatus struct, introduces status enum constants, adds kubebuilder markers for printer columns, and implements ToProtobuf mapping with stringToProtoStatus conversion.
Controller Service
controller/internal/controller/exporter_controller.go, controller/internal/service/controller_service.go
Implements ReportStatus endpoint handler, status-to-string conversion, Online condition synchronization, lease release handling, exporter status validation for driver calls, and integrates LastSeen/StatusMessage updates.
Exporter Runtime
python/packages/jumpstarter/jumpstarter/exporter/exporter.py, python/packages/jumpstarter/jumpstarter/exporter/session.py
Major restructuring: adds hook_executor field, LeaseContext support, _handle_lease method, _register/_unregister/_report_status helpers, session_for_lease context manager yielding dual sockets, serve coordination with status streaming, and lease release request handling.
Hook Infrastructure
python/packages/jumpstarter/jumpstarter/exporter/hooks.py, python/packages/jumpstarter/jumpstarter/exporter/lease_context.py, python/packages/jumpstarter/jumpstarter/exporter/logging_protocol.py
New modules implementing HookExecutor (hook configuration, environment setup, process execution, timeout/error handling), LeaseContext (lease-scoped state, status propagation, event signaling), and LoggerRegistration protocol for logger source management.
Logging Extensions
python/packages/jumpstarter/jumpstarter/exporter/logging.py
Extends LogHandler with source field and per-logger source mappings, adds context_log_source context manager, introduces get_logger helper integrating source tracking with session registration.
Status Monitoring
python/packages/jumpstarter/jumpstarter/client/status_monitor.py
New module implementing StatusMonitor with background polling, event-driven callbacks, wait-for patterns, adaptive polling (fast/slow/critical), connection-loss detection, and missed-transition tracking.
Client Core
python/packages/jumpstarter/jumpstarter/client/core.py
Adds ExporterNotReady exception, async status methods (get_status_async, wait_for_lease_ready, end_session_async, wait_for_hook_status), StatusMonitor integration, enhanced log_stream_async with source-based routing and filtering, and RichHandler with show_path=False.
Client gRPC
python/packages/jumpstarter/jumpstarter/client/grpc.py, python/packages/jumpstarter/jumpstarter/client/lease.py
Adds Exporter.status field and show_status option, dial_timeout field to Lease, exponential-backoff retry loop in handle_async with failure detection, and _wait_for_ready_connection helper for connection establishment.
Configuration
python/packages/jumpstarter/jumpstarter/config/exporter.py, python/packages/jumpstarter/jumpstarter/config/client.py
Adds HookInstanceConfigV1Alpha1 and HookConfigV1Alpha1 models, hooks field to ExporterConfig with HookExecutor wiring, dial_timeout field to ClientConfigV1Alpha1Lease, and include_status parameter to list_exporters.
Common Exports
python/packages/jumpstarter/jumpstarter/common/enums.py, python/packages/jumpstarter/jumpstarter/common/types.py, python/packages/jumpstarter/jumpstarter/common/__init__.py
New enum classes ExporterStatus and LogSource with from_proto/to_proto converters, TypeAlias definitions for AsyncChannel and service stubs, and expanded all exports.
Generated Protocol Stubs
python/packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/*/pb2*.py{,i}
Generated stubs for client v1 protocol, gRPC servicer interfaces, common enums, and router services with full type annotations and message definitions.
Driver & Test Updates
python/packages/jumpstarter/jumpstarter/driver/base.py, python/packages/jumpstarter/jumpstarter/exporter/session.py
Updates Driver logger initialization to use get_logger with LogSource.DRIVER, modifies Session initialization to map driver logs, adjusts LogStream polling interval, and implements _check_status_for_driver_call precondition.
Test Suites
controller/internal/service/controller_service_test.go, controller/internal/service/controller_service_integration_test.go, python/packages/jumpstarter/jumpstarter/exporter/hooks_test.py, python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py, python/packages/jumpstarter/jumpstarter/exporter/lease_context_test.py
Comprehensive tests for hook execution, status handling, lease release, online condition syncing, status monitoring, and hook failure modes.
E2E Tests & Configs
e2e/tests-hooks.bats, e2e/exporters/exporter-hooks-*.yaml
New hook-specific e2e test suite covering beforeLease/afterLease execution, failure modes (warn/endLease/exit), timeouts, environment variables, and test configurations for various hook scenarios.
Build & Deployment
controller/Makefile, controller/hack/deploy_with_helm.sh, controller/.gitignore, python/buf.gen.yaml
Adds awk-based sync-wave annotation injection for role, CRD template copying, --skip-crds flag for helm, mypy/mypy-grpc plugins for Python protobuf generation.
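The StatusMonitor wait-for pattern described above reduces to polling until a target status (or a timeout) is reached. A synchronous sketch under stated assumptions (the real monitor is async, event-driven, and adapts its polling interval between fast/slow/critical modes):

```python
import time

def wait_for_status(get_status, targets, timeout=300.0, interval=0.3):
    """Poll get_status() until it returns one of `targets` or the timeout
    elapses. Returns the matched status, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in targets:
            return status
        if time.monotonic() >= deadline:
            return None  # caller treats None as a timeout
        time.sleep(interval)
```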

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Exporter as Exporter<br/>(Main)
    participant HookExec as HookExecutor
    participant Driver
    participant ControllerSvc as ControllerService
    participant StatusMon as StatusMonitor

    Client->>Exporter: request_lease()
    Exporter->>ControllerSvc: ReportStatus(BEFORE_LEASE_HOOK)
    ControllerSvc->>StatusMon: (status updated)
    
    Exporter->>HookExec: run_before_lease_hook()
    HookExec->>Driver: instantiate device
    HookExec->>HookExec: execute_hook_process()
    HookExec-->>Exporter: (hook complete/failed)
    
    alt Hook Success
        Exporter->>Exporter: drivers ready event set
        Exporter->>ControllerSvc: ReportStatus(LEASE_READY)
        Exporter->>Client: (allow client driver calls)
        Client->>Driver: driver operations
    else Hook Failure (on_failure=endLease)
        Exporter->>Exporter: _request_lease_release()
        Exporter->>ControllerSvc: (lease release signaled)
    end
    
    Exporter->>HookExec: run_after_lease_hook()
    HookExec->>HookExec: execute_hook_process()
    HookExec-->>Exporter: (hook complete)
    
    Exporter->>ControllerSvc: ReportStatus(OFFLINE)
    ControllerSvc->>StatusMon: (status updated)
sequenceDiagram
    participant Client
    participant AsyncClient as AsyncDriverClient
    participant StatusMon as StatusMonitor
    participant ExporterSvc as ExporterService<br/>(gRPC)

    Client->>StatusMon: start()
    activate StatusMon
    StatusMon->>ExporterSvc: GetStatus() [background poll]
    ExporterSvc-->>StatusMon: (status response)
    StatusMon->>StatusMon: notify on_status_change callbacks
    
    Client->>AsyncClient: wait_for_lease_ready(timeout=300s)
    AsyncClient->>StatusMon: wait_for_status(LEASE_READY)
    
    loop Poll Loop (0.3s intervals)
        StatusMon->>ExporterSvc: GetStatus()
        ExporterSvc-->>StatusMon: BEFORE_LEASE_HOOK
        StatusMon->>StatusMon: signal any_change_event
        AsyncClient->>StatusMon: check current_status
        alt Status != LEASE_READY
            AsyncClient->>AsyncClient: continue waiting
        else Status == LEASE_READY
            AsyncClient-->>Client: ready
        end
    end
    
    Client->>AsyncClient: call_async(method, args)
    AsyncClient->>AsyncClient: check status (LEASE_READY)
    AsyncClient->>ExporterSvc: DriverCall()
    ExporterSvc-->>AsyncClient: result
    
    StatusMon->>StatusMon: stop polling
    deactivate StatusMon

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues

  • Support for Exporter lifecycle hooks #65: Directly implements exporter lifecycle hooks feature across exporter, session, hook executor, lease context, CLI, and e2e tests.
  • jumpstarter-dev/jumpstarter#522: Implements exporter lifecycle hooks (beforeLease/afterLease) with configuration support and runtime execution across config, exporter, CRDs, and tests.
  • [Proposal] Enhanced Lease Statuses for Hooks #104: Implements exporter status enums, status messaging, ReportStatus endpoint, LogStream source tagging, and hook-driven status transitions matching proposal specifications.
  • jumpstarter-dev/jumpstarter-protocol#25: Directly addresses protocol-level proposals for status enums, log-source tagging, status reporting, and hook execution monitoring.

Possibly related PRs

  • jumpstarter-dev/jumpstarter#357: Both modify exporter/exporter.py restructuring the controller listen/status loops and adding retry/resilience around interrupted connections.
  • jumpstarter-dev/jumpstarter#704: Both modify client/lease.py with connection establishment, retry logic, and lease-related configuration (dial_timeout).
  • jumpstarter-dev/jumpstarter#297: Both extend exporter-to-client logging surface with LogHandler source mappings, Session log queue integration, and client-side log_stream_async with source-based routing.

Suggested labels

operator, feature, gRPC, status-management, hooks, lifecycle-management

Poem

🐰 A Leap of Status and Hooks

Exporters now report their state,
With hooks that run before and late,
StatusMonitor watches with care,
While leases dance through the air—
From offline to ready, we celebrate! 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 41.83%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'Add pre/post lease hooks' directly and clearly summarizes the main change: implementing configurable beforeLease and afterLease lifecycle hooks for exporters.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
python/packages/jumpstarter/jumpstarter/config/client.py (1)

184-196: Add status field to exporter reconstruction to preserve it when include_leases=True.

The exporter objects lose their status field when leases are included. The original exporters have a status field that should be preserved during reconstruction, especially when include_status=True is requested. Update line 187-193 to include status=exporter.status.

Code snippet
exporter_with_lease = Exporter(
    namespace=exporter.namespace,
    name=exporter.name,
    labels=exporter.labels,
    online=exporter.online,
    status=exporter.status,  # Add this line
    lease=lease,
)
python/packages/jumpstarter/jumpstarter/config/exporter.py (1)

224-246: Remove unused device_factory parameter from HookExecutor.

The device_factory parameter passed to HookExecutor at line 231 is never used. HookExecutor only executes shell commands via subprocess and accesses the session for logging; it has no need for driver instantiation. Remove the unused parameter from both the HookExecutor dataclass definition and the instantiation call.

🤖 Fix all issues with AI agents
In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_clients.yaml`:
- Around line 20-47: Update the CRD description strings so they reference
"Client" consistently instead of "identities" or "Identity": change the top
description "Client is the Schema for the identities API" to a concise "Client
is the Schema for the Client API" (or simply "Client"), and update the
spec.description "ClientSpec defines the desired state of Identity" and
status.description "ClientStatus defines the observed state of Identity" to use
"Client" (e.g., "ClientSpec defines the desired state of Client" and
"ClientStatus defines the observed state of Client") so all descriptive fields
in the CRD (the description at the top and the
spec.description/status.description entries) no longer reference
Identity/identities.

In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_leases.yaml`:
- Around line 30-31: Update the CRD description for the Lease resource: the description property currently reads "Lease is the Schema for the exporters API", a copy-paste from the Exporter CRD; change it to "Lease is the Schema for the leases API" so the description references the correct resource. Locate the description property under the Lease CRD (kind: Lease) in the CRD template and change that string accordingly.

In `@python/packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py`:
- Line 19: basicConfig assignment line exceeds max line length; split the call
across multiple lines to satisfy Ruff E501. Locate the basicConfig variable
initialization (basicConfig = partial(logging.basicConfig, format="%(message)s
[%(name)s]", handlers=[RichHandler(show_path=False)]) ) and reformat the
arguments onto separate lines (e.g., place format=... and handlers=[...] each on
their own lines and close parens on its own line) so the call to
partial(logging.basicConfig, ...) fits within line-length limits while
preserving the same arguments and use of RichHandler(show_path=False).

In `@python/packages/jumpstarter-cli/jumpstarter_cli/shell.py`:
- Around line 80-89: Remove the unused synchronous helper _run_shell_with_lease:
delete the entire function definition (including its docstring and with-blocks)
since the async variant _run_shell_with_lease_async is now the only one used;
then search the repo for any remaining references to _run_shell_with_lease and
remove or update them, run the test suite/lint to confirm no import or name
errors, and ensure behavior relies solely on _run_shell_with_lease_async.

In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 395-401: The async context manager on StatusMonitor is incomplete:
__aenter__ currently just returns self and never starts the background polling;
update __aenter__ to initialize the task group and call self.start() (mirroring
client.status_monitor_async() behavior) so entering the context begins
monitoring, and keep __aexit__ awaiting self.stop(); alternatively remove both
methods if context-manager usage is unsupported—modify the methods on the
StatusMonitor class (__aenter__, __aexit__) to either perform full start/stop
lifecycle (create task group + await self.start() in __aenter__, await
self.stop() in __aexit__) or delete them to avoid silent failures.

In `@python/packages/jumpstarter/jumpstarter/common/enums.py`:
- Around line 26-27: The docstring for the enum value AFTER_LEASE_HOOK (mapped
to common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK) contains a typo; update the
string from "Lease was releaseed, but exporter is executing after lease hook" to
"Lease was released, but exporter is executing after lease hook" so the
documentation reads correctly.

In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 541-613: The Listen retry task started with
tg.start_soon(self._retry_stream, "Listen",
self._listen_stream_factory(lease_name), listen_tx) is attached to the outer
task group so it keeps running after the lease ends; move that start call into
the lease-scoped task group (use conn_tg.start_soon(...) before starting
wait_for_lease_end/process_connections) so the Listen task is cancelled with the
lease, and in the enclosing finally block ensure the send stream is closed
(await listen_tx.aclose() or the appropriate close method) to signal listen_rx
termination; reference symbols: _retry_stream,
_listen_stream_factory(lease_name), listen_tx/listen_rx, tg -> conn_tg, and
conn_tg.start_soon.
- Around line 202-321: Change _get_controller_stub to be an async context
manager (e.g., async def _controller_stub(self):) that creates the channel via
self.channel_factory(), yields a
jumpstarter_pb2_grpc.ControllerServiceStub(channel) and always closes the
underlying channel with await channel.close() in a finally block; then update
all callers to use "async with self._controller_stub() as controller:" instead
of awaiting _get_controller_stub(), specifically in _retry_stream (where
controller is acquired for stream_factory), _register_with_controller (Register
call), and _report_status (ReportStatus call) so channels are closed after use
and no resources leak.

In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 299-307: wait_for_process currently uses
anyio.to_thread.run_sync(process.wait, abandon_on_cancel=True) which can leave
the child as a zombie if the wait is abandoned; modify wait_for_process to
ensure the subprocess is reaped on cancellation by adding a try/finally: in the
try await the run_sync as before, but in the finally check process.poll() and if
it is None try to terminate/kill (process.terminate()/process.kill()) then call
a non-abandoning run_sync(process.wait) or loop-poll with process.poll() until
exit to fully reap the child; reference the wait_for_process coroutine and the
process variable when making the changes.
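The reap-on-interruption behavior being requested can be sketched synchronously, with a timeout standing in for cancellation (the real code wraps process.wait in anyio.to_thread.run_sync with abandon_on_cancel=True):

```python
import subprocess
import sys  # used in the usage example to spawn a demo child

def wait_and_reap(process, timeout=None):
    """Wait for the child; if the wait is interrupted (modeled here as a
    timeout), kill the child and do a blocking wait so it is fully reaped
    and never lingers as a zombie. Returns the exit code."""
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        process.kill()
        return process.wait()  # non-abandoning reap
```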
- Around line 257-260: The debug call inside read_pty_output exceeds the
120-char line limit; shorten the line by splitting the log message across
multiple strings or moving the format string to a short variable and then
calling logger.debug. For example, break the message into two adjacent string
literals or assign msg = "read_pty_output: heartbeat at %.1fs, iterations=%d"
and call logger.debug(msg, elapsed, read_count) so you keep the same
logger.debug call and the same format arguments (logger, elapsed, read_count)
but comply with the E501 line-length rule.
- Around line 186-189: The test mock for logging_session must support the
context manager used by logging_session.context_log_source in hooks.py; update
the test fixture (e.g., mock_session) so mock_session.context_log_source returns
an object that implements __enter__ and __exit__ (for async usage you can set
mock_session.context_log_source.return_value.__enter__ and __exit__ to AsyncMock
or wrap with contextlib.nullcontext), ensuring the context manager protocol is
honored when hooks.py calls logging_session.context_log_source.
🧹 Nitpick comments (10)
controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_exporteraccesspolicies.yaml (2)

149-152: Add schema validation for duration and priority.

Without constraints, malformed durations or negative priorities can be accepted and only fail at runtime. Consider adding format/bounds in the CRD.

♻️ Suggested schema tightening
                   maximumDuration:
                     type: string
+                    format: duration
                   priority:
                     type: integer
+                    minimum: 0

40-157: Require spec/policies to avoid empty CRs.

Right now an empty resource (missing spec or policies) is valid. If that’s not intended, enforce it in the schema to reduce no-op objects.

♻️ Suggested required fields
           spec:
             description: ExporterAccessPolicySpec defines the desired state of ExporterAccessPolicy.
             properties:
               exporterSelector:
                 description: |-
                   A label selector is a label query over a set of resources. The result of matchLabels and
                   matchExpressions are ANDed. An empty label selector matches all objects. A null
                   label selector matches no objects.
                 properties:
                   matchExpressions:
                     description: matchExpressions is a list of label selector requirements.
                       The requirements are ANDed.
                     items:
                       description: |-
                         A label selector requirement is a selector that contains values, a key, and an operator that
                         relates the key and values.
                       properties:
                         key:
                           description: key is the label key that the selector applies
                             to.
                           type: string
                         operator:
                           description: |-
                             operator represents a key's relationship to a set of values.
                             Valid operators are In, NotIn, Exists and DoesNotExist.
                           type: string
                         values:
                           description: |-
                             values is an array of string values. If the operator is In or NotIn,
                             the values array must be non-empty. If the operator is Exists or DoesNotExist,
                             the values array must be empty. This array is replaced during a strategic
                             merge patch.
                           items:
                             type: string
                           type: array
                           x-kubernetes-list-type: atomic
                       required:
                       - key
                       - operator
                       type: object
                     type: array
                     x-kubernetes-list-type: atomic
                   matchLabels:
                     additionalProperties:
                       type: string
                     description: |-
                       matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
                       map is equivalent to an element of matchExpressions, whose key field is "key", the
                       operator is "In", and the values array contains only "value". The requirements are ANDed.
                     type: object
                 type: object
                 x-kubernetes-map-type: atomic
               policies:
                 items:
                   properties:
                     from:
                       items:
                         properties:
                           clientSelector:
                             description: |-
                               A label selector is a label query over a set of resources. The result of matchLabels and
                               matchExpressions are ANDed. An empty label selector matches all objects. A null
                               label selector matches no objects.
                             properties:
                               matchExpressions:
                                 description: matchExpressions is a list of label selector
                                   requirements. The requirements are ANDed.
                                 items:
                                   description: |-
                                     A label selector requirement is a selector that contains values, a key, and an operator that
                                     relates the key and values.
                                   properties:
                                     key:
                                       description: key is the label key that the selector
                                         applies to.
                                       type: string
                                     operator:
                                       description: |-
                                         operator represents a key's relationship to a set of values.
                                         Valid operators are In, NotIn, Exists and DoesNotExist.
                                       type: string
                                     values:
                                       description: |-
                                         values is an array of string values. If the operator is In or NotIn,
                                         the values array must be non-empty. If the operator is Exists or DoesNotExist,
                                         the values array must be empty. This array is replaced during a strategic
                                         merge patch.
                                       items:
                                         type: string
                                       type: array
                                       x-kubernetes-list-type: atomic
                                   required:
                                   - key
                                   - operator
                                   type: object
                                 type: array
                                 x-kubernetes-list-type: atomic
                               matchLabels:
                                 additionalProperties:
                                   type: string
                                 description: |-
                                   matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels
                                   map is equivalent to an element of matchExpressions, whose key field is "key", the
                                   operator is "In", and the values array contains only "value". The requirements are ANDed.
                                 type: object
                             type: object
                             x-kubernetes-map-type: atomic
                         type: object
                       type: array
                     maximumDuration:
                       type: string
                     priority:
                       type: integer
                     spotAccess:
                       type: boolean
                   type: object
                 type: array
+                minItems: 1
             type: object
+            required:
+            - policies
           status:
             description: ExporterAccessPolicyStatus defines the observed state of
               ExporterAccessPolicy.
             type: object
         type: object
+        required:
+        - spec

Also applies to: 162-162

python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

157-161: Redundant timeout configuration.

Both move_on_after(300) and wait_for_any_of(..., timeout=300.0) specify 300 seconds. Since wait_for_any_of has its own timeout, the outer move_on_after is redundant. Consider either removing the outer timeout or differentiating them if different semantics are intended (e.g., outer as a hard cap slightly longer than inner).

♻️ Suggested simplification
-                                        with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-                                            result = await monitor.wait_for_any_of(
-                                                [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-                                                timeout=300.0
-                                            )
+                                        result = await monitor.wait_for_any_of(
+                                            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+                                            timeout=300.0
+                                        )
+                                        if result is None:
+                                            logger.warning("Timeout waiting for afterLease hook to complete")

Note: This would require adjusting the subsequent conditional logic (lines 162-172) to handle the timeout case inline rather than via timeout_scope.cancelled_caught.

controller/Makefile (1)

66-68: Make the CRD copy step more robust with mkdir.

If the templates directory is missing in a fresh checkout, cp will fail.

♻️ Suggested tweak
 	cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/
+	mkdir -p deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
 	cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
controller/internal/service/controller_service.go (1)

318-338: Consider logging unexpected status values in the default case.

The protoStatusToString function silently maps unknown proto status values to ExporterStatusUnspecified. While this provides a safe fallback, logging at debug/warning level would help diagnose protocol mismatches.

♻️ Optional: Add logging for unexpected status values
 func protoStatusToString(status pb.ExporterStatus) string {
 	switch status {
 	case pb.ExporterStatus_EXPORTER_STATUS_OFFLINE:
 		return jumpstarterdevv1alpha1.ExporterStatusOffline
 	// ... other cases ...
 	default:
+		// Log unexpected status values for debugging protocol mismatches
+		// Consider using a package-level logger if available
 		return jumpstarterdevv1alpha1.ExporterStatusUnspecified
 	}
 }
python/packages/jumpstarter/jumpstarter/common/enums.py (1)

38-45: Consider handling invalid protobuf values gracefully in from_proto.

The current implementation will raise ValueError if an unrecognized integer is passed to from_proto. For robustness against protocol version mismatches, consider returning UNSPECIFIED for unknown values.

♻️ Optional: Handle unknown values gracefully
    @classmethod
     def from_proto(cls, value: int) -> "ExporterStatus":
         """Convert from protobuf integer to enum."""
-        return cls(value)
+        try:
+            return cls(value)
+        except ValueError:
+            return cls.UNSPECIFIED
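A self-contained sketch of the pattern (illustrative enum values only; the real ExporterStatus mirrors the protobuf definition):

```python
from enum import IntEnum

class ExporterStatus(IntEnum):
    # Illustrative values only; the real enum mirrors the protobuf definition.
    UNSPECIFIED = 0
    AVAILABLE = 1
    LEASE_READY = 2

    @classmethod
    def from_proto(cls, value: int) -> "ExporterStatus":
        """Map a protobuf integer to the enum, tolerating unknown values."""
        try:
            return cls(value)
        except ValueError:
            # A newer protocol peer may send values this build does not know.
            return cls.UNSPECIFIED

print(ExporterStatus.from_proto(1).name)   # AVAILABLE
print(ExporterStatus.from_proto(99).name)  # UNSPECIFIED
```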
python/packages/jumpstarter/jumpstarter/exporter/logging.py (1)

31-39: Consider longest-prefix matching for hierarchical loggers.

The current prefix matching returns the first match found during iteration. With hierarchical logger names (e.g., driver.power, driver.power.usb), if both driver and driver.power are mapped, the match depends on iteration order. Consider sorting by key length descending to ensure the most specific (longest) prefix wins.

♻️ Suggested improvement for deterministic matching
     def get_source_for_record(self, record):
         """Determine the appropriate log source for a record."""
         with self._lock:
             # Check if this record comes from a logger with a specific source mapping
             logger_name = record.name
-            for mapped_logger, source in self._child_handlers.items():
+            # Sort by key length descending to match most specific prefix first
+            for mapped_logger, source in sorted(
+                self._child_handlers.items(), key=lambda x: len(x[0]), reverse=True
+            ):
                 if logger_name.startswith(mapped_logger):
                     return source
             return self.source
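The effect of the sort is easiest to see in isolation; this minimal sketch (hypothetical `get_source` helper and mapping) shows the most specific prefix winning regardless of dict iteration order:

```python
def get_source(logger_name: str, mapping: dict[str, str], default: str) -> str:
    """Return the source mapped to the most specific (longest) matching prefix."""
    for prefix, source in sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True):
        if logger_name.startswith(prefix):
            return source
    return default

mapping = {"driver": "DRIVER", "driver.power": "POWER"}
print(get_source("driver.power.usb", mapping, "SYSTEM"))  # POWER, not DRIVER
print(get_source("driver.serial", mapping, "SYSTEM"))     # DRIVER
print(get_source("cli", mapping, "SYSTEM"))               # SYSTEM
```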
python/packages/jumpstarter/jumpstarter/client/core.py (1)

440-510: Consider extracting helper functions to reduce complexity.

The static analysis correctly flags this method as too complex (C901: 14 > 10). The method handles multiple concerns: reconnection with backoff, log source routing, and filtering. While the current structure is readable, extracting helpers would improve maintainability.

♻️ Suggested structure for reduced complexity
def _route_log_to_logger(self, response, show_all_logs: bool) -> None:
    """Route a log response to the appropriate logger."""
    from jumpstarter.common import LogSource

    if response.HasField("source"):
        source = LogSource(response.source)
        is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
    else:
        source = LogSource.SYSTEM
        is_hook = False

    if not (is_hook or show_all_logs):
        return

    severity = response.severity if response.severity else "INFO"
    log_level = getattr(logging, severity, logging.INFO)

    logger_names = {
        LogSource.BEFORE_LEASE_HOOK: "exporter:beforeLease",
        LogSource.AFTER_LEASE_HOOK: "exporter:afterLease",
        LogSource.DRIVER: "exporter:driver",
    }
    logger_name = logger_names.get(source, "exporter:system")
    logging.getLogger(logger_name).log(log_level, response.message)

Then use this helper in the main loop to reduce nesting.

python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)

496-498: TODO: Controller-side lease ending not yet implemented.

The comment indicates that when on_failure='endLease', there's no mechanism to actually end the lease from the exporter side. This should be tracked.

Would you like me to open an issue to track the controller-side lease ending mechanism for beforeLease hook failures with on_failure='endLease'?

python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)

197-279: Consider extracting common verification logic.

Both wait_for_status and wait_for_any_of share substantial code for connection recovery verification. This duplication contributes to the complexity flagged by static analysis.

♻️ Suggested refactor to reduce duplication
async def _verify_connection_recovery(self) -> ExporterStatus | None:
    """Verify connection and return current status if recovered, None if still lost."""
    if not self._connection_lost:
        return self._current_status
    
    logger.debug("Connection was marked as lost, verifying...")
    try:
        response = await self._stub.GetStatus(jumpstarter_pb2.GetStatusRequest())
        logger.info("Connection recovered during verification poll")
        self._connection_lost = False
        new_status = ExporterStatus.from_proto(response.status)
        self._current_status = new_status
        self._status_version = response.status_version
        return new_status
    except AioRpcError as e:
        if e.code() == StatusCode.UNAVAILABLE:
            logger.debug("Connection still lost (UNAVAILABLE)")
        else:
            logger.debug("GetStatus error during verification: %s", e.code())
        return None
    except Exception as e:
        logger.debug("GetStatus error during verification: %s", e)
        return None

Then use this helper in both wait_for_status and wait_for_any_of.

Comment on lines +20 to +47
        description: Client is the Schema for the identities API
        properties:
          apiVersion:
            description: |-
              APIVersion defines the versioned schema of this representation of an object.
              Servers should convert recognized schemas to the latest internal value, and
              may reject unrecognized values.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
            type: string
          kind:
            description: |-
              Kind is a string value representing the REST resource this object represents.
              Servers may infer this from the endpoint the client submits requests to.
              Cannot be updated.
              In CamelCase.
              More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
            type: string
          metadata:
            type: object
          spec:
            description: ClientSpec defines the desired state of Identity
            properties:
              username:
                type: string
            type: object
          status:
            description: ClientStatus defines the observed state of Identity
            properties:
⚠️ Potential issue | 🟡 Minor

Fix CRD description wording to “Client” (minor).

The descriptions currently reference “identities/Identity,” which looks like a copy‑paste typo.

✏️ Suggested wording fix
-        description: Client is the Schema for the identities API
+        description: Client is the Schema for the clients API
@@
-            description: ClientSpec defines the desired state of Identity
+            description: ClientSpec defines the desired state of Client
@@
-            description: ClientStatus defines the observed state of Identity
+            description: ClientStatus defines the observed state of Client
🤖 Prompt for AI Agents
In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_clients.yaml`
around lines 20 - 47, update the CRD description strings so they consistently
reference "Client" instead of "identities"/"Identity": change the top-level
description to "Client is the Schema for the clients API", the spec description
to "ClientSpec defines the desired state of Client", and the status description
to "ClientStatus defines the observed state of Client", so no descriptive field
in the CRD still references Identity/identities.

Comment on lines +30 to +31
        description: Lease is the Schema for the exporters API
        properties:
⚠️ Potential issue | 🟡 Minor

Fix CRD description wording to “leases” (minor).

The description says “exporters API” for the Lease CRD.

✏️ Suggested wording fix
-        description: Lease is the Schema for the exporters API
+        description: Lease is the Schema for the leases API
🤖 Prompt for AI Agents
In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_leases.yaml`
around lines 30 - 31, locate the description property under the Lease CRD (kind:
Lease) in the CRD template and replace "Lease is the Schema for the exporters
API" with "Lease is the Schema for the leases API" so the description references
the correct API group.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In
`@python/packages/jumpstarter-driver-power/jumpstarter_driver_power/client_test.py`:
- Around line 8-20: The test_log_stream uses fixed time.sleep(1) which is flaky;
replace these sleeps with a short-polling loop that checks caplog.text until the
expected substring appears or a timeout expires. In test_log_stream (around
client.log_stream(), client.on(), client.off()) implement a small helper or
inline poll using time.monotonic() to loop with a tiny sleep (e.g., 0.01–0.1s)
and fail if timeout reached, then assert "power on" and "power off" only after
the poll succeeds; keep using caplog and the same logger scope so the
client.log_stream(), client.on(), and client.off() calls remain unchanged.

In `@python/packages/jumpstarter/jumpstarter/client/client_test.py`:
- Line 3: Remove the unused import OrderedDict from the imports in
client_test.py; locate the line "from collections import OrderedDict" and delete
it (or remove OrderedDict from the import list) so the lint warning is resolved
and no functionality is changed.

In `@python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py`:
- Line 3: The import list in status_monitor_test.py includes an unused symbol
AsyncMock; remove AsyncMock from the import statement (keep MagicMock) so the
module only imports symbols actually used in the tests, ensuring static analysis
no longer flags the unused AsyncMock.

In `@python/packages/jumpstarter/jumpstarter/common/enums_test.py`:
- Line 3: Remove the unused top-level import "import pytest" from enums_test.py;
locate the import statement at the top of the test module and delete it so the
file no longer contains the unused pytest symbol.
🧹 Nitpick comments (4)
python/packages/jumpstarter/jumpstarter/exporter/lease_context_test.py (1)

186-200: Avoid time-based sleep to reduce test flakiness.

Line 195 uses a fixed sleep, which can be flaky on slow CI. Consider a handshake event so the test deterministically waits for wait_for_drivers() to start before setting the hook.

♻️ Proposed deterministic rewrite
     async def test_wait_for_drivers_blocks_until_set(self) -> None:
         """Test that wait_for_drivers() blocks until hook event is set."""
         import anyio
 
         before_hook = Event()
         ctx = LeaseContext(lease_name="test-lease", before_lease_hook=before_hook)
 
-        # Set the event after a short delay
-        async def set_after_delay():
-            await anyio.sleep(0.05)
-            before_hook.set()
+        waiter_started = Event()
+
+        async def wait_and_signal():
+            waiter_started.set()
+            await ctx.wait_for_drivers()
 
         async with anyio.create_task_group() as tg:
-            tg.start_soon(set_after_delay)
-            await ctx.wait_for_drivers()
+            tg.start_soon(wait_and_signal)
+            await waiter_started.wait()
+            before_hook.set()
 
         assert ctx.drivers_ready()
python/packages/jumpstarter/jumpstarter/client/core_test.py (2)

16-32: Consider extracting shared test utilities.

MockAioRpcError and create_mock_rpc_error are duplicated from status_monitor_test.py. Consider extracting these to a shared test utilities module (e.g., jumpstarter/client/test_utils.py or a conftest.py).


90-116: Move import to module level.

The anyio import inside wait_for_hook_status should be at the module level for consistency and clarity.

Proposed fix

Add to imports at top of file:

import anyio

Then update the method:

     async def wait_for_hook_status(self, target_status: ExporterStatus, timeout: float = 60.0) -> bool:
         """Wait for exporter to reach a target status using polling."""
-        import anyio
-
         poll_interval = 0.1  # Fast for testing
python/packages/jumpstarter/jumpstarter/client/client_test.py (1)

121-124: Consider adding assertion to fully verify topological order.

The comment states children should be built before parents, but the assertion only verifies grandparent is built after parent. Adding an assertion that child is built before parent would fully validate the topological ordering.

Suggested improvement
         # Children should be built before parents in topological order
-        # The order should be child, parent, grandparent (or parent, child, grandparent)
+        # The order should be child, parent, grandparent
         assert "grandparent" in build_order
         assert build_order.index("grandparent") > build_order.index("parent")
+        assert build_order.index("parent") > build_order.index("child")

Comment on lines +8 to +20
 def test_log_stream(caplog):
     """Test that driver logs are properly streamed to the client."""
     with serve(MockPower()) as client:
-        log = MagicMock()
-        monkeypatch.setattr(client, "_AsyncDriverClient__log", log)
-        with client.log_stream():
-            client.on()
-            time.sleep(1)  # to ensure log is flushed
-            log.assert_called_with(logging.INFO, "power on")
+        # Set log level to capture INFO messages from exporter:driver logger
+        with caplog.at_level(logging.INFO, logger="exporter:driver"):
+            with client.log_stream():
+                client.on()
+                time.sleep(1)  # to ensure log is flushed
+                assert "power on" in caplog.text

-            client.off()
-            time.sleep(1)
-            log.assert_called_with(logging.INFO, "power off")
+                client.off()
+                time.sleep(1)
+                assert "power off" in caplog.text

⚠️ Potential issue | 🟡 Minor

Avoid fixed sleeps; wait for logs with a timeout.

The 1s sleeps can be flaky (slow/fast CI) and unnecessarily slow locally. Consider polling caplog.text with a short timeout to make the test reliable and quicker.

🔧 Suggested fix
 def test_log_stream(caplog):
     """Test that driver logs are properly streamed to the client."""
     with serve(MockPower()) as client:
         # Set log level to capture INFO messages from exporter:driver logger
         with caplog.at_level(logging.INFO, logger="exporter:driver"):
             with client.log_stream():
                 client.on()
-                time.sleep(1)  # to ensure log is flushed
-                assert "power on" in caplog.text
+                for _ in range(50):
+                    if "power on" in caplog.text:
+                        break
+                    time.sleep(0.1)
+                assert "power on" in caplog.text
 
                 client.off()
-                time.sleep(1)
-                assert "power off" in caplog.text
+                for _ in range(50):
+                    if "power off" in caplog.text:
+                        break
+                    time.sleep(0.1)
+                assert "power off" in caplog.text

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@controller/internal/service/suite_test.go`:
- Around line 56-58: The test harness sets up testEnv as an envtest.Environment
with CRDDirectoryPaths and ErrorIfCRDPathMissing: true, which will abort the
suite if the CRD path is wrong; verify and update the CRDDirectoryPaths value to
the actual CRD directory (or add multiple fallback paths) so envtest can find
the CRDs, e.g., check and correct the path used in the testEnv initialization
(the envtest.Environment struct, the CRDDirectoryPaths slice, and the
ErrorIfCRDPathMissing flag) or set ErrorIfCRDPathMissing to false while you
confirm the correct location.
- Around line 29-31: The test currently ignores errors when creating a
Namespace; change the Namespace creation logic (the call that creates a
corev1.Namespace object, e.g., client.CoreV1().Namespaces().Create or similar in
the test setup) to explicitly handle errors: import
k8s.io/apimachinery/pkg/api/errors as apierrors, then after the Create call
check if err != nil and if apierrors.IsAlreadyExists(err) continue/ignore,
otherwise fail the test (t.Fatalf or return the error). Apply the same pattern
for the other namespace-creation site referenced in the comment (the second
block around lines 81-87) so only AlreadyExists is tolerated and all other
errors cause the test to fail.
- Around line 19-24: Update the test setup in suite_test.go to avoid hard-coding
BinaryAssetsDirectory: first respect the KUBEBUILDER_ASSETS environment variable
if set (use it to set BinaryAssetsDirectory), then fall back to existing helper
GetFirstFoundEnvTestBinaryDir() or a directory-existence check for the current
hard-coded path (bin/k8s/1.30.0-<os>-<arch>), and only assign the hard-coded
path as a last resort; modify the code that sets BinaryAssetsDirectory to
perform conditional checks (os.Getenv("KUBEBUILDER_ASSETS"), os.Stat on
candidate dirs, and GetFirstFoundEnvTestBinaryDir()) so tests run correctly
outside the Makefile and when the version/path differs.

In `@python/packages/jumpstarter/jumpstarter/client/client_test.py`:
- Around line 120-124: The test's topological-order check only ensures
"grandparent" comes after "parent" but doesn't verify that "child" precedes
"parent"; update the assertions to explicitly assert that
build_order.index("child") < build_order.index("parent") (and keep/optionally
assert both "child" and "parent" are before "grandparent") so the test enforces
children-before-parents ordering; reference the existing build_order list and
the node names "child", "parent", and "grandparent" when adding the
assertion(s).

In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 31-38: The docstring usage example in status_monitor.py
incorrectly calls client.start_status_monitor(); update the example to call the
actual method client.status_monitor_async() (or add a thin alias
start_status_monitor -> status_monitor_async in the client if you prefer) so the
docstring matches the real API; ensure the rest of the example still references
monitor.current_status and monitor.wait_for_status(ExporterStatus.LEASE_READY).

In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 426-500: The before-lease failure branch doesn't actually end the
lease for on_failure='endLease'; modify run_before_lease_hook to accept an async
lease-release callback (e.g., end_lease: Callable[[LeaseContext],
Awaitable[None]]) or reuse an existing controller-release method, and in the
HookExecutionError else-branch (the on_failure='endLease' path) invoke await
end_lease(lease_scope) (and still set lease_scope.before_lease_hook
event/unblock connections and report status) so the lease is cleanly terminated;
update all call sites of run_before_lease_hook to pass the new end_lease
callback (mirroring how run_after_lease_hook is wired).
♻️ Duplicate comments (2)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

80-90: Unused synchronous function.

This function is no longer called anywhere in the codebase since line 213 now uses _run_shell_with_lease_async. Consider removing it to reduce maintenance burden.

python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)

397-403: Async context manager won't start monitoring.

The __aenter__ method returns self without calling start(). Direct usage as an async context manager (async with StatusMonitor(...) as monitor:) will silently fail to start background polling. Users must use client.status_monitor_async() instead. Either fix __aenter__ to properly start monitoring or remove the context manager methods to prevent misuse.

🧹 Nitpick comments (9)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (2)

157-161: Redundant double timeout.

Both anyio.move_on_after(300) and timeout=300.0 in wait_for_any_of set 5-minute timeouts. The inner timeout parameter makes the outer cancel scope redundant. Consider removing one:

♻️ Proposed fix
-                                        with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-                                            result = await monitor.wait_for_any_of(
-                                                [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-                                                timeout=300.0
-                                            )
+                                        result = await monitor.wait_for_any_of(
+                                            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+                                            timeout=300.0
+                                        )
+                                        if result is None:
+                                            logger.warning("Timeout waiting for afterLease hook to complete")

Then adjust the subsequent conditional logic accordingly, removing the timeout_scope.cancelled_caught check at line 171.


133-136: Consider checking connection_lost for beforeLease hook.

The afterLease handling (line 166) checks monitor.connection_lost to distinguish between timeout and connection loss. The beforeLease handling treats both cases as "Timeout" which could be misleading in logs.

♻️ Proposed enhancement
                             if result == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                                 logger.warning("beforeLease hook failed")
                             elif result is None:
-                                logger.warning("Timeout waiting for beforeLease hook")
+                                if monitor.connection_lost:
+                                    logger.warning("Connection lost during beforeLease hook")
+                                else:
+                                    logger.warning("Timeout waiting for beforeLease hook")
python/packages/jumpstarter/jumpstarter/client/client_test.py (1)

58-58: Ensure ExitStack is closed after each test.

ExitStack() is never closed, which can leak cleanup callbacks if client_from_channel registers any. Prefer with ExitStack() as mock_stack: to guarantee teardown.

♻️ Example pattern (apply to each test)
-        mock_stack = ExitStack()
+        with ExitStack() as mock_stack:
             with patch(
                 "jumpstarter.client.client.MultipathExporterStub",
                 return_value=mock_stub
             ), patch(
                 "jumpstarter.client.client.import_class",
                 return_value=MockDriverClient
             ):
                 from jumpstarter.client.client import client_from_channel

                 client = await client_from_channel(
                     mock_channel, mock_portal, mock_stack, allow=[], unsafe=True
                 )

Also applies to: 98-98, 143-143, 177-177, 211-211, 245-245
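For background on why the with-statement matters here: ExitStack runs its registered callbacks only when it is closed, and entering it via with guarantees that. A minimal stdlib demonstration:

```python
from contextlib import ExitStack

closed = []
with ExitStack() as stack:
    # callback() registers teardown; the with-statement guarantees it runs
    # when the block exits, even if the body raises.
    stack.callback(closed.append, "cleanup ran")

print(closed)  # ['cleanup ran']
```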

python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py (1)

16-32: Consider extracting shared mock utilities to a common test module.

MockAioRpcError and create_mock_rpc_error are duplicated here and in core_test.py (per the relevant snippets). Extracting these to a shared test utilities module (e.g., conftest.py or a dedicated testing.py) would reduce duplication.

python/packages/jumpstarter/jumpstarter/client/core.py (1)

332-351: Asymmetric behavior when monitor is not started.

wait_for_lease_ready_monitored raises RuntimeError if the monitor isn't started, while wait_for_hook_complete_monitored silently returns True. This asymmetry could be surprising. Consider documenting this behavior difference more explicitly or using consistent error handling.

python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)

309-309: Use %-style formatting in logger calls instead of f-strings.

F-strings in logging calls evaluate the string even when the log level is disabled, causing unnecessary overhead. Use %-style formatting for deferred evaluation.

Proposed fix
-                    logger.warning(f"Missed {missed} status transition(s)")
+                    logger.warning("Missed %d status transition(s)", missed)
-                    logger.info(f"Status changed: {old_status} -> {new_status} (version={new_version})")
+                    logger.info("Status changed: %s -> %s (version=%d)", old_status, new_status, new_version)
-                            logger.error(f"Status change callback error: {e}")
+                            logger.error("Status change callback error: %s", e)
-                logger.debug(f"GetStatus poll error: {e.code()}")
+                logger.debug("GetStatus poll error: %s", e.code())
-                logger.debug(f"GetStatus poll error: {e}")
+                logger.debug("GetStatus poll error: %s", e)

Also applies to: 320-320, 337-337, 352-354
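The overhead difference is observable with an argument whose string conversion has a side effect; this small stdlib sketch shows %-style deferring the conversion while the f-string pays for it even on a suppressed record:

```python
import logging

class Expensive:
    """Counts how many times it is rendered to a string."""
    def __init__(self) -> None:
        self.rendered = 0
    def __str__(self) -> str:
        self.rendered += 1
        return "payload"

logging.basicConfig(level=logging.WARNING)  # DEBUG records are suppressed
log = logging.getLogger("demo")
arg = Expensive()

log.debug("value: %s", arg)   # %-style: suppressed before formatting, __str__ never runs
print(arg.rendered)           # 0

log.debug(f"value: {arg}")    # f-string: built before the level check, __str__ already ran
print(arg.rendered)           # 1
```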

controller/internal/service/suite_test.go (1)

90-93: Guard teardown if setup failed.

If testEnv never initialized, Stop() can panic in AfterSuite. A simple nil check avoids cascading failures.

🛡️ Suggested guard
 	By("tearing down the test environment")
-	err := testEnv.Stop()
-	Expect(err).NotTo(HaveOccurred())
+	if testEnv != nil {
+		err := testEnv.Stop()
+		Expect(err).NotTo(HaveOccurred())
+	}
controller/Makefile (1)

66-75: Guard against duplicate metadata.annotations blocks.

Line 66–71 unconditionally inserts a new annotations: map. If the generated role ever includes annotations (now or later), this can create duplicate keys and potentially drop the sync-wave or other annotations depending on YAML parsing behavior. Consider merging into an existing annotations map or using a YAML-aware tool for the insert.

python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)

341-356: Use anyio.to_thread.run_sync() with timeout for blocking process.wait() after process.kill().

The direct process.wait() call at line 354 blocks the event loop if the process doesn't exit promptly. The same block already demonstrates the correct pattern at line 346–347 with anyio.to_thread.run_sync() and anyio.move_on_after(). Apply the same pattern here for consistency and robustness:

Safer non‑blocking wait
                 if process.poll() is None:
                     process.kill()
                     try:
-                        process.wait()
+                        with anyio.move_on_after(5):
+                            await anyio.to_thread.run_sync(process.wait)
                     except Exception:
                         pass

Comment on lines +19 to +24
import (
"context"
"fmt"
"path/filepath"
"runtime"
"testing"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the suite_test.go file to see the current implementation
find . -name "suite_test.go" -path "*/controller/internal/service/*"

Repository: jumpstarter-dev/jumpstarter

Length of output: 114


🏁 Script executed:

# Search for the file more broadly
find . -name "suite_test.go" | grep -E "(controller|service)"

Repository: jumpstarter-dev/jumpstarter

Length of output: 321


🏁 Script executed:

# List files in the controller directory
git ls-files | grep -E "controller.*service.*test" | head -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 234


🏁 Script executed:

cat -n ./controller/internal/service/suite_test.go

Repository: jumpstarter-dev/jumpstarter

Length of output: 3574


🌐 Web query:

controller-runtime envtest BinaryAssetsDirectory KUBEBUILDER_ASSETS best practices

💡 Result:

Best practices (controller-runtime envtest) for BinaryAssetsDirectory / KUBEBUILDER_ASSETS

  1. Prefer driving asset selection from your build/test harness (Makefile/CI) via KUBEBUILDER_ASSETS, not hardcoding paths in Go.
    Kubebuilder’s recommended pattern is to have setup-envtest pick the right binaries and export the directory for the test run, e.g. KUBEBUILDER_ASSETS="$(setup-envtest use $(ENVTEST_K8S_VERSION) ... -p path)" go test ./.... This keeps tests reproducible across dev machines and CI. [2]

  2. Use BinaryAssetsDirectory only when you explicitly want to control/share the on-disk cache location from Go code.
    Controller-runtime provides SetupEnvtestDefaultBinaryAssetsDirectory() specifically so you can point Environment.BinaryAssetsDirectory at the same OS-specific directory that setup-envtest uses—so runs can share binaries—without hardcoding OS paths. [1]

  3. Avoid relying on the historical default /usr/local/kubebuilder/bin.
    Envtest loads binaries from /usr/local/kubebuilder/bin by default, but best practice is to override via KUBEBUILDER_ASSETS (or explicit TEST_ASSET_* vars) for portability and hermetic CI. [1]

  4. For offline/air-gapped or “no network during tests”, use installed-only mode.
Use setup-envtest ... -i (or ENVTEST_INSTALLED_ONLY=true) so it won't try to download binaries during go test. If you already manage binaries elsewhere, --use-env -i (or ENVTEST_USE_ENV=true) will respect an externally provided KUBEBUILDER_ASSETS. [2][4]

  5. Ensure you’re using the current envtest artifact index/location (don’t depend on the old GCS bucket).
The old storage.googleapis.com/kubebuilder-tools artifacts are deprecated; newer versions are promoted via controller-tools' envtest-releases.yaml, and you need setup-envtest from controller-runtime v0.19.0+ to transparently use the new location. [3]


🏁 Script executed:

# Check if other test suites in the repo use conditional BinaryAssetsDirectory
rg "BinaryAssetsDirectory" -A 3 -B 1 controller/

Repository: jumpstarter-dev/jumpstarter

Length of output: 3875


🏁 Script executed:

# Check if KUBEBUILDER_ASSETS is used anywhere in the codebase
rg "KUBEBUILDER_ASSETS" .

Repository: jumpstarter-dev/jumpstarter

Length of output: 613


🏁 Script executed:

# Examine the conditional approach used in other test files
cat controller/deploy/operator/internal/controller/testutils/envtest.go | head -50

Repository: jumpstarter-dev/jumpstarter

Length of output: 1885


Use a conditional check for BinaryAssetsDirectory instead of hard-coding a fixed path.

The current code unconditionally sets BinaryAssetsDirectory to bin/k8s/1.30.0-<os>-<arch>, which will fail when:

  • Running tests via IDE or direct go test invocation (outside the Makefile that sets KUBEBUILDER_ASSETS)
  • Binaries haven't been downloaded to that specific path
  • The Kubernetes version is upgraded

Other test suites in the codebase already use conditional logic (e.g., controller/deploy/operator/ uses GetFirstFoundEnvTestBinaryDir()). Consider making this check conditional on directory existence, or respecting the KUBEBUILDER_ASSETS environment variable as the primary source.

🔧 Suggested change (conditional BinaryAssetsDirectory)
 import (
 	"context"
 	"fmt"
+	"os"
 	"path/filepath"
 	"runtime"
 	"testing"
@@
 	By("bootstrapping test environment")
 	testEnv = &envtest.Environment{
 		CRDDirectoryPaths:     []string{filepath.Join("..", "..", "deploy", "helm", "jumpstarter", "crds")},
 		ErrorIfCRDPathMissing: true,
-
-		// The BinaryAssetsDirectory is only required if you want to run the tests directly
-		// without calling the makefile target test. If not informed it will look for the
-		// default path defined in controller-runtime which is /usr/local/kubebuilder/.
-		// Note that you must have the required binaries setup under the bin directory to perform
-		// the tests directly. When we run make test it will be setup and used automatically.
-		BinaryAssetsDirectory: filepath.Join("..", "..", "bin", "k8s",
-			fmt.Sprintf("1.30.0-%s-%s", runtime.GOOS, runtime.GOARCH)),
 	}
+
+	// Only pin envtest binaries if the expected directory exists
+	binPath := filepath.Join("..", "..", "bin", "k8s",
+		fmt.Sprintf("1.30.0-%s-%s", runtime.GOOS, runtime.GOARCH))
+	if _, err := os.Stat(binPath); err == nil {
+		testEnv.BinaryAssetsDirectory = binPath
+	}
🤖 Prompt for AI Agents
In `@controller/internal/service/suite_test.go` around lines 19 - 24, Update the
test setup in suite_test.go to avoid hard-coding BinaryAssetsDirectory: first
respect the KUBEBUILDER_ASSETS environment variable if set (use it to set
BinaryAssetsDirectory), then fall back to existing helper
GetFirstFoundEnvTestBinaryDir() or a directory-existence check for the current
hard-coded path (bin/k8s/1.30.0-<os>-<arch>), and only assign the hard-coded
path as a last resort; modify the code that sets BinaryAssetsDirectory to
perform conditional checks (os.Getenv("KUBEBUILDER_ASSETS"), os.Stat on
candidate dirs, and GetFirstFoundEnvTestBinaryDir()) so tests run correctly
outside the Makefile and when the version/path differs.

Comment on lines +29 to +31
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes/scheme"

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find the suite_test.go file and read it
find . -name "suite_test.go" -path "*/controller/internal/service/*" | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 114


🏁 Script executed:

# Read the suite_test.go file to see the code at the mentioned lines
cat -n ./controller/internal/service/suite_test.go | head -100

Repository: jumpstarter-dev/jumpstarter

Length of output: 3574


🏁 Script executed:

# Verify apierrors package exists and IsAlreadyExists is the correct function
web_search "k8s.io/apimachinery/pkg/api/errors IsAlreadyExists golang"

Repository: jumpstarter-dev/jumpstarter

Length of output: 121


🌐 Web query:

k8s.io/apimachinery pkg api errors IsAlreadyExists usage golang kubernetes

💡 Result:

In Go, k8s.io/apimachinery/pkg/api/errors.IsAlreadyExists(err) is the standard way to check whether a Kubernetes API call failed because the object you tried to create already exists. It returns true for “already exists” API StatusErrors (and also supports wrapped errors). [1]

Typical usage (client-go)

import (
  apierrors "k8s.io/apimachinery/pkg/api/errors"
  metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
  corev1 "k8s.io/api/core/v1"
)

cm := &corev1.ConfigMap{
  ObjectMeta: metav1.ObjectMeta{
    Name: "demo",
    Namespace: "default",
  },
  Data: map[string]string{"k": "v"},
}

_, err := clientset.CoreV1().ConfigMaps("default").Create(ctx, cm, metav1.CreateOptions{})
if err != nil {
  if apierrors.IsAlreadyExists(err) {
    // idempotent create: ignore or fetch/update instead
    return nil
  }
  return err
}
return nil

Notes

  • Use it after Create(...) (or any call that can return an “already exists” response). [1]
  • The function signature is func IsAlreadyExists(err error) bool. [1]

Sources:
[1] pkg.go.dev: k8s.io/apimachinery/pkg/api/errors (IsAlreadyExists) (pkg.go.dev)


Don't ignore namespace creation errors.

Ignoring the error can mask real failures (RBAC, invalid config, etc.). Handle AlreadyExists explicitly and fail on other errors.

✅ Suggested fix
 import (
 	corev1 "k8s.io/api/core/v1"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/client-go/kubernetes/scheme"
 	ns := &corev1.Namespace{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: "default",
 		},
 	}
-	_ = k8sClient.Create(ctx, ns)
+	if err := k8sClient.Create(ctx, ns); err != nil && !apierrors.IsAlreadyExists(err) {
+		Expect(err).NotTo(HaveOccurred())
+	}

Also applies to: 81-87

🤖 Prompt for AI Agents
In `@controller/internal/service/suite_test.go` around lines 29 - 31, The test
currently ignores errors when creating a Namespace; change the Namespace
creation logic (the call that creates a corev1.Namespace object, e.g.,
client.CoreV1().Namespaces().Create or similar in the test setup) to explicitly
handle errors: import k8s.io/apimachinery/pkg/api/errors as apierrors, then
after the Create call check if err != nil and if apierrors.IsAlreadyExists(err)
continue/ignore, otherwise fail the test (t.Fatalf or return the error). Apply
the same pattern for the other namespace-creation site referenced in the comment
(the second block around lines 81-87) so only AlreadyExists is tolerated and all
other errors cause the test to fail.

Comment on lines +120 to +124
# Children should be built before parents in topological order
# The order should be child, parent, grandparent (or parent, child, grandparent)
assert "grandparent" in build_order
assert build_order.index("grandparent") > build_order.index("parent")


⚠️ Potential issue | 🟡 Minor

Strengthen the topological-order assertion.

The test claims “children before parents” but only checks that grandparent is after parent. It never asserts child is before parent, and the comment even allows the opposite order.

✅ Suggested assertion update
-        # The order should be child, parent, grandparent (or parent, child, grandparent)
-        assert "grandparent" in build_order
-        assert build_order.index("grandparent") > build_order.index("parent")
+        # The order should be child -> parent -> grandparent
+        assert "grandparent" in build_order
+        assert build_order.index("parent") > build_order.index("child")
+        assert build_order.index("grandparent") > build_order.index("parent")
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/client_test.py` around lines
120 - 124, The test's topological-order check only ensures "grandparent" comes
after "parent" but doesn't verify that "child" precedes "parent"; update the
assertions to explicitly assert that build_order.index("child") <
build_order.index("parent") (and keep/optionally assert both "child" and
"parent" are before "grandparent") so the test enforces children-before-parents
ordering; reference the existing build_order list and the node names "child",
"parent", and "grandparent" when adding the assertion(s).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@controller/Makefile`:
- Line 74: The Makefile currently runs the cp command copying CRDs to
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/, which
will fail if that target directory doesn't exist; update the Makefile to create
the directory first by running a mkdir -p for
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
immediately before the cp command so the cp in the Makefile succeeds even when
the templates/crds directory is missing.
🧹 Nitpick comments (1)
controller/Makefile (1)

66-71: Idempotency risk: duplicate annotations: keys if file already contains annotations.

The AWK script unconditionally inserts an annotations: block after the name: line. If controller-gen ever outputs annotations in the generated role.yaml, or if the manifests target runs without a fresh regeneration, this will produce invalid YAML with duplicate annotations: keys.

Consider using a more robust approach that checks for existing annotations before inserting:

♻️ Suggested safer AWK approach
 	# Add ArgoCD sync-wave annotation to RBAC role for proper deployment ordering (PR `#207`)
-	`@awk` '/^  name: jumpstarter-manager-role$$/{print; print "  annotations:"; print "    argocd.argoproj.io/sync-wave: \"-1\""; next}1' \
+	`@awk` ' \
+		/^  name: jumpstarter-manager-role$$/ { print; found_name=1; next } \
+		found_name && /^  annotations:/ { print; found_annotations=1; next } \
+		found_name && found_annotations && /^  [a-zA-Z]/ && !/^    / { \
+			print "    argocd.argoproj.io/sync-wave: \"-1\""; found_name=0; found_annotations=0 \
+		} \
+		found_name && !found_annotations && /^  [a-zA-Z]/ { \
+			print "  annotations:"; print "    argocd.argoproj.io/sync-wave: \"-1\""; found_name=0 \
+		} \
+		{ print }' \
 		deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml > \
 		deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml.tmp && \
 		mv deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml.tmp \
 		deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml

Alternatively, consider using a tool like yq for safer YAML manipulation.

Also, the comment references "PR #207" but this appears to be PR #140.

deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml

cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/
cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the target directory exists in the repository
find deploy/helm/jumpstarter/charts/jumpstarter-controller/templates -type d -name "crds" 2>/dev/null

# Check if it's tracked in git at all
git ls-tree -r HEAD deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds 2>/dev/null || echo "No files found in git"

# List what's in templates directory
echo "=== Contents of templates directory ==="
ls -la deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/ 2>/dev/null || echo "Directory does not exist"

# Check the Makefile around line 74 to see context
echo "=== Makefile context around line 74 ==="
sed -n '70,80p' controller/Makefile

Repository: jumpstarter-dev/jumpstarter

Length of output: 764


🏁 Script executed:

#!/bin/bash
# Check if deploy/helm/jumpstarter/charts directory exists
echo "=== Checking helm chart directory structure ==="
ls -la deploy/helm/jumpstarter/charts/ 2>/dev/null || echo "charts directory does not exist"

# Search Makefile for any mkdir commands
echo "=== Searching for mkdir in Makefile ==="
grep -n "mkdir" controller/Makefile || echo "No mkdir found"

# Check if there's a target that creates the templates directory
echo "=== Search for directory creation patterns ==="
grep -n "templates" controller/Makefile

# Check the generate-crds or similar target
echo "=== Full Makefile targets related to CRD or helm ==="
grep -n "^[a-zA-Z].*:" controller/Makefile | grep -E "(crd|helm|generate)"

Repository: jumpstarter-dev/jumpstarter

Length of output: 1423


🏁 Script executed:

#!/bin/bash
# Get the full target definition that contains line 74
echo "=== Full target containing the cp command ==="
sed -n '60,80p' controller/Makefile

# Check if there's an earlier step that might create the directory
echo "=== Check for any manifests or controller-gen target ==="
sed -n '1,100p' controller/Makefile | grep -A 20 "^manifests:"

# Look for the controller-gen command
echo "=== Look for controller-gen configuration ==="
grep -B 5 -A 10 "controller-gen" controller/Makefile | head -40

Repository: jumpstarter-dev/jumpstarter

Length of output: 4959


🏁 Script executed:

#!/bin/bash
# Check if there are any .gitkeep or placeholder files in the charts/templates directories
echo "=== Check for placeholder files ==="
find deploy/helm/jumpstarter/charts -type f 2>/dev/null | head -20

# Check the full directory structure of deploy/helm
echo "=== Full helm structure ==="
find deploy/helm -type d 2>/dev/null | sort

# Check if crds directory exists at all
echo "=== Search for crds directories ==="
find deploy -type d -name "crds" 2>/dev/null

Repository: jumpstarter-dev/jumpstarter

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Check what exists in deploy directory
echo "=== Full deploy directory structure ==="
find deploy -type d 2>/dev/null | head -30

# Check if operator directory structure exists
echo "=== Check operator crd/bases directory ==="
ls -la deploy/operator/config/crd/bases/ 2>/dev/null || echo "Directory does not exist"

# Check if there are any crds in the source
echo "=== Check source crds ==="
ls -la deploy/helm/jumpstarter/crds/ 2>/dev/null || echo "CRDs source directory does not exist"

Repository: jumpstarter-dev/jumpstarter

Length of output: 241


🏁 Script executed:

#!/bin/bash
# List everything in deploy directory
echo "=== All files in deploy/ recursively ==="
find deploy -type f 2>/dev/null | head -50

# Check if deploy directory itself exists
echo "=== Deploy directory contents ==="
ls -la deploy/ 2>/dev/null

# Count total files in entire repo
echo "=== Repository file count and structure ==="
find . -type f -not -path './.git/*' | wc -l
echo "Total directories:"
find . -type d -not -path './.git/*' | wc -l

Repository: jumpstarter-dev/jumpstarter

Length of output: 216


🏁 Script executed:

#!/bin/bash
# Check git status of deploy directory
echo "=== Check if deploy is tracked in git ==="
git ls-tree -d HEAD deploy 2>/dev/null || echo "deploy directory not in git"

# List top-level directories
echo "=== Top-level directories ==="
ls -la | grep '^d'

# Check if there's a .gitkeep or similar in deploy
echo "=== Hidden files in deploy ==="
ls -la deploy/ 2>/dev/null

# Check recent commits to understand if deploy is supposed to exist
echo "=== Check if deploy paths were ever added ==="
git log --all --full-history --oneline -- "deploy/helm/*" "deploy/operator/*" 2>/dev/null | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 676


🏁 Script executed:

#!/bin/bash
# Check if deploy directory exists at all
echo "=== Does deploy directory exist? ==="
test -d deploy && echo "Yes" || echo "No"

# List actual path with stat
echo "=== Using stat on deploy ==="
stat deploy 2>&1 || echo "deploy does not exist"

# Check the controller Makefile more carefully for context
echo "=== Check lines around the change (60-80) ==="
sed -n '60,80p' controller/Makefile

# Check if there's documentation about running manifests
echo "=== Check README or setup docs ==="
ls -la *.md controller/*.md 2>/dev/null | head -10

Repository: jumpstarter-dev/jumpstarter

Length of output: 1817


🏁 Script executed:

#!/bin/bash
# Check what controller-gen does - look at how it's invoked and if it creates directories
echo "=== Check controller version and documentation ==="
grep -r "controller-gen" controller/ | grep -v ".o\|Binary" | head -5

# Check if there's documentation about directory creation
echo "=== Check if there are any docs about the build process ==="
cat README.md | head -50

# Check controller/README.md
echo "=== Check controller README ==="
cat controller/README.md

Repository: jumpstarter-dev/jumpstarter

Length of output: 5972


Add directory creation before copying CRDs to the Helm chart templates.

The cp command on line 74 will fail because deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/ doesn't exist. While controller-gen creates the output directories it writes to, it doesn't create the templates/crds/ subdirectory. Add mkdir -p before the copy.

♻️ Suggested fix
 	cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/
+	mkdir -p deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
 	cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
🤖 Prompt for AI Agents
In `@controller/Makefile` at line 74, The Makefile currently runs the cp command
copying CRDs to
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/, which
will fail if that target directory doesn't exist; update the Makefile to create
the directory first by running a mkdir -p for
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
immediately before the cp command so the cp in the Makefile succeeds even when
the templates/crds directory is missing.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/shell.py`:
- Around line 125-143: The code currently launches the shell unconditionally
after monitor.wait_for_any_of; change it to treat
ExporterStatus.BEFORE_LEASE_HOOK_FAILED and a None (timeout) as hard stops by
not calling _run_shell_only. After the wait_for_any_of call
(monitor.wait_for_any_of), check result and if it equals
ExporterStatus.BEFORE_LEASE_HOOK_FAILED or is None, log the warning and exit the
function (or raise an appropriate exception) instead of proceeding to call
anyio.to_thread.run_sync(_run_shell_only, lease, config, command, path); only
call _run_shell_only when result == ExporterStatus.LEASE_READY.
♻️ Duplicate comments (5)
python/packages/jumpstarter/jumpstarter/exporter/exporter.py (4)

202-204: Controller channels are not closed after use.

This creates a new gRPC channel per call but never closes it. Multiple calls during status updates and stream retries will leak channels. The pattern in _unregister_with_controller() (lines 357-369) demonstrates proper cleanup with explicit await channel.close().


414-458: Task may hang when lease ends without EndSession.

This task only waits on end_session_requested (line 426). If the lease ends without the client calling EndSession, the task continues waiting indefinitely on the old lease context. Consider racing against lease_ended or using a task group scoped to the lease lifetime.


546-553: Listen task should be scoped to the lease lifetime.

The _retry_stream task for Listen is started on the outer tg (line 548), causing it to continue running after handle_lease exits. This results in the task persisting across lease transitions. Move this to conn_tg so it's automatically cancelled when the lease ends, and close listen_tx in the finally block to signal proper stream termination.


578-579: Consider scoping _handle_end_session to the lease task group.

Starting this task on the outer tg means it survives the lease ending. If combined with the fact that it only waits on end_session_requested, the task can be orphaned. This is related to the hang issue noted in the _handle_end_session method.

python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

80-89: Consider removing _run_shell_with_lease if it’s now unused.

The async path appears to be the only entry point, so this sync helper likely adds maintenance overhead.

#!/bin/bash
# Verify remaining usages of the sync helper
rg -nP '\b_run_shell_with_lease\s*\(' --type py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 8-18: The import block is unsorted per Ruff I001; reorder imports
into standard-library first (logging, collections.abc.Awaitable,
collections.abc.Callable, contextlib.AsyncExitStack), then third-party packages
(anyio, anyio.Event, anyio.create_task_group, grpc.StatusCode,
grpc.aio.AioRpcError, jumpstarter_protocol.jumpstarter_pb2), and finally
local/project imports (jumpstarter.common.ExporterStatus) so the import order
satisfies the linter and keeps the existing imported symbols intact.
- Around line 196-201: The timeout checks in wait_for_status() and
wait_for_any_of() incorrectly use "if timeout:" which treats 0 as falsy; change
both checks to "if timeout is not None:" so a timeout of 0 is handled as an
immediate timeout; update the branches around anyio.move_on_after in
wait_for_status() and wait_for_any_of() (the blocks that call wait_loop() and
return False) to use the new condition.
- Around line 396-400: stop() currently sets _running = False and signals
_stop_event but does not wake waiters blocked on _any_change_event; update
stop() to also set _any_change_event (and optionally _any_change_event.set()
followed by _any_change_event.clear() if you want a one-shot wake) so waiters in
wait_for_status and wait_for_any_of that are awaiting _any_change_event.wait()
will wake and observe _running == False and exit their loops; reference the stop
method and the _any_change_event, wait_for_status, and wait_for_any_of functions
when applying this change.

In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 168-191: The stop() method currently ignores the caller's
should_unregister when wait_for_lease_exit is True, causing later calls (e.g.,
in serve()) to override intent; update stop() so that whenever a deferred stop
is requested (the branch setting self._stop_requested = True) you also set
self._unregister = should_unregister (and likewise when a stop has already been
requested you should update/preserve self._unregister accordingly) so the
original caller preference is retained; refer to the stop() method, the
attributes self._unregister, self._stop_requested, and self._tg.cancel_scope to
implement this change.
🧹 Nitpick comments (1)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

147-162: Remove redundant timeout wrapper and improve timeout detection logic.

The StatusMonitor.connection_lost property is valid and well-tested. However, the outer anyio.move_on_after(30) is redundant—wait_for_any_of() already has an identical timeout=30.0 parameter. When both are equal, the inner timeout fires first, making the outer scope's cancelled_caught flag never True. This means timeout cases currently fall through to "Hook completion not confirmed" rather than the timeout warning.

The suggested refactor properly addresses this by removing the outer timeout wrapper and using result is None to detect timeouts, which disambiguates between connection loss and timeout.

♻️ Suggested simplification
-                                        with anyio.move_on_after(30) as timeout_scope:  # 30 second timeout
-                                            result = await monitor.wait_for_any_of(
-                                                [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-                                                timeout=30.0
-                                            )
-                                            if result == ExporterStatus.AVAILABLE:
-                                                logger.info("afterLease hook completed")
-                                            elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
-                                                logger.warning("afterLease hook failed")
-                                            elif monitor.connection_lost:
-                                                # Connection lost - exporter closed, hook likely completed
-                                                logger.info("Exporter connection closed (hook completed)")
-                                            else:
-                                                logger.debug("Hook completion not confirmed")
-                                        if timeout_scope.cancelled_caught:
-                                            logger.warning("Timeout waiting for afterLease hook to complete")
+                                        result = await monitor.wait_for_any_of(
+                                            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+                                            timeout=30.0
+                                        )
+                                        if result == ExporterStatus.AVAILABLE:
+                                            logger.info("afterLease hook completed")
+                                        elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
+                                            logger.warning("afterLease hook failed")
+                                        elif result is None and monitor.connection_lost:
+                                            # Connection lost - exporter closed, hook likely completed
+                                            logger.info("Exporter connection closed (hook completed)")
+                                        elif result is None:
+                                            logger.warning("Timeout waiting for afterLease hook to complete")
+                                        else:
+                                            logger.debug("Hook completion not confirmed")

Comment on lines +196 to +204
if timeout:
with anyio.move_on_after(timeout):
return await wait_loop()
return False
else:
return await wait_loop()

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /tmp && find . -name "status_monitor.py" -type f 2>/dev/null | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 53


🏁 Script executed:

# First, let's find the file and examine the relevant lines
find . -name "status_monitor.py" -type f 2>/dev/null

Repository: jumpstarter-dev/jumpstarter

Length of output: 137


🏁 Script executed:

# If found, read the file to see context around the mentioned lines
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py 2>/dev/null | sed -n '190,210p'



🏁 Script executed:

# Also check the second location (lines 280-285)
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py 2>/dev/null | sed -n '275,295p'



🏁 Script executed:

# Search for function calls to wait_for_status and wait_for_any_of
rg "wait_for_status\(|wait_for_any_of\(" --type=py -A 3 -B 1



🏁 Script executed:

# Search for timeout=0 patterns
rg "timeout\s*=\s*0" --type=py -B 2 -A 2



🏁 Script executed:

# Check if there's any documentation or usage of anyio.move_on_after in the codebase
rg "anyio.move_on_after|move_on_after" --type=py -B 1 -A 1



🏁 Script executed:

# Check the function definitions to understand the full context
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py | sed -n '155,210p'



🏁 Script executed:

# Look at the second function to confirm it's identical pattern
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py | sed -n '245,290p'



Use if timeout is not None: to correctly handle timeout=0.

The condition if timeout: treats 0 as falsy, so a request for an immediate timeout (timeout=0) instead waits indefinitely. This affects both the wait_for_status() and wait_for_any_of() methods, because the timeout parameter accepts float | None, where 0 is a valid value distinct from None.

Affected locations
  • Lines 196-201 in wait_for_status()
  • Lines 280-285 in wait_for_any_of()
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py` around
lines 196 - 201, The timeout checks in wait_for_status() and wait_for_any_of()
incorrectly use "if timeout:" which treats 0 as falsy; change both checks to "if
timeout is not None:" so a timeout of 0 is handled as an immediate timeout;
update the branches around anyio.move_on_after in wait_for_status() and
wait_for_any_of() (the blocks that call wait_loop() and return False) to use the
new condition.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/lease.py`:
- Around line 237-257: The fixed-count retry loop around self.controller.Dial
(using max_retries/base_delay/max_delay) should be replaced with a time‑based
retry bounded by a configurable timeout (use the existing acquisition_timeout if
available or add a new dial_timeout attribute/parameter) so Dial keeps retrying
until time.monotonic() exceeds the deadline; compute deadline = now + timeout,
then loop attempting await self.controller.Dial(...), on AioRpcError with
FAILED_PRECONDITION and "not ready" apply the same exponential/backoff delay
(min(base_delay*2**attempt, max_delay)) but stop retrying and re-raise once
monotonic() > deadline; update the retry log to show remaining time or attempts
left and ensure the new timeout is plumbed through the Lease acquisition
call/site that calls this Dial logic.
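A deadline-based retry of the kind this prompt describes might look like the following sketch. The names (operation, retry_with_deadline) and the use of ConnectionError are stand-ins for the actual Dial call and gRPC error type, which are assumptions here:

```python
import time

def retry_with_deadline(operation, timeout=30.0, base_delay=0.5, max_delay=5.0):
    """Time-bounded retry with exponential backoff.

    Illustrative sketch only: `operation` stands in for the Dial call.
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Stop retrying once the next backoff sleep would overshoot
            # the deadline, and re-raise the last error to the caller.
            if time.monotonic() + delay > deadline:
                raise
            time.sleep(delay)
            attempt += 1
```

Unlike a fixed max_retries count, the total wall-clock time spent retrying is bounded regardless of how the individual delays accumulate.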

In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 343-347: In _poll_loop (status_monitor.py) the AioRpcError
UNIMPLEMENTED branch currently breaks without clearing _running or notifying
waiters; update that branch to mirror the UNAVAILABLE handling by setting
self._running = False and calling self._any_change_event.set() (and any other
cleanup done in the UNAVAILABLE branch) before breaking so wait_for_status() and
wait_for_any_of() don't hang; also add a unit test that triggers UNIMPLEMENTED
while a waiter (using wait_for_status() or wait_for_any_of()) is active and
assert the waiter returns promptly.
♻️ Duplicate comments (3)
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (3)

8-18: Reorder imports to satisfy Ruff (I001).

Current ordering may still fail Ruff I001.

🔧 Proposed import order
-import logging
-from collections.abc import Awaitable, Callable
-from contextlib import AsyncExitStack
+from collections.abc import Awaitable, Callable
+from contextlib import AsyncExitStack
+import logging

Please verify with:

#!/bin/bash
ruff check python/packages/jumpstarter/jumpstarter/client/status_monitor.py

Based on learnings, Ruff is the formatter/linter for Python files.


196-201: Handle timeout=0 explicitly.

if timeout: treats 0 as falsy, so an immediate timeout becomes an infinite wait (Line 196 and Line 280).

🐛 Proposed fix
-        if timeout:
+        if timeout is not None:
             with anyio.move_on_after(timeout):
                 return await wait_loop()
             return False
-        if timeout:
+        if timeout is not None:
             with anyio.move_on_after(timeout):
                 return await wait_for_first()
             return None

Optional verification:

#!/bin/bash
rg -n "wait_for_status|wait_for_any_of|timeout\s*=\s*0" --type=py

Also applies to: 280-285


396-400: Wake waiting coroutines on stop().

stop() doesn’t signal _any_change_event, so wait_for_* can hang after shutdown.

🛠️ Proposed fix
     async def stop(self):
         """Stop the background polling task."""
         self._running = False
         self._stop_event.set()
+        # Wake any waiters so they can observe _running=False
+        self._any_change_event.set()
+        self._any_change_event = Event()

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@e2e/dex.values.yaml`:
- Around line 37-40: The userID value for the test user "test-exporter-hooks" is
not a valid UUID because it contains the non-hex character 'g' (value:
c6ed6f40-6689-6g7e-c64c-55d0a9a5871f); replace this with a valid UUID (only 0-9
and a-f), e.g., generate a new v4 UUID and set userID to that new value so the
entry for username "test-exporter-hooks" and its userID field are valid.
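The invalid value is easy to confirm: Python's uuid module rejects any non-hex digit, which is exactly why the 'g' in the third group fails, and a freshly generated v4 UUID is a safe replacement:

```python
import uuid

# The reported userID contains 'g', which is not a hex digit.
bad = "c6ed6f40-6689-6g7e-c64c-55d0a9a5871f"
try:
    uuid.UUID(bad)
    valid = True
except ValueError:
    valid = False
assert not valid  # confirmed invalid

# Any freshly generated v4 UUID parses cleanly and is a valid replacement.
replacement = str(uuid.uuid4())
assert uuid.UUID(replacement)
```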
🧹 Nitpick comments (5)
e2e/tests-hooks.bats (5)

9-10: Unused variable declaration.

CURRENT_HOOKS_CONFIG is declared but never referenced anywhere in the file. Consider removing it to avoid confusion.

Suggested fix
-# Track which config is currently active
-CURRENT_HOOKS_CONFIG=""

51-54: Consider graceful termination before SIGKILL.

Using kill -9 immediately doesn't give the process a chance to clean up. Consider sending SIGTERM first with a brief grace period before escalating to SIGKILL.

Suggested improvement
         if ps -p "$pid" > /dev/null 2>&1; then
           echo "  Killing PID $pid" >&2
-          kill -9 "$pid" 2>/dev/null || true
+          kill -TERM "$pid" 2>/dev/null || true
+          sleep 0.5
+          ps -p "$pid" > /dev/null 2>&1 && kill -9 "$pid" 2>/dev/null || true
         fi

104-110: Fixed sleep may introduce flakiness.

The 2-second sleep on line 107 is intended to avoid catching pre-connect state, but this is timing-dependent. If the exporter connects faster or slower than expected, tests could become flaky. Consider polling for a specific pre-condition instead, or documenting why 2 seconds is sufficient.


205-208: Add safeguards for empty grep results.

If the grep commands fail to find matches (e.g., due to unexpected output format), before_pos or after_pos could be empty, causing the comparison on line 208 to fail with a shell error. Consider adding validation or using || true with a fallback.

Suggested improvement
   # Verify order: BEFORE should appear before AFTER in output
   local before_pos=$(echo "$output" | grep -n "BEFORE_HOOK:" | head -1 | cut -d: -f1)
   local after_pos=$(echo "$output" | grep -n "AFTER_HOOK:" | head -1 | cut -d: -f1)

+  # Ensure both markers were found
+  [ -n "$before_pos" ] && [ -n "$after_pos" ]
   [ "$before_pos" -lt "$after_pos" ]

315-333: Tests lack verification of actual j command effects.

Tests E1 and E2 only verify that the hook scripts complete (via output markers), but don't verify that j power on/off commands actually executed or had the expected effect. Consider adding assertions that verify the power state changed, or at minimum checking for command output in the hook logs.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 358-365: The UNAVAILABLE branch breaks the poll loop but doesn't
update the monitor's running state, so set self._running = False when handling
StatusCode.UNAVAILABLE; specifically, inside the elif e.code() ==
StatusCode.UNAVAILABLE block update self._running to False alongside setting
self._connection_lost, trigger and recreate self._any_change_event as shown, and
ensure start() can restart the monitor after this exit.
♻️ Duplicate comments (2)
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (2)

199-204: Handle timeout=0 explicitly.

if timeout: treats 0 as falsy, so timeout=0 waits indefinitely instead of timing out immediately.

🛠️ Proposed fix
-        if timeout:
+        if timeout is not None:
             with anyio.move_on_after(timeout):
                 return await wait_loop()
             return False
-        if timeout:
+        if timeout is not None:
             with anyio.move_on_after(timeout):
                 return await wait_for_first()
             return None

Also applies to: 283-286


416-420: Wake waiters on stop to avoid hangs.

stop() only signals _stop_event, but waiters block on _any_change_event, so they can hang indefinitely.

🛠️ Proposed fix
     async def stop(self):
         """Stop the background polling task."""
         self._running = False
         self._stop_event.set()
+        # Wake waiters so they can observe _running=False
+        self._any_change_event.set()
+        self._any_change_event = Event()

Comment on lines +358 to +365
elif e.code() == StatusCode.UNAVAILABLE:
# Connection lost - exporter closed or restarted
logger.info("Connection lost (UNAVAILABLE), signaling waiters")
self._connection_lost = True
# Fire the change event to wake up any waiters
self._any_change_event.set()
self._any_change_event = Event()
break

⚠️ Potential issue | 🟡 Minor

Mark monitor as not running on UNAVAILABLE exit.

The poll loop breaks on UNAVAILABLE but leaves _running=True, preventing restart via start() and misrepresenting state.

🛠️ Proposed fix
                 elif e.code() == StatusCode.UNAVAILABLE:
                     # Connection lost - exporter closed or restarted
                     logger.info("Connection lost (UNAVAILABLE), signaling waiters")
                     self._connection_lost = True
+                    self._running = False
                     # Fire the change event to wake up any waiters
                     self._any_change_event.set()
                     self._any_change_event = Event()
                     break
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py` around
lines 358 - 365, The UNAVAILABLE branch breaks the poll loop but doesn't update
the monitor's running state, so set self._running = False when handling
StatusCode.UNAVAILABLE; specifically, inside the elif e.code() ==
StatusCode.UNAVAILABLE block update self._running to False alongside setting
self._connection_lost, trigger and recreate self._any_change_event as shown, and
ensure start() can restart the monitor after this exit.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/packages/jumpstarter/jumpstarter/config/client.py (1)

162-177: Thread include_status through to the ListExporters request.

The include_status parameter is only used to set a display flag on the response (line 176), but it's never passed to ClientService.ListExporters. If the server requires this flag in the request to include status data, status information won't be fetched. Update ClientService.ListExporters to accept and propagate include_status to the protobuf request.

Suggested update
-exporters_response = await svc.ListExporters(page_size=page_size, page_token=page_token, filter=filter)
+exporters_response = await svc.ListExporters(
+    page_size=page_size,
+    page_token=page_token,
+    filter=filter,
+    include_status=include_status,
+)

- Add release_lease field to ReportStatusRequest to allow exporters to
  signal the controller to release the active lease after hooks complete
- Add status_version and previous_status fields to GetStatusResponse for
  tracking status transitions

The make manifests target regenerates RBAC role.yaml using controller-gen,
which strips manually-added annotations. This adds a post-processing step
using awk to re-add the ArgoCD sync-wave annotation that was added in PR #207
to ensure proper deployment ordering.

Updates the operator install.yaml to include the new ExporterStatus and
StatusMessage fields added to the Exporter CRD, along with additional
printer columns for better kubectl output.

This ensures the operator's install.yaml stays in sync with CRD changes
by automatically running the operator's build-installer target as part
of the manifests target. This prevents drift between the controller CRDs
and the operator bundle.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@controller/internal/service/controller_service_integration_test.go`:
- Around line 76-86: The Lease objects in the tests (the Lease struct created
and passed to k8sClient.Create in controller_service_integration_test.go) are
missing the required Selector field in LeaseSpec which causes API validation to
fail; update the Lease initializations (the variable lease used with
k8sClient.Create and its LeaseSpec) to include Selector: metav1.LabelSelector{}
or an appropriate label selector (e.g., Selector:
&metav1.LabelSelector{MatchLabels: map[string]string{"app":"test"}}) so the
CRD-required selector is present when Create(ctx, lease) is called.

In `@e2e/tests.bats`:
- Around line 190-196: Two occurrences of the yq load expression still reference
the old path load("e2e/exporter.yaml"); update those commands that match the
pattern go run ... -i ". * load(\"e2e/exporter.yaml\")" to use
load("e2e/exporters/exporter.yaml") instead so all yq invocations point to the
moved exporter.yaml file (ensure the same replacement is applied for both
remaining occurrences).

In `@python/packages/jumpstarter-cli/jumpstarter_cli/run.py`:
- Around line 79-94: The parent currently interprets any non-zero child exit as
a signal and adds 128, which double-encodes real exit codes; change the protocol
so the child (exporter) encodes signals as 128 + signal number and the parent in
run.py should stop adding 128 — i.e., use exporter_exit_code directly when set
(replace any logic that converts non-zero exporter_exit_code into 128+value),
and keep received_signal only for locally-caught signals; update handling around
exporter_exit_code and received_signal (and cancellation via
signal_tg.cancel_scope.cancel()) so the parent returns exporter_exit_code as-is
when present.
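The convention the prompt asks for — child encodes signal deaths as 128 + signum, parent passes the code through unchanged — can be sketched as follows. The function names are illustrative, not the project's actual API:

```python
import signal

def encode_child_exit(returncode):
    """Map a subprocess returncode to a shell-style exit code.

    By POSIX convention, a negative returncode from the subprocess module
    means the child died from signal -returncode; encode it as 128 + signum.
    """
    if returncode < 0:
        return 128 + (-returncode)
    return returncode

def parent_exit_code(exporter_exit_code, received_signal=None):
    # Parent side: trust the child's already-encoded code as-is; only
    # encode signals the parent itself caught. Adding 128 again here is
    # exactly the double-encoding bug the review describes.
    if exporter_exit_code is not None:
        return exporter_exit_code
    if received_signal is not None:
        return 128 + received_signal
    return 0

assert encode_child_exit(-signal.SIGTERM) == 128 + signal.SIGTERM
assert parent_exit_code(128 + signal.SIGTERM) == 128 + signal.SIGTERM  # no double-encoding
assert parent_exit_code(None, signal.SIGINT) == 128 + signal.SIGINT
```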

In `@python/packages/jumpstarter/jumpstarter/client/core.py`:
- Around line 458-485: Guard creation of LogSource from response.source with a
try/except to handle unknown enum values: when evaluating source =
LogSource(response.source) (and computing is_hook), catch ValueError and set
source to LogSource.SYSTEM (and is_hook = False) so the stream continues instead
of raising; keep the existing routing logic that maps source to logger_name and
logging through source_logger.log(log_level, response.message). Also ensure
severity fallback (severity = response.severity if response.severity else
"INFO") remains unchanged.
- Around line 127-187: The wait_for_lease_ready method currently logs a warning
on timeout but returns silently so callers cannot detect failure; modify
wait_for_lease_ready to raise DriverError (or return False if you prefer an API
change) when the timeout elapses instead of only logging: replace the final
self.logger.warning(...) with raising DriverError(f"Timeout waiting for
beforeLease hook to complete after {timeout:.1f}s and {poll_count} polls")
(ensure DriverError is imported/available) so callers of wait_for_lease_ready
can programmatically handle timeouts.

In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 168-191: The stop method currently ignores the caller's
should_unregister value when deferring stop; update stop(self, ...) to persist
should_unregister into an instance field (e.g. set self._unregister =
should_unregister) whenever wait_for_lease_exit is true and you set
self._stop_requested = True so the original preference is preserved, and ensure
the lease-exit path that later invokes stop uses that stored self._unregister
rather than unconditionally passing True; update references to _unregister and
_stop_requested so deferred stops respect the saved preference.
🧹 Nitpick comments (11)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

147-162: Redundant timeout pattern may cause confusion.

The code uses both anyio.move_on_after(30) and timeout=30.0 in wait_for_any_of. Since both timeouts are identical, the inner wait_for_any_of will return None on timeout before move_on_after can trigger, meaning timeout_scope.cancelled_caught at line 161 will likely never be True.

Consider removing one timeout mechanism:

♻️ Suggested simplification
-                                        with anyio.move_on_after(30) as timeout_scope:  # 30 second timeout
-                                            result = await monitor.wait_for_any_of(
-                                                [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-                                                timeout=30.0
-                                            )
-                                            if result == ExporterStatus.AVAILABLE:
-                                                logger.info("afterLease hook completed")
-                                            elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
-                                                logger.warning("afterLease hook failed")
-                                            elif monitor.connection_lost:
-                                                # Connection lost - exporter closed, hook likely completed
-                                                logger.info("Exporter connection closed (hook completed)")
-                                            else:
-                                                logger.debug("Hook completion not confirmed")
-                                        if timeout_scope.cancelled_caught:
-                                            logger.warning("Timeout waiting for afterLease hook to complete")
+                                        result = await monitor.wait_for_any_of(
+                                            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+                                            timeout=30.0
+                                        )
+                                        if result == ExporterStatus.AVAILABLE:
+                                            logger.info("afterLease hook completed")
+                                        elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
+                                            logger.warning("afterLease hook failed")
+                                        elif monitor.connection_lost:
+                                            logger.info("Exporter connection closed (hook completed)")
+                                        else:
+                                            logger.warning("Timeout waiting for afterLease hook to complete")
e2e/tests-hooks.bats (2)

75-76: Consider caching yq installation for faster test execution.

Using go run github.com/mikefarah/yq/v4@latest fetches from the network on each invocation. For repeated test runs, this adds latency. Consider installing yq once in the test environment setup or using a pre-installed binary.


78-82: Clarify the 3>&- redirection purpose.

The 3>&- closes file descriptor 3 before running the background loop. While this works, a brief comment explaining why this is needed (likely to prevent fd leaks from bats internals) would help maintainability.

python/packages/jumpstarter/jumpstarter/client/core_test.py (1)

16-53: Optional: consolidate gRPC error/status helpers.

These helpers look similar to those in status_monitor_test.py; a shared test util could reduce duplication.

python/packages/jumpstarter/jumpstarter/client/core.py (1)

105-126: Consider allowing ExporterStatus.UNSPECIFIED for backward compatibility.

If GetStatus returns the default/unspecified value, Line 124 currently raises ExporterNotReady, which can block driver calls for exporters that haven’t adopted explicit status reporting yet. Adding UNSPECIFIED keeps behavior more permissive without impacting the ready path.

♻️ Suggested tweak
-        ALLOWED_STATUSES = {
-            ExporterStatus.LEASE_READY,
-            ExporterStatus.BEFORE_LEASE_HOOK,
-            ExporterStatus.AFTER_LEASE_HOOK,
-        }
+        ALLOWED_STATUSES = {
+            ExporterStatus.LEASE_READY,
+            ExporterStatus.BEFORE_LEASE_HOOK,
+            ExporterStatus.AFTER_LEASE_HOOK,
+            ExporterStatus.UNSPECIFIED,
+        }
python/packages/jumpstarter/jumpstarter/common/types.py (1)

16-18: Consider removing the empty TYPE_CHECKING block.

The TYPE_CHECKING guard and its comment suggest async stub type aliases were planned but the block is empty. This is dead code that could be removed for clarity, or the intended type aliases should be added.

🧹 Proposed cleanup
-# Async stub type aliases are only available for type checking (defined in .pyi files)
-if TYPE_CHECKING:
-    pass
-

Also update the import:

-from typing import TYPE_CHECKING, TypeAlias
+from typing import TypeAlias
python/packages/jumpstarter/jumpstarter/exporter/lease_context.py (1)

54-57: Consider using explicit validation instead of assert statements.

assert statements can be disabled when Python runs with optimization flags (-O), which would skip these critical validations in production. Consider raising ValueError explicitly.

♻️ Proposed fix
     def __post_init__(self):
         """Validate that required resources are present."""
-        assert self.before_lease_hook is not None, "LeaseScope requires a before_lease_hook event"
-        assert self.lease_name, "LeaseScope requires a non-empty lease_name"
+        if self.before_lease_hook is None:
+            raise ValueError("LeaseContext requires a before_lease_hook event")
+        if not self.lease_name:
+            raise ValueError("LeaseContext requires a non-empty lease_name")
python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)

621-629: Consider making the 1-second delay configurable or documenting its purpose more prominently.

The delay at Line 624 allows clients to poll and see the AVAILABLE status before the lease is released. While the comment explains this, users might wonder why there's a hardcoded delay. Consider extracting this to a constant or making it configurable.

python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py (1)

426-522: Good coverage of wait_for_any_of behavior.

The tests cover the key scenarios for multi-target waiting. The parallel structure to wait_for_status tests ensures consistent behavior.

Consider adding a test for DEADLINE_EXCEEDED handling in wait_for_any_of for completeness (similar to test_wait_for_status_deadline_exceeded_returns_promptly), though the UNAVAILABLE test provides reasonable coverage of the connection-loss path.

python/packages/jumpstarter/jumpstarter/exporter/session.py (1)

270-280: Consider reducing GetStatus log level.

The GetStatus RPC may be polled frequently by clients (e.g., StatusMonitor polls every 0.1-0.3s). Using logger.info could result in excessive log output.

Consider using DEBUG level
     async def GetStatus(self, request, context):
         """Get the current exporter status with transition tracking."""
-        logger.info("GetStatus() -> %s (version=%d)", self._current_status, self._status_version)
+        logger.debug("GetStatus() -> %s (version=%d)", self._current_status, self._status_version)
         response = jumpstarter_pb2.GetStatusResponse(
python/packages/jumpstarter/jumpstarter/exporter/exporter.py (1)

360-382: Consider using _controller_stub in _unregister_with_controller.

This method manages the channel manually instead of using the _controller_stub context manager. While the explicit channel.close() with CancelScope(shield=True) is correct, using the context manager would be more consistent with the rest of the codebase.

Optional refactor for consistency
     async def _unregister_with_controller(self):
         """Safely unregister from controller with timeout and error handling."""
         if not (self._registered and self._unregister):
             return

         logger.info("Unregistering exporter with controller")
         try:
             with move_on_after(10):  # 10 second timeout
-                channel = await self.channel_factory()
-                try:
-                    controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
+                async with self._controller_stub() as controller:
                     await self._report_status(ExporterStatus.OFFLINE, "Exporter shutting down")
                     await controller.Unregister(
                         jumpstarter_pb2.UnregisterRequest(
                             reason="Exporter shutdown",
                         )
                     )
                     logger.info("Controller unregistration completed successfully")
-                finally:
-                    with CancelScope(shield=True):
-                        await channel.close()
         except Exception as e:
             logger.error("Error during controller unregistration: %s", e, exc_info=True)

Comment on lines +76 to +86
lease = &jumpstarterdevv1alpha1.Lease{
ObjectMeta: metav1.ObjectMeta{
Name: "lease-to-release",
Namespace: testNamespace,
},
Spec: jumpstarterdevv1alpha1.LeaseSpec{
ClientRef: corev1.LocalObjectReference{Name: "test-client"},
Release: false,
},
}
Expect(k8sClient.Create(ctx, lease)).To(Succeed())

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the LeaseSpec definition and Selector field
fd -e go "lease_types.go" controller/

# Get the actual content of LeaseSpec
rg -n "type LeaseSpec struct" --type=go -A20 controller/



🏁 Script executed:

#!/bin/bash
# Check for validation markers on the Selector field
rg -n "Selector" controller/api/v1alpha1/lease_types.go -B5 -A1

# Check if there are validation rules defined elsewhere for Lease
rg -n "kubebuilder:" controller/api/v1alpha1/lease_types.go | head -20



🏁 Script executed:

#!/bin/bash
# Check if there are CRD definition files
fd -e yaml "lease.*crd" controller/

# Check the test file to see if it has any setup that might affect validation
head -40 controller/internal/service/controller_service_integration_test.go

# Look for any ValidatingWebhook or validation setup
rg -n "Validat" controller/ --type=go | head -20

# Check if there's a markers file or +kubebuilder validation rules
rg -n "+kubebuilder:validation" controller/api/v1alpha1/lease_types.go



🏁 Script executed:

#!/bin/bash
# Search for CRD files more broadly
find controller -name "*.yaml" -o -name "*.yml" | grep -i lease

# Check if there's a config or manifests directory
fd -e yaml controller/config/ | head -20

# Look for the validation webhook or rules setup
rg -n "LeaseSpec" controller/api/v1alpha1/lease_types.go -B10 | grep -E "kubebuilder|validation"

# Check the entire lease_types.go file for validation markers
cat controller/api/v1alpha1/lease_types.go | grep -E "kubebuilder|validation"



🏁 Script executed:

#!/bin/bash
# Check the CRD definition for validation rules on Selector
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -A30 "selector:"

# Also check for any validation rules in the properties section
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -B5 -A10 '"selector"'



🏁 Script executed:

#!/bin/bash
# Get more context from the CRD - check if selector is in the required fields
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -B20 "selector:" | grep -A30 "required:"

# Or let's look at the entire spec section to see required fields
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | sed -n '/spec:/,/status:/p' | grep -A100 "properties:" | head -60



🏁 Script executed:

#!/bin/bash
# Look for the required fields in the spec section
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | sed -n '/spec:/,/status:/p' | tail -50



Add required Selector field to all Lease objects in tests.

The Selector field is required in LeaseSpec according to the CRD definition (see required: [clientRef, selector] in controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml). All Lease objects created in these tests omit this field, which will cause validation failures when k8sClient.Create(ctx, lease) is called. Set Selector: metav1.LabelSelector{} or a valid label selector for each Lease creation.

🤖 Prompt for AI Agents
In `@controller/internal/service/controller_service_integration_test.go` around
lines 76 - 86, The Lease objects in the tests (the Lease struct created and
passed to k8sClient.Create in controller_service_integration_test.go) are
missing the required Selector field in LeaseSpec which causes API validation to
fail; update the Lease initializations (the variable lease used with
k8sClient.Create and its LeaseSpec) to include Selector: metav1.LabelSelector{}
or an appropriate label selector (e.g., Selector:
&metav1.LabelSelector{MatchLabels: map[string]string{"app":"test"}}) so the
CRD-required selector is present when Create(ctx, lease) is called.

Comment on lines +190 to +196
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
/etc/jumpstarter/exporters/test-exporter-oidc.yaml
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
/etc/jumpstarter/exporters/test-exporter-sa.yaml
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
/etc/jumpstarter/exporters/test-exporter-legacy.yaml


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check which exporter.yaml paths exist in the e2e directory
echo "=== Checking exporter.yaml locations ==="
fd -t f "exporter.yaml" e2e/

echo ""
echo "=== References in tests.bats ==="
rg -n "exporter\.yaml" e2e/tests.bats

Repository: jumpstarter-dev/jumpstarter

Length of output: 633


Update exporter.yaml path references on lines 126 and 174.

The file e2e/exporter.yaml no longer exists; it was moved to e2e/exporters/exporter.yaml. Lines 190–195 were correctly updated, but lines 126 and 174 still reference the old non-existent path, causing test failures:

  • Line 126: Change e2e/exporter.yaml to e2e/exporters/exporter.yaml
  • Line 174: Change e2e/exporter.yaml to e2e/exporters/exporter.yaml
🤖 Prompt for AI Agents
In `@e2e/tests.bats` around lines 190 - 196, Two occurrences of the yq load
expression still reference the old path load("e2e/exporter.yaml"); update those
commands that match the pattern go run ... -i ". * load(\"e2e/exporter.yaml\")"
to use load("e2e/exporters/exporter.yaml") instead so all yq invocations point
to the moved exporter.yaml file (ensure the same replacement is applied for both
remaining occurrences).

Comment on lines +79 to +94
        # Check if exporter set an exit code (e.g., from hook failure with on_failure='exit')
        exporter_exit_code = exporter.exit_code

        # Cancel the signal handler after exporter completes
        signal_tg.cancel_scope.cancel()

        # Return signal number if received, otherwise 0 for immediate restart
        return received_signal if received_signal else 0
        # Return exit code in priority order:
        # 1. Signal number if received (for signal-based termination)
        # 2. Exporter's exit code if set (for hook failure with on_failure='exit')
        # 3. 0 for immediate restart (normal exit without signal or explicit exit code)
        if received_signal:
            return received_signal
        elif exporter_exit_code is not None:
            return exporter_exit_code
        else:
            return 0

⚠️ Potential issue | 🟠 Major

Align child/parent exit-code encoding to avoid 128+ skew.

With the new exporter_exit_code, the parent still assumes any non‑zero child exit code is a signal number and adds 128 (lines 131–138). That will mangle real exit codes (e.g., 1 → 129). Encode signals as 128+signal in the child and have the parent return the child exit code directly.

Proposed fix
@@
-        # 1. Signal number if received (for signal-based termination)
+        # 1. Signal-based termination → return standard 128+signal
@@
-        if received_signal:
-            return received_signal
+        if received_signal:
+            return 128 + received_signal
@@
-        if child_exit_code == 0:
-            return None  # restart child (unexpected exit/exception)
-        else:
-            # Child indicates termination (signal number)
-            return 128 + child_exit_code  # Return standard Unix exit code
+        if child_exit_code == 0:
+            return None  # restart child (unexpected exit/exception)
+        else:
+            # Child already encodes final exit status (128+signal or exporter exit code)
+            return child_exit_code
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/run.py` around lines 79 - 94,
The parent currently interprets any non-zero child exit as a signal and adds
128, which double-encodes real exit codes; change the protocol so the child
(exporter) encodes signals as 128 + signal number and the parent in run.py
should stop adding 128 — i.e., use exporter_exit_code directly when set (replace
any logic that converts non-zero exporter_exit_code into 128+value), and keep
received_signal only for locally-caught signals; update handling around
exporter_exit_code and received_signal (and cancellation via
signal_tg.cancel_scope.cancel()) so the parent returns exporter_exit_code as-is
when present.

Comment on lines +127 to +187
    async def wait_for_lease_ready(self, timeout: float = 300.0) -> None:
        """Wait for exporter to report LEASE_READY status.

        This polls GetStatus until the beforeLease hook completes.
        Should be called after log streaming is started so hook output
        can be displayed in real-time.

        Prefer using StatusMonitor.wait_for_any_of() for non-blocking status tracking.

        Args:
            timeout: Maximum time to wait in seconds (default: 5 minutes)
        """
        import anyio

        poll_interval = 0.5  # seconds
        elapsed = 0.0
        poll_count = 0

        self.logger.debug("Waiting for exporter to be ready...")
        while elapsed < timeout:
            poll_count += 1
            self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
            try:
                status = await self.get_status_async()
                self.logger.debug("[POLL %d] GetStatus returned: %s", poll_count, status)
            except Exception as e:
                # Connection error - keep trying
                self.logger.debug("[POLL %d] Error getting status, will retry: %s", poll_count, e)
                await anyio.sleep(poll_interval)
                elapsed += poll_interval
                continue

            if status is None:
                # GetStatus not implemented - assume ready for backward compatibility
                self.logger.debug("[POLL %d] GetStatus not implemented, assuming ready", poll_count)
                return

            if status == ExporterStatus.LEASE_READY:
                self.logger.info("Exporter ready, starting shell...")
                return
            elif status == ExporterStatus.BEFORE_LEASE_HOOK:
                # Hook is running - this is expected, keep waiting
                self.logger.debug("[POLL %d] beforeLease hook is running...", poll_count)
            elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                # Hook failed - log but continue (exporter may still be usable)
                self.logger.warning("beforeLease hook failed")
                return
            elif status == ExporterStatus.AVAILABLE:
                # Exporter is available but not yet leased - keep waiting
                # This can happen if client connects before exporter receives lease assignment
                self.logger.debug("[POLL %d] Exporter status: AVAILABLE (waiting for lease assignment)", poll_count)
            else:
                # Other status - continue waiting
                self.logger.debug("[POLL %d] Exporter status: %s (waiting...)", poll_count, status)

            self.logger.debug("[POLL %d] Sleeping for %.1fs before next poll...", poll_count, poll_interval)
            await anyio.sleep(poll_interval)
            elapsed += poll_interval

        self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)


⚠️ Potential issue | 🟡 Minor

Timeout isn’t surfaced to callers.

After the loop, Line 186 only logs a warning and returns, so callers can’t distinguish success from a timeout. Consider raising DriverError (or returning a boolean) when the timeout elapses.

🛠️ Possible fix (raise on timeout)
-        self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)
+        self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)
+        raise DriverError("Timeout waiting for beforeLease hook to complete")
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 127 -
187, The wait_for_lease_ready method currently logs a warning on timeout but
returns silently so callers cannot detect failure; modify wait_for_lease_ready
to raise DriverError (or return False if you prefer an API change) when the
timeout elapses instead of only logging: replace the final
self.logger.warning(...) with raising DriverError(f"Timeout waiting for
beforeLease hook to complete after {timeout:.1f}s and {poll_count} polls")
(ensure DriverError is imported/available) so callers of wait_for_lease_ready
can programmatically handle timeouts.
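The raise-on-timeout behavior can be sketched as a standalone polling helper; `DriverError` is assumed to exist per the review, and `wait_until` is an illustrative stand-in, not the actual client method:

```python
import time


class DriverError(Exception):
    """Stand-in for jumpstarter's DriverError (assumed from the review)."""


def wait_until(predicate, timeout: float = 5.0, poll_interval: float = 0.1) -> int:
    """Poll predicate() until true, raising on timeout instead of returning silently.

    Returns the number of polls on success so callers can log it; raises
    DriverError when the deadline passes, mirroring the suggested fix.
    """
    deadline = time.monotonic() + timeout
    poll_count = 0
    while time.monotonic() < deadline:
        poll_count += 1
        if predicate():
            return poll_count
        time.sleep(poll_interval)
    raise DriverError(
        f"Timeout waiting for condition after {timeout:.1f}s and {poll_count} polls"
    )
```

Raising (rather than returning a boolean) keeps the happy-path call site unchanged while making a timeout impossible to silently ignore.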

Comment on lines +458 to +485
                        # Determine log source
                        if response.HasField("source"):
                            source = LogSource(response.source)
                            is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
                        else:
                            source = LogSource.SYSTEM
                            is_hook = False

                        # Filter: always show hooks, only show system logs if enabled
                        if is_hook or show_all_logs:
                            # Get severity level
                            severity = response.severity if response.severity else "INFO"
                            log_level = getattr(logging, severity, logging.INFO)

                            # Route to appropriate logger based on source
                            if source == LogSource.BEFORE_LEASE_HOOK:
                                logger_name = "exporter:beforeLease"
                            elif source == LogSource.AFTER_LEASE_HOOK:
                                logger_name = "exporter:afterLease"
                            elif source == LogSource.DRIVER:
                                logger_name = "exporter:driver"
                            else:  # SYSTEM
                                logger_name = "exporter:system"

                            # Log through logger for RichHandler formatting
                            source_logger = logging.getLogger(logger_name)
                            source_logger.log(log_level, response.message)


⚠️ Potential issue | 🟡 Minor

Guard against unknown LogSource values.

If the server sends a new/unknown enum value, LogSource(response.source) raises ValueError, which can trigger reconnect churn. A small fallback keeps the stream healthy.

🛠️ Suggested fallback
-                        if response.HasField("source"):
-                            source = LogSource(response.source)
-                            is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
+                        if response.HasField("source"):
+                            try:
+                                source = LogSource(response.source)
+                            except ValueError:
+                                self.logger.debug("Unknown LogSource value: %s", response.source)
+                                source = LogSource.SYSTEM
+                            is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 458 -
485, Guard creation of LogSource from response.source with a try/except to
handle unknown enum values: when evaluating source = LogSource(response.source)
(and computing is_hook), catch ValueError and set source to LogSource.SYSTEM
(and is_hook = False) so the stream continues instead of raising; keep the
existing routing logic that maps source to logger_name and logging through
source_logger.log(log_level, response.message). Also ensure severity fallback
(severity = response.severity if response.severity else "INFO") remains
unchanged.
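Besides the try/except fallback, Python's enum machinery offers a `_missing_` hook that centralizes the same fallback at the enum definition; the values below are illustrative, not the actual protobuf numbering:

```python
from enum import IntEnum


class LogSource(IntEnum):
    # Illustrative values only -- not the actual protobuf enum numbering.
    SYSTEM = 0
    DRIVER = 1
    BEFORE_LEASE_HOOK = 2
    AFTER_LEASE_HOOK = 3

    @classmethod
    def _missing_(cls, value):
        # Unknown wire values fall back to SYSTEM instead of raising
        # ValueError, so a newer server cannot break an older client.
        return cls.SYSTEM


assert LogSource(2) is LogSource.BEFORE_LEASE_HOOK
assert LogSource(99) is LogSource.SYSTEM  # unknown value, no ValueError
```

This keeps the call site (`LogSource(response.source)`) unchanged, though the explicit try/except has the advantage of logging the unrecognized value at the point it arrives.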

Comment on lines +168 to 191
    def stop(self, wait_for_lease_exit=False, should_unregister=False, exit_code: int | None = None):
        """Signal the exporter to stop.

        Args:
            wait_for_lease_exit (bool): If True, wait for the current lease to exit before stopping.
            should_unregister (bool): If True, unregister from controller. Otherwise rely on heartbeat.
            exit_code (int | None): If set, the exporter will exit with this code (non-zero means no restart).
        """
        # Set exit code if provided
        if exit_code is not None:
            self._exit_code = exit_code

        # Stop immediately if not started yet or if immediate stop is requested
        if (not self._started or not wait_for_lease_exit) and self._tg is not None:
            logger.info("Stopping exporter immediately, unregister from controller=%s", should_unregister)
            if should_unregister:
                logger.info("Stopping exporter immediately, unregistering from controller")
            else:
                logger.info("Stopping exporter immediately, will not unregister from controller")
            self._unregister = should_unregister
            # Cancel any ongoing tasks
            self._tg.cancel_scope.cancel()
        elif not self._stop_requested:
            self._stop_requested = True
            logger.info("Exporter marked for stop upon lease exit")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's find the exporter.py file and check its size
find . -name "exporter.py" -path "*jumpstarter*" -type f

Repository: jumpstarter-dev/jumpstarter

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Get the exact file location and check its size
EXPORTER_FILE="python/packages/jumpstarter/jumpstarter/exporter/exporter.py"
if [ -f "$EXPORTER_FILE" ]; then
  wc -l "$EXPORTER_FILE"
fi

Repository: jumpstarter-dev/jumpstarter

Length of output: 135


🏁 Script executed:

#!/bin/bash
# Read the relevant sections: lines 168-191 (stop method) and around line 788
EXPORTER_FILE="python/packages/jumpstarter/jumpstarter/exporter/exporter.py"

echo "=== Lines 168-191 (stop method) ==="
sed -n '168,191p' "$EXPORTER_FILE"

echo -e "\n=== Lines 780-800 (serve method call to stop) ==="
sed -n '780,800p' "$EXPORTER_FILE"

echo -e "\n=== Check for _stop_requested and should_unregister references ==="
grep -n "_stop_requested\|should_unregister\|_unregister" "$EXPORTER_FILE" | head -30

Repository: jumpstarter-dev/jumpstarter

Length of output: 2962


🏁 Script executed:

#!/bin/bash
# Search for all calls to stop() in the exporter module to understand usage patterns
EXPORTER_DIR="python/packages/jumpstarter/jumpstarter/exporter/"
rg -n '\.stop\(' "$EXPORTER_DIR" -A 1 -B 1

Repository: jumpstarter-dev/jumpstarter

Length of output: 1341


should_unregister preference is not preserved when deferring stop.

When stop(wait_for_lease_exit=True, should_unregister=False) is called, the should_unregister=False preference is discarded. The method only sets _stop_requested=True without storing the should_unregister value. Later, when the lease exits (line 788), stop(should_unregister=True) is called unconditionally, overriding the caller's intent to defer unregistration.

Consider storing should_unregister in a field (similar to how _unregister is used for immediate stops) when deferring the stop, so the original caller preference is honored when the lease eventually exits.

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py` around lines
168 - 191, The stop method currently ignores the caller's should_unregister
value when deferring stop; update stop(self, ...) to persist should_unregister
into an instance field (e.g. set self._unregister = should_unregister) whenever
wait_for_lease_exit is true and you set self._stop_requested = True so the
original preference is preserved, and ensure the lease-exit path that later
invokes stop uses that stored self._unregister rather than unconditionally
passing True; update references to _unregister and _stop_requested so deferred
stops respect the saved preference.
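The suggested fix can be sketched with a stripped-down class; field and method names mirror the review discussion, but this is not the real exporter implementation:

```python
class Exporter:
    """Sketch: preserve should_unregister across a deferred (lease-exit) stop."""

    def __init__(self):
        self._stop_requested = False
        self._unregister = False

    def stop(self, wait_for_lease_exit: bool = False, should_unregister: bool = False):
        if not wait_for_lease_exit:
            # Immediate stop: record the preference and cancel tasks (elided).
            self._unregister = should_unregister
        elif not self._stop_requested:
            self._stop_requested = True
            # Persist the caller's preference for the deferred path as well.
            self._unregister = should_unregister

    def on_lease_exit(self):
        if self._stop_requested:
            # Honor the stored preference instead of hard-coding True.
            self.stop(should_unregister=self._unregister)


exporter = Exporter()
exporter.stop(wait_for_lease_exit=True, should_unregister=False)
exporter.on_lease_exit()
assert exporter._unregister is False  # caller's preference survived
```

The key change is that the deferred branch writes `should_unregister` to the same field the immediate branch uses, and the lease-exit path reads it back rather than passing a literal `True`.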
