## Walkthrough

This PR introduces comprehensive exporter lifecycle management through beforeLease/afterLease hooks, exporter status reporting and monitoring, log source tagging, and extended gRPC capabilities. It adds `StatusMonitor` for non-blocking status polling, `HookExecutor` for hook execution, `LeaseContext` for lease-scoped resources, a `ReportStatus` endpoint, an `EndSession` RPC, and corresponding Python/Go client updates across multiple packages.
## Sequence Diagrams

```mermaid
sequenceDiagram
    participant Client
    participant Exporter as Exporter<br/>(Main)
    participant HookExec as HookExecutor
    participant Driver
    participant ControllerSvc as ControllerService
    participant StatusMon as StatusMonitor

    Client->>Exporter: request_lease()
    Exporter->>ControllerSvc: ReportStatus(BEFORE_LEASE_HOOK)
    ControllerSvc->>StatusMon: (status updated)
    Exporter->>HookExec: run_before_lease_hook()
    HookExec->>Driver: instantiate device
    HookExec->>HookExec: execute_hook_process()
    HookExec-->>Exporter: (hook complete/failed)
    alt Hook Success
        Exporter->>Exporter: drivers ready event set
        Exporter->>ControllerSvc: ReportStatus(LEASE_READY)
        Exporter->>Client: (allow client driver calls)
        Client->>Driver: driver operations
    else Hook Failure (on_failure=endLease)
        Exporter->>Exporter: _request_lease_release()
        Exporter->>ControllerSvc: (lease release signaled)
    end
    Exporter->>HookExec: run_after_lease_hook()
    HookExec->>HookExec: execute_hook_process()
    HookExec-->>Exporter: (hook complete)
    Exporter->>ControllerSvc: ReportStatus(OFFLINE)
    ControllerSvc->>StatusMon: (status updated)
```
```mermaid
sequenceDiagram
    participant Client
    participant AsyncClient as AsyncDriverClient
    participant StatusMon as StatusMonitor
    participant ExporterSvc as ExporterService<br/>(gRPC)

    Client->>StatusMon: start()
    activate StatusMon
    StatusMon->>ExporterSvc: GetStatus() [background poll]
    ExporterSvc-->>StatusMon: (status response)
    StatusMon->>StatusMon: notify on_status_change callbacks
    Client->>AsyncClient: wait_for_lease_ready(timeout=300s)
    AsyncClient->>StatusMon: wait_for_status(LEASE_READY)
    loop Poll Loop (0.3s intervals)
        StatusMon->>ExporterSvc: GetStatus()
        ExporterSvc-->>StatusMon: BEFORE_LEASE_HOOK
        StatusMon->>StatusMon: signal any_change_event
        AsyncClient->>StatusMon: check current_status
        alt Status != LEASE_READY
            AsyncClient->>AsyncClient: continue waiting
        else Status == LEASE_READY
            AsyncClient-->>Client: ready
        end
    end
    Client->>AsyncClient: call_async(method, args)
    AsyncClient->>AsyncClient: check status (LEASE_READY)
    AsyncClient->>ExporterSvc: DriverCall()
    ExporterSvc-->>AsyncClient: result
    StatusMon->>StatusMon: stop polling
    deactivate StatusMon
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
python/packages/jumpstarter/jumpstarter/config/client.py (1)
**184-196: Add `status` field to exporter reconstruction to preserve it when `include_leases=True`.**

The exporter objects lose their `status` field when leases are included. The original exporters have a `status` field that should be preserved during reconstruction, especially when `include_status=True` is requested. Update lines 187-193 to include `status=exporter.status`.

Code snippet:

```python
exporter_with_lease = Exporter(
    namespace=exporter.namespace,
    name=exporter.name,
    labels=exporter.labels,
    online=exporter.online,
    status=exporter.status,  # Add this line
    lease=lease,
)
```

python/packages/jumpstarter/jumpstarter/config/exporter.py (1)
**224-246: Remove unused `device_factory` parameter from `HookExecutor`.**

The `device_factory` parameter passed to `HookExecutor` at line 231 is never used. `HookExecutor` only executes shell commands via subprocess and accesses the session for logging; it has no need for driver instantiation. Remove the unused parameter from both the `HookExecutor` dataclass definition and the instantiation call.
🤖 Fix all issues with AI agents
In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_clients.yaml`:
- Around line 20-47: Update the CRD description strings so they reference
"Client" consistently instead of "identities" or "Identity": change the top
description "Client is the Schema for the identities API" to a concise "Client
is the Schema for the Client API" (or simply "Client"), and update the
spec.description "ClientSpec defines the desired state of Identity" and
status.description "ClientStatus defines the observed state of Identity" to use
"Client" (e.g., "ClientSpec defines the desired state of Client" and
"ClientStatus defines the observed state of Client") so all descriptive fields
in the CRD (the description at the top and the
spec.description/status.description entries) no longer reference
Identity/identities.
In
`@controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_leases.yaml`:
- Around line 30-31: Update the CRD description for the Lease resource: replace
the current "Lease is the Schema for the exporters API" text in the description
property of the Lease CRD with the corrected wording using "leases" (e.g.,
"Leases is the Schema for the exporters API" or "leases is the Schema for the
exporters API") so the description uses the plural form; locate the description
property under the Lease CRD (kind: Lease) in the CRD template and change that
string accordingly.
In `@python/packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py`:
- Line 19: basicConfig assignment line exceeds max line length; split the call
across multiple lines to satisfy Ruff E501. Locate the basicConfig variable
initialization (basicConfig = partial(logging.basicConfig, format="%(message)s
[%(name)s]", handlers=[RichHandler(show_path=False)]) ) and reformat the
arguments onto separate lines (e.g., place format=... and handlers=[...] each on
their own lines and close parens on its own line) so the call to
partial(logging.basicConfig, ...) fits within line-length limits while
preserving the same arguments and use of RichHandler(show_path=False).
In `@python/packages/jumpstarter-cli/jumpstarter_cli/shell.py`:
- Around line 80-89: Remove the unused synchronous helper _run_shell_with_lease:
delete the entire function definition (including its docstring and with-blocks)
since the async variant _run_shell_with_lease_async is now the only one used;
then search the repo for any remaining references to _run_shell_with_lease and
remove or update them, run the test suite/lint to confirm no import or name
errors, and ensure behavior relies solely on _run_shell_with_lease_async.
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 395-401: The async context manager on StatusMonitor is incomplete:
__aenter__ currently just returns self and never starts the background polling;
update __aenter__ to initialize the task group and call self.start() (mirroring
client.status_monitor_async() behavior) so entering the context begins
monitoring, and keep __aexit__ awaiting self.stop(); alternatively remove both
methods if context-manager usage is unsupported—modify the methods on the
StatusMonitor class (__aenter__, __aexit__) to either perform full start/stop
lifecycle (create task group + await self.start() in __aenter__, await
self.stop() in __aexit__) or delete them to avoid silent failures.
In `@python/packages/jumpstarter/jumpstarter/common/enums.py`:
- Around line 26-27: The docstring for the enum value AFTER_LEASE_HOOK (mapped
to common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK) contains a typo; update the
string from "Lease was releaseed, but exporter is executing after lease hook" to
"Lease was released, but exporter is executing after lease hook" so the
documentation reads correctly.
In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 541-613: The Listen retry task started with
tg.start_soon(self._retry_stream, "Listen",
self._listen_stream_factory(lease_name), listen_tx) is attached to the outer
task group so it keeps running after the lease ends; move that start call into
the lease-scoped task group (use conn_tg.start_soon(...) before starting
wait_for_lease_end/process_connections) so the Listen task is cancelled with the
lease, and in the enclosing finally block ensure the send stream is closed
(await listen_tx.aclose() or the appropriate close method) to signal listen_rx
termination; reference symbols: _retry_stream,
_listen_stream_factory(lease_name), listen_tx/listen_rx, tg -> conn_tg, and
conn_tg.start_soon.
- Around line 202-321: Change _get_controller_stub to be an async context
manager (e.g., async def _controller_stub(self):) that creates the channel via
self.channel_factory(), yields a
jumpstarter_pb2_grpc.ControllerServiceStub(channel) and always closes the
underlying channel with await channel.close() in a finally block; then update
all callers to use "async with self._controller_stub() as controller:" instead
of awaiting _get_controller_stub(), specifically in _retry_stream (where
controller is acquired for stream_factory), _register_with_controller (Register
call), and _report_status (ReportStatus call) so channels are closed after use
and no resources leak.
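The suggested pattern, sketched with a fake channel. `FakeChannel` and the tuple standing in for the stub are illustrative assumptions; the real code would create the channel via `self.channel_factory()` and yield `jumpstarter_pb2_grpc.ControllerServiceStub(channel)`:

```python
import asyncio
from contextlib import asynccontextmanager


class FakeChannel:
    """Stand-in for a grpc.aio channel."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


opened: list[FakeChannel] = []


@asynccontextmanager
async def controller_stub():
    # Mirrors the suggested _controller_stub(): create the channel,
    # yield the stub, and always close the channel in finally.
    channel = FakeChannel()
    opened.append(channel)
    try:
        yield ("stub-for", channel)  # real code: ControllerServiceStub(channel)
    finally:
        await channel.close()


async def demo() -> bool:
    async with controller_stub() as (stub, ch):
        assert not ch.closed  # channel open while the stub is in use
    return opened[0].closed


result = asyncio.run(demo())
```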
In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 299-307: wait_for_process currently uses
anyio.to_thread.run_sync(process.wait, abandon_on_cancel=True) which can leave
the child as a zombie if the wait is abandoned; modify wait_for_process to
ensure the subprocess is reaped on cancellation by adding a try/finally: in the
try await the run_sync as before, but in the finally check process.poll() and if
it is None try to terminate/kill (process.terminate()/process.kill()) then call
a non-abandoning run_sync(process.wait) or loop-poll with process.poll() until
exit to fully reap the child; reference the wait_for_process coroutine and the
process variable when making the changes.
- Around line 257-260: The debug call inside read_pty_output exceeds the
120-char line limit; shorten the line by splitting the log message across
multiple strings or moving the format string to a short variable and then
calling logger.debug. For example, break the message into two adjacent string
literals or assign msg = "read_pty_output: heartbeat at %.1fs, iterations=%d"
and call logger.debug(msg, elapsed, read_count) so you keep the same
logger.debug call and the same format arguments (logger, elapsed, read_count)
but comply with the E501 line-length rule.
- Around line 186-189: The test mock for logging_session must support the
context manager used by logging_session.context_log_source in hooks.py; update
the test fixture (e.g., mock_session) so mock_session.context_log_source returns
an object that implements __enter__ and __exit__ (for async usage you can set
mock_session.context_log_source.return_value.__enter__ and __exit__ to AsyncMock
or wrap with contextlib.nullcontext), ensuring the context manager protocol is
honored when hooks.py calls logging_session.context_log_source.
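A sketch of the fixture change for the synchronous case (`mock_session` and the `"before_lease_hook"` argument are hypothetical names for illustration): `contextlib.nullcontext` gives the mocked `context_log_source` a real context-manager return value.

```python
from contextlib import nullcontext
from unittest.mock import MagicMock

# Hypothetical fixture setup: make the mocked logging_session honor the
# context manager protocol used by hooks.py
# (`with logging_session.context_log_source(...):`).
mock_session = MagicMock()
mock_session.context_log_source.return_value = nullcontext()

entered = False
with mock_session.context_log_source("before_lease_hook"):
    entered = True  # body runs because __enter__/__exit__ now exist
```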
🧹 Nitpick comments (10)
controller/deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/jumpstarter.dev_exporteraccesspolicies.yaml (2)
**149-152: Add schema validation for duration and priority.**

Without constraints, malformed durations or negative priorities can be accepted and only fail at runtime. Consider adding format/bounds in the CRD.

♻️ Suggested schema tightening

```diff
   maximumDuration:
     type: string
+    format: duration
   priority:
     type: integer
+    minimum: 0
```
**40-157: Require spec/policies to avoid empty CRs.**

Right now an empty resource (missing spec or policies) is valid. If that’s not intended, enforce it in the schema to reduce no-op objects.

♻️ Suggested required fields

```diff
  spec:
    description: ExporterAccessPolicySpec defines the desired state of ExporterAccessPolicy.
    properties:
      exporterSelector:
        description: |-
          A label selector is a label query over a set of resources. The result of
          matchLabels and matchExpressions are ANDed. An empty label selector matches
          all objects. A null label selector matches no objects.
        properties:
          matchExpressions:
            description: matchExpressions is a list of label selector requirements.
              The requirements are ANDed.
            items:
              description: |-
                A label selector requirement is a selector that contains values, a key,
                and an operator that relates the key and values.
              properties:
                key:
                  description: key is the label key that the selector applies to.
                  type: string
                operator:
                  description: |-
                    operator represents a key's relationship to a set of values.
                    Valid operators are In, NotIn, Exists and DoesNotExist.
                  type: string
                values:
                  description: |-
                    values is an array of string values. If the operator is In or NotIn,
                    the values array must be non-empty. If the operator is Exists or
                    DoesNotExist, the values array must be empty. This array is replaced
                    during a strategic merge patch.
                  items:
                    type: string
                  type: array
                  x-kubernetes-list-type: atomic
              required:
              - key
              - operator
              type: object
            type: array
            x-kubernetes-list-type: atomic
          matchLabels:
            additionalProperties:
              type: string
            description: |-
              matchLabels is a map of {key,value} pairs. A single {key,value} in the
              matchLabels map is equivalent to an element of matchExpressions, whose
              key field is "key", the operator is "In", and the values array contains
              only "value". The requirements are ANDed.
            type: object
        type: object
        x-kubernetes-map-type: atomic
      policies:
        items:
          properties:
            from:
              items:
                properties:
                  clientSelector:
                    description: |-
                      A label selector is a label query over a set of resources. The
                      result of matchLabels and matchExpressions are ANDed. An empty
                      label selector matches all objects. A null label selector
                      matches no objects.
                    properties:
                      matchExpressions:
                        description: matchExpressions is a list of label selector
                          requirements. The requirements are ANDed.
                        items:
                          description: |-
                            A label selector requirement is a selector that contains
                            values, a key, and an operator that relates the key and
                            values.
                          properties:
                            key:
                              description: key is the label key that the selector
                                applies to.
                              type: string
                            operator:
                              description: |-
                                operator represents a key's relationship to a set of
                                values. Valid operators are In, NotIn, Exists and
                                DoesNotExist.
                              type: string
                            values:
                              description: |-
                                values is an array of string values. If the operator
                                is In or NotIn, the values array must be non-empty.
                                If the operator is Exists or DoesNotExist, the values
                                array must be empty. This array is replaced during a
                                strategic merge patch.
                              items:
                                type: string
                              type: array
                              x-kubernetes-list-type: atomic
                          required:
                          - key
                          - operator
                          type: object
                        type: array
                        x-kubernetes-list-type: atomic
                      matchLabels:
                        additionalProperties:
                          type: string
                        description: |-
                          matchLabels is a map of {key,value} pairs. A single
                          {key,value} in the matchLabels map is equivalent to an
                          element of matchExpressions, whose key field is "key", the
                          operator is "In", and the values array contains only
                          "value". The requirements are ANDed.
                        type: object
                    type: object
                    x-kubernetes-map-type: atomic
                type: object
              type: array
            maximumDuration:
              type: string
            priority:
              type: integer
            spotAccess:
              type: boolean
          type: object
        type: array
+       minItems: 1
    type: object
+   required:
+   - policies
  status:
    description: ExporterAccessPolicyStatus defines the observed state of ExporterAccessPolicy.
    type: object
  type: object
+ required:
+ - spec
```

Also applies to: 162-162
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
**157-161: Redundant timeout configuration.**

Both `move_on_after(300)` and `wait_for_any_of(..., timeout=300.0)` specify 300 seconds. Since `wait_for_any_of` has its own timeout, the outer `move_on_after` is redundant. Consider either removing the outer timeout or differentiating them if different semantics are intended (e.g., outer as a hard cap slightly longer than inner).

♻️ Suggested simplification

```diff
-    with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-        result = await monitor.wait_for_any_of(
-            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-            timeout=300.0
-        )
+    result = await monitor.wait_for_any_of(
+        [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+        timeout=300.0
+    )
+    if result is None:
+        logger.warning("Timeout waiting for afterLease hook to complete")
```

Note: This would require adjusting the subsequent conditional logic (lines 162-172) to handle the timeout case inline rather than via `timeout_scope.cancelled_caught`.

controller/Makefile (1)
timeout_scope.cancelled_caught.controller/Makefile (1)
66-68: Make the CRD copy step more robust with mkdir.If the templates directory is missing in a fresh checkout,
cpwill fail.♻️ Suggested tweak
cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/ + mkdir -p deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/ cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/controller/internal/service/controller_service.go (1)
**318-338: Consider logging unexpected status values in the default case.**

The `protoStatusToString` function silently maps unknown proto status values to `ExporterStatusUnspecified`. While this provides a safe fallback, logging at debug/warning level would help diagnose protocol mismatches.

♻️ Optional: Add logging for unexpected status values

```diff
 func protoStatusToString(status pb.ExporterStatus) string {
 	switch status {
 	case pb.ExporterStatus_EXPORTER_STATUS_OFFLINE:
 		return jumpstarterdevv1alpha1.ExporterStatusOffline
 	// ... other cases ...
 	default:
+		// Log unexpected status values for debugging protocol mismatches
+		// Consider using a package-level logger if available
 		return jumpstarterdevv1alpha1.ExporterStatusUnspecified
 	}
 }
```

python/packages/jumpstarter/jumpstarter/common/enums.py (1)
**38-45: Consider handling invalid protobuf values gracefully in `from_proto`.**

The current implementation will raise `ValueError` if an unrecognized integer is passed to `from_proto`. For robustness against protocol version mismatches, consider returning `UNSPECIFIED` for unknown values.

♻️ Optional: Handle unknown values gracefully

```diff
 @classmethod
 def from_proto(cls, value: int) -> "ExporterStatus":
     """Convert from protobuf integer to enum."""
-    return cls(value)
+    try:
+        return cls(value)
+    except ValueError:
+        return cls.UNSPECIFIED
```

python/packages/jumpstarter/jumpstarter/exporter/logging.py (1)
**31-39: Consider longest-prefix matching for hierarchical loggers.**

The current prefix matching returns the first match found during iteration. With hierarchical logger names (e.g., `driver.power`, `driver.power.usb`), if both `driver` and `driver.power` are mapped, the match depends on iteration order. Consider sorting by key length descending to ensure the most specific (longest) prefix wins.

♻️ Suggested improvement for deterministic matching

```diff
 def get_source_for_record(self, record):
     """Determine the appropriate log source for a record."""
     with self._lock:
         # Check if this record comes from a logger with a specific source mapping
         logger_name = record.name
-        for mapped_logger, source in self._child_handlers.items():
+        # Sort by key length descending to match most specific prefix first
+        for mapped_logger, source in sorted(
+            self._child_handlers.items(), key=lambda x: len(x[0]), reverse=True
+        ):
             if logger_name.startswith(mapped_logger):
                 return source
         return self.source
```

python/packages/jumpstarter/jumpstarter/client/core.py (1)
**440-510: Consider extracting helper functions to reduce complexity.**

The static analysis correctly flags this method as too complex (C901: 14 > 10). The method handles multiple concerns: reconnection with backoff, log source routing, and filtering. While the current structure is readable, extracting helpers would improve maintainability.

♻️ Suggested structure for reduced complexity

```python
def _route_log_to_logger(self, response, show_all_logs: bool) -> None:
    """Route a log response to the appropriate logger."""
    from jumpstarter.common import LogSource

    if response.HasField("source"):
        source = LogSource(response.source)
        is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
    else:
        source = LogSource.SYSTEM
        is_hook = False

    if not (is_hook or show_all_logs):
        return

    severity = response.severity if response.severity else "INFO"
    log_level = getattr(logging, severity, logging.INFO)
    logger_names = {
        LogSource.BEFORE_LEASE_HOOK: "exporter:beforeLease",
        LogSource.AFTER_LEASE_HOOK: "exporter:afterLease",
        LogSource.DRIVER: "exporter:driver",
    }
    logger_name = logger_names.get(source, "exporter:system")
    logging.getLogger(logger_name).log(log_level, response.message)
```

Then use this helper in the main loop to reduce nesting.
python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)
**496-498: TODO: Controller-side lease ending not yet implemented.**

The comment indicates that when `on_failure='endLease'`, there's no mechanism to actually end the lease from the exporter side. This should be tracked.

Would you like me to open an issue to track the controller-side lease ending mechanism for `beforeLease` hook failures with `on_failure='endLease'`?

python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)
**197-279: Consider extracting common verification logic.**

Both `wait_for_status` and `wait_for_any_of` share substantial code for connection recovery verification. This duplication contributes to the complexity flagged by static analysis.

♻️ Suggested refactor to reduce duplication

```python
async def _verify_connection_recovery(self) -> ExporterStatus | None:
    """Verify connection and return current status if recovered, None if still lost."""
    if not self._connection_lost:
        return self._current_status

    logger.debug("Connection was marked as lost, verifying...")
    try:
        response = await self._stub.GetStatus(jumpstarter_pb2.GetStatusRequest())
        logger.info("Connection recovered during verification poll")
        self._connection_lost = False
        new_status = ExporterStatus.from_proto(response.status)
        self._current_status = new_status
        self._status_version = response.status_version
        return new_status
    except AioRpcError as e:
        if e.code() == StatusCode.UNAVAILABLE:
            logger.debug("Connection still lost (UNAVAILABLE)")
        else:
            logger.debug("GetStatus error during verification: %s", e.code())
        return None
    except Exception as e:
        logger.debug("GetStatus error during verification: %s", e)
        return None
```

Then use this helper in both `wait_for_status` and `wait_for_any_of`.
```yaml
      description: Client is the Schema for the identities API
      properties:
        apiVersion:
          description: |-
            APIVersion defines the versioned schema of this representation of an object.
            Servers should convert recognized schemas to the latest internal value, and
            may reject unrecognized values.
            More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources
          type: string
        kind:
          description: |-
            Kind is a string value representing the REST resource this object represents.
            Servers may infer this from the endpoint the client submits requests to.
            Cannot be updated.
            In CamelCase.
            More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds
          type: string
        metadata:
          type: object
        spec:
          description: ClientSpec defines the desired state of Identity
          properties:
            username:
              type: string
          type: object
        status:
          description: ClientStatus defines the observed state of Identity
          properties:
```
Fix CRD description wording to “Client” (minor).
The descriptions currently reference “identities/Identity,” which looks like a copy‑paste typo.
✏️ Suggested wording fix

```diff
-      description: Client is the Schema for the identities API
+      description: Client is the Schema for the clients API
@@
-          description: ClientSpec defines the desired state of Identity
+          description: ClientSpec defines the desired state of Client
@@
-          description: ClientStatus defines the observed state of Identity
+          description: ClientStatus defines the observed state of Client
```
```yaml
      description: Lease is the Schema for the exporters API
      properties:
```
Fix CRD description wording to “leases” (minor).
The description says “exporters API” for the Lease CRD.
✏️ Suggested wording fix

```diff
-      description: Lease is the Schema for the exporters API
+      description: Lease is the Schema for the leases API
```
python/packages/jumpstarter-cli-common/jumpstarter_cli_common/opt.py — outdated comment, resolved.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In
`@python/packages/jumpstarter-driver-power/jumpstarter_driver_power/client_test.py`:
- Around line 8-20: The test_log_stream uses fixed time.sleep(1) which is flaky;
replace these sleeps with a short-polling loop that checks caplog.text until the
expected substring appears or a timeout expires. In test_log_stream (around
client.log_stream(), client.on(), client.off()) implement a small helper or
inline poll using time.monotonic() to loop with a tiny sleep (e.g., 0.01–0.1s)
and fail if timeout reached, then assert "power on" and "power off" only after
the poll succeeds; keep using caplog and the same logger scope so the
client.log_stream(), client.on(), and client.off() calls remain unchanged.
In `@python/packages/jumpstarter/jumpstarter/client/client_test.py`:
- Line 3: Remove the unused import OrderedDict from the imports in
client_test.py; locate the line "from collections import OrderedDict" and delete
it (or remove OrderedDict from the import list) so the lint warning is resolved
and no functionality is changed.
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py`:
- Line 3: The import list in status_monitor_test.py includes an unused symbol
AsyncMock; remove AsyncMock from the import statement (keep MagicMock) so the
module only imports symbols actually used in the tests, ensuring static analysis
no longer flags the unused AsyncMock.
In `@python/packages/jumpstarter/jumpstarter/common/enums_test.py`:
- Line 3: Remove the unused top-level import "import pytest" from enums_test.py;
locate the import statement at the top of the test module and delete it so the
file no longer contains the unused pytest symbol.
🧹 Nitpick comments (4)
python/packages/jumpstarter/jumpstarter/exporter/lease_context_test.py (1)
**186-200: Avoid time-based sleep to reduce test flakiness.**

Line 195 uses a fixed sleep, which can be flaky on slow CI. Consider a handshake event so the test deterministically waits for `wait_for_drivers()` to start before setting the hook.

♻️ Proposed deterministic rewrite

```diff
 async def test_wait_for_drivers_blocks_until_set(self) -> None:
     """Test that wait_for_drivers() blocks until hook event is set."""
     import anyio

     before_hook = Event()
     ctx = LeaseContext(lease_name="test-lease", before_lease_hook=before_hook)

-    # Set the event after a short delay
-    async def set_after_delay():
-        await anyio.sleep(0.05)
-        before_hook.set()
+    waiter_started = Event()
+
+    async def wait_and_signal():
+        waiter_started.set()
+        await ctx.wait_for_drivers()

     async with anyio.create_task_group() as tg:
-        tg.start_soon(set_after_delay)
-        await ctx.wait_for_drivers()
+        tg.start_soon(wait_and_signal)
+        await waiter_started.wait()
+        before_hook.set()

     assert ctx.drivers_ready()
```

python/packages/jumpstarter/jumpstarter/client/core_test.py (2)
16-32: Consider extracting shared test utilities.
`MockAioRpcError` and `create_mock_rpc_error` are duplicated from `status_monitor_test.py`. Consider extracting these to a shared test utilities module (e.g., `jumpstarter/client/test_utils.py` or a `conftest.py`).
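One possible shape of such a shared module (e.g. dropped into a `conftest.py`) is sketched below. The class body is a guess at what the duplicated helper looks like — the real `MockAioRpcError` presumably mimics `grpc.aio.AioRpcError` — so treat this as illustrative only:

```python
# Hypothetical shared test helpers replacing the duplicated definitions
# in status_monitor_test.py and core_test.py.

class MockAioRpcError(Exception):
    """Exception exposing a grpc-style code() accessor, as the tests expect."""

    def __init__(self, code):
        super().__init__(f"rpc error: {code}")
        self._code = code

    def code(self):
        return self._code


def create_mock_rpc_error(code="UNAVAILABLE"):
    """Factory mirroring the duplicated helper's assumed signature."""
    return MockAioRpcError(code)


err = create_mock_rpc_error()
```

Both test modules could then import the helpers from the shared location instead of redefining them.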
90-116: Move import to module level.
The `anyio` import inside `wait_for_hook_status` should be at the module level for consistency and clarity.
Proposed fix
Add to imports at top of file:
import anyio
Then update the method:
 async def wait_for_hook_status(self, target_status: ExporterStatus, timeout: float = 60.0) -> bool:
     """Wait for exporter to reach a target status using polling."""
-    import anyio
-
     poll_interval = 0.1  # Fast for testing
python/packages/jumpstarter/jumpstarter/client/client_test.py (1)
121-124: Consider adding assertion to fully verify topological order.
The comment states children should be built before parents, but the assertion only verifies `grandparent` is built after `parent`. Adding an assertion that `child` is built before `parent` would fully validate the topological ordering.
Suggested improvement
 # Children should be built before parents in topological order
-# The order should be child, parent, grandparent (or parent, child, grandparent)
+# The order should be child, parent, grandparent
 assert "grandparent" in build_order
 assert build_order.index("grandparent") > build_order.index("parent")
+assert build_order.index("parent") > build_order.index("child")
 def test_log_stream(caplog):
     """Test that driver logs are properly streamed to the client."""
     with serve(MockPower()) as client:
-        log = MagicMock()
-        monkeypatch.setattr(client, "_AsyncDriverClient__log", log)
-        with client.log_stream():
-            client.on()
-            time.sleep(1)  # to ensure log is flushed
-            log.assert_called_with(logging.INFO, "power on")
+        # Set log level to capture INFO messages from exporter:driver logger
+        with caplog.at_level(logging.INFO, logger="exporter:driver"):
+            with client.log_stream():
+                client.on()
+                time.sleep(1)  # to ensure log is flushed
+                assert "power on" in caplog.text

-            client.off()
-            time.sleep(1)
-            log.assert_called_with(logging.INFO, "power off")
+                client.off()
+                time.sleep(1)
+                assert "power off" in caplog.text
Avoid fixed sleeps; wait for logs with a timeout.
The 1s sleeps can be flaky (slow/fast CI) and unnecessarily slow locally. Consider polling caplog.text with a short timeout to make the test reliable and quicker.
🔧 Suggested fix
def test_log_stream(caplog):
"""Test that driver logs are properly streamed to the client."""
with serve(MockPower()) as client:
# Set log level to capture INFO messages from exporter:driver logger
with caplog.at_level(logging.INFO, logger="exporter:driver"):
with client.log_stream():
client.on()
- time.sleep(1) # to ensure log is flushed
- assert "power on" in caplog.text
+ for _ in range(50):
+ if "power on" in caplog.text:
+ break
+ time.sleep(0.1)
+ assert "power on" in caplog.text
client.off()
- time.sleep(1)
- assert "power off" in caplog.text
+ for _ in range(50):
+ if "power off" in caplog.text:
+ break
+ time.sleep(0.1)
+ assert "power off" in caplog.text🤖 Prompt for AI Agents
In
`@python/packages/jumpstarter-driver-power/jumpstarter_driver_power/client_test.py`
around lines 8 - 20, The test_log_stream uses fixed time.sleep(1) which is
flaky; replace these sleeps with a short-polling loop that checks caplog.text
until the expected substring appears or a timeout expires. In test_log_stream
(around client.log_stream(), client.on(), client.off()) implement a small helper
or inline poll using time.monotonic() to loop with a tiny sleep (e.g.,
0.01–0.1s) and fail if timeout reached, then assert "power on" and "power off"
only after the poll succeeds; keep using caplog and the same logger scope so the
client.log_stream(), client.on(), and client.off() calls remain unchanged.
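The monotonic-deadline poll the prompt describes can be sketched as a small standalone helper. The name `wait_for` and its parameters are illustrative, not part of the test suite:

```python
import time


def wait_for(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline


# Example: a condition that becomes true after ~0.2s is picked up well
# before the 2s timeout.
ready_at = time.monotonic() + 0.2
assert wait_for(lambda: time.monotonic() >= ready_at, timeout=2.0)
```

In the test itself the predicate would be something like `lambda: "power on" in caplog.text`, followed by a plain assertion so a timeout still fails loudly.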
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@controller/internal/service/suite_test.go`:
- Around line 56-58: The test harness sets up testEnv as an envtest.Environment
with CRDDirectoryPaths and ErrorIfCRDPathMissing: true, which will abort the
suite if the CRD path is wrong; verify and update the CRDDirectoryPaths value to
the actual CRD directory (or add multiple fallback paths) so envtest can find
the CRDs, e.g., check and correct the path used in the testEnv initialization
(the envtest.Environment struct, the CRDDirectoryPaths slice, and the
ErrorIfCRDPathMissing flag) or set ErrorIfCRDPathMissing to false while you
confirm the correct location.
- Around line 29-31: The test currently ignores errors when creating a
Namespace; change the Namespace creation logic (the call that creates a
corev1.Namespace object, e.g., client.CoreV1().Namespaces().Create or similar in
the test setup) to explicitly handle errors: import
k8s.io/apimachinery/pkg/api/errors as apierrors, then after the Create call
check if err != nil and if apierrors.IsAlreadyExists(err) continue/ignore,
otherwise fail the test (t.Fatalf or return the error). Apply the same pattern
for the other namespace-creation site referenced in the comment (the second
block around lines 81-87) so only AlreadyExists is tolerated and all other
errors cause the test to fail.
- Around line 19-24: Update the test setup in suite_test.go to avoid hard-coding
BinaryAssetsDirectory: first respect the KUBEBUILDER_ASSETS environment variable
if set (use it to set BinaryAssetsDirectory), then fall back to existing helper
GetFirstFoundEnvTestBinaryDir() or a directory-existence check for the current
hard-coded path (bin/k8s/1.30.0-<os>-<arch>), and only assign the hard-coded
path as a last resort; modify the code that sets BinaryAssetsDirectory to
perform conditional checks (os.Getenv("KUBEBUILDER_ASSETS"), os.Stat on
candidate dirs, and GetFirstFoundEnvTestBinaryDir()) so tests run correctly
outside the Makefile and when the version/path differs.
In `@python/packages/jumpstarter/jumpstarter/client/client_test.py`:
- Around line 120-124: The test's topological-order check only ensures
"grandparent" comes after "parent" but doesn't verify that "child" precedes
"parent"; update the assertions to explicitly assert that
build_order.index("child") < build_order.index("parent") (and keep/optionally
assert both "child" and "parent" are before "grandparent") so the test enforces
children-before-parents ordering; reference the existing build_order list and
the node names "child", "parent", and "grandparent" when adding the
assertion(s).
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 31-38: The docstring usage example in status_monitor.py
incorrectly calls client.start_status_monitor(); update the example to call the
actual method client.status_monitor_async() (or add a thin alias
start_status_monitor -> status_monitor_async in the client if you prefer) so the
docstring matches the real API; ensure the rest of the example still references
monitor.current_status and monitor.wait_for_status(ExporterStatus.LEASE_READY).
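The corrected usage can be illustrated with stand-in classes. `FakeClient` and `FakeMonitor` below are hypothetical; only the method name `status_monitor_async()` and the `current_status`/`wait_for_status` attributes come from the review itself:

```python
import asyncio


class FakeMonitor:
    """Hypothetical stand-in for the real StatusMonitor."""

    def __init__(self):
        self.current_status = "LEASE_READY"

    async def wait_for_status(self, target):
        # The real monitor blocks until the exporter reports `target`;
        # this sketch resolves immediately.
        return self.current_status == target


class FakeClient:
    def status_monitor_async(self):  # the actual method name, per the review
        return FakeMonitor()


async def main():
    client = FakeClient()
    monitor = client.status_monitor_async()  # not start_status_monitor()
    return await monitor.wait_for_status("LEASE_READY")


result = asyncio.run(main())
```

The point is only that the docstring example must call the method that actually exists on the client, so copy-pasted snippets don't raise `AttributeError`.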
In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 426-500: The before-lease failure branch doesn't actually end the
lease for on_failure='endLease'; modify run_before_lease_hook to accept an async
lease-release callback (e.g., end_lease: Callable[[LeaseContext],
Awaitable[None]]) or reuse an existing controller-release method, and in the
HookExecutionError else-branch (the on_failure='endLease' path) invoke await
end_lease(lease_scope) (and still set lease_scope.before_lease_hook
event/unblock connections and report status) so the lease is cleanly terminated;
update all call sites of run_before_lease_hook to pass the new end_lease
callback (mirroring how run_after_lease_hook is wired).
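A minimal sketch of the wiring the prompt describes, with a plain dict standing in for `LeaseContext` and simplified hook/release callables (all names here are illustrative, not the real signatures):

```python
import asyncio


class HookExecutionError(Exception):
    pass


async def run_before_lease_hook(lease_scope, hook, on_failure, end_lease):
    """Run the hook; on failure with on_failure='endLease', invoke the
    injected release callback so the lease is actually terminated."""
    try:
        await hook(lease_scope)
    except HookExecutionError:
        if on_failure == "endLease":
            await end_lease(lease_scope)  # cleanly end the lease
        return False
    finally:
        # Unblock waiting connections regardless of hook outcome.
        lease_scope["before_lease_hook_set"] = True
    return True


released = []


async def failing_hook(scope):
    raise HookExecutionError("hook exited non-zero")


async def release(scope):
    released.append(scope["name"])


scope = {"name": "lease-1", "before_lease_hook_set": False}
ok = asyncio.run(run_before_lease_hook(scope, failing_hook, "endLease", release))
```

Injecting the release callback (rather than having the hook executor reach into the controller directly) keeps `HookExecutor` testable and mirrors how `run_after_lease_hook` is wired.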
♻️ Duplicate comments (2)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
80-90: Unused synchronous function.
This function is no longer called anywhere in the codebase since line 213 now uses `_run_shell_with_lease_async`. Consider removing it to reduce maintenance burden.
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)
397-403: Async context manager won't start monitoring.
The `__aenter__` method returns `self` without calling `start()`. Direct usage as an async context manager (`async with StatusMonitor(...) as monitor:`) will silently fail to start background polling. Users must use `client.status_monitor_async()` instead. Either fix `__aenter__` to properly start monitoring or remove the context manager methods to prevent misuse.
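A minimal sketch of the first option (making `__aenter__` actually start polling), using a toy asyncio monitor rather than the real `StatusMonitor`:

```python
import asyncio


class Monitor:
    """Toy monitor whose __aenter__ actually starts background polling."""

    def __init__(self):
        self._task = None
        self.polls = 0

    async def start(self):
        self._task = asyncio.ensure_future(self._poll_loop())

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def _poll_loop(self):
        while True:
            self.polls += 1  # stands in for a GetStatus RPC
            await asyncio.sleep(0.01)

    async def __aenter__(self):
        await self.start()  # fix: begin polling instead of returning bare self
        return self

    async def __aexit__(self, *exc):
        await self.stop()


async def main():
    async with Monitor() as m:
        await asyncio.sleep(0.05)
        return m.polls


polls = asyncio.run(main())
```

With this shape, `async with` and the factory method behave identically, so neither entry point silently skips the background loop.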
🧹 Nitpick comments (9)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (2)
157-161: Redundant double timeout.
Both `anyio.move_on_after(300)` and `timeout=300.0` in `wait_for_any_of` set 5-minute timeouts. The inner `timeout` parameter makes the outer cancel scope redundant. Consider removing one:
♻️ Proposed fix
-        with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-            result = await monitor.wait_for_any_of(
-                [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-                timeout=300.0
-            )
+        result = await monitor.wait_for_any_of(
+            [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+            timeout=300.0
+        )
+        if result is None:
+            logger.warning("Timeout waiting for afterLease hook to complete")
Then adjust the subsequent conditional logic accordingly, removing the `timeout_scope.cancelled_caught` check at line 171.
133-136: Consider checking `connection_lost` for beforeLease hook.
The afterLease handling (line 166) checks `monitor.connection_lost` to distinguish between timeout and connection loss. The beforeLease handling treats both cases as "Timeout", which could be misleading in logs.
♻️ Proposed enhancement
 if result == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
     logger.warning("beforeLease hook failed")
 elif result is None:
-    logger.warning("Timeout waiting for beforeLease hook")
+    if monitor.connection_lost:
+        logger.warning("Connection lost during beforeLease hook")
+    else:
+        logger.warning("Timeout waiting for beforeLease hook")
python/packages/jumpstarter/jumpstarter/client/client_test.py (1)
58-58: Ensure `ExitStack` is closed after each test.
`ExitStack()` is never closed, which can leak cleanup callbacks if `client_from_channel` registers any. Prefer `with ExitStack() as mock_stack:` to guarantee teardown.
♻️ Example pattern (apply to each test)
-    mock_stack = ExitStack()
+    with ExitStack() as mock_stack:
         with patch(
             "jumpstarter.client.client.MultipathExporterStub", return_value=mock_stub
         ), patch(
             "jumpstarter.client.client.import_class", return_value=MockDriverClient
         ):
             from jumpstarter.client.client import client_from_channel

             client = await client_from_channel(
                 mock_channel, mock_portal, mock_stack, allow=[], unsafe=True
             )
Also applies to: 98-98, 143-143, 177-177, 211-211, 245-245
python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py (1)
16-32: Consider extracting shared mock utilities to a common test module.
`MockAioRpcError` and `create_mock_rpc_error` are duplicated here and in `core_test.py` (per the relevant snippets). Extracting these to a shared test utilities module (e.g., `conftest.py` or a dedicated `testing.py`) would reduce duplication.
python/packages/jumpstarter/jumpstarter/client/core.py (1)
332-351: Asymmetric behavior when monitor is not started.
`wait_for_lease_ready_monitored` raises `RuntimeError` if the monitor isn't started, while `wait_for_hook_complete_monitored` silently returns `True`. This asymmetry could be surprising. Consider documenting this behavior difference more explicitly or using consistent error handling.
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (1)
309-309: Use %-style formatting in logger calls instead of f-strings.
F-strings in logging calls evaluate the string even when the log level is disabled, causing unnecessary overhead. Use %-style formatting for deferred evaluation.
Proposed fix
-logger.warning(f"Missed {missed} status transition(s)")
+logger.warning("Missed %d status transition(s)", missed)
-logger.info(f"Status changed: {old_status} -> {new_status} (version={new_version})")
+logger.info("Status changed: %s -> %s (version=%d)", old_status, new_status, new_version)
-logger.error(f"Status change callback error: {e}")
+logger.error("Status change callback error: %s", e)
-logger.debug(f"GetStatus poll error: {e.code()}")
+logger.debug("GetStatus poll error: %s", e.code())
-logger.debug(f"GetStatus poll error: {e}")
+logger.debug("GetStatus poll error: %s", e)
Also applies to: 320-320, 337-337, 352-354
controller/internal/service/suite_test.go (1)
90-93: Guard teardown if setup failed.
If `testEnv` never initialized, `Stop()` can panic in `AfterSuite`. A simple nil check avoids cascading failures.
🛡️ Suggested guard
 By("tearing down the test environment")
-err := testEnv.Stop()
-Expect(err).NotTo(HaveOccurred())
+if testEnv != nil {
+    err := testEnv.Stop()
+    Expect(err).NotTo(HaveOccurred())
+}
controller/Makefile (1)
66-75: Guard against duplicate `metadata.annotations` blocks.
Line 66–71 unconditionally inserts a new `annotations:` map. If the generated role ever includes annotations (now or later), this can create duplicate keys and potentially drop the sync-wave or other annotations depending on YAML parsing behavior. Consider merging into an existing annotations map or using a YAML-aware tool for the insert.
python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)
341-356: Use `anyio.to_thread.run_sync()` with timeout for blocking `process.wait()` after `process.kill()`.
The direct `process.wait()` call at line 354 blocks the event loop if the process doesn't exit promptly. The same block already demonstrates the correct pattern at line 346–347 with `anyio.to_thread.run_sync()` and `anyio.move_on_after()`. Apply the same pattern here for consistency and robustness:
Safer non-blocking wait
 if process.poll() is None:
     process.kill()
     try:
-        process.wait()
+        with anyio.move_on_after(5):
+            await anyio.to_thread.run_sync(process.wait)
     except Exception:
         pass
import (
    "context"
    "fmt"
    "path/filepath"
    "runtime"
    "testing"
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the suite_test.go file to see the current implementation
find . -name "suite_test.go" -path "*/controller/internal/service/*"
Repository: jumpstarter-dev/jumpstarter
Length of output: 114
🏁 Script executed:
# Search for the file more broadly
find . -name "suite_test.go" | grep -E "(controller|service)"
Repository: jumpstarter-dev/jumpstarter
Length of output: 321
🏁 Script executed:
# List files in the controller directory
git ls-files | grep -E "controller.*service.*test" | head -20
Repository: jumpstarter-dev/jumpstarter
Length of output: 234
🏁 Script executed:
cat -n ./controller/internal/service/suite_test.go
Repository: jumpstarter-dev/jumpstarter
Length of output: 3574
🌐 Web query:
controller-runtime envtest BinaryAssetsDirectory KUBEBUILDER_ASSETS best practices
💡 Result:
Best practices (controller-runtime envtest) for BinaryAssetsDirectory / KUBEBUILDER_ASSETS
- Prefer driving asset selection from your build/test harness (Makefile/CI) via KUBEBUILDER_ASSETS, not hardcoding paths in Go.
  Kubebuilder's recommended pattern is to have `setup-envtest` pick the right binaries and export the directory for the test run, e.g. `KUBEBUILDER_ASSETS="$(setup-envtest use $(ENVTEST_K8S_VERSION) ... -p path)" go test ./...`. This keeps tests reproducible across dev machines and CI. [2]
- Use `BinaryAssetsDirectory` only when you explicitly want to control/share the on-disk cache location from Go code.
  Controller-runtime provides `SetupEnvtestDefaultBinaryAssetsDirectory()` specifically so you can point `Environment.BinaryAssetsDirectory` at the same OS-specific directory that `setup-envtest` uses (so runs can share binaries) without hardcoding OS paths. [1]
- Avoid relying on the historical default `/usr/local/kubebuilder/bin`.
  Envtest loads binaries from `/usr/local/kubebuilder/bin` by default, but best practice is to override via KUBEBUILDER_ASSETS (or explicit `TEST_ASSET_*` vars) for portability and hermetic CI. [1]
- For offline/air-gapped or "no network during tests" setups, use installed-only mode.
  Use `setup-envtest ... -i` (or `ENVTEST_INSTALLED_ONLY=true`) so it won't try to download binaries during `go test`. If you already manage binaries elsewhere, `--use-env -i` (or `ENVTEST_USE_ENV=true`) will respect an externally-provided KUBEBUILDER_ASSETS. [2][4]
- Ensure you're using the current envtest artifact index/location (don't depend on the old GCS bucket).
  The old `storage.googleapis.com/kubebuilder-tools` artifacts are deprecated; newer versions are promoted via controller-tools' `envtest-releases.yaml`, and you need `setup-envtest` from controller-runtime `v0.19.0+` to transparently use the new location. [3]
🏁 Script executed:
# Check if other test suites in the repo use conditional BinaryAssetsDirectory
rg "BinaryAssetsDirectory" -A 3 -B 1 controller/
Repository: jumpstarter-dev/jumpstarter
Length of output: 3875
🏁 Script executed:
# Check if KUBEBUILDER_ASSETS is used anywhere in the codebase
rg "KUBEBUILDER_ASSETS" .
Repository: jumpstarter-dev/jumpstarter
Length of output: 613
🏁 Script executed:
# Examine the conditional approach used in other test files
cat controller/deploy/operator/internal/controller/testutils/envtest.go | head -50
Repository: jumpstarter-dev/jumpstarter
Length of output: 1885
Use a conditional check for BinaryAssetsDirectory instead of hard-coding a fixed path.
The current code unconditionally sets BinaryAssetsDirectory to bin/k8s/1.30.0-<os>-<arch>, which will fail when:
- Running tests via IDE or direct `go test` invocation (outside the Makefile that sets KUBEBUILDER_ASSETS)
- Binaries haven't been downloaded to that specific path
- The Kubernetes version is upgraded
Other test suites in the codebase already use conditional logic (e.g., controller/deploy/operator/ uses GetFirstFoundEnvTestBinaryDir()). Consider making this check conditional on directory existence, or respecting the KUBEBUILDER_ASSETS environment variable as the primary source.
🔧 Suggested change (conditional BinaryAssetsDirectory)
import (
"context"
"fmt"
+ "os"
"path/filepath"
"runtime"
"testing"
@@
By("bootstrapping test environment")
testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{filepath.Join("..", "..", "deploy", "helm", "jumpstarter", "crds")},
ErrorIfCRDPathMissing: true,
-
- // The BinaryAssetsDirectory is only required if you want to run the tests directly
- // without calling the makefile target test. If not informed it will look for the
- // default path defined in controller-runtime which is /usr/local/kubebuilder/.
- // Note that you must have the required binaries setup under the bin directory to perform
- // the tests directly. When we run make test it will be setup and used automatically.
- BinaryAssetsDirectory: filepath.Join("..", "..", "bin", "k8s",
- fmt.Sprintf("1.30.0-%s-%s", runtime.GOOS, runtime.GOARCH)),
}
+
+ // Only pin envtest binaries if the expected directory exists
+ binPath := filepath.Join("..", "..", "bin", "k8s",
+ fmt.Sprintf("1.30.0-%s-%s", runtime.GOOS, runtime.GOARCH))
+ if _, err := os.Stat(binPath); err == nil {
+ testEnv.BinaryAssetsDirectory = binPath
+ }🤖 Prompt for AI Agents
In `@controller/internal/service/suite_test.go` around lines 19 - 24, Update the
test setup in suite_test.go to avoid hard-coding BinaryAssetsDirectory: first
respect the KUBEBUILDER_ASSETS environment variable if set (use it to set
BinaryAssetsDirectory), then fall back to existing helper
GetFirstFoundEnvTestBinaryDir() or a directory-existence check for the current
hard-coded path (bin/k8s/1.30.0-<os>-<arch>), and only assign the hard-coded
path as a last resort; modify the code that sets BinaryAssetsDirectory to
perform conditional checks (os.Getenv("KUBEBUILDER_ASSETS"), os.Stat on
candidate dirs, and GetFirstFoundEnvTestBinaryDir()) so tests run correctly
outside the Makefile and when the version/path differs.
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes/scheme"
🧩 Analysis chain
🏁 Script executed:
# Find the suite_test.go file and read it
find . -name "suite_test.go" -path "*/controller/internal/service/*" | head -5
Repository: jumpstarter-dev/jumpstarter
Length of output: 114
🏁 Script executed:
# Read the suite_test.go file to see the code at the mentioned lines
cat -n ./controller/internal/service/suite_test.go | head -100
Repository: jumpstarter-dev/jumpstarter
Length of output: 3574
🏁 Script executed:
# Verify apierrors package exists and IsAlreadyExists is the correct function
web_search "k8s.io/apimachinery/pkg/api/errors IsAlreadyExists golang"
Repository: jumpstarter-dev/jumpstarter
Length of output: 121
🌐 Web query:
k8s.io/apimachinery pkg api errors IsAlreadyExists usage golang kubernetes
💡 Result:
In Go, k8s.io/apimachinery/pkg/api/errors.IsAlreadyExists(err) is the standard way to check whether a Kubernetes API call failed because the object you tried to create already exists. It returns true for “already exists” API StatusErrors (and also supports wrapped errors). [1]
Typical usage (client-go)
import (
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
corev1 "k8s.io/api/core/v1"
)
cm := &corev1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: "demo",
Namespace: "default",
},
Data: map[string]string{"k": "v"},
}
_, err := clientset.CoreV1().ConfigMaps("default").Create(ctx, cm, metav1.CreateOptions{})
if err != nil {
if apierrors.IsAlreadyExists(err) {
// idempotent create: ignore or fetch/update instead
return nil
}
return err
}
return nil
Notes
- Use it after `Create(...)` (or any call that can return an "already exists" response). [1]
- The function signature is `func IsAlreadyExists(err error) bool`. [1]
Sources:
[1] pkg.go.dev: k8s.io/apimachinery/pkg/api/errors (IsAlreadyExists) (pkg.go.dev)
Don't ignore namespace creation errors.
Ignoring the error can mask real failures (RBAC, invalid config, etc.). Handle AlreadyExists explicitly and fail on other errors.
✅ Suggested fix
import (
corev1 "k8s.io/api/core/v1"
+ apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes/scheme"
 )

 ns := &corev1.Namespace{
ObjectMeta: metav1.ObjectMeta{
Name: "default",
},
}
- _ = k8sClient.Create(ctx, ns)
+ if err := k8sClient.Create(ctx, ns); err != nil && !apierrors.IsAlreadyExists(err) {
+ Expect(err).NotTo(HaveOccurred())
+ }
Also applies to: 81-87
🤖 Prompt for AI Agents
In `@controller/internal/service/suite_test.go` around lines 29 - 31, The test
currently ignores errors when creating a Namespace; change the Namespace
creation logic (the call that creates a corev1.Namespace object, e.g.,
client.CoreV1().Namespaces().Create or similar in the test setup) to explicitly
handle errors: import k8s.io/apimachinery/pkg/api/errors as apierrors, then
after the Create call check if err != nil and if apierrors.IsAlreadyExists(err)
continue/ignore, otherwise fail the test (t.Fatalf or return the error). Apply
the same pattern for the other namespace-creation site referenced in the comment
(the second block around lines 81-87) so only AlreadyExists is tolerated and all
other errors cause the test to fail.
# Children should be built before parents in topological order
# The order should be child, parent, grandparent (or parent, child, grandparent)
assert "grandparent" in build_order
assert build_order.index("grandparent") > build_order.index("parent")
Strengthen the topological-order assertion.
The test claims “children before parents” but only checks that grandparent is after parent. It never asserts child is before parent, and the comment even allows the opposite order.
✅ Suggested assertion update
- # The order should be child, parent, grandparent (or parent, child, grandparent)
- assert "grandparent" in build_order
- assert build_order.index("grandparent") > build_order.index("parent")
+ # The order should be child -> parent -> grandparent
+ assert "grandparent" in build_order
+ assert build_order.index("parent") > build_order.index("child")
+ assert build_order.index("grandparent") > build_order.index("parent")🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/client_test.py` around lines
120 - 124, The test's topological-order check only ensures "grandparent" comes
after "parent" but doesn't verify that "child" precedes "parent"; update the
assertions to explicitly assert that build_order.index("child") <
build_order.index("parent") (and keep/optionally assert both "child" and
"parent" are before "grandparent") so the test enforces children-before-parents
ordering; reference the existing build_order list and the node names "child",
"parent", and "grandparent" when adding the assertion(s).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@controller/Makefile`:
- Line 74: The Makefile currently runs the cp command copying CRDs to
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/, which
will fail if that target directory doesn't exist; update the Makefile to create
the directory first by running a mkdir -p for
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
immediately before the cp command so the cp in the Makefile succeeds even when
the templates/crds directory is missing.
🧹 Nitpick comments (1)
controller/Makefile (1)
66-71: Idempotency risk: duplicate `annotations:` keys if file already contains annotations.
The AWK script unconditionally inserts an `annotations:` block after the `name:` line. If controller-gen ever outputs annotations in the generated role.yaml, or if the manifests target runs without a fresh regeneration, this will produce invalid YAML with duplicate `annotations:` keys.
Consider using a more robust approach that checks for existing annotations before inserting:
♻️ Suggested safer AWK approach
 # Add ArgoCD sync-wave annotation to RBAC role for proper deployment ordering (PR #207)
-	@awk '/^  name: jumpstarter-manager-role$$/{print; print "  annotations:"; print "    argocd.argoproj.io/sync-wave: \"-1\""; next}1' \
+	@awk ' \
+	/^  name: jumpstarter-manager-role$$/ { print; found_name=1; next } \
+	found_name && /^  annotations:/ { print; found_annotations=1; next } \
+	found_name && found_annotations && /^  [a-zA-Z]/ && !/^    / { \
+		print "    argocd.argoproj.io/sync-wave: \"-1\""; found_name=0; found_annotations=0 \
+	} \
+	found_name && !found_annotations && /^  [a-zA-Z]/ { \
+		print "  annotations:"; print "    argocd.argoproj.io/sync-wave: \"-1\""; found_name=0 \
+	} \
+	{ print }' \
 	deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml > \
 	deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml.tmp && \
 	mv deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml.tmp \
 	deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml
Alternatively, consider using a tool like `yq` for safer YAML manipulation.
Also, the comment references "PR #207" but this appears to be PR #140.
	deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/rbac/role.yaml

	cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/
	cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the target directory exists in the repository
find deploy/helm/jumpstarter/charts/jumpstarter-controller/templates -type d -name "crds" 2>/dev/null
# Check if it's tracked in git at all
git ls-tree -r HEAD deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds 2>/dev/null || echo "No files found in git"
# List what's in templates directory
echo "=== Contents of templates directory ==="
ls -la deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/ 2>/dev/null || echo "Directory does not exist"
# Check the Makefile around line 74 to see context
echo "=== Makefile context around line 74 ==="
sed -n '70,80p' controller/Makefile
Repository: jumpstarter-dev/jumpstarter
Length of output: 764
🏁 Script executed:
#!/bin/bash
# Check if deploy/helm/jumpstarter/charts directory exists
echo "=== Checking helm chart directory structure ==="
ls -la deploy/helm/jumpstarter/charts/ 2>/dev/null || echo "charts directory does not exist"
# Search Makefile for any mkdir commands
echo "=== Searching for mkdir in Makefile ==="
grep -n "mkdir" controller/Makefile || echo "No mkdir found"
# Check if there's a target that creates the templates directory
echo "=== Search for directory creation patterns ==="
grep -n "templates" controller/Makefile
# Check the generate-crds or similar target
echo "=== Full Makefile targets related to CRD or helm ==="
grep -n "^[a-zA-Z].*:" controller/Makefile | grep -E "(crd|helm|generate)"
Repository: jumpstarter-dev/jumpstarter
Length of output: 1423
🏁 Script executed:
#!/bin/bash
# Get the full target definition that contains line 74
echo "=== Full target containing the cp command ==="
sed -n '60,80p' controller/Makefile
# Check if there's an earlier step that might create the directory
echo "=== Check for any manifests or controller-gen target ==="
sed -n '1,100p' controller/Makefile | grep -A 20 "^manifests:"
# Look for the controller-gen command
echo "=== Look for controller-gen configuration ==="
grep -B 5 -A 10 "controller-gen" controller/Makefile | head -40
Repository: jumpstarter-dev/jumpstarter
Length of output: 4959
🏁 Script executed:
#!/bin/bash
# Check if there are any .gitkeep or placeholder files in the charts/templates directories
echo "=== Check for placeholder files ==="
find deploy/helm/jumpstarter/charts -type f 2>/dev/null | head -20
# Check the full directory structure of deploy/helm
echo "=== Full helm structure ==="
find deploy/helm -type d 2>/dev/null | sort
# Check if crds directory exists at all
echo "=== Search for crds directories ==="
find deploy -type d -name "crds" 2>/dev/null
Repository: jumpstarter-dev/jumpstarter
Length of output: 170
🏁 Script executed:
#!/bin/bash
# Check what exists in deploy directory
echo "=== Full deploy directory structure ==="
find deploy -type d 2>/dev/null | head -30
# Check if operator directory structure exists
echo "=== Check operator crd/bases directory ==="
ls -la deploy/operator/config/crd/bases/ 2>/dev/null || echo "Directory does not exist"
# Check if there are any crds in the source
echo "=== Check source crds ==="
ls -la deploy/helm/jumpstarter/crds/ 2>/dev/null || echo "CRDs source directory does not exist"
Repository: jumpstarter-dev/jumpstarter
Length of output: 241
🏁 Script executed:
#!/bin/bash
# List everything in deploy directory
echo "=== All files in deploy/ recursively ==="
find deploy -type f 2>/dev/null | head -50
# Check if deploy directory itself exists
echo "=== Deploy directory contents ==="
ls -la deploy/ 2>/dev/null
# Count total files in entire repo
echo "=== Repository file count and structure ==="
find . -type f -not -path './.git/*' | wc -l
echo "Total directories:"
find . -type d -not -path './.git/*' | wc -l
Repository: jumpstarter-dev/jumpstarter
Length of output: 216
🏁 Script executed:
#!/bin/bash
# Check git status of deploy directory
echo "=== Check if deploy is tracked in git ==="
git ls-tree -d HEAD deploy 2>/dev/null || echo "deploy directory not in git"
# List top-level directories
echo "=== Top-level directories ==="
ls -la | grep '^d'
# Check if there's a .gitkeep or similar in deploy
echo "=== Hidden files in deploy ==="
ls -la deploy/ 2>/dev/null
# Check recent commits to understand if deploy is supposed to exist
echo "=== Check if deploy paths were ever added ==="
git log --all --full-history --oneline -- "deploy/helm/*" "deploy/operator/*" 2>/dev/null | head -5
🏁 Script executed:
#!/bin/bash
# Check if deploy directory exists at all
echo "=== Does deploy directory exist? ==="
test -d deploy && echo "Yes" || echo "No"
# List actual path with stat
echo "=== Using stat on deploy ==="
stat deploy 2>&1 || echo "deploy does not exist"
# Check the controller Makefile more carefully for context
echo "=== Check lines around the change (60-80) ==="
sed -n '60,80p' controller/Makefile
# Check if there's documentation about running manifests
echo "=== Check README or setup docs ==="
ls -la *.md controller/*.md 2>/dev/null | head -10
🏁 Script executed:
#!/bin/bash
# Check what controller-gen does - look at how it's invoked and if it creates directories
echo "=== Check controller version and documentation ==="
grep -r "controller-gen" controller/ | grep -v ".o\|Binary" | head -5
# Check if there's documentation about directory creation
echo "=== Check if there are any docs about the build process ==="
cat README.md | head -50
# Check controller/README.md
echo "=== Check controller README ==="
cat controller/README.md
Add directory creation before copying CRDs to the Helm chart templates.
The `cp` command on line 74 will fail because `deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/` doesn't exist. While controller-gen creates the output directories it writes to, it doesn't create the `templates/crds/` subdirectory. Add `mkdir -p` before the copy.
♻️ Suggested fix
  cp deploy/helm/jumpstarter/crds/* deploy/operator/config/crd/bases/
+ mkdir -p deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
  cp deploy/helm/jumpstarter/crds/* deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
🤖 Prompt for AI Agents
In `@controller/Makefile` at line 74, The Makefile currently runs the cp command
copying CRDs to
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/, which
will fail if that target directory doesn't exist; update the Makefile to create
the directory first by running a mkdir -p for
deploy/helm/jumpstarter/charts/jumpstarter-controller/templates/crds/
immediately before the cp command so the cp in the Makefile succeeds even when
the templates/crds directory is missing.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/shell.py`:
- Around line 125-143: The code currently launches the shell unconditionally
after monitor.wait_for_any_of; change it to treat
ExporterStatus.BEFORE_LEASE_HOOK_FAILED and a None (timeout) as hard stops by
not calling _run_shell_only. After the wait_for_any_of call
(monitor.wait_for_any_of), check result and if it equals
ExporterStatus.BEFORE_LEASE_HOOK_FAILED or is None, log the warning and exit the
function (or raise an appropriate exception) instead of proceeding to call
anyio.to_thread.run_sync(_run_shell_only, lease, config, command, path); only
call _run_shell_only when result == ExporterStatus.LEASE_READY.
♻️ Duplicate comments (5)
python/packages/jumpstarter/jumpstarter/exporter/exporter.py (4)
202-204: Controller channels are not closed after use.
This creates a new gRPC channel per call but never closes it. Multiple calls during status updates and stream retries will leak channels. The pattern in `_unregister_with_controller()` (lines 357-369) demonstrates proper cleanup with an explicit `await channel.close()`.
414-458: Task may hang when lease ends without EndSession.
This task only waits on `end_session_requested` (line 426). If the lease ends without the client calling EndSession, the task continues waiting indefinitely on the old lease context. Consider racing against `lease_ended` or using a task group scoped to the lease lifetime.
546-553: Listen task should be scoped to the lease lifetime.
The `_retry_stream` task for Listen is started on the outer `tg` (line 548), causing it to continue running after `handle_lease` exits. This results in the task persisting across lease transitions. Move this to `conn_tg` so it is automatically cancelled when the lease ends, and close `listen_tx` in the finally block to signal proper stream termination.
578-579: Consider scoping `_handle_end_session` to the lease task group.
Starting this task on the outer `tg` means it survives the lease ending. Combined with the fact that it only waits on `end_session_requested`, the task can be orphaned. This is related to the hang issue noted in the `_handle_end_session` method.
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
80-89: Consider removing `_run_shell_with_lease` if it's now unused.
The async path appears to be the only entry point, so this sync helper likely adds maintenance overhead.
#!/bin/bash
# Verify remaining usages of the sync helper
rg -nP '\b_run_shell_with_lease\s*\(' --type py
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 8-18: The import block is unsorted per Ruff I001; reorder imports
into standard-library first (logging, collections.abc.Awaitable,
collections.abc.Callable, contextlib.AsyncExitStack), then third-party packages
(anyio, anyio.Event, anyio.create_task_group, grpc.StatusCode,
grpc.aio.AioRpcError, jumpstarter_protocol.jumpstarter_pb2), and finally
local/project imports (jumpstarter.common.ExporterStatus) so the import order
satisfies the linter and keeps the existing imported symbols intact.
- Around line 196-201: The timeout checks in wait_for_status() and
wait_for_any_of() incorrectly use "if timeout:" which treats 0 as falsy; change
both checks to "if timeout is not None:" so a timeout of 0 is handled as an
immediate timeout; update the branches around anyio.move_on_after in
wait_for_status() and wait_for_any_of() (the blocks that call wait_loop() and
return False) to use the new condition.
- Around line 396-400: stop() currently sets _running = False and signals
_stop_event but does not wake waiters blocked on _any_change_event; update
stop() to also set _any_change_event (and optionally _any_change_event.set()
followed by _any_change_event.clear() if you want a one-shot wake) so waiters in
wait_for_status and wait_for_any_of that are awaiting _any_change_event.wait()
will wake and observe _running == False and exit their loops; reference the stop
method and the _any_change_event, wait_for_status, and wait_for_any_of functions
when applying this change.
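The wake-on-stop pattern this prompt describes can be sketched in isolation. Below is a minimal model with hypothetical names (`Monitor`, `wait_for_change`), using `asyncio` in place of anyio; anyio's `Event` has no `clear()`, which is why the real fix sets the old event and then swaps in a fresh one:

```python
import asyncio


class Monitor:
    def __init__(self):
        self._running = True
        self._any_change_event = asyncio.Event()

    async def wait_for_change(self):
        # Waiters re-check _running after every wake-up.
        while self._running:
            await self._any_change_event.wait()
        return "stopped"

    async def stop(self):
        self._running = False
        # One-shot wake: set the current event so blocked waiters resume,
        # then replace it so future waiters block on a fresh, unset event.
        self._any_change_event.set()
        self._any_change_event = asyncio.Event()


async def main():
    m = Monitor()
    waiter = asyncio.create_task(m.wait_for_change())
    await asyncio.sleep(0)  # let the waiter block on the event
    await m.stop()
    return await asyncio.wait_for(waiter, timeout=1)


result = asyncio.run(main())
print(result)  # stopped
```

Without the `set()` in `stop()`, the `wait_for(..., timeout=1)` would raise `TimeoutError` instead — exactly the hang the review describes.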
In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 168-191: The stop() method currently ignores the caller's
should_unregister when wait_for_lease_exit is True, causing later calls (e.g.,
in serve()) to override intent; update stop() so that whenever a deferred stop
is requested (the branch setting self._stop_requested = True) you also set
self._unregister = should_unregister (and likewise when a stop has already been
requested you should update/preserve self._unregister accordingly) so the
original caller preference is retained; refer to the stop() method, the
attributes self._unregister, self._stop_requested, and self._tg.cancel_scope to
implement this change.
🧹 Nitpick comments (1)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
147-162: Remove redundant timeout wrapper and improve timeout detection logic.
The `StatusMonitor.connection_lost` property is valid and well-tested. However, the outer `anyio.move_on_after(30)` is redundant: `wait_for_any_of()` already has an identical `timeout=30.0` parameter. When both are equal, the inner timeout fires first, so the outer scope's `cancelled_caught` flag is never True. This means timeout cases currently fall through to "Hook completion not confirmed" rather than the timeout warning. The suggested refactor addresses this by removing the outer timeout wrapper and using `result is None` to detect timeouts, which disambiguates between connection loss and timeout.
♻️ Suggested simplification
-with anyio.move_on_after(30) as timeout_scope:  # 30 second timeout
-    result = await monitor.wait_for_any_of(
-        [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-        timeout=30.0
-    )
-    if result == ExporterStatus.AVAILABLE:
-        logger.info("afterLease hook completed")
-    elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
-        logger.warning("afterLease hook failed")
-    elif monitor.connection_lost:
-        # Connection lost - exporter closed, hook likely completed
-        logger.info("Exporter connection closed (hook completed)")
-    else:
-        logger.debug("Hook completion not confirmed")
-if timeout_scope.cancelled_caught:
-    logger.warning("Timeout waiting for afterLease hook to complete")
+result = await monitor.wait_for_any_of(
+    [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+    timeout=30.0
+)
+if result == ExporterStatus.AVAILABLE:
+    logger.info("afterLease hook completed")
+elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
+    logger.warning("afterLease hook failed")
+elif result is None and monitor.connection_lost:
+    # Connection lost - exporter closed, hook likely completed
+    logger.info("Exporter connection closed (hook completed)")
+elif result is None:
+    logger.warning("Timeout waiting for afterLease hook to complete")
+else:
+    logger.debug("Hook completion not confirmed")
if timeout:
    with anyio.move_on_after(timeout):
        return await wait_loop()
    return False
else:
    return await wait_loop()
🧩 Analysis chain
🏁 Script executed:
cd /tmp && find . -name "status_monitor.py" -type f 2>/dev/null | head -5
🏁 Script executed:
# First, let's find the file and examine the relevant lines
find . -name "status_monitor.py" -type f 2>/dev/null
🏁 Script executed:
# If found, read the file to see context around the mentioned lines
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py 2>/dev/null | sed -n '190,210p'
🏁 Script executed:
# Also check the second location (lines 280-285)
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py 2>/dev/null | sed -n '275,295p'
🏁 Script executed:
# Search for function calls to wait_for_status and wait_for_any_of
rg "wait_for_status\(|wait_for_any_of\(" --type=py -A 3 -B 1
🏁 Script executed:
# Search for timeout=0 patterns
rg "timeout\s*=\s*0" --type=py -B 2 -A 2
🏁 Script executed:
# Check if there's any documentation or usage of anyio.move_on_after in the codebase
rg "anyio.move_on_after|move_on_after" --type=py -B 1 -A 1
🏁 Script executed:
# Check the function definitions to understand the full context
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py | sed -n '155,210p'
🏁 Script executed:
# Look at the second function to confirm it's identical pattern
cat -n python/packages/jumpstarter/jumpstarter/client/status_monitor.py | sed -n '245,290p'
Use `if timeout is not None:` to correctly handle `timeout=0`.
The condition `if timeout:` treats 0 as falsy, which causes an immediate timeout (`timeout=0`) to instead wait indefinitely. This occurs in both `wait_for_status()` and `wait_for_any_of()` because the `timeout` parameter accepts `float | None`, where 0 is a valid value distinct from None.
Affected locations
- Lines 196-201 in `wait_for_status()`
- Lines 280-285 in `wait_for_any_of()`
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py` around
lines 196 - 201, The timeout checks in wait_for_status() and wait_for_any_of()
incorrectly use "if timeout:" which treats 0 as falsy; change both checks to "if
timeout is not None:" so a timeout of 0 is handled as an immediate timeout;
update the branches around anyio.move_on_after in wait_for_status() and
wait_for_any_of() (the blocks that call wait_loop() and return False) to use the
new condition.
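The root cause is plain Python truthiness, which can be shown without any async machinery; `pick_branch_buggy` and `pick_branch_fixed` are hypothetical helpers that mirror the two versions of the check:

```python
def pick_branch_buggy(timeout):
    # Buggy: 0 is falsy, so timeout=0 falls into the "wait forever" path
    if timeout:
        return "bounded"
    return "unbounded"


def pick_branch_fixed(timeout):
    # Fixed: only None means "no deadline"; 0 means "time out immediately"
    if timeout is not None:
        return "bounded"
    return "unbounded"


print(pick_branch_buggy(0))     # unbounded  <- the bug
print(pick_branch_fixed(0))     # bounded
print(pick_branch_fixed(None))  # unbounded
```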
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/lease.py`:
- Around line 237-257: The fixed-count retry loop around self.controller.Dial
(using max_retries/base_delay/max_delay) should be replaced with a time‑based
retry bounded by a configurable timeout (use the existing acquisition_timeout if
available or add a new dial_timeout attribute/parameter) so Dial keeps retrying
until time.monotonic() exceeds the deadline; compute deadline = now + timeout,
then loop attempting await self.controller.Dial(...), on AioRpcError with
FAILED_PRECONDITION and "not ready" apply the same exponential/backoff delay
(min(base_delay*2**attempt, max_delay)) but stop retrying and re-raise once
monotonic() > deadline; update the retry log to show remaining time or attempts
left and ensure the new timeout is plumbed through the Lease acquisition
call/site that calls this Dial logic.
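A minimal sketch of the time-bounded retry this prompt asks for, with a hypothetical `dial` callable standing in for `self.controller.Dial` and synchronous sleeps in place of anyio:

```python
import time


def dial_with_deadline(dial, timeout=5.0, base_delay=0.01, max_delay=0.05):
    """Retry `dial` with exponential backoff until a monotonic deadline passes."""
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        try:
            return dial()
        except ConnectionError:
            if time.monotonic() > deadline:
                raise  # out of time: surface the last "not ready" error
            delay = min(base_delay * 2**attempt, max_delay)
            # Never sleep past the deadline
            time.sleep(min(delay, max(0.0, deadline - time.monotonic())))
            attempt += 1


# Usage: a stub that reports "not ready" twice, then succeeds.
calls = {"n": 0}


def flaky_dial():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("exporter not ready")
    return "connected"


result = dial_with_deadline(flaky_dial, timeout=1.0)
print(result)  # connected
```

The key design difference from a fixed `max_retries` is that slow backoff steps cannot silently extend the total wait: the deadline bounds wall-clock time regardless of how many attempts fit inside it.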
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 343-347: In _poll_loop (status_monitor.py) the AioRpcError
UNIMPLEMENTED branch currently breaks without clearing _running or notifying
waiters; update that branch to mirror the UNAVAILABLE handling by setting
self._running = False and calling self._any_change_event.set() (and any other
cleanup done in the UNAVAILABLE branch) before breaking so wait_for_status() and
wait_for_any_of() don't hang; also add a unit test that triggers UNIMPLEMENTED
while a waiter (using wait_for_status() or wait_for_any_of()) is active and
assert the waiter returns promptly.
♻️ Duplicate comments (3)
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (3)
8-18: Reorder imports to satisfy Ruff (I001).
The current ordering may still fail Ruff I001.
🔧 Proposed import order
-import logging
-from collections.abc import Awaitable, Callable
-from contextlib import AsyncExitStack
+from collections.abc import Awaitable, Callable
+from contextlib import AsyncExitStack
+import logging
Please verify with:
#!/bin/bash
ruff check python/packages/jumpstarter/jumpstarter/client/status_monitor.py
Based on learnings, Ruff is the formatter/linter for Python files.
196-201: Handle `timeout=0` explicitly.
`if timeout:` treats `0` as falsy, so an immediate timeout becomes an infinite wait (Line 196 and Line 280).
🐛 Proposed fix
-if timeout:
+if timeout is not None:
     with anyio.move_on_after(timeout):
         return await wait_loop()
     return False

-if timeout:
+if timeout is not None:
     with anyio.move_on_after(timeout):
         return await wait_for_first()
     return None
Optional verification:
#!/bin/bash
rg -n "wait_for_status|wait_for_any_of|timeout\s*=\s*0" --type=py
Also applies to: 280-285
396-400: Wake waiting coroutines on stop().
`stop()` doesn't signal `_any_change_event`, so `wait_for_*` can hang after shutdown.
🛠️ Proposed fix
 async def stop(self):
     """Stop the background polling task."""
     self._running = False
     self._stop_event.set()
+    # Wake any waiters so they can observe _running=False
+    self._any_change_event.set()
+    self._any_change_event = Event()
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@e2e/dex.values.yaml`:
- Around line 37-40: The userID value for the test user "test-exporter-hooks" is
not a valid UUID because it contains the non-hex character 'g' (value:
c6ed6f40-6689-6g7e-c64c-55d0a9a5871f); replace this with a valid UUID (only 0-9
and a-f), e.g., generate a new v4 UUID and set userID to that new value so the
entry for username "test-exporter-hooks" and its userID field are valid.
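The invalid value is easy to catch mechanically with the standard `uuid` module; `is_valid_uuid` is a hypothetical helper for illustration:

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    # uuid.UUID raises ValueError on non-hex digits such as 'g'
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


print(is_valid_uuid("c6ed6f40-6689-6g7e-c64c-55d0a9a5871f"))  # False ('g' is not hex)
print(is_valid_uuid(str(uuid.uuid4())))                       # True
```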
🧹 Nitpick comments (5)
e2e/tests-hooks.bats (5)
9-10: Unused variable declaration.
`CURRENT_HOOKS_CONFIG` is declared but never referenced anywhere in the file. Consider removing it to avoid confusion.
Suggested fix
-# Track which config is currently active
-CURRENT_HOOKS_CONFIG=""
51-54: Consider graceful termination before SIGKILL.
Using `kill -9` immediately doesn't give the process a chance to clean up. Consider sending `SIGTERM` first with a brief grace period before escalating to `SIGKILL`.
Suggested improvement
 if ps -p "$pid" > /dev/null 2>&1; then
     echo "  Killing PID $pid" >&2
-    kill -9 "$pid" 2>/dev/null || true
+    kill -TERM "$pid" 2>/dev/null || true
+    sleep 0.5
+    ps -p "$pid" > /dev/null 2>&1 && kill -9 "$pid" 2>/dev/null || true
 fi
104-110: Fixed sleep may introduce flakiness.
The 2-second sleep on line 107 is intended to avoid catching pre-connect state, but this is timing-dependent. If the exporter connects faster or slower than expected, tests could become flaky. Consider polling for a specific pre-condition instead, or documenting why 2 seconds is sufficient.
205-208: Add safeguards for empty grep results.
If the grep commands fail to find matches (e.g., due to unexpected output format), `before_pos` or `after_pos` could be empty, causing the comparison on line 208 to fail with a shell error. Consider adding validation or using `|| true` with a fallback.
Suggested improvement
 # Verify order: BEFORE should appear before AFTER in output
 local before_pos=$(echo "$output" | grep -n "BEFORE_HOOK:" | head -1 | cut -d: -f1)
 local after_pos=$(echo "$output" | grep -n "AFTER_HOOK:" | head -1 | cut -d: -f1)
+# Ensure both markers were found
+[ -n "$before_pos" ] && [ -n "$after_pos" ]
 [ "$before_pos" -lt "$after_pos" ]
315-333: Tests lack verification of actual `j` command effects.
Tests E1 and E2 only verify that the hook scripts complete (via output markers), but don't verify that `j power on/off` commands actually executed or had the expected effect. Consider adding assertions that verify the power state changed, or at minimum checking for command output in the hook logs.
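The fixed-sleep concern above generalizes: polling a condition with a deadline removes the timing guess entirely. A sketch in Python for consistency with the rest of this review (the bats test would use an equivalent shell loop); `poll_until` is a hypothetical helper:

```python
import time


def poll_until(predicate, timeout=5.0, interval=0.05):
    """Return True as soon as predicate() holds, False once the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one last check at the deadline


# Usage: a stand-in condition that becomes true ~150ms after start,
# modeling "the exporter has connected".
start = time.monotonic()
result = poll_until(lambda: time.monotonic() - start >= 0.15, timeout=2.0)
print(result)  # True
```

Unlike a fixed `sleep 2`, this returns as soon as the condition holds and fails loudly (returns False) when it never does, instead of silently proceeding on a stale state.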
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py`:
- Around line 358-365: The UNAVAILABLE branch breaks the poll loop but doesn't
update the monitor's running state, so set self._running = False when handling
StatusCode.UNAVAILABLE; specifically, inside the elif e.code() ==
StatusCode.UNAVAILABLE block update self._running to False alongside setting
self._connection_lost, trigger and recreate self._any_change_event as shown, and
ensure start() can restart the monitor after this exit.
♻️ Duplicate comments (2)
python/packages/jumpstarter/jumpstarter/client/status_monitor.py (2)
199-204: Handle `timeout=0` explicitly.
`if timeout:` treats `0` as falsy, so `timeout=0` waits indefinitely instead of timing out immediately.
🛠️ Proposed fix
-if timeout:
+if timeout is not None:
     with anyio.move_on_after(timeout):
         return await wait_loop()
     return False

-if timeout:
+if timeout is not None:
     with anyio.move_on_after(timeout):
         return await wait_for_first()
     return None
Also applies to: 283-286
416-420: Wake waiters on stop to avoid hangs.
`stop()` only signals `_stop_event`, but waiters block on `_any_change_event`, so they can hang indefinitely.
🛠️ Proposed fix
 async def stop(self):
     """Stop the background polling task."""
     self._running = False
     self._stop_event.set()
+    # Wake waiters so they can observe _running=False
+    self._any_change_event.set()
+    self._any_change_event = Event()
elif e.code() == StatusCode.UNAVAILABLE:
    # Connection lost - exporter closed or restarted
    logger.info("Connection lost (UNAVAILABLE), signaling waiters")
    self._connection_lost = True
    # Fire the change event to wake up any waiters
    self._any_change_event.set()
    self._any_change_event = Event()
    break
Mark monitor as not running on UNAVAILABLE exit.
The poll loop breaks on UNAVAILABLE but leaves _running=True, preventing restart via start() and misrepresenting state.
🛠️ Proposed fix
elif e.code() == StatusCode.UNAVAILABLE:
# Connection lost - exporter closed or restarted
logger.info("Connection lost (UNAVAILABLE), signaling waiters")
self._connection_lost = True
+ self._running = False
# Fire the change event to wake up any waiters
self._any_change_event.set()
self._any_change_event = Event()
break🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/status_monitor.py` around
lines 358 - 365, The UNAVAILABLE branch breaks the poll loop but doesn't update
the monitor's running state, so set self._running = False when handling
StatusCode.UNAVAILABLE; specifically, inside the elif e.code() ==
StatusCode.UNAVAILABLE block update self._running to False alongside setting
self._connection_lost, trigger and recreate self._any_change_event as shown, and
ensure start() can restart the monitor after this exit.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/packages/jumpstarter/jumpstarter/config/client.py (1)
162-177: Thread `include_status` through to the ListExporters request.
The `include_status` parameter is only used to set a display flag on the response (line 176), but it's never passed to `ClientService.ListExporters`. If the server requires this flag in the request to include status data, status information won't be fetched. Update `ClientService.ListExporters` to accept and propagate `include_status` to the protobuf request.
Suggested update
-exporters_response = await svc.ListExporters(page_size=page_size, page_token=page_token, filter=filter)
+exporters_response = await svc.ListExporters(
+    page_size=page_size,
+    page_token=page_token,
+    filter=filter,
+    include_status=include_status,
+)
- Add release_lease field to ReportStatusRequest to allow exporters to signal the controller to release the active lease after hooks complete
- Add status_version and previous_status fields to GetStatusResponse for tracking status transitions
The make manifests target regenerates RBAC role.yaml using controller-gen, which strips manually-added annotations. This adds a post-processing step using awk to re-add the ArgoCD sync-wave annotation that was added in PR #207 to ensure proper deployment ordering.
Updates the operator install.yaml to include the new ExporterStatus and StatusMessage fields added to the Exporter CRD, along with additional printer columns for better kubectl output.
This ensures the operator's install.yaml stays in sync with CRD changes by automatically running the operator's build-installer target as part of the manifests target. This prevents drift between the controller CRDs and the operator bundle.
Actionable comments posted: 6
🤖 Fix all issues with AI agents
In `@controller/internal/service/controller_service_integration_test.go`:
- Around line 76-86: The Lease objects in the tests (the Lease struct created
and passed to k8sClient.Create in controller_service_integration_test.go) are
missing the required Selector field in LeaseSpec which causes API validation to
fail; update the Lease initializations (the variable lease used with
k8sClient.Create and its LeaseSpec) to include Selector: metav1.LabelSelector{}
or an appropriate label selector (e.g., Selector:
&metav1.LabelSelector{MatchLabels: map[string]string{"app":"test"}}) so the
CRD-required selector is present when Create(ctx, lease) is called.
In `@e2e/tests.bats`:
- Around line 190-196: Two occurrences of the yq load expression still reference
the old path load("e2e/exporter.yaml"); update those commands that match the
pattern go run ... -i ". * load(\"e2e/exporter.yaml\")" to use
load("e2e/exporters/exporter.yaml") instead so all yq invocations point to the
moved exporter.yaml file (ensure the same replacement is applied for both
remaining occurrences).
In `@python/packages/jumpstarter-cli/jumpstarter_cli/run.py`:
- Around line 79-94: The parent currently interprets any non-zero child exit as
a signal and adds 128, which double-encodes real exit codes; change the protocol
so the child (exporter) encodes signals as 128 + signal number and the parent in
run.py should stop adding 128 — i.e., use exporter_exit_code directly when set
(replace any logic that converts non-zero exporter_exit_code into 128+value),
and keep received_signal only for locally-caught signals; update handling around
exporter_exit_code and received_signal (and cancellation via
signal_tg.cancel_scope.cancel()) so the parent returns exporter_exit_code as-is
when present.
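The 128 + signal-number form this prompt refers to is the shell's convention for signal deaths. A sketch of the child-side encoding, where a POSIX-style negative returncode (as reported by `subprocess` wrappers) means "killed by signal N"; `encode_exit_status` is a hypothetical helper name:

```python
import signal


def encode_exit_status(returncode: int) -> int:
    """Map a POSIX-style returncode to a shell-style exit code.

    Negative values mean "killed by signal N"; encode them as 128 + N
    so the parent can pass the value through unchanged instead of
    re-adding 128 and double-encoding ordinary exit codes.
    """
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


print(encode_exit_status(-signal.SIGTERM))  # 143 (128 + 15)
print(encode_exit_status(0))                # 0
print(encode_exit_status(7))                # 7
```

With this split, a real exit code of 7 stays 7 end to end, and only genuine signal deaths land in the 129-192 range.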
In `@python/packages/jumpstarter/jumpstarter/client/core.py`:
- Around line 458-485: Guard creation of LogSource from response.source with a
try/except to handle unknown enum values: when evaluating source =
LogSource(response.source) (and computing is_hook), catch ValueError and set
source to LogSource.SYSTEM (and is_hook = False) so the stream continues instead
of raising; keep the existing routing logic that maps source to logger_name and
logging through source_logger.log(log_level, response.message). Also ensure
severity fallback (severity = response.severity if response.severity else
"INFO") remains unchanged.
- Around line 127-187: The wait_for_lease_ready method currently logs a warning
on timeout but returns silently so callers cannot detect failure; modify
wait_for_lease_ready to raise DriverError (or return False if you prefer an API
change) when the timeout elapses instead of only logging: replace the final
self.logger.warning(...) with raising DriverError(f"Timeout waiting for
beforeLease hook to complete after {timeout:.1f}s and {poll_count} polls")
(ensure DriverError is imported/available) so callers of wait_for_lease_ready
can programmatically handle timeouts.
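The enum-fallback guard suggested above for `LogSource` can be sketched as follows; the enum values here are assumptions for illustration (the real enum lives in the jumpstarter codebase):

```python
from enum import Enum


class LogSource(Enum):
    # Assumed values for illustration only
    SYSTEM = 0
    DRIVER = 1
    HOOK = 2


def parse_source(raw: int) -> LogSource:
    """Tolerate enum values from a newer protocol revision by falling back."""
    try:
        return LogSource(raw)
    except ValueError:
        # Unknown value (e.g. a server newer than this client):
        # keep the log stream alive instead of raising
        return LogSource.SYSTEM


print(parse_source(2).name)   # HOOK
print(parse_source(99).name)  # SYSTEM
```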
In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py`:
- Around line 168-191: The stop method currently ignores the caller's
should_unregister value when deferring stop; update stop(self, ...) to persist
should_unregister into an instance field (e.g. set self._unregister =
should_unregister) whenever wait_for_lease_exit is true and you set
self._stop_requested = True so the original preference is preserved, and ensure
the lease-exit path that later invokes stop uses that stored self._unregister
rather than unconditionally passing True; update references to _unregister and
_stop_requested so deferred stops respect the saved preference.
🧹 Nitpick comments (11)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
147-162: Redundant timeout pattern may cause confusion.
The code uses both `anyio.move_on_after(30)` and `timeout=30.0` in `wait_for_any_of`. Since both timeouts are identical, the inner `wait_for_any_of` will return `None` on timeout before `move_on_after` can trigger, meaning `timeout_scope.cancelled_caught` at line 161 will likely never be `True`. Consider removing one timeout mechanism:
♻️ Suggested simplification
-with anyio.move_on_after(30) as timeout_scope:  # 30 second timeout
-    result = await monitor.wait_for_any_of(
-        [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
-        timeout=30.0
-    )
-    if result == ExporterStatus.AVAILABLE:
-        logger.info("afterLease hook completed")
-    elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
-        logger.warning("afterLease hook failed")
-    elif monitor.connection_lost:
-        # Connection lost - exporter closed, hook likely completed
-        logger.info("Exporter connection closed (hook completed)")
-    else:
-        logger.debug("Hook completion not confirmed")
-if timeout_scope.cancelled_caught:
-    logger.warning("Timeout waiting for afterLease hook to complete")
+result = await monitor.wait_for_any_of(
+    [ExporterStatus.AVAILABLE, ExporterStatus.AFTER_LEASE_HOOK_FAILED],
+    timeout=30.0
+)
+if result == ExporterStatus.AVAILABLE:
+    logger.info("afterLease hook completed")
+elif result == ExporterStatus.AFTER_LEASE_HOOK_FAILED:
+    logger.warning("afterLease hook failed")
+elif monitor.connection_lost:
+    logger.info("Exporter connection closed (hook completed)")
+else:
+    logger.warning("Timeout waiting for afterLease hook to complete")
e2e/tests-hooks.bats (2)
75-76: Consider caching yq installation for faster test execution.
Using `go run github.com/mikefarah/yq/v4@latest` fetches from the network on each invocation. For repeated test runs, this adds latency. Consider installing yq once in the test environment setup or using a pre-installed binary.
78-82: Clarify the `3>&-` redirection purpose.
`3>&-` closes file descriptor 3 before running the background loop. While this works, a brief comment explaining why this is needed (likely to prevent fd leaks from bats internals) would help maintainability.
python/packages/jumpstarter/jumpstarter/client/core_test.py (1)
16-53: Optional: consolidate gRPC error/status helpers.
These helpers look similar to those in `status_monitor_test.py`; a shared test util could reduce duplication.
python/packages/jumpstarter/jumpstarter/client/core.py (1)
105-126: Consider allowing `ExporterStatus.UNSPECIFIED` for backward compatibility.
If `GetStatus` returns the default/unspecified value, Line 124 currently raises `ExporterNotReady`, which can block driver calls for exporters that haven't adopted explicit status reporting yet. Adding `UNSPECIFIED` keeps behavior more permissive without impacting the ready path.
♻️ Suggested tweak
-ALLOWED_STATUSES = {
-    ExporterStatus.LEASE_READY,
-    ExporterStatus.BEFORE_LEASE_HOOK,
-    ExporterStatus.AFTER_LEASE_HOOK,
-}
+ALLOWED_STATUSES = {
+    ExporterStatus.LEASE_READY,
+    ExporterStatus.BEFORE_LEASE_HOOK,
+    ExporterStatus.AFTER_LEASE_HOOK,
+    ExporterStatus.UNSPECIFIED,
+}
python/packages/jumpstarter/jumpstarter/common/types.py (1)
16-18: Consider removing the empty `TYPE_CHECKING` block.
The `TYPE_CHECKING` guard and its comment suggest async stub type aliases were planned, but the block is empty. This is dead code that could be removed for clarity, or the intended type aliases should be added.
🧹 Proposed cleanup
-# Async stub type aliases are only available for type checking (defined in .pyi files)
-if TYPE_CHECKING:
-    pass
Also update the import:
-from typing import TYPE_CHECKING, TypeAlias
+from typing import TypeAlias
python/packages/jumpstarter/jumpstarter/exporter/lease_context.py (1)
54-57: Consider using explicit validation instead of `assert` statements.
`assert` statements can be disabled when Python runs with optimization flags (`-O`), which would skip these critical validations in production. Consider raising `ValueError` explicitly.
♻️ Proposed fix
 def __post_init__(self):
     """Validate that required resources are present."""
-    assert self.before_lease_hook is not None, "LeaseScope requires a before_lease_hook event"
-    assert self.lease_name, "LeaseScope requires a non-empty lease_name"
+    if self.before_lease_hook is None:
+        raise ValueError("LeaseContext requires a before_lease_hook event")
+    if not self.lease_name:
+        raise ValueError("LeaseContext requires a non-empty lease_name")
python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)
621-629: Consider making the 1-second delay configurable or documenting its purpose more prominently.
The delay at Line 624 allows clients to poll and see the AVAILABLE status before the lease is released. While the comment explains this, users might wonder why there's a hardcoded delay. Consider extracting this to a constant or making it configurable.
python/packages/jumpstarter/jumpstarter/client/status_monitor_test.py (1)

426-522: Good coverage of `wait_for_any_of` behavior.

The tests cover the key scenarios for multi-target waiting. The parallel structure to the `wait_for_status` tests ensures consistent behavior.

Consider adding a test for `DEADLINE_EXCEEDED` handling in `wait_for_any_of` for completeness (similar to `test_wait_for_status_deadline_exceeded_returns_promptly`), though the `UNAVAILABLE` test provides reasonable coverage of the connection-loss path.

python/packages/jumpstarter/jumpstarter/exporter/session.py (1)
270-280: Consider reducing `GetStatus` log level.

The `GetStatus` RPC may be polled frequently by clients (e.g., `StatusMonitor` polls every 0.1-0.3s). Using `logger.info` could result in excessive log output.

Consider using DEBUG level

```diff
 async def GetStatus(self, request, context):
     """Get the current exporter status with transition tracking."""
-    logger.info("GetStatus() -> %s (version=%d)", self._current_status, self._status_version)
+    logger.debug("GetStatus() -> %s (version=%d)", self._current_status, self._status_version)
     response = jumpstarter_pb2.GetStatusResponse(
```

python/packages/jumpstarter/jumpstarter/exporter/exporter.py (1)
360-382: Consider using `_controller_stub` in `_unregister_with_controller`.

This method manages the channel manually instead of using the `_controller_stub` context manager. While the explicit `channel.close()` with `CancelScope(shield=True)` is correct, using the context manager would be more consistent with the rest of the codebase.

Optional refactor for consistency

```diff
 async def _unregister_with_controller(self):
     """Safely unregister from controller with timeout and error handling."""
     if not (self._registered and self._unregister):
         return
     logger.info("Unregistering exporter with controller")
     try:
         with move_on_after(10):  # 10 second timeout
-            channel = await self.channel_factory()
-            try:
-                controller = jumpstarter_pb2_grpc.ControllerServiceStub(channel)
+            async with self._controller_stub() as controller:
                 await self._report_status(ExporterStatus.OFFLINE, "Exporter shutting down")
                 await controller.Unregister(
                     jumpstarter_pb2.UnregisterRequest(
                         reason="Exporter shutdown",
                     )
                 )
                 logger.info("Controller unregistration completed successfully")
-            finally:
-                with CancelScope(shield=True):
-                    await channel.close()
     except Exception as e:
         logger.error("Error during controller unregistration: %s", e, exc_info=True)
```
```go
lease = &jumpstarterdevv1alpha1.Lease{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "lease-to-release",
		Namespace: testNamespace,
	},
	Spec: jumpstarterdevv1alpha1.LeaseSpec{
		ClientRef: corev1.LocalObjectReference{Name: "test-client"},
		Release:   false,
	},
}
Expect(k8sClient.Create(ctx, lease)).To(Succeed())
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check the LeaseSpec definition and Selector field
fd -e go "lease_types.go" controller/
# Get the actual content of LeaseSpec
rg -n "type LeaseSpec struct" --type=go -A20 controller/
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 2012
🏁 Script executed:

```shell
#!/bin/bash
# Check for validation markers on the Selector field
rg -n "Selector" controller/api/v1alpha1/lease_types.go -B5 -A1
# Check if there are validation rules defined elsewhere for Lease
rg -n "kubebuilder:" controller/api/v1alpha1/lease_types.go | head -20
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 864
🏁 Script executed:

```shell
#!/bin/bash
# Search for CRD definition files
fd -e yaml "lease.*crd" controller/
# Check the test file to see if it has any setup that might affect validation
head -40 controller/internal/service/controller_service_integration_test.go
# Look for any ValidatingWebhook or validation setup
rg -n "Validat" controller/ --type=go | head -20
# Check if there's a markers file or +kubebuilder validation rules
rg -n "+kubebuilder:validation" controller/api/v1alpha1/lease_types.go
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 2310
🏁 Script executed:

```shell
#!/bin/bash
# Search for CRD files more broadly
find controller -name "*.yaml" -o -name "*.yml" | grep -i lease
# Check if there's a config or manifests directory
fd -e yaml controller/config/ | head -20
# Look for the validation webhook or rules setup
rg -n "LeaseSpec" controller/api/v1alpha1/lease_types.go -B10 | grep -E "kubebuilder|validation"
# Check the entire lease_types.go file for validation markers
cat controller/api/v1alpha1/lease_types.go | grep -E "kubebuilder|validation"
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 1509
🏁 Script executed:

```shell
#!/bin/bash
# Check the CRD definition for validation rules on Selector
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -A30 "selector:"
# Also check for any validation rules in the properties section
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -B5 -A10 '"selector"'
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 1734
🏁 Script executed:

```shell
#!/bin/bash
# Get more context from the CRD - check if selector is in the required fields
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | grep -B20 "selector:" | grep -A30 "required:"
# Or let's look at the entire spec section to see required fields
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | sed -n '/spec:/,/status:/p' | grep -A100 "properties:" | head -60
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 2973
🏁 Script executed:

```shell
#!/bin/bash
# Look for the required fields in the spec section
cat controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml | sed -n '/spec:/,/status:/p' | tail -50
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 2575
Add required `Selector` field to all Lease objects in tests.

The `Selector` field is required in `LeaseSpec` according to the CRD definition (see `required: [clientRef, selector]` in `controller/deploy/operator/config/crd/bases/jumpstarter.dev_leases.yaml`). All Lease objects created in these tests omit this field, which will cause validation failures when `k8sClient.Create(ctx, lease)` is called. Set `Selector: metav1.LabelSelector{}` or a valid label selector for each Lease creation.
🤖 Prompt for AI Agents
In `@controller/internal/service/controller_service_integration_test.go` around
lines 76 - 86, The Lease objects in the tests (the Lease struct created and
passed to k8sClient.Create in controller_service_integration_test.go) are
missing the required Selector field in LeaseSpec which causes API validation to
fail; update the Lease initializations (the variable lease used with
k8sClient.Create and its LeaseSpec) to include Selector: metav1.LabelSelector{}
or an appropriate label selector (e.g., Selector:
&metav1.LabelSelector{MatchLabels: map[string]string{"app":"test"}}) so the
CRD-required selector is present when Create(ctx, lease) is called.
```shell
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
	/etc/jumpstarter/exporters/test-exporter-oidc.yaml
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
	/etc/jumpstarter/exporters/test-exporter-sa.yaml
go run github.com/mikefarah/yq/v4@latest -i ". * load(\"e2e/exporters/exporter.yaml\")" \
	/etc/jumpstarter/exporters/test-exporter-legacy.yaml
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check which exporter.yaml paths exist in the e2e directory
echo "=== Checking exporter.yaml locations ==="
fd -t f "exporter.yaml" e2e/
echo ""
echo "=== References in tests.bats ==="
rg -n "exporter\.yaml" e2e/tests.bats
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 633
Update `exporter.yaml` path references on lines 126 and 174.

The file `e2e/exporter.yaml` no longer exists; it was moved to `e2e/exporters/exporter.yaml`. Lines 190–195 were correctly updated, but lines 126 and 174 still reference the old, non-existent path, causing test failures:

- Line 126: Change `e2e/exporter.yaml` to `e2e/exporters/exporter.yaml`
- Line 174: Change `e2e/exporter.yaml` to `e2e/exporters/exporter.yaml`
🤖 Prompt for AI Agents
In `@e2e/tests.bats` around lines 190 - 196, Two occurrences of the yq load
expression still reference the old path load("e2e/exporter.yaml"); update those
commands that match the pattern go run ... -i ". * load(\"e2e/exporter.yaml\")"
to use load("e2e/exporters/exporter.yaml") instead so all yq invocations point
to the moved exporter.yaml file (ensure the same replacement is applied for both
remaining occurrences).
```diff
+    # Check if exporter set an exit code (e.g., from hook failure with on_failure='exit')
+    exporter_exit_code = exporter.exit_code

     # Cancel the signal handler after exporter completes
     signal_tg.cancel_scope.cancel()

-    # Return signal number if received, otherwise 0 for immediate restart
-    return received_signal if received_signal else 0
+    # Return exit code in priority order:
+    # 1. Signal number if received (for signal-based termination)
+    # 2. Exporter's exit code if set (for hook failure with on_failure='exit')
+    # 3. 0 for immediate restart (normal exit without signal or explicit exit code)
+    if received_signal:
+        return received_signal
+    elif exporter_exit_code is not None:
+        return exporter_exit_code
+    else:
+        return 0
```
Align child/parent exit-code encoding to avoid 128+ skew.
With the new exporter_exit_code, the parent still assumes any non‑zero child exit code is a signal number and adds 128 (Line 131–138). That will mangle real exit codes (e.g., 1 → 129). Encode signals as 128+signal in the child and have the parent return the child exit code directly.
Proposed fix

```diff
@@
-    # 1. Signal number if received (for signal-based termination)
+    # 1. Signal-based termination → return standard 128+signal
@@
-    if received_signal:
-        return received_signal
+    if received_signal:
+        return 128 + received_signal
@@
-    if child_exit_code == 0:
-        return None  # restart child (unexpected exit/exception)
-    else:
-        # Child indicates termination (signal number)
-        return 128 + child_exit_code  # Return standard Unix exit code
+    if child_exit_code == 0:
+        return None  # restart child (unexpected exit/exception)
+    else:
+        # Child already encodes final exit status (128+signal or exporter exit code)
+        return child_exit_code
```

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/run.py` around lines 79 - 94,
The parent currently interprets any non-zero child exit as a signal and adds
128, which double-encodes real exit codes; change the protocol so the child
(exporter) encodes signals as 128 + signal number and the parent in run.py
should stop adding 128 — i.e., use exporter_exit_code directly when set (replace
any logic that converts non-zero exporter_exit_code into 128+value), and keep
received_signal only for locally-caught signals; update handling around
exporter_exit_code and received_signal (and cancellation via
signal_tg.cancel_scope.cancel()) so the parent returns exporter_exit_code as-is
when present.
```python
async def wait_for_lease_ready(self, timeout: float = 300.0) -> None:
    """Wait for exporter to report LEASE_READY status.

    This polls GetStatus until the beforeLease hook completes.
    Should be called after log streaming is started so hook output
    can be displayed in real-time.

    Prefer using StatusMonitor.wait_for_any_of() for non-blocking status tracking.

    Args:
        timeout: Maximum time to wait in seconds (default: 5 minutes)
    """
    import anyio

    poll_interval = 0.5  # seconds
    elapsed = 0.0
    poll_count = 0

    self.logger.debug("Waiting for exporter to be ready...")
    while elapsed < timeout:
        poll_count += 1
        self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
        try:
            status = await self.get_status_async()
            self.logger.debug("[POLL %d] GetStatus returned: %s", poll_count, status)
        except Exception as e:
            # Connection error - keep trying
            self.logger.debug("[POLL %d] Error getting status, will retry: %s", poll_count, e)
            await anyio.sleep(poll_interval)
            elapsed += poll_interval
            continue

        if status is None:
            # GetStatus not implemented - assume ready for backward compatibility
            self.logger.debug("[POLL %d] GetStatus not implemented, assuming ready", poll_count)
            return

        if status == ExporterStatus.LEASE_READY:
            self.logger.info("Exporter ready, starting shell...")
            return
        elif status == ExporterStatus.BEFORE_LEASE_HOOK:
            # Hook is running - this is expected, keep waiting
            self.logger.debug("[POLL %d] beforeLease hook is running...", poll_count)
        elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
            # Hook failed - log but continue (exporter may still be usable)
            self.logger.warning("beforeLease hook failed")
            return
        elif status == ExporterStatus.AVAILABLE:
            # Exporter is available but not yet leased - keep waiting
            # This can happen if client connects before exporter receives lease assignment
            self.logger.debug("[POLL %d] Exporter status: AVAILABLE (waiting for lease assignment)", poll_count)
        else:
            # Other status - continue waiting
            self.logger.debug("[POLL %d] Exporter status: %s (waiting...)", poll_count, status)

        self.logger.debug("[POLL %d] Sleeping for %.1fs before next poll...", poll_count, poll_interval)
        await anyio.sleep(poll_interval)
        elapsed += poll_interval

    self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)
```
Timeout isn’t surfaced to callers.
After the loop, Line 186 only logs a warning and returns, so callers can’t distinguish success from a timeout. Consider raising DriverError (or returning a boolean) when the timeout elapses.
🛠️ Possible fix (raise on timeout)

```diff
-        self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)
+        self.logger.warning("Timeout waiting for beforeLease hook to complete (after %d polls)", poll_count)
+        raise DriverError("Timeout waiting for beforeLease hook to complete")
```
+ raise DriverError("Timeout waiting for beforeLease hook to complete")🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 127 -
187, The wait_for_lease_ready method currently logs a warning on timeout but
returns silently so callers cannot detect failure; modify wait_for_lease_ready
to raise DriverError (or return False if you prefer an API change) when the
timeout elapses instead of only logging: replace the final
self.logger.warning(...) with raising DriverError(f"Timeout waiting for
beforeLease hook to complete after {timeout:.1f}s and {poll_count} polls")
(ensure DriverError is imported/available) so callers of wait_for_lease_ready
can programmatically handle timeouts.
```python
# Determine log source
if response.HasField("source"):
    source = LogSource(response.source)
    is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
else:
    source = LogSource.SYSTEM
    is_hook = False

# Filter: always show hooks, only show system logs if enabled
if is_hook or show_all_logs:
    # Get severity level
    severity = response.severity if response.severity else "INFO"
    log_level = getattr(logging, severity, logging.INFO)

    # Route to appropriate logger based on source
    if source == LogSource.BEFORE_LEASE_HOOK:
        logger_name = "exporter:beforeLease"
    elif source == LogSource.AFTER_LEASE_HOOK:
        logger_name = "exporter:afterLease"
    elif source == LogSource.DRIVER:
        logger_name = "exporter:driver"
    else:  # SYSTEM
        logger_name = "exporter:system"

    # Log through logger for RichHandler formatting
    source_logger = logging.getLogger(logger_name)
    source_logger.log(log_level, response.message)
```
Guard against unknown `LogSource` values.

If the server sends a new/unknown enum value, `LogSource(response.source)` raises `ValueError`, which can trigger reconnect churn. A small fallback keeps the stream healthy.
🛠️ Suggested fallback

```diff
-            if response.HasField("source"):
-                source = LogSource(response.source)
-                is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
+            if response.HasField("source"):
+                try:
+                    source = LogSource(response.source)
+                except ValueError:
+                    self.logger.debug("Unknown LogSource value: %s", response.source)
+                    source = LogSource.SYSTEM
+                is_hook = source in (LogSource.BEFORE_LEASE_HOOK, LogSource.AFTER_LEASE_HOOK)
```
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 458 -
485, Guard creation of LogSource from response.source with a try/except to
handle unknown enum values: when evaluating source = LogSource(response.source)
(and computing is_hook), catch ValueError and set source to LogSource.SYSTEM
(and is_hook = False) so the stream continues instead of raising; keep the
existing routing logic that maps source to logger_name and logging through
source_logger.log(log_level, response.message). Also ensure severity fallback
(severity = response.severity if response.severity else "INFO") remains
unchanged.
```python
def stop(self, wait_for_lease_exit=False, should_unregister=False, exit_code: int | None = None):
    """Signal the exporter to stop.

    Args:
        wait_for_lease_exit (bool): If True, wait for the current lease to exit before stopping.
        should_unregister (bool): If True, unregister from controller. Otherwise rely on heartbeat.
        exit_code (int | None): If set, the exporter will exit with this code (non-zero means no restart).
    """
    # Set exit code if provided
    if exit_code is not None:
        self._exit_code = exit_code

    # Stop immediately if not started yet or if immediate stop is requested
    if (not self._started or not wait_for_lease_exit) and self._tg is not None:
        logger.info("Stopping exporter immediately, unregister from controller=%s", should_unregister)
        if should_unregister:
            logger.info("Stopping exporter immediately, unregistering from controller")
        else:
            logger.info("Stopping exporter immediately, will not unregister from controller")
        self._unregister = should_unregister
        # Cancel any ongoing tasks
        self._tg.cancel_scope.cancel()
    elif not self._stop_requested:
        self._stop_requested = True
        logger.info("Exporter marked for stop upon lease exit")
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# First, let's find the exporter.py file and check its size
find . -name "exporter.py" -path "*jumpstarter*" -type f
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 194
🏁 Script executed:

```shell
#!/bin/bash
# Get the exact file location and check its size
EXPORTER_FILE="python/packages/jumpstarter/jumpstarter/exporter/exporter.py"
if [ -f "$EXPORTER_FILE" ]; then
  wc -l "$EXPORTER_FILE"
fi
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 135
🏁 Script executed:

```shell
#!/bin/bash
# Read the relevant sections: lines 168-191 (stop method) and around line 788
EXPORTER_FILE="python/packages/jumpstarter/jumpstarter/exporter/exporter.py"
echo "=== Lines 168-191 (stop method) ==="
sed -n '168,191p' "$EXPORTER_FILE"
echo -e "\n=== Lines 780-800 (serve method call to stop) ==="
sed -n '780,800p' "$EXPORTER_FILE"
echo -e "\n=== Check for _stop_requested and should_unregister references ==="
grep -n "_stop_requested\|should_unregister\|_unregister" "$EXPORTER_FILE" | head -30
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 2962
🏁 Script executed:

```shell
#!/bin/bash
# Search for all calls to stop() in the exporter module to understand usage patterns
EXPORTER_DIR="python/packages/jumpstarter/jumpstarter/exporter/"
rg -n '\.stop\(' "$EXPORTER_DIR" -A 1 -B 1
```

Repository: jumpstarter-dev/jumpstarter

Length of output: 1341
`should_unregister` preference is not preserved when deferring stop.

When `stop(wait_for_lease_exit=True, should_unregister=False)` is called, the `should_unregister=False` preference is discarded. The method only sets `_stop_requested=True` without storing the `should_unregister` value. Later, when the lease exits (line 788), `stop(should_unregister=True)` is called unconditionally, overriding the caller's intent to defer unregistration.

Consider storing `should_unregister` in a field (similar to how `_unregister` is used for immediate stops) when deferring the stop, so the original caller preference is honored when the lease eventually exits.
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/exporter.py` around lines
168 - 191, The stop method currently ignores the caller's should_unregister
value when deferring stop; update stop(self, ...) to persist should_unregister
into an instance field (e.g. set self._unregister = should_unregister) whenever
wait_for_lease_exit is true and you set self._stop_requested = True so the
original preference is preserved, and ensure the lease-exit path that later
invokes stop uses that stored self._unregister rather than unconditionally
passing True; update references to _unregister and _stop_requested so deferred
stops respect the saved preference.
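A minimal sketch of the suggested change. The class below is a stand-in for the relevant `Exporter` state, not the real implementation; only the field names `_unregister` and `_stop_requested` come from the review above.

```python
class ExporterStopState:
    # Stand-in for the Exporter fields involved; not the real class.
    def __init__(self):
        self._stop_requested = False
        self._unregister = False

    def stop(self, wait_for_lease_exit=False, should_unregister=False):
        if not wait_for_lease_exit:
            # Immediate path (unchanged): preference is already stored.
            self._unregister = should_unregister
            return
        # Deferred path: persist the caller's preference instead of dropping it.
        self._unregister = should_unregister
        self._stop_requested = True

    def on_lease_exit(self):
        # Use the stored preference rather than hardcoding should_unregister=True.
        if self._stop_requested:
            return self._unregister
        return None

state = ExporterStopState()
state.stop(wait_for_lease_exit=True, should_unregister=False)
print(state.on_lease_exit())  # False — the caller's preference survives the deferral
```

The key point is simply that the deferred branch writes `should_unregister` somewhere before returning, so the lease-exit path has the original value to act on.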
Summary
Implement hook scripts that can be configured in the exporter config to run `j` commands within a temporary lease environment. This PR combines changes across Python, Controller, and Protocol components.

Environment Variables

The following environment variables are exposed to hook scripts:

- `JUMPSTARTER_HOST` - Path to the Unix socket for `j` CLI access
- `JMP_DRIVERS_ALLOW=UNSAFE` - Allows all drivers for local access
- `LEASE_NAME` - The name of the current lease
- `CLIENT_NAME` - Name of the client that acquired the lease

Example Configuration
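A hypothetical hook script showing how these variables might be consumed; the helper name and output format are made up for illustration and are not part of the PR:

```python
import os

def hook_env_summary(env: dict[str, str]) -> str:
    # Read the variables the exporter exposes to hook scripts.
    host = env.get("JUMPSTARTER_HOST", "")          # Unix socket for j CLI access
    lease = env.get("LEASE_NAME", "unknown-lease")
    client = env.get("CLIENT_NAME", "unknown-client")
    unsafe = env.get("JMP_DRIVERS_ALLOW") == "UNSAFE"
    return f"lease={lease} client={client} host={host} unsafe_drivers={unsafe}"

if __name__ == "__main__":
    # In a real hook, the exporter populates these variables before invoking the script.
    print(hook_env_summary(dict(os.environ)))
```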
Changes
Python (`python/`)
- Hook failure handling options (`exit`, `warn`)
- Exporter status reporting (`jmp admin get exporter`)

Controller (`controller/`)

Protocol (`protocol/`)
- `release_lease` field for exporter-initiated lease release
- `status_version` and `previous_status` fields for transition tracking

Related PRs
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests