lmos-operator issue resolved #149
import java.net.HttpURLConnection
import java.net.URL

object AlertClient {
The Operator uses Spring Boot, so this could be a Spring bean instead of an object.
Configuration should not be read directly via System.getenv.
Spring Boot provides better options. Please have a look at https://github.com/eclipse-lmos/lmos-operator/blob/main/src/main/kotlin/org/eclipse/lmos/operator/OperatorConfig.kt and
Thanks for the review!
I've refactored AlertClient to be a Spring-managed bean and switched configuration
handling to Spring Boot property injection, avoiding direct use of System.getenv.
This aligns with the existing OperatorConfig approach and makes the alerting
logic easier to test and configure.
Please let me know if you'd prefer @ConfigurationProperties instead of @Value.
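For readers following the thread, here is a minimal sketch of the property-injection style described in the reply. It is not the PR's actual code: the constructor shape and the isWebhookEnabled helper are hypothetical, and it assumes the webhook URL is still supplied through the LMOS_ALERT_WEBHOOK_URL variable named in the description. It needs Spring on the classpath to compile.

```kotlin
import org.springframework.beans.factory.annotation.Value
import org.springframework.stereotype.Component

// Hypothetical sketch of a Spring-managed AlertClient; the class in the PR may differ.
@Component
class AlertClient(
    // Spring resolves the placeholder through its Environment, which includes
    // system environment variables. The trailing ':' supplies an empty default,
    // so the bean can still be created when the variable is not set.
    @Value("\${LMOS_ALERT_WEBHOOK_URL:}") private val webhookUrl: String,
) {
    // Hypothetical helper: webhook delivery is skipped when no URL is configured.
    fun isWebhookEnabled(): Boolean = webhookUrl.isNotBlank()
}
```

Because the URL arrives through the constructor, a unit test can build AlertClient("http://localhost:0/hook") directly without touching the process environment.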
Fixes #51
Title
feat(operator): alert on unresolved channels in LMOS Operator
Description
What does this change do?
This PR introduces an alerting mechanism for unresolved Channels in the LMOS Operator.
When a Channel cannot resolve its required capabilities against available Agent resources, the Operator already marks the Channel as UNRESOLVED, but there was no notification or alert emitted. This change adds structured alerting so that unresolved states are visible to operators and monitoring systems.
The alert includes:
Namespace
Channel name
Unresolved capabilities (id, name, version)
Reason for unresolved state
Alerts are always logged and can optionally be sent to an external system via a configurable webhook.
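As an illustration of the alert contents listed above, the payload could be modelled roughly as follows. This is a sketch only: the type names, field names, JSON keys, and the hand-rolled serializer are assumptions made for the example, not taken from the PR, and real code would more likely use a JSON library.

```kotlin
// Hypothetical types: the fields mirror the list above, not the PR's actual classes.
data class UnresolvedCapability(val id: String, val name: String, val version: String)

data class UnresolvedChannelAlert(
    val namespace: String,
    val channelName: String,
    val unresolvedCapabilities: List<UnresolvedCapability>,
    val reason: String,
)

// Minimal escaping of backslashes, quotes, and newlines; other control
// characters are not handled, which a JSON library would take care of.
fun jsonString(value: String): String =
    "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""

// Renders one "key":"value" pair.
fun jsonField(key: String, value: String): String = jsonString(key) + ":" + jsonString(value)

// Serializes the alert into a single-line JSON object suitable for a log entry or webhook body.
fun UnresolvedChannelAlert.toJson(): String {
    val capabilities = unresolvedCapabilities.joinToString(",", "[", "]") {
        "{" + jsonField("id", it.id) + "," + jsonField("name", it.name) + "," +
            jsonField("version", it.version) + "}"
    }
    return "{" + jsonField("namespace", namespace) + "," + jsonField("channel", channelName) + "," +
        "\"capabilities\":" + capabilities + "," + jsonField("reason", reason) + "}"
}

fun main() {
    val alert = UnresolvedChannelAlert(
        namespace = "default",
        channelName = "web-channel",
        unresolvedCapabilities = listOf(UnresolvedCapability("cap-1", "view-bill", "1.0.0")),
        reason = "No Agent provides the required capability",
    )
    println(alert.toJson())
}
```

Keeping the payload in one data class means the log entry and the webhook body are rendered from the same source.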
Why is this change needed?
Previously:
Channels could remain unresolved silently
Operators had no easy way to detect capability mismatches
Debugging required manual inspection of Channel status
This led to:
Reduced observability
Delayed detection of misconfigurations
Potential runtime issues going unnoticed
This PR improves operational visibility while keeping runtime behavior unchanged.
How is this implemented?
Added a new AlertClient component responsible for:
Structured logging of unresolved Channel alerts
Optional webhook-based notification (LMOS_ALERT_WEBHOOK_URL)
Integrated alert emission into ChannelDependentResource when:
Capability resolution fails
Channel transitions into UNRESOLVED state
Alerting is non-blocking and best-effort
Failures in alert delivery do not affect reconciliation logic
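The non-blocking, best-effort behaviour described above can be sketched as below. Several things here are assumptions rather than the PR's code: the class name BestEffortAlertClient, the timeouts, the Boolean return value, and the use of java.util.logging are all hypothetical. The diff in this PR imports java.net.HttpURLConnection; this sketch swaps in java.net.http.HttpClient (Java 11+) because sendAsync returns immediately without a hand-rolled thread.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration
import java.util.logging.Logger

// Hypothetical sketch: always logs the alert, then attempts webhook delivery without blocking.
class BestEffortAlertClient(
    private val webhookUrl: String?,
    private val httpClient: HttpClient =
        HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build(),
) {
    private val log = Logger.getLogger(BestEffortAlertClient::class.java.name)

    /** Returns true only if a webhook request was dispatched; never throws. */
    fun alert(payloadJson: String): Boolean {
        log.warning("Unresolved Channel alert: $payloadJson")
        if (webhookUrl.isNullOrBlank()) return false // webhook is optional
        return try {
            val request = HttpRequest.newBuilder(URI.create(webhookUrl))
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payloadJson))
                .build()
            // sendAsync returns immediately; the callback only logs the outcome.
            httpClient.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                .whenComplete { response, error ->
                    if (error != null) {
                        log.warning("Alert webhook failed: ${error.message}")
                    } else if (response.statusCode() >= 400) {
                        log.warning("Alert webhook returned ${response.statusCode()}")
                    }
                }
            true
        } catch (e: Exception) {
            // A malformed URL or similar problem must not reach the reconciler.
            log.warning("Alert webhook could not be sent: ${e.message}")
            false
        }
    }
}
```

A caller such as the reconciler would invoke alert(...) and ignore the result, so a missing, malformed, or unreachable webhook only produces an extra log line.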
Files changed
src/main/kotlin/org/eclipse/lmos/operator/alert/AlertClient.kt
New alert client responsible for logging and webhook delivery
src/main/kotlin/org/eclipse/lmos/operator/reconciler/ChannelDependentResource.kt
Emit alerts when Channel capability resolution fails
Testing done
Manually tested reconciliation flow with:
Missing Agent capabilities
Incompatible capability versions
Verified:
Channel status is set to UNRESOLVED
Structured alert log entry is emitted
Webhook delivery succeeds when LMOS_ALERT_WEBHOOK_URL is configured
Operator behavior remains unchanged when webhook is not configured
Confirmed no impact on successful Channel resolution paths
Backward compatibility
✅ No breaking changes
✅ Existing Channel resolution logic unchanged
✅ Alerting is additive and optional
✅ No configuration required unless webhook notifications are desired
Security considerations
Webhook URL is read from an environment variable (LMOS_ALERT_WEBHOOK_URL)
No sensitive data beyond Channel metadata is transmitted
Failures in external communication do not affect core Operator logic
Screenshots
N/A – backend and operator logic change only.
Checklist
Code follows existing project conventions
Change is minimal and scoped
No behavioral regression introduced
Alerting is optional and non-blocking
Logging provides sufficient diagnostic context
Notes for reviewers
This PR focuses on observability only.
It does not alter scheduling, routing, or capability matching logic.
Future enhancements (out of scope here) could include:
Prometheus metrics for unresolved Channels
Alert deduplication or throttling
Pluggable notification backends