allow CAPApplicationVersion cleanup without a Prometheus connection#416
Draft
skrishnan-sap wants to merge 8 commits into
Draft
allow CAPApplicationVersion cleanup without a Prometheus connection#416skrishnan-sap wants to merge 8 commits into
CAPApplicationVersion cleanup without a Prometheus connection#416skrishnan-sap wants to merge 8 commits into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Allow
CAPApplicationVersioncleanup without a Prometheus connectionProblem
The version-cleanup routine only started when
PROMETHEUS_ADDRESSwas set and the controller could reach the Prometheus server (Runtimeinfoprobe succeeded). This made Prometheus a hard dependency even for applications that don't define anymonitoring.deletionRuleson their workloads and would simply like to clean up older versions once a new one becomesReady.Change
The version-cleanup loop is now decoupled from Prometheus availability:
PROMETHEUS_ADDRESSis not set (or the configured server is unreachable), the loop still runs:deletionRulesare eligible for cleanup as before.deletionRulesare skipped (not deleted, not requeued in error) and re-evaluated on the next cycle once Prometheus becomes available again.PROM_ACQUIRE_CLIENT_RETRY_DELAY, never on every queue iteration.The
sme.sap.com/enable-cleanup-monitoringannotation contract onCAPApplicationis unchanged ("true"and"dry-run", case-insensitive). Defaulting that annotation or adding an opt-out value is explicitly out of scope here.Implementation details (
internal/controller/version-monitoring.go)parseMonitoringEnv()now always returns a non-nil*monitoringEnv. WhenPROMETHEUS_ADDRESSis unset,addressis empty; the duration fields keep their existing defaults (acquireClientRetryDelay = 1h,evaluationInterval = 10m).startVersionCleanup(ctx)no longer short-circuits when no address is configured. It logs once at startup ("PROMETHEUS_ADDRESS is not set; only versions without deletion rules will be cleaned up") and proceeds to launch the goroutines.initializeVersionCleanupOrchestrator(ctx, mEnv)now always returns a non-nil*cleanupOrchestrator. It no longer blocks on Prometheus reachability.promClientProviderabstraction:noopPromClientProvider— used whenaddress == "". Always reports unavailable; never constructs a client; never callsRuntimeinfo.cachedPromClientProvider— constructs the client lazily, caches the workingpromv1.APIon success, and rate-limits reachability probes via a mutex +lastProbeAt. Logs the first failure (and the recovery transition) but does not spam.cleanupOrchestrator.apiwas replaced bycleanupOrchestrator.clientProvider;processVersionCleanupQueueItemnow obtains the API viaclientProvider.Get(ctx)and passes it (possiblynil) toevaluateVersionForCleanup.evaluateWorkloadForCleanupaccepts a possibly-nilpromv1.API. Workloads withoutdeletionRulesremain auto-eligible (no Prom access). Workloads withdeletionRulesand an unavailable client are marked not eligible and a single info log entry is emitted naming the version + workload + reason.Tests (
internal/controller/version-monitoring_test.go)TestMonitoringEnv— empty-address case now asserts a non-nil env with emptyaddressand the documented duration defaults.TestGracefulShutdownMonitoringRoutines— extended to cover the no-address case; goroutines must still launch and shut down cleanly on context cancellation.Test_initializeVersionCleanupOrchestrator— orchestrator is always non-nil; reachability is asserted directly against the new provider; new explicit empty-address subtest verifies zeroruntimeinfoHTTP hits.TestVersionCleanupEvaluation_NoPrometheus_NoDeletionRules— empty address, CAV without deletion rules → CAV is deleted.TestVersionCleanupEvaluation_NoPrometheus_WithDeletionRules— empty address, CAV with deletion rules → CAV is not deleted, no requeue, no panic.TestVersionCleanupEvaluation_PromUnreachable— address set but server returns 503 fromruntimeinfo; CAV with deletion rules not deleted, CAV without deletion rules deleted in the same cycle.TestVersionCleanupEvaluation_RuntimeInfoRateLimited— two consecutive probe attempts withinacquireClientRetryDelayproduce at most oneruntimeinforequest on the mock server.internal/controller/testdata/version-monitoring/cav-v1-no-deletion-rules.yamlfor the no-rules scenarios.Documentation
website/content/en/docs/configuration/_index.md—PROMETHEUS_ADDRESSbullet rewritten: the controller still runs the cleanup loop without a Prometheus connection; only versions with nodeletionRulesare cleaned up in that case.website/content/en/docs/usage/version-monitoring.md:deletionRules.Verification
go build ./...✅go test ./...✅go test ./internal/controller/...✅pkg/apis/sme.sap.com/v1alpha1/,pkg/client/, orcrds/, docs updated).Diff stat
Non-goals
sme.sap.com/enable-cleanup-monitoringto"true"or introducing an opt-out value (e.g."false"/"disabled") — tracked separately.ServiceMonitorcreation, or any CR status fields.