Releases: LerianStudio/lib-commons

v6.0.0-beta.1

20 Apr 17:10
4f47f0d

v6.0.0-beta.1 Pre-release
chore: merge main into develop

Absorbs main's rabbitmq leak-fix (close leaked connections on concurrent
reconnect in EnsureChannelContext). develop already handles the symmetric
channel-swap race, so the merged behaviour is a strict superset of both.

The leak-fix pushed the cyclomatic complexity of commitChannelState from 11
to 17 (over the repo lint threshold of 16). The function is now split into
commitNewConnection and commitChannelOnExistingConnection: same behaviour,
lock discipline preserved, and the most complex resulting function is at 11.

Verification:
- make build, vet, lint: green
- make test-unit: 5475 tests / 0 failures
- integration tests not run in this pass

v5.0.0-beta.8

20 Apr 16:01
5b10802

v5.0.0-beta.8 Pre-release
chore: merge main into develop (#448)

* release: lib-commons v5 — systemplane v2, certificate, dlq, idempotency, webhook, ssrf (#435)

* feat(systemplane): add standardization helpers, catalog, and hardened service layer (#404)

* feat(systemplane): add standardization helpers, catalog, and hardened service layer

Adds catalog package with shared key validation and registration for
Postgres and Redis backends. Introduces domain coercion helpers for
type-safe snapshot value conversion (bool, int, float64, duration, string
slice) with overflow/NaN/Inf guards and case-insensitive bool parsing.

Adds fail-closed default auth adapters (deny mutations, allow reads) and
default identity adapters (system identity for internal operations) with
Actor.ID length validation.

Refactors service/manager into reads, writes, and helpers files. Adds
component diff detection for identifying changed settings between
snapshots. Introduces shutdown helpers for ordered systemplane teardown.
Extracts supervisor helpers with overflow-safe phase sorting and
non-mutating tenant ID merge.

Hardens BootstrapConfig.Validate with default case, DelegatingAuthorizer
with unconditional empty-action rejection, cloneSnapshot with nil
TenantSettings preservation, and Swagger MergeInto with tag
deduplication. Includes comprehensive tests.

X-Lerian-Ref: 0x1

* style(tenant-manager): apply lint-fix and gofmt auto-formatting

Sort import groups alphabetically, remove trailing blank line in test,
and normalize struct field comment alignment in Manager.

* fix(systemplane): address all CodeRabbit review findings

- Add sync.RWMutex to backend registry for concurrent safety
- Use snapshot consistently for factory lookup in NewBackendFromConfig
- Extract withRegistrySnapshot test helper to reduce duplication
- Add ErrUnhandledBackend sentinel error in config validation
- Fix EnvVar validation gap when catalog defines no env vars
- Fix float-to-int boundary at 2^63 (>= instead of > for overflow check)
- Accept Secret+RedactMask in KeyDef.Validate (runtime normalizes to Full)
- Add godocs for Actor/TenantID, maxTenantIDLength, whitespace test case
- Add DeepEqual semantics comment on effectiveValueChanged
- Return persisted revision on escalation failure in persistAndApplyWrite
- Assert unconditionally in shutdown_test (covers all-nil-steps case)
- Check key existence before asserting Redacted in snapshot_builder_test
- Use discardFailedCandidate for incremental build error path
- Document concurrency safety invariant in prepareReloadBuild
- Update MergeInto godoc and deduplicate srcArr tags in mergeTags
- Auto-formatting: struct field alignment in tenant-manager packages

* fix(systemplane): address remaining CodeRabbit review findings

- Guard nil init errors in RecordInitError to prevent false positives
- Propagate WriteResult.Revision on escalation failure in PatchConfigs/PatchSettings
- Include Revision and BuiltAt in zero-snapshot check for accurate diff
- Return error instead of silently skipping malformed source tags in swagger merge
- Reject empty-name tags in swagger merge to prevent spec corruption
- Route backend_test.go registry mutations through locked helpers
- Align Actor godoc with whitespace-only fallback behavior

* fix(systemplane): eliminate torn-read window and log cleanup errors

Replace two separate atomic pointers (snapshot + bundle) with a single
atomic.Pointer[supervisorState] so lock-free readers always see a
consistent (snapshot, bundle) pair. This eliminates the torn-read window
where Current() and Snapshot() could return mismatched states.

Record cleanup errors from discard/close paths to the active OTEL span
instead of silently discarding them with _ assignment.

Add 64-bit platform note to intFromFloat64 boundary constant.
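
A rough sketch of the single-pointer approach described in this commit follows. Only `atomic.Pointer[supervisorState]` is named in the message; the field names, the `publish` and `load` helpers, and the placeholder `snapshot` and `bundle` types are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Placeholder types standing in for the real snapshot and bundle.
type snapshot struct{ revision int }
type bundle struct{ builtFrom int }

// supervisorState groups both values so they are published as one unit.
type supervisorState struct {
	snap   snapshot
	bundle bundle
}

type supervisor struct {
	state atomic.Pointer[supervisorState]
}

// publish swaps in a new (snapshot, bundle) pair with a single atomic store.
func (s *supervisor) publish(rev int) {
	s.state.Store(&supervisorState{
		snap:   snapshot{revision: rev},
		bundle: bundle{builtFrom: rev},
	})
}

// load returns a pair that was stored together. With two separate atomic
// pointers, a reader could observe a new snapshot next to an old bundle.
func (s *supervisor) load() (snapshot, bundle, bool) {
	st := s.state.Load()
	if st == nil {
		return snapshot{}, bundle{}, false
	}
	return st.snap, st.bundle, true
}

func main() {
	var s supervisor
	s.publish(1)
	s.publish(2)
	snap, b, ok := s.load()
	fmt.Println(snap.revision, b.builtFrom, ok)
}
```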

---------

Co-authored-by: Jefferson Rodrigues <jeff@lerian.studio>

* docs: sync PROJECT_RULES.md with current codebase state (#417)

- Add missing packages to structure tree: internal/nilcheck, outbox,
  secretsmanager, systemplane (7 sub-dirs), tenant-manager (14 sub-dirs),
  and root files environment.go, security_override.go
- Update enabled linters: remove stale thelper/tparallel, add Tier 1-3
  linters (27 total) matching actual .golangci.yml
- Fix API Invariants heading: v2 -> v4
- Update allowed dependencies table with mongo-driver/v2,
  testcontainers-go, go-sqlmock, goleak, jwt/v5, aws-sdk-go-v2,
  golang.org/x/sync, golang.org/x/text, grpc, protobuf
- Fix security section: replace non-existent SECURE_LOG_FIELDS with
  actual LOG_OBFUSCATION_DISABLED and security.IsSensitiveField()

* feat: add certificate, dlq, idempotency, webhook packages and systemplane enhancements (#418)

* feat(certificate): add thread-safe TLS certificate manager with hot-reload

Provides PEM-based certificate loading (PKCS#8/PKCS#1/EC key parsing order), strict file-permission enforcement (0600), atomic hot-reload via Rotate with expiry and public-key match validation, and tls.Config integration via GetCertificateFunc for transparent certificate rotation without restart.

X-Lerian-Ref: 0x1

* feat(dlq): add Redis-backed dead letter queue with consumer lifecycle

Tenant-scoped Redis keys with exponential backoff (AWS Full Jitter), background consumer with poll-based retry/exhaust lifecycle, non-blocking SCAN for tenant discovery, and pruning for exhausted messages. Nil-safe handler and consumer with functional options for logger, tracer, and metrics.

X-Lerian-Ref: 0x1

* feat(idempotency): add Redis-backed at-most-once request middleware for Fiber

SetNX-based idempotency enforcement with tenant-scoped keys, fail-open on Redis unavailability, cached response replay with Idempotency-Replayed header, 409 Conflict for in-flight duplicates, and automatic key cleanup on handler error to allow client retry.

X-Lerian-Ref: 0x1

* feat(webhook): add outbound webhook delivery with SSRF protection and HMAC signing

Concurrent fan-out delivery to active endpoints with semaphore-capped concurrency, DNS-pinned SSRF protection (validates all resolved IPs against private/loopback/CGNAT/RFC-reserved ranges to eliminate TOCTOU), redirect blocking, HMAC-SHA256 payload signing, encrypted secret support via SecretDecryptor, and exponential backoff retries (non-retryable on 4xx except 429).

X-Lerian-Ref: 0x1
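
As an illustration of HMAC-SHA256 payload signing, the following sketch uses only the standard library. The `sign` and `verify` helpers and the hex encoding are assumptions; the header name and wire format used by the webhook package are not shown in these notes.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under secret.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload) // hash.Hash.Write never returns an error
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, payload []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("endpoint-secret")
	body := []byte(`{"event":"transaction.created"}`)

	sig := sign(secret, body)
	fmt.Println(verify(secret, body, sig))
	fmt.Println(verify(secret, []byte(`{"event":"tampered"}`), sig))
}
```

Receivers should compare with `hmac.Equal` rather than `==` so that the comparison time does not leak how many leading bytes matched.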

* feat(systemplane): add bootstrap helpers, validation options, and snapshot builder

Adds ApplyKeyDefs for propagating KeyDef behaviors into bootstrap config with auto-configured secret encryption, LoadFromEnvOrDefault for zero-config Postgres fallback, ValidateKeyDefsWithOptions with WithIgnoreFields/WithKnownDeviation for suppressing intentional catalog deviations, SecretStoreConfig validation with base64 key support, and SnapshotFromKeyDefs domain helper.

X-Lerian-Ref: 0x1

* docs: document certificate, dlq, idempotency, webhook packages and systemplane catalog env vars

Updates AGENTS.md API reference, README.md package listings, PROJECT_RULES.md structure and invariant tables, and .env.reference with systemplane catalog environment variables (server TLS, CORS, rate limiting, auth, telemetry, and all shared catalog keys).

X-Lerian-Ref: 0x1

* fix(dlq): resolve modernize lint issues in handler

Use omitzero tag for time.Time struct field (omitempty has no effect on nested structs) and replace manual floor comparison with builtin max.

X-Lerian-Ref: 0x1

* fix: address CodeRabbit review feedback across all new packages

Security (critical):
- SSRF: fail-closed on DNS lookup failure instead of falling back to raw URL
- SSRF: pin to first valid IP (skip unparseable entries) and bracket-wrap bare IPv6
- SSRF test: assert error identity (errors.Is) instead of message substring

DLQ hardening:
- Consumer Run() guards against concurrent invocations (prevents orphaned stopCh)
- retryFunc panic recovery via safeRetryFunc (prevents message loss on panic)
- Source validation (validateKeySegment) on all read/scan APIs, not just Enqueue
- PruneExhaustedMessages distinguishes empty-queue from real errors, propagates both Dequeue and Enqueue failures

Documentation:
- certificate/doc.go: example now handles LoadFromFiles and Rotate errors
- idempotency/doc.go: example now handles redis.New error
- webhook/doc.go: clarifies timestamp is unsigned and insufficient for replay protection
- webhook/errors.go: broadened ErrSSRFBlocked and ErrInvalidURL godoc to match actual scope

Webhook delivery:
- Pre-populate DeliveryResult with EndpointID before goroutine launch (panic-safe)

Systemplane:
- env_or_default tests: explicit t.Setenv(EnvBackend, "") to isolate from ambient env
- ValidateOption nil guard in newValidateConfig to prevent panic
- ValidateKeyDefs godoc: removed misleading "append to returned slice" guidance

X-Lerian-Ref: 0x1

* fix: address second-round CodeRabbit review feedback

- certificate/doc.go: document EC (SEC 1) key fallback alongside PKCS#8 and PKCS#1
- dlq/handler.go: add backslash to validateKeySegment disallowed characters (Redis escape char)
- idempotency/doc.go: clarify tenant-isolation is scoped when tenant is present, global otherwise
- validate.go: correct godoc to reference ValidateKeyDefsWithOptions as the options entry point
- ssrf.go: panic on invalid hardcoded CIDR in init() instead of silently skipping

X-Lerian-Ref: 0x1

* refactor(webhook): replace runtime CIDR parsing with static net.IPNet literals

Eliminates all net.ParseCIDR calls, the self-invoking cgnatBlock initializer, and the init() panic path. SSRF blocklist entries are now compile-time-constructed via a cidr4 helper, so typos surface as test failures rather than startup crashes.

X-Lerian-Ref: 0x1

* fix: address CodeRabbit review findings across certificate, dlq, webhook, idempotency, and systemplane (#420)

* fix: address CodeRabbit review findings across certificate, dlq, webhook, idempotency, and systemplane packages

- certificate: preserve intermediate chain in Rotate via variadic intermediates,
  deep-copy DER chain in TLSCertificate to prevent aliasing, add LoadFromFilesWithChain
- dlq/consumer: fix Run restart after ctx shutdown by clearing stopCh on exit,
  escalate re-enqueue failures to ERROR with metrics (prevents silent message loss)
- webhook: enforce redirect blocking on custom HTTP clients (SSRF protection),
  fix TLS SNI for HTTPS endpoints after DNS pinning via ServerName on cloned transport
- idempotency: use Redis pipeline for atomic response cache + state marker writes,
  correct godoc from at-most-once to best-effort idempotency (fail-open on Redis outage)
- systemplane/domain: add nil-receiver guard to ConfigValue and setting helpers,
  clone mutable defaults in DefaultSnapshotFromKeyDefs to prevent Value/Default aliasing,
  use platform-independent overflow bounds in intFromFloat64 and scaleDurationFloat64
- systemplane/bootstrap: wrap resource-validation errors with backend kind context
- systemplane/catalog: inline containsString to direct slices.Contains call
- systemplane/ports: fix AllowAllAuthorizer godoc to reflect fail-closed behavior,
  add TenantID exact-max-length boundary test
- systemplane/bootstrap: use snapshot() in tests to honor registry mutex contract

* fix: address follow-up CodeRabbit findings on PR #420

- certificate: deep-copy intermediate DER bytes in Rotate to prevent caller mutation
- dlq/consumer: simplify Run() overlap guard to reject when stopCh is non-nil
  (prevents concurrent loops while previous goroutine drains safeProcessOnce);
  lost not-yet-ready messages now count against BatchSize (return true)
- idempotency: combine two pipe.Del calls into single variadic Del(ctx, key, responseKey)
- webhook: enforce minimum TLS 1.2 on cloned transport even when caller config is weaker
- systemplane/domain: add nil-receiver guards to GlobalSettingValue and TenantSettingValue;
  add comment explaining intentional narrower clone scope vs reflection-based version

* fix: address second-round CodeRabbit findings on PR #420

- certificate: deep-copy leaf DER (cert.Raw) in Rotate to prevent aliasing;
  add chain[1:]... usage example to Rotate and LoadFromFilesWithChain godocs
- dlq/consumer: record span error via HandleSpanError on both terminal
  message-loss paths (not-yet-ready re-enqueue failure and retry re-enqueue failure)
- idempotency: correct saveResult godoc from 'atomically' to 'in a single round-trip'
  (Pipeline batches but does not provide MULTI/EXEC transactional atomicity)
- webhook: normalize nil Transport to http.DefaultTransport before type assertion
  in httpsClientForPinnedIP; clone d.client instead of building from scratch
  to preserve Jar and other caller-configured fields

* fix: address third-round CodeRabbit findings on PR #420

Deep-copy leaf certificate in Rotate to prevent aliasing caller-owned
memory, document shared Leaf pointer in TLSCertificate godoc, and
remove high-cardinality composite key from idempotency warning logs.

* fix: add RecordLost to DLQMetrics to distinguish message loss from exhaustion

Adds a dedicated RecordLost method to the DLQMetrics interface so operators
can separately alert on infrastructure-caused message loss (re-enqueue
failures) versus expected exhaustion (max retries reached).

* fix: split multi-key DEL to avoid Redis Cluster CROSSSLOT error in idempotency

Split pipe.Del(ctx, key, responseKey) into two single-key DEL calls
within the same pipeline to prevent CROSSSLOT errors when keys hash
to different Redis Cluster slots.

* fix: address CodeRabbit review findings for PR #419 (#421)

* fix: apply CodeRabbit auto-fixes for PR #419

Address 21 unresolved CodeRabbit review findings across certificate,
systemplane, webhook, DLQ, and idempotency packages:

- certificate: defensive copies in GetCertificate/TLSCertificate, nil-receiver
  safety for TLSCertificate, expanded DaysUntilExpiry godoc
- systemplane: nil-safety guards for SnapSettingInt/SnapSettingBool,
  consolidate redundant state loads in supervisor reload, robust factory
  assertion in backend test, duplicate env var detection in catalog validation
- webhook: redact URL userinfo in logs, versioned HMAC signature format (v1)
  with backward-compatible migration path, DNS-free scheme validation tests,
  preserve original authority for HTTP Host header, defensive filter comment,
  consistent metric reporting with result.Attempts
- dlq: clone sources slice in WithSources, reject negative MaxRetries,
  validate tenant key segment before Redis routing
- idempotency: binary-safe cached response with headers, fail-open error
  handling in handleDuplicate, accurate package documentation

* fix: reduce Enqueue cyclomatic complexity and add missing errors import

- Extract validateEnqueueMessage, stampInitialEnqueue, and
  resolveAndValidateTenant helpers from Handler.Enqueue to bring
  cyclomatic complexity from 18 down to 7
- Add missing "errors" import in idempotency package for errors.Is calls

* fix: replace deprecated Header.VisitAll with Header.All iterator

staticcheck SA1019: fasthttp Header.VisitAll is deprecated in favor of
the range-compatible All() iterator.

* fix: address CodeRabbit round-2 findings on PR #421

- idempotency: use Header.Add for multi-value replay, log unmarshal
  failures, rename shadowed loop variable
- catalog: introduce ValidationResult to separate env var conflicts from
  Mismatch, mark duplicate env vars as ambiguous in index
- webhook: strip URL fragments in sanitizeURL, fix godoc references to
  VerifySignature/VerifySignatureWithFreshness, pin TestComputeHMACv1
  to independently computed expected value

* feat(security/ssrf): add canonical SSRF validation package

Introduce commons/security/ssrf as the single source of truth for SSRF
protection across all Lerian services. Consolidates two internal,
duplicated implementations into one exported package.

New exported API:
- IsBlockedIP(net.IP) / IsBlockedAddr(netip.Addr): IP-level blocking
  with canonical CIDR blocklist (8 ranges + stdlib predicates)
- IsBlockedHostname(hostname): hostname-level blocking for localhost,
  cloud metadata endpoints, .local/.internal/.cluster.local suffixes
- BlockedPrefixes(): returns copy of CIDR blocklist for auditing
- ValidateURL(ctx, url, opts...): scheme + hostname + IP validation
  without DNS resolution
- ResolveAndValidate(ctx, url, opts...): DNS-pinned validation with
  TOCTOU elimination, returns ResolveResult{PinnedURL, Authority,
  SNIHostname}
- Functional options: WithHTTPSOnly, WithAllowPrivateNetwork,
  WithLookupFunc, WithAllowHostname
- Sentinel errors: ErrBlocked, ErrInvalidURL, ErrDNSFailed

Refactored consumers:
- commons/webhook/ssrf.go: resolveAndValidateIP delegates to
  ssrf.ResolveAndValidate, removed duplicated isPrivateIP/CIDR blocklist
- commons/net/http/proxy_validation.go: isUnsafeIP delegates to
  ssrf.IsBlockedIP, removed duplicated blockedProxyPrefixes

Canonicalized on netip.Prefix (modern Go) with net.IP bridge for
legacy callers. All tests hermetic via WithLookupFunc injection.

* docs: document commons/security/ssrf package in AGENTS.md and CLAUDE.md

Add repository shape entry, API invariants section, and other-packages
bullet for the new canonical SSRF validation package.

* style(security/ssrf): fix comment alignment in CIDR blocklist

X-Lerian-Ref: 0x1

* test(webhook): refactor SSRF tests and add mapSSRFError coverage

Remove duplicate test functions (InvalidScheme was duplicate of BlockedSchemes, PrivateIP replaced by simpler BlockedHostname). Add dedicated TestMapSSRFError covering all four sentinel translation branches. Upgrade assert.Error to require.Error for fail-fast on nil errors.

X-Lerian-Ref: 0x1

* fix: apply CodeRabbit auto-fixes for PR #421

* chore(deps): bump google.golang.org/api from 0.272.0 to 0.273.0 (#422)

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.272.0 to 0.273.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.272.0...v0.273.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.273.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump github.com/aws/aws-sdk-go-v2 from 1.41.4 to 1.41.5 (#423)

Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) from 1.41.4 to 1.41.5.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.41.4...v1.41.5)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2
  dependency-version: 1.41.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump cloud.google.com/go/iam from 1.5.3 to 1.6.0 (#425)

Bumps [cloud.google.com/go/iam](https://github.com/googleapis/google-cloud-go) from 1.5.3 to 1.6.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/documentai/CHANGES.md)
- [Commits](https://github.com/googleapis/google-cloud-go/compare/iam/v1.5.3...iap/v1.6.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/iam
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: harden certificate, dlq, ssrf, idempotency, and webhook paths (#427)

* fix(certificate): harden rotation and chain loading

Prevent nil concrete signers from panicking during rotation and make full certificate chains available for hot-reload validation. This keeps TLS reload paths nil-safe while preserving chain data for callers that need it.

* fix(dlq): default metrics and preserve retried messages

Default DLQ metric recording to a noop implementation so retry paths stay nil-safe, and keep panic-recovered retries from dropping messages. The key helpers move into a dedicated file to keep handler and consumer responsibilities narrower.

* fix(ssrf): harden option handling and blocked ranges

Ignore nil options safely, respect canceled contexts during validation, and reject additional metadata and special-purpose IPv6 targets. This keeps SSRF policy enforcement consistent even when callers pass malformed configuration.

* fix(idempotency): log cache serialization failures

Surface cached-response marshal failures in logs without breaking the request path, and cover PUT so all mutating methods keep the same replay semantics. This makes silent cache misses diagnosable while preserving fail-open behavior.

* fix(webhook): preserve retry diagnostics and signatures

Keep signature helpers isolated from delivery flow, retry 429 responses correctly, and preserve the last transport failure when retries are exhausted. This makes webhook failures easier to debug without changing the public delivery API.

* docs: clarify helper contracts and security notes

Document the updated certificate, DLQ, and idempotency contracts, and record security trade-offs around schema env-var exposure and the remaining proxy transport SSRF gap. This keeps the repo guidance aligned with the code that now ships.

* fix(ssrf): guard nil contexts in URL validation

Fail fast with ErrInvalidURL when callers pass a nil context so URL validation stays nil-safe instead of panicking. Clarify the reverse-proxy transport note to describe the real remaining TOCTOU gap in SSRF resolution.

* fix(ssrf): fail fast on invalid resolve contexts

Mirror the ValidateURL preflight in ResolveAndValidate so nil and canceled contexts return ErrInvalidURL instead of reaching DNS resolution paths. Add regression coverage for both nil and canceled contexts.

* refactor(ssrf): share validator context preflight

Reuse the same nil and canceled context guard across both SSRF validators so behavior stays aligned while keeping ResolveAndValidate below the repository complexity threshold. This preserves the existing ErrInvalidURL wrapping and restores a clean CI pipeline.

* chore(deps): bump github.com/aws/aws-sdk-go-v2/service/secretsmanager (#424)

Bumps [github.com/aws/aws-sdk-go-v2/service/secretsmanager](https://github.com/aws/aws-sdk-go-v2) from 1.41.4 to 1.41.5.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.41.4...v1.41.5)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/secretsmanager
  dependency-version: 1.41.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* refactor(systemplane)!: simplify to flat dual-backend architecture (#434)

* chore(systemplane): scaffold v2 package skeleton

Adds 15 stub files matching the approved v2 public surface. Stubs compile
and return zero-value sentinels so Phase 2 (Postgres) and Phase 3 (MongoDB)
can proceed in parallel against a frozen Store interface. Old systemplane
subpackages remain untouched; they will be removed in Phase 7.

Refs: docs/plans/pure-crunching-noodle.md

* feat(systemplane): implement Postgres Store adapter with LISTEN/NOTIFY

Replaces Phase 1 stubs with the real Postgres-backed Store implementation.
Uses pgx/v5 for both SQL operations and the dedicated LISTEN connection,
with trigger-emitted NOTIFY on every upsert. LISTEN reconnect via
commons/backoff.ExponentialWithJitter; handler panics recovered via
commons/runtime.RecoverAndLogWithContext.

Adds store.ErrClosed sentinel for nil/closed receiver safety.

Integration tests use testcontainers-go.

Refs: docs/plans/pure-crunching-noodle.md

* feat(systemplane): add debouncer and backend-agnostic contract test suite

Debouncer: trailing-edge, per-key, panic-safe via commons/runtime. Nil-safe.
Window=0 disables debouncing (synchronous invocation). Used by Phase 5 Client
to coalesce rapid change-feed events.

Contract suite (systemplanetest.Run): 10 cases asserting Store semantics —
CRUD, subscribe delivery, ctx cancel, close idempotency, namespace isolation.
Invoked by Phase 2 (Postgres) and Phase 3 (MongoDB) integration tests.

Refs: docs/plans/pure-crunching-noodle.md

* feat(systemplane): implement MongoDB Store adapter with change streams and polling

Replaces Phase 1 stubs with the real MongoDB-backed Store. Change-streams
mode (default) requires a replica set; polling mode via WithPollInterval
supports standalone deployments. Handler panics recovered via
commons/runtime.RecoverAndLog; reconnect uses commons/backoff.

Integration tests use testcontainers-go with rs0 replica set.

Refs: docs/plans/pure-crunching-noodle.md

* docs(systemplane): add simplification plan for v2 redesign

Captures the full diagnosis, env-var vs. hot-reload rule, target public API,
delete list, execution phases, and verification strategy that govern the
commons/systemplane v2 simplification. Kept on the feature branch as a
historical record of the decisions behind the implementation.

* feat(systemplane): implement public Client with registry, cache, and OnChange

Client is the integrator: wires the Store interface, debouncer, in-memory
registry, RWMutex-protected value cache, and a per-key subscriber table into
the plan's public surface (NewPostgres, NewMongoDB, Register, Start, Close,
Get, typed accessors, Set, OnChange).

Write-through cache on Set for same-process read consistency; subscribers
fire from the changefeed echo to avoid double-notification. Span helpers
return finish funcs to satisfy spancheck; postgres DDL uses
validated-identifier table names with justified gosec suppressions.

54 unit tests against an in-memory fake Store; race-clean. Coverage 89.7%.

Refs: docs/plans/pure-crunching-noodle.md

* feat(systemplane): add admin subpackage with Fiber HTTP routes

Mount(router, client, opts...) registers three admin routes at a
configurable prefix (default /system):
  GET  :prefix/:namespace       - list namespace entries (redacted)
  GET  :prefix/:namespace/:key  - read one entry (redacted)
  PUT  :prefix/:namespace/:key  - write via Client.Set

Authorization via WithAuthorizer (actions: "read", "write"); actor
via WithActorExtractor for audit trail. Redaction wired through
systemplane.ApplyRedaction using the key's registered RedactPolicy.

Adds two public accessors to *Client: List(namespace) and
KeyRedaction(ns, key). Also adds NewForTesting(TestStore, opts) as an
explicit test-helper entry point for out-of-package tests. TestStore,
TestEntry, and TestEvent are public mirrors of the internal store
types, bridged by a testStoreAdapter. Not a production API.

Exports ApplyRedaction (was unexported applyRedaction with
nolint:unused). Existing tests updated to use the new name.

22 unit tests against an in-memory fake store via NewForTesting.
98.3% coverage for admin package. Zero lint issues.

Refs: docs/plans/pure-crunching-noodle.md

* refactor(systemplane)!: remove legacy packages and update docs

BREAKING CHANGE: the entire legacy systemplane implementation is removed.
No deprecation shim; consumers must migrate to the v2 surface documented
in CLAUDE.md and MIGRATION_MAP.md.

Deletes nine subpackages (adapters, bootstrap, catalog, domain, ports,
registry, service, swagger, testutil) — ~37K LOC combined. What remains
is the v2 public surface (~2K LOC) plus internal backend adapters and
tests. See MIGRATION_MAP.md for the full removed-symbol inventory.

Fixes gosec G201/G202 annotations in the Postgres store to use the
project-standard #nosec format instead of //nolint:gosec. Fixes the
Postgres integration contract suite to use per-subtest table isolation
(atomic counter for unique table names) so leftover data from earlier
subtests does not pollute later ones.

Updates CLAUDE.md with the new API invariants and README.md with a brief
mention. make ci passes (modulo known macOS MongoDB Docker networking
skip, consistent with prior behavior).

Refs: docs/plans/pure-crunching-noodle.md

* fix(systemplane): harden client concurrency and panic safety

Add missing cacheMu lock around default seeding in Start to prevent a race with concurrent reads. Wrap the subscribe goroutine with RecoverAndLog to avoid silent crashes. Replace the bool-guarded unsubscribe closure with sync.Once for thread-safe idempotent teardown. Remove the dead ErrAlreadyStarted sentinel that was no longer reachable after the Start simplification.

X-Lerian-Ref: 0x1

* fix(systemplane): guard store operations after Close

Add isClosed() check to MongoDB List/Get/Set/Subscribe so callers get ErrClosed instead of operating on a shut-down collection handle. Simplify Postgres invokeHandler by removing the redundant nil-logger guard — RecoverAndLogWithContext already handles a nil logger safely.

X-Lerian-Ref: 0x1

* feat(systemplane): surface key descriptions and harden admin routes

Add Description field to ListEntry and a new KeyDescription accessor so the admin HTTP layer can expose human-readable metadata. Change the default authorizer from allow-all to deny-all (secure by default) and replace raw err.Error() messages with generic strings to avoid leaking internal details. Fix lock acquisition order in List (registryMu before cacheMu) to match the rest of the codebase.

X-Lerian-Ref: 0x1

* test(systemplane): replace sleep-based assertions with polling

Swap fragile time.Sleep waits for deadline-based polling loops in changefeed tests, eliminating false negatives under load. Add explicit json.Marshal error checks throughout test helpers. Add nil-input constructor tests for NewPostgres and NewMongoDB.

X-Lerian-Ref: 0x1

* docs(systemplane): align env reference and AGENTS.md with v2 API

Remove stale systemplane/catalog and systemplane/bootstrap references from .env.reference now that v2 uses explicit Go constructors with functional options. Document the new constructor signatures, WithTable/WithCollection/WithPollInterval options, and update reload semantics. Update AGENTS.md sentinel error list to reflect the ErrAlreadyStarted removal and ErrDuplicateKey addition.

X-Lerian-Ref: 0x1

* fix(systemplane): make Start retriable and harden store backends

Replace sync.Once with sync.Mutex state machine in Start() so transient
store failures don't permanently poison the Client. Add retry test.

Store backend fixes:
- Remove unused cancel field from MongoDB Store
- Align Close() nil-receiver to return nil on both backends
- Add compile-time interface check to Postgres Store
- Use Go-side timestamp in Postgres Set (was using SQL now())
- Truncate NOTIFY payload in warn log for defense-in-depth

X-Lerian-Ref: 0x1

* fix(systemplane): improve API correctness and cache consistency

Normalize cached values via JSON roundtrip in Set() so cache type
matches refreshFromStore behavior (prevents int/float64 mismatch).
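
To illustrate the mismatch this guards against, here is a small sketch (the helper name canonicalize is hypothetical): a Go int written straight into a cache stays an int, while the same value decoded from stored JSON into an interface comes back as float64. Round-tripping through encoding/json before caching makes both paths agree.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalize round-trips v through JSON so the result has the same dynamic
// types a value decoded from stored JSON would have (for example float64 for
// every JSON number).
func canonicalize(v any) (any, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, fmt.Errorf("marshal: %w", err)
	}
	var out any
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, fmt.Errorf("unmarshal: %w", err)
	}
	return out, nil
}

func main() {
	v, err := canonicalize(42)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", v) // float64
}
```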

Also:
- Return ErrValidation for empty namespace/key (was ErrUnknownKey)
- Fix NewForTesting debounce override order (apply default before opts)
- Add TestMount_DefaultAuthorizer_DeniesAll for secure-by-default guard
- Surface authorizer error in admin 403 response message

X-Lerian-Ref: 0x1

* fix(systemplane): update docs, comments, and project rules

Fix doc.go 'lock-free' claim (reads are read-locked, not lock-free).
Expand contract test sleep comment with rationale.
Update PROJECT_RULES.md directory tree and remove stale go-sqlmock dep.

X-Lerian-Ref: 0x1

* refactor(systemplane): replace BSON value wrapping with plain JSON string storage

Eliminates the jsonToBSONRaw() conversion layer entirely. Values are now stored as plain JSON strings in MongoDB, which removes the {"v":...} envelope that caused ChangeStreams to deliver wrapped values instead of raw JSON on subscription callbacks.

X-Lerian-Ref: 0x1

* fix(systemplane): default zero UpdatedAt to current time in Postgres Set

When callers pass a zero-value UpdatedAt, Set() now defaults to time.Now().UTC() instead of passing the zero time to SQL, which caused integration tests to fail with unexpected timestamp values.

X-Lerian-Ref: 0x1

* style(systemplane): align formatting and remove stale nolint directive

Normalizes struct field alignment across client and admin files, removes a //nolint:govet comment that is no longer needed after the ctx-shadow was resolved, and adds blank lines for readability.

X-Lerian-Ref: 0x1

* fix(systemplane): apply CodeRabbit auto-fixes for PR #434

- Reject PUT {} with 400 instead of writing nil to systemplane keys;
  use json.RawMessage to distinguish missing value from explicit null
- Document that Mount defaults to deny-all until WithAuthorizer is supplied
- Reclassify secrets, broker DSNs, and exchange names as bootstrap-only
  config in .env.reference (not hot-reloadable via systemplane)
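
The missing-versus-null distinction in the first bullet can be sketched as follows. The request struct and its field name are assumptions for illustration. A non-pointer json.RawMessage field stays empty when the key is absent and holds the literal bytes null when the client sent an explicit null.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// putBody is a hypothetical request shape; the field name is an assumption.
type putBody struct {
	Value json.RawMessage `json:"value"`
}

// classify reports whether the body omitted "value", set it to null, or
// carried a concrete value.
func classify(body string) (string, error) {
	var req putBody
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", fmt.Errorf("decode body: %w", err)
	}
	switch {
	case len(req.Value) == 0:
		return "missing", nil // a handler could answer 400 here
	case bytes.Equal(req.Value, []byte("null")):
		return "null", nil
	default:
		return "value", nil
	}
}

func main() {
	for _, body := range []string{`{}`, `{"value": null}`, `{"value": 0.05}`} {
		kind, err := classify(body)
		fmt.Println(body, "->", kind, err)
	}
}
```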

* fix(systemplane): apply CodeRabbit auto-fixes (round 2)

- Soften absolute hot-reload guarantees for Postgres pool vars in
  .env.reference to conditional language reflecting service-level opt-in
- Add stored-value nil assertion to TestPut_ExplicitNullValue

* fix: apply CodeRabbit auto-fixes for PR #435 (#436)

* fix: apply CodeRabbit auto-fixes for PR #435

Certificate:
- Return detached (deep-copied) public key from PublicKey() for concurrency safety
- Validate cert NotBefore/NotAfter in LoadFromFiles to match advertised behavior
- Narrow nil-safety doc to scope claim to read helpers, call out exceptions

DLQ:
- Fix head-of-line blocking: drainSource now rotates future-dated messages
  and continues processing ready items in the same cycle
- Add validateSource() helper and enforce non-empty source in all exported ops
- Fail closed on invalid tenant IDs instead of falling back to global queue

Idempotency:
- Replace hand-rolled fiber.Map responses with shared ErrorResponse contract
- Bypass idempotency (fail-open) when tenant context is missing
- Don't cache 5xx responses as idempotent success; delete keys to allow retry

SSRF:
- Normalize trailing DNS root label before hostname blocklist checks

Docs:
- AGENTS.md: describe webhook SSRF as delegation to commons/security/ssrf
- MIGRATION_MAP.md: inline canonical API surface, remove CLAUDE.md reference

* refactor(certificate): extract parseCertPEM and parseKeyFile to reduce cyclomatic complexity

loadFromFiles had cyclomatic complexity 18 (threshold 16) after adding
NotBefore/NotAfter validation. Extract two helpers:
- parseCertPEM: PEM decoding, chain parsing, lifetime validation (complexity 8)
- parseKeyFile: permission check, PEM decoding, key parsing cascade (complexity 8)
loadFromFiles drops to complexity 4.

Also includes minor lint-fix formatting in idempotency.go.

* chore!: bump module version from v4 to v5

BREAKING CHANGE: module path changed from
github.com/LerianStudio/lib-commons/v4 to
github.com/LerianStudio/lib-commons/v5.

All internal imports, go.mod, documentation, and .goreleaser.yml
updated. 218 files changed (529 symmetric replacements).

* fix: apply CodeRabbit auto-fixes (round 2) for PR #436

Certificate:
- Extract generateExpiredCert() test helper to eliminate duplication
- Document nil-receiver and deep-clone behavior on PublicKey()

DLQ:
- Use require.Len before indexing slice to prevent panic on failure
- Cap per-key rotation at BatchSize to prevent starvation of other keys

Idempotency:
- Document tenant-context bypass on exported Check() contract
- Update retry-path comments/logs to reflect 5xx response handling

SSRF:
- Use TrimRight instead of TrimSuffix to strip all trailing dots
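
A short sketch of why the SSRF change matters: strings.TrimSuffix removes at most one trailing dot, while strings.TrimRight removes all of them. The helper name normalizeHost and its ok flag are hypothetical; the real blocklist logic is not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHost strips every trailing dot (the DNS root label). ok is false
// when nothing is left, for example for "." or "..".
func normalizeHost(host string) (normalized string, ok bool) {
	normalized = strings.TrimRight(host, ".")
	return normalized, normalized != ""
}

func main() {
	// TrimSuffix leaves one dot behind, so a blocklist lookup could miss.
	fmt.Println(strings.TrimSuffix("localhost..", "."))

	fmt.Println(normalizeHost("localhost.."))
	fmt.Println(normalizeHost(".."))
}
```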

* fix: apply CodeRabbit auto-fixes (round 3) for PR #436

- Use min() builtin for DLQ rotation cap instead of manual if/else
- Remove WARN log on documented tenant-bypass path in idempotency
- Bump Go toolchain from 1.25.7 to 1.25.9 (9 stdlib CVE fixes)
- Update PROJECT_RULES.md tree label from (v4) to (v5)
- Update MIGRATION_MAP.md title from v3->v4 to v3->v5

* fix: apply CodeRabbit auto-fixes (round 4) for PR #436

- Restrict idempotency to positive method allowlist (POST/PUT/PATCH/DELETE)
  instead of negative bypass (GET/HEAD/OPTIONS), closing TRACE/CONNECT gap
- Update MIGRATION_MAP.md: clarify v3→v5 scope, rename v4 column headers
  and section labels to v5 throughout
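
As a rough illustration of the allowlist approach in the first bullet (the function name is hypothetical, and the real middleware wiring is omitted): listing the methods that do get idempotency handling means any method not listed, including TRACE and CONNECT, is bypassed by default.

```go
package main

import (
	"fmt"
	"net/http"
)

// idempotencyApplies is a positive allowlist: only the listed methods are
// subject to idempotency handling; everything else bypasses it.
func idempotencyApplies(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true
	default:
		return false
	}
}

func main() {
	for _, m := range []string{http.MethodPost, http.MethodGet, http.MethodTrace} {
		fmt.Println(m, idempotencyApplies(m))
	}
}
```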

* fix: apply CodeRabbit auto-fixes for PR #435 (#439)

- Migrate proxy transport to ssrf.ResolveAndValidate, eliminating TOCTOU window
- Block hostnames that normalize to empty values (e.g., ".", "..")
- Fix DLQ batch cap bypass when all tenant queues have future-dated heads
- Normalize dequeued message Source to the authoritative queue name
- Restore tenant context before re-enqueueing during DLQ prune
- Fix potential OOB panic in truncateString with invalid UTF-8
- Clarify TLSCertificate PrivateKey sharing in doc comment
- Use keyStateProcessing constant in idempotency test
- Add maintenance comments linking expectedPrefixCount to blockedPrefixes
- Add test for blocked IP in middle of resolved IP list

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jefferson Rodrigues <jeff@lerian.studio>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix(rabbitmq): close leaked connections on concurrent reconnect in EnsureChannelContext

* fix: copy Fiber context strings before c.Next() to prevent UnsafeString race

Fiber v2 returns strings via utils.UnsafeString pointing directly into
fasthttp's RequestCtx buffer. When c.Method(), c.OriginalURL(), c.Protocol(),
c.Hostname(), and c.Get(HeaderUserAgent) are read after c.Next(), the
underlying buffer may already be recycled for the next request, causing
corrupted span attributes (e.g. GET→GETT, POST→POS).

Capture all Fiber context string values into heap-owned local variables
before c.Next() using string([]byte(...)) to force a safe copy.
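
The hazard can be reproduced without Fiber. The sketch below is an illustration only: it builds a string that aliases a byte buffer with the unsafe package (which is what a zero-copy accessor effectively does), copies it, and then overwrites the buffer. It uses strings.Clone as the copy step, which is a stand-in for the string([]byte(...)) form named above. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// aliasString returns a string that shares memory with buf, imitating the
// zero-copy strings a framework may hand out. Illustrative only: mutating
// buf while the string is alive breaks Go's string immutability rules.
func aliasString(buf []byte) string {
	return unsafe.String(unsafe.SliceData(buf), len(buf))
}

// captureBeforeReuse copies the aliased string, then overwrites the buffer
// the way a recycled request buffer would be overwritten.
func captureBeforeReuse() (aliased, copied string) {
	buf := []byte("POST")
	aliased = aliasString(buf)
	copied = strings.Clone(aliased) // owns its own backing memory
	copy(buf, "GETX")               // simulate reuse by the next request
	return aliased, copied
}

func main() {
	aliased, copied := captureBeforeReuse()
	fmt.Println("aliased:", aliased) // reflects whatever the buffer holds now
	fmt.Println("copied: ", copied)  // still the value read before reuse
}
```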

Requested-by: @qnen

* refactor(rabbitmq): extract helper from EnsureChannelContext to reduce complexity

Cyclomatic complexity of EnsureChannelContext dropped from 19 to 13 by extracting the post-dial race-detection branch into commitChannelState. No behavior change.

* perf(tenant-manager/core): remove alloc + regex from tenant ID hot path

IsValidTenantID and GetTenantIDContext both sit on the systemplane
GetForTenant hot path and count against the AC15 sub-microsecond budget.
Profiling the AC15 perf gate under noisy CI conditions pointed to two fixes:

- GetTenantIDContext allocated 16 B/op because the inlined nonNilContext
  helper caused escape analysis to spill context.Background() to heap on
  every call, even when ctx was non-nil. Handling nil explicitly in the
  read path keeps the function allocation-free. In addition, the context
  key is now a pointer value instead of a struct value, so ctx.Value
  lookups no longer box the key.

- IsValidTenantID used regexp.MatchString against a pattern that is
  purely ASCII-ranged. The onePass-DFA walker accounted for ~60% of
  total CPU in the benchmark. Replaced with a byte-loop; semantics are
  pinned by the existing TestIsValidTenantID table test.

Measured on Apple M5 Max, BenchmarkGetForTenant_Hit: 145 ns/op, 16 B/op,
1 alloc/op → 43 ns/op, 0 B/op, 0 allocs/op. Miss_Eager: 141 ns/op →
42 ns/op, same alloc delta.
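
A byte loop of the kind described in the second bullet might look like the sketch below. It assumes the pattern is ^[a-zA-Z0-9][a-zA-Z0-9_-]*$, which is the tenant-ID regex quoted elsewhere in these notes; any length limit or other rule the real IsValidTenantID applies is not modeled, and the lowercase function name marks this as a stand-in.

```go
package main

import "fmt"

// isValidTenantID accepts the same strings as the pattern
// ^[a-zA-Z0-9][a-zA-Z0-9_-]*$ without using the regexp package.
func isValidTenantID(id string) bool {
	if id == "" {
		return false
	}
	for i := 0; i < len(id); i++ {
		c := id[i]
		switch {
		case c >= 'a' && c <= 'z', c >= 'A' && c <= 'Z', c >= '0' && c <= '9':
			// Alphanumerics are allowed in every position.
		case c == '_' || c == '-':
			if i == 0 {
				return false // first byte must be alphanumeric
			}
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, id := range []string{"tenant-1", "_global", "a/b", ""} {
		fmt.Printf("%q valid=%v\n", id, isValidTenantID(id))
	}
}
```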

* fix: apply CodeRabbit auto-fixes

- rabbitmq: guard !newConnection branch in commitChannelState against
  concurrent connection swap; close orphaned channel when rc.Connection
  no longer matches snap.existingConn or is closed.
- rabbitmq tests: buffer dialStarted chan so the first dialer never
  drops the signal; capture EnsureChannelContext errors from concurrent
  goroutines and assert all nil instead of dropping them.
- docs/plans/pure-crunching-noodle.md: mark the Proposed Architecture
  snippets as non-normative pseudocode pointing at CLAUDE.md/
  MIGRATION_MAP.md; correct "Created (13 files)" to 15 to match the
  list; collapse the Execution Phases roadmap into a Status: Shipped
  section.

* test(rabbitmq): bound the dialStarted wait to fail fast

Replace the bare <-dialStarted receive in the concurrent-reconnect
subtest with a 5s bounded select. If a future refactor of
EnsureChannelContext changes the flow such that no goroutine reaches
the dial phase, the subtest now fails with a clear message instead
of hanging until the go test global timeout.
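
The bounded wait can be sketched as a small helper. The name waitForSignal is hypothetical, and a test would typically call t.Fatal with a clear message where this sketch returns an error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitForSignal blocks until ch delivers a value or is closed, or until
// timeout elapses, whichever comes first.
func waitForSignal(ch <-chan struct{}, timeout time.Duration) error {
	select {
	case <-ch:
		return nil
	case <-time.After(timeout):
		return errors.New("timed out waiting for signal")
	}
}

func main() {
	ready := make(chan struct{}, 1) // buffered so the sender never blocks
	ready <- struct{}{}
	fmt.Println(waitForSignal(ready, 5*time.Second))

	never := make(chan struct{})
	fmt.Println(waitForSignal(never, 10*time.Millisecond))
}
```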

* fix: apply CodeRabbit auto-fixes

- Close replaced channel in same-connection reopen path (rabbitmq.go)
- Strengthen concurrent reconnect test barrier (rabbitmq_test.go)
- Add historical notice to systemplane plan document

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jefferson Rodrigues <jeff@lerian.studio>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Clara Tersi <mclara.tersi@gmail.com>
Co-authored-by: Gandalf <gandalf@lerian.studio>
Co-authored-by: Gabriel Brecci <34200450+qnen@users.noreply.github.com>

v5.0.0-beta.7

20 Apr 14:53
ee9bd6a

v5.0.0-beta.7 Pre-release
chore: merge main into develop + reduce rabbitmq.EnsureChannelContext…

v5.0.0-beta.6

20 Apr 12:50
45f2c19

v5.0.0-beta.6 Pre-release
feat(systemplane): tenant-scoped runtime configuration (#445)

* docs(systemplane): add pre-dev artifacts for tenant-scoped keys

Pre-development planning artifacts for the tenant-scoped systemplane
feature: research (codebase + best-practices + frameworks), PRD, TRD,
task breakdown, and delivery roadmap.

Feature scope: additive v5 API growth adding six Client methods
(RegisterTenantScoped, GetForTenant, SetForTenant, DeleteForTenant,
ListTenantsForKey, OnTenantChange) plus five typed accessor mirrors,
three admin HTTP routes, and backward-compatible Postgres/MongoDB
schema evolution under a '_global' sentinel.

* feat(systemplane/T-001): add tenant method signatures to Store + TestStore

Extends the internal store.Store interface with five additive tenant
methods (GetTenantValue, SetTenantValue, DeleteTenantValue,
ListTenantValues, ListTenantsForKey). Entry and Event gain a TenantID
string field placed after Key to mirror the composite-key identity
(namespace, key, tenant_id) that Tasks 3/4 will use as the unique index.

client_testing.go's TestStore mirror grows the same five signatures;
testStoreAdapter propagates TenantID both directions across every
existing method's struct conversion. Two collateral fakeStore test
helpers (client_test.go, admin/admin_test.go) receive no-op tenant
stubs so they continue satisfying the widened interface.

Backend implementations in internal/postgres and internal/mongodb carry
stub methods returning 'not implemented — task 3' / 'task 4' errors,
preserving the compile-time 'var _ store.Store = (*Store)(nil)'
assertion until the real implementations land in Tasks 3 and 4.

A single smoke test (TestStoreAdapter_TenantIDRoundtrips) pins the
TenantID round-trip across Set, Get, List, and Subscribe.

Foundation task — Tasks 2, 3, 4, 5 all consume this interface shape.

Part of: docs/pre-dev/tenant-scoped-systemplane/

* feat(systemplane/T-002): client scaffolding for tenant-scoped keys

Extends Client with the state needed to manage tenant-scoped overrides:
- tenantScopedRegistry (marks which keys accept tenant overrides)
- tenantCache (interface + eager map impl + bounded LRU impl)
- tenantSubsMu / tenantSubscribers / nextTenantSubID (OnTenantChange scaffolding)
- tenantLoadMode (eager | lazy, driven by WithLazyTenantLoad option)

WithLazyTenantLoad(maxEntries) is the opt-in Option for large-tenant
deployments; a non-positive maxEntries falls back to eager mode, matching
the WithDebounce / WithPollInterval convention of "non-positive = disabled."

Three sentinels land in errors.go for the write/read path Task 5 will
flesh out: ErrMissingTenantContext, ErrInvalidTenantID, and
ErrTenantScopeNotRegistered.

RegisterTenantScoped is the only Client method added in this task; it
mirrors Register's pre-Start / duplicate-check / validator-apply semantics
and atomically inserts into both registry and tenantScopedRegistry under
registryMu, then seeds the legacy cache with the default value under
cacheMu. Pre-Start reads for tenant-scoped keys therefore return the
default without a code path through the tenant miss logic.

The remaining tenant Client methods (SetForTenant, GetForTenant,
DeleteForTenant, ListTenantsForKey, OnTenantChange, typed accessors) are
deliberately deferred to Task 5 rather than stubbed here — stubs would
require choosing the UnsubscribeFunc vs bare-func return shape which is
Task 5's responsibility, and the error-returning stubs would be noise
for reviewers as they'd be replaced immediately in Task 5.

Unused-field linter false-positives are silenced with //nolint:unused
comments carrying explicit "consumed by Task 5" attribution; the
comments are removed when Task 5 lands the dispatch code.

tenantCache is split into tenant_cache.go (interface + eager) and
tenant_cache_lru.go (bounded LRU via container/list) to respect the
300-LOC per-file guardrail; both files share the caller-holds-lock
contract — cacheMu guards every op, no internal synchronization.
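
For readers unfamiliar with the container/list approach, a bounded LRU under a caller-holds-lock contract could be sketched as below. This is an illustration with hypothetical type and method names, not the contents of tenant_cache_lru.go, and it assumes a positive capacity. Note that get reorders the list, so callers need an exclusive lock even for reads.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry struct {
	key   string
	value any
}

// lruCache is a bounded LRU with no internal locking: the caller must hold
// a lock around every call, including get, because get reorders the list.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element holding *lruEntry
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *lruCache) get(key string) (any, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // promote on get
	return el.Value.(*lruEntry).value, true
}

func (c *lruCache) put(key string, value any) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value // update in place
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back() // least recently used goes first
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	c := newLRU(2)
	c.put("a", 1)
	c.put("b", 2)
	c.get("a")    // "a" becomes most recently used
	c.put("c", 3) // evicts "b"
	_, hasB := c.get("b")
	fmt.Println("b still cached:", hasB)
}
```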

13 unit tests pin AC2 behavior:
- RegisterTenantScoped pre-Start / post-Start / duplicate / nil-receiver
- WithLazyTenantLoad eager fallback on non-positive maxEntries
- LRU eviction semantics (oldest-out, promote-on-get, update-in-place)
- Eager cache basic ops + tenant isolation

Part of: docs/pre-dev/tenant-scoped-systemplane/

* feat(systemplane/T-003): Postgres tenant storage + DDL evolution + trigger DELETE

Replaces the Task 1 stubs in internal/postgres with real implementations
of the five tenant-scoped Store methods, evolves the schema additively,
and rewrites the NOTIFY trigger to fire on DELETE with tenant_id in the
payload.

Schema evolution (in ensureSchema, same transaction as existing CREATE TABLE):

1. ADD COLUMN IF NOT EXISTS tenant_id TEXT NOT NULL DEFAULT '_global'
   — additive column; existing rows land with the sentinel via the DEFAULT.
2. UPDATE ... SET tenant_id = '_global' WHERE tenant_id IS NULL OR empty
   — defensive backfill for rows that may have been created before the
   DEFAULT was in place; idempotent.
3. DROP CONSTRAINT IF EXISTS <table>_pkey
   — removes the legacy (namespace, key) PK; the DROP is safe under
   concurrent writes because each individual statement is transactional
   and the replacement CREATE UNIQUE INDEX enforces the new invariant
   before the transaction commits.
4. CREATE UNIQUE INDEX IF NOT EXISTS <table>_pkey_v2
       ON <table> (namespace, key, tenant_id)
   — enforces uniqueness under the widened tuple; becomes the ON CONFLICT
   target for Set and SetTenantValue.

The sentinel '_global' is safe because commons/tenant-manager/core/validation.go
enforces a regex ^[a-zA-Z0-9][a-zA-Z0-9_-]*$ on tenant IDs: a valid tenant ID
must start with an alphanumeric character, so the sentinel's leading
underscore guarantees zero collision.

Trigger rewrite:

- CREATE OR REPLACE FUNCTION systemplane_notify() branches on TG_OP; DELETE
  reads OLD.*, INSERT/UPDATE read NEW.*. pg_notify payload now carries
  tenant_id in every case.
- CREATE TRIGGER extended from AFTER INSERT OR UPDATE to AFTER INSERT OR
  UPDATE OR DELETE. DROP TRIGGER IF EXISTS + CREATE TRIGGER stays idempotent.

Five new Store methods mirror the existing span/error-wrap pattern:

- GetTenantValue  — filters namespace + key + tenant_id; returns (Entry{}, false, nil)
                     on sql.ErrNoRows
- SetTenantValue  — INSERT ... ON CONFLICT (namespace, key, tenant_id) DO UPDATE;
                     tenantID argument is authoritative, e.TenantID is ignored
- DeleteTenantValue — straight DELETE; actor is logged as span attribute
- ListTenantValues — returns every row including _global (caller filters)
- ListTenantsForKey — SELECT DISTINCT tenant_id ... WHERE tenant_id <> '_global'
                       ORDER BY tenant_id; empty slice (not nil) when none

Existing method backward-compat invariants:

- Set now writes tenant_id='_global' explicitly and uses the composite ON CONFLICT
- Get filters tenant_id='_global' — PRD AC1: Get must ignore tenant overrides
- List filters tenant_id='_global' — TRD §4.5: List hydrates globals at Start;
  tenant rows are delivered via ListTenantValues instead

The notifyPayload struct gains TenantID with json:"tenant_id" tag; the
parser is backward-compatible — a payload without tenant_id decodes with
TenantID=="" (no error) to survive the rollout window where the column
exists but the trigger may not yet emit the new shape.
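
The backward-compatible decode relies on encoding/json leaving absent fields at their zero value. A minimal sketch, in which only the tenant_id tag comes from the text above and the other field names and tags are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// notifyPayload is an illustrative shape. Only the tenant_id tag is taken
// from the description; namespace and key tags are assumed.
type notifyPayload struct {
	Namespace string `json:"namespace"`
	Key       string `json:"key"`
	TenantID  string `json:"tenant_id"`
}

// parseNotifyPayload decodes a NOTIFY payload. A legacy payload without
// tenant_id decodes without error and leaves TenantID empty.
func parseNotifyPayload(raw string) (notifyPayload, error) {
	var p notifyPayload
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return notifyPayload{}, fmt.Errorf("decode notify payload: %w", err)
	}
	return p, nil
}

func main() {
	legacy, err := parseNotifyPayload(`{"namespace":"app","key":"log.level"}`)
	fmt.Printf("%+v %v\n", legacy, err)

	current, err := parseNotifyPayload(`{"namespace":"app","key":"log.level","tenant_id":"acme"}`)
	fmt.Printf("%+v %v\n", current, err)
}
```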

Four unit tests pin the parser: with-tenant, _global sentinel pass-through,
legacy-no-tenant, and invalid-JSON error. Integration tests deferred to Task 8.

File-size layout (300-LOC guardrail):
- postgres.go           499 — lifecycle + globals + Subscribe + helpers
- postgres_schema.go    150 — DDL bootstrap (new)
- postgres_tenant.go    289 — five tenant methods (new)
- postgres_test.go       82 — parser unit tests (new)

Part of: docs/pre-dev/tenant-scoped-systemplane/

* feat(systemplane/T-004): MongoDB tenant storage + compound _id migration + delete-event support

Replaces the Task 1 stubs in internal/mongodb with real implementations
of the five tenant-scoped Store methods, migrates the collection to a
compound BSON document _id, and extends change-stream + polling paths
to surface tenant_id on every event including deletes.

Key design decision: compound BSON document _id, not string _id

The task prompt suggested a string-encoded _id ("namespace/key/tenant_id"
with / separator), relying on the tenant-manager regex forbidding / in
tenant IDs. But systemplane namespaces and keys are free-form: existing
usage already includes "log.level" and "tenant:acme"; nothing today
forbids a future key from containing /. A string _id with a separator
is a latent landmine.

Compound BSON document _id — {namespace, key, tenant_id} — solves this
cleanly: MongoDB natively handles compound _id, the change-stream
documentKey._id on deletes is self-describing, and decoding into a
typed compoundID struct is zero-parse-risk. No separator assumption is
made about any identifier.
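
The separator problem is easy to demonstrate in plain Go, without the MongoDB driver. In this sketch (type and function names are hypothetical, and BSON tags are left out), two different identities collapse to the same joined string, while a struct with one field per part keeps them distinct.

```go
package main

import "fmt"

// stringID is the rejected encoding: the parts are joined with "/".
func stringID(namespace, key, tenantID string) string {
	return namespace + "/" + key + "/" + tenantID
}

// compoundID keeps each part in its own field, so no separator is needed.
type compoundID struct {
	Namespace string
	Key       string
	TenantID  string
}

func main() {
	// Two different identities that collapse to the same string.
	a := stringID("billing/eu", "rate", "acme")
	b := stringID("billing", "eu/rate", "acme")
	fmt.Println(a == b) // true: the separator is ambiguous

	x := compoundID{Namespace: "billing/eu", Key: "rate", TenantID: "acme"}
	y := compoundID{Namespace: "billing", Key: "eu/rate", TenantID: "acme"}
	fmt.Println(x == y) // false: fields compare independently
}
```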

Migration (ensureSchema, four idempotent passes):

1. Backfill — set tenant_id="_global" where the field is absent
2. _id rewrite — find docs where _id is ObjectId, insert a new doc with
   compound _id and delete the ObjectId row. Idempotent on retry: a
   duplicate-key error on re-insert indicates a partial prior run and
   proceeds to delete the orphan ObjectId row.
3. Legacy index drop — DropOne("namespace_1_key_1"); IndexNotFound (code
   27) is swallowed, any other error propagates. isIndexNotFoundErr
   matches both CommandError code and a fallback string for driver-
   version robustness.
4. Compound unique index — (namespace, key, tenant_id), unique. Idempotent
   under same key signature. Redundant with _id uniqueness but serves as
   a natural covering index for ListTenantsForKey's Distinct call.

Deployment note (to be captured in Task 9's migration doc): on a
production cluster, step 2 is O(N) sequential without a transaction. A
crash mid-migration leaves mixed _id shapes; the next New() call heals
on retry. No data loss; possibly duplicate-key warnings on re-insert.

Change-stream delete handling:

- $match pipeline extended to include "delete" alongside insert/update/replace.
- changeEvent.DocumentKey.ID decodes the compound _id the server always
  sends on deletes.
- extractEvent branches on operationType: "delete" reads DocumentKey.ID;
  everything else reads FullDocument.
- Defensive fallback: non-delete events with empty fullDocument.tenant_id
  (pre-migration legacy rows) fall back to "_global" rather than drop.

Polling-mode parity:

- pollChanges surfaces doc.TenantID on emitted events; empty tenant_id
  falls back to "_global" for symmetry with the change-stream path.
- Delete detection: polling has no native delete signal. Between ticks,
  deletes are invisible. Documented explicitly in subscribePoll godoc —
  consumers that need delete visibility must use change-streams.

Existing method backward-compat invariants (per TRD §9):

- Set unconditionally writes tenant_id="_global" regardless of
  e.TenantID; upserts by compound _id. Deliberate API narrowing:
  tenant writes must go through SetTenantValue.
- Get unconditionally filters tenant_id="_global". Tenant rows invisible.
- List unconditionally filters tenant_id="_global". Tenant rows visible
  only via ListTenantValues.

Five new Store methods use mongo-driver/v2 patterns: FindOne / UpdateOne
with Upsert / DeleteOne / Find (sorted) / Distinct. OTEL spans carry
tenant.id attribute consistent with the Postgres implementation.

Tests (23 unit tests under //go:build unit, all < 100ms total):

- extractEvent delete/insert/update paths including legacy tenant_id=""
- BSON wire-shape round-trip for delete + insert events (end-to-end
  codec verification against the exact shape mongo-driver emits)
- compound _id round-trip + server-shape decode
- entryDoc → store.Entry conversion tenant_id propagation
- isIndexNotFoundErr covers CommandError code 27 + string-match fallback
- legacy namespace_1_key_1 index name regression pin

File-size layout (300-LOC guardrail):
- mongodb.go                385 — lifecycle + globals + helpers (down from 553)
- mongodb_tenant.go         284 — five tenant methods (new)
- mongodb_changestream.go   311 — Subscribe + polling + extractEvent (new)
- mongodb_migration.go      245 — backfill + _id rewrite + index swap (new)
- mongodb_test.go           380 — 23 unit tests (new)

Part of: docs/pre-dev/tenant-scoped-systemplane/

* feat(systemplane/T-005): tenant Client dataflow — read/write/delete/subscribe

Binds Task 2's Client scaffolding to Tasks 3/4's backend methods, making
the tenant-scoped feature work end-to-end. Adds SetForTenant,
GetForTenant (eager + lazy branches), DeleteForTenant, ListTenantsForKey,
OnTenantChange, five typed accessor mirrors, changefeed routing, and
eager-mode tenant hydration at Start.

Changefeed routing (the critical correctness path)

onEvent's composite debounce key widens from (namespace, key) to
(namespace, key, tenant_id) using U+001F as the separator. This is
mandatory from day one — tenant-A and tenant-B writes to the same key
in the same 100ms window cannot share a timer slot; the old key would
have silently coalesced them and dropped one tenant's write.
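
A sketch of the widened key, assuming simple string concatenation (the function name and the pending-timer map are hypothetical): with tenant_id included, two tenants writing the same namespace and key occupy separate timer slots.

```go
package main

import "fmt"

// unitSeparator is U+001F, the separator named above.
const unitSeparator = "\x1f"

// debounceKey builds the composite (namespace, key, tenant_id) key.
func debounceKey(namespace, key, tenantID string) string {
	return namespace + unitSeparator + key + unitSeparator + tenantID
}

func main() {
	// pending stands in for a map of in-flight debounce timers.
	pending := map[string]struct{}{}
	pending[debounceKey("app", "rate_limit", "tenant-a")] = struct{}{}
	pending[debounceKey("app", "rate_limit", "tenant-b")] = struct{}{}
	fmt.Println(len(pending)) // 2: each tenant keeps its own slot
}
```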

refreshFromStore splits into:
- refreshGlobalFromStore (tenant_id == "_global") — updates legacy cache,
  fires OnChange subscribers. Preserves PRD AC1 + AC8: Get(...) returns
  only globals; OnChange never fires on tenant writes.
- refreshTenantFromStore (tenant_id != "_global") — calls
  store.GetTenantValue, updates tenantCache, fires OnTenantChange. On
  store miss (the delete path), deletes the cache entry and fires
  OnTenantChange with the registered default (PRD AC9).

The routing decision is a pure string comparison against sentinelGlobal.
No heuristics, no tenant-id regex; the invariant is: every event carries
a tenant_id, and "_global" is the only value that means "fire OnChange."

Write path (SetForTenant)

Mirrors legacy Set's discipline exactly:
1. extract tenantID from ctx; fail-closed on missing/invalid (AC4)
2. registry check (unknown key → ErrUnknownKey; not tenant-scoped →
   ErrTenantScopeNotRegistered)
3. apply validator (same chain as Set; AC6)
4. json.Marshal the value
5. store.SetTenantValue
6. write-through cache update with JSON round-trip canonicalization
7. return nil — subscribers fire ONLY from changefeed echo, never
   synchronously (set.go:18-21 invariant preserved)

Read path (GetForTenant)

Eager mode: cacheMu.RLock → tenantCache.get → fall through to legacy
cache → fall through to default. Never errors on "no override" — always
returns some value.

Lazy mode: same except on tenantCache miss, single-flight
store.GetTenantValue with 5s timeout, populate LRU under cacheMu.Lock.
Concurrency subtlety: tenantCacheLRU.get mutates (MoveToFront for MRU
promotion), so lazy mode takes a write lock even on reads. Eager mode
can use RLock because the map impl is truly read-only on get.

Delete path (DeleteForTenant)

Removes the row via store.DeleteTenantValue, updates tenantCache locally
for same-process read consistency, then relies on the changefeed echo
to fire OnTenantChange with the registered default. No synchronous
subscriber fire.

Eager-mode Start hydration (TRD §4.5)

After existing global hydration, calls store.ListTenantValues, filters
out "_global" rows (already loaded), and populates tenantCache for
registered tenant-scoped keys. Rows for unregistered or non-tenant-
scoped keys are skipped with a warn log (data-integrity signal but
non-fatal, matching the existing loop in client.go:201-208).

Lazy mode skips this pass; LRU stays cold until first GetForTenant miss.

OnTenantChange + fireTenantSubscribers

Mirrors the OnChange / fireSubscribers pattern in onchange.go:
- tenantSubsMu.Lock() for register/unregister
- RLock + slice-copy + RUnlock + invoke under runtime.RecoverAndLog for
  dispatch, so subscriber callbacks can safely unsubscribe themselves
  without deadlock
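
The snapshot-then-invoke dispatch can be sketched as follows. The types are hypothetical, and a plain recover() stands in for the runtime.RecoverAndLog helper mentioned above. Because callbacks run after the read lock is released, a subscriber can unsubscribe itself from inside its own callback.

```go
package main

import (
	"fmt"
	"sync"
)

type subscribers struct {
	mu     sync.RWMutex
	nextID int
	subs   map[int]func(value string)
}

func newSubscribers() *subscribers {
	return &subscribers{subs: make(map[int]func(value string))}
}

// subscribe registers fn under the write lock and returns its unsubscribe.
func (s *subscribers) subscribe(fn func(value string)) (unsubscribe func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	id := s.nextID
	s.nextID++
	s.subs[id] = fn
	return func() {
		s.mu.Lock()
		defer s.mu.Unlock()
		delete(s.subs, id)
	}
}

// fire copies the callbacks under the read lock, releases it, then invokes
// each callback with panic recovery so one bad subscriber cannot stop others.
func (s *subscribers) fire(value string) {
	s.mu.RLock()
	snapshot := make([]func(string), 0, len(s.subs))
	for _, fn := range s.subs {
		snapshot = append(snapshot, fn)
	}
	s.mu.RUnlock()

	for _, fn := range snapshot {
		func() {
			defer func() { _ = recover() }() // stand-in for recover-and-log
			fn(value)
		}()
	}
}

func main() {
	s := newSubscribers()
	calls := 0

	var unsubscribe func()
	unsubscribe = s.subscribe(func(string) {
		calls++
		unsubscribe() // safe: fire no longer holds the lock
	})
	s.subscribe(func(string) { panic("boom") })
	s.subscribe(func(string) { calls++ })

	s.fire("first")
	s.fire("second")
	fmt.Println(calls) // 3
}
```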

Typed accessors fail-closed (D8)

GetStringForTenant, GetIntForTenant, GetBoolForTenant, GetFloat64ForTenant,
GetDurationForTenant all return (zero, err) on missing ctx / invalid
tenant / unregistered key. They cannot silently collapse to a shared
global the way legacy typed accessors do — "missing tenant" is not a
valid read state for a tenant-scoped key.

Task 2 //nolint:unused comments removed

All four scaffolding fields (tenantSubsMu, nextTenantSubID,
tenantSubscription.id, tenantSubscription.fn) are now consumed by
OnTenantChange + fireTenantSubscribers. The lint suppressions Task 2
had to carry are gone.

Collateral: legacy fakeStore in client_test.go

Pre-Task-1 fakeStore synthesized events with empty Event.TenantID.
After Task 5's routing widened, those events went to refreshTenantFromStore
(which rejects tenant_id==""). Changed fakeStore.Set and
simulateExternalChange to emit TenantID: "_global" — matching what real
Postgres/MongoDB backends emit post-Task-3/4. No behavioral drift; the
fake now has parity with production backends.

File-size layout (300-LOC guardrail)

- client.go                  446 (was 470; moved refresh bodies out)
- refresh.go                 265 (new — refreshFromStoreRouted split)
- tenant_scoped.go           480 (grew from 105; split candidate flagged for future)
- tenant_storage.go          158 (new — span + error-wrap dispatch layer)
- tenant_onchange.go         133 (new — OnTenantChange + fireTenantSubscribers)
- tenant_scoped_accessors.go 118 (new — five typed accessor mirrors)
- tenant_scoped_smoke_test.go 396 (new — 6 smoke tests)

Smoke tests (Task 7 owns the comprehensive suite)

Six smoke tests pin the happy paths and the critical AC8 regression:
- TestSetForTenant_HappyPath
- TestGetForTenant_FallsThroughToGlobal
- TestDeleteForTenant_RevertsToDefault
- TestListTenantsForKey_SortsAndDedupes
- TestOnTenantChange_FiresOnTenantWrite_OnChangeStaysSilent  (AC8 pin)
- TestOnTenantChange_FiresOnDelete                            (AC9 pin)

All pass under -race. 143 tests across the systemplane package, 4911
across the whole module, zero lint issues, zero vet findings.

Part of: docs/pre-dev/tenant-scoped-systemplane/

* style(ssrf): move nolint:gochecknoglobals directive below the doc comment

Linter-preferred placement: directives attach to the var declaration,
doc comment stays a single contiguous block. Pure cosmetic rearrangement,
no behavior change.

* feat(systemplane/T-006): admin HTTP tenant routes + WithTenantAuthorizer

Adds three additive admin routes for tenant-scoped overrides under the
existing prefix (default /system):

  GET    :prefix/:namespace/:key/tenants               — list tenants with overrides
  PUT    :prefix/:namespace/:key/tenants/:tenantID     — upsert a tenant override
  DELETE :prefix/:namespace/:key/tenants/:tenantID     — remove a tenant override

Plus WithTenantAuthorizer(fn) MountOption with default-deny escalation:

If WithTenantAuthorizer is NOT set, tenant routes default to 403 for all
requests — they do NOT fall back to the legacy WithAuthorizer hook. This
is the explicit Option A recommendation from TRD §7.1: the legacy
authorizer does not know about tenants, so silently routing tenant-ID-
carrying requests through it would let a service with only WithAuthorizer
configured accept tenant writes it was never authorized to handle.

Consumers that currently use WithAuthorizer should upgrade to
WithTenantAuthorizer for tenant routes; the legacy global routes
(GET/GET/PUT on :namespace/:key) continue to use WithAuthorizer unchanged.

Handler order of operations (security-relevant)

Tenant-ID validation runs BEFORE authorization in every handler:

  1. validateTenantIDParam(:tenantID) — rejects empty, "_global" sentinel,
     and anything that fails core.IsValidTenantID. Returns 400.
  2. cfg.tenantAuthorizer(c, action, tenantID) — enforces policy. Returns 403.

Validating first means 400 for malformed IDs regardless of auth state.
If we authorized first, an unauthorized caller could probe the tenant-ID
regex via 400-vs-403 response-code differences — a low-bandwidth but real
side-channel leak.
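
The ordering described above can be sketched as a pure function that returns only the status code. All names here are hypothetical, validTenantID is a simplified stand-in for core.IsValidTenantID, and the real handlers go on to do the actual work after these two checks.

```go
package main

import (
	"fmt"
	"net/http"
)

// validTenantID is a simplified stand-in: first byte alphanumeric, remaining
// bytes alphanumeric, '_' or '-'.
func validTenantID(id string) bool {
	for i := 0; i < len(id); i++ {
		c := id[i]
		alnum := (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
		if !alnum && (i == 0 || (c != '_' && c != '-')) {
			return false
		}
	}
	return id != ""
}

// decide applies validation first, then authorization: 400 for a malformed
// ID, 403 when policy denies or no authorizer is configured, 200 otherwise.
func decide(tenantID string, authorize func(tenantID string) bool) int {
	if tenantID == "" || tenantID == "_global" || !validTenantID(tenantID) {
		return http.StatusBadRequest
	}
	if authorize == nil || !authorize(tenantID) {
		return http.StatusForbidden // default-deny
	}
	return http.StatusOK
}

func main() {
	allowAll := func(string) bool { return true }
	fmt.Println(decide("_global", allowAll)) // 400
	fmt.Println(decide("acme", nil))         // 403
	fmt.Println(decide("acme", allowAll))    // 200
}
```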

Sentinel → HTTP mapping (extracted into shared mapSentinelErr helper)

  ErrUnknownKey               → 400 unknown_key              (preserves legacy behavior)
  ErrValidation               → 400 validation_error
  ErrMissingTenantContext     → 400 missing_tenant_context   (defensive — never reached in practice)
  ErrInvalidTenantID          → 400 invalid_tenant_id
  ErrTenantScopeNotRegistered → 400 tenant_scope_not_registered
  ErrNotStarted / ErrClosed   → 503 service_unavailable
  (anything else)             → 500 internal_error           (no detail leakage)

errors.Is-based dispatch, not string matching.
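
A sketch of what errors.Is-based dispatch buys over string matching: wrapped errors still map to the right status. The sentinels below are local stand-ins declared for the example, not imports from the library, and the function body is an assumption that simply follows the table above.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local stand-ins for the sentinels named in the table above.
var (
	ErrUnknownKey               = errors.New("unknown key")
	ErrValidation               = errors.New("validation failed")
	ErrMissingTenantContext     = errors.New("missing tenant context")
	ErrInvalidTenantID          = errors.New("invalid tenant id")
	ErrTenantScopeNotRegistered = errors.New("tenant scope not registered")
	ErrNotStarted               = errors.New("not started")
	ErrClosed                   = errors.New("closed")
)

// mapSentinelErr maps an error to an HTTP status and a stable error code.
func mapSentinelErr(err error) (status int, code string) {
	switch {
	case errors.Is(err, ErrUnknownKey):
		return http.StatusBadRequest, "unknown_key"
	case errors.Is(err, ErrValidation):
		return http.StatusBadRequest, "validation_error"
	case errors.Is(err, ErrMissingTenantContext):
		return http.StatusBadRequest, "missing_tenant_context"
	case errors.Is(err, ErrInvalidTenantID):
		return http.StatusBadRequest, "invalid_tenant_id"
	case errors.Is(err, ErrTenantScopeNotRegistered):
		return http.StatusBadRequest, "tenant_scope_not_registered"
	case errors.Is(err, ErrNotStarted), errors.Is(err, ErrClosed):
		return http.StatusServiceUnavailable, "service_unavailable"
	default:
		return http.StatusInternalServerError, "internal_error" // no detail leakage
	}
}

// wrap adds context the way a handler or store layer typically would.
func wrap(err error) error {
	return fmt.Errorf("set tenant value: %w", err)
}

func main() {
	fmt.Println(mapSentinelErr(wrap(ErrInvalidTenantID)))
	fmt.Println(mapSentinelErr(wrap(ErrClosed)))
	fmt.Println(mapSentinelErr(errors.New("disk on fire")))
}
```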

PUT response body includes the post-write redaction-applied value

handlePutTenant reads back via GetForTenant after the Set succeeds and
applies any registered KeyRedaction policy. This guarantees the response
shape matches what a subsequent GetForTenant would return, including
JSON round-trip canonicalization that the Client's write-through cache
performs. Caller submits {"value": 0.05}; response echoes 0.05 after the
same canonicalization path the cache took.

fakeStore upgrade

admin_test.go's fakeStore (Task 1 stubs) is upgraded to a real tenant-
aware store: tenantRows map keyed by (tenantID, ns, key), fire() helper
snapshotting subscribers under lock before dispatching, subscribedCh
closed on first Subscribe so tenant tests don't race the Client's Start
goroutine. Additive change — all 26 pre-existing tests continue to pass.

14 new tests land (all pass under -race)

Happy paths + default-deny + invalid-ID + unknown-key + non-tenant-scoped
+ sort/dedup + empty-list JSON wire format + authorizer-receives-tenantID:
- TestPutTenant_HappyPath
- TestPutTenant_MissingAuthorizer_Returns403
- TestListTenants_MissingAuthorizer_Returns403      (default-deny on GET)
- TestPutTenant_InvalidTenantID_Returns400          (_global sentinel)
- TestPutTenant_InvalidTenantID_SpecialChars        (leading hyphen)
- TestPutTenant_UnknownKey_Returns400
- TestPutTenant_NonTenantScopedKey                  (ErrTenantScopeNotRegistered)
- TestDeleteTenant_HappyPath                        (PUT→DELETE→fall-through)
- TestDeleteTenant_Idempotent
- TestListTenants_ReturnsSortedList
- TestListTenants_EmptyList                         (wire format [] not null)
- TestPutTenant_TenantAuthorizerReceivesTenantID
- TestListTenants_AuthorizerReceivesEmptyTenantID
- TestPutTenant_AppliesRedaction

File-size layout (300-LOC guardrail)

- admin.go         405 — Mount + mountConfig + legacy 3 routes + shared helpers
- admin_tenant.go  241 — 3 tenant handlers + validateTenantIDParam
- admin_test.go   1686 — test fixtures + 40 tests

admin.go is slightly over the 300 soft cap; further splitting would
fragment the tight handler↔error-mapping relationship. Judged the right
trade-off here.

40 admin tests pass under -race; 157 tests across the systemplane subtree;
0 lint issues.

Part of: docs/pre-dev/tenant-scoped-systemplane/

* test(systemplane/T-007): api_compat pin + comprehensive Client-layer unit suite

Adds the compile-time v5 API signature pin and the comprehensive unit
test suite covering every PRD acceptance criterion at the Client layer
via NewForTesting + in-memory TestStore. Integration-level verification
lives in Task 8; this commit is the negative-path and invariant-pin
backbone that runs in <3s under -race.

api_compat_test.go (compile-only pin)

Package systemplane_test. Every v5 public symbol is pinned via
var _ = ... and var _ func(...) = (*Client).Method assertions:

- Client: Register, Get, Set, OnChange, List, KeyRedaction, Start, Close,
  typed accessors (GetString / GetInt / GetBool / GetFloat64 / GetDuration)
- Client tenant surface: RegisterTenantScoped, GetForTenant, SetForTenant,
  DeleteForTenant, ListTenantsForKey, OnTenantChange, typed tenant
  accessors (GetStringForTenant / ... / GetDurationForTenant)
- Constructors: NewPostgres, NewMongoDB, NewForTesting
- Options: WithLogger, WithTelemetry, WithListenChannel, WithPollInterval,
  WithDebounce, WithCollection, WithTable, WithLazyTenantLoad
- KeyOptions: WithDescription, WithValidator, WithRedaction
- Sentinels (existing + new): ErrClosed, ErrNotStarted, ErrRegisterAfterStart,
  ErrUnknownKey, ErrValidation, ErrDuplicateKey, ErrMissingTenantContext,
  ErrInvalidTenantID, ErrTenantScopeNotRegistered
- Admin: Mount, WithPathPrefix, WithAuthorizer, WithTenantAuthorizer,
  WithActorExtractor

Any rename, signature change, or accidental unexport breaks the compile.
No test runtime overhead — pure type assertions.
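
The pinning technique can be sketched with a toy type. The Client below is a hypothetical stand-in with invented method signatures; only the var _ func(...) = (*Client).Method pattern is taken from the description above.

```go
package main

import "fmt"

// Client is a toy stand-in, not the systemplane Client.
type Client struct{ values map[string]any }

func (c *Client) Get(ns, key string) (any, bool) {
	v, ok := c.values[ns+"/"+key]
	return v, ok
}

func (c *Client) Close() error { return nil }

// Compile-time pins: a method expression has the receiver as its first
// parameter, so any rename or signature change stops the build here.
var (
	_ func(*Client, string, string) (any, bool) = (*Client).Get
	_ func(*Client) error                       = (*Client).Close
)

func main() {
	c := &Client{values: map[string]any{"fees/rate": 0.05}}
	v, ok := c.Get("fees", "rate")
	fmt.Println(v, ok) // 0.05 true
}
```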

tenant_scoped_test.go (~930 LOC, 29 tests)

Comprehensive negative-path and typed-accessor coverage:
- SetForTenant: missing-ctx / invalid-ID / unknown-key / non-tenant-scoped /
  validator-reject / nil-receiver / before-Start
- GetForTenant: same negative paths + fall-through-to-global-Set / no-
  override-returns-default / cross-tenant-isolation
- DeleteForTenant: same negative paths + idempotent-no-op on missing override
- ListTenantsForKey: unregistered-key-returns-empty / dedup-and-sort
- Typed accessors: wrong-type-returns-err / missing-tenant-returns-err /
  happy-path per type (fail-closed per decision D8)
- Lazy mode: miss-populates-LRU / store-failure-falls-through
- Critical regression pin: TestGet_IgnoresTenantOverrides (AC1)

tenant_onchange_test.go (~640 LOC, 14 tests)

OnTenantChange subscriber semantics + debouncer coalescing:
- Fire-on-SetForTenant / fire-on-DeleteForTenant-with-default (AC9)
- Multiple-subscribers-all-fire
- Unsubscribe-removes-from-list + self-unsubscribe-from-callback-safe
  (slice-copy invariant pin)
- Panicking-callback-does-not-affect-others (runtime.RecoverAndLog pin)
- TestOnChange_DoesNotFireOnTenantWrites (AC8 — dedicated literal name
  for bisect grep-ability, per TRD §9 and tasks.md §Task 7 line 159)
- TestOnTenantChange_DoesNotFireOnGlobalWrites (symmetric AC8 direction)
- Debouncer (AC12 — TRD §5.3):
  - CoalescesRapidSetsForSameTenant (within-tenant coalescing works)
  - DoesNotCollapseAcrossTenants (widened key prevents cross-tenant collapse)
  - GlobalAndTenantDoNotCollide (sentinel routing + widened key composite)
- Nil-receiver-returns-no-op-unsubscribe
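
A minimal sketch of the fan-out discipline these tests pin: snapshot the subscribers under the lock, release it, then invoke each callback with panic recovery. The registry type is hypothetical, and a bare recover() stands in for runtime.RecoverAndLog.

```go
package main

import (
	"fmt"
	"sync"
)

type registry struct {
	mu     sync.Mutex
	nextID int
	subs   map[int]func(string)
}

// subscribe registers fn and returns a function that removes it again.
func (r *registry) subscribe(fn func(string)) (unsubscribe func()) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.subs == nil {
		r.subs = make(map[int]func(string))
	}
	id := r.nextID
	r.nextID++
	r.subs[id] = fn
	return func() {
		r.mu.Lock()
		delete(r.subs, id)
		r.mu.Unlock()
	}
}

// fire copies the callbacks while holding the lock, then calls them with
// the lock released so a callback may unsubscribe without deadlocking.
func (r *registry) fire(v string) {
	r.mu.Lock()
	snapshot := make([]func(string), 0, len(r.subs))
	for _, fn := range r.subs {
		snapshot = append(snapshot, fn)
	}
	r.mu.Unlock()
	for _, fn := range snapshot {
		func() {
			defer func() { _ = recover() }() // one bad callback must not stop the rest
			fn(v)
		}()
	}
}

func main() {
	r := &registry{}
	calls := 0
	r.subscribe(func(string) { panic("bad subscriber") })
	var unsubscribe func()
	unsubscribe = r.subscribe(func(string) {
		calls++
		unsubscribe() // self-unsubscribe from inside the callback
	})
	r.fire("first")
	r.fire("second")
	fmt.Println(calls) // 1
}
```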

admin/admin_tenant_test.go (~470 LOC, 10 tests)

Handler matrix extending Task 6's happy-path tests:
- Validator-reject / invalid-JSON-body / missing-value-field
- Delete-unknown-key / list-path-URL-encoding (special chars in ns/key)
- Authorizer receives correct action ("read" for GET, "write" for PUT/DELETE)
- TenantAuthorizer error propagates as 403 with message

Why these tests aren't in the existing admin_test.go

Task 6 landed 14 tests directly in admin_test.go (1686 LOC). Putting
Task 7's additional matrix in a separate file keeps admin_test.go from
growing past ~2000 LOC and makes the "comprehensive negative path +
authorizer contract" scope legible as its own unit.

Critical regression pin placement

TestGet_IgnoresTenantOverrides and TestOnChange_DoesNotFireOnTenantWrites
are placed under their literal names (not buried in composite tests) so
a maintainer bisecting a future AC1 or AC8 regression can find them via
git log -S or grep. This is the grep-ability discipline the TRD §9
Backward Compatibility Matrix implicitly assumes when it names tests by
their exact identifier.

All 219 tests across the systemplane subtree pass under -race. 0 lint
issues. 0 vet findings.

Part of: docs/pre-dev/tenant-scoped-systemplane/

* test(systemplane/T-008): integration + race + benchmarks + MongoDB migration fix

Extends systemplanetest.Run contract suite with 9 tenant-aware subtests
and wires both backends into real testcontainer-based integration tests,
a 100-goroutine race stress test, and a benchmark suite that validates
PRD AC15 performance targets. Also fixes a Task 4 bug in the MongoDB
migration path that integration testing surfaced immediately.

systemplanetest/contract.go — extended contract suite

Added 9 subtests run against both Postgres and MongoDB (same test body,
identical assertions — the AC13 parity proof):

1. TenantListOnEmpty                    — fresh client, no writes → empty slice
2. SetTenantThenGetRoundtrip            — write + read + cross-tenant fall-through
3. SetTenantTwiceLastWriteWins          — idempotent upsert
4. DeleteTenantValueReturnsMissing      — delete then read → default
5. DeleteTenantValueIsIdempotent        — double-delete no-op
6. ListTenantsForKeySorted              — dedup + sort + _global excluded
7. GlobalAndTenantRowsCoexist           — Set(global) + SetForTenant(tenant-A)
                                          visible via correct paths
8. TenantSubscribeReceivesSetEvent      — changefeed echo → OnTenantChange
9. TenantSubscribeReceivesDeleteEvent   — delete event → OnTenantChange fires
                                          with registered default (AC9 at
                                          integration level)

New RunOption + SkipSubtest mechanism

SkipSubtest(name) lets the polling-mode MongoDB factory opt out of
TenantSubscribeReceivesDeleteEvent — polling cannot observe inter-tick
deletes. The alternative (branching inside the subtest body on "am I
polling?") would hide the capability limitation from the factory. Explicit
opt-out keeps the contract suite honest about backend capabilities.
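
The opt-out mechanism can be sketched as a functional option. RunOption and SkipSubtest are named in the text above; runConfig and plan are hypothetical helpers introduced only to make the example runnable.

```go
package main

import "fmt"

type runConfig struct{ skip map[string]bool }

// RunOption mutates the suite configuration before subtests are selected.
type RunOption func(*runConfig)

// SkipSubtest lets a backend factory declare a subtest it cannot support.
func SkipSubtest(name string) RunOption {
	return func(c *runConfig) { c.skip[name] = true }
}

// plan returns the subtests that would run after applying the options.
func plan(all []string, opts ...RunOption) []string {
	cfg := &runConfig{skip: map[string]bool{}}
	for _, opt := range opts {
		opt(cfg)
	}
	var out []string
	for _, name := range all {
		if !cfg.skip[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	all := []string{"TenantSubscribeReceivesSetEvent", "TenantSubscribeReceivesDeleteEvent"}
	fmt.Println(plan(all, SkipSubtest("TenantSubscribeReceivesDeleteEvent")))
	// [TenantSubscribeReceivesSetEvent]
}
```

The skip is visible at the factory call site, so the limitation is declared by the backend rather than hidden in the subtest body.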

postgres_tenant_integration_test.go — Postgres integration

TestIntegration_PostgresTenantContracts spins up a Postgres 17 container,
builds the Client via NewPostgres, runs the full 19-subtest suite (10
existing globals + 9 new tenant contracts). All pass in ~3s.

mongodb_tenant_integration_test.go — MongoDB integration (both modes)

TestIntegration_MongoDBTenantContracts_ChangeStream runs the 19-subtest
suite against a MongoDB 7.0 replica-set container. Full pass.

TestIntegration_MongoDBTenantContracts_Polling runs the same suite
against a standalone MongoDB 7.0 container with WithPollInterval(100ms),
skipping TenantSubscribeReceivesDeleteEvent via SkipSubtest. 18 pass, 1
skip.

mongodb_migration.go — Task 4 bug fix exposed by integration

Task 4's ensureSchema called dropLegacyIndex unconditionally on fresh
collections. dropLegacyIndex handled IndexNotFound (code 27) but not
NamespaceNotFound (code 26). On first boot against an empty database,
the collection itself doesn't exist, so the driver returns
NamespaceNotFound and the error propagated up as a schema-init failure.

Unit tests couldn't catch this because they mocked the collection
handle. Integration tests against a real testcontainer exposed it on
the first run. Fix is +42 lines:

- isNamespaceNotFoundErr helper (mirrors isIndexNotFoundErr)
- dropLegacyIndex swallows both IndexNotFound AND NamespaceNotFound
- containsIndexNotFoundMessage renamed to containsSubstring (shared)
- Unit tests extended to pin the new branch

This is the strongest argument for cycle-end deferred integration
testing: the interaction the bug depended on (fresh collection + ensure
schema + DropOne) was invisible to unit tests but immediate at real-
backend integration.

bench_tenant_test.go — benchmarks vs PRD AC15

Apple M5 Max numbers (informational only for Linux CI targets):

  BenchmarkGetForTenant_Hit               133.0  ns/op    16 B/op   1 allocs/op
  BenchmarkGetForTenant_Miss_Eager        130.1  ns/op    16 B/op   1 allocs/op
  BenchmarkGetForTenant_Miss_Lazy         382.1  ns/op   480 B/op   6 allocs/op
  BenchmarkSetForTenant                   408.3  ns/op   400 B/op   8 allocs/op
  BenchmarkOnTenantChange_FireFanout_10    98.96 ns/op   160 B/op   1 allocs/op
  BenchmarkListTenantsForKey             4024    ns/op  4504 B/op  13 allocs/op

PRD AC15 targets: Hit < 1µs, Miss < 2µs. Actual results clear both with
5-15× headroom (7.5× on the hit path), fast enough that downstream
consumers can freely call GetForTenant inside per-request hot paths
without caching the result.

A concurrency-safe benchTenantStore is included in the file so the
benchmark isolates Client cost from storage cost (storage is an O(1)
RWMutex-protected map).

tenant_race_test.go — 100-goroutine stress (PRD AC11)

TestRace_ConcurrentSetForTenant_DistinctTenants: 100 goroutines, each
with a distinct tenantID (tenant-0..tenant-99), each issuing 10
SetForTenant + 10 GetForTenant + 1 DeleteForTenant on the same (ns, key).
Assertions:

- No -race findings
- OnChange counter stays at 0 (AC8 holds under race)
- OnTenantChange received >=1 fire per tenant (debouncer may coalesce
  within-tenant bursts, never across tenants)

TestRace_ConcurrentSubscribeUnsubscribe: 50 goroutines registering/
unregistering OnTenantChange while 50 issue SetForTenant. No -race
findings, no deadlock.

raceStore uses RWMutex + fire fanout with subscriber-snapshot-under-lock
discipline matching the production Client's fireTenantSubscribers.

File-size budget deviations (informational, per TRD §10)

- bench_tenant_test.go    361 (budget 250) — +111 is the concurrency-
                                              safe benchTenantStore needed
                                              to isolate benchmark targets
- tenant_race_test.go     455 (budget 200) — +255 is the raceStore with
                                              fire fanout for 100-goroutine
                                              stress
- contract.go             +597 in a single file (existing file was
                           smaller; splitting would force every factory
                           to reference two contract files)

None were split, despite the "split if any exceeds 500" guidance, because
splitting would fragment cohesion. Judged the right trade-off and flagged
here for review visibility.

Acceptance criteria coverage

  AC11  100-goroutine race safety on distinct tenants        PASS (-race clean)
  AC12  Debouncer coalesce correctness                       PASS (unit + integration)
  AC13  Backend parity (Postgres + MongoDB)                  PASS (19-subtest suite green on both)
  AC15  GetForTenant_Hit < 1µs, _Miss < 2µs                  PASS (7.5× / 5-15× headroom)

All 219 unit tests pass under -race. All 19 integration subtests pass
on Postgres. All 19+18 subtests pass on MongoDB change-stream + polling.
0 lint issues. 0 vet findings.

Part of: docs/pre-dev/tenant-scoped-systemplane/

* docs(systemplane/T-009): migration guide + AGENTS.md + README changelog

Closes the tenant-scoped systemplane feature cycle with consumer-facing
documentation: a full adoption guide, updated repo-level agent guidance,
and a README changelog entry. No code changes — this commit is the
documentation surface of the feature.

commons/systemplane/MIGRATION_TENANT_SCOPED.md (NEW, ~535 LOC)

Primary adoption guide for downstream consumers, targeting
plugin-br-bank-transfer Phase 3 as the first-order audience. Sections:

1. Opt-in via RegisterTenantScoped — before/after code example showing
   a globals-only key becoming eligible for per-tenant overrides with
   identical WithDescription / WithValidator / WithRedaction options.
2. Context setup — core.ContextWithTenantID injection, validation via
   core.IsValidTenantID, the "_global" sentinel as reserved.
3. Admin authorizer migration — legacy WithAuthorizer stays unchanged
   on legacy routes; WithTenantAuthorizer is required for tenant routes
   with default-deny escalation when absent.
4. Rollback stance — feature is additive so opting back out is safe at
   the registration layer, but DDL rollback (dropping tenant_id column
   / compound _id index) is explicitly unsupported because it would
   silently drop every tenant override row. Export-before-downgrade
   recommended with a simple ListTenantValues dump script.
5. MongoDB _id rewrite migration — first-boot idempotent backfill
   against pre-existing collections; crashes are recoverable on restart.
6. Performance notes — GetForTenant hit path is sub-200ns (133ns on
   Apple M5 Max per the bench suite), fast enough to call directly at
   the natural logical point without caching the result.
7. New sentinel errors — ErrMissingTenantContext, ErrInvalidTenantID,
   ErrTenantScopeNotRegistered, all wrap-compatible with errors.Is /
   errors.As.
8. OnTenantChange semantics — changefeed-echo fire discipline, AC9
   delete-reverts-to-default, AC8 OnChange-stays-silent-on-tenant-writes.
9. Full lifecycle example — register → start → tenant override → read
   in handler → subscribe → unsubscribe.

AGENTS.md — systemplane section update (+7 / -4)

Expanded the existing Runtime-configuration section to cover:
- Client lifecycle now mentions RegisterTenantScoped alongside Register
- WithLazyTenantLoad(maxEntries) as the new Client option
- Full tenant API method list including typed accessor mirrors
- Ctx-carries-tenant-ID fail-closed discipline explicitly called out
- All six admin HTTP routes (legacy 3 + tenant 3) documented
- WithTenantAuthorizer default-deny escalation rationale
- Storage evolution (Postgres tenant_id column / MongoDB compound _id)
- Three new sentinel errors
- Link to the migration doc for consumer adoption

Tone matches existing AGENTS.md — factual, code-backed, no speculation.

README.md — changelog entry (+1 / -1)

Extended the commons/systemplane one-line description in the Data and
messaging section to mention the additive tenant-scoped surface and
point to MIGRATION_TENANT_SCOPED.md.

.env.reference — no changes

This feature introduces zero new environment variables. Verified.

make ci verification

Final `make ci` run (lint-fix → format → tidy → check-tests → sec →
vet → test-unit → test-integration) exits 0. 219 unit tests pass under
-race. 19 Postgres integration subtests pass. 19 + 18 MongoDB
integration subtests pass (change-stream + polling). 0 lint issues.
0 gosec findings. 0 vet issues.

Part of: docs/pre-dev/tenant-scoped-systemplane/
Closes cycle: T-001 through T-009

* feat(systemplane): two-phase schema migration for rolling-deploy safety

Closes C1 (build-tag client_testing.go), C2+C3 (rolling-deploy PK conflict),
H6 (MongoDB mixed-version backfill collision), H8 (tenant_id='' ambiguous
pre-migration state), M1 (Postgres _global sentinel rejection).

Why: pre-tenant lib-commons binaries (v5.0.x) still deployed in
plugin-br-bank-transfer and matcher use ON CONFLICT (namespace, key) for
upserts. The prior unconditional PK drop broke legacy binaries mid-rolling-
deploy and let legacy Get() return arbitrary tenant rows. Fred chose a
two-phase migration: phase 1 keeps the legacy constraint alive, phase 2
opts in via explicit config.

Schema phases:
- Phase 1 (default): legacy (namespace, key) PK intact, tenant_id column
  present with '_global' default + backfill, NOTIFY trigger + function
  installed. Tenant writes return store.ErrTenantSchemaNotEnabled at the
  backend boundary. Pre-tenant binaries continue to upsert safely.
- Phase 2 (opt-in via WithTenantSchemaEnabled): legacy PK dropped,
  composite unique on (namespace, key, tenant_id) created, MongoDB
  compound _id migration runs. Pre-flight duplicate-detection (H6/H8)
  refuses to proceed if ambiguous pre-migration rows exist.

client_testing.go: added //go:build unit || integration so TestStore /
TestEntry / TestEvent / NewForTesting do not ship in production binaries.

Integration test factories opt in to phase 2 so every tenant contract test
runs against the full schema. Phase 1 is covered by new unit tests that
assert the guard at postgres and mongodb Store boundaries.

See commons/systemplane/MIGRATION_TENANT_SCOPED.md §4 for the operator
runbook.

* refactor(systemplane): perf critical + code quality round 1

Round 1 of the post-review audit — three parallel workstreams landed clean
(C4/C5/C6/H3 perf, H1/H2/M4/L13 code quality, H4/H5/M10/M11/M13-M15/L7/L16-L19/L23/L1/L14 test+docs).

Performance criticals (3):
- C4: replaced hand-rolled LRU (container/list + MoveToFront under write
  lock) with hashicorp/golang-lru/v2 in tenant_cache_lru.go. Lazy-mode
  GetForTenant hit path now runs under cacheMu.RLock instead of Lock —
  the LRU library handles its own MRU promotion atomically. Removes the
  50-100× p99 latency penalty under concurrent load.
- C5: wrapped lazy-miss fetch in golang.org/x/sync/singleflight.Group
  (c.sfg). 20 concurrent misses on same (tenantID, ns, key) now produce
  exactly 1 backend GetTenantValue call. Pinned by
  TestGetForTenant_LazyModeSingleFlightCoalescesMisses under -race.
- C6: made internal/debounce.Debouncer generic on K comparable; onEvent
  now builds an evtKey struct instead of concatenating strings per
  changefeed event. Zero allocation per NOTIFY in the hot path.
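
To illustrate the struct-key idea from C6, here is a simplified coalescer that is generic on a comparable key. It is not the internal/debounce.Debouncer: it has no timers and simply keeps the latest value per key until flushed. The field names of evtKey are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// evtKey is a comparable struct, so it can be a map key without building
// a concatenated string for every event.
type evtKey struct{ tenantID, namespace, key string }

type coalescer[K comparable, V any] struct {
	mu      sync.Mutex
	pending map[K]V
}

// add records v for k, overwriting any value still pending for the same key.
func (c *coalescer[K, V]) add(k K, v V) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pending == nil {
		c.pending = make(map[K]V)
	}
	c.pending[k] = v
}

// flush hands back the pending batch and starts a new one.
func (c *coalescer[K, V]) flush() map[K]V {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.pending
	c.pending = nil
	return out
}

func main() {
	var c coalescer[evtKey, float64]
	c.add(evtKey{"tenant-a", "fees", "rate"}, 0.05)
	c.add(evtKey{"tenant-a", "fees", "rate"}, 0.06) // coalesces with the line above
	c.add(evtKey{"tenant-b", "fees", "rate"}, 0.07) // distinct tenant, distinct key
	fmt.Println(len(c.flush()))                     // 2
}
```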

Code quality (5, including H3 from the perf workstream):
- H1: exported store.SentinelGlobal as single source of truth; removed 4
  private sentinelGlobal declarations across client.go, postgres.go,
  mongodb.go, contract.go. Drift-by-rename is now impossible.
- H2: replaced containsSubstring with strings.Contains in mongodb_migration.go.
- H3: rewrote tenant_cache.go package doc — get() is safe under RLock for
  both eager and lazy (LRU library handles its own locking), set/delete
  still require Lock.
- M4: removed tenantCache.iterate phantom interface method + both impls +
  the single test that locked the dead contract.
- L13: fixed stale newClientFromStore docstring.

Test + docs (12):
- H4: new perf_gate_test.go enforces AC15 thresholds (<1µs hit, <2µs
  miss) via testing.Benchmark() + require.Less. Observed: 132ns hit /
  128ns miss. Skipped under -short; arch-gated to amd64/arm64.
- H5: drain events channel after Subscribe-before-delete in
  testTenantSubscribeReceivesDeleteEvent — no more false-positive on
  seed's insert event.
- M10+M11+L7: MIGRATION doc gained §8.1 Operational caveats documenting
  transient duplicate window during MongoDB _id rewrite, polling-mode
  missed DELETE events (AC9 gap), and NOTIFY-on-DELETE fanout expansion.
- M13: replaced time.Sleep(300ms) debouncer waits with signal-channel +
  deadline races — deterministic, no longer flaky on loaded CI.
- M14: replaced private-state assertion (c.cacheMu.RLock, c.tenantCache.get)
  with behavior test (populate then inject backend failure — second
  read survives proves population observably).
- M15: removed the loop-capture defensive copy, unnecessary on Go 1.25
  (loop variables are per-iteration since Go 1.22).
- L16: b.N → b.Loop across 6 benchmarks.
- L17: custom itoa → strconv.Itoa.
- L18: deleted dead 2-second polling loop after wg.Wait in race test.
- L19: table-driven TestNilClient_TenantMethods consolidates 11 nil-receiver cases.
- L23: removed dead _=c assignment.
- L1+L14: confirmed no action needed (SSRF nolint positioned correctly;
  AGENTS.md runtime section accurately describes RecoverAndLog).

Deferred:
- M12 (consolidate 6 fake stores): cost-benefit declines — each store is
  tightly fitted to its test's needs (raceStore's RWMutex+split-handler,
  benchTenantStore's no-fire, tenantFakeStore's injectable errors).
  Consolidation produces either a god-struct or same-LOC wrappers.
  H1's sentinel unification already eliminated the literal-vs-const
  drift concern that was the original reviewer worry.

go.mod gains: github.com/hashicorp/golang-lru/v2 v2.0.7.

239 unit tests pass. All changes reviewed under -race.

* refactor(systemplane): OnTenantChange ctx param + related doc/helper polish

Closes M2, M7, M9, L10, L20. Round 2a of post-review audit — bundled because
all five touch the tenant-callback region.

M2 (breaking API change): OnTenantChange callback signature gains a leading
ctx context.Context parameter. fireTenantSubscribers now synthesizes the
tenant-scoped ctx via core.ContextWithTenantID before each callback, so
subscribers can use tenant-aware lib-commons facilities (commons/dlq,
commons/net/http/idempotency, commons/webhook) without manually propagating
the tenant ID. This is a deliberate v5 public-API change — the tenant
surface is new in this branch, no external consumers exist yet.

M7: short-circuit fireTenantSubscribers when len(subs) == 0, skipping the
make/copy allocation when the subscriber list is empty. This path is hot
when no subscriber is registered during Start-phase hydration.

M9: tightened DeleteForTenant godoc — explicitly notes that no-op deletes
(tenant had no override) produce no change event, so OnTenantChange will
NOT fire in that case. Mirrored in AGENTS.md and register.go where the
idempotency claim was loose.

L10: consolidated log-helper asymmetry — introduced c.logDebug alongside
existing c.logWarn, routed onchange.go:37 and tenant_onchange.go:68
through it. Both sites were using Debug level anyway; the asymmetry was
that other tenant paths already went through nil-safe helpers.

L20: documented mutable-defaults caveat in both RegisterTenantScoped
and legacy Register godocs. Registered defaults are shared by reference;
a subscriber mutating a slice/map default affects every other caller
that falls through to the default.
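
The caveat is ordinary Go slice aliasing, shown here with a toy registry. The registry type and cloneStrings helper are hypothetical and exist only for this example.

```go
package main

import "fmt"

type registry struct{ defaults map[string]any }

// get returns the registered default itself, not a copy.
func (r *registry) get(key string) any { return r.defaults[key] }

// cloneStrings copies a slice so the caller can modify it safely.
func cloneStrings(in []string) []string { return append([]string(nil), in...) }

func main() {
	r := &registry{defaults: map[string]any{"allowed": []string{"BRL", "USD"}}}

	safe := cloneStrings(r.get("allowed").([]string))
	safe[0] = "EUR" // touches the local copy only

	shared := r.get("allowed").([]string)
	shared[0] = "XXX" // writes through to the registered default

	fmt.Println(r.get("allowed").([]string)[0]) // XXX
}
```

Callers that need to modify a slice or map default should copy it first, as the safe branch does.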

239 unit tests pass across 7 packages.

* refactor(systemplane): backend mediums + client/admin lows (round 2b)

Closes M3, M5, M6, M8, L2, L3, L4, L5, L6, L8, L9, L11, L12, L21, L22, L24, H7.
Round 2b of post-review audit — two parallel agents (backend cluster, client+admin cluster).

Backend mediums (E):
- M3: MongoDB ListTenantsForKey now checks DistinctResult.Err() before Decode().
  mongo-driver v2.5.0's Decode silently swallows operation-level errors into
  an empty slice; this closes the fail-closed gap.
- M5: hydrateTenantCache snapshots the tenant-scoped registry under a short
  registryMu.RLock, then json.Unmarshal-s outside any registry lock, then
  populates the cache under a single cacheMu.Lock window. At 10k tenants
  this eliminates a ~500ms Start-time blocking window on registryMu.
- M6: new store.Store.ListTenantOverrides() method returning only non-_global
  rows. hydrateTenantCache uses it in place of ListTenantValues — at 10k
  tenants × 12 keys, ~60MB less memory churn per Start().
- M8: extracted Postgres LISTEN loop into internal/postgres/postgres_listen.go
  (195 LOC). postgres.go drops from 514 → 377 LOC, comfortably under the
  500 CRITICAL threshold.

Rolling-deploy safety (H7):
- WithListenChannel tracks explicit opt-in via clientConfig.listenChannelExplicit.
  postgres.New logs WARN when the default channel "systemplane_changes" is
  used without explicit opt-in AND a logger is configured — surfaces the
  cross-service NOTIFY sharing hazard documented in MIGRATION §4.

Lows (various):
- L2: admin handlePutTenant captures and logs (Debug) the GetForTenant error
  instead of discarding — still falls back to echoing submitted value.
- L3: parseNotifyPayload coerces empty tenant_id to store.SentinelGlobal,
  stopping warn-spam during mixed-version upgrade windows.
- L4: dropped redundant _global literal check from validateTenantIDParam:
  the regex already rejects a leading underscore, and the removal also
  stops leaking the sentinel name to unauthorized callers.
- L5: new Client.recordTenantLazyFetchError metric (counter
  `systemplane_tenant_lazy_fetch_errors_total` with ns+key attrs).
  Nil-safe when telemetry is unset.
- L6: refreshTenantFromStore now synthesizes a tenant-bound bgCtx via
  core.ContextWithTenantID before the timeout wrap, so observability
  middleware reading ctx sees the correct tenant.
- L8: validateKeyArgs rejects key == "tenants" (reserved so that a key
  cannot shadow the admin tenant routes).
- L9: removed dead DeleteOne ErrNoDocuments branch — mongo-driver v2 returns
  (result, nil) with DeletedCount=0 on no-match.
- L11: collapsed confusing internal-debate prose in ListTenantsForKey godoc.
- L12: removed tenant_storage.go's redundant nil→[]string{} coercion — the
  store.Store interface contract already guarantees non-nil.
- L21: validateKeyArgs rejects U+001F (information-separator-one) in ns/key —
  still reserved as the singleflight key delimiter even after the debouncer
  went struct-keyed.
- L22: watchOnce godoc documents the resume-token-not-persisted behavior;
  MIGRATION §8.1 extended with the same caveat.
- L24: admin handlePutTenant doc rewritten to match reality (authorization
  runs first via middleware; malformed :tenantID produces 403 on denied paths).

241 unit tests pass across 7 packages. Go build clean.

New file: metrics.go (lazy-fetch error counter).
New file: internal/postgres/postgres_listen.go (extracted LISTEN loop).

* fix(systemplane/admin): add unit||integration build tag to admin tests

admin_test.go and admin_tenant_test.go use systemplane.TestEntry /
systemplane.TestEvent, which became build-tag-gated in Round 1 (C1).
Without matching tags, 'go vet' on untagged package views fails with
'undefined: systemplane.TestEntry'. Adding '//go:build unit || integration'
aligns the admin test files with the rest of the systemplane test tree.

Flagged by Agent E as a scope-adjacent loose end after C1 landed.

* fix(systemplane): skip AC15 perf gate under -race

make ci runs tests with -race. Race instrumentation adds 10-50x latency
overhead on atomic/mutex ops, pushing sub-microsecond AC15 targets into
multi-microsecond territory (observed 2.58µs vs 1µs threshold). The perf
gate fails spuriously in that lane even when the actual GetForTenant
hot path is well under budget.

Tagged the file '//go:build unit && !race' so the gate still runs on
'go test -tags=unit' without -race, where it's meaningful. The race
lane exercises the same code for concurrency safety via the other
tenant test suites.

Documented the gating in the file's package doc.

* style: apply cosmetic formatting to Go files

Corrects comment formatting in the postgres schema file for better
readability and consistency. This includes fixing indentation and
normalizing quote characters.

Adds a blank line in the tenant refresh logic to improve code
readability by visually separating context creation from its subsequent
use.

* chore: remove REVIEW.md file

The review findings in this document are outdated. All relevant issues
have been addressed or migrated to the project's issue tracker.

This removal cleans up the repository and prevents confusion caused by
obsolete information.

* chore(ci): update go-combined-analysis workflow

X-Lerian-Ref: 0x1

* docs: remove obsolete pre-dev planning artifacts

X-Lerian-Ref: 0x1

* refactor(systemplane): strengthen tenant-scoped runtime config

Adds tenant-aware hydration, admin response helpers, client telemetry, MongoDB CRUD and legacy migration splits, and a Postgres migration integration test. Expands contract suite, race tests, and perf gates around tenant-scoped keys.

X-Lerian-Ref: 0x1

* fix(systemplane): apply CodeRabbit review feedback on PR #445

Addresses 13 unresolved CodeRabbit threads from the tenant-scoped runtime
config PR, plus two additional defects surfaced while driving the CI
pipeline to green. Grouped below by the class of problem each fix closes.

Correctness (Major, CodeRabbit):
- admin: map ErrTenantSchemaNotEnabled to 503 tenant_schema_not_enabled
  in mapSentinelErr so tenant writes during phase-1 rollout surface a
  descriptive gated-feature response instead of a generic 500.
- client_test: switch fakeStore tenant-scoped stubs from nil/no-op to a
  shared errTenantStubCalled sentinel so any unexpected tenant-path
  invocation in the legacy-globals suite fails fast instead of passing
  silently with a zero-value return.
- mongodb integration: snapshot csEvents/pollEvents under their mutexes
  after waiting for both Subscribe loops to exit, eliminating a race in
  the change-stream/polling parity assertion where cancel() is
  asynchronous and subscriber goroutines could still be appending.
- mongodb_tenant: drop tenant.id, entry.actor, and cursor.after_tenant
  from span attributes on every hot-path call. These identifiers are
  unbounded in a real multi-tenant deployment and would inflate trace
  index cardinality; they remain available through structured logs and
  the upstream Client layer.

Test tightness (Minor, CodeRabbit):
- mongodb integration: assertCompoundIndexExists now also requires the
  index-level unique flag to be true, so a regression that creates a
  non-unique (namespace, key, tenant_id) index is caught by the
  migration suite instead of passing silently.
- postgres migration integration: only tolerate the expected SQLSTATE
  23502 ("contains null values") on the post-seed SET NOT NULL; any
  other ALTER failure now fails the test via require.Contains instead
  of being silently swallowed.
- tenant_onchange_test: replace unchecked f.value.(float64) with a
  require-guarded type assertion so a type mismatch surfaces as a
  diagnosable test failure instead of a panic.

Code quality (Trivial, CodeRabbit):
- contract: remove unreachable return after t.Skipf (t.Skipf calls
  runtime.Goexit).
- tenant_cache_lru: surface a WARN log when the defensive LRU-init
  fallback to unbounded eager cache fires. Also threads cfg.logger
  through newTenantCacheForConfig and updates test callers.
- tenant_scoped_accessors: fix double %w in GetDurationForTenant; the
  parse error is now formatted with %v so ErrValidation stays the sole
  root of the wrapped chain.

Doc accuracy (Minor, CodeRabbit):
- AGENTS.md: correct the MongoDB migration timing (runs during store
  construction inside NewMongoDB, not before Start returns) and add
  ErrTenantSchemaNotEnabled to the sentinel errors list.
- tenant_scoped: rewrite the extractTenantID godoc to match actual
  behavior (value is returned verbatim from core.GetTenantIDContext —
  no trimming) to prevent reliance on a non-existent normalization.
- tenant_scoped: add ErrNotStarted to the GetForTenant exported errors
  list — it was already being returned, just undocumented.

CI-green fixes (Major, surfaced by make ci):
- mongodb_crud: phase-1 upsert now filters by (namespace, key) alone
  instead of (namespace, key, tenant_id). The prior filter missed the
  legacy ObjectId-keyed rows that pre-tenant v5.0.x binaries wrote —
  those rows have no tenant_id field at all — which drove the upsert
  into an insert branch and tripped the legacy unique index on
  (namespace, key) with E11000 on every phase-1 Set against a legacy
  row (the exact regression
  TestIntegration_Mongo_Phase1_SetOverwritesLegacyObjectIDRow pins).
  Safe because phase-1 rejects tenant writes (ErrTenantSchemaNotEnabled),
  so every upsert routes through tenantID="_global" and the legacy
  unique index already guarantees one row per (namespace, key).
- mongodb integration: assertCompoundIndexExists now correctly iterates
  the listIndexes key subdocument as bson.D instead of bson.M.
  mongo-driver/v2 decodes nested documents inside a top-level bson.M as
  bson.D (to preserve field order), so the previous bson.M type
  assertion skipped every index and the helper always reported
  found=false — which is why TestIntegration_Mongo_Migration_FreshCollection
  failed even when the compound unique index was present and correctly
  constructed by ensureSchema.
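
The bson.D versus bson.M mismatch described above comes down to a failed Go type assertion. The sketch below uses local stand-in types (E, D, M) that only mirror the shape of the driver's types, and indexKeyNames is a hypothetical helper, so it runs without the MongoDB driver and should be read as an illustration rather than the test helper itself.

```go
package main

import "fmt"

// E, D and M are local stand-ins that mirror the shape of bson.E, bson.D
// and bson.M. They are not the driver's types.
type E struct {
	Key   string
	Value any
}

type D []E

type M map[string]any

// indexKeyNames (hypothetical) returns the field names of an index "key"
// subdocument in order. It expects the ordered form D and reports false
// for anything else.
func indexKeyNames(index M) ([]string, bool) {
	keyDoc, ok := index["key"].(D)
	if !ok {
		return nil, false
	}
	names := make([]string, 0, len(keyDoc))
	for _, e := range keyDoc {
		names = append(names, e.Key)
	}
	return names, true
}

func main() {
	index := M{
		"unique": true,
		"key":    D{{"namespace", 1}, {"key", 1}, {"tenant_id", 1}},
	}

	// Asserting the unordered map type on the nested document fails, which
	// is how a helper can end up skipping every index it inspects.
	_, isMap := index["key"].(M)
	fmt.Println(isMap) // false

	names, ok := indexKeyNames(index)
	fmt.Println(names, ok) // [namespace key tenant_id] true
}
```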

All tests now pass: 5052 unit tests, 221 integration tests (2 skipped —
TenantSubscribeReceivesDeleteEvent in polling mode, which is the
documented change-stream-only delete visibility gap). Lint, format,
tidy, sec, vet all clean.

* fix(systemplane): apply CodeRabbit follow-up review on PR #445

Four new unresolved CodeRabbit threads surfaced after the prior feedback
round; this commit closes all of them plus completes one fix that was
documented but never actually applied.

Correctness (Major, CodeRabbit):
- mongodb integration: assert killAllSessionsByPattern actually ran in
  TestIntegration_Mongo_ChangeStream_ReconnectsAfterStreamClose. The
  teardown command previously had its error swallowed with '_ ='; if the
  server rejected or no-oped the command, the follow-up write would land
  on the original stream and the reconnect path would silently never be
  exercised. Now require.NoError fails the test loudly with a
  diagnosable message.

Test tightness (Minor, CodeRabbit):
- mongodb integration: assertCompoundIndexExists now pins the compound
  index key ORDER, not just field membership. The migration contract and
  keyset-ordering paths rely on (namespace, key, tenant_id) in that
  exact sequence; a reordered variant like (tenant_id, namespace, key)
  would have passed the prior membership-only check while breaking
  production keyset pagination.
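
The difference between a membership-only check and an order-pinning check can be sketched with plain string slices. Both function names are hypothetical and the sketch does not reproduce assertCompoundIndexExists; it only shows why the reordered index would have slipped through.

```go
package main

import "fmt"

// sameMembers is the weaker check: the same fields, in any order.
func sameMembers(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	remaining := make(map[string]bool, len(want))
	for _, k := range want {
		remaining[k] = true
	}
	for _, k := range got {
		if !remaining[k] {
			return false
		}
		delete(remaining, k) // a repeated field must not match twice
	}
	return true
}

// sameOrder is the stricter check: the same fields in the same positions.
func sameOrder(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range want {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	want := []string{"namespace", "key", "tenant_id"}
	reordered := []string{"tenant_id", "namespace", "key"}

	fmt.Println(sameMembers(reordered, want)) // true: the old check passes
	fmt.Println(sameOrder(reordered, want))   // false: the new check fails
}
```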

Code quality (Trivial, CodeRabbit):
- tenant_scoped_accessors: finish the double-%w fix. Commit 458fbe7
  added the 'Use %v (not %w)' rationale comment but left the fmt.Errorf
  format string untouched, so the error still wrapped ErrValidation and
  the parse error as siblings. This commit applies the documented change
  so ErrValidation is the sole wrapped root and the parse error is
  rendered via %v.
- client_test: replace the remaining eight '_global' string literals in
  fakeStore.Set, tamperEntry, and the four manual changefeed-injection
  tests with store.SentinelGlobal. The doc comment on line 68 keeps the
  literal because it explains the wire-level value; everywhere the
  string is used as an identifier in code now goes through the canonical
  constant so future renames stay contained to one spot.

Verified: make ci passes — 5052 unit tests, 221 integration tests (2
intentional polling-mode skips), lint/format/tidy/sec/vet all clean.

* fix(systemplane): apply CodeRabbit follow-up review on PR #445

Four new unresolved CodeRabbit threads surfaced after the prior feedback
round; this commit closes all of them. Grouped below by severity.

Correctness (Major, CodeRabbit):
- mongodb integration (InsertUpdateParity): replace the unbounded
  `<-csErrCh` / `<-pollErrCh` reads after cancel() with bounded drains
  via a new assertSubscribeTerminated helper. A naked channel read would
  hang the package-level `go test` run until the global 10-minute
  timeout if a Subscribe loop ever regressed and stopped honoring
  context cancellation; the helper bounds the wait at 5s and validates
  the terminal error matches the cancel-path contract (nil for the
  polling backend, context.Canceled for the change-stream backend).
- mongodb integration (DeleteSemanticsDiverge): capture the Subscribe
  return values into buffered channels instead of discarding them with
  `_ = Subscribe(...)`. A Subscribe that failed before the delete wave
  would previously leave pollDeletesAfter at zero and let this subtest
  pass silently even though the polling delivery path was never
  exercised — draining and asserting after an explicit cancel() turns
  that class of regression into a loud test failure. The bounded drain
  uses the same 5s helper so both subtests agree on the deadline. Also
  restructured the `defer pollMu.Unlock()` block so the mutex is
  released before the bounded drain runs; holding it through a 5s wait
  could have self-deadlocked against a subscriber goroutine in flight.
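
The bounded-drain pattern described in the two items above can be sketched as follows. waitTerminated is a hypothetical stand-in for assertSubscribeTerminated (the real helper presumably also asserts through the test framework), and subscribe is a toy loop rather than either backend.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitTerminated is a hypothetical stand-in for the bounded-drain helper.
// It reports whether a terminal error arrived before the timeout.
func waitTerminated(errCh <-chan error, timeout time.Duration) (bool, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case err := <-errCh:
		return true, err
	case <-timer.C:
		return false, nil
	}
}

// subscribe is a toy loop that blocks until ctx is cancelled.
func subscribe(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// cancelAndDrain starts subscribe, cancels it, and waits at most 5s for
// the terminal error instead of reading the channel unconditionally.
func cancelAndDrain() (bool, error) {
	ctx, cancel := context.WithCancel(context.Background())
	errCh := make(chan error, 1) // buffered so the goroutine can always exit

	go func() { errCh <- subscribe(ctx) }()

	cancel()
	return waitTerminated(errCh, 5*time.Second)
}

func main() {
	terminated, err := cancelAndDrain()
	fmt.Println(terminated, err) // true context canceled
}
```

If subscribe ever stopped honoring cancellation, waitTerminated would return false after the timeout instead of blocking the whole test run.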

Code quality (Trivial → Minor, CodeRabbit):
- tenant_scoped_accessors.GetFloat64ForTenant: accept both float64 and
  int backing types to match GetIntForTenant's dual-type handling.
  Values set via SetForTenant round-trip through JSON and come back as
  float64, but a registered default that is a Go int literal
  (RegisterTenantScoped(..., 0, ...)) would otherwise surface
  ErrValidation on a tenant without any override: a misleading
  "not a float64" error for a type distinction the configuration surface
  was never meant to enforce. New test TestGetFloat64ForTenant_AcceptsIntDefault
  locks the symmetry invariant.
- tenant_scoped_accessors.GetDurationForTenant: add the
  //nolint:errorlint directive that golangci-lint itself suggested in
  thread #4. Root cause discovered while investigating: `make ci` runs
  `make lint-fix` which invokes `golangci-lint run --fix`, and
  errorlint's auto-fix mode silently rewrites %v → %w on every run.
  The prior two commits (458fbe7 and d692433) both documented the
  intentional %v choice but left the line vulnerable to the auto-fix;
  the line was flip-flopping across commits. The nolint directive is
  now load-bearing — without it, `make ci` will undo the fix again
  the next time it runs. A companion comment explains why the
  directive cannot be removed.
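
The dual-type handling in the GetFloat64ForTenant item can be sketched with a type switch. asFloat64 and the ErrValidation value below are hypothetical stand-ins, not the accessor's real implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrValidation is a stand-in for the package sentinel named in the notes.
var ErrValidation = errors.New("validation failed")

// asFloat64 (hypothetical) accepts both backing types a float setting can
// have: float64 after a JSON round-trip, or int from a Go literal default.
func asFloat64(v any) (float64, error) {
	switch n := v.(type) {
	case float64:
		return n, nil
	case int:
		return float64(n), nil
	default:
		return 0, fmt.Errorf("%w: %T is not a number", ErrValidation, v)
	}
}

func main() {
	fromDefault, _ := asFloat64(0)  // int literal registered as a default
	fromJSON, _ := asFloat64(0.25)  // value decoded from JSON
	_, err := asFloat64("0.25")     // any other type is still rejected
	fmt.Println(fromDefault, fromJSON, err)
}
```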

Verified: make ci passes — 5336 unit + integration tests (2 intentional
polling-mode skips), lint/format/tidy/sec/vet all clean. One net new
test (TestGetFloat64ForTenant_AcceptsIntDefault) compared to the prior
pristine run (5335 → 5336).

* fix(systemplane): apply CodeRabbit follow-up review on PR #445

Addresses the fourth round of CodeRabbit feedback:

- tenant_scoped_accessors.go: GetIntForTenant now rejects non-integral
  float64 values (e.g., 0.9) with ErrValidation instead of silently
  truncating to int. Silent truncation would turn a bad config into a
  different valid config rather than surface the misconfiguration.

- tenant_scoped_test.go: TestSetForTenant_SurfacesErrTenantSchemaNotEnabled
  now asserts the returned value is the global default (0.0), not the
  rejected 0.5. The previous assertion only checked found==true, which
  could not have caught an in-process split-brain where SetForTenant
  cached a failed write.

- tenant_scoped_test.go: hydrateFakeStore tenant CRUD methods now fail
  fast with errUnexpectedHydrateStoreCall instead of returning nil/zero.
  A regression where eager hydration starts hitting GetTenantValue or
  tenant CRUD during these tests would have stayed green silently;
  now it fails loudly with a targeted sentinel. The unused `rows` map
  and its population are removed since no accessor reads it anymore.

- mongodb_integration_test.go:
  TestIntegration_Mongo_ChangeStream_ReconnectsAfterStreamClose now
  captures the Subscribe terminal error through subErrCh and uses
  assertSubscribeTerminated for a bounded 5-second drain at test end,
  matching the pattern established in earlier tests in the same file.
  An unbounded read would turn a cancellation-handling regression into
  a 10-minute package timeout.

- security/ssrf.go: removed duplicated comment block above
  blockedPrefixes. The canonical description and MAINTENANCE note are
  consolidated into a single block.
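
The GetIntForTenant change in the first item above (reject rather than truncate) can be sketched as below. asInt64 is a hypothetical helper and the 2^53 bound is an assumption made here to keep the float-to-integer conversion exact; the library's actual limits may differ.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// ErrValidation is a stand-in for the package sentinel named in the notes.
var ErrValidation = errors.New("validation failed")

// maxExactInt is the magnitude up to which every integer is exactly
// representable as a float64 (2^53).
const maxExactInt = 1 << 53

// asInt64 (hypothetical) converts a setting value to an integer and
// rejects non-integral floats instead of silently truncating them.
func asInt64(v any) (int64, error) {
	switch n := v.(type) {
	case int:
		return int64(n), nil
	case float64:
		// NaN fails the Trunc comparison; infinities fail the range check.
		if n != math.Trunc(n) || math.Abs(n) > maxExactInt {
			return 0, fmt.Errorf("%w: %v is not an integral value", ErrValidation, n)
		}
		return int64(n), nil
	default:
		return 0, fmt.Errorf("%w: %T is not a number", ErrValidation, v)
	}
}

func main() {
	n, err := asInt64(3.0)
	fmt.Println(n, err) // 3 <nil>

	_, err = asInt64(0.9)
	fmt.Println(errors.Is(err, ErrValidation)) // true
}
```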

All affected tests pass. Lint clean. Unit: 5474/0. Integration on
modified package (mongodb): passes in 5.86s.

Pre-existing flake:
commons/streaming.TestIntegration_CircuitBreaker_TripsOrganically times
out on clean HEAD too (confirmed via stash+retry); unrelated to this
change.

* fix(streaming): honor context cancellation in produce paths

franz-go's ProduceSync blocks on a WaitGroup that ignores ctx.Done(). When a broker is unreachable and metadata discovery stalls, RecordDeliveryTimeout never starts and the call hangs indefinitely. Replaced ProduceSync with an async Produce wrapped in a select on ctx.Done() (produceWithContext) so the caller's deadline is always respected. The buffered channel (cap 1) prevents callback goroutine leaks when we exit via ctx.Done().
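
A minimal sketch of the wrapper shape described above, assuming a generic callback-style producer. The produceFunc type and the two toy producers are hypothetical and stand in for the Kafka client, so this shows only the channel-plus-select technique and not the library's produceWithContext.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// produceFunc is a hypothetical callback-style producer: it returns
// immediately and reports the outcome later through done.
type produceFunc func(record string, done func(error))

// produceWithContext waits for the callback or for ctx, whichever is first.
func produceWithContext(ctx context.Context, produce produceFunc, record string) error {
	// Capacity 1 lets a late callback complete its send after the caller
	// has already returned via ctx.Done().
	result := make(chan error, 1)

	produce(record, func(err error) { result <- err })

	select {
	case err := <-result:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// stalledProducer never invokes done, like a broker that never answers.
func stalledProducer(string, func(error)) {}

// healthyProducer acknowledges asynchronously.
func healthyProducer(_ string, done func(error)) { go done(nil) }

// produceWithDeadline runs one produce call under a 50ms deadline.
func produceWithDeadline(p produceFunc) error {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	return produceWithContext(ctx, p, "event-1")
}

func main() {
	fmt.Println(produceWithDeadline(healthyProducer))
	fmt.Println(produceWithDeadline(stalledProducer)) // context deadline exceeded
}
```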

X-Lerian-Ref: 0x1

* fix(rabbitmq): add TCP port wait strategy for macOS Docker Desktop

On macOS Docker Desktop, VM-level port forwarding can lag behind the 'Server startup complete' log line. MappedPort('5672/tcp') would fail with 'port not found' before the AMQP port was actually accepting connections. Switched to WithAdditionalWaitStrategy + ForListeningPort to ensure the port is both mapped and reachable before tests proceed.

X-Lerian-Ref: 0x1

* style(streaming): add blank line for improved readability

A blank line is added in the `produceWithContext` function to visually
separate the channel declaration from the client produce call.

This change enhances code clarity without affecting logic.

v5.0.0-beta.5

19 Apr 23:35
ffe8e13

v5.0.0-beta.5 Pre-release
feat(streaming): CloudEvents producer with DLQ, outbox fallback, and …

v5.0.2

17 Apr 14:41
a3720cf

Merge pull request #442 from LerianStudio/hotfix/rabbitmq-producer-re…

v5.0.1

16 Apr 17:49
5f429bb

Merge pull request #441 from LerianStudio/hotfix/telemetry-unsafe-str…

v5.0.0-beta.4

16 Apr 17:44
9fd63f7

v5.0.0-beta.4 Pre-release
Merge pull request #440 from LerianStudio/fix/telemetry-unsafe-string…

v5.0.0

14 Apr 18:33
d56aa6c

What's Changed

  • release: lib-commons v5 — systemplane v2, certificate, dlq, idempotency, webhook, ssrf by @fredcamaral in #435

Full Changelog: v4.6.0...v5.0.0

v5.0.0-beta.3

14 Apr 18:15
6fc8455

v5.0.0-beta.3 Pre-release
fix: apply CodeRabbit auto-fixes for PR #435 (#439)

- Migrate proxy transport to ssrf.ResolveAndValidate, eliminating TOCTOU window
- Block hostnames that normalize to empty values (e.g., ".", "..")
- Fix DLQ batch cap bypass when all tenant queues have future-dated heads
- Normalize dequeued message Source to the authoritative queue name
- Restore tenant context before re-enqueueing during DLQ prune
- Fix potential OOB panic in truncateString with invalid UTF-8
- Clarify TLSCertificate PrivateKey sharing in doc comment
- Use keyStateProcessing constant in idempotency test
- Add maintenance comments linking expectedPrefixCount to blockedPrefixes
- Add test for blocked IP in middle of resolved IP list