feat(org-lens): shared Valkey cache for org membership lookups#914
Conversation
Replace the per-process in-memory memo in OrgMembershipResolverService with a cross-instance, fail-soft Valkey-backed cache keyed by the authorization principal (access-token sub), failing closed when it cannot be resolved. This removes the 'anon' fallback that could serve one user's FGA-filtered membership data to another across pods. Add a reusable ValkeyService (lazy, TLS, fail-soft) and migrate OrgRoleGrantsService onto it. Caching activates only when VALKEY_URL is set; unset keeps today's direct-fetch behavior. Resolves https://linuxfoundation.atlassian.net/browse/LFXV2-2184 Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces an opt-in, cross-instance Valkey (Redis-compatible) cache to improve coherence and hit rate for server-side org membership lookups and org role-grants resolution in the multi-replica UI deployment, while tightening cache isolation to the authorization principal.
Changes:
- Add shared cache interfaces/constants (
CachePort,VALKEY_CACHE) and HelmVALKEY_URLwiring to enable/disable caching per environment. - Introduce a fail-soft
ValkeyService(lazy connect, TLS viarediss://, per-operation timeout) and use it for org membership + org role-grants caching. - Replace in-process memos with shared Valkey-backed caches (principal/username-keyed) and preserve Map ordering via entry-array serialization.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| yarn.lock | Adds ioredis and its transitive dependencies to the lockfile. |
| apps/lfx-one/package.json | Adds ioredis dependency for the server runtime. |
| packages/shared/src/interfaces/valkey-cache.interface.ts | Defines a generic cache port (CachePort) and cached envelope type for shared caching. |
| packages/shared/src/interfaces/org-selector.interface.ts | Adds a serializable cache entry shape for AccessAwareOrgsResult (Map ↔ entry arrays). |
| packages/shared/src/interfaces/index.ts | Exports the new Valkey cache interface(s) from the shared barrel. |
| packages/shared/src/constants/valkey-cache.constants.ts | Adds shared cache configuration (namespaces, TTLs, timeouts, size cap). |
| packages/shared/src/constants/index.ts | Exports the Valkey cache constants from the shared barrel. |
| charts/lfx-self-serve/values.yaml | Adds VALKEY_URL config knob to enable shared caching via Helm values. |
| apps/lfx-one/src/server/services/valkey.service.ts | Implements the lazy, fail-soft Valkey client + JSON read-through helper. |
| apps/lfx-one/src/server/services/org-role-grants.service.ts | Migrates role-grants memoization to Valkey using username-scoped keys and Map serialization. |
| apps/lfx-one/src/server/services/org-membership-resolver.service.ts | Migrates org membership memoization to Valkey using principal-bound keys and keeps per-process in-flight dedup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds a Valkey/Redis-backed fail-soft cache (singleton ValkeyService), shared cache interfaces and constants, migrates OrgMembershipResolverService and OrgRoleGrantsService to use principal/username-scoped Valkey caching with in-process coalescing, and exposes VALKEY_URL in Helm values. ChangesValkey-Backed Caching Infrastructure and Service Migration
Sequence DiagramsequenceDiagram
participant Client
participant OrgMembershipResolver as OrgMembershipResolverService
participant ValkeyService
participant Valkey as Valkey/Redis
participant ResourcesAPI as Resources Microservice
Client->>OrgMembershipResolver: fetchMembershipsBySlug(req, orgUid, slug)
OrgMembershipResolver->>OrgMembershipResolver: buildCacheKey(principal, orgUid, slug)
alt No principal (key null)
OrgMembershipResolver->>OrgMembershipResolver: runMembershipFetch(req, orgUid, slug)
OrgMembershipResolver->>ResourcesAPI: paginated /query/resources
else Principal present
alt membershipsInFlight hit
OrgMembershipResolver->>OrgMembershipResolver: return cached promise
else withCache
ValkeyService->>Valkey: getJson(cache_key)
alt Cache hit
Valkey-->>ValkeyService: cached envelope
else Cache miss
ValkeyService->>OrgMembershipResolver: call fetcher
OrgMembershipResolver->>ResourcesAPI: runMembershipFetch
ResourcesAPI-->>OrgMembershipResolver: ProjectMembershipDoc[]
OrgMembershipResolver-->>ValkeyService: result
ValkeyService->>Valkey: setJson(cache_key, result, ttl)
end
ValkeyService-->>OrgMembershipResolver: result
end
end
OrgMembershipResolver-->>Client: ProjectMembershipDoc[]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
apps/lfx-one/src/server/services/org-role-grants.service.ts (1)
37-57: 💤 Low valueConsider adding in-flight request coalescing for concurrent cache misses.
Unlike
OrgMembershipResolverService, this service doesn't coalesce concurrent in-flight requests. On cache miss, multiple concurrent requests for the same username will all compute and write to cache (last write wins). This is safe but wasteful and could cause a thundering herd on cold-start or cache expiry.The membership service's
membershipsInFlightpattern could be applied here if load warrants it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/lfx-one/src/server/services/org-role-grants.service.ts` around lines 37 - 57, Add in-flight request coalescing to getAccessAwareOrgs by introducing a module-level Map (e.g., orgsInFlight) keyed by the same value returned from OrgRoleGrantsService.buildCacheKey(username); on cache miss check orgsInFlight for an existing Promise and return it if present, otherwise create and store the Promise returned by calling computeAccessAwareOrgs(req, username), await it, then on completion remove the entry from orgsInFlight and proceed to the existing cache set logic (only set when !result.upstreamFailed); ensure that on rejection you delete the in-flight entry and rethrow so failures aren’t cached and you avoid leaks (mirror the membershipsInFlight pattern used in OrgMembershipResolverService).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@apps/lfx-one/src/server/services/org-role-grants.service.ts`:
- Around line 37-57: Add in-flight request coalescing to getAccessAwareOrgs by
introducing a module-level Map (e.g., orgsInFlight) keyed by the same value
returned from OrgRoleGrantsService.buildCacheKey(username); on cache miss check
orgsInFlight for an existing Promise and return it if present, otherwise create
and store the Promise returned by calling computeAccessAwareOrgs(req, username),
await it, then on completion remove the entry from orgsInFlight and proceed to
the existing cache set logic (only set when !result.upstreamFailed); ensure that
on rejection you delete the in-flight entry and rethrow so failures aren’t
cached and you avoid leaks (mirror the membershipsInFlight pattern used in
OrgMembershipResolverService).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1d9c3e47-0f8f-49e9-807f-df11205a05f7
⛔ Files ignored due to path filters (1)
yarn.lockis excluded by!**/yarn.lock,!**/*.lock
📒 Files selected for processing (10)
apps/lfx-one/package.jsonapps/lfx-one/src/server/services/org-membership-resolver.service.tsapps/lfx-one/src/server/services/org-role-grants.service.tsapps/lfx-one/src/server/services/valkey.service.tscharts/lfx-self-serve/values.yamlpackages/shared/src/constants/index.tspackages/shared/src/constants/valkey-cache.constants.tspackages/shared/src/interfaces/index.tspackages/shared/src/interfaces/org-selector.interface.tspackages/shared/src/interfaces/valkey-cache.interface.ts
|
Please hold off on merging as I am also making some changes on this function as part of the sub -> username swap. This can be merged after Thursday. |
Address review comments from copilot-pull-request-reviewer, coderabbitai[bot]: - valkey.service.ts / valkey-cache.interface.ts: add an optional `accept` shape-validator to getJson/withCache so a corrupt, partial, or legacy cache entry degrades to a miss instead of surfacing a fault (per copilot). - org-membership-resolver.service.ts: pass Array.isArray to withCache so a malformed cached value can't reach resolveContext().filter() and 500 (per copilot). - org-role-grants.service.ts: validate the cached entry shape before deserialize (no 500 from a corrupt hit, per copilot); add per-process in-flight coalescing of concurrent misses (per coderabbitai[bot]). Resolves 3 review threads. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review Feedback Addressed — round 1Commit: 150f560 (DCO signed-off + GPG-signed) Changes Made
Threads Resolved2 of 2 inline review threads resolved. The coderabbitai[bot] "in-flight coalescing" nitpick (posted in its review summary, not a separate thread) is also implemented in this commit. Validation
|
The 4-arg valkeyService.withCache(...) call exceeded the print width; Prettier reformats it across lines. Fixes the format:check CI failure. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Reword the VALKEY_URL comment so it no longer implies the rediss:// scheme means "no auth token". rediss:// enables TLS in transit; credentials are optional and embeddable in the URL when a deployment requires them, while our deployment stays SG-gated with no token. Addresses PR review feedback on charts/lfx-self-serve/values.yaml. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review-thread round — recapWorked the 3 unresolved review threads (all from Fixed & resolved
Replied, intentionally left open (by-design — maintainer call)
Quality gate (lfx-v2-ui / BFF)
No unit tests added (Playwright e2e only; no behavior change in this round). The build-regenerated |
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # apps/lfx-one/src/server/services/org-membership-resolver.service.ts
Address post-merge Copilot review on the shared Valkey cache: - valkey.service: redact the per-user tail of cache keys in warn/debug logs so JWT sub/usernames are not leaked into infrastructure logs. - org-membership-resolver: reject cached arrays containing nullish/non-object elements (degrade to a miss) so a malformed hit can't throw a 500 downstream. - org-role-grants: also validate loadedAt (string) and upstreamFailed (boolean) in the cache-entry shape guard so partial entries can't violate the API contract. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
PR review-thread round — post-merge Copilot pass (commit
|
| Thread | File | Fix |
|---|---|---|
| Array guard allowed nullish elements | org-membership-resolver.service.ts |
withCache now uses isMembershipDocArray (rejects arrays with nullish/non-object elements) so a malformed hit degrades to a miss + refetch instead of throwing on m.status/m.uid. |
isValidCacheEntry under-validated |
org-role-grants.service.ts |
Now also requires loadedAt: string and upstreamFailed: boolean, so a partial entry can't emit loaded_at: undefined/non-boolean on the wire. |
getJson logs leaked full key |
valkey.service.ts |
New redactKey(); valkey_get warn logs emit only ${APP_PREFIX}:${NAMESPACE}:***. |
setJson logs leaked full key |
valkey.service.ts |
valkey_set warn logs use the redacted namespace. |
withCache debug logs leaked full key |
valkey.service.ts |
cache_bypass/cache_hit/cache_miss debug logs use the redacted namespace. |
Left open — by-design (2, maintainer call):
org-membership-resolver.service.ts:86andorg-role-grants.service.ts:52— Copilot's "removing the per-process 30s TTL memo regresses the no-Valkey path." These already carry the author's rationale (the per-process result memo was deliberately dropped in favor of the principal-keyed shared cache to avoid cross-user leak + cross-instance drift; in-flight coalescing remains). Intentional trade-off — left for a maintainer call, not silently resolved.
No merge attempted (PR carries do-not-merge + hold pending the sub→username swap on OrgRoleGrantsService). Changes are guard/log-only and avoid that logic to stay conflict-free.
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten the cache shape guards so a corrupt or legacy entry reliably
degrades to a miss + recompute instead of being served as a bogus hit:
- isMembershipDocArray now requires each element to be a non-array object
carrying a string `uid` (the one guaranteed field), rejecting `[{}]` and
array elements that previously passed `typeof el === 'object'`.
- isEntryTupleArray now rejects array-valued tuples like `[["uid", []]]`,
which previously passed the `typeof item[1] === 'object'` check and would
rebuild a Map with non-object docs.
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
PR thread-resolution round — recapBranch synced with
Both shape-guard fixes make a corrupt/legacy cache entry degrade to a miss + recompute instead of being served as a bogus hit — no behavior change for valid entries. Preserved by design (not regressed): full TLS verification via plain Quality gate: Commit |
…namespace Address review feedback on the shared Valkey cache: - cacheKeyNamespace() now sanitizes unsafe characters to `-` instead of silently dropping a configured namespace, matching its doc comment. - Reword ORG_MEMBERSHIP_NAMESPACE comment: bumping the version starts a new namespace (old entries age out via TTL), it does not actively invalidate. - Remove deployment-specific claims from the MAX_VALUE_BYTES comment and the VALKEY_KEY_NAMESPACE runtime-configuration doc row. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
PR thread-resolution round —
|
| Thread | Fix |
|---|---|
valkey.service.ts — namespace not sanitized |
cacheKeyNamespace() now replaces unsafe chars with - instead of silently dropping a configured namespace; doc comment corrected. |
runtime-configuration.md — deployment-specific docs |
Removed Unset in dev/staging/prod + feature-branch wording; row is now deployment-agnostic. |
valkey-cache.constants.ts:9 — "bump to invalidate" |
Reworded: bumping starts a new namespace; old entries age out via TTL. |
valkey-cache.constants.ts:24 — "single-node" |
Dropped the deployment-specific detail. |
Quality gate (UI/BFF): yarn format:check, yarn lint:check, yarn build all green. Build-regenerated search-index.json reverted; yarn.lock unchanged.
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Anchor redactKey() on the `vN` version segment instead of a fixed index so the cache domain/version (e.g. org-membership:v1) is kept in logs whether or not the optional deployment key-namespace segment is present, while still masking the principal tail. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review-thread resolution round — recapAddressed the one remaining open thread. Fixed (1):
Quality gate: Open threads remaining: 0. Note: |
dealako
left a comment
There was a problem hiding this comment.
Follow-up review (round 3) — new namespacing + guard hardening looks clean ✅
Thanks @luismoriguerra — another tight, well-targeted round. Since my last approval (which was auto-dismissed when you pushed new commits), you've added per-deployment cache-key isolation and addressed @jordane's and Copilot's latest comments. I re-ran both my security and code-quality reviewers against the incremental cache diff only (ignoring the reddit-ads / docs-portal / committees changes that rode in via the main merges) — both came back clean, re-approve.
👏 Nice work
VALKEY_KEY_NAMESPACEper-deployment isolation — a genuinely good catch for branch previews sharing dev's ElastiCache. The sanitizer (.replace(/[^A-Za-z0-9._-]/g, '-')) correctly strips the:delimiter so an operator-set namespace can't inject extra key segments, and both services build the prefix identically. Verified branch deployments can't read each other's keys, and the no-namespace key shape is unchanged.cacheKeyNamespace()now sanitizes in place instead of silently returning''— directly resolving @jordane's complaint and matching its doc comment.- Hardened shape guards —
isMembershipDocArraynow requires a stringuid(rejects[{}]and arrays), andisEntryTupleArrayrejects array-valued tuples like[["uid", []]]. Both correctly degrade corrupt entries to a miss + recompute with no false rejection of valid data (empty arrays still pass as legit hits). redactKey()anchored on thevNversion segment — verified it masks the principal across every key shape, with or without the deployment namespace.
Revision tracking — all RESOLVED ✅
- ✅ @jordane:
cacheKeyNamespacereturning''instead of sanitizing → fixed in99f01187(sanitizes to-). - ✅ @jordane:
runtime-configuration.mddeployment-specific docs → genericized in99f01187. - ✅ @jordane: constants comment "bumping invalidates" → reworded to "starts a new namespace, old entries age out via TTL" (
99f01187). - ✅ @jordane:
MAX_VALUE_BYTES"single-node" deployment claim → removed (99f01187). - ✅ Copilot: redactKey dropping domain/version with a namespace set → fixed via findIndex anchor (
281ff998). - ✅ Copilot: shape guards accepting
[{}]/ array-valued tuples → hardened (bc8d38ac). - ✅ Copilot:
readonly clientreassigned in constructor flagged as a compile error → confirmed false positive. TypeScript permits assigning areadonlyproperty within the declaring class's constructor even with an initializer; the= nullinitializer is required for theVALKEY_URL-unset early-return path, so keep it as-is. Agreeing with your assessment.
Note on timing: @jordane's CHANGES_REQUESTED (18:05 local) predates your fixes (99f01187 18:11, 281ff998 18:21), so that formal block is now stale and just needs @jordane's re-review to clear.
Reconciliation with new bot comments
All new Copilot comments this round are addressed (shape guards, redactKey, readonly false-positive). The recurring TLS-hostname note is the same accepted design (companion argocd #936 points VALKEY_URL at the real ElastiCache endpoint whose cert SAN matches) — no change needed.
Open items
- 🔴 Blocking: 0
- 🟡 Minor: 0
- ⚪ Nit (optional, non-blocking): 2 — (1) if
VALKEY_KEY_NAMESPACEitself matched/^v\d+$/(e.g.v5),redactKeywould anchor on it and over-mask the domain in logs (never a principal leak); the doc comment's "version immediately precedes the principal" is slightly overstated for that pathological config. (2)cacheKeyNamespace()has no length cap and treats whitespace/all-unsafe input (' ') as a set namespace (---); distinct inputs can also collapse to the same sanitized value. Both are operator-config edge cases, bounded by principal-keyed entries — fine to leave or tidy in a follow-up.
Decision: ✅ Approved with minor comments
No blocking issues, no security findings, no merge conflicts, clean fail-closed/fail-soft semantics throughout. Solid iteration. (FYI @jordane — your specific requests appear fully addressed in 99f01187/281ff998; a re-review would clear the standing block.)
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Document that redactKey() anchors on the first vN-shaped segment, so a deployment namespace that is itself vN over-masks the domain. This is harmless (reduced log fidelity only, never a principal leak), so the behavior is intentionally left as-is. Signed-off-by: Luis Mori Guerra <luismorith@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Review-thread resolution round (HEAD
|
Strengthen the access-aware orgs cache-entry guard so a structurally
valid but semantically corrupt resolved tuple value (e.g. `["uid", {}]`)
degrades to a cache miss + recompute instead of deserializing and later
throwing on `roleSource.startsWith(...)`. isEntryTupleArray now accepts an
optional value guard, and resolved entries are validated to carry a
non-empty string roleSource plus string-typed optional parentUid/parentName.
Signed-off-by: Luis Mori Guerra <luismorith@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Review-thread resolution round — recapCommit: Threads addressed (1 open → 0 open):
Quality gate (BFF — Playwright e2e only, no unit tests):
No prior locked decisions were regressed (full TLS verification via plain |
dealako
left a comment
There was a problem hiding this comment.
Follow-up review (round 4) — clean ✅
Thanks @luismoriguerra — small, surgical round and both open items are fully closed. I re-ran both reviewers against just the incremental cache diff (the crowdfunding / routing / committees churn in the branch is main-merge noise, not this PR), and they came back clean.
👏 Nice work
isResolvedOrgRolevalue guard (org-role-grants.service.ts) — exactly the right fix for the Copilot finding. Threading an optionalvalueGuardintoisEntryTupleArrayand validating thatresolvedtuple values carry a non-empty stringroleSource(plus string-typed optionalparentUid/parentName) closes the last deserialize-to-500 path: a corruptresolved: [["uid", {}]]now degrades to a miss + recompute instead of throwing onroleSource.startsWith(...)downstream. I traced the consumers —roleSourceis genuinely the only unsafe-access (throw-risk) field onResolvedOrgRole, and the guard matches the interface's required/optional split exactly. No false-rejection risk: every validOrgRolePersonaliteral is a non-empty string, solength > 0never rejects a legitimate entry.- redactKey doc caveat — picked the low-effort option on my round-3 nit and documented that the
vNanchor over-masks (never leaks) when a deployment namespace is itselfvN-shaped. Comment-only, accurate. 👍
Revision tracking — all RESOLVED ✅
- ✅ My round-3 nit: redactKey doc overstating "version immediately precedes the principal" → caveat added in
f61f4692. - ✅ Copilot (14:29):
isValidCacheEntrynot validatingresolvedtuple values haveroleSource→ guarded viaisResolvedOrgRolein4864d5b2.
Reconciliation with new bot comments
The one new Copilot comment this round (roleSource guard) is addressed — agreed, no duplication. No other new bot findings.
Open items
- 🔴 Blocking: 0
- 🟡 Minor: 0
- ⚪ Nit: 0 (one informational, no-action note from my security pass:
isResolvedOrgRoleaccepts any non-empty string rather than restricting to the fourOrgRolePersonaliterals — correct and forward-compatible for a crash-safety guard, since an unrecognized value is silently dropped by the defaultless switch, never thrown. Nothing to change.)
Decision: ✅ Approved
No blocking issues, no security findings, no merge conflicts, no false-rejection risk. The cache hardening is complete and the fail-soft/fail-closed semantics are sound end-to-end.
Reminder for merge: @jordane's earlier CHANGES_REQUESTED is still the standing review decision and predates the last several fix rounds — it's stale and just needs a quick re-review from them to clear before this can land.
Summary
OrgMembershipResolverServicewith a cross-instance, fail-soft Valkey-backed cache. The 3-replica UI (no sticky sessions) previously got ~1/3 cache hit rate and incoherent results across pods.getEffectiveUsername(req)— the same principal the query-service FGA-filters the data on) and fails closed when it can't be resolved — removing the?? 'anon'fallback that could serve one user's authorized membership data to another.ValkeyService(lazy connect, TLS viarediss://, per-op timeout, never throws) and migrateOrgRoleGrantsService(org-selector role grants) onto it, with Map↔ordered-entry-array (de)serialization.VALKEY_URLis set; unset = today's direct-fetch behavior. Enabled per-env via a companionlfx-v2-argocdPR.Behavior when disabled / cache down
Fail-soft: any cache fault, timeout, unset
VALKEY_URL, or unresolved principal degrades to a direct upstream fetch. Cache state never gates startup or/readyz(lazy client).Test plan
yarn lint:check(0 errors)yarn check-typesyarn format:checkyarn build(server + client)specs/026-valkey-shared-cache/quickstart.md(per-user isolation, cross-pod coherence, fail-soft) — see note belowResolves https://linuxfoundation.atlassian.net/browse/LFXV2-2184
Made with Cursor