
Session activity interceptor with strip-on-consume #233

Merged
rajkannan-carousell merged 19 commits into master from TRUST-5708
Mar 10, 2026


@rajkannan-carousell rajkannan-carousell commented Mar 4, 2026

What is this?

Session activity tracking at the gRPC interceptor layer — publishes a structured Kafka event (service, action, status, duration, encoded session context) whenever a gateway-initiated, security-sensitive operation completes (e.g. ChangePassword, UpdateProfile, Login).

Why interceptor level, not application code?
Interceptors run for every gRPC handler automatically, capturing the real completion status and duration without touching business logic. Putting this in service code would scatter event publishing across every handler and risk missing error paths.

Note: **As per the latest alignment, this interceptor publishes activity events only for APIs that are opted in via the gateway. We also don't store the activity events themselves right now; we store metadata history only when a signal such as device ID or country changes, with a reference to the activity that triggered the change.**


How It Works

The interceptor fires only when x-session-track: true is present. After reading the tracking headers, it strips them from incoming metadata before calling handler(). Since ForwardMetadataInterceptor reads from incoming metadata, the stripped headers are never relayed downstream.

Gateway sets:  x-session-track: true  +  x-session-operation: change_password
                        │
               [Service A — interceptor]
                  1. reads x-session-track: true  → proceed
                  2. reads x-session-context      → proceed
                  3. reads x-session-operation    → "change_password"
                  4. STRIPS x-session-track + x-session-operation
                  5. handler(strippedCtx, req)     ← runs business logic
                  6. publishes SessionActivityEvent to Kafka
                        │
               Service A calls Service B (ForwardMetadataInterceptor sees stripped MD)
                  x-session-track  → ABSENT  ✅  no event at Service B
                  x-session-context → still present (useful for other purposes)

Why strip and not just skip? Skipping wouldn't prevent forwarding — ForwardMetadataInterceptor runs inside the handler and copies from incoming metadata. Stripping replaces the incoming context before the handler runs, so downstream calls never carry the header.


Opt-In Only

This interceptor is entirely opt-in at two levels:

  1. Service level — a service only participates if it adds GlobalSessionActivityInterceptor() to its interceptor chain and registers SessionInitializer.
  2. Request level — even on a participating service, the interceptor skips unless the caller explicitly sets x-session-track: true. Internal service-to-service calls never set this header (it gets stripped on first hop), so they are always silent.

No gateway changes → no events. No x-session-track → no events. Zero noise by default.


How to Onboard a Downstream Service

1. Register the initializer (wires the Kafka producer at startup):

// main.go or server setup
server := orion.GetDefaultServer("your-service")
server.AddInitializers(orion.SessionInitializer())

2. Add config (viper / config file):

orion:
  session_tracking:
    kafka_brokers: ["broker1:9092", "broker2:9092"]
    kafka_topic: "session-activities"   # optional, defaults to "session-activities"

3. Add interceptor to your service:

func (s *serviceImpl) GetInterceptors() []grpc.UnaryServerInterceptor {
    return append(
        cinterceptors.CoreServerInterceptors(),
        interceptors.GlobalSessionActivityInterceptor(), // ← add this
    )
}

4. Gateway sets the headers for operations you want tracked:

ctx = metadata.AppendToOutgoingContext(ctx, "x-session-track", "true")
ctx = metadata.AppendToOutgoingContext(ctx, "x-session-operation", "change_password")

If kafka_brokers is not configured, the initializer is a no-op and the interceptor silently drops events — startup is never blocked.


Files Changed

| File | What changed |
| --- | --- |
| interceptors/session_interceptor.go | New interceptor: x-session-track guard, strip-on-consume, operation label, GlobalSessionActivityInterceptor, nil-producer warning |
| interceptors/session_interceptor_test.go | Unit tests covering all guard, strip, event, and error scenarios |
| orion/session_initializer.go | Wires Kafka producer to globals; non-blocking startup (goroutine + 10s timeout) |
| orion/session_initializer_test.go | Initializer tests including timeout/error non-fatal cases |
| orion/config.go | SessionTrackingConfig struct + viper config key constants |

Test Plan

  • x-session-track absent → handler runs, no event
  • x-session-track: true + x-session-context present → event published
  • x-session-track: true + x-session-context empty → skip
  • After handler: x-session-track and x-session-operation stripped from incoming MD
  • ForwardMetadataInterceptor simulation: stripped headers not in outgoing MD
  • x-session-operation set → Action = label; absent → Action = info.FullMethod
  • Handler error → Status = gRPC code string
  • Nil producer → no panic, single warn via sync.Once
  • GlobalSessionActivityInterceptor uses globals from SessionInitializer
  • Kafka init timeout is non-fatal (service starts, tracking disabled)

@rajkannan-carousell rajkannan-carousell requested a review from a team as a code owner March 4, 2026 15:27
@rajkannan-carousell rajkannan-carousell changed the title from "Optional Session Interceptor for sending session activity event to consumers" to "TRUST-5708: Session activity interceptor with strip-on-consume" Mar 6, 2026
@rajkannan-carousell rajkannan-carousell self-assigned this Mar 6, 2026
@rajkannan-carousell rajkannan-carousell changed the title from "TRUST-5708: Session activity interceptor with strip-on-consume" to "Session activity interceptor with strip-on-consume" Mar 6, 2026
@bhadreswar-ghuku

🔴 CRITICAL Findings

C1. .gitignore adding *.md blocks all new markdown files (.gitignore)

The diff adds both CLAUDE.md and *.md to .gitignore. The *.md glob makes git ignore ANY new markdown file across the entire repo. Existing tracked .md files are unaffected, but new READMEs, changelogs, docs/ content, and contribution guides can no longer be added unless forced with git add -f. This is almost certainly unintentional: the author meant to ignore only their local CLAUDE.md file (already handled by the line above).

Fix: Remove the *.md line from .gitignore. The specific CLAUDE.md entry already covers the intended file.

@bhadreswar-ghuku

🟠 HIGH Findings

H1. Goroutine leak when Kafka is unavailable (orion/session_initializer.go, PublishAsync method)

PublishAsync wraps a.producer.Produce() in a goroutine with a select/time.After timeout. When Produce blocks (Kafka down, sarama async producer input channel full), the timeout fires and PublishAsync returns an error — but the goroutine running Produce continues to block indefinitely. saramaProducer.Produce() writes to sp.p.Input() channel with no select on context cancellation; if the channel is full, the goroutine leaks.

Since the interceptor launches PublishAsync in its own goroutine on every tracked request (session_interceptor.go line ~121), a busy service under sustained Kafka backpressure could accumulate thousands of leaked goroutines.

Fix options:

  1. Pass a context with timeout to Produce and use select on ctx.Done() when writing to the sarama input channel.
  2. Use sarama's buffered Input() channel with bounded capacity and drop events when full, instead of blocking.
  3. At minimum, add a metrics counter for timeout events so operators can detect the leak.

H2. Excessive Info-level logging per tracked request (interceptors/session_interceptor.go and orion/session_initializer.go)

4 Info-level log lines fire per tracked request:

  1. session_interceptor.go ~line 93: Duration log for every tracked request
  2. session_interceptor.go ~line 119: Full event object logged at Info
  3. session_initializer.go ~line 177: Full JSON payload dumped at Info (string(payload))
  4. session_initializer.go ~line 186: Produce result logged at Info (even on success)

Since Orion is a shared framework imported by many services, these logs affect ALL adopting services. Under moderate traffic with session tracking enabled, this will flood log aggregation.

Fix: Move these to Debug level. Keep Error logs for failures only.


H3. Major dependency version bumps bundled with feature PR (go.mod)

gRPC bumped from v1.56.3 to v1.76.0 (20 minor versions), Go from 1.22 to 1.25.3, plus ~30 other transitive dependency bumps. Since Orion is a shared framework, these version bumps cascade to ALL consuming services when they update their Orion dependency. The gRPC jump may include breaking API changes, deprecation removals, or behavioral changes. Sarama (the new dependency for Kafka) does not require gRPC — this bump likely comes from running go mod tidy on a newer Go toolchain.

Recommendation: Consider splitting dependency bumps into a separate PR. Verify gRPC v1.76 compatibility with consuming services (api-gw, user-svc, auth-svc at minimum) before merging.

@bhadreswar-ghuku

🧪 Test Suggestions

🟠 HIGH T1. PublishAsync goroutine leak test (orion/session_initializer_test.go)

Test that when Produce blocks forever and the timeout fires, the adapter does not accumulate goroutines. Use runtime.NumGoroutine() before/after to detect leaks. This validates the goroutine leak scenario described in H1.

- Remove overly broad *.md gitignore entry that blocked all markdown files
- Move per-request session tracking logs from Info to Debug level
- Keep timeout log at Error level for operational visibility

Use context.WithTimeout in PublishAsync instead of goroutine + time.After.
Produce now uses select on ctx.Done() so the goroutine exits when the
context expires, preventing accumulation under Kafka backpressure.
@rajkannan-carousell rajkannan-carousell merged commit df3cff4 into master Mar 10, 2026
0 of 2 checks passed
@rajkannan-carousell rajkannan-carousell deleted the TRUST-5708 branch March 10, 2026 06:40