feat: add getSessionEvents tRPC endpoint to cloud-agent-next #415
Open
jeanduplessis wants to merge 1 commit into main
Add an HTTP-based query endpoint for retrieving stored execution events from the Durable Object's SQLite storage. This enables server-side consumers (e.g. security agent callback handlers) to fetch events without requiring a WebSocket connection.

Worker changes:
- Add queryEvents() public RPC method to CloudAgentSession DO
- Add getSessionEvents tRPC query with protectedProcedure auth
- Add GetSessionEventsInput/Output zod schemas with validation

Client changes:
- Add StoredEvent and GetSessionEventsInput types
- Add getSessionEvents() method to CloudAgentNextClient

Tests:
- 11 unit tests covering retrieval, filter forwarding, auth, error propagation, and user isolation
Code Review Summary
Status: No Issues Found | Recommendation: Merge
This PR adds a well-structured
Files Reviewed (5 files)
Summary

- Add `getSessionEvents` tRPC query endpoint to the cloud-agent-next worker, enabling server-side consumers to retrieve stored execution events via HTTP (without WebSocket)
- Add `getSessionEvents()` method and types to `CloudAgentNextClient` in the Next.js app
- … (`plans/security-agent-migrate-cloud-agent-next.md`)

Changes

Worker (`cloud-agent-next/`)

| File | Change |
| --- | --- |
| `src/persistence/CloudAgentSession.ts` | `queryEvents()` public RPC method delegating to `eventQueries.findByFilters()` |
| `src/router/schemas.ts` | `GetSessionEventsInput`, `StoredEventSchema`, `GetSessionEventsOutput` zod schemas |
| `src/router/handlers/session-management.ts` | `getSessionEvents` query handler with `protectedProcedure` auth and `withDORetry` |
| `src/router.test.ts` | … |

Client (`src/lib/cloud-agent-next/`)

| File | Change |
| --- | --- |
| `cloud-agent-client.ts` | `GetSessionEventsInput`, `StoredEvent` types; add `getSessionEvents` entry to TRPC client type; add `getSessionEvents()` method with Sentry capture |

Design

- `protectedProcedure` (same as `getSession`): requires a valid user token
- `userId:sessionId`: users can only query their own sessions
- `cloudAgentSessionId` (required), `eventTypes`, `executionId`, `fromId`, `limit` (default 500, max 1000)
- `executionId` → array: converts to `executionIds: [id]` for the DO's `findByFilters`, which is simpler for callers
- No `router.ts` change needed: the handler auto-registers via the `createSessionManagementHandlers()` spread

Verification

- `cloud-agent-next` typecheck: passes
- `cloud-agent-next` unit tests: 639/639 pass (11 new)
- … `customLlmRequest.ts` failures only)

Deployment

This is purely additive: a new query endpoint with no changes to existing behavior. Zero risk to current consumers. Should be deployed to the Cloudflare Worker before the main security agent migration PR (Phase 2+).
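The filter handling described in the Design list can be sketched in dependency-free TypeScript. This is an illustration only: the PR itself uses zod schemas, and the interface shapes, the numeric type of `fromId`, the helper names `sessionKey` and `toEventFilters`, and the choice to reject (rather than clamp) an out-of-range `limit` are all assumptions, not code taken from the PR.

```typescript
// Hypothetical shapes: field names follow the Design list, but the real zod
// schemas in src/router/schemas.ts may differ (e.g. in the type of fromId).
interface GetSessionEventsInput {
  cloudAgentSessionId: string;
  eventTypes?: string[];
  executionId?: string;
  fromId?: number;
  limit?: number;
}

// Assumed filter shape handed to the Durable Object's findByFilters.
interface EventFilters {
  eventTypes?: string[] | undefined;
  executionIds?: string[] | undefined;
  fromId?: number | undefined;
  limit: number;
}

const DEFAULT_LIMIT = 500;
const MAX_LIMIT = 1000;

// Combine user and session IDs so a caller can only address their own session.
function sessionKey(userId: string, cloudAgentSessionId: string): string {
  return `${userId}:${cloudAgentSessionId}`;
}

// Validate the input and translate it into the assumed filter shape.
function toEventFilters(input: GetSessionEventsInput): EventFilters {
  if (input.cloudAgentSessionId.length === 0) {
    throw new Error("cloudAgentSessionId is required");
  }
  const limit = input.limit ?? DEFAULT_LIMIT;
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new Error(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return {
    eventTypes: input.eventTypes,
    // A single executionId becomes a one-element executionIds array.
    executionIds: input.executionId ? [input.executionId] : undefined,
    fromId: input.fromId,
    limit,
  };
}
```

Keeping the single `executionId` on the public input and widening it to an array only at the DO boundary means callers never need to know that `findByFilters` accepts several execution IDs.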
Plan
# Security Agent: Migrate from cloud-agent to cloud-agent-next

Background

The security agent's Tier 2 sandbox analysis currently uses the old `cloud-agent` SSE-based streaming API (`client.initiateSessionStream()`). This needs to migrate to `cloud-agent-next`, which uses a two-step `prepareSession` + `initiateFromPreparedSession` pattern with callback-based completion notifications instead of SSE streaming.

Current integration (3 import points)

| File | Import | Notes |
| --- | --- | --- |
| `src/lib/security-agent/services/analysis-service.ts` | `createCloudAgentClient` | … |
| `src/lib/security-agent/services/analysis-service.ts` | `StreamEvent`, `SystemKilocodeEvent` | `processAnalysisStream` |
| `src/routers/security-agent-router.ts` | `getGitHubTokenForUser` | … |
| `src/routers/organizations/organization-security-agent-router.ts` | `getGitHubTokenForOrganization` | … |

Current flow (Tier 2 only)
Target flow
Phase 1: Add `getSessionEvents` tRPC endpoint to cloud-agent-next worker

The cloud-agent-next worker stores all execution events in DO SQLite via `eventQueries` (`cloud-agent-next/src/session/queries/events.ts`), but this is only accessible through the WebSocket `/stream` endpoint. A new tRPC query is needed for server-side consumers to retrieve events without WebSocket.

1.1 Add public RPC method to `CloudAgentSession` DO

File: `cloud-agent-next/src/persistence/CloudAgentSession.ts`

Add a new public method that wraps `this.eventQueries.findByFilters()`:

The `eventQueries` field is currently private (line ~86). The new public method delegates to it without exposing the field directly.

1.2 Add tRPC handler in session-management

File: `cloud-agent-next/src/router/handlers/session-management.ts`

Add a `getSessionEvents` query to `createSessionManagementHandlers()`:

- `cloudAgentSessionId` (required), `eventTypes` (optional string array), `executionId` (optional string), `limit` (optional number, default 500, max 1000)
- `protectedProcedure` (same as `getSession`)
- … `userId:sessionId`, calls `stub.queryEvents(filters)` via `withDORetry`
- … `StoredEvent` objects

1.3 Add input/output schemas

File: `cloud-agent-next/src/router/schemas.ts`

Add `GetSessionEventsInput` and `GetSessionEventsOutput` zod schemas.

1.4 Register in router

File: `cloud-agent-next/src/router.ts`

The handler is returned from `createSessionManagementHandlers()`, so it registers automatically; no change is needed to `router.ts`.

1.5 Add `getSessionEvents` method to `CloudAgentNextClient`

File: `src/lib/cloud-agent-next/cloud-agent-client.ts`

Add:

- `GetSessionEventsInput` type (mirrors the worker schema)
- `StoredEvent` type (matches `cloud-agent-next/src/websocket/types.ts:143-156`)
- `getSessionEvents` entry in `CloudAgentNextTRPCClient` type
- `getSessionEvents(input)` method on `CloudAgentNextClient` class

1.6 Write tests

File: `cloud-agent-next/src/router/handlers/session-management.test.ts` (or new test file)

Test the `getSessionEvents` endpoint with:

Phase 2: Create internal callback endpoint
2.1 Create the callback route
New file: `src/app/api/internal/security-analysis-callback/[findingId]/route.ts`

This endpoint receives the `ExecutionCallbackPayload` from cloud-agent-next when the sandbox analysis completes, fails, or is interrupted.

Pattern: follows existing internal API convention (see `src/app/api/internal/code-review-status/[reviewId]/route.ts`):

2.2 Implement `handleAnalysisCompleted`

When status is `'completed'`:

- `CloudAgentNextClient` using a service-level auth token (or the stored auth token)
- `client.getSessionEvents({ cloudAgentSessionId, eventTypes: ['kilocode'] })` to fetch kilocode events
- … `message.updated` or `message.part.updated` events with the final assistant text)
- `extractSandboxAnalysis()` (unchanged from current code)
- `updateAnalysisStatus(findingId, 'completed', { analysis })`
- … `isExploitable === false`

2.3 Implement `handleAnalysisFailed`

When status is `'failed'` or `'interrupted'`:

- `updateAnalysisStatus(findingId, 'failed', { error: payload.errorMessage })`

2.4 Store analysis context for callback retrieval

The callback endpoint needs context that was available during `startSecurityAnalysis` but isn't in the callback payload. Two approaches:

Option A (simpler): Store a JSON blob in the `security_findings` table when analysis starts: add/reuse a column (e.g., `analysis_context`) containing `{ model, userId, authToken, correlationId, organizationId, owner }`. The callback handler reads this.

Option B: Encode minimal context in the callback URL path or query params (e.g., `/api/internal/security-analysis-callback/[findingId]?model=...&userId=...`). Less clean but avoids schema changes.

Recommendation: Option A. The `security_findings` table already stores `analysis` JSON with similar metadata. We can store the needed context (model, userId, organizationId, correlationId) in the existing `analysis` JSON field when starting the analysis. The callback handler then reads it back from the finding.

For the auth token specifically: since the callback may arrive minutes later (after the original request has returned), the auth token from the original request may be expired. The callback handler should generate a fresh service-level API token using `generateApiToken()` for the user stored in the analysis context, or use a system-level token if Tier 3 extraction supports it.

Phase 3: Rewrite Tier 2 in analysis-service.ts
3.1 Replace client creation and session initiation
File: `src/lib/security-agent/services/analysis-service.ts`

Replace lines 709-747 (the entire Tier 2 section):
Old code (lines 724-747):
New code:
3.2 Store analysis context for callback
When updating the finding to `'pending'` with partial analysis (lines 714-721), include the context the callback handler will need:

The callback handler can retrieve `model`, `triggeredByUserId`, `correlationId` from `finding.analysis`; these are already stored. The `organizationId` and `owner` can be derived from the finding's ownership fields. This means no schema changes are needed: the existing analysis JSON has everything.

3.3 Handle `prepareSession` / `initiateFromPreparedSession` errors

Wrap both calls in try/catch:

- `prepareSession` failure → `updateAnalysisStatus(findingId, 'failed', { error })`
- `initiateFromPreparedSession` failure → same, plus clean up the prepared session via `client.deleteSession(cloudAgentSessionId)`
- `InsufficientCreditsError` → propagate up (same as current behavior)

Phase 4: Delete dead code
4.1 Remove `processAnalysisStream` function

File: `src/lib/security-agent/services/analysis-service.ts` (lines 352-557)

This ~200-line function is entirely replaced by the callback handler. Delete it.

4.2 Remove `fetchLastAssistantMessage` function

File: `src/lib/security-agent/services/analysis-service.ts` (lines 143-237)

This function fetches results from R2 blobs via the old `cli_sessions` table. It is no longer needed: results come from `getSessionEvents` via the callback handler. Delete it.

4.3 Remove helper types and functions

File: `src/lib/security-agent/services/analysis-service.ts`

- `RawCliMessage` type (lines 119-127): R2 blob message format, no longer needed
- `getCliMessageContent` function (lines 132-137): R2 blob helper, no longer needed
- `isSessionCreatedEvent` function (lines 108-110): old SSE event helper, no longer needed

4.4 Remove old imports

File: `src/lib/security-agent/services/analysis-service.ts`

Remove:

4.5 Move `finalizeAnalysis` to callback handler

The `finalizeAnalysis` function (lines 249-339) contains the Tier 3 extraction + storage + auto-dismiss logic. This logic needs to move into (or be called from) the callback handler's `handleAnalysisCompleted`. It can remain as an exported function in `analysis-service.ts` and be imported by the callback route, or be moved to a shared module.

Phase 5: Update frontend session links
5.1 Verify link format compatibility
The `/cloud/chat?sessionId=` URL already auto-routes between old and new UIs based on the session ID prefix:

- No `ses_` prefix → routes to old `CloudChatPageWrapper`
- `ses_` prefix → routes to new `CloudChatPageWrapperNext`

File: `src/app/(app)/cloud/chat/page.tsx` (line 14): `isNewSession(sessionId)` checks for the `ses_` prefix.

Since `prepareSession` returns a `kiloSessionId` with a `ses_` prefix, the existing link format already works; no URL structure change is needed.

5.2 Update security findings to use `kiloSessionId`

The `cli_session_id` field in `security_findings` currently stores the old cli session ID (UUID format). After migration, it will store the `kiloSessionId` from `prepareSession` (which has the `ses_` prefix). The frontend components that read `cli_session_id` to construct the link (`FindingDetailDialog.tsx:114,269-270,315-316` and `AnalysisJobsCard.tsx:302,307-308`) don't need changes; they just pass the ID as a query param.

Verify: no changes needed to:

- `src/components/security-agent/FindingDetailDialog.tsx`
- `src/components/security-agent/AnalysisJobsCard.tsx`

Phase 6: Extract result from cloud-agent-next events
The callback handler needs to extract the final analysis text from the events returned by `getSessionEvents`. The event format in cloud-agent-next differs from the old R2 blob format.

6.1 Understand the event structure

Events in cloud-agent-next have `stream_event_type` values like:

- `kilocode`: Kilocode CLI structured events (session lifecycle, message updates, etc.)
- `complete`: Execution completed
- `output`: stdout/stderr
- `error`: Error occurred

The `payload` field is a JSON string. For `kilocode` events, the payload contains nested event data following the OpenCode SDK message format (Message + Part[]).

6.2 Implement result extraction

Create a helper function (e.g., `extractResultFromEvents`) in the security agent service layer:

- `getSessionEvents({ cloudAgentSessionId, eventTypes: ['kilocode'], limit: 1000 })`
- … `payload` JSON

The exact payload parsing logic needs to be derived from how the `EventProcessor` (`src/lib/cloud-agent-next/processor/event-processor.ts`) assembles messages from events. The processor handles events like `message.updated`, `message.part.updated`, `session.status`, etc.

Alternatively, since the security agent only needs the final text result (not real-time streaming), a simpler approach may work:

- … `complete` event type
- … `kilocode` events before the `complete` event

This extraction logic should be developed alongside Phase 1 testing, using real event data to validate the parsing.
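As a starting point for that validation, a minimal sketch of `extractResultFromEvents` might look like the following. The payload shape is an assumption: it supposes each `kilocode` payload is a JSON object with a `type` of `message.updated` (carrying `properties.info.id` and `properties.info.role`) or `message.part.updated` (carrying `properties.part` with `id`, `messageID`, `type`, and `text`), and that events arrive in ascending order. The real nesting must be checked against actual execution data, and the `StoredEvent` fields shown are a reduced, hypothetical subset.

```typescript
// Reduced, hypothetical view of a stored event; the real StoredEvent type
// lives in the cloud-agent-next worker and has more fields.
interface StoredEvent {
  id: number;
  stream_event_type: string;
  payload: string; // JSON string
}

interface TextPart {
  partId: string;
  text: string;
}

// Parse a payload defensively; malformed or non-object JSON is skipped.
function parsePayload(raw: string): Record<string, any> | null {
  try {
    const value: unknown = JSON.parse(raw);
    return typeof value === "object" && value !== null
      ? (value as Record<string, any>)
      : null;
  } catch {
    return null;
  }
}

// Return the text of the last assistant message, or null if none is found.
function extractResultFromEvents(events: StoredEvent[]): string | null {
  let lastAssistantMessageId: string | null = null;
  // messageID -> text parts in first-seen order; later updates overwrite text.
  const partsByMessage: Record<string, TextPart[]> = {};

  for (const event of events) {
    if (event.stream_event_type !== "kilocode") continue;
    const payload = parsePayload(event.payload);
    if (!payload) continue;

    if (payload.type === "message.updated") {
      const info = payload.properties?.info;
      if (info?.role === "assistant" && typeof info.id === "string") {
        lastAssistantMessageId = info.id;
      }
    } else if (payload.type === "message.part.updated") {
      const part = payload.properties?.part;
      if (part?.type !== "text" || typeof part.text !== "string") continue;
      const messageId = String(part.messageID);
      const parts = partsByMessage[messageId] ?? (partsByMessage[messageId] = []);
      const existing = parts.filter((p) => p.partId === part.id)[0];
      if (existing) {
        existing.text = part.text; // streaming update replaces earlier text
      } else {
        parts.push({ partId: String(part.id), text: part.text });
      }
    }
  }

  if (lastAssistantMessageId === null) return null;
  const finalParts = partsByMessage[lastAssistantMessageId];
  if (!finalParts || finalParts.length === 0) return null;
  return finalParts.map((p) => p.text).join("");
}
```

Because only the latest text per part is kept, the helper does not need the delta handling that makes the `EventProcessor` complex. It does rely on each part update carrying the full text so far, which is exactly the kind of assumption that real event data should confirm or refute.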
Summary of files changed
| File | Change |
| --- | --- |
| `cloud-agent-next/src/persistence/CloudAgentSession.ts` | `queryEvents()` public RPC method |
| `cloud-agent-next/src/router/handlers/session-management.ts` | `getSessionEvents` tRPC query handler |
| `cloud-agent-next/src/router/schemas.ts` | `GetSessionEventsInput`/`Output` schemas |
| `src/lib/cloud-agent-next/cloud-agent-client.ts` | `StoredEvent` type, `getSessionEvents()` method, update `CloudAgentNextTRPCClient` |
| `src/app/api/internal/security-analysis-callback/[findingId]/route.ts` | … |
| `src/lib/security-agent/services/analysis-service.ts` | … |
| `src/components/security-agent/FindingDetailDialog.tsx` | … `ses_` prefixed session IDs |
| `src/components/security-agent/AnalysisJobsCard.tsx` | … |

Files NOT changed

| File | Reason |
| --- | --- |
| `src/routers/security-agent-router.ts` | `getGitHubTokenForUser` is a shared utility, not cloud-agent-specific |
| `src/routers/organizations/organization-security-agent-router.ts` | `getGitHubTokenForOrganization` stays |
| `src/lib/security-agent/services/triage-service.ts` | … |
| `src/lib/security-agent/services/extraction-service.ts` | … |
| `security_findings` | `session_id` and `cli_session_id` columns reused with new values |

Risks and considerations
Auth token lifetime: The callback may arrive minutes after the original request. The stored `authToken` may be expired. The callback handler should generate a fresh token using `generateApiToken()` for the user found in the analysis context (stored in `finding.analysis.triggeredByUserId`). This requires loading the user from the DB in the callback handler.

Event payload parsing: The exact format of `kilocode` events in cloud-agent-next needs to be validated against real execution data. The `EventProcessor` is complex (~460 lines) because it handles streaming deltas. The security agent only needs the final result, so a simpler parser should suffice, but it needs to be tested with actual events.

Reliability improvement: The callback pattern with Cloudflare Queue (5 retries, exponential backoff) is more reliable than the current SSE stream, which can silently fail if the Next.js process loses the connection.
cloud-agent-next worker deployment: Phase 1 requires deploying changes to the cloud-agent-next Cloudflare Worker. This should be deployed and verified before the Next.js changes go live.
Feature flag / gradual rollout: Consider gating the migration behind a feature flag or rolling it out per-organization. This allows fallback to the old cloud-agent path if issues arise. The triage-only path (Tier 1) is unaffected and continues working regardless.
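A per-organization gate could be as small as the sketch below. The `RolloutConfig` shape and the `selectAnalysisPath` helper are hypothetical and not part of the plan; how the flag is stored and evaluated (environment variable, flag service, database) is left open.

```typescript
type AnalysisPath = "cloud-agent" | "cloud-agent-next";

// Hypothetical rollout configuration; the real flag source is not specified.
interface RolloutConfig {
  enabled: boolean; // global switch: false forces the old path for everyone
  allowedOrganizationIds: string[]; // organizations opted in to the new path
}

// Choose which Tier 2 path a sandbox analysis should take.
function selectAnalysisPath(
  config: RolloutConfig,
  organizationId: string | null,
): AnalysisPath {
  if (!config.enabled) return "cloud-agent";
  const optedIn =
    organizationId !== null &&
    config.allowedOrganizationIds.indexOf(organizationId) !== -1;
  return optedIn ? "cloud-agent-next" : "cloud-agent";
}
```

In this sketch, analyses without an organization always stay on the old path; whether personal accounts should be included in the rollout is a separate decision.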
Backwards compatibility of session links: Old findings analyzed before the migration will still have old-format session IDs in `cli_session_id`. The `/cloud/chat` page already handles both formats (line 14 of `page.tsx` checks `isNewSession`), so old links continue working.

PR and Deployment Strategy
This migration spans two independently deployed systems (the cloud-agent-next Cloudflare Worker and the Next.js app), so deployment order matters.
PR 1: Add `getSessionEvents` to cloud-agent-next worker

Scope: Phases 1.1–1.4, 1.6 (DO method, tRPC handler, schemas, tests)

Files changed:

- `cloud-agent-next/src/persistence/CloudAgentSession.ts`
- `cloud-agent-next/src/router/handlers/session-management.ts`
- `cloud-agent-next/src/router/schemas.ts`

Deploy: Goes out first to the Cloudflare Worker. Purely additive: a new query endpoint with no changes to existing behavior. Zero risk to current consumers.
Verification: Call the new endpoint against an existing session to confirm it returns events correctly.
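For a manual check, the request URL can be built by hand. The sketch below relies on tRPC's standard HTTP convention for a non-batched query (a GET with the JSON-encoded input in the `input` query parameter) and assumes no data transformer is configured on the worker. The base URL in the usage note is a placeholder, and the way the user token is sent (header name and format) is not covered here because the PR text does not specify it.

```typescript
// Build the GET URL for a non-batched tRPC query. Assumes the router is
// mounted at baseUrl and that no transformer (e.g. superjson) is in use.
function buildTrpcQueryUrl(
  baseUrl: string,
  procedure: string,
  input: Record<string, unknown>,
): string {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const encoded = encodeURIComponent(JSON.stringify(input));
  return `${trimmed}/${procedure}?input=${encoded}`;
}
```

For example, `buildTrpcQueryUrl("https://worker.example/trpc", "getSessionEvents", { cloudAgentSessionId: "s1", limit: 10 })` yields a URL that can be requested with curl once the appropriate auth header is attached.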
PR 2: Add `getSessionEvents` to `CloudAgentNextClient`

Scope: Phase 1.5 (client-side types and method)

Files changed:

- `src/lib/cloud-agent-next/cloud-agent-client.ts`

Deploy: Merges to the Next.js app. Also purely additive: adds a method nobody calls yet. Can be deployed independently once PR 1's worker deploy is live.
Note: PR 2 could be bundled into PR 3, but keeping it separate makes reviews smaller and lets you verify the client method works in isolation.
PR 3: Security agent migration (main PR)
Scope: Phases 2, 3, 4, 5, 6
Files changed:
- `src/app/api/internal/security-analysis-callback/[findingId]/route.ts` (new)
- `src/lib/security-agent/services/analysis-service.ts` (major rewrite)

Prerequisite: PR 1 deployed to the worker, PR 2 merged.

This is the critical PR. It switches the security agent from old cloud-agent to cloud-agent-next. Everything in this PR is behind the existing Tier 2 code path (only runs when `forceSandbox || triage.needsSandboxAnalysis`), so Tier 1 triage-only analyses are completely unaffected.

Deploy: Standard Next.js deploy. Once live, all new Tier 2 analyses use cloud-agent-next.
Verification:
- … `forceSandbox: true` or a finding that triage routes to Tier 2)

Deployment order

PR 1 must be deployed before PR 3 goes live. PR 2 must be merged before PR 3. PR 1 and PR 2 have no ordering dependency on each other (they change different codebases), but PR 1's deploy must be live before anyone calls `getSessionEvents`.

Rollback plan
- … `failed` (callback arrives but the handler is gone); these can be re-analyzed.
- … `getSessionEvents`). But PR 1 is additive-only, so there's no reason to revert it.

Feature flag consideration

The migration could be gated behind a feature flag (e.g., `security-agent-cloud-agent-next`) in `startSecurityAnalysis`. The flag would control which client to use: the old `createCloudAgentClient` + SSE stream path vs the new `createCloudAgentNextClient` + callback path. This adds code complexity (both paths coexist temporarily) but allows per-org or percentage-based rollout. Given that the security agent is relatively low-traffic (runs per-finding, not per-request), a flag may be overkill, but it's available if the team prefers it.

Cleanup PR (optional, after confidence)
Once the migration is verified in production:
- The `getGitHubTokenForUser` / `getGitHubTokenForOrganization` helpers should be moved out of the `cloud-agent/` directory into a shared location (they're used by both old and new consumers)