Conversation
Deploying hapi with Cloudflare Pages

- Latest commit: b210308
- Status: ✅ Deploy successful!
- Preview URL: https://d17c5faf.hapi-bqd.pages.dev
- Branch Preview URL: https://openclaw.hapi-bqd.pages.dev
Findings
- [Major] Rejected OpenClaw sends are persisted as delivered user messages — the hub inserts and broadcasts the user message before the plugin accepts it, so any official-mode 409/transport failure leaves a visible message that never reached OpenClaw. Evidence: `hub/src/openclaw/OpenClawChatService.ts:197`; context: `openclaw-plugin/src/routes.ts:100`.

Suggested fix:

```ts
const storedUserMessage = this.store.openclawMessages.addMessage({
  conversationId: input.conversationId,
  namespace: input.namespace,
  role: 'user',
  text: input.text,
  status: 'failed'
})
try {
  const ack = await this.client.sendMessage({ ... })
  this.store.openclawMessages.updateStatus(input.namespace, storedUserMessage.id, 'completed')
  this.publisher.message(input.namespace, input.conversationId, toMessage({ ...storedUserMessage, status: 'completed' }))
} catch (error) {
  this.publisher.message(input.namespace, input.conversationId, toMessage(storedUserMessage))
  throw error
}
```
- [Major] Upstream OpenClaw transport errors are flattened into generic 500 responses — the plugin already returns a retryable 409 for busy conversations, but `requestJson()` throws a plain `Error` and `toErrorResponse()` converts it to 500, so the web client loses the retry signal and sees an internal server error instead. Evidence: `hub/src/openclaw/client.ts:187`, `hub/src/web/routes/openclaw.ts:26`.

Suggested fix:

```ts
class OpenClawUpstreamError extends Error {
  constructor(readonly status: number, readonly body: string) {
    super(body || `OpenClaw upstream HTTP ${status}`)
  }
}

if (!response.ok) {
  throw new OpenClawUpstreamError(response.status, bodyText)
}
```
- [Major] Approval failures are acknowledged and then silently dropped — both approval microtasks swallow all exceptions, so a callback/signature/network failure leaves the request pending with no retry and no log line for operators. Evidence: `openclaw-plugin/src/routes.ts:172`, `openclaw-plugin/src/routes.ts:218`.

Suggested fix:

```ts
}).catch((error) => {
  deps.idempotencyCache.delete(idempotencyKey)
  deps.logger.error(
    `[${deps.namespace}] hapi-openclaw approve task failed requestId=${requestId}: ` +
      (error instanceof Error ? error.message : String(error))
  )
})
```
Summary
- Review mode: initial. Three major issues found: official-mode send failures can create phantom user messages, retryable upstream errors are exposed as generic 500s, and approval failures are silently dropped after the plugin already acked them.
Testing
- Not run (automation).
`bun` is unavailable in this runner, so I could not execute the added Bun/Vitest suites locally.
HAPI Bot
Findings
- [Major] Approval resolutions are still dropped when the plugin cannot deliver the post-ack callback. Evidence: `openclaw-plugin/src/routes.ts:115`; the hub only clears pending approvals after ingesting an `approval-resolved` callback (`hub/src/openclaw/OpenClawChatService.ts:462`).

Suggested fix:

```ts
for (let attempt = 0; attempt < 3; attempt += 1) {
  try {
    await dispatchMaybeEvents(deps.callbackClient, events)
    return
  } catch (error) {
    if (attempt === 2) {
      deps.logger.error(
        `[${deps.namespace}] hapi-openclaw ${input.kind} callback failed ` +
          `conversation=${input.conversationId} requestId=${input.requestId}: ${formatError(error)}`
      )
      break
    }
    await new Promise((resolve) => setTimeout(resolve, 1000 * (attempt + 1)))
  }
}
```
- [Minor] The plugin idempotency cache never evicts successful command acks, so memory grows for every message and approval handled by a long-lived gateway. Evidence: `openclaw-plugin/src/routes.ts:159`, repeated at `openclaw-plugin/src/routes.ts:216` and `openclaw-plugin/src/routes.ts:259`.

Suggested fix:

```ts
const IDEMPOTENCY_TTL_MS = 5 * 60_000

function rememberAck(idempotencyKey: string, ack: PluginCommandAck) {
  deps.idempotencyCache.set(idempotencyKey, ack)
  setTimeout(() => deps.idempotencyCache.delete(idempotencyKey), IDEMPOTENCY_TTL_MS)
}
```
Summary
- Review mode: follow-up after new commits. Two issues remain: approval state can still diverge from OpenClaw after a successful approve/deny callback delivery failure, and the in-memory idempotency cache is unbounded.
Testing
- Not run (static review only; executing PR code is disallowed in this workflow).
HAPI Bot
Findings
- [Major] Repeated assistant transcript updates are dropped after the first one, so HAPI can freeze on a partial reply and leave the running state too early. Evidence: `openclaw-plugin/src/transcriptBridge.ts:23`, `openclaw-plugin/src/transcriptBridge.ts:29`; related context: `hub/src/store/openclawMessages.ts:108`.

Suggested fix:

```ts
const event = normalizeAssistantTranscriptEvent(update)
if (!event) {
  return
}
await callbackClient.postEvent(event)
// Let RealOpenClawAdapter.sendMessage() emit the terminal thinking=false state.
```
Summary
- Review mode: follow-up after new commits. One issue remains: the transcript bridge dedupes by `externalMessageId`, but the hub explicitly supports multiple `replace`/`append` updates for the same message id. Streaming or snapshot-style assistant updates will stop after the first bridged event, and the same path can clear `thinking` before the embedded run actually finishes.
- Residual risk: Not found in repo/tests is a service-level test that sends multiple transcript updates with the same `messageId` and asserts the final text reaches HAPI.
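That residual-risk gap can be pinned down without the real service. The sketch below uses hypothetical shapes (not the repo's actual types from `hub/src/store/openclawMessages.ts`) to show why applying every `replace`/`append` update matters: deduping on the first sighting of a message id freezes a streaming reply at its first partial snapshot.

```ts
// Hypothetical update shape; the hub's real transcript types may differ.
type TranscriptUpdate = {
  messageId: string
  mode: 'replace' | 'append'
  text: string
}

// Applies every update for a message id, mirroring the documented
// replace/append semantics instead of deduping on first sight.
function reduceTranscript(updates: TranscriptUpdate[]): Map<string, string> {
  const byId = new Map<string, string>()
  for (const update of updates) {
    const prev = byId.get(update.messageId) ?? ''
    byId.set(update.messageId, update.mode === 'append' ? prev + update.text : update.text)
  }
  return byId
}

// A streaming reply arrives as three updates for the same message id.
const final = reduceTranscript([
  { messageId: 'm1', mode: 'replace', text: 'Hel' },
  { messageId: 'm1', mode: 'replace', text: 'Hello, wor' },
  { messageId: 'm1', mode: 'append', text: 'ld!' }
])
// Dedupe-by-id would have frozen the text at 'Hel'.
console.log(final.get('m1')) // 'Hello, world!'
```

A service-level test would drive the same three updates through the bridge and assert the final text, but the reducer above is the property being asserted.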
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
```ts
  return
}

if (!adapterState.rememberTranscriptMessage(event.externalMessageId)) {
```
[MAJOR] This dedupes by externalMessageId, so only the first transcript update for an assistant reply is forwarded. The hub intentionally supports repeated replace/append updates on the same logical message (hub/src/store/openclawMessages.ts:108), so a streaming reply will freeze on the first partial snapshot. Because this same path calls finishRun() immediately afterward, the UI can also leave the thinking state before runEmbeddedPiAgent() finishes.
Suggested fix:

```ts
const event = normalizeAssistantTranscriptEvent(update)
if (!event) {
  return
}
await callbackClient.postEvent(event)
// Let RealOpenClawAdapter.sendMessage() emit the terminal thinking=false state.
```
Findings
- [Major] Existing OpenClaw conversations are never rebound to the plugin's new deterministic session key, so upgraded users can keep sending through a stale `external_id` and never receive assistant replies. `getOrCreateDefaultConversation()` only calls `ensureDefaultConversation()` for first-time rows, but `sendMessage()` still forwards `conversation.externalId`, while the real plugin now emits/parses only HAPI session keys. Evidence: `hub/src/openclaw/OpenClawChatService.ts:125`, `hub/src/openclaw/OpenClawChatService.ts:217`, `openclaw-plugin/src/openclawAdapter.ts:138`, `openclaw-plugin/src/transcriptEvents.ts:67`.

Suggested fix:

```ts
const ensured = await this.client.ensureDefaultConversation({ externalUserKey: input.userKey })
if (!conversation) {
  conversation = this.store.openclawConversations.getOrCreateConversation(input.namespace, input.userKey, {
    externalId: ensured.conversationId,
    title: ensured.title ?? 'OpenClaw',
    status: 'ready'
  })
} else if (conversation.externalId !== ensured.conversationId) {
  conversation = this.store.openclawConversations.rebindConversation(
    conversation.id,
    input.namespace,
    ensured.conversationId,
    ensured.title ?? conversation.title
  )!
}
```
Summary
Review mode: follow-up after new commits.
One major issue remains: the hub still never reconciles an existing conversation row with the plugin's deterministic conversation id, so upgraded installs can retain a legacy external_id and silently lose assistant replies.
Residual risk: Not found in repo/tests is a migration/self-heal case that starts with a legacy external_id, calls getOrCreateDefaultConversation(), then verifies sendMessage() plus transcript callbacks land in the same thread.
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
```ts
  namespace: string
  userKey: string
}): Promise<OpenClawConversationSummary> {
  let conversation = this.store.openclawConversations.getConversationByUserKey(input.namespace, input.userKey)
```
[MAJOR] This only calls ensureDefaultConversation() for first-time rows. If a user already has a stored openclaw_conversations.external_id from the earlier non-deterministic builds, sendMessage() keeps forwarding that stale id, but the real plugin/transcript bridge now expect HAPI session keys (agent:...). The user message is accepted, then the assistant transcript is dropped because parseHapiSessionKey() returns null.
Suggested fix:

```ts
const ensured = await this.client.ensureDefaultConversation({ externalUserKey: input.userKey })
if (!conversation) {
  conversation = this.store.openclawConversations.getOrCreateConversation(input.namespace, input.userKey, {
    externalId: ensured.conversationId,
    title: ensured.title ?? 'OpenClaw',
    status: 'ready'
  })
} else if (conversation.externalId !== ensured.conversationId) {
  conversation = this.store.openclawConversations.rebindConversation(
    conversation.id,
    input.namespace,
    ensured.conversationId,
    ensured.title ?? conversation.title
  )!
}
```
Findings
- [Major] In-flight duplicate callbacks can replay append deltas — `openclawIngress` only short-circuits when an existing receipt already has `processed_at`, so the same signed event can still fall through to `ingestInboundEvent()` twice if OpenClaw retries before the first request finishes. For `message` events with `content.mode === 'append'`, that replays the same delta and duplicates assistant text. Evidence: `hub/src/web/routes/openclawIngress.ts:54`, `hub/src/store/openclawReceipts.ts:48`, `hub/src/store/openclawMessages.ts:136`.

Suggested fix:

```ts
const claim = store.openclawReceipts.claim({
  namespace: event.namespace,
  eventId: event.eventId,
  upstreamConversationId: event.conversationId,
  eventType: event.type
})
if (!claim.acquired) {
  return c.json({ ok: true, duplicate: true })
}
try {
  await service.ingestInboundEvent(event)
  store.openclawReceipts.markProcessed(event.namespace, event.eventId)
  return c.json({ ok: true })
} catch (error) {
  store.openclawReceipts.release(event.namespace, event.eventId)
  throw error
}
```
Summary
Review mode: follow-up after new commits.
One major issue remains: concurrent retries of the same inbound OpenClaw event are still processed twice before the receipt is marked processed, which can duplicate assistant transcript text.
Residual risk: Not found in repo/tests is a concurrent duplicate callback case that posts the same signed message event twice before the first request finishes and asserts the delta is applied once.
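To illustrate why the claim shape matters, here is a self-contained sketch of the claim/release pattern. All names (`ReceiptStore`, `claim`, `release`) are illustrative stand-ins for the repo's real store API; the point is that an insert-if-absent claim is exclusive, whereas a read-then-check of `processed_at` is not.

```ts
// Illustrative stand-in for the receipts store; claim() is insert-if-absent,
// so only the first delivery of an eventId acquires it.
type Receipt = { processedAt: number | null }

class ReceiptStore {
  private receipts = new Map<string, Receipt>()

  claim(eventId: string): { acquired: boolean } {
    if (this.receipts.has(eventId)) return { acquired: false }
    this.receipts.set(eventId, { processedAt: null })
    return { acquired: true }
  }

  markProcessed(eventId: string): void {
    const row = this.receipts.get(eventId)
    if (row) row.processedAt = Date.now()
  }

  // A failed ingest releases the claim so the upstream retry can succeed later.
  release(eventId: string): void {
    this.receipts.delete(eventId)
  }
}

let appliedDeltas = 0

function ingest(store: ReceiptStore, eventId: string): boolean {
  if (!store.claim(eventId).acquired) return false // duplicate short-circuits here
  appliedDeltas += 1 // stand-in for ingestInboundEvent() applying an append delta
  store.markProcessed(eventId)
  return true
}

// Two deliveries of the same signed event, e.g. an upstream retry:
const store = new ReceiptStore()
const first = ingest(store, 'evt-1')
const second = ingest(store, 'evt-1')
console.log(first, second, appliedDeltas) // true false 1
```

In the real route the claim would be a database insert with a uniqueness constraint on `(namespace, eventId)`, which gives the same exclusivity across concurrent requests.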
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
Findings
- [Major] Assistant transcript callbacks are acknowledged before delivery but still dropped after a single transient failure, so accepted OpenClaw replies can disappear permanently. Evidence: `openclaw-plugin/src/routes.ts:193`, `openclaw-plugin/src/transcriptBridge.ts:34`.
- [Major] Streaming OpenClaw SSE events invalidate three active queries immediately on every chunk, creating a refetch storm in the main chat path. Evidence: `web/src/hooks/useSSE.ts:358`, `openclaw-plugin/src/transcriptBridge.test.ts:18`.
Summary
Review mode: follow-up after new commits.
Two major issues remain: one reliability bug that can lose assistant replies after transient callback failures, and one client-side performance regression during streaming.
Residual risk: Not found in repo/tests are failure-retry coverage for transcript callback delivery or batching coverage for repeated OpenClaw SSE updates.
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
```ts
const runtime = runtimeStore.getRuntime()

stopListening = runtime.events.onSessionTranscriptUpdate((update) => {
  void handleTranscriptUpdate(callbackClient, update).catch((error) => {
```
[MAJOR] A single transient callback failure permanently drops this assistant update. send-message is already ACKed before the async work runs, and this handler only logs on postEvent() failure, so HAPI has no retry path to recover the missing reply/state later.
Suggested fix:

```ts
async function postEventWithRetry(callbackClient: HapiCallbackClient, event: HapiCallbackEvent) {
  for (let attempt = 0; attempt < 3; attempt += 1) {
    try {
      await callbackClient.postEvent(event)
      return
    } catch (error) {
      if (attempt === 2) throw error
      await delay(1000 * (attempt + 1))
    }
  }
}

stopListening = runtime.events.onSessionTranscriptUpdate((update) => {
  void handleTranscriptUpdate(callbackClient, update, postEventWithRetry).catch((error) => {
    const message = error instanceof Error ? error.message : String(error)
    ctx.logger.error(`Failed to bridge transcript update: ${message}`)
  })
})
```
web/src/hooks/useSSE.ts (outdated)
```ts
  scheduleInvalidationFlush()
}

const queueOpenClawInvalidation = (conversationId: string) => {
```
[MAJOR] This turns every OpenClaw SSE event into three immediate refetches. The transcript bridge emits repeated updates for the same assistant message while it grows, so a long reply will keep hammering /messages, /state, and /conversation instead of batching once per frame/tick.
Suggested fix:

```ts
const pendingOpenClawConversationIds = new Set<string>()

const queueOpenClawInvalidation = (conversationId: string) => {
  pendingOpenClawConversationIds.add(conversationId)
  scheduleInvalidationFlush()
}

for (const conversationId of pendingOpenClawConversationIds) {
  tasks.push(queryClient.invalidateQueries({ queryKey: queryKeys.openclawMessages(conversationId) }))
  tasks.push(queryClient.invalidateQueries({ queryKey: queryKeys.openclawState(conversationId) }))
}
pendingOpenClawConversationIds.clear()
```
Findings
- [Major] `openclaw-state` can regress to stale data after an SSE state/approval update. `useSSE` now mutates the `openclawState` cache in place, but `useOpenClawState` still renders raw query results. If `/api/openclaw/state` started before the SSE event, its older response can land afterwards and overwrite the newer pending approvals or `thinking` flag until the user manually refreshes. Evidence: `web/src/hooks/useSSE.ts:372`, `web/src/hooks/useSSE.ts:379`, `web/src/hooks/useSSE.ts:386`, `web/src/hooks/queries/useOpenClawState.ts:15`, `hub/src/openclaw/OpenClawChatService.ts:469`.

Suggested fix:

```ts
if (event.type === 'openclaw-state') {
  applyOpenClawStateUpdate(event)
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawState(event.conversationId) })
}
if (event.type === 'openclaw-approval-request' || event.type === 'openclaw-approval-resolved') {
  // keep the immediate UI patch, but revalidate once so stale in-flight GETs cannot win
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawState(event.conversationId) })
}
```
- [Major] Accepted `send-message` work still depends on single-shot `state` callbacks. The route returns `accepted: true` before the background run starts, but `RealOpenClawAdapter.sendMessage()` posts `thinking=true/false` events with no retry. A transient callback failure at the first post aborts the run before `runEmbeddedPiAgent(...)`; a failure at the final post can leave HAPI stuck in the wrong state with no recovery path. Evidence: `openclaw-plugin/src/routes.ts:193`, `openclaw-plugin/src/openclawAdapter.ts:159`, `openclaw-plugin/src/openclawAdapter.ts:201`, `openclaw-plugin/src/openclawAdapter.ts:221`.

Suggested fix:

```ts
const postStateWithRetry = async (thinking: boolean, lastError: string | null) => {
  for (let attempt = 0; attempt < 3; attempt += 1) {
    try {
      await this.callbackClient.postEvent(createStateEvent({
        namespace,
        conversationId: action.conversationId,
        thinking,
        lastError
      }))
      return
    } catch (error) {
      if (attempt === 2) throw error
      await delay(1000 * (attempt + 1))
    }
  }
}
```
Summary
Review mode: follow-up after new commits.
Two major reliability issues remain: OpenClaw state/approval SSE updates can still be overwritten by older in-flight fetches, and accepted send-message runs still rely on one-shot state callbacks. Not found in repo/tests is coverage for SSE-vs-fetch ordering on openclaw-state or transient callback failures in RealOpenClawAdapter.sendMessage().
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
web/src/hooks/useSSE.ts (outdated)
```ts
  )
}

const applyOpenClawStateUpdate = (event: Extract<SyncEvent, { type: 'openclaw-state' }>) => {
```
[MAJOR] openclaw-state is now vulnerable to stale in-flight GETs winning after a newer SSE update. These handlers patch the cache, but they never revalidate it; useOpenClawState() still renders raw query data, so an older /api/openclaw/state response can arrive later and erase the newer approval/thinking state.
Suggested fix:

```ts
if (event.type === 'openclaw-state') {
  applyOpenClawStateUpdate(event)
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawState(event.conversationId) })
}
if (event.type === 'openclaw-approval-request' || event.type === 'openclaw-approval-resolved') {
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawState(event.conversationId) })
}
```

```ts
this.logger.info(`[${namespace}] hapi-openclaw send-message start conversation=${action.conversationId}`)

try {
  await this.callbackClient.postEvent(createStateEvent({
```
[MAJOR] send-message is acknowledged before the background run starts, but these state callbacks are still single-shot. If the initial thinking=true post fails transiently, the embedded run never starts; if the terminal thinking=false post fails, HAPI can stay stuck in the wrong state. The transcript retry queue does not cover these direct postEvent(...) calls.
Suggested fix:

```ts
const postStateWithRetry = async (thinking: boolean, lastError: string | null) => {
  for (let attempt = 0; attempt < 3; attempt += 1) {
    try {
      await this.callbackClient.postEvent(createStateEvent({
        namespace,
        conversationId: action.conversationId,
        thinking,
        lastError
      }))
      return
    } catch (error) {
      if (attempt === 2) throw error
      await delay(1000 * (attempt + 1))
    }
  }
}
```
Findings
- [Major] `openclaw-message` can still regress to stale content after a newer SSE update. `useSSE` now patches the OpenClaw message query in place, but it never revalidates it. If `/api/openclaw/messages` started before a streaming update, that older response can land afterwards and `mergeLatestOpenClawMessages()` will replace the newer text/status for the same message id. Evidence: `web/src/hooks/useSSE.ts:365`, `web/src/hooks/useSSE.ts:584`, `web/src/hooks/queries/useOpenClawMessages.ts:48`, `web/src/hooks/queries/useOpenClawMessages.ts:184`.

Suggested fix:

```ts
if (event.type === 'openclaw-message') {
  applyOpenClawMessageUpdate(event)
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawMessages(event.conversationId) })
}
```
Summary
Review mode: follow-up after new commits.
One major reliability issue remains: openclaw-message SSE patches can still be overwritten by older in-flight /api/openclaw/messages responses. Not found in repo/tests is coverage for SSE-vs-fetch ordering on streaming message updates.
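One way to make that ordering testable without a browser is to model the guard directly: if every server-emitted snapshot carried a monotonically increasing sequence number, the cache could refuse to go backwards regardless of arrival order. This is a sketch under assumed shapes (`seq` and `StateCache` are hypothetical, not the current wire format), shown as an alternative to pure revalidation:

```ts
// Hypothetical snapshot shape: the server stamps every state it emits,
// whether via SSE or a GET response, with a monotonically increasing seq.
type StateSnapshot = { seq: number; thinking: boolean }

class StateCache {
  private current: StateSnapshot | null = null

  // Accepts a snapshot only if it is newer than what is already cached,
  // so a slow in-flight GET cannot overwrite a fresher SSE patch.
  apply(snapshot: StateSnapshot): boolean {
    if (this.current && snapshot.seq <= this.current.seq) return false
    this.current = snapshot
    return true
  }

  get(): StateSnapshot | null {
    return this.current
  }
}

const cache = new StateCache()
cache.apply({ seq: 1, thinking: false })               // initial GET
cache.apply({ seq: 3, thinking: true })                // newer SSE patch
const stale = cache.apply({ seq: 2, thinking: false }) // late GET that started earlier
console.log(stale, cache.get()?.thinking) // false true
```

The invalidate-after-patch fix above achieves the same end by forcing a fresh fetch; a sequence guard would additionally make the SSE-vs-fetch race deterministic to assert in a unit test.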
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
web/src/hooks/useSSE.ts (outdated)
```ts
}

if (event.type === 'openclaw-message') {
  applyOpenClawMessageUpdate(event)
```
[MAJOR] openclaw-message is still vulnerable to stale in-flight GETs winning after a newer SSE update. This handler patches the query cache, but it never revalidates it; when an older /api/openclaw/messages response lands afterwards, mergeLatestOpenClawMessages() can replace the fresher text/status for the same message id.
Suggested fix:

```ts
if (event.type === 'openclaw-message') {
  applyOpenClawMessageUpdate(event)
  void queryClient.invalidateQueries({ queryKey: queryKeys.openclawMessages(event.conversationId) })
}
```
Findings
- [Major] Same-conversation sends can both return `accepted` before the plugin actually reserves the run. The busy check happens before `queueMicrotask(...)`, so two requests that arrive in the same event-loop turn can both receive 200 ACKs. When the microtasks run, only one call can acquire the real run lock; the other is dropped as `ConversationBusyError` after the hub has already marked its user message and command as successful. Evidence: `openclaw-plugin/src/routes.ts:186`, `openclaw-plugin/src/routes.ts:202`, `hub/src/openclaw/OpenClawChatService.ts:241`, `hub/src/openclaw/OpenClawChatService.ts:276`.

Suggested fix:

```ts
// Reserve the conversation before returning the ACK so a second request
// cannot slip in before the worker starts.
if (!adapterState.startRun(body.conversationId)) {
  return c.json({ error: 'Conversation already has an active OpenClaw run', retryAfterMs: 1000 }, 409)
}
queueMicrotask(() => {
  void deps.runtime.sendMessageReserved({
    kind: 'send-message',
    conversationId: body.conversationId!,
    text: body.text!,
    localMessageId: body.localMessageId!
  }).finally(() => {
    adapterState.finishRun(body.conversationId!)
  })
})
```
Summary
Review mode: follow-up after new commits.
One major correctness issue remains: concurrent POST /hapi/channel/messages requests for the same conversation can both be ACKed even though only one run can actually start. Not found in tests is coverage for two same-conversation sends that arrive before the first queued worker acquires the run lock.
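The race is easy to reproduce against a toy model of the suggested reservation: when `startRun()` is taken synchronously before the ACK, the second same-tick request observes the reservation and gets the 409. `AdapterState` and `handleSend` below are hypothetical stand-ins for the plugin's run tracking and route handler:

```ts
// Minimal model of per-conversation run reservation; a Set entry means an
// active (or queued) run holds the conversation.
class AdapterState {
  private active = new Set<string>()

  startRun(conversationId: string): boolean {
    if (this.active.has(conversationId)) return false
    this.active.add(conversationId)
    return true
  }

  finishRun(conversationId: string): void {
    this.active.delete(conversationId)
  }
}

// Simulated request handler: reserve synchronously, then queue the worker.
function handleSend(state: AdapterState, conversationId: string): number {
  if (!state.startRun(conversationId)) return 409
  queueMicrotask(() => {
    // ...run the message, then release the reservation
    state.finishRun(conversationId)
  })
  return 200
}

// Two sends for the same conversation in the same event-loop turn:
const state = new AdapterState()
const statuses = [handleSend(state, 'c1'), handleSend(state, 'c1')]
console.log(statuses) // [200, 409]
```

With the current check-then-queue ordering, both calls would return 200 because neither reservation exists until the microtasks run.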
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
```ts
return c.json({ error: 'Invalid body' }, 400)
}

if (deps.runtime.isConversationBusy?.(body.conversationId)) {
```
[MAJOR] The busy check is still racy here because the actual run reservation does not happen until the later queueMicrotask(...) callback. Two same-conversation requests that arrive back-to-back can both pass this check and both return 200 ACKs; the loser is only rejected later inside runtime.sendMessage(), after the hub has already marked that user message/command as accepted.
Suggested fix:

```ts
if (!adapterState.startRun(body.conversationId)) {
  return c.json({ error: 'Conversation already has an active OpenClaw run', retryAfterMs: 1000 }, 409)
}
queueMicrotask(() => {
  void deps.runtime.sendMessageReserved({
    kind: 'send-message',
    conversationId: body.conversationId!,
    text: body.text!,
    localMessageId: body.localMessageId!
  }).finally(() => {
    adapterState.finishRun(body.conversationId!)
  })
})
```
Findings
- [Major] Default-conversation bootstrap errors on the new `/` page currently bypass the shared OpenClaw error mapper, so any upstream timeout/plugin outage becomes a raw 500 instead of a structured 502/503 fallback. See inline comment on `hub/src/web/routes/openclaw.ts:60`.
- [Major] Rebinding an existing OpenClaw conversation row to a new upstream `conversationId` keeps all prior local messages attached to that row. HAPI then shows legacy history while sending future prompts to a different upstream thread. See inline comment on `hub/src/openclaw/OpenClawChatService.ts:109`.
Summary
Review mode: follow-up after new commits.
Two major correctness issues remain: the new homepage bootstrap path is not using the same upstream error handling as the rest of the OpenClaw routes, and the new rebind path can mix stale local history with a newly ensured upstream conversation.
Testing
- Not run (automation; executing PR code is disallowed in this workflow).
HAPI Bot
```ts
const namespace = c.get('namespace')
const userKey = `${namespace}:${c.get('userId')}`
const conversation = await service.getOrCreateDefaultConversation({ namespace, userKey })
```
[MAJOR] getOrCreateDefaultConversation() can throw upstream/network errors here, but this route is the only OpenClaw route that does not wrap the call in toErrorResponse(...). Because / now renders OpenClawChatPage, a transient plugin outage turns into a raw 500 on the homepage instead of the structured 502/503 handling used by the other OpenClaw endpoints.
Suggested fix:

```ts
try {
  const conversation = await service.getOrCreateDefaultConversation({ namespace, userKey })
  return c.json({ conversation })
} catch (error) {
  return toErrorResponse(c, error)
}
```

```ts
  })
}
if (existing.externalId !== ensured.conversationId) {
  return this.store.openclawConversations.rebindConversation(
```
[MAJOR] This rebind keeps the same internal conversation row and only swaps external_id, while message reads still key off conversation.id. Any legacy messages already stored for that row will be shown as if they belong to the newly ensured upstream thread, even though future sends now go to ensured.conversationId.
Suggested fix:

```ts
if (existing.externalId !== ensured.conversationId) {
  this.store.openclawMessages.deleteByConversation(input.namespace, existing.id)
  this.store.openclawApprovals.deleteByConversation(input.namespace, existing.id)
  this.store.openclawCommands.deleteByConversation(input.namespace, existing.id)
  return this.store.openclawConversations.rebindConversation(
    existing.id,
    input.namespace,
    ensured.conversationId,
    ensured.title ?? existing.title
  ) ?? existing
}
```