Skip to content

feat(debounce): typing-aware message debounce#730

Open
f-liva wants to merge 29 commits intoRightNow-AI:mainfrom
f-liva:feat/message-debounce
Open

feat(debounce): typing-aware message debounce#730
f-liva wants to merge 29 commits intoRightNow-AI:mainfrom
f-liva:feat/message-debounce

Conversation

@f-liva
Copy link

@f-liva f-liva commented Mar 18, 2026

Summary

Closes #728.

  • Increased default debounce_ms from 3000 → 5000 ms to better handle image uploads on Telegram/WhatsApp
  • Integrated TypingEvent into MessageDebouncer: on_typing(is_typing: true) pauses the flush timer while the user is composing; on_typing(is_typing: false) restarts the normal debounce timer. The debounce_max_ms safety cap always applies regardless of typing state
  • Extended ChannelAdapter trait with an optional typing_events() method (default None) so adapters that don't support typing detection require no changes
  • Telegram adapter: added typing_tx/typing_rx channel infrastructure, detects chat_action updates, auto-expires typing indicators after 6s, and emits is_typing: false on message arrival
  • WhatsApp adapter: added TODO comment for Baileys presence.update (composing/paused) integration — Cloud API does not expose user typing status
  • Bridge start_adapter(): wired typing event stream into the tokio::select! loop alongside the message stream and flush channel

Test plan

  • cargo build --workspace --lib compiles
  • cargo test -p openfang-channels — 426 unit + 9 integration tests pass
  • cargo clippy -p openfang-channels --all-targets -- -D warnings — zero warnings
  • New tests cover:
    • Typing pauses timer, message after typing-stop restarts it
    • Safety cap fires even during continuous typing
    • No effect when no buffered messages exist
    • Telegram typing_events() returns receiver once, None on subsequent calls

🤖 Generated with Claude Code

f-liva and others added 28 commits March 15, 2026 23:55
Images are now passed as transient content blocks instead of being
injected permanently into the session history. This prevents 56K+
tokens of base64 data from accumulating in the session and triggering
expensive compaction cycles on every subsequent message.

Also converts resolve_attachments() to async with non-blocking file I/O
via tokio::task::spawn_blocking.

Fixes RightNow-AI#645

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gement

- Wrap reconnect startConnection() in try-catch to prevent silent death
- Add 30s health check that detects zombie WebSocket connections
- Add uncaughtException/unhandledRejection handlers for self-healing
- Add PM2 ecosystem config for automatic process restart on crash

Fixes gateway disconnecting overnight and never reconnecting.
- Custom Dockerfile: Chromium, gh CLI, Node 22, Claude Code, Qwen Code,
  Homebrew + gogcli, ffmpeg, jq, uv, python3 symlink
- entrypoint.sh: custom entrypoint for data volume
- DOCKER_README.md: Docker Hub documentation
- sync-build.yml: CI workflow for upstream sync + Docker build
- .current-upstream-version: track upstream sync point (v0.4.0)
- Trigger on push to 'custom' instead of 'main'
- Sync step: update main from upstream tags, then rebase custom on main
- main stays clean (upstream + generic fixes only)
- custom = main + lazycat-specific changes (Dockerfile, entrypoint)
build_prompt() now decodes base64 image blocks, saves them to a temp
directory, and passes @/path references + --add-dir to the Claude CLI
so images are actually visible to the model instead of being silently
dropped as text placeholders.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…receipts

- Port cleanup: on startup, detect and kill any stale process holding
  port 3009 (via PID file check + ss port scan), with up to 3 retries
- PID file: write gateway.pid on start, remove on graceful shutdown
- EADDRINUSE safety net: retry cleanup if listen() still fails
- Read receipts: send blue checkmarks immediately on message receive

Prevents the silent failure where a nohup-started zombie blocks PM2
from binding the port, leaving the gateway offline with no visible error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e sending

Claude outputs Markdown syntax (**, ~~, #, links) which WhatsApp doesn't
render. Adds markdownToWhatsApp() to convert to WhatsApp-native formatting
(single asterisk bold, tilde strikethrough, etc.) while preserving code blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bility

Add 7 reliability improvements:
1. Health endpoint returns 503 when degraded, with queue depth and uptime
2. QR code auto-expires after 60s and regenerates fresh code
3. Media download/upload retries (3 attempts with exponential backoff)
4. WebSocket keepalive ping detects zombie connections via pong tracking
5. OpenFang API calls retry with backoff on transient failures
6. Clean timeout handling with proper resource cleanup
7. Message queue serializes processing to prevent overload

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After a rebuild, PM2 daemon starts empty — whatsapp-gateway and other
PM2-managed services were not auto-starting. This adds `pm2 resurrect`
to the entrypoint so saved processes are restored before OpenFang starts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ckoff

The gateway was stuck in a ~2s reconnect loop due to conflict:replaced
errors. Each reconnect briefly opened a connection (resetting backoff to 0)
before immediately conflicting again with the not-yet-deregistered old session.

Fix: track connection stability (must last >10s to reset backoff) and use
longer base delay (5s exponential) specifically for conflict disconnects,
giving the old session time to fully deregister.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Process history-sync (type=append) messages that arrived during
connection gaps, using a 2-minute recovery window and message ID
deduplication to prevent double-processing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Six critical fixes to prevent the gateway from getting stuck disconnected:

1. Connection watchdog (30s interval) — detects when gateway is disconnected
   with no reconnect pending and forces a new connection attempt. If disconnected
   for >2min, exits the process to let PM2 handle a clean restart.

2. Reconnect timer guard — single `reconnectTimerId` variable prevents duplicate
   concurrent reconnect timers from racing and causing conflicts.

3. Uncaught exception handler now recovers regardless of connection state
   (previously only recovered when status was 'connected').

4. Reduced zombie detection from 90s to 45s — faster detection of dead connections.

5. Persistent message dedup — saves processed message IDs to disk (.processed_ids.json),
   survives process restarts to prevent duplicate message processing.

6. Reduced max backoff from 60s to 30s. Failed reconnects now automatically
   schedule retries instead of silently giving up.

Also: graceful shutdown saves dedup state, health endpoint shows watchdog status.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite gateway from scratch to fix persistent disconnection issues:
- Use Baileys v6 sock.ev.process() API instead of raw event listeners
- Add per-sender serial queue to prevent message processing races
- Resolve agent name to UUID before forwarding to OpenFang
- Add TCP keepalive for container network resilience
- Add periodic buffer flush safety net (3s interval)
- Simplify config: env vars only, remove config.toml parsing
- Add reply buffering with auto-flush on reconnect
- Set OPENFANG_DEFAULT_AGENT=ambrogio in ecosystem.config.cjs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
git checkout main fails when both origin/main and upstream/main exist.
Use git checkout -B main to explicitly create/reset the local branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parse incoming contactMessage and contactsArrayMessage from WhatsApp,
extracting display names and phone numbers from vCard data so they are
forwarded to OpenFang as readable text instead of being silently dropped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The catch-all Err branch in call_with_retry and stream_with_retry
returned immediately on any error, ignoring the is_retryable flag
from the error classifier. Timeouts and transient network errors
were treated as fatal instead of being retried with exponential
backoff.

Also adds an action validator to the agent loop: when the user
explicitly requests a side-effecting action (e.g. "send to
Telegram") but the LLM responds with text only and no tool call,
the loop re-prompts once to force tool execution.

Closes RightNow-AI#688

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…sages

- Add message deduplication using persistent .processed_ids.json to prevent
  re-processing after Signal session re-establishment / decryption retry
- Skip all group messages (@g.us) — only handle direct 1:1 chats
- Mark messages as processed BEFORE forwarding to prevent race conditions

Closes RightNow-AI#688 (partial — gateway-side fixes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent was treating strangers as the owner on WhatsApp because the
runtime only injected plain-text sender info ("Message from: Name (+39XXX)")
and relied on the LLM to compare phone numbers — which failed in practice.

Three interconnected bugs fixed:
1. agent.rs: Add `owner_ids` field to AgentManifest for storing authorized
   phone numbers per agent
2. kernel.rs: Populate `owner_ids` from manifest into PromptContext in both
   streaming and non-streaming execution paths
3. prompt_builder.rs: Rewrite `build_sender_section()` to normalize and
   compare sender phone numbers against owner_ids, injecting deterministic
   VERIFIED OWNER / STRANGER / UNVERIFIED verdicts into the system prompt

Also adds:
- Registry method to update owner_ids via API (PATCH /api/agents/{id})
- wizard.rs: Initialize owner_ids for newly created agents
- claude_code.rs: Fix pre-existing test compilation error (PreparedPrompt)

Closes RightNow-AI#677

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The agent was treating strangers as the owner on WhatsApp because the
runtime only injected plain-text sender info ("Message from: Name (+39XXX)")
and relied on the LLM to compare phone numbers — which failed in practice.

Three interconnected bugs fixed:
1. agent.rs: Add `owner_ids` field to AgentManifest for storing authorized
   phone numbers per agent
2. kernel.rs: Populate `owner_ids` from manifest into PromptContext in both
   streaming and non-streaming execution paths
3. prompt_builder.rs: Rewrite `build_sender_section()` to normalize and
   compare sender phone numbers against owner_ids, injecting deterministic
   VERIFIED OWNER / STRANGER / UNVERIFIED verdicts into the system prompt

Also adds:
- Registry method to update owner_ids via API (PATCH /api/agents/{id})
- wizard.rs: Initialize owner_ids for newly created agents
- claude_code.rs: Fix pre-existing test compilation error (PreparedPrompt)

Closes RightNow-AI#677

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…prevent cross-channel response delivery

When messages arrive simultaneously from web and WhatsApp, the kernel
now correctly identifies the originating channel via ChannelContext,
preventing responses from being delivered to the wrong channel.

Changes:
- Add ChannelContext struct to bridge layer with channel_type/sender_id/sender_name
- Implement send_message_with_context() and send_message_with_blocks_and_context() in KernelBridgeAdapter
- Add channel_type field to MessageRequest API type
- Propagate channel_type through all kernel send_message* functions to PromptContext
- Add per-agent mutex in streaming path to serialize concurrent messages
- WebSocket handler passes "web" as channel_type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge upstream changes while preserving channel_type propagation
through kernel, routes, and WebSocket handler to prevent
cross-channel response delivery between web and WhatsApp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arnings

- Add 7th argument (None) to send_message_streaming call in openai_compat.rs
- Remove unused mut on session variable
- Prefix unused needs_compact with underscore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrate TypingEvent into MessageDebouncer to pause flush timers
while users are still composing, preventing premature dispatch when
sending text + images in rapid succession.

- Increase default debounce_ms from 3000 to 5000 for image uploads
- Add on_typing() to MessageDebouncer: pauses timer on is_typing=true,
  restarts on is_typing=false, safety cap always enforced
- Add typing_events() to ChannelAdapter trait (optional, default None)
- Wire typing event stream into start_adapter() select loop
- Telegram: add typing_tx/rx infrastructure, detect chat_action updates,
  auto-expire typing after 6s, emit is_typing=false on message arrival
- WhatsApp: add TODO for Baileys presence.update integration
- Add 4 unit tests for debouncer typing behavior
- Add 2 Telegram typing event tests
- Fix ChannelBridgeHandle mock for send_message_with_context

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The debounce_ms fallback was 0 when no channel overrides existed,
which completely disabled debouncing and caused each message in a
rapid sequence to be dispatched independently. This aligns the
fallback with the default_debounce_ms() value of 5000ms.

Fixes the "3 responses for 3 images" bug reported in RightNow-AI#728.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WhatsApp-specific debounce issues required gateway-level fixes that complement
the Rust bridge debounce (issue RightNow-AI#728):

1. Media download & caching: WhatsApp media messages (images, video, audio,
   documents, stickers) are now downloaded via Baileys downloadMediaMessage(),
   saved to a local media_cache/ directory, and served via HTTP endpoint
   /media/:filename. Cache auto-cleans files older than 30 minutes.

2. Message debounce at gateway level: replaces the per-sender serial queue
   with a proper debounce system. Messages from the same sender are accumulated
   and flushed as a single batch after 5s of silence (text) or 15s (media).
   This is critical because WhatsApp uploads images one at a time with variable
   delays (6-10s between images), causing the Rust-side 5s debounce to fire
   prematurely between images.

3. Async media pipeline: media downloads start immediately but are buffered as
   Promises in the debounce queue. The debounce timer starts on message arrival
   (not after download), and Promise.all() resolves all pending downloads at
   flush time. This prevents slow downloads from blocking the debounce timer.

4. Messages without captions (images sent without text) are now properly
   processed instead of being silently dropped.

Closes the WhatsApp-specific portion of RightNow-AI#728.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Member

@jaberjaber23 jaberjaber23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title says "message debounce" but this bundles 19 unrelated features. Please split into separate PRs.

Critical issues:

  1. WhatsApp sendMessage breaks group JIDs (same regression as #732 — strips @g.us).
  2. Dead Telegram typing detection — chat_action/sender_chat_action fields don't exist in the Telegram Bot API.
  3. Hardcoded Italian strings in core runtime (requires_tool_action) — not language-agnostic.
  4. Real phone number +393760105565 in test code — use fake numbers.
  5. PRIVACY-RULES.md referenced in prompt injection but file doesn't exist.
  6. Fork-specific content: fliva/openfang Docker image, personal agent name "ambrogio", fork CI/CD workflow.
  7. is_enabled() with #[allow(dead_code)] — dead code merged with suppression.
  8. Default 5s debounce for all channels adds latency to every interaction.
  9. owner_ids bypasses existing RBAC — parallel auth path with naive phone normalization.

The debounce architecture itself is well-designed. Please extract it into its own PR without the 18 other changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Message debounce: batch rapid messages before dispatching to agent

2 participants