fix: improve message delivery reliability (Telegram + Feishu) #266

Open
lowmiaq-gmail wants to merge 2 commits into op7418:main from lowmiaq-gmail:fix/message-reliability

@lowmiaq-gmail

Summary

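  • Telegram notifications now retry with exponential backoff (previously fire-and-forget: a single failure lost the message permanently)
  • Feishu resource downloads (images, files, audio/video) now retry: up to 3 attempts with exponential backoff
  • Feishu inbound-message dedup is persisted to the channel_offsets table (dedup state survives a restart)
  • Feishu in-memory dedup cap raised from 1000 to 5000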

Root Cause

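Notification sending and resource downloads had no retry logic at all, so transient network failures (timeouts, dropped connections, 429 rate limiting) caused messages to be lost silently.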

Changes

src/lib/telegram-bot.ts

  • Added callWithRetry(): retries with exponential backoff; 4xx responses (other than 429) are not retried
  • sendMessage() now goes through callWithRetry
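
For readers who have not opened the diff, here is a minimal sketch of the retry policy described above. It is not the code in src/lib/telegram-bot.ts: the `HttpError` class, the `RetryOptions` shape, the 500 ms base delay, and the 50% jitter factor are all assumptions made for illustration. Only the name `callWithRetry`, the "2 retries" limit, and the "no retry on 4xx except 429" rule come from this PR.

```typescript
// Hypothetical error type; the real client may expose status codes differently.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

interface RetryOptions {
  maxRetries?: number; // retries after the first attempt (2 per the commit message)
  baseDelayMs?: number; // assumed base delay, not taken from the PR
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Read a numeric `status` field from an unknown error, if it has one.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

// 4xx responses other than 429 are treated as permanent failures.
// Errors without a status (timeouts, resets) are assumed to be transient.
function isRetryable(err: unknown): boolean {
  const status = statusOf(err);
  if (status === undefined || status === 429) return true;
  return status < 400 || status >= 500;
}

async function callWithRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = {},
): Promise<T> {
  const { maxRetries = 2, baseDelayMs = 500, sleep = defaultSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      // Exponential backoff: base * 2^attempt, plus up to 50% random jitter.
      const backoff = baseDelayMs * 2 ** attempt;
      await sleep(backoff + Math.random() * backoff * 0.5);
    }
  }
}
```

With the defaults above, a call makes at most three attempts in total, which matches the "up to 3 attempts" wording used for the Feishu downloads. A non-transient failure such as a size-limit error could be kept out of the retry path the same way a 400 is here.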

src/lib/bridge/adapters/feishu-adapter.ts

  • downloadResource() is wrapped in a retry loop (an oversized file returns immediately instead of wasting retries)
  • addToDedup() writes to the channel_offsets table for persistence
  • DEDUP_MAX 1000 → 5000
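
The dedup change can be pictured roughly as follows. This is a sketch under stated assumptions, not the adapter's actual code: `OffsetStore`, `MemoryStore`, and `MessageDedup` are hypothetical names, and the in-memory store only stands in for the channel_offsets table, whose real schema and access layer are not shown in this PR. Only `DEDUP_MAX = 5000` and the idea of persisting on each add come from the description.

```typescript
const DEDUP_MAX = 5000;

// Hypothetical persistence interface standing in for the channel_offsets table.
interface OffsetStore {
  load(channel: string): string[];
  save(channel: string, messageId: string): void;
}

// In-memory stand-in used only for this example.
class MemoryStore implements OffsetStore {
  private readonly rows = new Map<string, string[]>();
  load(channel: string): string[] {
    return [...(this.rows.get(channel) ?? [])];
  }
  save(channel: string, messageId: string): void {
    const ids = this.rows.get(channel) ?? [];
    ids.push(messageId);
    this.rows.set(channel, ids);
  }
}

class MessageDedup {
  // A Set iterates in insertion order, so its first element is the oldest id.
  private readonly seen = new Set<string>();
  private readonly channel: string;
  private readonly store: OffsetStore;
  private readonly max: number;

  constructor(channel: string, store: OffsetStore, max: number = DEDUP_MAX) {
    this.channel = channel;
    this.store = store;
    this.max = max;
    // Restore persisted ids so a restart does not reprocess old messages.
    for (const id of store.load(channel)) this.seen.add(id);
    this.evict();
  }

  // Returns true if the message is new (and records it), false if it is a duplicate.
  add(messageId: string): boolean {
    if (this.seen.has(messageId)) return false;
    this.seen.add(messageId);
    this.evict();
    this.store.save(this.channel, messageId);
    return true;
  }

  // Drop the oldest ids until the in-memory set is back under the cap.
  private evict(): void {
    while (this.seen.size > this.max) {
      const oldest = this.seen.values().next().value;
      if (oldest === undefined) break;
      this.seen.delete(oldest);
    }
  }
}
```

Raising the cap trades a little memory for a longer window in which a redelivered message is still recognized. The persisted state covers the case the cap cannot: a process restart.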

Test plan

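  • Telegram notifications retry automatically and recover after a network failure
  • Feishu image download succeeds on retry after a failure
  • Feishu messages are not processed twice after CodePilot restarts
  • TypeScript compiles ✅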

🤖 Generated with Claude Code

Telegram notifications:
- Add retry with exponential backoff to sendMessage() (was fire-and-forget)
- Skip retry for 4xx errors (except 429 rate limit)
- Max 2 retries with jittered backoff

Feishu adapter:
- Add retry (max 2) with exponential backoff to resource downloads
- Size-limit failures skip retry (not transient)
- Persist last processed message_id to channel_offsets table (survives restart)
- Increase in-memory dedup cap from 1000 to 5000

Root cause: notifications and resource downloads had zero retry logic,
causing silent message loss on transient network failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 13, 2026

Someone is attempting to deploy a commit to the op7418's projects Team on Vercel.

A member of the Team first needs to authorize it.

When bridge_parallel_tasks setting is enabled and the current session
is busy processing a message, new incoming messages automatically spawn
ephemeral worker sessions instead of queueing behind the active task.

This allows users to send multiple independent tasks via Feishu/Telegram
and have them processed concurrently with separate Claude streams,
eliminating the sequential bottleneck.

- channel-router: add createWorkerBinding() for ephemeral worker sessions
- bridge-manager: detect busy sessions and dispatch to workers
- handleMessage: accept optional binding override for worker routing
- Worker sessions inherit model, provider, working dir, mode from parent
- Backward compatible: disabled by default (opt-in via setting)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
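
The routing decision described in this second commit might look something like the sketch below. The `Binding` fields, the `pickBinding` helper, and the worker id format are assumptions for illustration. The PR names `createWorkerBinding()` and the `bridge_parallel_tasks` setting but this page does not show their signatures, so the version here is a hypothetical stand-in.

```typescript
// Assumed shape of a session binding; field names are illustrative only.
interface Binding {
  sessionId: string;
  model: string;
  provider: string;
  workingDir: string;
  mode: string;
  ephemeral: boolean;
}

let workerCounter = 0;

// Hypothetical stand-in for channel-router's createWorkerBinding():
// the worker inherits model, provider, working dir, and mode from its parent.
function createWorkerBinding(parent: Binding): Binding {
  workerCounter += 1;
  return {
    ...parent,
    sessionId: `${parent.sessionId}-worker-${workerCounter}`,
    ephemeral: true,
  };
}

// Decide which binding should handle an incoming message.
// `parallelEnabled` mirrors the bridge_parallel_tasks setting (off by default).
function pickBinding(
  binding: Binding,
  busySessions: Set<string>,
  parallelEnabled: boolean,
): Binding {
  if (parallelEnabled && busySessions.has(binding.sessionId)) {
    return createWorkerBinding(binding);
  }
  // Setting disabled or session idle: keep the original binding,
  // which preserves the existing queue-behind-the-active-task behavior.
  return binding;
}
```

Returning the original binding whenever the setting is off is what keeps the change backward compatible in this sketch: only an opted-in, busy session ever gets a worker.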