
Add document upload, memory monitoring, and session management improvements #70

Closed
JanusMarko wants to merge 40 commits into six-ddc:main from JanusMarko:claude/telegram-markdown-upload-NH4CJ


@JanusMarko

Summary

This PR adds document upload support, proactive system memory monitoring with escalating actions, improved session lifecycle management, and several quality-of-life improvements to CCBot.

Key Changes

Document Upload Handler

  • New document_handler() processes text, code, Markdown, PDF, and Word documents
  • Saves files to {session_cwd}/docs/inbox/ and forwards paths to Claude Code
  • Converts .docx files to Markdown for better compatibility
  • Validates MIME types and file extensions against an allowlist
  • Requires python-docx dependency for Word document support
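
The allowlist check could look roughly like the sketch below. It assumes that both the extension and the MIME type must pass. The names `ALLOWED_EXTENSIONS`, `ALLOWED_MIME_TYPES`, and `is_allowed_document` are hypothetical, and the lists are illustrative rather than the ones in this PR.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical allowlists; the PR's real lists are not reproduced here.
ALLOWED_EXTENSIONS = {".md", ".txt", ".py", ".pdf", ".docx"}
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def is_allowed_document(file_name: str, mime_type: Optional[str]) -> bool:
    """Accept a file only if its extension and its MIME type both pass."""
    ext = PurePath(file_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False
    if mime_type is None:
        return False
    # Assumption: any text/* type is acceptable for text and code files.
    return mime_type.startswith("text/") or mime_type in ALLOWED_MIME_TYPES
```

Checking before the file is written keeps rejected uploads from ever reaching `docs/inbox/`.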

System Memory Monitoring

  • New process_info.py module provides process tree inspection and OOM-kill detection
  • _check_system_memory() implements escalating response to memory pressure:
    • Warn level: Notifies all bound topics when MemAvailable drops below threshold
    • Interrupt level: Sends Escape to highest-RSS window to pause execution
    • Kill level: Terminates highest-RSS window and cleans up bindings
  • Detects OOM kills via dmesg parsing and correlates with specific PIDs
  • Includes hysteresis to prevent thrashing and per-window cooldowns
  • Configurable thresholds via environment variables

Session Management Improvements

  • New /kill command: terminates tmux window, cleans up state, and deletes forum topic
  • Added kill_window() method to TmuxManager
  • Improved window state tracking with _window_pids for OOM detection
  • Better handling of stale window references in status polling

Interactive UI Deduplication

  • Prevents duplicate interactive UI messages when both the JSONL monitor and the status poller detect the same prompt
  • Adds generation counter to track state transitions
  • Implements 2-second deduplication window for rapid successive detections

Message Queue Enhancements

  • New enqueue_callable() for executing arbitrary async tasks in-order
  • Improved retry logic with MAX_TASK_RETRIES and MAX_REQUEUE_COUNT limits
  • Better handling of Telegram flood control (RetryAfter) with bounded wait times
  • Clearer queue counter management documentation

TmuxManager Optimizations

  • Added window list caching (1-second TTL) to reduce tmux server load during polling
  • New SHELL_COMMANDS constant to detect when Claude Code has exited
  • Added pane_current_command checks in /esc and /usage commands to prevent sending input to a bare shell

Session State Improvements

  • Renamed iter_thread_bindings() to all_thread_bindings(), which returns a list snapshot instead of a generator
    • Prevents "RuntimeError: dictionary changed size during iteration" when async code unbinds threads between iteration points
    • Safer for concurrent modifications during polling
  • Added _extract_resume_command() to detect Claude Code resume hints in pane output
  • Improved session death context detection from JSONL transcripts

Screenshot and UI Updates

  • Changed screenshot reply from reply_document() to reply_photo() so that in-place screenshot refreshes render reliably in Telegram clients
  • Updated user message emoji prefix from "👤" to "💎"
  • Fixed asyncio event loop calls to use get_running_loop() instead of deprecated get_event_loop()

Documentation

  • Added comprehensive ccbot-workshop-setup.md with complete Windows/WSL setup guide
  • Covers bot creation, environment configuration, and daily usage patterns
  • Includes troubleshooting and common commands reference

Implementation Details

  • Memory monitoring runs on a separate check interval (configurable) to avoid overhead
  • OOM detection uses /proc filesystem and dmesg parsing (Linux-specific)
  • Document handler validates files before saving to prevent abuse
  • All async operations properly handle cancellation and cleanup
  • State snapshots in polling prevent concurrent modification issues
  • Escalation state is module-level with proper reset between cycles

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz

claude and others added 30 commits February 28, 2026 18:16
New env var sets a fixed starting directory for the directory browser,
falling back to Path.cwd() if not set (preserving current behavior).

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
…6kvZ

Add CCBOT_BROWSE_ROOT config for directory browser start path

New env var sets a fixed starting directory for the directory browser,
falling back to Path.cwd() if not set (preserving current behavior).

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
Four callback handlers (CB_DIR_SELECT, CB_DIR_UP, CB_DIR_PAGE,
CB_DIR_CONFIRM) and build_directory_browser's invalid-path fallback
used raw Path.cwd() instead of config.browse_root. This meant users
could escape the configured browse root if user_data was lost or the
path became invalid during navigation.

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
…6kvZ

Fix inconsistent Path.cwd() fallbacks in directory browser callbacks

Four callback handlers (CB_DIR_SELECT, CB_DIR_UP, CB_DIR_PAGE,
CB_DIR_CONFIRM) and build_directory_browser's invalid-path fallback
used raw Path.cwd() instead of config.browse_root. This meant users
could escape the configured browse root if user_data was lost or the
path became invalid during navigation.

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
- session.py: Replace deprecated asyncio.get_event_loop() with
  asyncio.get_running_loop() (Python 3.12+ compat)
- session.py: Remove redundant pass statements
- session_monitor.py: Consolidate double stat() call into one
- screenshot.py: Add explicit parens in _font_tier() for clarity
- bot.py: Add /kill command handler — kills tmux window, unbinds
  thread, cleans up state, and best-effort deletes the topic.
  Previously the /kill bot command was registered in the menu but
  had no handler, falling through to forward_command_handler.

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
…6kvZ

Fix misc bugs: asyncio deprecation, double stat, missing /kill handler

- session.py: Replace deprecated asyncio.get_event_loop() with
  asyncio.get_running_loop() (Python 3.12+ compat)
- session.py: Remove redundant pass statements
- session_monitor.py: Consolidate double stat() call into one
- screenshot.py: Add explicit parens in _font_tier() for clarity
- bot.py: Add /kill command handler — kills tmux window, unbinds
  thread, cleans up state, and best-effort deletes the topic.
  Previously the /kill bot command was registered in the menu but
  had no handler, falling through to forward_command_handler.

https://claude.ai/code/session_01Vn1pxPc8KahAYpofYGhLjY
Add timestamp-based deduplication in handle_interactive_ui() to prevent
both JSONL monitor and status poller from sending new interactive messages
in the same short window. The check-and-set has no await between them,
making it atomic in the asyncio event loop.

Also add a defensive check in status_polling.py to skip calling
handle_interactive_ui() when an interactive message is already tracked
for the user/thread (e.g. sent by the JSONL monitor path).

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
…returning list snapshot

iter_thread_bindings() was a generator yielding from live dicts. Callers
with await between iterations (find_users_for_session, status_poll_loop)
could allow concurrent unbind_thread() calls to mutate the dict mid-iteration,
causing RuntimeError: dictionary changed size during iteration.

Fix: rename to all_thread_bindings() returning a materialized list snapshot.
The list comprehension captures all (user_id, thread_id, window_id) tuples
eagerly, so no live dict reference escapes across await points.

Changes:
- session.py: iter_thread_bindings -> all_thread_bindings, returns list
- bot.py, status_polling.py: update all 4 call sites
- Remove unused Iterator import from collections.abc
- Add tests: snapshot independence, returns list type, empty bindings

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
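
The snapshot idea can be sketched as follows. The nested-dict layout and the explicit `bindings` parameter are assumptions for illustration; in the PR this is a method on the session state in session.py.

```python
def all_thread_bindings(bindings):
    """Return a materialized list of (user_id, thread_id, window_id) tuples.

    The list is built eagerly, so later mutations of `bindings` cannot
    affect a loop that is iterating over the returned snapshot.
    """
    return [
        (user_id, thread_id, window_id)
        for user_id, threads in bindings.items()
        for thread_id, window_id in threads.items()
    ]


def unbind_all(bindings):
    """Unbind every thread while iterating; safe because of the snapshot."""
    removed = 0
    for user_id, thread_id, _window_id in all_thread_bindings(bindings):
        # Stands in for a concurrent unbind_thread() call between awaits.
        bindings[user_id].pop(thread_id)
        removed += 1
    return removed
```

Iterating the live dict in the same way would raise `RuntimeError: dictionary changed size during iteration` on the second step.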
queue.join() in handle_new_message blocked the entire monitor loop while
waiting for one user's queue to drain. If Telegram was rate-limiting, this
could stall all sessions for 30+ seconds.

Fix: use enqueue_callable() to push interactive UI handling as a callable
task into the queue. The worker executes it in FIFO order after all pending
content messages, guaranteeing correct ordering without blocking.

Also fixes:
- Callable tasks silently dropped during flood control (the guard checked
  task_type != "content" which matched "callable" too; changed to explicit
  check for "status_update"/"status_clear" only)
- Updated stale docstring in _merge_content_tasks referencing queue.join()

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
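
The ordering guarantee can be illustrated with a plain `asyncio.Queue`. This is a sketch under assumed names (`worker`, `demo`, and the `(kind, payload)` tuple shape), not the PR's actual task type.

```python
import asyncio


async def worker(queue, log):
    """Process tasks strictly in FIFO order."""
    while True:
        kind, payload = await queue.get()
        try:
            if kind == "content":
                log.append(payload)  # stands in for sending a message
            elif kind == "callable":
                await payload()  # payload is a zero-argument async callable
        finally:
            queue.task_done()


async def demo():
    queue = asyncio.Queue()
    log = []

    async def show_interactive_ui():
        log.append("interactive-ui")

    queue.put_nowait(("content", "msg-1"))
    queue.put_nowait(("content", "msg-2"))
    # Enqueueing returns immediately; the producer never waits on the queue.
    queue.put_nowait(("callable", show_interactive_ui))

    task = asyncio.create_task(worker(queue, log))
    await queue.join()  # only the demo waits, in order to collect the log
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return log
```

The callable runs after both content messages because it sits behind them in the same queue, which is the property the commit relies on.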
unpin_all_forum_topic_messages was used every 60s to detect deleted topics,
but it destructively removed all user-pinned messages as a side effect.

Replace with send_chat_action(ChatAction.TYPING) which is ephemeral
(5s typing indicator) and raises the same BadRequest("Topic_id_invalid")
for deleted topics. All existing error handling works unchanged.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
- Add MAX_TASK_RETRIES=3 retry loop for short RetryAfter (sleep and retry)
- Re-queue tasks on long RetryAfter (>10s) with MAX_REQUEUE_COUNT=5 cap
- Convert callable_fn from Coroutine to Callable factory (coroutines are
  single-use; retry requires a fresh coroutine each attempt)
- Catch RetryAfter from _check_and_send_status to prevent cosmetic status
  updates from triggering content message re-sends
- Fix test isolation: clear _last_interactive_send in test fixtures

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
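
The reason for switching to a factory can be shown in a few lines. The `RetryAfter` class below is a local stand-in for the Telegram exception, `run_with_retries` and `make_flaky_sender` are hypothetical names, and the sketch assumes `MAX_TASK_RETRIES` counts retries after the first attempt.

```python
import asyncio

MAX_TASK_RETRIES = 3


class RetryAfter(Exception):
    """Local stand-in for the Telegram flood-control exception."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


async def run_with_retries(factory, max_retries=MAX_TASK_RETRIES):
    """Call factory() for each attempt; a coroutine object is single-use."""
    for attempt in range(max_retries + 1):
        try:
            return await factory()  # fresh coroutine on every attempt
        except RetryAfter as exc:
            if attempt == max_retries:
                raise
            await asyncio.sleep(exc.retry_after)


def make_flaky_sender(failures):
    """Build an async callable that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def send():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RetryAfter(0)
        return state["calls"]

    return send
```

Passing an already-created coroutine instead of `factory` would fail on the second attempt with "cannot reuse already awaited coroutine".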
The _file_mtimes dict used mtime+size to skip unchanged JSONL files, but
this introduced edge cases (sub-second writes, clock skew, file replacement).
For append-only JSONL files, comparing file size against last_byte_offset is
sufficient and eliminates all mtime-related issues.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
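
A sketch of the size-versus-offset check for an append-only file. The function names are hypothetical, and consuming only complete lines is an added assumption so that a partially written record is not parsed.

```python
import os


def has_new_data(path, last_byte_offset):
    """An append-only file has new data exactly when it has grown."""
    return os.path.getsize(path) > last_byte_offset


def read_new_lines(path, last_byte_offset):
    """Return (complete new lines, new offset), leaving partial lines unread."""
    with open(path, "rb") as fh:
        fh.seek(last_byte_offset)
        data = fh.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = data[:end].decode("utf-8").splitlines()
    return lines, last_byte_offset + end
```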
Previously byte offsets were persisted to disk BEFORE delivering messages
to Telegram. If the bot crashed after save but before delivery, messages
were silently lost. Now offsets are saved AFTER the delivery loop,
guaranteeing at-least-once delivery: a crash before save means messages
are re-read and re-delivered on restart (safe duplicate) rather than
permanently lost.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
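
The reordering amounts to the following pattern. `process_batch` and its parameters are hypothetical names used only to show the save-after-delivery order.

```python
def process_batch(messages, deliver, save_offset, new_offset):
    """Deliver every message first, then persist the offset.

    If deliver() raises (or the process dies) part-way through, the offset
    is never saved, so the same batch is re-read on the next start.
    """
    for message in messages:
        deliver(message)
    save_offset(new_offset)
```

The cost is a possible duplicate after a crash, which the commit treats as safe; the earlier order could lose messages outright.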
Dead sessions were cleaned from persistent state but never from the
in-memory _pending_tools dict, causing a slow memory leak over time.
Add pop() calls in both cleanup paths (startup + runtime).

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
Previously _pending_thread_text was cleared from user_data BEFORE
attempting to send it to the tmux window. If send_to_window() failed,
the message was lost and the user had to retype it. Now the pending
text is only cleared after a successful send.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
Typing indicators in forum topics were silently failing because
message_thread_id was not passed to send_chat_action calls. Users
in forum topics wouldn't see typing indicators while Claude worked.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
The except Exception handler was catching RetryAfter (Telegram 429
rate limiting) and BadRequest("message is not modified"), preventing
proper rate limit propagation and causing unnecessary duplicate
message sends.

Changes:
- Re-raise RetryAfter in both edit and send paths so the queue
  worker retry loop can handle rate limiting correctly
- Treat BadRequest "is not modified" as success (content identical)
- For other BadRequest errors (message deleted, too old), delete
  orphan message before falling through to send new
- Log exception details in catch-all handler for debugging

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
When JSONL monitoring enqueues _send_interactive_ui, the callable may
execute after the interactive UI has been dismissed. This caused stale
callables to potentially send duplicate interactive messages.

Fix: introduce a monotonically incrementing generation counter per
(user_id, thread_id) key. Every state transition (set_interactive_mode,
clear_interactive_mode, clear_interactive_msg) increments the counter.
The JSONL monitor captures the generation at enqueue time and passes it
to handle_interactive_ui via expected_generation parameter. If the
generation has changed by execution time, the function bails out.

The status poller is unaffected (passes None, skipping the guard).

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
The second all_thread_bindings() call gets a fresh snapshot that
naturally excludes entries unbound by the topic probe loop above.
This is correct behavior, not a bug — add a comment to clarify
the intent for future readers.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
The return value was already handled correctly (proceed regardless),
but the ignored bool looked like a bug. Add a comment explaining that
on timeout the monitor's 2s poll cycle picks up the entry, and thread
binding, pending text, and topic rename work without session_map.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
…e-messages-FpKAU

Document intentionally ignored wait_for_session_map_entry return value

The return value was already handled correctly (proceed regardless),
but the ignored bool looked like a bug. Add a comment explaining that
on timeout the monitor's 2s poll cycle picks up the entry, and thread
binding, pending text, and topic rename work without session_map.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
…edia

Telegram clients fail to re-render document thumbnails when editing
document-type media in place via editMessageMedia, causing a "white
circle with X" on screenshot refresh. Switch from reply_document +
InputMediaDocument to reply_photo + InputMediaPhoto, which Telegram
clients handle reliably for inline image edits.

Also adds debug logging for the key-press screenshot edit path.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
…e-messages-FpKAU

Fix screenshot refresh showing broken preview by switching to photo m…
Check pane_current_command before sending keys to tmux windows.
If the pane is running a shell (bash, zsh, etc.), Claude Code has
exited and user text must not be forwarded — it would execute as
shell commands.

Guards added to: send_to_window (safety net), text_handler (with
auto-unbind), esc_command, usage_command, and screenshot key-press
callback.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
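
The guard reduces to a set membership test on tmux's `pane_current_command`. The contents of `SHELL_COMMANDS` below are assumed (the commit only says "bash, zsh, etc."), and `pane_is_bare_shell` is a hypothetical helper name.

```python
# Assumed contents; the PR's actual SHELL_COMMANDS set may differ.
SHELL_COMMANDS = frozenset({"bash", "zsh", "sh", "fish", "dash"})


def pane_is_bare_shell(pane_current_command):
    """True when the pane's foreground command looks like a plain shell."""
    name = pane_current_command.strip().rsplit("/", 1)[-1]
    # Defensive assumption: tolerate a login-shell style leading dash.
    return name.lstrip("-") in SHELL_COMMANDS
```

When this returns `True`, forwarding user text would run it as shell input, so the callers listed above refuse to send keys.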
When send_to_window detects the pane is running a shell, it now
captures the pane content and looks for:
  - "Stopped ... claude" → sends "fg" (suspended process)
  - "claude --resume <id>" → sends the resume command

Waits up to 3s (fg) or 15s (--resume) for Claude Code to take
over the terminal, then sends the user's original text.

If no resume command is found, the text_handler unbinds the topic
and tells the user to start a new session.

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
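
One way to detect the two hints in captured pane text is with regular expressions, sketched below. The patterns, the session-id character class, and the precedence of `fg` over `--resume` are all assumptions; the PR's `_extract_resume_command()` may differ.

```python
import re

# Assumed patterns for the two hints described above.
_STOPPED_RE = re.compile(r"Stopped\b.*\bclaude\b")
_RESUME_RE = re.compile(r"claude --resume [\w-]+")


def extract_resume_command(pane_text):
    """Return "fg", a "claude --resume <id>" command, or None."""
    if _STOPPED_RE.search(pane_text):
        return "fg"  # a suspended job can be brought back to the foreground
    match = _RESUME_RE.search(pane_text)
    return match.group(0) if match else None
```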
…_pane

Reduces tmux subprocess calls from ~120/s to ~21/s with 20 windows by:
- Adding 1-second TTL cache to list_windows() (all callers in the same
  poll cycle share one tmux enumeration instead of N)
- Unifying capture_pane() to always use direct `tmux capture-pane`
  subprocess (plain text mode previously used libtmux which generated
  3-4 tmux round-trips per call)
- Invalidating cache on mutations (create/kill/rename)

https://claude.ai/code/session_016c4b8ioybZyscNayeY6Y18
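
A small TTL cache with explicit invalidation captures the idea. `WindowListCache` and its injectable `clock` are hypothetical; in the PR the cache lives inside TmuxManager.list_windows().

```python
import time


class WindowListCache:
    """Cache the result of an expensive listing call for `ttl` seconds."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._fetched_at = 0.0
        self._valid = False

    def get(self):
        now = self._clock()
        if not self._valid or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
            self._valid = True
        return self._value

    def invalidate(self):
        """Call after create/kill/rename so stale data is never served."""
        self._valid = False
```

With a 1-second TTL, every caller inside one poll cycle shares a single enumeration, while mutations still take effect immediately through `invalidate()`.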
…e-messages-FpKAU

Claude/fix duplicate interactive messages fp kau
When an assistant message contains both text blocks and an interactive
tool_use (ExitPlanMode/AskUserQuestion), the text entries were processed
first in handle_new_message, clearing the interactive UI state set by the
status poller. This caused the JSONL callable to send a second interactive
message instead of editing the existing one.

Fix: pre-scan assistant messages for interactive tools and suppress text
block emission when present — the terminal capture already includes that
preamble text.

https://claude.ai/code/session_01WHUN1GLBFr2ZkuEmeVtuPW
…icates-eUNgW

Fix duplicate interactive UI messages for numbered answers

When an assistant message contains both text blocks and an interactive
tool_use (ExitPlanMode/AskUserQuestion), the text entries were processed
first in handle_new_message, clearing the interactive UI state set by the
status poller. This caused the JSONL callable to send a second interactive
message instead of editing the existing one.

Fix: pre-scan assistant messages for interactive tools and suppress text
block emission when present — the terminal capture already includes that
preamble text.

https://claude.ai/code/session_01WHUN1GLBFr2ZkuEmeVtuPW
claude and others added 10 commits March 8, 2026 00:51
When a tmux window dies, the status poller now checks dmesg for OOM
kills matching the window's process tree and notifies the user in the
Telegram topic with the reason (OOM vs normal exit). Optionally monitors
RSS memory of Claude processes and warns when usage exceeds threshold.

- Add process_info.py: /proc-based process tree, RSS reading, dmesg OOM parsing
- Add TmuxManager.get_pane_pid() for correlating windows to processes
- Track pane PIDs in status poller for post-mortem OOM detection
- Send session-end notification to user topic on window death
- Add opt-in memory monitoring (CCBOT_MEMORY_MONITOR=true)
- Config: CCBOT_MEMORY_WARNING_MB (default 2048), CCBOT_MEMORY_CHECK_INTERVAL (30s)

https://claude.ai/code/session_01WHUN1GLBFr2ZkuEmeVtuPW
When a tmux window dies, the notification now includes what Claude was
doing at the time — extracted from the tail of the session JSONL file.
Shows pending/running tools, last completed tool, and last assistant
message (truncated to 200 chars).

Example notification:
  ⚠️ Session killed by OOM killer (process: node, RSS: 14500MB): refinery
  Last activity:
  • Running: Bash(`npm run test:e2e`)
  • Last message: "All 4 test agents running in parallel..."

https://claude.ai/code/session_01WHUN1GLBFr2ZkuEmeVtuPW
…icates-eUNgW

Claude/fix numbered answer duplicates e u ng w
  Summary of Changes

  src/ccbot/process_info.py

  - Added get_mem_available_mb() — reads /proc/meminfo for MemAvailable, returns MB as float (None on non-Linux)
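
Reading `MemAvailable` can be sketched as below. The split into a separate `parse_mem_available_mb` helper and the `path` parameter are assumptions added for testability; `/proc/meminfo` reports the value in kB.

```python
def parse_mem_available_mb(meminfo_text):
    """Extract MemAvailable from /proc/meminfo text, converted from kB to MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / 1024.0
    return None


def get_mem_available_mb(path="/proc/meminfo"):
    """Return available memory in MB, or None where the file is missing."""
    try:
        with open(path, encoding="ascii") as fh:
            return parse_mem_available_mb(fh.read())
    except OSError:
        return None  # e.g. non-Linux systems without /proc/meminfo
```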

  src/ccbot/config.py

  - Memory monitoring now on by default (opt-out with CCBOT_MEMORY_MONITOR=false)
  - Lowered memory_check_interval from 30s → 10s
  - Added 3 new threshold configs:
    - CCBOT_MEM_AVAIL_WARN_MB (1024) — triggers notification
    - CCBOT_MEM_AVAIL_INTERRUPT_MB (512) — sends Escape to heaviest session
    - CCBOT_MEM_AVAIL_KILL_MB (256) — kills heaviest session

  src/ccbot/handlers/status_polling.py

  - Added _find_highest_rss_window() — finds the window consuming the most memory
  - Added _check_system_memory() — escalating system memory response:
    - Level 1 (warn): Notifies all topics that system memory is low
    - Level 2 (interrupt): Sends Escape to the heaviest window
    - Level 3 (kill): Kills the heaviest window, cleans up bindings
  - Safety mechanisms:
    - One level per check cycle (can't jump from OK to kill)
    - Cooldowns: 2 cycles (~20s) after interrupt before kill, 3 cycles (~30s) between kills
    - Hysteresis: resets to normal when memory recovers above warn × 1.5
    - Downgrade: level drops when pressure partially relieves
    - No-target fallback: resets state if there are no windows to act on
  - Updated module docstring
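
The level transitions listed above can be sketched as a pure function. This covers the one-level-per-cycle rule, the downgrade, and the hysteresis reset; the cooldowns and the no-target fallback are left out. The names `Level`, `target_level`, and `next_level` are hypothetical, and holding at warn between the warn threshold and warn × 1.5 is an assumed reading of the hysteresis rule.

```python
from enum import IntEnum


class Level(IntEnum):
    NORMAL = 0
    WARN = 1
    INTERRUPT = 2
    KILL = 3


# Defaults quoted in the config section above (MB of MemAvailable).
WARN_MB, INTERRUPT_MB, KILL_MB = 1024.0, 512.0, 256.0
HYSTERESIS_FACTOR = 1.5


def target_level(mem_avail_mb):
    """Level that the current reading alone would justify."""
    if mem_avail_mb < KILL_MB:
        return Level.KILL
    if mem_avail_mb < INTERRUPT_MB:
        return Level.INTERRUPT
    if mem_avail_mb < WARN_MB:
        return Level.WARN
    return Level.NORMAL


def next_level(current, mem_avail_mb):
    """Compute the level for this check cycle."""
    if mem_avail_mb >= WARN_MB * HYSTERESIS_FACTOR:
        return Level.NORMAL  # full recovery resets the state
    target = target_level(mem_avail_mb)
    if target > current:
        return Level(current + 1)  # escalate one level per cycle at most
    if target < current:
        # Partial relief: step down, but stay at WARN until full recovery.
        return max(target, Level.WARN)
    return current
```

Starting from `Level.NORMAL` with 100 MB available, three consecutive cycles are needed to reach `Level.KILL`, which leaves time for the warn and interrupt steps to take effect.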

  Tests

  - tests/ccbot/test_process_info.py — Added TestGetMemAvailableMb (3 tests)
  - tests/ccbot/handlers/test_system_memory.py — New file with 11 tests covering: normal operation, warn,
  one-level-per-cycle, interrupt escalation, cooldowns, kill + cleanup, hysteresis, partial pressure relief downgrade,
  disabled monitor, non-Linux fallback, highest-RSS finde
Save uploaded text-based files (Markdown, code, config, etc.) to
{session_cwd}/docs/inbox/ and forward the file path to Claude Code.
Follows the same pattern as the existing photo_handler.

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
- PDFs: saved directly to docs/inbox/ (Claude Code reads them natively)
- Word docs (.docx/.doc): converted to Markdown via python-docx, saved as .md
- Added python-docx dependency
- Updated MIME type and extension allowlists

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
Claude Code needs to know the file exists before processing the user's
instruction. Reorder so the file path/read hint comes first, followed
by the user's caption text.

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
Prefixes human-readable assistant messages with 💬 so they're
visually distinct from tool_use/tool_result messages in Telegram.

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
- 🟦 User messages
- 🟩 Assistant text
- 🟧 Tool use / tool result
- 🟪 Thinking

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
- 💎 User messages
- 🔮 Assistant text
- 🛠️ Tool use / tool result
- 🧠 Thinking

https://claude.ai/code/session_01Db4zZKSaAkJrgHGzeLjRsz
@JanusMarko JanusMarko closed this Mar 29, 2026


2 participants