feat(telegram): surface file metadata and support document downloads#730

Open
dorey-agent[bot] wants to merge 3 commits into main from feat/telegram-file-metadata
dorey-agent Bot commented May 19, 2026

Motivation

When a user sends a document (PDF, audio, video, etc.) via Telegram, the agent receives only a generic label such as [📎 Document]. No file name, size, or MIME type is passed through, so the agent cannot acknowledge what was sent, let alone read it.

Photos are already downloaded and forwarded, but other attachment types either fail silently or produce no useful metadata.

Closes #725

Solution

Always surface file metadata regardless of whether the file is downloaded:

  • FileMetadata struct carries filename, mime_type, file_size extracted from Telegram message objects
  • When download is skipped (file too large) or fails, the agent receives a rich label:
    [📎 Document (name=report.pdf, type=application/pdf, size=25.3MB)]
  • When download succeeds, the agent receives the full ContentPart::File/Audio/Image as before
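
The label format above can be illustrated with a minimal sketch. The field types, the `format_label` function, and the use of binary units (1 MB = 1024 × 1024 bytes) are assumptions made for illustration; only the names `FileMetadata` and `format_file_size` and the example label come from this PR, and the actual code may differ.

```rust
/// Hypothetical shape of the metadata record; the real struct may differ.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct FileMetadata {
    pub filename: Option<String>,
    pub mime_type: Option<String>,
    pub file_size: Option<u64>,
}

/// Render a byte count as a short human-readable size.
/// Assumes binary units and one decimal place for KB and above.
pub fn format_file_size(bytes: u64) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = KB * 1024.0;
    const GB: f64 = MB * 1024.0;
    let b = bytes as f64;
    if b >= GB {
        format!("{:.1}GB", b / GB)
    } else if b >= MB {
        format!("{:.1}MB", b / MB)
    } else if b >= KB {
        format!("{:.1}KB", b / KB)
    } else {
        format!("{bytes}B")
    }
}

/// Build the text label sent when the file itself is not forwarded.
/// `format_label` is an illustrative name, not taken from the PR.
pub fn format_label(icon: &str, kind: &str, meta: &FileMetadata) -> String {
    let mut fields = Vec::new();
    if let Some(name) = &meta.filename {
        fields.push(format!("name={name}"));
    }
    if let Some(mime) = &meta.mime_type {
        fields.push(format!("type={mime}"));
    }
    if let Some(size) = meta.file_size {
        fields.push(format!("size={}", format_file_size(size)));
    }
    if fields.is_empty() {
        // No metadata available: fall back to the bare label.
        format!("[{icon} {kind}]")
    } else {
        format!("[{icon} {kind} ({})]", fields.join(", "))
    }
}
```

Keeping every field optional lets the same formatter serve attachment types that report only a size, or nothing at all.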

Configurable size threshold via media_max_mb option (default: 20MB, Telegram Bot API hard limit):

  • Files exceeding the limit are not downloaded but metadata is still surfaced
  • Configurable per-channel in anyclaw.yaml options
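
A rough sketch of the threshold decision follows. `DownloadDecision`, `decide`, and `media_max_bytes` as a free function are hypothetical names, and the megabyte-to-byte conversion (1024 × 1024) and the strict "greater than" comparison are assumptions. The handling of a zero limit and of a zero reported size mirrors the behavior described in the automated review of this PR.

```rust
/// Default limit in megabytes, per the PR description.
pub const DEFAULT_MEDIA_MAX_MB: u64 = 20;

/// Convert the configured `media_max_mb` value to bytes.
/// Assumes 1 MB = 1024 * 1024 bytes.
pub fn media_max_bytes(media_max_mb: u64) -> u64 {
    media_max_mb.saturating_mul(1024 * 1024)
}

/// Hypothetical outcome of the pre-download size check.
#[derive(Debug, PartialEq, Eq)]
pub enum DownloadDecision {
    Download,
    MetadataOnly,
}

/// Decide whether to fetch the file, before any file-fetch API call.
/// A reported size of `None` or `Some(0)` is treated as unknown.
pub fn decide(reported_size: Option<u64>, max_bytes: u64) -> DownloadDecision {
    if max_bytes == 0 {
        // A zero limit behaves as a metadata-only mode.
        return DownloadDecision::MetadataOnly;
    }
    match reported_size {
        Some(size) if size > max_bytes => DownloadDecision::MetadataOnly,
        // Within the limit, or size unknown: attempt the download.
        _ => DownloadDecision::Download,
    }
}
```

In either outcome the metadata is still surfaced; the decision only controls whether the file bytes are fetched.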

Extended media type support:

  • Video and animation (GIF) attachments are now downloaded and forwarded (previously they fell through to an unresolved text label)
  • All unsupported types (sticker, location, contact, video_note) now surface available metadata via extract_generic_metadata()

Design decisions:

  • media_max_bytes is an AtomicU64 because SharedState sits behind an Arc (shared across dispatcher + adapter)
  • The size check happens before the bot.get_file() call to avoid unnecessary API calls
  • Follows the OpenClaw pattern: always surface metadata, conditionally download
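
The AtomicU64 choice can be sketched as follows. This `SharedState` is a hypothetical stand-in containing only the one field, and the method names and `Ordering::Relaxed` are assumptions; the point is that a value behind an `Arc` can be updated through a shared reference without a `Mutex`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the channel's shared state.
pub struct SharedState {
    media_max_bytes: AtomicU64,
}

impl SharedState {
    /// Create the state with a limit given in megabytes (1 MB = 1024 * 1024 bytes assumed).
    pub fn new(media_max_mb: u64) -> Arc<Self> {
        Arc::new(Self {
            media_max_bytes: AtomicU64::new(media_max_mb.saturating_mul(1024 * 1024)),
        })
    }

    /// Update the limit through `&self`; no `&mut` access or lock is needed.
    pub fn set_media_max_mb(&self, media_max_mb: u64) {
        self.media_max_bytes
            .store(media_max_mb.saturating_mul(1024 * 1024), Ordering::Relaxed);
    }

    /// Read the current limit in bytes.
    pub fn media_max_bytes(&self) -> u64 {
        self.media_max_bytes.load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is enough in this sketch because the limit is a standalone value that does not guard other memory; a component holding a clone of the `Arc` simply sees the new limit on its next read.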

Testing

cargo test --manifest-path ext/Cargo.toml -p channel-telegram   # 257 unit + 6 integration tests pass
cargo clippy --manifest-path ext/Cargo.toml -p channel-telegram  # zero warnings
cargo +1.95.0 fmt --check --all --manifest-path ext/Cargo.toml  # clean

Key tests:

  • when_process_media_video_then_sends_video_label — video routing
  • when_document_message_with_caption_then_sends_label_and_caption — metadata label formatting
  • when_process_media_without_caption_then_sends_placeholder — unresolved metadata stub

Checklist

  • Tests pass (cargo test)
  • No clippy warnings (cargo clippy --workspace -- -D warnings)
  • Formatted (cargo fmt --all -- --check)
  • PR title follows Conventional Commits format
  • Updated CHANGELOG.md if this is a user-facing binary change (version bump 0.12.2 → 0.13.0)

[ai-assisted]

Always pass file metadata (filename, size, mime_type) to the agent for
all attachment types, even when download is skipped or fails.

Changes:
- Add configurable media_max_mb option (default 20MB, Telegram API limit)
- Check file size before downloading — skip if over limit
- Surface metadata in text label when download is skipped:
  [📎 Document (name=report.pdf, type=application/pdf, size=25.3MB)]
- Add video/animation download support (previously fell through to
  unresolved text label)
- Add FileMetadata struct for structured metadata extraction
- Add extract_generic_metadata() for unsupported media types (sticker,
  location, video_note, contact)
- Add format_file_size() helper for human-readable sizes

The agent now always knows what was sent, regardless of whether the file
was downloaded. Large/binary files that exceed the limit still surface
their metadata so the agent can respond intelligently.

Closes #725
@dorey-agent dorey-agent Bot requested a review from donbader as a code owner May 19, 2026 08:45
@dorey-agent dorey-agent Bot enabled auto-merge (squash) May 19, 2026 08:45
VideoNote messages were falling through to the wildcard arm, returning
None for metadata. Now surfaces file_size so the agent sees:
[📹 Video note (size=1.2MB)]
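
The wildcard fall-through described in this commit can be illustrated with a simplified sketch. The `Media` enum and both function names are hypothetical and do not reflect the real Telegram message types; the sketch only shows how an explicit match arm lets the size reach the label.

```rust
/// Hypothetical, simplified view of incoming media kinds.
pub enum Media {
    VideoNote { file_size: Option<u64> },
    Location,
    Contact,
}

/// Extract the reported size, if the media kind carries one.
pub fn extract_size(media: &Media) -> Option<u64> {
    match media {
        // Explicit arm: without it, video notes would hit the wildcard below.
        Media::VideoNote { file_size } => *file_size,
        // Kinds with no file attached have no size to report.
        _ => None,
    }
}

/// Build the video note label, with the size in MB when it is known.
/// Assumes 1 MB = 1024 * 1024 bytes and one decimal place.
pub fn video_note_label(size: Option<u64>) -> String {
    match size {
        Some(bytes) => format!(
            "[📹 Video note (size={:.1}MB)]",
            bytes as f64 / (1024.0 * 1024.0)
        ),
        None => "[📹 Video note]".to_string(),
    }
}
```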
dorey-agent Bot commented May 19, 2026

Automated Review Summary

Agents: correctness, edge_cases
Findings: 🔴 0 critical | 🟡 0 warning | 💬 3 nit


ext/channels/telegram/src/dispatcher.rs:854
💬 [correctness] VideoNote metadata was missing — fixed in follow-up commit d2b25a4.

ext/channels/telegram/src/adapter.rs:733
💬 [edge_cases] Setting media_max_mb=0 silently blocks all downloads. This is acceptable behavior (effectively a metadata-only mode), but it could be documented in defaults.yaml.

ext/channels/telegram/src/dispatcher.rs:609
💬 [edge_cases] When Telegram reports file.size=0 (unknown size), the download proceeds without a streaming cap. This is acceptable for now, since the Telegram Bot API hard-caps downloads at 20MB anyway.
