fix: feishu block type detection, table rendering & image download #1

Open
dabuddha wants to merge 1 commit into joeseesun:main from dabuddha:fix/feishu-block-type-and-image-download

@dabuddha

Summary

Fixes three critical issues in fetch_feishu.py that cause most document content to be lost or corrupted:

  • Block type mapping is unreliable: The original code uses hardcoded block_type numbers (e.g., 10=Bullet, 17=Image) that don't match the actual API responses (where 12=Bullet, 27=Image). This causes bullet lists to render as empty code blocks and images to be completely lost.
  • Tables are not supported: Table (block_type=31) and TableCell (block_type=32) blocks fall through to the else branch, producing garbled text instead of proper Markdown tables.
  • Images are inaccessible: Images output as feishu-image://{token} — a custom protocol that no Markdown renderer can display.

Changes

  1. detect_block_kind(block) — Detects block type by checking actual data keys ("image", "table", "bullet", etc.) instead of relying on block_type numbers. This is robust against API version differences.

  2. render_table() + render_cell_content() — Renders Feishu Table blocks as proper Markdown tables, including images inside table cells.

  3. download_image() + download_all_images() — Downloads document images via Feishu drive API (/drive/v1/medias/{token}/download) and saves them locally. Markdown references use relative local paths.

  4. fetch_feishu_doc(save_dir=None) — New optional save_dir parameter. When provided, images are downloaded to {save_dir}/{title}_images/.
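
As a rough illustration of change 1, key-based detection can be sketched as below. This is a minimal sketch, not the code in this PR: only the "image", "table" and "bullet" keys are named above; the other key names, the lookup order and the "unknown" fallback are assumptions made for the example.

```python
# Hypothetical sketch of key-based block detection.
# Only "image", "table" and "bullet" are named in the PR description;
# the remaining keys and the "unknown" fallback are assumptions.
KNOWN_KINDS = ("image", "table", "table_cell", "bullet", "ordered", "code", "text")


def detect_block_kind(block: dict) -> str:
    """Return the first known payload key found in the block."""
    for kind in KNOWN_KINDS:
        if kind in block:
            return kind
    return "unknown"


# The payload key decides the kind, so a shifted block_type number
# no longer changes the result.
old_style = {"block_type": 10, "bullet": {"elements": []}}
new_style = {"block_type": 12, "bullet": {"elements": []}}
print(detect_block_kind(old_style), detect_block_kind(new_style))
```

The point of the design is that block_type is never consulted, so a renumbering on the API side does not affect rendering.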

Backward Compatibility

  • All new function parameters have default values — existing callers are unaffected
  • --json mode behavior is unchanged (no image download)
  • If an image download fails (permission denied, network error), the output gracefully falls back to the feishu-image:// protocol
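
The fallback described in the last bullet could look roughly like the sketch below. The function name image_reference, the injected fetch callable (standing in for the call to /drive/v1/medias/{token}/download), the .png extension and the exact Markdown link shape are all assumptions for illustration; the PR's download_image() may differ.

```python
from pathlib import Path
from typing import Callable


def image_reference(token: str, images_dir: Path, fetch: Callable[[str], bytes]) -> str:
    """Return a Markdown image link, preferring a local file over feishu-image://.

    `fetch` is a hypothetical stand-in for the drive API download call.
    """
    try:
        data = fetch(token)
        images_dir.mkdir(parents=True, exist_ok=True)
        target = images_dir / f"{token}.png"  # extension is an assumption
        target.write_bytes(data)
        # Path relative to the directory that contains the Markdown file.
        return f"![image]({images_dir.name}/{target.name})"
    except Exception:
        # Permission denied, network error, disk error: keep the old format.
        return f"![image](feishu-image://{token})"
```

Injecting the downloader keeps the fallback path easy to exercise without network access or app credentials.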

Additional Permission Required

Image download requires the Feishu app to have drive:drive:readonly permission enabled. Without it, images will use the fallback feishu-image:// format (same as before).

Test plan

  • Run python3 scripts/fetch_feishu.py <feishu_url> on a document with tables and images
  • Verify tables render as Markdown table syntax (| col1 | col2 |)
  • Verify images download to ~/Downloads/{title}_images/ directory
  • Verify --json mode still works without downloading images
  • Verify documents without tables/images still render correctly
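
When checking the table step of the test plan, it may help to have the expected output shape in mind. The sketch below is an assumption-laden illustration, not the PR's render_table(): it takes already-extracted cell text as a list of rows (first row treated as the header), whereas the real function works on Feishu Table and TableCell blocks. The escaping of pipes and newlines is also an assumption.

```python
from typing import List


def render_markdown_table(rows: List[List[str]]) -> str:
    """Render rows of cell text as a Markdown pipe table (first row is the header)."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def fmt(cells: List[str]) -> str:
        # Pad short rows, escape pipes, and keep multi-line cells on one line.
        padded = list(cells) + [""] * (width - len(cells))
        safe = [c.replace("|", "\\|").replace("\n", "<br>") for c in padded]
        return "| " + " | ".join(safe) + " |"

    lines = [fmt(rows[0]), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in rows[1:])
    return "\n".join(lines)


print(render_markdown_table([["col1", "col2"], ["a", "b"]]))
```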

- Replace unreliable block_type number matching with key-based detection
  (detect_block_kind), fixing bullet lists rendered as empty code blocks
  and images being completely lost
- Add table rendering support (render_table + render_cell_content) for
  Table and TableCell blocks, which were previously unhandled
- Add image download via Feishu drive API (drive:drive:readonly permission
  required), with graceful fallback to feishu-image:// protocol
- All changes are backward compatible with default parameter values