feat(cli): add batch URL input support#64

Open
DyNooob wants to merge 2 commits into StarTrail-org:main from DyNooob:feat/cli-batch-input

DyNooob commented Jun 22, 2026

Summary

Adds support for batch URL input in the PixelRAG CLI.

Users can now provide either a single URL or a .txt file containing multiple URLs (one per line). Each URL is processed using the existing rendering pipeline.

Changes

  • Added support for .txt file input in the CLI
  • Each non-empty line is treated as a URL
  • Single URL usage remains unchanged
  • Batch URLs are processed sequentially using the existing render_url function
  • Failed URLs are skipped without stopping execution

Usage

Single URL

pixelshot https://example.com --output ./tiles

Batch input

pixelshot urls.txt --output ./tiles

Example urls.txt

https://example.com/a
https://example.com/b
https://example.com/c

Notes

  • CLI-only change
  • No changes to rendering / embedding / indexing logic
  • No new dependencies introduced
  • Fully backward compatible

Error handling

  • Invalid or failing URLs are skipped
  • Batch processing continues normally

Impact

This improves usability for processing multiple pages without changing the core system design.

Risk

Low. Isolated CLI enhancement with no core pipeline modifications.

vercel Bot commented Jun 22, 2026

@DyNooob is attempting to deploy a commit to the andylizf's projects Team on Vercel.

A member of the Team first needs to authorize it.

andylizf left a comment

Contributor

Thanks for adding .txt batch input support — the file-reading and error-handling per URL are good additions.

However, this PR has a silent performance regression: the old code called render_urls(urls, ...) which dispatches all URLs across --workers parallel browser processes. The new code loops for url in urls: render_url(url, ...) — making rendering serial across URLs. The --workers flag still exists but is now effectively a no-op on the CLI path, so a user with a 100-URL text file would see ~4x slower throughput without knowing why.

Could you keep the batched render_urls call (which parallelizes across workers) and add the .txt expansion + per-URL error handling around it? Something like:

# expand .txt inputs into URLs
urls = []
for inp in args.inputs:
    if inp.lower().endswith(".txt"):
        urls.extend(line.strip() for line in open(inp) if line.strip())
    else:
        urls.append(inp)

# still use the batched parallel path
render_urls(urls, args.output, workers=args.workers, ...)
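If per-URL error isolation is wanted on top of a parallel path, one option is to submit each URL to a pool and record failures as the futures complete. This is only a sketch under assumptions: it uses a standard-library thread pool as a stand-in for the browser processes that render_urls dispatches to, and `render_parallel`, its `render` callable, and the default of 4 workers are hypothetical, not taken from the PixelRAG code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_parallel(urls, render, workers=4):
    """Run render(url) across a pool; return {url: error message} for failures.

    `render` is a hypothetical callable standing in for the real renderer.
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to its URL so a failure can be attributed.
        futures = {pool.submit(render, url): url for url in urls}
        for future in as_completed(futures):
            exc = future.exception()
            if exc is not None:
                # Record the failure and keep going; other URLs are unaffected.
                failures[futures[future]] = str(exc)
    return failures
```

With this shape, one failing URL does not stop the batch, and the caller can report the skipped URLs after the pool drains.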

Also, ruff format is needed — CI's ruff format --check fails on the new test.

Minor: the bare open(inp) for .txt files should be guarded (FileNotFoundError → clean error message).
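A guarded version of the expansion step might look like the sketch below. The helper name `expand_inputs` and the error wording are assumptions for illustration; the only library behavior relied on is that `argparse.ArgumentParser.error` prints a usage message and exits with status 2.

```python
import argparse


def expand_inputs(inputs, parser):
    """Expand .txt inputs into URLs; report a missing file via parser.error."""
    urls = []
    for inp in inputs:
        if inp.lower().endswith(".txt"):
            try:
                # The context manager closes the file even if reading fails.
                with open(inp, encoding="utf-8") as fh:
                    urls.extend(line.strip() for line in fh if line.strip())
            except FileNotFoundError:
                # Exits with a short usage message instead of a traceback.
                parser.error(f"URL list not found: {inp}")
        else:
            urls.append(inp)
    return urls
```

The returned list can then be passed to the batched call unchanged, so the parallel path is untouched.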

DyNooob commented Jun 23, 2026

Author

Good catch.
I restored the batched render_urls(...) path so --workers continues to parallelize URL rendering, and kept .txt input as a URL expansion step before the batch call.
I also added a clean parser error on FileNotFoundError for missing .txt files and ran ruff format; the new tests now assert that .txt input uses render_urls with the requested workers.
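For readers who want to see what such an assertion could look like, here is a self-contained sketch using unittest.mock. It does not reproduce the PR's tests: `main` is a toy entry point that takes `render_urls` as a parameter so it can be replaced with a mock, and the defaults for --output and --workers are assumed values.

```python
import argparse
import os
import tempfile
from unittest import mock


def main(argv, render_urls):
    """Toy CLI entry point: expand .txt inputs, then make one batched call."""
    parser = argparse.ArgumentParser(prog="pixelshot")
    parser.add_argument("inputs", nargs="+")
    parser.add_argument("--output", default="./tiles")  # assumed default
    parser.add_argument("--workers", type=int, default=4)  # assumed default
    args = parser.parse_args(argv)

    urls = []
    for inp in args.inputs:
        if inp.lower().endswith(".txt"):
            with open(inp, encoding="utf-8") as fh:
                urls.extend(line.strip() for line in fh if line.strip())
        else:
            urls.append(inp)
    # One call for the whole batch, so the worker count still applies.
    render_urls(urls, args.output, workers=args.workers)


def test_txt_input_uses_batched_call():
    fake = mock.Mock()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "urls.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("https://example.com/a\nhttps://example.com/b\n")
        main([path, "--workers", "8"], fake)
    # Exactly one batched call, carrying every URL and the requested workers.
    fake.assert_called_once_with(
        ["https://example.com/a", "https://example.com/b"], "./tiles", workers=8
    )
```

Asserting on a single call with the full URL list is what distinguishes the batched path from a per-URL loop, which would produce one call per URL.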

DyNooob requested a review from andylizf on June 23, 2026 08:24