feat(cli): add batch URL input support #64
@DyNooob is attempting to deploy a commit to the andylizf's projects Team on Vercel. A member of the Team first needs to authorize it.
andylizf left a comment
Thanks for adding `.txt` batch input support. The file reading and the per-URL error handling are good additions.
However, this PR introduces a silent performance regression. The old code called `render_urls(urls, ...)`, which dispatches all URLs across `--workers` parallel browser processes. The new code instead loops `for url in urls: render_url(url, ...)`, so URLs are rendered one at a time. The `--workers` flag still exists but is now effectively a no-op on the CLI path, so a user with a 100-URL text file would see ~4x slower throughput without knowing why.
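To make the throughput difference concrete, here is a minimal sketch contrasting a serial loop with a worker-pool dispatch, both with per-URL error capture. `fake_render_url` is a hypothetical stand-in that sleeps instead of driving a browser, and `concurrent.futures.ThreadPoolExecutor` stands in for whatever process pool `render_urls` actually uses; neither reflects PixelRAG's real implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_render_url(url: str) -> str:
    """Hypothetical stand-in for render_url: simulates 0.1 s of rendering."""
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not a URL: {url!r}")
    time.sleep(0.1)
    return f"rendered {url}"


def _safe_render(url: str):
    # Per-URL error handling: return the exception instead of aborting the batch.
    try:
        return fake_render_url(url)
    except Exception as exc:
        return exc


def render_serial(urls):
    # One URL at a time: wall time grows linearly with len(urls).
    return {url: _safe_render(url) for url in urls}


def render_parallel(urls, workers=4):
    # Up to `workers` URLs in flight at once; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(_safe_render, urls)))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(8)] + ["not-a-url"]
    for fn in (render_serial, render_parallel):
        start = time.perf_counter()
        results = fn(urls)
        failed = [u for u, r in results.items() if isinstance(r, Exception)]
        print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s, failed={failed}")
```

With 8 valid URLs and 4 workers, the parallel version should finish in roughly a quarter of the serial time, and the invalid entry is reported without stopping the other renders.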
Could you keep the batched render_urls call (which parallelizes across workers) and add the .txt expansion + per-URL error handling around it? Something like:
```python
# expand .txt inputs into URLs
urls = []
for inp in args.inputs:
    if inp.lower().endswith(".txt"):
        with open(inp, encoding="utf-8") as f:
            urls.extend(line.strip() for line in f if line.strip())
    else:
        urls.append(inp)
# still use the batched parallel path
render_urls(urls, args.output, workers=args.workers, ...)
```
Also, `ruff format` is needed: CI's `ruff format --check` fails on the new test.
Minor: the bare open(inp) for .txt files should be guarded (FileNotFoundError → clean error message).
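A minimal sketch of the guarded `.txt` expansion, assuming inputs arrive as a list of strings like `args.inputs`. `expand_inputs` is a hypothetical helper name, not an existing PixelRAG function; it exits with a one-line message instead of a traceback when a listed file is missing.

```python
import sys
from pathlib import Path


def expand_inputs(inputs: list[str]) -> list[str]:
    """Hypothetical helper: expand .txt inputs into URLs (one per line)."""
    urls: list[str] = []
    for inp in inputs:
        if inp.lower().endswith(".txt"):
            try:
                text = Path(inp).read_text(encoding="utf-8")
            except FileNotFoundError:
                # Clean, user-facing error (SystemExit, exit code 1) instead of a traceback.
                sys.exit(f"error: URL list file not found: {inp}")
            # Strip surrounding whitespace and skip blank lines.
            urls.extend(line.strip() for line in text.splitlines() if line.strip())
        else:
            # Anything that is not a .txt path is treated as a single URL.
            urls.append(inp)
    return urls
```

The returned list would then be passed unchanged to the single batched `render_urls(urls, args.output, workers=args.workers, ...)` call from the reviewer's snippet, so `--workers` keeps its effect.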
Good catch.
Summary
Adds support for batch URL input in the PixelRAG CLI.
Users can now provide either a single URL or a `.txt` file containing multiple URLs (one per line). Each URL is processed using the existing rendering pipeline.
Changes
- Support for `.txt` file input in the CLI
- Each URL is rendered with the existing `render_url` function
Usage
Single URL
Batch input
Example
urls.txt
Notes
Error handling
Impact
This improves usability for processing multiple pages without changing the core system design.
Risk
Low. Isolated CLI enhancement with no core pipeline modifications.