diff --git a/QUICKSTART.md b/QUICKSTART.md index 8a45966..12cffa9 100644 --- a/QUICKSTART.md +++ b/QUICKSTART.md @@ -14,6 +14,8 @@ uv run playwright install ### 2. Configure LLM (Choose One) +**Important:** Extraction quality varies by LLM - stronger models find more specific tips! + #### Option A: OpenRouter (Recommended - Free Model!) ```bash cp .env.example .env @@ -63,23 +65,29 @@ Extract specific optimization tips for AI services: scapo scrape discover --update # Step 2: Extract tips for specific services -scapo scrape targeted --service "Eleven Labs" --limit 20 -scapo scrape targeted --service "GitHub Copilot" --limit 20 +scapo scrape targeted --service "Eleven Labs" --limit 20 --query-limit 20 +scapo scrape targeted --service "GitHub Copilot" --limit 20 --query-limit 20 # Or batch process by category -scapo scrape batch --category video --limit 15 +scapo scrape batch --category video --limit 20 --batch-size 3 # Process ALL priority services one by one -scapo scrape all --priority ultra --limit 20 # Process all ultra priority services -scapo scrape all --dry-run # Preview what will be processed +scapo scrape all --limit 20 --query-limit 20 --priority ultra # Process all ultra priority services +scapo scrape all --dry-run # Preview what will be processed ``` ### Key Commands: - `discover --update` - Find services from GitHub Awesome lists - `targeted --service NAME` - Extract tips for one service -- `batch --category TYPE` - Process multiple services (limited) +- `batch --category TYPE` - Process ALL services in category (in batches) - `all --priority LEVEL` - Process ALL services one by one +### Important Parameters: +- **--query-limit**: Number of search patterns (5 = quick, 20 = comprehensive) +- **--batch-size**: Services to process in parallel (3 = default balance) +- **--limit**: Posts per search (20+ recommended for best results) + + ## 📚 Approach 2: Legacy Sources Use predefined sources from `sources.yaml`: @@ -109,6 +117,27 @@ scapo models search 
"copilot" # Search for specific models cat models/audio/eleven-labs/cost_optimization.md ``` +### 5. (Optional) Use with Claude Desktop + +Add SCAPO as an MCP server to query your extracted tips (from models/ folder) directly in Claude: + +```json +// Add to claude_desktop_config.json +{ + "mcpServers": { + "scapo": { + "command": "npx", + "args": ["@scapo/mcp-server"], + "env": { + "SCAPO_MODELS_PATH": "path/to/scapo/models" + } + } + } +} +``` + +Then ask Claude: "Get best practices for Midjourney" - no Python needed! + ## 📊 Understanding the Output SCAPO creates organized documentation: @@ -126,13 +155,13 @@ models/ ```bash # ❌ Too few posts = no useful tips found -scapo scrape targeted --service "HeyGen" --limit 5 # ~20% success rate +scapo scrape targeted --service "HeyGen" --limit 5 --query-limit 5 # ~20% success rate # ✅ Sweet spot = reliable extraction -scapo scrape targeted --service "HeyGen" --limit 20 # ~80% success rate +scapo scrape targeted --service "HeyGen" --limit 20 --query-limit 20 # ~80% success rate # 🎯 Maximum insights = comprehensive coverage -scapo scrape targeted --service "HeyGen" --limit 30 # Finds rare edge cases +scapo scrape targeted --service "HeyGen" --limit 30 --query-limit 20 # Finds rare edge cases ``` **Why it matters:** LLMs need multiple examples to identify patterns. More posts = higher chance of finding specific pricing, bugs, and workarounds. 
@@ -148,7 +177,7 @@ LLM_QUALITY_THRESHOLD=0.4 # More tips (less strict) ### "No tips extracted" ```bash # Solution: Use more posts -scapo scrape targeted --service "Service Name" --limit 25 +scapo scrape targeted --service "Service Name" --limit 25 --query-limit 20 ``` ### "Service not found" diff --git a/README.md b/README.md index 74aeb33..2506f5b 100644 --- a/README.md +++ b/README.md @@ -53,7 +53,7 @@ scapo scrape discover --update Extract optimization tips for specific services ```bash -scapo scrape targeted --service "Eleven Labs" --limit 20 +scapo scrape targeted --service "Eleven Labs" --limit 20 --query-limit 20 ``` ![Scapo Discover](assets/scrape-targeted.gif) @@ -61,7 +61,7 @@ scapo scrape targeted --service "Eleven Labs" --limit 20 Batch process multiple priority services (Recommended) ```bash -scapo scrape batch --max-services 3 --category audio +scapo scrape batch --category audio --batch-size 3 --limit 20 ``` ![Scapo Discover](assets/scrape-batch.gif) @@ -89,6 +89,8 @@ uv run playwright install # Browser automation ### 2. Configure Your LLM Provider +**Note:** Extraction quality depends on your chosen LLM - experiment with different models for best results! + #### Recommended: OpenRouter (Cloud) ```bash cp .env.example .env @@ -111,14 +113,14 @@ Get your API key from [openrouter.ai](https://openrouter.ai/) scapo scrape discover --update # Step 2: Extract optimization tips for services -scapo scrape targeted --service "HeyGen" --limit 20 -scapo scrape targeted --service "Midjourney" --limit 20 +scapo scrape targeted --service "HeyGen" --limit 20 --query-limit 20 +scapo scrape targeted --service "Midjourney" --limit 20 --query-limit 20 # Or batch process multiple services -scapo scrape batch --category video --limit 15 +scapo scrape batch --category video --limit 20 --batch-size 3 # Process ALL priority services one by one (i.e. 
all services with 'ultra' tag, see targted_search_generator.py) -scapo scrape all --priority ultra --limit 20 +scapo scrape all --limit 20 --query-limit 20 --priority ultra ``` #### Option B: Legacy method: using sources.yaml file @@ -196,13 +198,13 @@ scapo scrape discover --show-all # List all services scapo scrape targeted \ --service "Eleven Labs" \ # Service name (handles variations, you can put whatever --> if we don't get hit in services.json, then it will be created under 'general' folder) --limit 20 \ # Posts per search (15-20 recommended) - --max-queries 10 # Number of searches + --query-limit 20 # Query patterns per service (20 = all) # Batch process scapo scrape batch \ --category audio \ # Filter by category - --max-services 3 \ # Services to process - --limit 15 # Posts per search + --batch-size 3 \ # Services per batch + --limit 20 # Posts per search ### Legacy Sources Mode @@ -249,13 +251,52 @@ SCRAPING_DELAY_SECONDS=2 # Be respectful MAX_POSTS_PER_SCRAPE=100 # Limit per source ``` -### Why --limit Matters (More Posts = Better Tips) +### Key Parameters Explained + +**--query-limit** (How many search patterns per service) +```bash +--query-limit 5 # Quick scan: 1 pattern per category (cost, optimization, technical, workarounds, bugs) +--query-limit 20 # Full scan: All 4 patterns per category (default, most comprehensive) +``` + +**--batch-size** (For `batch` command: services processed in parallel) +```bash +--batch-size 1 # Sequential (slowest, least resource intensive) +--batch-size 3 # Default (good balance) +--batch-size 5 # Faster (more resource intensive) +``` + +**--limit** (Posts per search - More = Better extraction) ```bash --limit 5 # ❌ Often finds nothing (too few samples) --limit 15 # ✅ Good baseline (finds common issues) --limit 25 # 🎯 Will find something (as long as there is active discussion on it) ``` -so, hand-wavy breakdown: With 5 posts, extraction success ~20%. With 20+ posts, success jumps to ~80%. 
+Hand-wavy breakdown: With 5 posts, extraction success ~20%. With 20+ posts, success jumps to ~80%. + +## 🤖 MCP Server for Claude Desktop + +Query your extracted tips directly in Claude (reads from models/ folder - run scrapers first!): + +```json +// Add to %APPDATA%\Claude\claude_desktop_config.json (Windows) +// or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) +{ + "mcpServers": { + "scapo": { + "command": "npx", + "args": ["@scapo/mcp-server"], + "env": { + "SCAPO_MODELS_PATH": "C:\\path\\to\\scapo\\models" // Your models folder + } + } + } +} +``` + +Then ask Claude: "Get me best practices for GitHub Copilot" or "What models are good for coding?" + +See [mcp/README.md](mcp/README.md) for full setup and available commands. ## 🎨 Interactive TUI diff --git a/assets/intro.gif b/assets/intro.gif index 61e048f..ec56711 100644 Binary files a/assets/intro.gif and b/assets/intro.gif differ diff --git a/assets/legacy.gif b/assets/legacy.gif index 3fe56eb..888be27 100644 Binary files a/assets/legacy.gif and b/assets/legacy.gif differ diff --git a/assets/scrape-batch.gif b/assets/scrape-batch.gif index c2f2d43..b0a5c6e 100644 Binary files a/assets/scrape-batch.gif and b/assets/scrape-batch.gif differ diff --git a/assets/scrape-discovery.gif b/assets/scrape-discovery.gif index c9d0e7f..a55eaad 100644 Binary files a/assets/scrape-discovery.gif and b/assets/scrape-discovery.gif differ diff --git a/assets/scrape-targeted.gif b/assets/scrape-targeted.gif index 7b38c98..685b7bd 100644 Binary files a/assets/scrape-targeted.gif and b/assets/scrape-targeted.gif differ diff --git a/assets/tui.gif b/assets/tui.gif index 065f578..60e0f88 100644 Binary files a/assets/tui.gif and b/assets/tui.gif differ diff --git a/models/audio/eleven-labs/cost_optimization.md b/models/audio/eleven-labs/cost_optimization.md index b775087..21a53fd 100644 --- a/models/audio/eleven-labs/cost_optimization.md +++ b/models/audio/eleven-labs/cost_optimization.md @@ -1,10 +1,25 @@ # 
Eleven Labs - Cost Optimization Guide -*Last updated: 2025-08-14* +*Last updated: 2025-08-16* ## Cost & Pricing Information -- 60% of credits left (~400,000 credits) -- Subscription renewal failed due to paywall issues +- Free trial limited to 10,000 characters per month +- 60% of credits left (about 400,000 credits) +- $15k saved in ElevenLabs fees +- Free access limited to 15 minutes of voice recording per day +- Last year I was paying +$1000/month for AI voiceovers for only one channel. +- $29/month for unlimited usage on ElevenReader. - $99/month plan +- $29/month for unlimited +- Credits should last until June 5th +- 10,000 free credits per month on the free plan. + +## Money-Saving Tips + +- I built my own tool, just for me. No subscriptions, no limits, just fast, clean voice generation. Cost me ~ $4/month to run. +- MiniMax has daily credit refresh in TTS, unlike ElevenLabs where you need to wait 1 month to refresh. +- Use the free plan to get 10,000 credits per month for free. +- So, when I do, I use a temporary email to create a new account so the 10,000 character limit 'resets.' +- When converting text to voice, adding periods between letters (e.g., B.O.D.) can force the model to pronounce acronyms letter by letter, though it may consume more credits.
diff --git a/models/audio/eleven-labs/metadata.json b/models/audio/eleven-labs/metadata.json index a81c5cd..648a41d 100644 --- a/models/audio/eleven-labs/metadata.json +++ b/models/audio/eleven-labs/metadata.json @@ -1,13 +1,13 @@ { "service": "Eleven Labs", "category": "audio", - "last_updated": "2025-08-14T18:53:47.086694", + "last_updated": "2025-08-16T13:46:28.510586", "extraction_timestamp": null, "data_sources": [ "Reddit API", "Community discussions" ], - "posts_analyzed": 79, + "posts_analyzed": 338, "confidence": "medium", "version": "1.0.0" } \ No newline at end of file diff --git a/models/audio/eleven-labs/parameters.json b/models/audio/eleven-labs/parameters.json index c2ea1ac..14be4a6 100644 --- a/models/audio/eleven-labs/parameters.json +++ b/models/audio/eleven-labs/parameters.json @@ -1,6 +1,6 @@ { "service": "Eleven Labs", - "last_updated": "2025-08-14T18:53:46.993256", + "last_updated": "2025-08-16T13:46:28.342822", "recommended_settings": { "setting_0": { "description": "voice_name=Mun W" @@ -16,9 +16,12 @@ } }, "cost_optimization": { - "tip_0": "60% of credits left (~400,000 credits)", - "tip_1": "Subscription renewal failed due to paywall issues", - "pricing": "$99/month plan" + "tip_0": "Free trial limited to 10,000 characters per month", + "tip_1": "60% of credits left (about 400,000 credits)", + "pricing": "$29/month for unlimited", + "tip_3": "Free access limited to 15 minutes of voice recording per day", + "tip_4": "Credits should last until June 5th", + "tip_5": "10,000 free credits per month on the free plan." 
}, "sources": [ "Reddit community", diff --git a/models/audio/eleven-labs/pitfalls.md b/models/audio/eleven-labs/pitfalls.md index 8d0794f..4a0864b 100644 --- a/models/audio/eleven-labs/pitfalls.md +++ b/models/audio/eleven-labs/pitfalls.md @@ -1,16 +1,35 @@ # Eleven Labs - Common Pitfalls & Issues -*Last updated: 2025-08-14* +*Last updated: 2025-08-16* ## Technical Issues -### ⚠️ Unable to switch back to a Custom LLM after testing with a built-in model (gemini-2.0-flash); interface shows 'Fix the errors to proceed' even though Server URL, Model ID, and API Key are correctly filled. +### ⚠️ Cannot switch back to a Custom LLM after testing with a built-in model (gemini-2.0-flash) on the ElevenLabs Conversational AI dashboard; even after correctly filling out Server URL, Model ID, and API Key, the interface still shows the message: 'Fix the errors to proceed' even though there is no error. **Fix**: Store API keys in environment variables or use a secrets manager. -### ⚠️ audio plays back a female voice regardless of which option is selected when using elevenLabs API +### ⚠️ ElevenLabs API always returns a female voice regardless of the selected gender option + +### ⚠️ Tasker Action Error: 'HTTP Request' (step 11) Task: 'Text To Speech To File Elevenlabs {"detail":{"status":"invalid_uid","message". "An invalid ID has been received: %voice_id'. Make sure to provide a correct one."} ## Policy & Account Issues -### ⚠️ Account credits wiped (about 400,000 credits) after attempting to renew a $99/month subscription; paywall prevented payment and support ticket received no response. +### ⚠️ Eleven Labs wiped 400,000 credits from a user's account on the $99/month plan; the user had 60% of credits left (about 400,000 credits) and was unable to renew subscription due to paywall issues. +**Note**: Be aware of terms of service regarding account creation. 
+ +### ⚠️ Free trial for ElevenLabs is limited to 10,000 characters a month, which is insufficient for scripts that are often ~20-40,000 characters long. **Note**: Be aware of terms of service regarding account creation. +## Cost & Limits + +### 💰 ElevenReader credit system is considered bad by some users, making it off-putting for average consumers. + +### 💰 Free access to ElevenLabs is limited to 15 minutes of voice recording per day. + +### 💰 Free trial limited to 10,000 characters per month + +### 💰 Free access limited to 15 minutes of voice recording per day + +### 💰 $29/month for unlimited usage on ElevenReader. + +### 💰 $29/month for unlimited + diff --git a/models/audio/eleven-labs/prompting.md b/models/audio/eleven-labs/prompting.md index 9fbb41e..7d1e1a3 100644 --- a/models/audio/eleven-labs/prompting.md +++ b/models/audio/eleven-labs/prompting.md @@ -1,11 +1,21 @@ # Eleven Labs Prompting Guide -*Last updated: 2025-08-14* +*Last updated: 2025-08-16* ## Tips & Techniques +- I built my own tool, just for me. No subscriptions, no limits, just fast, clean voice generation. Cost me ~ $4/month to run. +- Use ElevenLabsService(voice_name="Mun W") in Manim Voiceover +- MiniMax has daily credit refresh in TTS, unlike ElevenLabs where you need to wait 1 month to refresh. +- The ElevenLabs voice agent is the entry point into the whole system, and then it will pass off web development or web design requests over to n8n agents via a webhook in order to actually do the work. +- Use the free plan to get 10,000 credits per month for free. +- So, when I do, I use a temporary email to create a new account so the 10,000 character limit 'resets.' - self.set_speech_service(ElevenLabsService(voice_name="Mun W")) +- MacWhisper 11.10 supports ElevenLabs Scribe for cloud transcription. - from manim_voiceover.services.elevenlabs import ElevenLabsService +- I built my own tool to avoid ElevenLabs fees.
+- When converting text to voice, adding periods between letters (e.g., B.O.D.) can force the model to pronounce acronyms letter by letter, though it may consume more credits. +- ElevenLabs Scribe v1 achieves 15.0% WER on 5-10 minute patient-doctor chats, averaging 36 seconds per file. ## Recommended Settings diff --git a/models/audio/firefliesai/metadata.json b/models/audio/firefliesai/metadata.json new file mode 100644 index 0000000..42938a1 --- /dev/null +++ b/models/audio/firefliesai/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Fireflies.ai", + "category": "audio", + "last_updated": "2025-08-16T13:46:29.623761", + "extraction_timestamp": "2025-08-16T13:29:54.297790", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 171, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/firefliesai/pitfalls.md b/models/audio/firefliesai/pitfalls.md new file mode 100644 index 0000000..76989fb --- /dev/null +++ b/models/audio/firefliesai/pitfalls.md @@ -0,0 +1,8 @@ +# Fireflies.ai - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Technical Issues + +### ⚠️ Failed to create a send channel message in Slack. Error from Slack: invalid_thread_ts + diff --git a/models/audio/firefliesai/prompting.md b/models/audio/firefliesai/prompting.md new file mode 100644 index 0000000..0d2dc0d --- /dev/null +++ b/models/audio/firefliesai/prompting.md @@ -0,0 +1,14 @@ +# Fireflies.ai Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- Configure Zapier to send transcripts to a channel without duplicate notifications by adjusting thread settings +- Use custom prompts called 'apps' in Fireflies.ai to create reusable ready‑made prompts. 
+- Use Zapier to send Fireflies.ai transcripts to Slack + +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/audio/mubert/metadata.json b/models/audio/mubert/metadata.json new file mode 100644 index 0000000..3a899a4 --- /dev/null +++ b/models/audio/mubert/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Mubert", + "category": "audio", + "last_updated": "2025-08-16T13:46:30.709332", + "extraction_timestamp": "2025-08-16T13:42:52.386062", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 39, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/mubert/prompting.md b/models/audio/mubert/prompting.md new file mode 100644 index 0000000..01ce399 --- /dev/null +++ b/models/audio/mubert/prompting.md @@ -0,0 +1,12 @@ +# Mubert Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- If the order was approved by the moderator and submitted in due time, payment will be made within 7-10 business days. 
+ +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/audio/murf-ai/cost_optimization.md b/models/audio/murf-ai/cost_optimization.md new file mode 100644 index 0000000..4a30bf6 --- /dev/null +++ b/models/audio/murf-ai/cost_optimization.md @@ -0,0 +1,8 @@ +# Murf AI - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- Get 33% OFF on Murf AI annual plans Today — [Click Here to Redeem](https://get.murf.ai/pu42t7km32e9) + diff --git a/models/audio/murf-ai/metadata.json b/models/audio/murf-ai/metadata.json new file mode 100644 index 0000000..5e08fb5 --- /dev/null +++ b/models/audio/murf-ai/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Murf AI", + "category": "audio", + "last_updated": "2025-08-16T13:46:28.892820", + "extraction_timestamp": "2025-08-16T13:25:27.042235", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 111, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/murf-ai/parameters.json b/models/audio/murf-ai/parameters.json new file mode 100644 index 0000000..072d4bc --- /dev/null +++ b/models/audio/murf-ai/parameters.json @@ -0,0 +1,12 @@ +{ + "service": "Murf AI", + "last_updated": "2025-08-16T13:46:28.710482", + "recommended_settings": {}, + "cost_optimization": { + "tip_0": "Get 33% OFF on Murf AI annual plans Today \u2014 [Click Here to Redeem](https://get.murf.ai/pu42t7km32e9)" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/audio/murf-ai/prompting.md b/models/audio/murf-ai/prompting.md new file mode 100644 index 0000000..dd1827c --- /dev/null +++ b/models/audio/murf-ai/prompting.md @@ -0,0 +1,12 @@ +# Murf AI Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- Murf AI turns plain text into ultra realistic speech across 20+ languages with over 200 voices—no recording booth required. 
+ +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/audio/otterai/cost_optimization.md b/models/audio/otterai/cost_optimization.md new file mode 100644 index 0000000..9bad7e8 --- /dev/null +++ b/models/audio/otterai/cost_optimization.md @@ -0,0 +1,14 @@ +# Otter.ai - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- subscription tier that had the 100 hours of audio transcription per month + +## Money-Saving Tips + +- The free tier of Otter.ai feels limited, especially for users needing more than min of transcription per month for FREE +- $100 per year for the paid plan that includes automatic video import +- Use Otter.ai to transcribe your videos, which offers 600 minutes of transcription per month for FREE. + diff --git a/models/audio/otterai/metadata.json b/models/audio/otterai/metadata.json new file mode 100644 index 0000000..b2075bd --- /dev/null +++ b/models/audio/otterai/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Otter.ai", + "category": "audio", + "last_updated": "2025-08-16T13:46:29.256512", + "extraction_timestamp": null, + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 174, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/otterai/parameters.json b/models/audio/otterai/parameters.json new file mode 100644 index 0000000..266cbca --- /dev/null +++ b/models/audio/otterai/parameters.json @@ -0,0 +1,12 @@ +{ + "service": "Otter.ai", + "last_updated": "2025-08-16T13:46:29.085739", + "recommended_settings": {}, + "cost_optimization": { + "tip_0": "subscription tier that had the 100 hours of audio transcription per month" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/audio/otterai/prompting.md b/models/audio/otterai/prompting.md new file mode 100644 index 0000000..e0f1670 --- /dev/null +++ 
b/models/audio/otterai/prompting.md @@ -0,0 +1,15 @@ +# Otter.ai Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- The free tier of Otter.ai feels limited, especially for users needing more than min of transcription per month for FREE +- $100 per year for the paid plan that includes automatic video import +- Use Otter.ai to transcribe your videos, which offers 600 minutes of transcription per month for FREE. +- Otter.ai allows you to import your videos automatically for transcription. + +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/audio/stable-audio/cost_optimization.md b/models/audio/stable-audio/cost_optimization.md new file mode 100644 index 0000000..2ac60cd --- /dev/null +++ b/models/audio/stable-audio/cost_optimization.md @@ -0,0 +1,8 @@ +# Stable Audio - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- free version for generating and downloading tracks up to 45 seconds long + diff --git a/models/audio/stable-audio/metadata.json b/models/audio/stable-audio/metadata.json new file mode 100644 index 0000000..c1dc7e9 --- /dev/null +++ b/models/audio/stable-audio/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Stable Audio", + "category": "audio", + "last_updated": "2025-08-16T13:46:30.308483", + "extraction_timestamp": "2025-08-16T13:35:08.539487", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 142, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/stable-audio/parameters.json b/models/audio/stable-audio/parameters.json new file mode 100644 index 0000000..e79ca91 --- /dev/null +++ b/models/audio/stable-audio/parameters.json @@ -0,0 +1,12 @@ +{ + "service": "Stable Audio", + "last_updated": "2025-08-16T13:46:30.150034", + "recommended_settings": {}, + "cost_optimization": { + "tip_0": "free version for generating and downloading tracks up to 45 seconds 
long" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/audio/stable-audio/pitfalls.md b/models/audio/stable-audio/pitfalls.md new file mode 100644 index 0000000..9681165 --- /dev/null +++ b/models/audio/stable-audio/pitfalls.md @@ -0,0 +1,8 @@ +# Stable Audio - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Technical Issues + +### ⚠️ I'm not sure if this is an intentional thing or a bug, but when I try to disable stable audio, it just doesn't do anything? I'll click it, it will take me out of the video settings, but will keep stable audio on. Has anyone else encountered this, and if so, is there a surefire way of fixing it, or is it meant to not be turned off for safety reasons or something? + diff --git a/models/audio/stable-audio/prompting.md b/models/audio/stable-audio/prompting.md new file mode 100644 index 0000000..fe556c4 --- /dev/null +++ b/models/audio/stable-audio/prompting.md @@ -0,0 +1,19 @@ +# Stable Audio Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- Stability AI has released Stable Audio 2.0, an AI model that generates high-quality, full-length audio tracks up to 3 minutes long with coherent musical structure. +- Stable Audio 2.0 introduces audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts. +- Stable audio by placing the SSDT-HPET.aml and some settings in the config.plist file. +- Stable Audio 2.0 can generate high-quality, full-length audio tracks up to 3 minutes long with coherent musical structure. +- The model introduces audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts. 
+- Generates high-quality, full-length audio tracks up to 3 minutes long with coherent musical structure +- The model introduces audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts +- The model introduces audio-to-audio generation, allowing users to transform uploaded samples using natural language prompts, and enhances sound effect gener + +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/audio/whisper/cost_optimization.md b/models/audio/whisper/cost_optimization.md new file mode 100644 index 0000000..ce1ee78 --- /dev/null +++ b/models/audio/whisper/cost_optimization.md @@ -0,0 +1,13 @@ +# Whisper - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- free, private, and unlimited transcription system + +## Money-Saving Tips + +- I believe in the power of open-source tools and want to share how you can set up a free, private, and unlimited transcription system on your own computer using OpenAI's Whisper. +- MacWhisper is a free Mac app to transcribe audio and video files for easy transcription and subtitle generation. 
+ diff --git a/models/audio/whisper/metadata.json b/models/audio/whisper/metadata.json new file mode 100644 index 0000000..7ddf94b --- /dev/null +++ b/models/audio/whisper/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Whisper", + "category": "audio", + "last_updated": "2025-08-16T13:46:29.973474", + "extraction_timestamp": "2025-08-16T13:31:51.650731", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 380, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/audio/whisper/parameters.json b/models/audio/whisper/parameters.json new file mode 100644 index 0000000..fedfb96 --- /dev/null +++ b/models/audio/whisper/parameters.json @@ -0,0 +1,37 @@ +{ + "service": "Whisper", + "last_updated": "2025-08-16T13:46:29.798264", + "recommended_settings": { + "setting_0": { + "description": "language=en" + }, + "setting_1": { + "description": "beam_size=5" + }, + "setting_2": { + "description": "no_speech_threshold=0.3" + }, + "setting_3": { + "description": "condition_on_previous_text=False" + }, + "setting_4": { + "description": "temperature=0" + }, + "setting_5": { + "description": "vad_filter=True" + }, + "setting_6": { + "description": "model=base" + }, + "setting_7": { + "description": "endpoint=http://192.168.60.96:5000/transcribe" + } + }, + "cost_optimization": { + "unlimited_option": "free, private, and unlimited transcription system" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/audio/whisper/pitfalls.md b/models/audio/whisper/pitfalls.md new file mode 100644 index 0000000..228339a --- /dev/null +++ b/models/audio/whisper/pitfalls.md @@ -0,0 +1,18 @@ +# Whisper - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Technical Issues + +### ⚠️ faster-whisper is not compatible with the Open AI API + +### ⚠️ whisper.cpp is not compatible with the Open AI API + +### ⚠️ Need alternative to MacWhisper that allows 
connecting an API to use Whisper via the Groq API and is open‑source and free + +### ⚠️ OpenWebUI does not offer API endpoint for Whisper (for audio transcriptions) + +## Cost & Limits + +### 💰 free, private, and unlimited transcription system + diff --git a/models/audio/whisper/prompting.md b/models/audio/whisper/prompting.md new file mode 100644 index 0000000..67393a9 --- /dev/null +++ b/models/audio/whisper/prompting.md @@ -0,0 +1,29 @@ +# Whisper Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- Use faster-whisper-server (now speaches) for a local STT server that adheres to the Open AI Whisper API +- Use the offline version of Whisper that does not require an API key so you won't be paying a few cents each time the scripts run +- It can recognize speech in numerous languages and convert it to text. +- I believe in the power of open-source tools and want to share how you can set up a free, private, and unlimited transcription system on your own computer using OpenAI's Whisper. +- Use transcribe(file_path, language="en", beam_size=5, no_speech_threshold=0.3, condition_on_previous_text=False, temperature=0, vad_filter=True) to minimize hallucinations +- Use curl to send audio to local Whisper API: curl -X POST -F "audio=@/2025-02-03_14-31-12.m4a" -F "model=base" http://192.168.60.96:5000/transcribe +- MacWhisper is a free Mac app to transcribe audio and video files for easy transcription and subtitle generation. 
+ +## Recommended Settings + +- language=en +- beam_size=5 +- no_speech_threshold=0.3 +- condition_on_previous_text=False +- temperature=0 +- vad_filter=True +- model=base +- endpoint=http://192.168.60.96:5000/transcribe + +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/models/video/beatovenai/cost_optimization.md b/models/video/beatovenai/cost_optimization.md new file mode 100644 index 0000000..2605501 --- /dev/null +++ b/models/video/beatovenai/cost_optimization.md @@ -0,0 +1,12 @@ +# Beatoven.ai - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- it's absolutely free!! + +## Money-Saving Tips + +- You can recompose, edit and change as per your convenience for any time limit you want + diff --git a/models/video/beatovenai/metadata.json b/models/video/beatovenai/metadata.json new file mode 100644 index 0000000..eb85bb3 --- /dev/null +++ b/models/video/beatovenai/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Beatoven.ai", + "category": "video", + "last_updated": "2025-08-16T13:04:55.894824", + "extraction_timestamp": "2025-08-16T13:03:25.229739", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 25, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/video/beatovenai/parameters.json b/models/video/beatovenai/parameters.json new file mode 100644 index 0000000..b223a99 --- /dev/null +++ b/models/video/beatovenai/parameters.json @@ -0,0 +1,12 @@ +{ + "service": "Beatoven.ai", + "last_updated": "2025-08-16T13:04:55.144659", + "recommended_settings": {}, + "cost_optimization": { + "tip_0": "it's absolutely free!!" 
+ },
+ "sources": [
+ "Reddit community",
+ "User reports"
+ ]
+}
\ No newline at end of file
diff --git a/models/video/beatovenai/prompting.md b/models/video/beatovenai/prompting.md
new file mode 100644
index 0000000..ad632b5
--- /dev/null
+++ b/models/video/beatovenai/prompting.md
@@ -0,0 +1,13 @@
+# Beatoven.ai Prompting Guide
+
+*Last updated: 2025-08-16*
+
+## Tips & Techniques
+
+- The tool is simple to use: choose the duration, tempo, genre, and mood, and voila! You have the tune that carries your story.
+- You can recompose, edit, and change the track at your convenience for any time limit you want
+
+## Sources
+
+- Reddit community discussions
+- User-reported experiences
diff --git a/models/video/generative-ai-video/cost_optimization.md b/models/video/generative-ai-video/cost_optimization.md
new file mode 100644
index 0000000..38addc4
--- /dev/null
+++ b/models/video/generative-ai-video/cost_optimization.md
@@ -0,0 +1,13 @@
+# Generative AI Video - Cost Optimization Guide
+
+*Last updated: 2025-08-16*
+
+## Cost & Pricing Information
+
+- $1 per video
+- $250/month
+
+## Money-Saving Tips
+
+- I made a free, open-source app to generate AI video from images directly in your DaVinci Resolve Media Pool.
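The "$1 per video" and "$250/month" figures above come from separate community reports, so any comparison between them is illustrative only. Under that assumption, the break-even volume can be sketched as:

```python
# Illustrative only: the per-video and monthly figures above come from
# separate community reports and may not describe the same service tier.
PER_VIDEO_COST = 1.00        # dollars per video (community report)
MONTHLY_PLAN_COST = 250.00   # dollars per month (community report)

def cheaper_plan(videos_per_month: int) -> str:
    """Return which pricing model is cheaper at a given monthly volume."""
    pay_as_you_go = videos_per_month * PER_VIDEO_COST
    return "subscription" if pay_as_you_go > MONTHLY_PLAN_COST else "pay-per-video"

# Break-even sits at 250 videos/month: below that, paying per video wins.
```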
+ diff --git a/models/video/generative-ai-video/metadata.json b/models/video/generative-ai-video/metadata.json new file mode 100644 index 0000000..fc0c708 --- /dev/null +++ b/models/video/generative-ai-video/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Generative AI Video", + "category": "video", + "last_updated": "2025-08-16T13:04:51.485428", + "extraction_timestamp": "2025-08-16T12:57:13.619612", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 137, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/video/generative-ai-video/parameters.json b/models/video/generative-ai-video/parameters.json new file mode 100644 index 0000000..8ca0a1d --- /dev/null +++ b/models/video/generative-ai-video/parameters.json @@ -0,0 +1,16 @@ +{ + "service": "Generative AI Video", + "last_updated": "2025-08-16T13:04:50.720592", + "recommended_settings": { + "setting_0": { + "description": "aspect_ratio=9:16" + } + }, + "cost_optimization": { + "pricing": "$250/month" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/video/generative-ai-video/pitfalls.md b/models/video/generative-ai-video/pitfalls.md new file mode 100644 index 0000000..53d696f --- /dev/null +++ b/models/video/generative-ai-video/pitfalls.md @@ -0,0 +1,8 @@ +# Generative AI Video - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Technical Issues + +### ⚠️ Trying to generate AI video to go along with my music, but I get the same error every time. Image is upscaled, passed to VAE encode into a KSampler where the error occurs. Error occurred when executing KSampler: cannot access local variable 'cond_item' where it is not associated with a value. 
+
diff --git a/models/video/generative-ai-video/prompting.md b/models/video/generative-ai-video/prompting.md
new file mode 100644
index 0000000..1045ebf
--- /dev/null
+++ b/models/video/generative-ai-video/prompting.md
@@ -0,0 +1,20 @@
+# Generative AI Video Prompting Guide
+
+*Last updated: 2025-08-16*
+
+## Tips & Techniques
+
+- I have another tool I created that generates AI videos from text.
+- I made a free, open-source app to generate AI video from images directly in your DaVinci Resolve Media Pool.
+- Use the Veo 3 Fast API to automatically generate AI videos in seconds
+- AnimateDiff in ComfyUI is an amazing way to generate AI videos.
+- Workflows are on Civitai (https://civitai.com/articles/2379), along with an illustrated guide
+
+## Recommended Settings
+
+- aspect_ratio=9:16
+
+## Sources
+
+- Reddit community discussions
+- User-reported experiences
diff --git a/models/video/heygen/cost_optimization.md b/models/video/heygen/cost_optimization.md
index 0ef6237..8b70a30 100644
--- a/models/video/heygen/cost_optimization.md
+++ b/models/video/heygen/cost_optimization.md
@@ -1,16 +1,10 @@
 # HeyGen - Cost Optimization Guide
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
 
 ## Cost & Pricing Information
 
-- Why am I doing this? HeyGen requires a minimum of two seats for
-- This is the same price as a single seat in the Team plan when billed annually.
-- You get all Team plan features and benefits.
-- You pay $30/month (billed monthly).
-- No annual commitment required from you.
-
-## Money-Saving Tips
-
-- If you need longer videos and priority rendering, consider the HeyGen Team plan.
+- Creating 40 videos of 30 minutes each would cost approximately $4,800.
+- $120 per 30 minutes (credits) for HeyGen video generation.
+- You pay $30/month (billed monthly) for a single seat in the HeyGen Team plan, which is the same price as a single seat when billed annually.
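The two HeyGen figures above are consistent with each other: at the community-reported rate of $120 per 30 minutes of generated video, 40 thirty-minute videos come to $4,800. A small sketch of that arithmetic (the rate is a user report, not official pricing):

```python
# Sanity check of the community-reported HeyGen cost figures above.
COST_PER_30_MIN = 120  # dollars per 30 minutes of video (user-reported)

def batch_cost(num_videos: int, minutes_each: int) -> float:
    """Estimated cost of a batch at the reported per-30-minute rate."""
    return num_videos * (minutes_each / 30) * COST_PER_30_MIN

# 40 videos x 30 minutes each -> $4,800, matching the figure above.
```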
diff --git a/models/video/heygen/metadata.json b/models/video/heygen/metadata.json index 03f203f..3bf4dec 100644 --- a/models/video/heygen/metadata.json +++ b/models/video/heygen/metadata.json @@ -1,13 +1,13 @@ { "service": "HeyGen", "category": "video", - "last_updated": "2025-08-14T18:53:16.244752", - "extraction_timestamp": "2025-08-14T18:53:16.033238", + "last_updated": "2025-08-16T13:04:47.180134", + "extraction_timestamp": "2025-08-16T12:43:38.426608", "data_sources": [ "Reddit API", "Community discussions" ], - "posts_analyzed": 48, + "posts_analyzed": 226, "confidence": "medium", "version": "1.0.0" } \ No newline at end of file diff --git a/models/video/heygen/parameters.json b/models/video/heygen/parameters.json index 5198e59..0c5080a 100644 --- a/models/video/heygen/parameters.json +++ b/models/video/heygen/parameters.json @@ -1,13 +1,19 @@ { "service": "HeyGen", - "last_updated": "2025-08-14T18:53:16.148751", - "recommended_settings": {}, + "last_updated": "2025-08-16T13:04:46.544871", + "recommended_settings": { + "setting_0": { + "description": "voice_id=6d9be61f6e" + }, + "setting_1": { + "description": "type=voice" + }, + "setting_2": { + "description": "name=script_01" + } + }, "cost_optimization": { - "tip_0": "Why am I doing this? HeyGen requires a minimum of two seats for", - "tip_1": "This is the same price as a single seat in the Team plan when billed annually.", - "tip_2": "You get all Team plan features and benefits.", - "pricing": "You pay $30/month (billed monthly).", - "tip_4": "No annual commitment required from you." + "pricing": "You pay $30/month (billed monthly) for a single seat in the HeyGen Team plan, which is the same price as a single seat when billed annually." 
}, "sources": [ "Reddit community", diff --git a/models/video/heygen/pitfalls.md b/models/video/heygen/pitfalls.md new file mode 100644 index 0000000..fe3f3ab --- /dev/null +++ b/models/video/heygen/pitfalls.md @@ -0,0 +1,20 @@ +# HeyGen - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Technical Issues + +### ⚠️ "Text missing: CQ8hKNwp" error indicates missing text in API request + +### ⚠️ hit a red flag when I got to the API part and needed a credit card. + +### ⚠️ 404 error when loading generated video URL in N8N workflow + +### ⚠️ I decided to ask the HeyGen GPT to give me a worst case scenario on how bad could API costs r + +## Cost & Limits + +### 💰 HeyGen has a 30-minute limit per audio when using the unlimited plan, which is more focused on video. + +### 💰 HeyGen has limits on a paid subscription. + diff --git a/models/video/heygen/prompting.md b/models/video/heygen/prompting.md index 0b4e16b..105ee12 100644 --- a/models/video/heygen/prompting.md +++ b/models/video/heygen/prompting.md @@ -1,12 +1,23 @@ # HeyGen Prompting Guide -*Last updated: 2025-08-14* +*Last updated: 2025-08-16* ## Tips & Techniques -- Go to HeyGen, click on 'Create New Avatar', pick the 'Hyper-Realistic' option, and upload a clear, 2-minute video of yourself to generate your avatar. -- After creating the avatar, start a new video project in HeyGen and connect your AI voice to the avatar. -- If you need longer videos and priority rendering, consider the HeyGen Team plan. +- HeyGen offers over 100+ customizable avatars for creating videos. +- Connect your AI voice to your avatar: Start a new video project in HeyGen. +- HeyGen affiliate program provides a 25% recurring commission for 12 months. 
+- Include a "text" field in the template JSON for each script variable to avoid the "Text missing" error +- Check that authentication headers or tokens are correctly set in the N8N HeyGen node to prevent 404 responses +- Verify that the video URL returned by HeyGen is correct and that the video has finished processing before attempting to load it in N8N +- Create New Avatar: Head over to HeyGen, click on 'Create New Avatar', pick the 'Hyper-Realistic' option, and upload a clear, 2-minute video of yourself to generate your avatar. +- Use HeyGen clone to automatically post short form videos about trending topics across multiple social media platforms. + +## Recommended Settings + +- voice_id=6d9be61f6e +- type=voice +- name=script_01 ## Sources diff --git a/models/video/lovoai/cost_optimization.md b/models/video/lovoai/cost_optimization.md new file mode 100644 index 0000000..e1bb4b4 --- /dev/null +++ b/models/video/lovoai/cost_optimization.md @@ -0,0 +1,12 @@ +# Lovo.ai - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- FREE + +## Money-Saving Tips + +- beta test offer to produce your work into audio format for FREE with AI + diff --git a/models/video/lovoai/metadata.json b/models/video/lovoai/metadata.json new file mode 100644 index 0000000..313ea20 --- /dev/null +++ b/models/video/lovoai/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Lovo.ai", + "category": "video", + "last_updated": "2025-08-16T13:04:54.434190", + "extraction_timestamp": "2025-08-16T13:02:23.552424", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 42, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/video/lovoai/parameters.json b/models/video/lovoai/parameters.json new file mode 100644 index 0000000..fee7c35 --- /dev/null +++ b/models/video/lovoai/parameters.json @@ -0,0 +1,12 @@ +{ + "service": "Lovo.ai", + "last_updated": "2025-08-16T13:04:53.737025", + 
"recommended_settings": {},
+ "cost_optimization": {
+ "tip_0": "FREE"
+ },
+ "sources": [
+ "Reddit community",
+ "User reports"
+ ]
+}
\ No newline at end of file
diff --git a/models/video/lovoai/prompting.md b/models/video/lovoai/prompting.md
new file mode 100644
index 0000000..5ba49c1
--- /dev/null
+++ b/models/video/lovoai/prompting.md
@@ -0,0 +1,12 @@
+# Lovo.ai Prompting Guide
+
+*Last updated: 2025-08-16*
+
+## Tips & Techniques
+
+- Beta-test offer: produce your work in audio format for free with AI
+
+## Sources
+
+- Reddit community discussions
+- User-reported experiences
diff --git a/models/video/meetgeek/metadata.json b/models/video/meetgeek/metadata.json
new file mode 100644
index 0000000..19d61e0
--- /dev/null
+++ b/models/video/meetgeek/metadata.json
@@ -0,0 +1,13 @@
+{
+ "service": "MeetGeek",
+ "category": "video",
+ "last_updated": "2025-08-16T13:04:52.935395",
+ "extraction_timestamp": "2025-08-16T12:59:13.624879",
+ "data_sources": [
+ "Reddit API",
+ "Community discussions"
+ ],
+ "posts_analyzed": 6,
+ "confidence": "medium",
+ "version": "1.0.0"
+}
\ No newline at end of file
diff --git a/models/video/meetgeek/prompting.md b/models/video/meetgeek/prompting.md
new file mode 100644
index 0000000..3c9b44b
--- /dev/null
+++ b/models/video/meetgeek/prompting.md
@@ -0,0 +1,15 @@
+# MeetGeek Prompting Guide
+
+*Last updated: 2025-08-16*
+
+## Tips & Techniques
+
+- MeetGeek supports platforms: Zoom, MS Teams, Google Meet, any browser-based meeting + offline meetings
+- MeetGeek offers summaries & insights: action items, next steps, talk-to-listen ratios, sentiment, and key topic detection
+- MeetGeek provides accurate AI-powered transcripts in 50+ languages
+- MeetGeek enables collaboration via shareable meeting snippets
+
+## Sources
+
+- Reddit community discussions
+- User-reported experiences
diff --git a/models/video/pika/cost_optimization.md b/models/video/pika/cost_optimization.md
new file mode 100644
index 0000000..6bc074a
--- /dev/null
+++ b/models/video/pika/cost_optimization.md @@ -0,0 +1,15 @@ +# Pika - Cost Optimization Guide + +*Last updated: 2025-08-16* + +## Cost & Pricing Information + +- Switching to 5GB tier saves $0.15/month +- free for next five days +- $1.41/mo for 10GB tier +- $1.26/mo for 5GB tier + +## Money-Saving Tips + +- Switch to the 5GB tier to save $0.15/month. + diff --git a/models/video/pika/metadata.json b/models/video/pika/metadata.json index d4f4cf2..fbee3ab 100644 --- a/models/video/pika/metadata.json +++ b/models/video/pika/metadata.json @@ -1,13 +1,13 @@ { "service": "Pika", "category": "video", - "last_updated": "2025-08-12T20:07:40.706852", - "extraction_timestamp": "2025-08-12T20:07:30.860248", + "last_updated": "2025-08-16T13:04:45.881806", + "extraction_timestamp": "2025-08-16T12:40:48.986723", "data_sources": [ "Reddit API", "Community discussions" ], - "posts_analyzed": 86, + "posts_analyzed": 333, "confidence": "medium", "version": "1.0.0" } \ No newline at end of file diff --git a/models/video/pika/parameters.json b/models/video/pika/parameters.json new file mode 100644 index 0000000..85caadc --- /dev/null +++ b/models/video/pika/parameters.json @@ -0,0 +1,13 @@ +{ + "service": "Pika", + "last_updated": "2025-08-16T13:04:45.145886", + "recommended_settings": {}, + "cost_optimization": { + "pricing": "$1.26/mo for 5GB tier", + "tip_1": "free for next five days" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ No newline at end of file diff --git a/models/video/pika/pitfalls.md b/models/video/pika/pitfalls.md index e824d4e..11d0a44 100644 --- a/models/video/pika/pitfalls.md +++ b/models/video/pika/pitfalls.md @@ -1,8 +1,6 @@ # Pika - Common Pitfalls & Issues -*Last updated: 2025-08-12* +*Last updated: 2025-08-16* -## Technical Issues - -### ⚠️ Blank window bug when using Pika backup on Intel HD 4000 GPU +*No major issues reported yet. 
This may indicate limited community data.* diff --git a/models/video/pika/prompting.md b/models/video/pika/prompting.md index 1f51089..d6490d7 100644 --- a/models/video/pika/prompting.md +++ b/models/video/pika/prompting.md @@ -1,8 +1,12 @@ # Pika Prompting Guide -*Last updated: 2025-08-12* +*Last updated: 2025-08-16* -*No specific prompting tips available yet. Check back for updates.* +## Tips & Techniques + +- Check updates on GNOME software to get new mesa drivers and fix the blank window issue for apps using GTK4 such as pika backup +- Enable automatic backups for your pods. +- Switch to the 5GB tier to save $0.15/month. ## Sources diff --git a/models/video/runway/cost_optimization.md b/models/video/runway/cost_optimization.md index ea2e55c..3ca14bb 100644 --- a/models/video/runway/cost_optimization.md +++ b/models/video/runway/cost_optimization.md @@ -1,8 +1,16 @@ # Runway - Cost Optimization Guide -*Last updated: 2025-08-14* +*Last updated: 2025-08-16* ## Cost & Pricing Information +- lol see the limit is 3 +- $95/month Unlimited plan +- $35/month pro plan for 90 videos (2250 credits) +- $15/month standard plan for 25 videos (625 credits, 25 credits per video) - $0.08 (8 credits) per image generation +## Money-Saving Tips + +- Use automation to bypass throttling on Unlimited accounts + diff --git a/models/video/runway/metadata.json b/models/video/runway/metadata.json index 0ae9460..8a0cd23 100644 --- a/models/video/runway/metadata.json +++ b/models/video/runway/metadata.json @@ -1,13 +1,13 @@ { "service": "Runway", "category": "video", - "last_updated": "2025-08-14T18:58:55.287621", - "extraction_timestamp": null, + "last_updated": "2025-08-16T13:04:44.483857", + "extraction_timestamp": "2025-08-16T12:37:17.234079", "data_sources": [ "Reddit API", "Community discussions" ], - "posts_analyzed": 75, + "posts_analyzed": 400, "confidence": "medium", "version": "1.0.0" } \ No newline at end of file diff --git a/models/video/runway/parameters.json 
b/models/video/runway/parameters.json
index 224b659..d7097f3 100644
--- a/models/video/runway/parameters.json
+++ b/models/video/runway/parameters.json
@@ -1,12 +1,55 @@
 {
 "service": "Runway",
- "last_updated": "2025-08-14T18:58:55.198172",
+ "last_updated": "2025-08-16T13:04:43.762221",
 "recommended_settings": {
 "setting_0": {
- "description": "max_reference_images=3"
+ "description": "credits_per_video=25"
+ },
+ "setting_1": {
+ "description": "standard_plan_videos=25"
+ },
+ "setting_2": {
+ "description": "standard_plan_credits=625"
+ },
+ "setting_3": {
+ "description": "pro_plan_videos=90"
+ },
+ "setting_4": {
+ "description": "pro_plan_credits=2250"
+ },
+ "setting_5": {
+ "description": "unlimited_plan_cost=$95"
+ },
+ "setting_6": {
+ "description": "standard_plan_cost=$15"
+ },
+ "setting_7": {
+ "description": "pro_plan_cost=$35"
+ },
+ "setting_8": {
+ "description": "automation_workaround=use automation script"
+ },
+ "setting_9": {
+ "description": "credits_per_image=8"
+ },
+ "setting_10": {
+ "description": "cost_per_image=0.08"
+ },
+ "setting_11": {
+ "description": "max_references=3"
+ },
+ "setting_12": {
+ "description": "sdk_version=python_v3.1"
+ },
+ "setting_13": {
+ "description": "api_docs=https://docs.dev.runwayml.com/"
+ },
+ "setting_14": {
+ "description": "api_key_url=https://dev.runwayml.com/"
 }
 },
 "cost_optimization": {
+ "tip_0": "daily image generation limit is 3",
 "pricing": "$0.08 (8 credits) per image generation"
 },
 "sources": [
diff --git a/models/video/runway/pitfalls.md b/models/video/runway/pitfalls.md
index 1874557..ced6cdb 100644
--- a/models/video/runway/pitfalls.md
+++ b/models/video/runway/pitfalls.md
@@ -1,8 +1,23 @@
 # Runway - Common Pitfalls & Issues
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
+
+## Technical Issues
+
+### ⚠️ Stuck generating only one video at a time, with incredibly slow rendering times
+
+## Policy & Account Issues
+
+### ⚠️ Runway Unlimited accounts ($95/month) experience prominent throttling
+**Note**: Be aware of terms of service regarding account creation.
 
 ## Cost & Limits
 
-### 💰 limit is 3
+### 💰 Trained-model (character) generations can ignore the selected style, returning photographic/realistic results that look nothing like the trained model and wasting paid credits.
+
+### 💰 Daily image generation limit is 3
+
+### 💰 $95/month Unlimited plan
diff --git a/models/video/runway/prompting.md b/models/video/runway/prompting.md
index 3d7b9f3..5995ec8 100644
--- a/models/video/runway/prompting.md
+++ b/models/video/runway/prompting.md
@@ -1,16 +1,32 @@
 # Runway Prompting Guide
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
 
 ## Tips & Techniques
 
-- Gen-4 References is now available in our API
-- Use Python SDK v3.1 → https://github.com/runwayml/sdk-python
-- Get an API key at https://dev.runwayml.com/
+- Runway is pretty user-friendly, but only lets you input 30 seconds of video at a time.
+- Use Gen-4 References via the Runway API to generate images with up to 3 reference images per request.
+- Obtain an API key from https://dev.runwayml.com/ to access Runway’s image generation features.
+- Use automation to bypass throttling on Unlimited accounts
+- Use the Python SDK v3.1 for easier integration with Runway services.
 
 ## Recommended Settings
 
-- max_reference_images=3
+- credits_per_video=25
+- standard_plan_videos=25
+- standard_plan_credits=625
+- pro_plan_videos=90
+- pro_plan_credits=2250
+- unlimited_plan_cost=$95
+- standard_plan_cost=$15
+- pro_plan_cost=$35
+- automation_workaround=use automation script
+- credits_per_image=8
+- cost_per_image=0.08
+- max_references=3
+- sdk_version=python_v3.1
+- api_docs=https://docs.dev.runwayml.com/
+- api_key_url=https://dev.runwayml.com/
 
 ## Sources
diff --git a/models/video/runwayml/cost_optimization.md b/models/video/runwayml/cost_optimization.md
index f32e0e2..e6d92d4 100644
--- a/models/video/runwayml/cost_optimization.md
+++ b/models/video/runwayml/cost_optimization.md
@@ -1,11 +1,12 @@
 # RunwayML - Cost Optimization Guide
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
 
 ## Cost & Pricing Information
 
-- Runwayml Gen3 Too expensive than Kling and Vidu
-- lol see the limit is 3
-- Unlimited [$95/month] accounts
-- RunwayML Monthly 6 10s for 15$
+- daily_image_limit=3
+
+## Money-Saving Tips
+
+- The daily image generation limit is 3, so plan your generations around it.
diff --git a/models/video/runwayml/metadata.json b/models/video/runwayml/metadata.json
index b237551..7af7b5c 100644
--- a/models/video/runwayml/metadata.json
+++ b/models/video/runwayml/metadata.json
@@ -1,13 +1,13 @@
 {
 "service": "RunwayML",
 "category": "video",
- "last_updated": "2025-08-14T18:55:25.468159",
- "extraction_timestamp": "2025-08-14T18:55:16.191298",
+ "last_updated": "2025-08-16T13:04:48.622051",
+ "extraction_timestamp": null,
 "data_sources": [
 "Reddit API",
 "Community discussions"
 ],
- "posts_analyzed": 67,
+ "posts_analyzed": 282,
 "confidence": "medium",
 "version": "1.0.0"
 }
\ No newline at end of file
diff --git a/models/video/runwayml/parameters.json b/models/video/runwayml/parameters.json
index d11f21e..f64e914 100644
--- a/models/video/runwayml/parameters.json
+++ b/models/video/runwayml/parameters.json
@@ -1,15 +1,13 @@
 {
 "service": "RunwayML",
- "last_updated": "2025-08-14T18:55:25.373596",
+ "last_updated": "2025-08-16T13:04:47.882957",
 "recommended_settings": {
 "setting_0": {
- "description": "pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5"
+ "description": "model=runwayml/stable-diffusion-v1-5"
 }
 },
 "cost_optimization": {
- "tip_0": "Runwayml Gen3 Too expensive than Kling and Vidu",
- "tip_1": "lol see the limit is 3",
- "pricing": "RunwayML Monthly 6 10s for 15$"
+ "tip_0": "daily_image_limit=3"
 },
 "sources": [
 "Reddit community",
diff --git a/models/video/runwayml/pitfalls.md b/models/video/runwayml/pitfalls.md
index 8ef0b91..38331f4 100644
--- a/models/video/runwayml/pitfalls.md
+++ b/models/video/runwayml/pitfalls.md
@@ -1,15 +1,22 @@
 # RunwayML - Common Pitfalls & Issues
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
 
-## Policy & Account Issues
+## Technical Issues
 
-### ⚠️ There were a lot of complaints about Runway due to prominent throttling of Unlimited [$95/month] accounts.
-**Note**: Be aware of terms of service regarding account creation.
+### ⚠️ Stuck generating only one video at a time, with incredibly slow rendering times while on the unlimited monthly plan.
+
+### ⚠️ Runway errors out saying that there are more than one face in the footage or they are too close together.
+
+### ⚠️ When using the model "runwayml/stable-diffusion-v1-5" on a SageMaker p2.XL instance, AWS starts downloading the model and after a few minutes crashes with error "OSError: [Errno 28] No space left on device" even after increasing the volume size from 5GB to 30GB.
+
+### ⚠️ subprocess.CalledProcessError: Command '['C:\Users\xande\Desktop\SkynetScribbles\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=C:/Users/xande/Desktop/SkynetScribbles/Training Images/Ro'
+
+### ⚠️ RunwayML does a decent job of animating Dall-E 3 images, but they don't seem to have an API.
 
 ## Cost & Limits
 
-### 💰 lol see the limit is 3
+### 💰 There was an issue on our end. Any used credits have been refunded (they may take a f
-### 💰 Unlimited [$95/month] accounts
+### 💰 daily_image_limit=3
diff --git a/models/video/runwayml/prompting.md b/models/video/runwayml/prompting.md
index 749f278..b7b110a 100644
--- a/models/video/runwayml/prompting.md
+++ b/models/video/runwayml/prompting.md
@@ -1,15 +1,19 @@
 # RunwayML Prompting Guide
 
-*Last updated: 2025-08-14*
+*Last updated: 2025-08-16*
 
 ## Tips & Techniques
 
-- While throttling is bad, there's a reasonable workaround using automation (https://useapi.net/docs/articles/runway-bash).
-- Use runwayml/stableiffusion-v1-5 as pretrained_model_name_or_path in training scripts
+- Gen-3 Alpha is now generally available; previously it was limited to partners and testers.
+- Runway brings 3D control to video generation.
+- The daily image generation limit is 3.
+- Adobe partners with OpenAI, RunwayML & Pika for Premiere Pro.
+- RunwayML: Best KREA alternative for built-in timeline editing.
 
 ## Recommended Settings
 
-- pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5
+- model=runwayml/stable-diffusion-v1-5
 
 ## Sources
diff --git a/models/video/sora/cost_optimization.md b/models/video/sora/cost_optimization.md
new file mode 100644
index 0000000..8dd4ac3
--- /dev/null
+++ b/models/video/sora/cost_optimization.md
@@ -0,0 +1,35 @@
+# Sora - Cost Optimization Guide
+
+*Last updated: 2025-08-16*
+
+## Cost & Pricing Information
+
+- 5 secs per video
+- Sora video aspect ratio: 9:16
+- Sora video length limit: 5 seconds
+- After free clips: 100 Microsoft Rewards points per video
+- Free plan: 3 image generations per day
+- Plus users get 1000 credits per month.
+- Unlimited to all plus, team, and pro users (as per billing FAQ)
+- Sora: $20 for 50 videos (or) $200 for 500 videos == $0.4 per video
+- 10,000 credits for Sora pro plan
+- 60 credits per generation (example from user)
+- Free video generation: 10 clips per session
+- Sora is available on the ChatGPT Plus or Pro plans
+- Unlimited generations available for all paid tiers
+- Cost: zero (so far)
+- Table: | Resolution | Aspect Ratio | Credits | Total Generations |
+|------------|----------------|---------|-------------------|
+| 480p | 1:1 | 20 | 50 |
+| 480p | 16:9/9:16 | 25 | 40 |
+| 720p | 1:1 | 30 | 33 |
+| 720p | 16:9/9:16 | 60
+
+## Money-Saving Tips
+
+- You can generate up to 10 video clips for free; after that each video costs 100 Microsoft Rewards points
+- Use unlimited generations available for all paid tiers (plus, team, pro users)
+- Be aware that Sora video clips are currently limited to 5 seconds and 9:16 aspect ratio, suitable for TikTok and Instagram
+- Use the Bing mobile app to access Sora’s free video generation feature
+- If you need more than 3 image generations per day,
consider upgrading to the PLUS plan + diff --git a/models/video/sora/metadata.json b/models/video/sora/metadata.json new file mode 100644 index 0000000..0ed1110 --- /dev/null +++ b/models/video/sora/metadata.json @@ -0,0 +1,13 @@ +{ + "service": "Sora", + "category": "video", + "last_updated": "2025-08-16T13:04:49.996421", + "extraction_timestamp": "2025-08-16T12:50:04.667498", + "data_sources": [ + "Reddit API", + "Community discussions" + ], + "posts_analyzed": 380, + "confidence": "medium", + "version": "1.0.0" +} \ No newline at end of file diff --git a/models/video/sora/parameters.json b/models/video/sora/parameters.json new file mode 100644 index 0000000..832ec36 --- /dev/null +++ b/models/video/sora/parameters.json @@ -0,0 +1,29 @@ +{ + "service": "Sora", + "last_updated": "2025-08-16T13:04:49.318183", + "recommended_settings": { + "setting_0": { + "description": "Image input capabilities=Enabled" + } + }, + "cost_optimization": { + "tip_0": "5 secs per video", + "tip_1": "Sora video aspect ratio: 9:16", + "tip_2": "Sora video length limit: 5 seconds", + "tip_3": "After free clips: 100 Microsoft Rewards points per video", + "tip_4": "Free plan: 3 image generations per day", + "tip_5": "Plus users get 1000 credits per month.", + "unlimited_option": "Unlimited generations available for all paid tiers", + "pricing": "Sora: $20 for 50 videos (or) $200 for 500 videos == $0.4 per video", + "tip_8": "10,000 credits for Sora pro plan", + "tip_9": "60 credits per generation (example from user)", + "tip_10": "Free video generation: 10 clips per session", + "tip_11": "Sora is available on the ChatGPT Plus or Pro plans", + "tip_12": "Cost: zero (so far)", + "tip_13": "Table: | Resolution | Aspect Ratio | Credits | Total Generations |\n|------------|----------------|---------|-------------------|\n| 480p | 1:1 | 20 | 50 |\n| 480p | 16:9/9:16 | 25 | 40 |\n| 720p | 1:1 | 30 | 33 |\n| 720p | 16:9/9:16 | 60" + }, + "sources": [ + "Reddit community", + "User reports" + ] +} \ 
No newline at end of file diff --git a/models/video/sora/pitfalls.md b/models/video/sora/pitfalls.md new file mode 100644 index 0000000..8a590c7 --- /dev/null +++ b/models/video/sora/pitfalls.md @@ -0,0 +1,16 @@ +# Sora - Common Pitfalls & Issues + +*Last updated: 2025-08-16* + +## Cost & Limits + +### 💰 Sora free plan limits image generation to 3 generations per day + +### 💰 Sora video generation is limited to 5 seconds in length and a vertical 9:16 aspect ratio + +### 💰 Sora video length limit: 5 seconds + +### 💰 Unlimited to all plus, team, and pro users (as per billing FAQ) + +### 💰 Unlimited generations available for all paid tiers + diff --git a/models/video/sora/prompting.md b/models/video/sora/prompting.md new file mode 100644 index 0000000..72bb40c --- /dev/null +++ b/models/video/sora/prompting.md @@ -0,0 +1,31 @@ +# Sora Prompting Guide + +*Last updated: 2025-08-16* + +## Tips & Techniques + +- You can generate up to 10 video clips for free; after that each video costs 100 Microsoft Rewards points +- Output specs: 1080p max, MP4 only, silent by default +- Generate 2x20 second videos at 720P for higher resolution base generations +- Use unlimited generations available for all paid tiers (plus, team, pro users) +- Use relaxed mode for faster waiting times (as experienced in Australia) +- Generate 4x20 second videos at 480P for maximum variations and then upscale the ones you like +- Queue up to three video generations at a time to keep the process efficient +- Be aware that Sora video clips are currently limited to 5 seconds and 9:16 aspect ratio, suitable for TikTok and Instagram +- 20s length cap (and why it's enforced) +- Use the Bing mobile app to access Sora’s free video generation feature +- If you need more than 3 image generations per day, consider upgrading to the PLUS plan +- There is an 8k dongle released recently to enable the 8k polling rate. +- or sora if you are in the eu, it is available on altstore pal. 
+- Resolution workarounds: upscale v +- Install the Betterjoy controller drivers for your PC via: https://github.com/Davidobot/BetterJoy +- I log into Sora, queue up the clips I need, then use the Chrome extension Video DownloadHelper to snag the MP4s. This gives me a bottomless library of short AI videos I can drop into product demos and social ads. + +## Recommended Settings + +- Image input capabilities=Enabled + +## Sources + +- Reddit community discussions +- User-reported experiences diff --git a/scripts/tapes/intro.tape b/scripts/tapes/intro.tape index a0320a6..0a13582 100644 --- a/scripts/tapes/intro.tape +++ b/scripts/tapes/intro.tape @@ -24,7 +24,7 @@ Set WaitTimeout 120 s # hidden: go to right folder Hide -Type "cd .." +Type "cd ../../" Enter Type "clear" Enter diff --git a/scripts/tapes/legacy.tape b/scripts/tapes/legacy.tape index fe7a1c8..4ba893e 100644 --- a/scripts/tapes/legacy.tape +++ b/scripts/tapes/legacy.tape @@ -24,7 +24,7 @@ Set WaitTimeout 120 s # hidden: go to right folder Hide -Type "cd .." +Type "cd ../../" Enter Type "clear" Enter diff --git a/scripts/tapes/scrape-batch.tape b/scripts/tapes/scrape-batch.tape index d042ac4..7cd34d5 100644 --- a/scripts/tapes/scrape-batch.tape +++ b/scripts/tapes/scrape-batch.tape @@ -20,18 +20,18 @@ Set PlaybackSpeed 1 Set TypingSpeed 30 ms Set CursorBlink false Set Theme "Catppuccin Frappe" -Set WaitTimeout 160 s +Set WaitTimeout 1000 s # hidden: go to right folder Hide -Type "cd .." 
+Type "cd ../../"
 Enter
 Type "clear"
 Enter
 Show
 
 # Batch process multiple priority services
-Type "uv run scapo scrape batch --max-services 1 --category audio"
+Type "uv run scapo scrape batch --category audio --batch-size 10 --limit 1"
 Sleep 1 s
 Enter
 Sleep 1 s
diff --git a/scripts/tapes/scrape-discovery.tape b/scripts/tapes/scrape-discovery.tape
index da0216f..d3231af 100644
--- a/scripts/tapes/scrape-discovery.tape
+++ b/scripts/tapes/scrape-discovery.tape
@@ -24,7 +24,7 @@ Set WaitTimeout 160 s
 
 # hidden: go to right folder
 Hide
-Type "cd .."
+Type "cd ../../"
 Enter
 Type "clear"
 Enter
diff --git a/scripts/tapes/scrape-targeted.tape b/scripts/tapes/scrape-targeted.tape
index cc2eec2..5d10c14 100644
--- a/scripts/tapes/scrape-targeted.tape
+++ b/scripts/tapes/scrape-targeted.tape
@@ -24,14 +24,14 @@ Set WaitTimeout 160 s
 
 # hidden: go to right folder
 Hide
-Type "cd .."
+Type "cd ../../"
 Enter
 Type "clear"
 Enter
 Show
 
 # Extract optimization tips for specific services
-Type "uv run scapo scrape targeted --service 'Eleven Labs' --limit 2"
+Type "uv run scapo scrape targeted --service 'Eleven Labs' --limit 20 --query-limit 20"
 Sleep 1s
 Enter
 Wait
diff --git a/scripts/tapes/tui.tape b/scripts/tapes/tui.tape
index 49bf571..cd695ea 100644
--- a/scripts/tapes/tui.tape
+++ b/scripts/tapes/tui.tape
@@ -24,7 +24,7 @@ Set WaitTimeout 120 s
 
 # hidden: go to right folder
 Hide
-Type "cd .."
+Type "cd ../../"
 Enter
 Type "clear"
 Enter
@@ -39,7 +39,7 @@ Down 4
 Sleep 1s
 Enter
 Sleep 1s
-Down 4
+Down 7
 Sleep 1s
 Enter
 Sleep 1s
diff --git a/src/cli.py b/src/cli.py
index 21c3820..7e93cf1 100644
--- a/src/cli.py
+++ b/src/cli.py
@@ -384,14 +384,13 @@ async def _discover():
 @scrape.command(name="targeted")
 @click.option("--service", "-s", help="Target specific service")
 @click.option("--category", "-c", help="Target services by category (video, audio, etc)")
-@click.option("--limit", "-l", default=20, help="Max posts per search")
-@click.option("--batch-size", "-b", default=50, help="Posts per LLM batch")
+@click.option("--limit", "-l", default=20, help="Max posts per search (default: 20)")
+@click.option("--priority", type=click.Choice(['ultra', 'critical', 'high', 'all']), default='all', help="Service priority level")
+@click.option("--query-limit", "-q", default=20, help="Number of query patterns per service (default: 20, max: 20)")
+@click.option("--parallel", default=3, help="Number of parallel scraping tasks")
 @click.option("--dry-run", is_flag=True, help="Show queries without running")
-@click.option("--all", "run_all", is_flag=True, help="Run all generated queries")
-@click.option("--max-queries", "-m", default=10, help="Maximum queries to run (default: 10)")
-@click.option("--parallel", "-p", default=3, help="Number of parallel scraping tasks")
-@click.option("--use-all-patterns", is_flag=True, help="Use ALL 20 search patterns instead of just 5 (uses all 4 patterns from each category: cost, optimization, technical, workarounds, bugs)")
-def targeted_scrape(service, category, limit, batch_size, dry_run, run_all, max_queries, parallel, use_all_patterns):
+def targeted_scrape(service, category, limit, priority, query_limit, parallel, dry_run):
+    """Run targeted searches for specific AI services."""
     show_banner()
 
@@ -406,7 +405,8 @@ async def _targeted():
         from datetime import datetime
 
         # Access outer scope variables
-        nonlocal service, category, limit, batch_size, dry_run, run_all, max_queries, parallel, use_all_patterns
+        nonlocal service, category, limit, priority, query_limit, parallel, dry_run
+
         # Generate targeted searches
         generator = TargetedSearchGenerator()
 
@@ -415,17 +415,23 @@ async def _targeted():
         if service and not category:
             # Just generate queries for the requested service - don't generate for all services first
             console.print(f"[cyan]Generating queries for {service}...[/cyan]")
+            use_all_patterns = query_limit >= 20  # Use all patterns if query_limit is 20 or more
             if use_all_patterns:
                 console.print(f"[yellow]Using ALL patterns (20 total search queries)[/yellow]")
-            queries = generator.generate_queries_for_service(service, max_queries=max_queries, use_all_patterns=use_all_patterns)
+            else:
+                console.print(f"[yellow]Using {query_limit} query patterns[/yellow]")
+            queries = generator.generate_queries_for_service(service, max_queries=query_limit, use_all_patterns=use_all_patterns)
+
             if not queries:
                 console.print(f"[red]Could not generate queries for service: {service}[/red]")
                 return
         else:
             # Generate queries based on category or all services
+            use_all_patterns = query_limit >= 20  # Use all patterns if query_limit is 20 or more
             all_queries = generator.generate_queries(
-                max_queries=100 if run_all else max_queries,
+                max_queries=1000,  # Get all services
+                category_filter=category if category else None,
                 use_all_patterns=use_all_patterns
             )
 
@@ -435,9 +441,13 @@ async def _targeted():
                 console.print("[yellow]No matching queries found[/yellow]")
                 return
 
-        # Limit queries if not running all
-        if not run_all and len(queries) > max_queries:
-            queries = queries[:max_queries]
+        # Apply priority filter if specified
+        if priority != 'all':
+            queries = [q for q in queries if q.get('priority') == priority]
+
+        # Limit queries to query_limit
+        if len(queries) > query_limit:
+            queries = queries[:query_limit]
 
         console.print(f"[cyan]Generated {len(queries)} targeted searches[/cyan]")
 
@@ -622,11 +632,13 @@ async def process_query(query, scraper, batch_processor, llm, semaphore):
 @scrape.command(name="batch")
 @click.option("--category", "-c", help="Target services by category (video, audio, etc)")
-@click.option("--limit", "-l", default=15, help="Max posts per search (default: 15)")
-@click.option("--max-services", "-m", default=3, help="Maximum number of services to process (default: 3)")
-@click.option("--priority", "-p", type=click.Choice(['ultra', 'critical', 'high', 'all']), default='ultra', help="Service priority level")
-def batch_scrape(category, limit, max_services, priority):
-    """Batch process multiple high-priority services."""
+@click.option("--limit", "-l", default=20, help="Max posts per search (default: 20)")
+@click.option("--priority", type=click.Choice(['ultra', 'critical', 'high', 'all']), default='all', help="Service priority level")
+@click.option("--query-limit", "-q", default=20, help="Number of query patterns per service (default: 20, max: 20)")
+@click.option("--batch-size", "-b", default=3, help="Number of services to process in parallel per batch (default: 3)")
+@click.option("--dry-run", is_flag=True, help="Show what would be processed without running")
+def batch_scrape(category, limit, priority, query_limit, batch_size, dry_run):
+    """Batch process all services in a category (or all services) in parallel batches."""
     show_banner()
 
     async def _batch():
@@ -645,10 +657,12 @@ async def _batch():
         generator = TargetedSearchGenerator()
         alias_manager = ServiceAliasManager()
 
-        # Generate queries for multiple services
+        # Generate queries for all services in category
+        use_all_patterns = query_limit >= 20  # Use all patterns if query_limit is 20 or more
         all_queries = generator.generate_queries(
-            max_queries=max_services * 5,  # 5 query types per service
-            category_filter=category
+            max_queries=1000,  # High limit to get all services
+            category_filter=category,
+            use_all_patterns=use_all_patterns
         )
 
         # Filter by priority if specified
@@ -663,13 +677,20 @@ async def _batch():
                 queries_by_service[service] = []
             queries_by_service[service].append(query)
 
-        # Limit to max_services
-        services_to_process = list(queries_by_service.keys())[:max_services]
+        # Get all services (no limiting)
+        services_to_process = list(queries_by_service.keys())
 
         console.print(f"[cyan]Processing {len(services_to_process)} services:[/cyan]")
+        console.print(f"[yellow]Using {query_limit} query patterns per service[/yellow]")
         for service in services_to_process:
+            # Limit queries per service based on query_limit
+            queries_by_service[service] = queries_by_service[service][:query_limit]
             console.print(f"  • {service} ({len(queries_by_service[service])} queries)")
 
+        if dry_run:
+            console.print("\n[yellow]DRY RUN - Would process these services[/yellow]")
+            return
+
         if not Confirm.ask(f"\n[yellow]Process {len(services_to_process)} services?[/yellow]", default=True):
             console.print("[red]Cancelled[/red]")
             return
@@ -681,7 +702,7 @@ async def _batch():
         all_results = []
 
-        # Process each service
+        # Process services in batches
         with Progress(
             SpinnerColumn(),
             TextColumn("[progress.description]{task.description}"),
@@ -691,30 +712,43 @@ async def _batch():
             console=console,
         ) as progress:
             total_queries = sum(len(queries_by_service[s]) for s in services_to_process)
-            task = progress.add_task(f"Processing {len(services_to_process)} services...", total=total_queries)
+            task = progress.add_task(f"Processing {len(services_to_process)} services in batches of {batch_size}...", total=total_queries)
 
-            for service in services_to_process:
-                service_queries = queries_by_service[service]
-                progress.update(task, description=f"Processing {service}...")
+            # Process services in batches
+            for batch_start in range(0, len(services_to_process), batch_size):
+                batch_end = min(batch_start + batch_size, len(services_to_process))
+                batch_services = services_to_process[batch_start:batch_end]
 
-                for query in service_queries:
-                    try:
-                        posts = await scraper.scrape(query['query_url'], max_posts=limit)
-
-                        if posts:
-                            # Batch process with LLM
-                            batches = batch_processor.batch_posts_by_tokens(posts, service)
+                console.print(f"\n[cyan]Processing batch {batch_start//batch_size + 1}: {', '.join(batch_services)}[/cyan]")
+
+                # Process all services in this batch
+                for service in batch_services:
+                    service_queries = queries_by_service[service]
+                    progress.update(task, description=f"Processing {service}...")
+
+                    for query in service_queries:
+                        try:
+                            posts = await scraper.scrape(query['query_url'], max_posts=limit)
 
-                            for batch in batches:
-                                result = await batch_processor.process_batch(batch, service, llm)
-                                all_results.append(result)
-
-                        progress.update(task, advance=1)
-                        await asyncio.sleep(1)  # Rate limiting
-
-                    except Exception as e:
-                        logger.error(f"Failed to process {service}: {e}")
-                        progress.update(task, advance=1)
+                            if posts:
+                                # Batch process with LLM
+                                batches = batch_processor.batch_posts_by_tokens(posts, service)
+
+                                for batch in batches:
+                                    result = await batch_processor.process_batch(batch, service, llm)
+                                    all_results.append(result)
+
+                            progress.update(task, advance=1)
+                            await asyncio.sleep(1)  # Rate limiting
+
+                        except Exception as e:
+                            logger.error(f"Failed to process {service}: {e}")
+                            progress.update(task, advance=1)
+
+                # Add delay between batches if not the last batch
+                if batch_end < len(services_to_process):
+                    console.print(f"[dim]Waiting 2 seconds before next batch...[/dim]")
+                    await asyncio.sleep(2)
 
         # Save results
         timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
@@ -752,15 +786,13 @@ async def _batch():
 @scrape.command(name="all")
-@click.option('-l', '--limit', default=20, help='Max posts per search (default: 20)')
-@click.option('-c', '--category', help='Filter by category (video, audio, code, etc)')
-@click.option('-p', '--priority',
-              type=click.Choice(['ultra', 'critical', 'high', 'all']),
-              default='ultra',
-              help='Service priority level')
+@click.option('--category', '-c', help='Filter by category (video, audio, code, etc)')
+@click.option('--limit', '-l', default=20, help='Max posts per search (default: 20)')
+@click.option('--priority', type=click.Choice(['ultra', 'critical', 'high', 'all']), default='all', help='Service priority level')
+@click.option('--query-limit', '-q', default=20, help='Number of query patterns per service (default: 20, max: 20)')
+@click.option('--delay', '-d', default=5, help='Delay in seconds between services (default: 5)')
 @click.option('--dry-run', is_flag=True, help='Show what would be processed without running')
-@click.option('--delay', default=5, help='Delay in seconds between services (default: 5)')
-def scrape_all(limit: int, category: str, priority: str, dry_run: bool, delay: int):
+def scrape_all(category: str, limit: int, priority: str, query_limit: int, delay: int, dry_run: bool):
     """Process all priority services one by one."""
     show_banner()
 
@@ -830,7 +862,7 @@ def scrape_all(limit: int, category: str, priority: str, dry_run: bool, delay: i
             ['uv', 'run', 'scapo', 'scrape', 'targeted',
              '--service', service_name,
              '--limit', str(limit),
-             '--max-queries', '5'],
+             '--query-limit', str(query_limit)],
            capture_output=True, text=True
        )