49 changes: 39 additions & 10 deletions QUICKSTART.md
@@ -14,6 +14,8 @@ uv run playwright install

### 2. Configure LLM (Choose One)

**Important:** Extraction quality varies by LLM - stronger models find more specific tips!

#### Option A: OpenRouter (Recommended - Free Model!)
```bash
cp .env.example .env
@@ -63,23 +65,29 @@ Extract specific optimization tips for AI services:
scapo scrape discover --update

# Step 2: Extract tips for specific services
scapo scrape targeted --service "Eleven Labs" --limit 20
scapo scrape targeted --service "GitHub Copilot" --limit 20
scapo scrape targeted --service "Eleven Labs" --limit 20 --query-limit 20
scapo scrape targeted --service "GitHub Copilot" --limit 20 --query-limit 20

# Or batch process by category
scapo scrape batch --category video --limit 15
scapo scrape batch --category video --limit 20 --batch-size 3

# Process ALL priority services one by one
scapo scrape all --priority ultra --limit 20 # Process all ultra priority services
scapo scrape all --dry-run # Preview what will be processed
scapo scrape all --limit 20 --query-limit 20 --priority ultra # Process all ultra priority services
scapo scrape all --dry-run # Preview what will be processed
```

### Key Commands:
- `discover --update` - Find services from GitHub Awesome lists
- `targeted --service NAME` - Extract tips for one service
- `batch --category TYPE` - Process multiple services (limited)
- `batch --category TYPE` - Process ALL services in category (in batches)
- `all --priority LEVEL` - Process ALL services one by one

### Important Parameters:
- **--query-limit**: Number of search patterns (5 = quick, 20 = comprehensive)
- **--batch-size**: Services to process in parallel (3 = default balance)
- **--limit**: Posts per search (20+ recommended for best results)
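
As a rough guide to how `--query-limit` and `--limit` combine, the sketch below computes an upper bound on posts fetched for one service. It assumes each search pattern runs exactly one search that returns at most `--limit` posts; that is an assumption for illustration, not confirmed SCAPO behavior, and `max_posts_fetched` is a hypothetical helper.

```python
def max_posts_fetched(query_limit: int, limit: int) -> int:
    """Upper bound on posts fetched for one service.

    Assumption: one search per query pattern, each returning at most `limit` posts.
    """
    if query_limit < 0 or limit < 0:
        raise ValueError("query_limit and limit must be non-negative")
    return query_limit * limit


if __name__ == "__main__":
    # Settings taken from the commands in this guide.
    for query_limit, limit in [(5, 5), (20, 20), (20, 30)]:
        bound = max_posts_fetched(query_limit, limit)
        print(f"--query-limit {query_limit} --limit {limit}: at most {bound} posts")
```

The real count will usually be lower, since searches can return fewer posts than the limit.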


## 📚 Approach 2: Legacy Sources

Use predefined sources from `sources.yaml`:
@@ -109,6 +117,27 @@ scapo models search "copilot" # Search for specific models
cat models/audio/eleven-labs/cost_optimization.md
```
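
The generated files can also be inspected programmatically. The sketch below is a minimal example that assumes the `models/<category>/<service>/metadata.json` layout and the `service`, `posts_analyzed`, and `confidence` fields seen in the generated metadata; `summarize_models` is a hypothetical helper, not part of SCAPO.

```python
import json
from pathlib import Path


def summarize_models(models_dir):
    """Return (service, posts_analyzed, confidence) for each metadata.json found."""
    rows = []
    # Assumed layout: <models_dir>/<category>/<service>/metadata.json
    for meta_path in sorted(Path(models_dir).glob("*/*/metadata.json")):
        with meta_path.open(encoding="utf-8") as fh:
            meta = json.load(fh)
        rows.append((
            meta.get("service", meta_path.parent.name),
            meta.get("posts_analyzed", 0),
            meta.get("confidence", "unknown"),
        ))
    return rows


if __name__ == "__main__":
    for service, posts, confidence in summarize_models("models"):
        print(f"{service}: {posts} posts analyzed, confidence={confidence}")
```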

### 5. (Optional) Use with Claude Desktop

Add SCAPO as an MCP server to query your extracted tips (from models/ folder) directly in Claude:

```json
// Add to claude_desktop_config.json
{
"mcpServers": {
"scapo": {
"command": "npx",
"args": ["@scapo/mcp-server"],
"env": {
"SCAPO_MODELS_PATH": "path/to/scapo/models"
}
}
}
}
```

Then ask Claude: "Get best practices for Midjourney" - no Python needed!
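
If you would rather not edit the config by hand, the entry can be merged in with a short script. This is a sketch under assumptions: the helper names are hypothetical, and the config path must be supplied by you. Strict JSON parsers reject `//` comments, so the script writes the entry without the comment line shown in the snippet above.

```python
import json
from pathlib import Path


def add_scapo_server(config: dict, models_path: str) -> dict:
    """Return a copy of `config` with the SCAPO MCP server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["scapo"] = {
        "command": "npx",
        "args": ["@scapo/mcp-server"],
        "env": {"SCAPO_MODELS_PATH": models_path},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path: str, models_path: str) -> None:
    """Merge the SCAPO entry into the config file, creating it if missing."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = add_scapo_server(config, models_path)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Existing entries under `mcpServers` are preserved; only the `scapo` key is added or overwritten.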

## 📊 Understanding the Output

SCAPO creates organized documentation:
@@ -126,13 +155,13 @@ models/

```bash
# ❌ Too few posts = no useful tips found
scapo scrape targeted --service "HeyGen" --limit 5 # ~20% success rate
scapo scrape targeted --service "HeyGen" --limit 5 --query-limit 5 # ~20% success rate

# ✅ Sweet spot = reliable extraction
scapo scrape targeted --service "HeyGen" --limit 20 # ~80% success rate
scapo scrape targeted --service "HeyGen" --limit 20 --query-limit 20 # ~80% success rate

# 🎯 Maximum insights = comprehensive coverage
scapo scrape targeted --service "HeyGen" --limit 30 # Finds rare edge cases
scapo scrape targeted --service "HeyGen" --limit 30 --query-limit 20 # Finds rare edge cases
```
**Why it matters:** LLMs need multiple examples to identify patterns. More posts = higher chance of finding specific pricing, bugs, and workarounds.

@@ -148,7 +177,7 @@ LLM_QUALITY_THRESHOLD=0.4 # More tips (less strict)
### "No tips extracted"
```bash
# Solution: Use more posts
scapo scrape targeted --service "Service Name" --limit 25
scapo scrape targeted --service "Service Name" --limit 25 --query-limit 20
```

### "Service not found"
63 changes: 52 additions & 11 deletions README.md
@@ -53,15 +53,15 @@ scapo scrape discover --update
Extract optimization tips for specific services

```bash
scapo scrape targeted --service "Eleven Labs" --limit 20
scapo scrape targeted --service "Eleven Labs" --limit 20 --query-limit 20
```
![Scapo Discover](assets/scrape-targeted.gif)


Batch process multiple priority services (Recommended)

```bash
scapo scrape batch --max-services 3 --category audio
scapo scrape batch --category audio --batch-size 3 --limit 20
```
![Scapo Discover](assets/scrape-batch.gif)

@@ -89,6 +89,8 @@ uv run playwright install # Browser automation

### 2. Configure Your LLM Provider

**Note:** Extraction quality depends on your chosen LLM - experiment with different models for best results!

#### Recommended: OpenRouter (Cloud)
```bash
cp .env.example .env
@@ -111,14 +113,14 @@ Get your API key from [openrouter.ai](https://openrouter.ai/)
scapo scrape discover --update

# Step 2: Extract optimization tips for services
scapo scrape targeted --service "HeyGen" --limit 20
scapo scrape targeted --service "Midjourney" --limit 20
scapo scrape targeted --service "HeyGen" --limit 20 --query-limit 20
scapo scrape targeted --service "Midjourney" --limit 20 --query-limit 20

# Or batch process multiple services
scapo scrape batch --category video --limit 15
scapo scrape batch --category video --limit 20 --batch-size 3

# Process ALL priority services one by one (i.e. all services with 'ultra' tag, see targted_search_generator.py)
scapo scrape all --priority ultra --limit 20
scapo scrape all --limit 20 --query-limit 20 --priority ultra
```

#### Option B: Legacy method: using sources.yaml file
@@ -196,13 +198,13 @@ scapo scrape discover --show-all # List all services
scapo scrape targeted \
--service "Eleven Labs" \ # Service name (variations are handled; if there is no match in services.json, results go under the 'general' folder)
--limit 20 \ # Posts per search (15-20 recommended)
--max-queries 10 # Number of searches
--query-limit 20 # Query patterns per service (20 = all)

# Batch process
scapo scrape batch \
--category audio \ # Filter by category
--max-services 3 \ # Services to process
--limit 15 # Posts per search
--batch-size 3 \ # Services per batch
--limit 20 # Posts per search


### Legacy Sources Mode
@@ -249,13 +251,52 @@ SCRAPING_DELAY_SECONDS=2 # Be respectful
MAX_POSTS_PER_SCRAPE=100 # Limit per source
```

### Why --limit Matters (More Posts = Better Tips)
### Key Parameters Explained

**--query-limit** (How many search patterns per service)
```bash
--query-limit 5 # Quick scan: 1 pattern per category (cost, optimization, technical, workarounds, bugs)
--query-limit 20 # Full scan: All 4 patterns per category (default, most comprehensive)
```
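
One way to picture the two settings above is a round-robin pick across the five categories. The sketch below is illustrative only: the pattern names are placeholders and the selection order is an assumption, not SCAPO's actual implementation.

```python
from itertools import islice

CATEGORIES = ["cost", "optimization", "technical", "workarounds", "bugs"]
PATTERNS_PER_CATEGORY = 4

# Placeholder pattern names; the real search patterns are not shown in this guide.
PATTERNS = {
    cat: [f"{cat}-pattern-{i + 1}" for i in range(PATTERNS_PER_CATEGORY)]
    for cat in CATEGORIES
}


def select_patterns(query_limit: int) -> list:
    """Pick patterns round-robin: one per category per round, up to query_limit."""
    def rounds():
        for i in range(PATTERNS_PER_CATEGORY):
            for cat in CATEGORIES:
                yield PATTERNS[cat][i]
    return list(islice(rounds(), max(query_limit, 0)))


if __name__ == "__main__":
    print(select_patterns(5))        # one pattern from each category
    print(len(select_patterns(20)))  # all 5 x 4 patterns
```

Under this scheme a limit of 5 covers every category once, and any limit of 20 or more selects all 20 patterns.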

**--batch-size** (For `batch` command: services processed in parallel)
```bash
--batch-size 1 # Sequential (slowest, least resource intensive)
--batch-size 3 # Default (good balance)
--batch-size 5 # Faster (more resource intensive)
```
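
The batching idea can be sketched with a thread pool: split the service list into groups of `--batch-size` and process each group concurrently. `process_service` is a hypothetical stand-in for scraping one service, and the use of threads is an assumption about how parallelism might be done, not a description of SCAPO's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def process_service(name):
    """Hypothetical stand-in for scraping and extracting tips for one service."""
    return f"processed {name}"


def run_in_batches(services, batch_size=3):
    """Process services batch by batch; services within a batch run in parallel."""
    results = []
    for batch in chunked(services, batch_size):
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            # pool.map preserves the input order of each batch.
            results.extend(pool.map(process_service, batch))
    return results


if __name__ == "__main__":
    print(run_in_batches(["HeyGen", "Midjourney", "Eleven Labs", "GitHub Copilot"]))
```

With `batch_size=1` the loop degenerates to sequential processing, which matches the "slowest, least resource intensive" setting above.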

**--limit** (Posts per search - More = Better extraction)
```bash
--limit 5 # ❌ Often finds nothing (too few samples)
--limit 15 # ✅ Good baseline (finds common issues)
--limit 25 # 🎯 Will find something (as long as there is active discussion on it)
```
so, hand-wavy breakdown: With 5 posts, extraction success ~20%. With 20+ posts, success jumps to ~80%.
Hand-wavy breakdown: With 5 posts, extraction success ~20%. With 20+ posts, success jumps to ~80%.

## 🤖 MCP Server for Claude Desktop

Query your extracted tips directly in Claude (reads from models/ folder - run scrapers first!):

```json
// Add to %APPDATA%\Claude\claude_desktop_config.json (Windows)
// or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
{
"mcpServers": {
"scapo": {
"command": "npx",
"args": ["@scapo/mcp-server"],
"env": {
"SCAPO_MODELS_PATH": "C:\\path\\to\\scapo\\models" // Your models folder
}
}
}
}
```

Then ask Claude: "Get me best practices for GitHub Copilot" or "What models are good for coding?"

See [mcp/README.md](mcp/README.md) for full setup and available commands.

## 🎨 Interactive TUI

Binary file modified assets/intro.gif
Binary file modified assets/legacy.gif
Binary file modified assets/scrape-batch.gif
Binary file modified assets/scrape-discovery.gif
Binary file modified assets/scrape-targeted.gif
Binary file modified assets/tui.gif
21 changes: 18 additions & 3 deletions models/audio/eleven-labs/cost_optimization.md
@@ -1,10 +1,25 @@
# Eleven Labs - Cost Optimization Guide

*Last updated: 2025-08-14*
*Last updated: 2025-08-16*

## Cost & Pricing Information

- 60% of credits left (~400,000 credits)
- Subscription renewal failed due to paywall issues
- Free trial limited to 10,000 characters per month
- 60% of credits left (about 400,000 credits)
- $15k saved in ElevenLabs fees
- Free access limited to 15 minutes of voice recording per day
- Last year I was paying +$1000/month for AI voiceovers for only one channel.
- $29/month for unlimited usage on ElevenReader.
- $99/month plan
- $29/month for unlimited
- Credits should last until June 5th
- 10,000 free credits per month on the free plan.

## Money-Saving Tips

- I built my own tool, just for me. No subscriptions, no limits, just fast, clean voice generation. Cost me ~ $4/month to run.
- MiniMax refreshes TTS credits daily, unlike ElevenLabs, where you wait a month for a refresh.
- Use the free plan to get 10,000 credits per month for free.
- So, when I do, I use a temporary email to create a new account so the 10,000 character limit 'resets.'
- When converting text to voice, adding periods between letters (e.g., B.O.D.) can force the model to pronounce acronyms letter by letter, though it may consume more credits.
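
The acronym tip above can be automated before text is sent for synthesis. The sketch below is a minimal illustration that treats any all-caps word of two or more letters as an acronym; that heuristic is an assumption and will also hit words typed in capitals for emphasis. `spell_out_acronyms` is a hypothetical helper.

```python
import re


def spell_out_acronyms(text: str) -> str:
    """Insert periods between the letters of all-caps words (2+ letters)."""
    # The optional trailing period is consumed so "BOD." does not become "B.O.D..".
    return re.sub(
        r"\b[A-Z]{2,}\b\.?",
        lambda m: ".".join(m.group().rstrip(".")) + ".",
        text,
    )


if __name__ == "__main__":
    print(spell_out_acronyms("Ask the BOD today"))
```

As the tip notes, the extra characters may consume more credits.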

4 changes: 2 additions & 2 deletions models/audio/eleven-labs/metadata.json
@@ -1,13 +1,13 @@
{
"service": "Eleven Labs",
"category": "audio",
"last_updated": "2025-08-14T18:53:47.086694",
"last_updated": "2025-08-16T13:46:28.510586",
"extraction_timestamp": null,
"data_sources": [
"Reddit API",
"Community discussions"
],
"posts_analyzed": 79,
"posts_analyzed": 338,
"confidence": "medium",
"version": "1.0.0"
}
11 changes: 7 additions & 4 deletions models/audio/eleven-labs/parameters.json
@@ -1,6 +1,6 @@
{
"service": "Eleven Labs",
"last_updated": "2025-08-14T18:53:46.993256",
"last_updated": "2025-08-16T13:46:28.342822",
"recommended_settings": {
"setting_0": {
"description": "voice_name=Mun W"
@@ -16,9 +16,12 @@
}
},
"cost_optimization": {
"tip_0": "60% of credits left (~400,000 credits)",
"tip_1": "Subscription renewal failed due to paywall issues",
"pricing": "$99/month plan"
"tip_0": "Free trial limited to 10,000 characters per month",
"tip_1": "60% of credits left (about 400,000 credits)",
"pricing": "$29/month for unlimited",
"tip_3": "Free access limited to 15 minutes of voice recording per day",
"tip_4": "Credits should last until June 5th",
"tip_5": "10,000 free credits per month on the free plan."
},
"sources": [
"Reddit community",
27 changes: 23 additions & 4 deletions models/audio/eleven-labs/pitfalls.md
@@ -1,16 +1,35 @@
# Eleven Labs - Common Pitfalls & Issues

*Last updated: 2025-08-14*
*Last updated: 2025-08-16*

## Technical Issues

### ⚠️ Unable to switch back to a Custom LLM after testing with a built-in model (gemini-2.0-flash); interface shows 'Fix the errors to proceed' even though Server URL, Model ID, and API Key are correctly filled.
### ⚠️ Cannot switch back to a Custom LLM after testing with a built-in model (gemini-2.0-flash) on the ElevenLabs Conversational AI dashboard; even after correctly filling out Server URL, Model ID, and API Key, the interface still shows the message: 'Fix the errors to proceed' even though there is no error.
**Fix**: Store API keys in environment variables or use a secrets manager.

### ⚠️ audio plays back a female voice regardless of which option is selected when using elevenLabs API
### ⚠️ ElevenLabs API always returns a female voice regardless of the selected gender option

### ⚠️ Tasker Action Error: 'HTTP Request' (step 11) Task: 'Text To Speech To File Elevenlabs' {"detail":{"status":"invalid_uid","message":"An invalid ID has been received: '%voice_id'. Make sure to provide a correct one."}}

## Policy & Account Issues

### ⚠️ Account credits wiped (about 400,000 credits) after attempting to renew a $99/month subscription; paywall prevented payment and support ticket received no response.
### ⚠️ Eleven Labs wiped 400,000 credits from a user's account on the $99/month plan; the user had 60% of credits left (about 400,000 credits) and was unable to renew subscription due to paywall issues.
**Note**: Be aware of terms of service regarding account creation.

### ⚠️ Free trial for ElevenLabs is limited to 10,000 characters a month, which is insufficient for scripts that are often ~20-40,000 characters long.
**Note**: Be aware of terms of service regarding account creation.

## Cost & Limits

### 💰 ElevenReader credit system is considered bad by some users, making it off-putting for average consumers.

### 💰 Free access to ElevenLabs is limited to 15 minutes of voice recording per day.

### 💰 Free trial limited to 10,000 characters per month

### 💰 Free access limited to 15 minutes of voice recording per day

### 💰 $29/month for unlimited usage on ElevenReader.

### 💰 $29/month for unlimited

12 changes: 11 additions & 1 deletion models/audio/eleven-labs/prompting.md
@@ -1,11 +1,21 @@
# Eleven Labs Prompting Guide

*Last updated: 2025-08-14*
*Last updated: 2025-08-16*

## Tips & Techniques

- I built my own tool, just for me. No subscriptions, no limits, just fast, clean voice generation. Cost me ~ $4/month to run.
- Use ElevenLabsService(voice_name="Mun W") in Manim Voiceover
- MiniMax refreshes TTS credits daily, unlike ElevenLabs, where you wait a month for a refresh.
- The ElevenLabs voice agent is the entry point into the whole system, and then it will pass off web development or web design requests over to n8n agents via a webhook in order to actually do the work.
- Use the free plan to get 10,000 credits per month for free.
- So, when I do, I use a temporary email to create a new account so the 10,000 character limit 'resets.'
- self.set_speech_service(ElevenLabsService(voice_name="Mun W"))
- MacWhisper 11.10 supports ElevenLabs Scribe for cloud transcription.
- from manim_voiceover.services.elevenlabs import ElevenLabsService
- I built my own tool to avoid ElevenLabs fees.
- When converting text to voice, adding periods between letters (e.g., B.O.D.) can force the model to pronounce acronyms letter by letter, though it may consume more credits.
- ElevenLabs Scribe v1 achieves 15.0% WER on 5-10 minute patient-doctor chats, averaging 36 seconds per file.

## Recommended Settings

13 changes: 13 additions & 0 deletions models/audio/firefliesai/metadata.json
@@ -0,0 +1,13 @@
{
"service": "Fireflies.ai",
"category": "audio",
"last_updated": "2025-08-16T13:46:29.623761",
"extraction_timestamp": "2025-08-16T13:29:54.297790",
"data_sources": [
"Reddit API",
"Community discussions"
],
"posts_analyzed": 171,
"confidence": "medium",
"version": "1.0.0"
}
8 changes: 8 additions & 0 deletions models/audio/firefliesai/pitfalls.md
@@ -0,0 +1,8 @@
# Fireflies.ai - Common Pitfalls & Issues

*Last updated: 2025-08-16*

## Technical Issues

### ⚠️ Failed to create a send channel message in Slack. Error from Slack: invalid_thread_ts

14 changes: 14 additions & 0 deletions models/audio/firefliesai/prompting.md
@@ -0,0 +1,14 @@
# Fireflies.ai Prompting Guide

*Last updated: 2025-08-16*

## Tips & Techniques

- Configure Zapier to send transcripts to a channel without duplicate notifications by adjusting thread settings
- Use custom prompts called 'apps' in Fireflies.ai to create reusable ready‑made prompts.
- Use Zapier to send Fireflies.ai transcripts to Slack

## Sources

- Reddit community discussions
- User-reported experiences