Merged
8 changes: 3 additions & 5 deletions QUICKSTART.md
@@ -26,7 +26,7 @@ OPENROUTER_MODEL=your_model
#### Option B: Ollama (Local)
```bash
ollama serve
ollama pull model_alias
ollama pull model_alias  # or configure via the recent Ollama GUI
# Edit .env:
LLM_PROVIDER=local
LOCAL_LLM_TYPE=ollama
@@ -79,7 +79,6 @@ scapo scrape all --dry-run # Preview what will be processed
- `targeted --service NAME` - Extract tips for one service
- `batch --category TYPE` - Process multiple services (limited)
- `all --priority LEVEL` - Process ALL services one by one
- `update-status` - See what needs updating

## 📚 Approach 2: Legacy Sources

@@ -189,9 +188,8 @@ NOT generic advice like (but sometimes we get them... sadly):
## 🚀 Next Steps

1. **Explore extracted tips**: `scapo tui`
2. **Update regularly**: `scapo scrape update-status`
3. **Track changes**: `python scripts/git_update.py --status`
4. **Contribute**: Share your findings via PR!
2. **Track changes**: `python scripts/git_update.py --status`
3. **Contribute**: Share your findings via PR!

## Need Help?

64 changes: 28 additions & 36 deletions README.md
@@ -16,7 +16,7 @@
[![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

### 🎯 Real optimization tips from real users for AI services
### 🎯 Real usage tips from real users for AI services

If you find **SCAPO** useful, please consider giving it a star on GitHub!
Your support helps the project grow and reach more people.
@@ -29,54 +29,51 @@ Your support helps the project grow and reach more people.

**Keywords**: AI cost optimization, prompt engineering, LLM tips, OpenAI, Claude, Anthropic, Midjourney, Stable Diffusion, ElevenLabs, GitHub Copilot, reduce AI costs, AI service best practices, Reddit scraper, community knowledge base

Ever burned through credits in minutes? Searching Reddit for that one optimization tip? Getting generic advice when you need specific settings?
Ever burned through credits in minutes? Searching Reddit for that one peculiar problem you were having? Search results offering only generic advice when you need specific info?

![Scapo Intro](assets/intro.gif)

**SCAPO** extracts **specific, actionable optimization techniques** from Reddit about AI services - not generic "write better prompts" advice, but real discussions.
**SCAPO** extracts **specific usage tips and discussion** from Reddit about AI services - not generic "write better prompts" advice, but real discussions. Being crowd wisdom, it can sometimes be wrong, but it will often raise your eyebrows: "huh? ok, didn't know that..."

## ✨ Two Approaches

SCAPO offers two distinct workflows:

### 1. 🎯 **Service Discovery Mode** (NEW - Recommended)

Automatically discovers AI services and extracts specific optimization tips:

![Scapo Discover](assets/scrape-discovery.gif)

Discover services from GitHub Awesome lists
### 1. 🎯 **Batch Processing via Service Discovery (recommended)**

Discovers existing AI services and caches them for reference and downstream usage (see below):
```bash
scapo scrape discover --update
```

![Scapo Discover](assets/scrape-targeted.gif)

![Scapo Discover](assets/scrape-discovery.gif)


Extract optimization tips for specific services

```bash
scapo scrape targeted --service "Eleven Labs" --limit 20
```
![Scapo Targeted](assets/scrape-targeted.gif)

![Scapo Discover](assets/scrape-batch.gif)

Batch process multiple priority services
Batch process multiple priority services (Recommended)

```bash
scapo scrape batch --max-services 3 --category audio
```

### 2. 📚 **Legacy Sources Mode**

![Scapo Legacy](assets/legacy.gif)
![Scapo Batch](assets/scrape-batch.gif)


### 2. 📚 **Legacy Sources Mode**
Traditional approach using predefined sources from `sources.yaml`:
```bash
# Scrape from configured sources
scapo scrape run --sources reddit:LocalLLaMA --limit 10
```
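The `reddit:LocalLLaMA` source name maps to an entry in `sources.yaml`. As a purely hypothetical illustration of the shape such an entry might take (field names are not confirmed; consult the actual `sources.yaml` in the repo):

```yaml
reddit:
  - name: LocalLLaMA      # referenced above as reddit:LocalLLaMA
    subreddit: LocalLLaMA
    limit: 10             # hypothetical default, overridable with --limit
```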
![Scapo Batch](assets/legacy.gif)


## 🏃‍♂️ Quick Start (2 Minutes)

@@ -102,6 +99,8 @@ cp .env.example .env
```

Get your API key from [openrouter.ai](https://openrouter.ai/)
* You can also use local LLMs (Ollama, LM Studio); see [QUICKSTART.md](./QUICKSTART.md)
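For the cloud route, the relevant `.env` lines look roughly like this (`OPENROUTER_MODEL` appears earlier in this diff; the API-key name is an assumption, check `.env.example` for the exact keys):

```
OPENROUTER_API_KEY=sk-or-...   # assumed key name; see .env.example
OPENROUTER_MODEL=your_model
```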


### 3. Start Extracting Optimization Tips

@@ -122,7 +121,7 @@ scapo scrape batch --category video --limit 15
scapo scrape all --priority ultra --limit 20
```

#### Option B: Legacy Sources
#### Option B: Legacy method using the `sources.yaml` file

```bash
# Use predefined sources from sources.yaml
@@ -155,13 +154,6 @@ cat models/video/heygen/pitfalls.md
❌ **Generic**: "Try different settings"
✅ **Specific**: "Use 720p instead of 1080p in HeyGen to save 40% credits"

## 📊 Real Results

From actual extractions:
- **Eleven Labs**: Found 15+ specific optimization techniques from 75 Reddit posts
- **GitHub Copilot**: Discovered exact limits and configuration tips
- **Character.AI**: Found 32,000 character limit and mobile workarounds
- **HeyGen**: Credit optimization techniques and API alternatives

## 🛠️ How It Works

@@ -174,10 +166,10 @@ From actual extractions:
### Intelligent Extraction
- **Specific search patterns**: "config settings", "API key", "rate limit daily", "parameters"
- **Aggressive filtering**: Ignores generic advice like "be patient"
- **Batch processing**: Processes 50+ posts at once for efficiency
- **Context awareness**: Uses full 128k token windows when available
- **Batch processing**: Can process 50+ posts at once for efficiency (we recommend a minimum of 15 posts per query)
- **Context awareness**: Uses the full token window of your chosen LLM when available (for local LLMs, set your context window in `.env`)

### Smart Organization
### Output Organization
```
models/
├── audio/
@@ -202,7 +194,7 @@ scapo scrape discover --show-all # List all services

# Target specific services
scapo scrape targeted \
--service "Eleven Labs" \ # Service name (handles variations)
--service "Eleven Labs" \ # Service name (handles variations; if there's no hit in services.json, the service is created under the 'general' folder)
--limit 20 \ # Posts per search (15-20 recommended)
--max-queries 10 # Number of searches

@@ -212,9 +204,6 @@ scapo scrape batch \
--max-services 3 \ # Services to process
--limit 15 # Posts per search

# Check update status
scapo scrape update-status # See what needs updating
```

### Legacy Sources Mode
```bash
@@ -232,7 +221,7 @@ scapo scrape run \
# CLI commands
scapo models list # List all models
scapo models search "copilot" # Search models
scapo models info github-copilot --category coding
scapo models info github-copilot --category code
```
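Because everything lands as plain Markdown under `models/`, you can also query the knowledge base without the CLI. A rough sketch of what `scapo models search` does (layout assumed from the folder structure above; the real command may match and rank differently):

```python
from pathlib import Path

def search_models(root: str, term: str) -> list[str]:
    """Return Markdown files under root whose text mentions the term."""
    term = term.lower()
    return sorted(
        str(md) for md in Path(root).rglob("*.md")
        if term in md.read_text(errors="ignore").lower()
    )

# Tiny demo tree following the models/<category>/<service>/ layout
svc = Path("models/code/github-copilot")
svc.mkdir(parents=True, exist_ok=True)
(svc / "pitfalls.md").write_text("# GitHub Copilot pitfalls\n- watch the rate limits\n")

print(search_models("models", "copilot"))  # → ['models/code/github-copilot/pitfalls.md']
```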

## ⚙️ Configuration
@@ -252,7 +241,7 @@ LOCAL_LLM_OPTIMAL_CHUNK=2048 # Optimal batch size (typically 1/4 of m
LOCAL_LLM_TIMEOUT_SECONDS=600 # 10 minutes for slower local models
LLM_TIMEOUT_SECONDS=120 # 2 minutes for cloud models

# Extraction Quality
# Extraction Quality (as judged by your chosen LLM)
LLM_QUALITY_THRESHOLD=0.6 # Min quality (0.0-1.0)

# Scraping
@@ -264,7 +253,7 @@ MAX_POSTS_PER_SCRAPE=100 # Limit per source
```bash
--limit 5 # ❌ Often finds nothing (too few samples)
--limit 15 # ✅ Good baseline (finds common issues)
--limit 25 # 🎯 Optimal (uncovers hidden gems & edge cases)
--limit 25 # 🎯 Will find something (as long as there is active discussion on it)
```
A hand-wavy breakdown: with 5 posts, extraction success is ~20%; with 20+ posts, it jumps to ~80%.

@@ -277,13 +266,16 @@ scapo tui
Navigate extracted tips with:
- **↑/↓** - Browse models
- **Enter** - View content
- **Space** - Expand/collapse tree nodes
- **Tab** - Cycle focus between tree and content
- **h** - Show help
- **c** - Copy to clipboard
- **o** - Open file location
- **q** - Quit

## 🔄 Git-Friendly Update Tracking for AI Services in the Models Folder

SCAPO is designed for version control:
SCAPO is designed for version control (this applies only to tracking the models folder):
```bash
# Check what changed
uv run scripts/git_update.py --status
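Under the hood this is ordinary git state on the `models/` folder. A minimal sketch of the same status check with plain git (illustrative only; `scripts/git_update.py` layers its own reporting on top):

```shell
# Set up a throwaway repo with a tracked models/ folder
mkdir -p scapo-demo/models/audio && cd scapo-demo
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
echo "- use stability 0.5" > models/audio/tips.md
git add models && git commit -qm "baseline extraction"

# Simulate a new extraction touching a tracked file
echo "- lower similarity boost" >> models/audio/tips.md

# Porcelain output lists exactly what changed under models/
changes=$(git status --porcelain -- models/)
echo "$changes"
```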
Binary file modified assets/intro.gif
Binary file modified assets/legacy.gif
Binary file modified assets/scrape-batch.gif
Binary file modified assets/scrape-discovery.gif
Binary file modified assets/scrape-targeted.gif
Binary file removed assets/service-discovery-audio.gif
Binary file removed assets/service-discovery-video.gif
Binary file modified assets/tui.gif
12 changes: 12 additions & 0 deletions models/image/midjourney/cost_optimization.md
@@ -0,0 +1,12 @@
# Midjourney - Cost Optimization Guide

*Last updated: 2025-08-15*

## Cost & Pricing Information

- 200 image limit
- I use Midjourney quite a bit for graphic design (mainly to generate assets for thumbnails to save trawling through hundreds of pages of stock images). But the tier I use is £30 a month.
- $10 version
- The company’s image AI service, accessible through Discord, stands out with a diverse range of packages priced between $10 and $120 per month.
- $4 additional rollover GPU time

6 changes: 3 additions & 3 deletions models/image/midjourney/metadata.json
@@ -1,13 +1,13 @@
{
"service": "Midjourney",
"category": "image",
"last_updated": "2025-08-11T23:01:57.902430",
"extraction_timestamp": "2025-08-11T23:01:57.902430",
"last_updated": "2025-08-15T14:50:42.037636",
"extraction_timestamp": "2025-08-15T14:50:34.751518",
"data_sources": [
"Reddit API",
"Community discussions"
],
"posts_analyzed": 0,
"posts_analyzed": 113,
"confidence": "medium",
"version": "1.0.0"
}
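Since `metadata.json` is plain JSON, downstream tooling can consume it directly. A small sketch (fields mirrored from the file above; the staleness check is illustrative, not part of SCAPO):

```python
import json
from datetime import datetime
from pathlib import Path

# Sample metadata mirroring the fields shown above
sample = {
    "service": "Midjourney",
    "category": "image",
    "last_updated": "2025-08-15T14:50:42.037636",
    "posts_analyzed": 113,
    "confidence": "medium",
    "version": "1.0.0",
}
Path("metadata.json").write_text(json.dumps(sample, indent=2))

# Read it back and report how stale the extraction is
meta = json.loads(Path("metadata.json").read_text())
age_days = (datetime.now() - datetime.fromisoformat(meta["last_updated"])).days
print(f"{meta['service']}: {meta['posts_analyzed']} posts analyzed, {age_days} days old")
```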
14 changes: 14 additions & 0 deletions models/image/midjourney/parameters.json
@@ -0,0 +1,14 @@
{
"service": "Midjourney",
"last_updated": "2025-08-15T14:50:41.953959",
"recommended_settings": {},
"cost_optimization": {
"tip_0": "200 image limit",
"tip_1": "I use Midjourney quite a bit for graphic design (mainly to generate assets for thumbnails to save trawling through hundreds of pages of stock images). But the tier I use is \u00a330 a month.",
"pricing": "$4 additional rollover GPU time"
},
"sources": [
"Reddit community",
"User reports"
]
}
24 changes: 24 additions & 0 deletions models/image/midjourney/pitfalls.md
@@ -0,0 +1,24 @@
# Midjourney - Common Pitfalls & Issues

*Last updated: 2025-08-15*

## Technical Issues

### ⚠️ It's possible to queue 12 image generations/upscales in the Pro plan, but usually this is really annoying when I'm batch-upscaling images for later. Is there any way to bypass this 12 image queue limit? I don't want to have to go to Discord every few minutes to add more items to the queue (also it's really buggy and sometimes it's impossible to tell if an image has been added to the queue since the button doesn't get pressed)

### ⚠️ TLDR: MJ is the best for artistic generations compared to other models, but is artificially limiting its use-cases by not offering an API to artists who want to create dynamic, interactive, artworks. I suggest a personal API tier to allow artists to use MJ in this way.

---

I want to start by saying I understand there are many reasons why MJ would not want to offer an API. They are totally reasonable, especially from a business perspective.

I want to present a case as to why I feel the lack of

## Cost & Limits

### 💰 Currently on the $10 version of Midjourney, was curious if I hit the 200 image mark, then purchase the $4 additional rollover GPU time, if that will let me go over the limit?

Thanks

### 💰 200 image limit

9 changes: 6 additions & 3 deletions models/image/midjourney/prompting.md
@@ -1,10 +1,13 @@
# Midjourney Prompting Guide

*Last updated: 2025-08-11*
*Last updated: 2025-08-15*

## Usage Tips
## Tips & Techniques

- Try using the --raw parameter with Midjourney's Video Model
- Use the midjourney-python-api, an open-source Python client built for the unofficial MidJourney API, leveraging a Discord self bot and the Merubokkusu/Discord-S.C.U.M library. Key features include info retrieval, imagine prompt, image upscale and vectorization by lab.
- switch to fast mode
- upgrade your plan
- You can build a simple interface on Wix that uses open-source GitHub APIs to connect to Midjourney, sending image and text prompts and storing output images in a gallery after receiving their links. It required about 70 lines of code.

## Sources

25 changes: 25 additions & 0 deletions scripts/run_all_tapes.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail

# Run all .tape files under scripts/tapes with vhs
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TAPES_DIR="$SCRIPT_DIR/tapes"

cd "$TAPES_DIR"

shopt -s nullglob
tapes=( *.tape )

if (( ${#tapes[@]} == 0 )); then
echo "No .tape files found in $TAPES_DIR"
exit 0
fi

for tape in "${tapes[@]}"; do
echo "==> Running VHS: $tape"
vhs "$tape"
done

echo "All tapes processed."


63 changes: 0 additions & 63 deletions scripts/service-discovery-audio.tape

This file was deleted.
