Merged
8 changes: 3 additions & 5 deletions QUICKSTART.md
@@ -26,7 +26,7 @@ OPENROUTER_MODEL=your_model
#### Option B: Ollama (Local)
```bash
ollama serve
ollama pull model_alias
ollama pull model_alias  # or configure via the recent Ollama GUI
# Edit .env:
LLM_PROVIDER=local
LOCAL_LLM_TYPE=ollama
@@ -79,7 +79,6 @@ scapo scrape all --dry-run # Preview what will be processed
- `targeted --service NAME` - Extract tips for one service
- `batch --category TYPE` - Process multiple services (limited)
- `all --priority LEVEL` - Process ALL services one by one
- `update-status` - See what needs updating

## 📚 Approach 2: Legacy Sources

@@ -189,9 +188,8 @@ NOT generic advice like (but sometimes we get them... sadly):
## 🚀 Next Steps

1. **Explore extracted tips**: `scapo tui`
2. **Update regularly**: `scapo scrape update-status`
3. **Track changes**: `python scripts/git_update.py --status`
4. **Contribute**: Share your findings via PR!
2. **Track changes**: `python scripts/git_update.py --status`
3. **Contribute**: Share your findings via PR!

## Need Help?

64 changes: 28 additions & 36 deletions README.md
@@ -16,7 +16,7 @@
[![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

### 🎯 Real optimization tips from real users for AI services
### 🎯 Real usage tips from real users for AI services

If you find **SCAPO** useful, please consider giving it a star on GitHub!
Your support helps the project grow and reach more people.
@@ -29,54 +29,51 @@ Your support helps the project grow and reach more people.

**Keywords**: AI cost optimization, prompt engineering, LLM tips, OpenAI, Claude, Anthropic, Midjourney, Stable Diffusion, ElevenLabs, GitHub Copilot, reduce AI costs, AI service best practices, Reddit scraper, community knowledge base

Ever burned through credits in minutes? Searching Reddit for that one optimization tip? Getting generic advice when you need specific settings?
Ever burned through credits in minutes? Searching Reddit for that one peculiar problem you were having? Search results offering only generic advice when you need specific info?

![Scapo Intro](assets/intro.gif)

**SCAPO** extracts **specific, actionable optimization techniques** from Reddit about AI services - not generic "write better prompts" advice, but real discussions.
**SCAPO** extracts **specific usage tips and discussion** from Reddit about AI services - not generic "write better prompts" advice, but real discussions. Being crowd wisdom, it can sometimes be wrong, but it will often raise your eyebrows: "huh? ok, didn't know that..."

## ✨ Two Approaches

SCAPO offers two distinct workflows:

### 1. 🎯 **Service Discovery Mode** (NEW - Recommended)

Automatically discovers AI services and extracts specific optimization tips:

![Scapo Discover](assets/scrape-discovery.gif)

Discover services from GitHub Awesome lists
### 1. 🎯 **Batch Processing via Service Discovery (recommended)**

Discovers existing AI services and caches them for reference and downstream usage (see below):
```bash
scapo scrape discover --update
```

![Scapo Discover](assets/scrape-targeted.gif)

![Scapo Discover](assets/scrape-discovery.gif)


Extract optimization tips for specific services

```bash
scapo scrape targeted --service "Eleven Labs" --limit 20
```
![Scapo Targeted](assets/scrape-targeted.gif)

![Scapo Discover](assets/scrape-batch.gif)

Batch process multiple priority services
Batch process multiple priority services (Recommended)

```bash
scapo scrape batch --max-services 3 --category audio
```

### 2. 📚 **Legacy Sources Mode**

![Scapo Legacy](assets/legacy.gif)
![Scapo Batch](assets/scrape-batch.gif)


### 2. 📚 **Legacy Sources Mode**
Traditional approach using predefined sources from `sources.yaml`:
```bash
# Scrape from configured sources
scapo scrape run --sources reddit:LocalLLaMA --limit 10
```
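The `reddit:LocalLLaMA` source name maps to an entry in `sources.yaml`. As a purely hypothetical illustration of the shape such an entry might take (field names are not confirmed; consult the actual `sources.yaml` in the repo):

```yaml
reddit:
  - name: LocalLLaMA      # referenced above as reddit:LocalLLaMA
    subreddit: LocalLLaMA
    limit: 10             # hypothetical default, overridable with --limit
```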
![Scapo Batch](assets/legacy.gif)


## 🏃‍♂️ Quick Start (2 Minutes)

@@ -102,6 +99,8 @@ cp .env.example .env
```

Get your API key from [openrouter.ai](https://openrouter.ai/)
* You can also use local LLMs (Ollama, LM Studio); see [QUICKSTART.md](./QUICKSTART.md)
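For the cloud route, the relevant `.env` lines look roughly like this (`OPENROUTER_MODEL` appears earlier in this diff; the API-key name is an assumption, check `.env.example` for the exact keys):

```
OPENROUTER_API_KEY=sk-or-...   # assumed key name; see .env.example
OPENROUTER_MODEL=your_model
```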


### 3. Start Extracting Optimization Tips

@@ -122,7 +121,7 @@ scapo scrape batch --category video --limit 15
scapo scrape all --priority ultra --limit 20
```

#### Option B: Legacy Sources
#### Option B: Legacy method using the `sources.yaml` file

```bash
# Use predefined sources from sources.yaml
@@ -155,13 +154,6 @@ cat models/video/heygen/pitfalls.md
❌ **Generic**: "Try different settings"
✅ **Specific**: "Use 720p instead of 1080p in HeyGen to save 40% credits"

## 📊 Real Results

From actual extractions:
- **Eleven Labs**: Found 15+ specific optimization techniques from 75 Reddit posts
- **GitHub Copilot**: Discovered exact limits and configuration tips
- **Character.AI**: Found 32,000 character limit and mobile workarounds
- **HeyGen**: Credit optimization techniques and API alternatives

## 🛠️ How It Works

@@ -174,10 +166,10 @@ From actual extractions:
### Intelligent Extraction
- **Specific search patterns**: "config settings", "API key", "rate limit daily", "parameters"
- **Aggressive filtering**: Ignores generic advice like "be patient"
- **Batch processing**: Processes 50+ posts at once for efficiency
- **Context awareness**: Uses full 128k token windows when available
- **Batch processing**: Can process 50+ posts at once for efficiency (we recommend a minimum of 15 posts per query)
- **Context awareness**: Uses the full token window of your chosen LLM when available (for local LLMs, set your context window in `.env`)

### Smart Organization
### Output Organization
```
models/
├── audio/
@@ -202,7 +194,7 @@ scapo scrape discover --show-all # List all services

# Target specific services
scapo scrape targeted \
--service "Eleven Labs" \ # Service name (handles variations)
--service "Eleven Labs" \ # Service name (handles variations; if there's no hit in services.json, the service is created under the 'general' folder)
--limit 20 \ # Posts per search (15-20 recommended)
--max-queries 10 # Number of searches

@@ -212,9 +204,6 @@ scapo scrape batch \
--max-services 3 \ # Services to process
--limit 15 # Posts per search

# Check update status
scapo scrape update-status # See what needs updating
```

### Legacy Sources Mode
```bash
@@ -232,7 +221,7 @@ scapo scrape run \
# CLI commands
scapo models list # List all models
scapo models search "copilot" # Search models
scapo models info github-copilot --category coding
scapo models info github-copilot --category code
```
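Because everything lands as plain Markdown under `models/`, you can also query the knowledge base without the CLI. A rough sketch of what `scapo models search` does (layout assumed from the folder structure above; the real command may match and rank differently):

```python
from pathlib import Path

def search_models(root: str, term: str) -> list[str]:
    """Return Markdown files under root whose text mentions the term."""
    term = term.lower()
    return sorted(
        str(md) for md in Path(root).rglob("*.md")
        if term in md.read_text(errors="ignore").lower()
    )

# Tiny demo tree following the models/<category>/<service>/ layout
svc = Path("models/code/github-copilot")
svc.mkdir(parents=True, exist_ok=True)
(svc / "pitfalls.md").write_text("# GitHub Copilot pitfalls\n- watch the rate limits\n")

print(search_models("models", "copilot"))  # → ['models/code/github-copilot/pitfalls.md']
```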

## ⚙️ Configuration
@@ -252,7 +241,7 @@ LOCAL_LLM_OPTIMAL_CHUNK=2048 # Optimal batch size (typically 1/4 of m
LOCAL_LLM_TIMEOUT_SECONDS=600 # 10 minutes for slower local models
LLM_TIMEOUT_SECONDS=120 # 2 minutes for cloud models

# Extraction Quality
# Extraction Quality (as judged by your chosen LLM)
LLM_QUALITY_THRESHOLD=0.6 # Min quality (0.0-1.0)

# Scraping
@@ -264,7 +253,7 @@ MAX_POSTS_PER_SCRAPE=100 # Limit per source
```bash
--limit 5 # ❌ Often finds nothing (too few samples)
--limit 15 # ✅ Good baseline (finds common issues)
--limit 25 # 🎯 Optimal (uncovers hidden gems & edge cases)
--limit 25 # 🎯 Will find something (as long as there is active discussion on it)
```
A hand-wavy breakdown: with 5 posts, extraction success is ~20%; with 20+ posts, it jumps to ~80%.

@@ -277,13 +266,16 @@ scapo tui
Navigate extracted tips with:
- **↑/↓** - Browse models
- **Enter** - View content
- **Space** - Expand/collapse tree nodes
- **Tab** - Cycle focus between tree and content
- **h** - Show help
- **c** - Copy to clipboard
- **o** - Open file location
- **q** - Quit

## 🔄 Git-Friendly Update Tracking for AI Services in the Models Folder

SCAPO is designed for version control:
SCAPO is designed for version control (this applies only to tracking the models folder):
```bash
# Check what changed
uv run scripts/git_update.py --status
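Under the hood this is ordinary git state on the `models/` folder. A minimal sketch of the same status check with plain git (illustrative only; `scripts/git_update.py` layers its own reporting on top):

```shell
# Set up a throwaway repo with a tracked models/ folder
mkdir -p scapo-demo/models/audio && cd scapo-demo
git init -q
git config user.email "demo@example.com"
git config user.name "demo"
echo "- use stability 0.5" > models/audio/tips.md
git add models && git commit -qm "baseline extraction"

# Simulate a new extraction touching a tracked file
echo "- lower similarity boost" >> models/audio/tips.md

# Porcelain output lists exactly what changed under models/
changes=$(git status --porcelain -- models/)
echo "$changes"
```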
Binary file modified assets/intro.gif
Binary file modified assets/legacy.gif
Binary file modified assets/scrape-batch.gif
Binary file modified assets/scrape-discovery.gif
Binary file modified assets/scrape-targeted.gif
Binary file removed assets/service-discovery-audio.gif
Binary file removed assets/service-discovery-video.gif
Binary file modified assets/tui.gif
12 changes: 12 additions & 0 deletions models/image/midjourney/cost_optimization.md
@@ -0,0 +1,12 @@
# Midjourney - Cost Optimization Guide

*Last updated: 2025-08-15*

## Cost & Pricing Information

- 200 image limit
- I use Midjourney quite a bit for graphic design (mainly to generate assets for thumbnails to save trawling through hundreds of pages of stock images). But the tier I use is £30 a month.
- $10 version
- The company’s image AI service, accessible through Discord, stands out with a diverse range of packages priced between $10 and $120 per month.
- $4 additional rollover GPU time

6 changes: 3 additions & 3 deletions models/image/midjourney/metadata.json
@@ -1,13 +1,13 @@
{
"service": "Midjourney",
"category": "image",
"last_updated": "2025-08-11T23:01:57.902430",
"extraction_timestamp": "2025-08-11T23:01:57.902430",
"last_updated": "2025-08-15T14:50:42.037636",
"extraction_timestamp": "2025-08-15T14:50:34.751518",
"data_sources": [
"Reddit API",
"Community discussions"
],
"posts_analyzed": 0,
"posts_analyzed": 113,
"confidence": "medium",
"version": "1.0.0"
}
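Since `metadata.json` is plain JSON, downstream tooling can consume it directly. A small sketch (fields mirrored from the file above; the staleness check is illustrative, not part of SCAPO):

```python
import json
from datetime import datetime
from pathlib import Path

# Sample metadata mirroring the fields shown above
sample = {
    "service": "Midjourney",
    "category": "image",
    "last_updated": "2025-08-15T14:50:42.037636",
    "posts_analyzed": 113,
    "confidence": "medium",
    "version": "1.0.0",
}
Path("metadata.json").write_text(json.dumps(sample, indent=2))

# Read it back and report how stale the extraction is
meta = json.loads(Path("metadata.json").read_text())
age_days = (datetime.now() - datetime.fromisoformat(meta["last_updated"])).days
print(f"{meta['service']}: {meta['posts_analyzed']} posts analyzed, {age_days} days old")
```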
14 changes: 14 additions & 0 deletions models/image/midjourney/parameters.json
@@ -0,0 +1,14 @@
{
"service": "Midjourney",
"last_updated": "2025-08-15T14:50:41.953959",
"recommended_settings": {},
"cost_optimization": {
"tip_0": "200 image limit",
"tip_1": "I use Midjourney quite a bit for graphic design (mainly to generate assets for thumbnails to save trawling through hundreds of pages of stock images). But the tier I use is \u00a330 a month.",
"pricing": "$4 additional rollover GPU time"
},
"sources": [
"Reddit community",
"User reports"
]
}
24 changes: 24 additions & 0 deletions models/image/midjourney/pitfalls.md
@@ -0,0 +1,24 @@
# Midjourney - Common Pitfalls & Issues

*Last updated: 2025-08-15*

## Technical Issues

### ⚠️ It's possible to queue 12 image generations/upscales in the Pro plan, but usually this is really annoying when I'm batch-upscaling images for later. Is there any way to bypass this 12 image queue limit? I don't want to have to go to Discord every few minutes to add more items to the queue (also it's really buggy and sometimes it's impossible to tell if an image has been added to the queue since the button doesn't get pressed)

### ⚠️ TLDR: MJ is the best for artistic generations compared to other models, but is artificially limiting its use-cases by not offering an API to artists who want to create dynamic, interactive, artworks. I suggest a personal API tier to allow artists to use MJ in this way.

---

I want to start by saying I understand there are many reasons why MJ would not want to offer an API. They are totally reasonable, especially from a business perspective.

I want to present a case as to why I feel the lack of

## Cost & Limits

### 💰 Currently on the $10 version of Midjourney, was curious if I hit the 200 image mark, then purchase the $4 additional rollover GPU time, if that will let me go over the limit?

Thanks

### 💰 200 image limit

9 changes: 6 additions & 3 deletions models/image/midjourney/prompting.md
@@ -1,10 +1,13 @@
# Midjourney Prompting Guide

*Last updated: 2025-08-11*
*Last updated: 2025-08-15*

## Usage Tips
## Tips & Techniques

- Try using the --raw parameter with Midjourney's Video Model
- Use the midjourney-python-api, an open-source Python client built for the unofficial MidJourney API, leveraging a Discord self bot and the Merubokkusu/Discord-S.C.U.M library. Key features include info retrieval, imagine prompt, image upscale and vectorization by lab.
- switch to fast mode
- upgrade your plan
- You can build a simple interface on Wix that uses open-source GitHub APIs to connect to Midjourney, sending image and text prompts and storing output images in a gallery after receiving their links. It required about 70 lines of code.

## Sources

25 changes: 25 additions & 0 deletions scripts/run_all_tapes.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -euo pipefail

# Run all .tape files under scripts/tapes with vhs
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
TAPES_DIR="$SCRIPT_DIR/tapes"

cd "$TAPES_DIR"

shopt -s nullglob
tapes=( *.tape )

if (( ${#tapes[@]} == 0 )); then
echo "No .tape files found in $TAPES_DIR"
exit 0
fi

for tape in "${tapes[@]}"; do
echo "==> Running VHS: $tape"
vhs "$tape"
done

echo "All tapes processed."


63 changes: 0 additions & 63 deletions scripts/service-discovery-audio.tape

This file was deleted.
