From 398bd330dbc00c7976424840c509be64ae779f66 Mon Sep 17 00:00:00 2001 From: arahangua Date: Fri, 15 Aug 2025 13:22:17 +0900 Subject: [PATCH] fixed/updated: model search function, removed outdated update-status method --- QUICKSTART.md | 8 +++---- README.md | 61 +++++++++++++++++++++------------------------------ src/cli.py | 42 +++-------------------------------- 3 files changed, 31 insertions(+), 80 deletions(-) diff --git a/QUICKSTART.md b/QUICKSTART.md index cf0d4d5..8a45966 100644 --- a/QUICKSTART.md +++ b/QUICKSTART.md @@ -26,7 +26,7 @@ OPENROUTER_MODEL=your_model #### Option B: Ollama (Local) ```bash ollama serve -ollama pull model_alias +ollama pull model_alias # or configure it via the recent Ollama GUI # Edit .env: LLM_PROVIDER=local LOCAL_LLM_TYPE=ollama @@ -79,7 +79,6 @@ scapo scrape all --dry-run # Preview what will be processed - `targeted --service NAME` - Extract tips for one service - `batch --category TYPE` - Process multiple services (limited) - `all --priority LEVEL` - Process ALL services one by one -- `update-status` - See what needs updating ## 📚 Approach 2: Legacy Sources @@ -189,9 +188,8 @@ NOT generic advice like (but sometimes we get them... sadly): ## 🚀 Next Steps 1. **Explore extracted tips**: `scapo tui` -2. **Update regularly**: `scapo scrape update-status` -3. **Track changes**: `python scripts/git_update.py --status` -4. **Contribute**: Share your findings via PR! +2. **Track changes**: `python scripts/git_update.py --status` +3. **Contribute**: Share your findings via PR! ## Need Help? 
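The `LLM_PROVIDER` / `LOCAL_LLM_TYPE` switch edited in the QUICKSTART above can be pictured with a small sketch. This is a hypothetical illustration, not SCAPO's actual internals: the helper name and the `OLLAMA_BASE_URL` variable are assumptions, though `http://localhost:11434` is Ollama's default endpoint.

```python
# Hypothetical sketch of resolving an LLM endpoint from .env-style settings.
# resolve_llm_endpoint and OLLAMA_BASE_URL are illustrative names, not SCAPO's API.
def resolve_llm_endpoint(env: dict) -> str:
    provider = env.get("LLM_PROVIDER", "openrouter")
    if provider == "local" and env.get("LOCAL_LLM_TYPE") == "ollama":
        # Ollama serves its HTTP API on port 11434 by default.
        return env.get("OLLAMA_BASE_URL", "http://localhost:11434")
    # Anything else falls back to the OpenRouter API base.
    return "https://openrouter.ai/api/v1"

print(resolve_llm_endpoint({"LLM_PROVIDER": "local", "LOCAL_LLM_TYPE": "ollama"}))
# http://localhost:11434
```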
diff --git a/README.md b/README.md index 9b44da8..b219124 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ [![PRs Welcome](https://img.shields.io/badge/PRs-Welcome-brightgreen.svg)](CONTRIBUTING.md) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) -### 🎯 Real optimization tips from real users for AI services +### 🎯 Real usage tips from real users for AI services If you find **SCAPO** useful, please consider giving it a star on GitHub! Your support helps the project grow and reach more people. @@ -29,54 +29,51 @@ Your support helps the project grow and reach more people. **Keywords**: AI cost optimization, prompt engineering, LLM tips, OpenAI, Claude, Anthropic, Midjourney, Stable Diffusion, ElevenLabs, GitHub Copilot, reduce AI costs, AI service best practices, Reddit scraper, community knowledge base -Ever burned through credits in minutes? Searching Reddit for that one optimization tip? Getting generic advice when you need specific settings? +Ever burned through credits in minutes? Searching Reddit for that one peculiar problem you were having? Search results offering only generic advice when you need specific info? ![Scapo Intro](assets/intro.gif) -**SCAPO** extracts **specific, actionable optimization techniques** from Reddit about AI services - not generic "write better prompts" advice, but real discussions. +**SCAPO** extracts **specific usage tips and discussions** from Reddit about AI services - not generic "write better prompts" advice, but real discussions. Being crowd wisdom, they can sometimes be wrong, but they will often raise your eyebrows: "huh? ok, didn't know that..." ## ✨ Two Approaches SCAPO offers two distinct workflows: -### 1. 🎯 **Service Discovery Mode** (NEW - Recommended) - -Automatically discovers AI services and extracts specific optimization tips: - -![Scapo Discover](assets/scrape-discovery.gif) - -Discover services from GitHub Awesome lists +### 1. 
🎯 **Batch Processing via Service Discovery (recommended)** +Discovers existing AI services and caches them for reference and downstream use (see below): ```bash scapo scrape discover --update ``` + +![Scapo Discover](assets/scrape-discovery.gif) + Extract optimization tips for specific services ```bash scapo scrape targeted --service "Eleven Labs" --limit 20 ``` +![Scapo Targeted](assets/scrape-targeted.gif) -![Scapo Discover](assets/scrape-batch.gif) -Batch process multiple priority services +Batch process multiple priority services (recommended) ```bash scapo scrape batch --max-services 3 --category audio ``` +![Scapo Batch](assets/scrape-batch.gif) ### 2. 📚 **Legacy Sources Mode** Traditional approach using predefined sources from `sources.yaml`: ```bash # Scrape from configured sources scapo scrape run --sources reddit:LocalLLaMA --limit 10 ``` +![Scapo Legacy](assets/legacy.gif) + ## 🏃‍♂️ Quick Start (2 Minutes) @@ -102,6 +99,8 @@ cp .env.example .env ``` Get your API key from [openrouter.ai](https://openrouter.ai/) +* You can also use local LLMs (Ollama, LM Studio). See [QUICKSTART.md](./QUICKSTART.md) + ### 3. 
Start Extracting Optimization Tips @@ -122,7 +121,7 @@ scapo scrape batch --category video --limit 15 scapo scrape all --priority ultra --limit 20 ``` -#### Option B: Legacy Sources +#### Option B: Legacy method using the `sources.yaml` file ```bash # Use predefined sources from sources.yaml @@ -155,13 +154,6 @@ cat models/video/heygen/pitfalls.md ❌ **Generic**: "Try different settings" ✅ **Specific**: "Use 720p instead of 1080p in HeyGen to save 40% credits" -## 📊 Real Results - -From actual extractions: -- **Eleven Labs**: Found 15+ specific optimization techniques from 75 Reddit posts -- **GitHub Copilot**: Discovered exact limits and configuration tips -- **Character.AI**: Found 32,000 character limit and mobile workarounds -- **HeyGen**: Credit optimization techniques and API alternatives ## 🛠️ How It Works @@ -174,10 +166,10 @@ From actual extractions: ### Intelligent Extraction - **Specific search patterns**: "config settings", "API key", "rate limit daily", "parameters" - **Aggressive filtering**: Ignores generic advice like "be patient" -- **Batch processing**: Processes 50+ posts at once for efficiency -- **Context awareness**: Uses full 128k token windows when available +- **Batch processing**: Can process 50+ posts at once for efficiency (we recommend a minimum of 15 posts per query) +- **Context awareness**: Uses the full token window of your chosen LLM when available (for local LLMs, set your context window in .env) -### Smart Organization +### Output Organization ``` models/ ├── audio/ @@ -202,7 +194,7 @@ scapo scrape discover --show-all # List all services # Target specific services scapo scrape targeted \ - --service "Eleven Labs" \ # Service name (handles variations) + --service "Eleven Labs" \ # Service name (handles variations; any name works --> if there is no hit in services.json, it is created under the 'general' folder) --limit 20 \ # Posts per search (15-20 recommended) --max-queries 10 # Number of searches @@ -212,9 +204,6 @@ 
scapo scrape batch \ --max-services 3 \ # Services to process --limit 15 # Posts per search -# Check update status -scapo scrape update-status # See what needs updating -``` ### Legacy Sources Mode ```bash @@ -232,7 +221,7 @@ scapo scrape run \ # CLI commands scapo models list # List all models scapo models search "copilot" # Search models -scapo models info github-copilot --category coding +scapo models info github-copilot --category code ``` ## ⚙️ Configuration @@ -252,7 +241,7 @@ LOCAL_LLM_OPTIMAL_CHUNK=2048 # Optimal batch size (typically 1/4 of m LOCAL_LLM_TIMEOUT_SECONDS=600 # 10 minutes for slower local models LLM_TIMEOUT_SECONDS=120 # 2 minutes for cloud models -# Extraction Quality +# Extraction Quality (judged at your chosen LLM's discretion) LLM_QUALITY_THRESHOLD=0.6 # Min quality (0.0-1.0) # Scraping @@ -264,7 +253,7 @@ MAX_POSTS_PER_SCRAPE=100 # Limit per source ```bash --limit 5 # ❌ Often finds nothing (too few samples) --limit 15 # ✅ Good baseline (finds common issues) ---limit 25 # 🎯 Optimal (uncovers hidden gems & edge cases) +--limit 25 # 🎯 Will find something (as long as there is active discussion on it) ``` so, hand-wavy breakdown: With 5 posts, extraction success ~20%. With 20+ posts, success jumps to ~80%. 
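The `LLM_QUALITY_THRESHOLD` knob above amounts to a simple score gate on extracted tips. A minimal, hypothetical sketch (not SCAPO's actual code; the tip dicts and the `quality` field are assumptions):

```python
import os

# Hypothetical sketch: keep only extracted tips whose LLM-assigned quality
# score (0.0-1.0) clears LLM_QUALITY_THRESHOLD. Not SCAPO's actual code.
def filter_tips(tips: list[dict], threshold: float) -> list[dict]:
    return [t for t in tips if t.get("quality", 0.0) >= threshold]

threshold = float(os.environ.get("LLM_QUALITY_THRESHOLD", "0.6"))
tips = [
    {"text": "Use 720p instead of 1080p in HeyGen", "quality": 0.9},
    {"text": "be patient", "quality": 0.2},  # generic advice gets a low score
]
print([t["text"] for t in filter_tips(tips, threshold)])
```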
@@ -283,7 +272,7 @@ Navigate extracted tips with: ## 🔄 Git-Friendly Updates tracking AI services in the Models folder -SCAPO is designed for version control: +SCAPO is designed for version control (this applies only to tracking the models folder): ```bash # Check what changed uv run scripts/git_update.py --status diff --git a/src/cli.py b/src/cli.py index be70379..97c6a4e 100644 --- a/src/cli.py +++ b/src/cli.py @@ -746,43 +746,6 @@ async def _batch(): asyncio.run(_batch()) -@scrape.command(name="update-status") -def update_status(): - """Show which services need updating.""" - show_banner() - - from src.services.update_manager import UpdateManager - manager = UpdateManager() - status = manager.get_update_status() - - # Display update status - console.print(Panel( - f"[bold]Update Status[/bold]\n\n" - f"Total services tracked: [cyan]{status['total_services']}[/cyan]\n" - f"Last update: [yellow]{status.get('last_update', 'Never')}[/yellow]\n" - f"Update frequency: {status.get('update_frequency', 'N/A')}\n", - border_style="blue", - title="SCAPO Update Tracker" - )) - - if status['recent_updates']: - console.print("\n[green]Recently Updated:[/green]") - for service in status['recent_updates'][:10]: - console.print(f" ✓ {service}") - - if status['stale_services']: - console.print("\n[yellow]Needs Update (>30 days old):[/yellow]") - for service in status['stale_services'][:10]: - console.print(f" ⚠ {service}") - - if len(status['stale_services']) > 10: - console.print(f" ... 
and {len(status['stale_services']) - 10} more") - - # Suggest next action - if status['stale_services']: - console.print(f"\n[dim]Tip: Run 'scapo scrape batch --max-services {min(3, len(status['stale_services']))}' to update stale services[/dim]") - - @scrape.command(name="all") @click.option('-l', '--limit', default=20, help='Max posts per search (default: 20)') @click.option('-c', '--category', help='Filter by category (video, audio, code, etc)') @@ -1167,8 +1130,9 @@ def search_models(query, limit): console.print("[yellow]No models directory found. Run 'sota scrape run' first.[/yellow]") return - # Search through all categories and models - for category in ["text", "image", "video", "audio", "multimodal"]: + # Search through all categories and models dynamically + categories = [d for d in os.listdir(models_dir) if os.path.isdir(os.path.join(models_dir, d))] + for category in categories: cat_dir = os.path.join(models_dir, category) if os.path.exists(cat_dir): for model in os.listdir(cat_dir):
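The dynamic category listing this patch introduces in `search_models` boils down to "every subdirectory of `models/` is a category", so new categories like `general` are picked up without code changes. A self-contained sketch of that pattern (the directory names below are illustrative):

```python
import os
import tempfile

def list_categories(models_dir: str) -> list[str]:
    # Every subdirectory of models/ counts as a category; plain files are ignored.
    return sorted(
        d for d in os.listdir(models_dir)
        if os.path.isdir(os.path.join(models_dir, d))
    )

# Demo against a throwaway layout mimicking the models/ tree.
with tempfile.TemporaryDirectory() as root:
    for cat in ("audio", "code", "general"):
        os.makedirs(os.path.join(root, cat, "some-service"))
    print(list_categories(root))  # ['audio', 'code', 'general']
```

Compared with the old hardcoded `["text", "image", "video", "audio", "multimodal"]`, the `os.path.exists(cat_dir)` check kept in the patched loop is now redundant (every name came from `os.listdir`), but it is harmless.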