
feat(store): store popularity backend (GitHub stars)#936

Merged
jaylfc merged 3 commits into dev from feat/store-popularity on Jun 15, 2026

@jaylfc (Owner) commented Jun 15, 2026

Adds a store popularity backend sourced from GitHub stars today and structured to accept real install telemetry later (#15).

What it does:

  • New module tinyagentos/store_popularity.py parses owner/repo from a catalog entry's homepage when it points at github.com, fetches the star count unauthenticated from api.github.com (allowed by the network policy), and caches it in-memory with a 6 hour TTL so the Store list endpoint never hammers GitHub.
  • The Store catalog endpoint (GET /api/store/catalog) now carries repo, stars, and a telemetry-ready popularity object on every entry. Stars are fetched concurrently over one shared httpx client so the list stays fast.
  • New GET /api/store/popularity returns the popularity shape keyed by app id.
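A minimal sketch of the in-memory TTL cache described above, assuming a plain dict keyed by "owner/repo" that stores (expires_at, stars) tuples, the same shape as the `_star_cache[repo] = (time.time() + ttl, stars)` line quoted later in review. The `StarCache` class and its injectable clock are hypothetical, added for illustration and testability; they are not the module's actual API.

```python
import time
from typing import Callable, Optional

STAR_TTL_SECONDS = 6 * 60 * 60  # 6 hour TTL, per the PR description


class StarCache:
    """Hypothetical in-memory cache: repo -> (expires_at, stars)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Optional[int]]] = {}

    def set(self, repo: str, stars: Optional[int], ttl: float = STAR_TTL_SECONDS) -> None:
        # stars may be None: negative results are cached too
        self._entries[repo] = (self._clock() + ttl, stars)

    def is_fresh(self, repo: str) -> bool:
        # Distinguishes "cached null" (fresh) from "missing or stale"
        entry = self._entries.get(repo)
        return entry is not None and self._clock() < entry[0]

    def get(self, repo: str) -> Optional[int]:
        # Returns cached stars only while fresh; stale or missing -> None
        if not self.is_fresh(repo):
            return None
        return self._entries[repo][1]
```

Because `get` returns None for both a cached negative result and a miss, callers that decide whether to re-fetch should ask `is_fresh` instead.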

Telemetry-ready shape (forward-compatible, no breaking change when #15 lands):

{ "github_stars": int|null, "installs": int|null, "score": float }

installs is null today; score is derived from whatever signals exist (just stars now, stars plus weighted installs later).
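The PR does not state the scoring formula, so the following is only a hedged sketch of one way to build the shape: a log-scaled score so a few very popular repos do not dwarf everything else, with installs folded in under a hypothetical `INSTALL_WEIGHT` once #15 lands. Both the log1p scaling and the weight are assumptions, not the PR's `_compute_score`.

```python
import math
from typing import Optional, TypedDict


class Popularity(TypedDict):
    github_stars: Optional[int]
    installs: Optional[int]
    score: float


INSTALL_WEIGHT = 2.0  # hypothetical weight for install telemetry


def popularity_shape(github_stars: Optional[int], installs: Optional[int] = None) -> Popularity:
    """Build the forward-compatible shape; missing signals contribute 0."""
    score = 0.0
    if github_stars is not None:
        score += math.log1p(github_stars)
    if installs is not None:
        score += INSTALL_WEIGHT * math.log1p(installs)
    return {"github_stars": github_stars, "installs": installs, "score": round(score, 4)}
```

With `installs` still None today, the score reduces to the star term; adding installs later changes the values but not the keys, which is what keeps the shape non-breaking.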

Graceful degradation: any GitHub failure (rate-limit 403/429, 404, network error, bad JSON) degrades that entry to github_stars=null and never raises. Negative results are cached so a known-bad repo is not retried on every request. Entries whose homepage is not a github.com/owner/repo URL get a popularity shape with github_stars=null; no stars are fabricated. 137 of 266 catalog manifests already carry a github.com homepage, so they get real star counts; the rest need a repo homepage added to their manifest to surface stars.
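The never-raise contract can be sketched with a (status, body) pair standing in for a real HTTP response, so the example runs without network access. `stars_from_response` and `safe_stars` are hypothetical helpers: they read only the `stargazers_count` field of GitHub's repository JSON and return None for every failure class listed above.

```python
import json
from typing import Callable, Optional


def stars_from_response(status: int, body: str) -> Optional[int]:
    """Map a GitHub /repos/{owner}/{repo} response to a star count, never raising."""
    if status != 200:
        return None  # 403/429 rate limit, 404, 5xx: degrade to null
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    stars = data.get("stargazers_count") if isinstance(data, dict) else None
    # bool is a subclass of int; reject it explicitly
    if isinstance(stars, int) and not isinstance(stars, bool):
        return stars
    return None


def safe_stars(fetch: Callable[[], tuple[int, str]]) -> Optional[int]:
    """Wrap a hypothetical fetch callable so network errors also become None."""
    try:
        status, body = fetch()
    except Exception:  # connection errors, timeouts, etc.
        return None
    return stars_from_response(status, body)
```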

Frontend: the Store frontend already reads repo and stars off the catalog response (and tolerates their absence), so no frontend change was needed.

Tests: tests/test_store_popularity.py (unit, mocked httpx) and tests/routes/test_store_popularity_route.py (endpoint) assert a github homepage gets stars plus a score, a 404 and a rate-limit both yield null without raising, caching avoids a second call within the TTL, and a non-github homepage gets null without a call. All green; existing store route tests unaffected; create_app() ok.

Summary by CodeRabbit

  • New Features
    • Store catalog items now include GitHub star counts and popularity metrics
    • Added new /api/store/popularity endpoint to query popularity information by app ID with optional type filtering
    • Implemented a background cache warmer that periodically refreshes GitHub star data so catalog requests read from cache

coderabbitai Bot commented Jun 15, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5866790c-3308-4653-aad8-b72ab74c3314

📥 Commits

Reviewing files that changed from the base of the PR and between 9aaa767 and c003969.

📒 Files selected for processing (5)
  • tests/routes/test_store_popularity_route.py
  • tests/test_store_popularity.py
  • tinyagentos/app.py
  • tinyagentos/routes/store.py
  • tinyagentos/store_popularity.py

📝 Walkthrough

A new tinyagentos/store_popularity.py module is added, implementing GitHub star fetching with TTL-aware caching, a bounded async cache warmer, and atomic JSON persistence. The store catalog route is extended to surface repo, stars, and popularity fields per app, and a new /api/store/popularity endpoint is added. A background warmer task is wired into app lifespan startup. Unit and route integration tests cover all paths.

Changes

Store Popularity Feature

Layer | File(s) | Summary
--- | --- | ---
Core store_popularity module | tinyagentos/store_popularity.py | Adds the full popularity module: TTL constants, in-memory star cache, parse_repo, popularity_shape/_compute_score, read-only cache accessors, async fetch_stars (never-raise, error-class TTL rules), _is_rate_limited, _has_fresh_entry, warm_popularity_cache with semaphore-bounded concurrency, configure_persistence, atomic _load_cache/_persist_cache, and _reset_cache_for_tests.
Route helpers and new endpoints | tinyagentos/routes/store.py | Adds _popularity_by_app_id helper building a cached popularity map; extends list_catalog response items with repo, stars, and popularity; adds GET /api/store/popularity endpoint.
App lifespan background warmer | tinyagentos/app.py | Inserts a background task at startup that configures persistence, derives repo keys from agent homepage fields, and calls warm_popularity_cache every 10 minutes.
Unit tests for store_popularity | tests/test_store_popularity.py | Covers parse_repo, fetch_stars TTL and rate-limit rules, cache read helpers, warm_popularity_cache concurrency and backoff, popularity_shape scoring, and persistence round-trips.
Route integration tests | tests/routes/test_store_popularity_route.py | Tests catalog endpoint with warm/cold cache and non-GitHub entries, live-fetch prevention via monkeypatching, and the /api/store/popularity endpoint response shape.

Sequence Diagram(s)

sequenceDiagram
    participant AppLifespan as App Lifespan
    participant Warmer as warm_popularity_cache
    participant Cache as _star_cache
    participant GitHub as GitHub API

    AppLifespan->>Warmer: warm repos from agent homepages (every 10 min)
    Warmer->>Cache: check _has_fresh_entry(repo)
    alt stale or missing
        Warmer->>GitHub: GET /repos/{owner}/{repo} via fetch_stars
        GitHub-->>Warmer: stargazers_count / error
        Warmer->>Cache: write stars + TTL expiry
        Warmer->>Cache: _persist_cache() atomic write
    end

    participant Client as HTTP Client
    participant CatalogRoute as list_catalog / list_popularity
    Client->>CatalogRoute: GET /api/store/catalog
    CatalogRoute->>Cache: _popularity_by_app_id(apps) — cache-only reads
    Cache-->>CatalogRoute: {app_id: {repo, github_stars, score}}
    CatalogRoute-->>Client: items with repo, stars, popularity fields
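The cache-only read path in the diagram can be sketched as a pure function over a snapshot of the cache: the request handler never calls GitHub, it only maps each app's repo to whatever is already cached. The app dict keys (`id`, `homepage`), the injected `repo_of` parser, and the `popularity_by_app_id` signature are assumptions for illustration; the merged `_popularity_by_app_id` helper may differ, and the score field is left out here.

```python
from typing import Callable, Iterable, Mapping, Optional


def popularity_by_app_id(
    apps: Iterable[Mapping[str, str]],
    repo_of: Callable[[str], Optional[str]],
    cached_stars: Mapping[str, Optional[int]],
) -> dict[str, dict]:
    """Build {app_id: {repo, github_stars}} from the cache only; no network."""
    out: dict[str, dict] = {}
    for app in apps:
        repo = repo_of(app.get("homepage") or "")
        # Uncached or non-GitHub entries simply get null stars
        stars = cached_stars.get(repo) if repo else None
        out[app["id"]] = {"repo": repo, "github_stars": stars}
    return out
```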

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • jaylfc/taOS#871: Extends the frontend CatalogApp type to accept and display the repo/stars popularity fields that this PR adds to the store catalog API response schema.

Poem

🐇 A bunny once counted the stars in the night,
Cached every twinkle, each GitHub delight.
With TTLs guarding each warm, precious score,
The catalog blooms with new data galore.
No live fetch on reads — just the cache's soft glow,
Stars persisted to disk so the scores always show! ✨


Comment thread tinyagentos/store_popularity.py Outdated
Comment thread tinyagentos/routes/store.py Outdated
Comment thread tinyagentos/store_popularity.py Outdated
Comment thread tinyagentos/routes/store.py Outdated
Comment on lines +212 to +213
_star_cache[repo] = (time.time() + ttl, stars)
_persist_cache()

💡 Performance: Full cache file rewritten + blocking I/O on every fetch_stars

fetch_stars calls _persist_cache() after every single repo lookup (store_popularity.py:213). _persist_cache serializes the entire _star_cache dict and does a synchronous write_text (store_popularity.py:303-310). During a warm pass over the ~137 GitHub homepages this rewrites the whole JSON file ~137 times, and because write_text is blocking I/O executed on the asyncio event loop, each write briefly stalls the loop (and thus any in-flight request handlers). Impact is small at current catalog size but grows with the catalog and is wasteful.

Suggested fix: persist once at the end of a warm pass instead of per fetch (e.g. call _persist_cache() from warm_popularity_cache after asyncio.gather), and/or offload the write via asyncio.to_thread so it does not block the loop.
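A minimal sketch of the suggested fix, assuming the per-fetch `_persist_cache()` call is removed from `fetch_stars`: the warm pass gathers all fetches first, then writes the cache once on a worker thread via `asyncio.to_thread` (Python 3.9+). `warm_once`, `fetch`, and `persist` are hypothetical stand-ins, and `fetch` is assumed never to raise, matching the never-raise contract of `fetch_stars`.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def warm_once(
    repos: Iterable[str],
    fetch: Callable[[str], Awaitable[None]],
    persist: Callable[[], None],
) -> None:
    """Fetch every repo, then persist the cache exactly once."""
    repo_list = list(repos)
    if not repo_list:
        return
    await asyncio.gather(*(fetch(r) for r in repo_list))
    # Blocking file I/O runs on a worker thread, off the event loop
    await asyncio.to_thread(persist)
```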

Comment thread tinyagentos/store_popularity.py
Prevents popularity-cache corruption if the process crashes mid-write.
Writes to a sibling .tmp then atomically replaces the target.
gitar-bot Bot commented Jun 15, 2026

Code Review 👍 Approved with suggestions (5 of 6 findings resolved)

Implements a non-blocking, cache-aware GitHub popularity backend for the store catalog. Address the minor issue where the cache file is rewritten and blocks I/O on every individual star fetch to improve performance.

💡 Performance: Full cache file rewritten + blocking I/O on every fetch_stars

📄 tinyagentos/store_popularity.py:212-213 📄 tinyagentos/store_popularity.py:303-312 📄 tinyagentos/store_popularity.py:237-251

✅ 5 resolved
Bug: Rate-limit (403/429) failures cached for full 6h TTL

📄 tinyagentos/store_popularity.py:106-119 📄 tinyagentos/store_popularity.py:87-88
In fetch_stars (tinyagentos/store_popularity.py:106-119), any non-200 response — including transient rate-limit responses (403/429) — is stored as None with the same 6-hour TTL used for successful lookups. The unauthenticated GitHub API limit is 60 requests/hour/IP. Once that limit is hit, every valid repo that returns 403 gets its star count pinned to null for 6 hours, even though the repo is fine and stars would be available again within the hour. This conflates a permanent failure (404 not-found) with a transient one (rate-limit), so a brief rate-limit window poisons the cache long-term and most catalog entries silently show no stars.

Suggested fix: distinguish transient (403/429/5xx/network) from permanent (404) failures and cache transient failures with a much shorter TTL (e.g. a few minutes) or not at all, while keeping the long TTL for 200 and 404.
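The suggested split might look like the following, where 200 and 404 keep the long TTL and every transient failure gets a short one. The 6-hour value comes from the PR; the 5-minute short TTL and the `ttl_for` name are assumptions for illustration.

```python
from typing import Optional

LONG_TTL = 6 * 60 * 60   # success, or a permanent miss (404)
SHORT_TTL = 5 * 60       # hypothetical: transient failure, retry soon


def ttl_for(status: Optional[int]) -> int:
    """Pick a cache TTL; status None means a network error (no response)."""
    if status in (200, 404):
        return LONG_TTL
    # 403/429 rate limits, 5xx, network errors, anything unexpected
    return SHORT_TTL
```

Defaulting unknown statuses to the short TTL errs toward retrying, which costs a request but never pins a healthy repo to null for hours.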

Performance: Unbounded concurrent GitHub fetches exhaust rate limit on cold cache

📄 tinyagentos/routes/store.py:38-47
_popularity_by_app_id (tinyagentos/routes/store.py:38-47) fans out one GitHub request per uncached app via asyncio.gather with no concurrency cap. On the first /api/store/catalog (or /api/store/popularity) call after startup, all uncached github.com manifests are fetched at once — the PR description states 137 entries carry a github.com homepage. Unauthenticated GitHub allows only 60 requests/hour/IP, so a single cold-cache catalog load fires ~137 simultaneous requests, immediately exhausts the limit, and (combined with the negative-caching issue above) leaves most entries with github_stars=null for hours. A burst of 137 concurrent connections is also unkind to the event loop and to GitHub.

Suggested fix: bound concurrency with an asyncio.Semaphore and consider deferring/lazy-loading stars rather than fetching every uncached repo on each list request.
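Bounding the fan-out with `asyncio.Semaphore` can be sketched as below. The default limit of 4 and the `fetch_all_bounded` name are assumptions; the test tracks the peak number of in-flight calls to show the cap holds.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Optional


async def fetch_all_bounded(
    repos: Iterable[str],
    fetch: Callable[[str], Awaitable[Optional[int]]],
    limit: int = 4,
) -> dict[str, Optional[int]]:
    """Run fetch(repo) for every repo with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(repo: str) -> tuple[str, Optional[int]]:
        async with sem:  # waits here once `limit` fetches are running
            return repo, await fetch(repo)

    results = await asyncio.gather(*(one(r) for r in repos))
    return dict(results)
```

A semaphore caps simultaneous connections but not the total request count, so on its own it does not keep a cold cache under the 60 requests/hour budget; that is why the review also suggests deferring fetches out of the list request.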

Edge Case: parse_repo false-positives on github.com subdomains/paths

📄 tinyagentos/store_popularity.py:48-56
parse_repo (tinyagentos/store_popularity.py:48-56) gates on the substring github.com and then runs re.search(r"github\.com/([^/\s]+)/([^/\s#?]+)", homepage). Hosts like docs.github.com, gist.github.com, or raw.github.com contain the substring github.com and match the regex, so e.g. https://docs.github.com/en/repositories parses to the bogus repo en/repositories. Impact is limited (the bogus repo just 404s and degrades to null, also wasting a request + a cache slot), but it is incorrect and adds needless GitHub calls. Anchor the host to github.com/ at a path boundary (e.g. require //github.com/ or ://github.com/).
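One way to anchor the host, sketched with `urllib.parse` instead of a substring check: only an exact `github.com` (or `www.github.com`) hostname is accepted, so `docs.github.com` and `gist.github.com` fall through. `parse_github_repo` is a hypothetical name, and accepting `www.` and stripping a trailing `.git` are assumptions beyond what the review asks for.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_github_repo(homepage: str) -> Optional[str]:
    """Return "owner/repo" for a github.com repo URL, else None."""
    if not homepage:
        return None
    parsed = urlparse(homepage.strip())
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None  # rejects docs./gist./raw. subdomains and other hosts
    # Query and fragment are already split off by urlparse
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}" if owner and repo else None
```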

Performance: 8s GitHub timeout can stall catalog list endpoint

📄 tinyagentos/routes/store.py:38-47 📄 tinyagentos/routes/store.py:159
The shared client in _popularity_by_app_id uses httpx.AsyncClient(timeout=8) (tinyagentos/routes/store.py:38), and list_catalog awaits _popularity_by_app_id before returning (store.py:159). Although requests run concurrently, if GitHub is slow/unreachable the catalog list response is blocked for up to ~8 seconds on a cold cache — contrary to the module docstring's claim that a slow GitHub never blocks the response for long. Consider a tighter per-request timeout (e.g. 2-3s) and/or wrapping the gather in an overall asyncio.wait_for budget so the list endpoint degrades to null stars quickly instead of hanging.
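An overall time budget can be sketched with `asyncio.wait_for`: if the whole gather exceeds the budget, the handler falls back to null stars instead of waiting. The 2-second default and the `stars_within_budget` name are assumptions; a real handler might keep any already-cached values rather than nulling everything.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def stars_within_budget(
    repos: list[str],
    fetch: Callable[[str], Awaitable[Optional[int]]],
    budget: float = 2.0,
) -> dict[str, Optional[int]]:
    """Return fetched stars, or all-None if the overall budget is exceeded."""

    async def gather_all() -> dict[str, Optional[int]]:
        values = await asyncio.gather(*(fetch(r) for r in repos))
        return dict(zip(repos, values))

    try:
        return await asyncio.wait_for(gather_all(), timeout=budget)
    except asyncio.TimeoutError:
        # wait_for cancels the pending gather; degrade to null stars
        return {r: None for r in repos}
```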

Edge Case: Cache persist is non-atomic; crash mid-write corrupts file

📄 tinyagentos/store_popularity.py:303-312 📄 tinyagentos/store_popularity.py:282-296
_persist_cache writes directly to store_popularity.json with write_text (store_popularity.py:308). If the process is killed mid-write the file is left truncated/corrupt. _load_cache swallows the JSON error and returns (store_popularity.py:285-289), so the failure mode is a silently dropped cache (full re-fetch) rather than a crash — low impact, but easily avoided. Write to a temp file and os.replace for an atomic swap.
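The temp-file-then-`os.replace` pattern can be sketched as follows, mirroring the follow-up commit's description of writing a sibling `.tmp`. Keeping the temp file in the same directory keeps the rename on one filesystem, which is what lets `os.replace` be atomic on POSIX. `write_json_atomic` is a hypothetical name, not the module's `_persist_cache`.

```python
import json
import os
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, data: Any) -> None:
    """Serialize data to a sibling .tmp file, then atomically swap it in."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
    # Readers see either the old file or the new one, never a truncated mix
    os.replace(tmp, path)
```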

@jaylfc jaylfc marked this pull request as ready for review June 15, 2026 14:40
@jaylfc jaylfc merged commit 51eb7b7 into dev Jun 15, 2026
7 of 8 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in TinyAgentOS Roadmap Jun 15, 2026
@qodo-code-review

Qodo reviews are paused for this user.

Comment thread tinyagentos/app.py
if (r := store_popularity.parse_repo(getattr(a, "homepage", "") or ""))
})
if repos:
await store_popularity.warm_popularity_cache(repos)

WARNING: First popularity warm pass can block app startup

warm_popularity_cache awaits all uncached repos with an 8s per-request timeout. On a cold cache this runs before _startup_complete is set, so slow or unreachable GitHub can keep the server in startup/503 for minutes instead of warming in the background.

Suggested change:
- await store_popularity.warm_popularity_cache(repos)
+ await _asyncio.wait_for(store_popularity.warm_popularity_cache(repos), timeout=30)
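Besides capping the first pass with `wait_for`, another option is to start the warmer as a background task so startup never awaits GitHub at all. A sketch, assuming the 10-minute interval from the walkthrough and hypothetical `periodic_warmer` / `start_warmer` names; the caller keeps the task handle and cancels it on shutdown.

```python
import asyncio
from typing import Awaitable, Callable

WARM_INTERVAL_SECONDS = 10 * 60  # every 10 minutes, per the walkthrough


async def periodic_warmer(
    warm: Callable[[], Awaitable[None]],
    interval: float = WARM_INTERVAL_SECONDS,
) -> None:
    """Run warm() forever, sleeping between passes; one failed pass never stops the loop."""
    while True:
        try:
            await warm()
        except asyncio.CancelledError:
            raise  # let shutdown cancel the task
        except Exception:
            pass  # log in real code; the next tick retries
        await asyncio.sleep(interval)


def start_warmer(warm: Callable[[], Awaitable[None]], interval: float = WARM_INTERVAL_SECONDS) -> asyncio.Task:
    # Must be called from inside a running event loop (e.g. the app lifespan)
    return asyncio.create_task(periodic_warmer(warm, interval))
```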

if time.time() < _rate_limited_until:
    return  # a sibling fetch hit the limit; stop spending budget
async with sem:
    await fetch_stars(repo, client=client)

SUGGESTION: Re-check the rate-limit gate after acquiring the semaphore

A task can pass the pre-semaphore _rate_limited_until check, wait for the semaphore, then call fetch_stars after sibling tasks have already armed the rate-limit back-off gate. Re-check inside the semaphore before fetching so the warmer stops spending GitHub budget promptly.
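The suggested re-check can be sketched as a double-checked gate: test before waiting on the semaphore (cheap early exit), then again after acquiring it, because a sibling may have armed the back-off while this task was queued. `RateGate` and `warm_one` are hypothetical; the gate is a deadline timestamp in the style of `_rate_limited_until`.

```python
import asyncio
import time
from typing import Awaitable, Callable


class RateGate:
    """Hypothetical back-off gate: closed until `until` (epoch seconds)."""

    def __init__(self) -> None:
        self.until = 0.0

    def closed(self) -> bool:
        return time.time() < self.until

    def arm(self, seconds: float) -> None:
        self.until = time.time() + seconds


async def warm_one(
    repo: str,
    sem: asyncio.Semaphore,
    gate: RateGate,
    fetch: Callable[[str], Awaitable[None]],
) -> bool:
    """Return True if a fetch was issued."""
    if gate.closed():
        return False  # early exit without queueing on the semaphore
    async with sem:
        if gate.closed():
            return False  # a sibling hit the limit while we waited
        await fetch(repo)
        return True
```

In the test, the first fetch simulates a rate-limit response by arming the gate; the four queued tasks then skip their requests instead of each spending one.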

kilo-code-bot Bot commented Jun 15, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity | Count
--- | ---
CRITICAL | 0
WARNING | 1
SUGGESTION | 1

Issue Details

CRITICAL: none

WARNING

File | Line | Issue
--- | --- | ---
tinyagentos/app.py | 742 | First popularity warm pass can block app startup because warm_popularity_cache awaits all uncached repos before _startup_complete is set.

SUGGESTION

File | Line | Issue
--- | --- | ---
tinyagentos/store_popularity.py | 260 | Tasks can spend one extra GitHub request after the rate-limit gate is armed because the rate-limit check is not repeated after acquiring the semaphore.
Other Observations (not in diff)

None.

Files Reviewed (5 files)
  • tests/routes/test_store_popularity_route.py - 0 issues
  • tests/test_store_popularity.py - 0 issues
  • tinyagentos/app.py - 1 issue
  • tinyagentos/routes/store.py - 0 issues
  • tinyagentos/store_popularity.py - 1 issue

Reviewed by nex-n2-pro:free · 425,463 tokens