feat(store): store popularity backend (GitHub stars) by jaylfc · Pull Request #936 · jaylfc/taOS

jaylfc · 2026-06-15T13:36:00Z

Adds a store popularity backend sourced from GitHub stars today and structured to accept real install telemetry later (#15).

What it does:

New module tinyagentos/store_popularity.py parses owner/repo from a catalog entry's homepage when it points at github.com, fetches the star count unauthenticated from api.github.com (allowed by the network policy), and caches it in memory with a 6-hour TTL so the Store list endpoint never hammers GitHub.
The Store catalog endpoint (GET /api/store/catalog) now carries repo, stars, and a telemetry-ready popularity object on every entry. Stars are fetched concurrently over one shared httpx client so the list stays fast.
New GET /api/store/popularity returns the popularity shape keyed by app id.

Telemetry-ready shape (forward-compatible, no breaking change when #15 lands):

{ "github_stars": int|null, "installs": int|null, "score": float }

installs is null today; score is derived from whatever signals exist (just stars now, stars plus weighted installs later).
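
As an illustration of how such a shape could be built, the sketch below derives a score from whichever signals are present. The scoring formula is an assumption: this description does not state how `score` is computed, so the log scaling and the `INSTALL_WEIGHT` constant here are hypothetical, as are the function names.

```python
import math
from typing import Optional

INSTALL_WEIGHT = 2.0  # hypothetical weight; the PR does not specify one


def compute_score(github_stars: Optional[int], installs: Optional[int]) -> float:
    """Combine the available signals; a missing signal contributes nothing."""
    score = 0.0
    if github_stars:
        score += math.log10(1 + github_stars)  # assumed log scaling
    if installs:
        score += INSTALL_WEIGHT * math.log10(1 + installs)
    return round(score, 3)


def popularity_shape(github_stars: Optional[int] = None,
                     installs: Optional[int] = None) -> dict:
    """Build the telemetry-ready object with all three keys always present."""
    return {
        "github_stars": github_stars,
        "installs": installs,
        "score": compute_score(github_stars, installs),
    }
```

Because every key is always present and `installs` simply stays `None` until telemetry exists, a consumer written against this shape today should not need to change when install counts arrive.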

Graceful degradation: any GitHub failure (rate-limit 403/429, 404, network error, bad JSON) degrades that entry to github_stars=null and never raises. Negative results are cached so a known-bad repo is not retried on every request. Entries whose homepage is not a github.com/owner/repo URL get a popularity shape with github_stars=null; no stars are fabricated. 137 of 266 catalog manifests already carry a github.com homepage, so they get real star counts; the rest need a repo homepage added to their manifest to surface stars.
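
The never-raise contract can be sketched as follows. This is an illustration under assumptions, not the module's `fetch_stars`: the `get` parameter is a hypothetical injected coroutine returning `(status_code, body_text)`, used here so the example stays independent of any HTTP client. The `stargazers_count` field and the `/repos/{owner}/{repo}` path are the ones named in this thread.

```python
import json
from typing import Awaitable, Callable, Optional


def stars_from_response(status_code: int, body: str) -> Optional[int]:
    """Return the star count, or None for any failure. Never raises."""
    if status_code != 200:
        return None  # covers 403/429 rate limits and 404
    try:
        count = json.loads(body)["stargazers_count"]
    except (ValueError, KeyError, TypeError):
        return None  # bad JSON, missing key, or a non-object payload
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        return None
    return count


async def fetch_stars_safe(
    repo: str,
    get: Callable[[str], Awaitable[tuple[int, str]]],  # hypothetical HTTP hook
) -> Optional[int]:
    """Fetch and parse; network errors degrade to None like every other failure."""
    try:
        status_code, body = await get(f"https://api.github.com/repos/{repo}")
    except Exception:
        return None
    return stars_from_response(status_code, body)
```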

Frontend: the Store frontend already reads repo and stars off the catalog response (and tolerates their absence), so no frontend change was needed.

Tests: tests/test_store_popularity.py (unit, mocked httpx) and tests/routes/test_store_popularity_route.py (endpoint) assert a github homepage gets stars plus a score, a 404 and a rate-limit both yield null without raising, caching avoids a second call within the TTL, and a non-github homepage gets null without a call. All green; existing store route tests unaffected; create_app() ok.

Summary by CodeRabbit

New Features
- Store catalog items now include GitHub star counts and popularity metrics
- Added new /api/store/popularity endpoint to query popularity information by app ID with optional type filtering
- Implemented background cache system that periodically refreshes GitHub star data for improved catalog performance

coderabbitai · 2026-06-15T13:36:07Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between 9aaa767 and c003969.

📒 Files selected for processing (5)

tests/routes/test_store_popularity_route.py
tests/test_store_popularity.py
tinyagentos/app.py
tinyagentos/routes/store.py
tinyagentos/store_popularity.py

📝 Walkthrough

Walkthrough

A new tinyagentos/store_popularity.py module is added, implementing GitHub star fetching with TTL-aware caching, a bounded async cache warmer, and atomic JSON persistence. The store catalog route is extended to surface repo, stars, and popularity fields per app, and a new /api/store/popularity endpoint is added. A background warmer task is wired into app lifespan startup. Unit and route integration tests cover all paths.

Changes

Store Popularity Feature

| Layer / File(s) | Summary |
| --- | --- |
| Core store_popularity module `tinyagentos/store_popularity.py` | Adds the full popularity module: TTL constants, in-memory star cache, `parse_repo`, `popularity_shape`/`_compute_score`, read-only cache accessors, async `fetch_stars` (never-raise, error-class TTL rules), `_is_rate_limited`, `_has_fresh_entry`, `warm_popularity_cache` with semaphore-bounded concurrency, `configure_persistence`, atomic `_load_cache`/`_persist_cache`, and `_reset_cache_for_tests`. |
| Route helpers and new endpoints `tinyagentos/routes/store.py` | Adds `_popularity_by_app_id` helper building a cached popularity map; extends `list_catalog` response items with `repo`, `stars`, and `popularity`; adds `GET /api/store/popularity` endpoint. |
| App lifespan background warmer `tinyagentos/app.py` | Inserts a background task at startup that configures persistence, derives repo keys from agent `homepage` fields, and calls `warm_popularity_cache` every 10 minutes. |
| Unit tests for store_popularity `tests/test_store_popularity.py` | Covers `parse_repo`, `fetch_stars` TTL and rate-limit rules, cache read helpers, `warm_popularity_cache` concurrency and backoff, `popularity_shape` scoring, and persistence round-trips. |
| Route integration tests `tests/routes/test_store_popularity_route.py` | Tests catalog endpoint with warm/cold cache and non-GitHub entries, live-fetch prevention via monkeypatching, and the `/api/store/popularity` endpoint response shape. |

Sequence Diagram(s)

sequenceDiagram
    participant AppLifespan as App Lifespan
    participant Warmer as warm_popularity_cache
    participant Cache as _star_cache
    participant GitHub as GitHub API

    AppLifespan->>Warmer: warm repos from agent homepages (every 10 min)
    Warmer->>Cache: check _has_fresh_entry(repo)
    alt stale or missing
        Warmer->>GitHub: GET /repos/{owner}/{repo} via fetch_stars
        GitHub-->>Warmer: stargazers_count / error
        Warmer->>Cache: write stars + TTL expiry
        Warmer->>Cache: _persist_cache() atomic write
    end

    participant Client as HTTP Client
    participant CatalogRoute as list_catalog / list_popularity
    Client->>CatalogRoute: GET /api/store/catalog
    CatalogRoute->>Cache: _popularity_by_app_id(apps) — cache-only reads
    Cache-->>CatalogRoute: {app_id: {repo, github_stars, score}}
    CatalogRoute-->>Client: items with repo, stars, popularity fields

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

jaylfc/taOS#871: Extends the frontend CatalogApp type to accept and display the repo/stars popularity fields that this PR adds to the store catalog API response schema.

…po parse

gitar-bot · 2026-06-15T14:01:56Z

+    _star_cache[repo] = (time.time() + ttl, stars)
+    _persist_cache()


💡 Performance: Full cache file rewritten + blocking I/O on every fetch_stars

fetch_stars calls _persist_cache() after every single repo lookup (store_popularity.py:213). _persist_cache serializes the entire _star_cache dict and does a synchronous write_text (store_popularity.py:303-310). During a warm pass over the ~137 GitHub homepages this rewrites the whole JSON file ~137 times, and because write_text is blocking I/O executed on the asyncio event loop, each write briefly stalls the loop (and thus any in-flight request handlers). Impact is small at current catalog size but grows with the catalog and is wasteful.

Suggested fix: persist once at the end of a warm pass instead of per fetch (e.g. call _persist_cache() from warm_popularity_cache after asyncio.gather), and/or offload the write via asyncio.to_thread so it does not block the loop.


Prevents popularity-cache corruption if the process crashes mid-write. Writes to a sibling .tmp then atomically replaces the target.

gitar-bot · 2026-06-15T14:24:59Z

Code Review 👍 Approved with suggestions 5 resolved / 6 findings

Implements a non-blocking, cache-aware GitHub popularity backend for the store catalog. One minor finding remains open: the cache file is fully rewritten with blocking I/O after every individual star fetch.

💡 Performance: Full cache file rewritten + blocking I/O on every fetch_stars

📄 tinyagentos/store_popularity.py:212-213 📄 tinyagentos/store_popularity.py:303-312 📄 tinyagentos/store_popularity.py:237-251

✅ 5 resolved

✅ Bug: Rate-limit (403/429) failures cached for full 6h TTL

📄 tinyagentos/store_popularity.py:106-119 📄 tinyagentos/store_popularity.py:87-88
In fetch_stars (tinyagentos/store_popularity.py:106-119), any non-200 response — including transient rate-limit responses (403/429) — is stored as None with the same 6-hour TTL used for successful lookups. The unauthenticated GitHub API limit is 60 requests/hour/IP. Once that limit is hit, every valid repo that returns 403 gets its star count pinned to null for 6 hours, even though the repo is fine and stars would be available again within the hour. This conflates a permanent failure (404 not-found) with a transient one (rate-limit), so a brief rate-limit window poisons the cache long-term and most catalog entries silently show no stars.

Suggested fix: distinguish transient (403/429/5xx/network) from permanent (404) failures and cache transient failures with a much shorter TTL (e.g. a few minutes) or not at all, while keeping the long TTL for 200 and 404.
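
One way to express that split is a small TTL selector keyed on the failure class. This is a sketch of the suggestion, not the code that resolved the finding. The 5-minute value for `SHORT_TTL` is an assumption standing in for "a few minutes", and treating every status other than 200 and 404 as transient is a conservative choice made here.

```python
from typing import Optional

LONG_TTL = 6 * 60 * 60  # successful lookups and permanent 404s
SHORT_TTL = 5 * 60      # assumed value for transient failures


def ttl_for(status_code: Optional[int]) -> int:
    """Pick a cache TTL. status_code=None represents a network error."""
    if status_code in (200, 404):
        return LONG_TTL
    # 403/429 rate limits, 5xx, network errors, and anything unexpected
    # are retried soon rather than pinned for hours.
    return SHORT_TTL
```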

✅ Performance: Unbounded concurrent GitHub fetches exhaust rate limit on cold cache

📄 tinyagentos/routes/store.py:38-47
_popularity_by_app_id (tinyagentos/routes/store.py:38-47) fans out one GitHub request per uncached app via asyncio.gather with no concurrency cap. On the first /api/store/catalog (or /api/store/popularity) call after startup, all uncached github.com manifests are fetched at once — the PR description states 137 entries carry a github.com homepage. Unauthenticated GitHub allows only 60 requests/hour/IP, so a single cold-cache catalog load fires ~137 simultaneous requests, immediately exhausts the limit, and (combined with the negative-caching issue above) leaves most entries with github_stars=null for hours. A burst of 137 concurrent connections is also unkind to the event loop and to GitHub.

Suggested fix: bound concurrency with an asyncio.Semaphore and consider deferring/lazy-loading stars rather than fetching every uncached repo on each list request.
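
A semaphore cap of the kind suggested could look like the sketch below. The limit of 4 and the `bounded_gather` name are assumptions; `fetch` is any coroutine function taking a repo string.

```python
import asyncio
from typing import Any, Awaitable, Callable

MAX_CONCURRENCY = 4  # assumed cap; the review does not name a value


async def bounded_gather(
    repos: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    limit: int = MAX_CONCURRENCY,
) -> list[Any]:
    """Run fetch(repo) for every repo with at most `limit` calls in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(repo: str) -> Any:
        async with sem:  # waits here once `limit` fetches are running
            return await fetch(repo)

    # Results come back in the same order as `repos`.
    return await asyncio.gather(*(one(r) for r in repos))
```

Bounding concurrency smooths the burst but does not by itself keep 137 lookups under a 60 requests/hour quota; it needs to be paired with caching or a back-off gate.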

✅ Edge Case: parse_repo false-positives on github.com subdomains/paths

📄 tinyagentos/store_popularity.py:48-56
parse_repo (tinyagentos/store_popularity.py:48-56) gates on the substring github.com and then runs re.search(r"github\.com/([^/\s]+)/([^/\s#?]+)", homepage). Hosts like docs.github.com, gist.github.com, or raw.github.com contain the substring github.com and match the regex, so e.g. https://docs.github.com/en/repositories parses to the bogus repo en/repositories. Impact is limited (the bogus repo just 404s and degrades to null, also wasting a request + a cache slot), but it is incorrect and adds needless GitHub calls. Anchor the host to github.com/ at a path boundary (e.g. require //github.com/ or ://github.com/).
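
A host-anchored variant can be sketched with `urllib.parse` instead of a substring check. This is an illustration of the fix, not the code that landed; accepting `www.github.com` and stripping a trailing `.git` are assumptions made here.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_repo(homepage: str) -> Optional[str]:
    """Return "owner/repo" only when the host itself is github.com."""
    parsed = urlparse(homepage.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host not in ("github.com", "www.github.com"):
        return None  # rejects docs.github.com, gist.github.com, and so on
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None  # a bare user or org page is not a repo
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}" if repo else None
```

Comparing the parsed hostname exactly avoids both the subdomain false positives and URLs that merely contain `github.com` in their path. Non-repo paths on github.com itself would still parse and then degrade to null on a 404.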

✅ Performance: 8s GitHub timeout can stall catalog list endpoint

📄 tinyagentos/routes/store.py:38-47 📄 tinyagentos/routes/store.py:159
The shared client in _popularity_by_app_id uses httpx.AsyncClient(timeout=8) (tinyagentos/routes/store.py:38), and list_catalog awaits _popularity_by_app_id before returning (store.py:159). Although requests run concurrently, if GitHub is slow/unreachable the catalog list response is blocked for up to ~8 seconds on a cold cache — contrary to the module docstring's claim that a slow GitHub never blocks the response for long. Consider a tighter per-request timeout (e.g. 2-3s) and/or wrapping the gather in an overall asyncio.wait_for budget so the list endpoint degrades to null stars quickly instead of hanging.
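
The overall-budget idea can be sketched as below, assuming a hypothetical `fetch` coroutine and a 2-second default budget. Note the trade-off in this simple form: `asyncio.wait_for` around `asyncio.gather` is all-or-nothing, so a timeout discards any results that had already arrived.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def stars_within_budget(
    repos: list[str],
    fetch: Callable[[str], Awaitable[Optional[int]]],
    budget_s: float = 2.0,  # assumed budget
) -> dict[str, Optional[int]]:
    """Return stars per repo, or all None if the batch exceeds the budget."""
    try:
        results = await asyncio.wait_for(
            asyncio.gather(*(fetch(r) for r in repos)),
            timeout=budget_s,
        )
    except asyncio.TimeoutError:
        # wait_for cancels the gather, which cancels the pending fetches.
        return {r: None for r in repos}
    return dict(zip(repos, results))
```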

✅ Edge Case: Cache persist is non-atomic; crash mid-write corrupts file

📄 tinyagentos/store_popularity.py:303-312 📄 tinyagentos/store_popularity.py:282-296
_persist_cache writes directly to store_popularity.json with write_text (store_popularity.py:308). If the process is killed mid-write the file is left truncated/corrupt. _load_cache swallows the JSON error and returns (store_popularity.py:285-289), so the failure mode is a silently dropped cache (full re-fetch) rather than a crash — low impact, but easily avoided. Write to a temp file and os.replace for an atomic swap.
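
The temp-file-plus-`os.replace` pattern, together with a tolerant loader, can be sketched like this. The function names are hypothetical and the cache is treated as a plain JSON object for illustration.

```python
import json
import os
from pathlib import Path


def persist_atomic(path: Path, data: dict) -> None:
    """Write to a sibling .tmp file, then swap it into place in one step."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(data))
    # os.replace overwrites the target atomically on the same filesystem,
    # so readers see either the old file or the new one, never a partial.
    os.replace(tmp, path)


def load_or_empty(path: Path) -> dict:
    """Load the cache; a missing or corrupt file yields an empty cache."""
    try:
        loaded = json.loads(path.read_text())
    except (OSError, ValueError):
        return {}
    return loaded if isinstance(loaded, dict) else {}
```

Keeping the temp file next to the target matters: `os.replace` is only atomic when both paths are on the same filesystem.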

qodo-code-review · 2026-06-15T14:44:15Z

Qodo reviews are paused for this user.

kilo-code-bot · 2026-06-15T14:46:26Z

+                        if (r := store_popularity.parse_repo(getattr(a, "homepage", "") or ""))
+                    })
+                    if repos:
+                        await store_popularity.warm_popularity_cache(repos)


WARNING: First popularity warm pass can block app startup

warm_popularity_cache awaits all uncached repos with an 8s per-request timeout. On a cold cache this runs before _startup_complete is set, so slow or unreachable GitHub can keep the server in startup/503 for minutes instead of warming in the background.

Suggested change

-                        await store_popularity.warm_popularity_cache(repos)
+                        await _asyncio.wait_for(store_popularity.warm_popularity_cache(repos), timeout=30)
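
An alternative to a timeout, sketched here under assumptions, is to schedule the warm pass as a background task so readiness never waits on GitHub at all. `start_background_warm`, `mark_ready`, and `_guarded` are hypothetical names; how the real lifespan sets `_startup_complete` is not shown in this thread.

```python
import asyncio
import logging
from typing import Awaitable, Callable

log = logging.getLogger("store_popularity_sketch")


async def _guarded(warm: Callable[[], Awaitable[None]]) -> None:
    """Run one warm pass; log failures instead of letting them escape."""
    try:
        await warm()
    except Exception:
        log.exception("popularity warm pass failed")


async def start_background_warm(
    warm: Callable[[], Awaitable[None]],
    mark_ready: Callable[[], None],
) -> asyncio.Task:
    """Schedule the warm pass, then mark startup complete immediately."""
    task = asyncio.create_task(_guarded(warm))
    mark_ready()  # readiness does not depend on the warm pass finishing
    return task   # the caller should keep a reference and cancel on shutdown
```

If the `wait_for` form from the suggestion is used instead, the timeout surfaces as an exception at the call site and would need to be caught there for startup to continue.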

kilo-code-bot · 2026-06-15T14:46:27Z

+        if time.time() < _rate_limited_until:
+            return  # a sibling fetch hit the limit; stop spending budget
+        async with sem:
+            await fetch_stars(repo, client=client)


SUGGESTION: Re-check the rate-limit gate after acquiring the semaphore

A task can pass the pre-semaphore _rate_limited_until check, wait for the semaphore, then call fetch_stars after sibling tasks have already armed the rate-limit back-off gate. Re-check inside the semaphore before fetching so the warmer stops spending GitHub budget promptly.
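
The double check can be sketched as follows. To keep the example self-contained, the gate is passed in as a small mutable dict rather than the module-level `_rate_limited_until` shown in the diff, and `fetch` is a hypothetical coroutine; both are assumptions for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def warm_one(
    repo: str,
    sem: asyncio.Semaphore,
    fetch: Callable[[str], Awaitable[None]],
    gate: dict,  # {"until": timestamp}; stand-in for _rate_limited_until
    now: Callable[[], float] = time.time,
) -> bool:
    """Fetch one repo unless the rate-limit gate is armed. Returns True if fetched."""
    if now() < gate["until"]:
        return False  # cheap early exit before queueing on the semaphore
    async with sem:
        # Re-check: a sibling may have armed the gate while this task waited.
        if now() < gate["until"]:
            return False
        await fetch(repo)
        return True
```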

kilo-code-bot · 2026-06-15T14:47:21Z

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 1 |

WARNING

| File | Line | Issue |
| --- | --- | --- |
| `tinyagentos/app.py` | 742 | First popularity warm pass can block app startup because `warm_popularity_cache` awaits all uncached repos before `_startup_complete` is set. |

SUGGESTION

| File | Line | Issue |
| --- | --- | --- |
| `tinyagentos/store_popularity.py` | 260 | Tasks can spend one extra GitHub request after the rate-limit gate is armed because the rate-limit check is not repeated after acquiring the semaphore. |

Other Observations (not in diff)

None.

Files Reviewed (5 files)

tests/routes/test_store_popularity_route.py - 0 issues
tests/test_store_popularity.py - 0 issues
tinyagentos/app.py - 1 issue
tinyagentos/routes/store.py - 0 issues
tinyagentos/store_popularity.py - 1 issue

Reviewed by nex-n2-pro:free · 425,463 tokens

feat(store): popularity backend from GitHub stars, telemetry-ready (#13)

51362c7

jaylfc added this to TinyAgentOS Roadmap Jun 15, 2026

github-project-automation Bot moved this to Todo in TinyAgentOS Roadmap Jun 15, 2026