rename latest_revision with get_max_revision#27
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthrough
ChangesWG21Index Encapsulation and Thread-Safety
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/paperscout/sources.py (1)
296-300:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid rebuilding the full known-id snapshot for every probe.
get_known_paper_ids()allocates a newfrozenseton every_probe_one()call. Becauserun_cycle()schedules one probe per URL, this turns a cheap membership check into repeated O(len(index)) work on the hottest path.💡 Suggested change
async def run_cycle(self) -> list[ProbeHit]: """HEAD all scheduled URLs; return recent hits and persist discovery state.""" self._cycle += 1 self._stats = dict(self._STATS_TEMPLATE) t0 = time.monotonic() urls = self._build_probe_list() + known_paper_ids = self.index.get_known_paper_ids() hot_count = sum(1 for u in urls if u[1] in (Tier.WATCHLIST, Tier.FRONTIER, Tier.RECENT)) cold_count = sum(1 for u in urls if u[1] == Tier.COLD) slice_idx = (self._cycle - 1) % self.cfg.cold_cycle_divisor ... tasks = [ - self._probe_one(client, sem, url, prefix, num, rev, ext, tier) + self._probe_one(client, sem, url, prefix, num, rev, ext, tier, known_paper_ids) for url, tier, prefix, num, rev, ext in urls ]async def _probe_one( self, client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str, prefix: str, num: int, rev: int, ext: str, tier: Tier, + known_paper_ids: frozenset[str], ) -> ProbeHit | None: ... paper_id = f"{prefix}{num:04d}R{rev}" - if paper_id in self.index.get_known_paper_ids(): + if paper_id in known_paper_ids: log.debug("SKIP idx %s", paper_id) self._stats["skipped_in_index"] += 1 return NoneAlso applies to: 476-476
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/paperscout/sources.py` around lines 296 - 300, The code rebuilds the full frozenset from get_known_paper_ids() inside each _probe_one() call, causing an expensive allocation per probe; compute known_ids once in run_cycle() before creating tasks (e.g., known_ids = self.get_known_paper_ids()), change _probe_one() to accept a known_ids parameter (or read a cached attribute) and use that for membership checks, update the tasks comprehension to pass known_ids into self._probe_one(..., known_ids), and apply the same change to the other call site referenced around line 476.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/paperscout/sources.py`:
- Around line 296-300: The code rebuilds the full frozenset from
get_known_paper_ids() inside each _probe_one() call, causing an expensive
allocation per probe; compute known_ids once in run_cycle() before creating
tasks (e.g., known_ids = self.get_known_paper_ids()), change _probe_one() to
accept a known_ids parameter (or read a cached attribute) and use that for
membership checks, update the tasks comprehension to pass known_ids into
self._probe_one(..., known_ids), and apply the same change to the other call
site referenced around line 476.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0f3c09e4-33ea-4f58-b40b-ddb6d0ef341c
📒 Files selected for processing (2)
src/paperscout/sources.pytests/test_sources.py
Summary
src/paperscout/sources.py
There are no remaining self.index.papers or self.index._max_rev references on ISOProber.
tests/test_sources.py
Tests
Related issues
close #26
Summary by CodeRabbit
New Features
Behavior Changes / Bug Fixes
Tests
Breaking Changes