rename latest_revision with get_max_revision by henry0816191 · Pull Request #27 · cppalliance/paperscout

henry0816191 · 2026-05-13T16:28:26Z

Summary

src/paperscout/sources.py

WG21Index docstring — Includes the issue’s exact sentence (“This class is designed for single-threaded async use. Do not access from the Bolt daemon thread.”) plus the MessageQueue / sync-worker note and refresh() / papers / _max_rev race warning.
get_max_revision(paper_number) — Renamed from latest_revision; same behavior and docstring clarified for None / sentinel -1.
get_known_paper_ids() — Returns frozenset(self.papers).
get_papers_snapshot() — Returns MappingProxyType(dict(self.papers)) as Mapping[str, Paper].
Imports: Mapping from collections.abc, MappingProxyType from types.
ISOProber._hot_numbers — Uses get_papers_snapshot().values() instead of papers.values().
ISOProber._build_hot_list / _build_cold_slice — Call get_max_revision instead of latest_revision.
ISOProber._probe_one — Uses paper_id in self.index.get_known_paper_ids() instead of self.index.papers.

There are no remaining self.index.papers or self.index._max_rev references on ISOProber.

tests/test_sources.py

Renamed test_get_max_revision_* and updated assertions.
test_get_papers_snapshot_and_known_ids_are_independent_snapshots — Uses index.papers = index._parse_and_index(...) (matching how _parse_and_index actually wires papers), checks proxy identity, TypeError on mutation, frozenset / AttributeError on .add, then clears live papers and asserts the earlier snapshot still holds P1000R0.
Comment updates in the ISO prober list tests for get_max_revision wording.

Tests

tests/test_sources.py: 78 passed, 2 skipped
Full suite: 301 passed, 2 skipped

Related issues

close #26

Summary by CodeRabbit

New Features
- Read-only snapshot accessors providing immutable views of index content for safer reads.
Behavior Changes / Bug Fixes
- Probing now uses precomputed snapshots to avoid racey reads; hot/cold list logic clarified (unknown revisions treated as starting at R0; some cold-known items are silently skipped).
Tests
- Test suite updated to verify snapshot immutability and expected probing behavior.
Breaking Changes
- Index query API renamed; callers must adapt to the new method name.

coderabbitai · 2026-05-13T16:28:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b7ea893d-c9e0-43a2-8170-13af1792c64d

📥 Commits

Reviewing files that changed from the base of the PR and between 497739e and 79fdb4f.

📒 Files selected for processing (1)

src/paperscout/sources.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/paperscout/sources.py

📝 Walkthrough

Walkthrough

WG21Index now exposes public read-only snapshot methods (get_max_revision(), get_known_paper_ids(), get_papers_snapshot()) with documented thread-safety constraints. ISOProber probe logic is refactored to use these public APIs instead of accessing private _max_rev and papers fields, eliminating the encapsulation breach. Tests verify snapshot independence and immutability.

Changes

WG21Index Encapsulation and Thread-Safety

Layer / File(s)	Summary
WG21Index API and thread-safety contract `src/paperscout/sources.py`	Imports for `Mapping` and `MappingProxyType` are added. Class docstring documents single-threaded async usage and prohibits cross-thread access/mutation. Methods `get_max_revision(paper_number)`, `get_known_paper_ids()`, and `get_papers_snapshot()` provide read-only views: `get_max_revision()` returns the highest revision or `None`; `get_known_paper_ids()` returns an immutable `frozenset`; `get_papers_snapshot()` returns a `MappingProxyType` read-only snapshot.
ISOProber probe logic using public API `src/paperscout/sources.py`	`run_cycle()` precomputes `known_ids` via `index.get_known_paper_ids()` and passes it into `_probe_one()`. `_hot_numbers()` iterates `index.get_papers_snapshot().values()` instead of `index.papers.values()`. `_build_hot_list()` and `_build_cold_slice()` call `index.get_max_revision(num)` instead of `index.latest_revision(num)`. `_probe_one()` uses the provided `known_ids` for membership checks.
Test coverage for snapshot API and semantic changes `tests/test_sources.py`	New/updated tests assert `get_max_revision` semantics for known/unknown numbers, verify `get_papers_snapshot()` returns an independent `MappingProxyType` snapshot that persists after `index.papers` reassignment, and verify `get_known_paper_ids()` returns an immutable `frozenset`. Cold-slice tests assert known papers with `get_max_revision() is None` are skipped.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cppalliance/paperscout#5: Both PRs modify src/paperscout/sources.py's WG21 index API and probing flow, with this PR introducing snapshot/accessor methods used by the probe logic to eliminate private field access.

Suggested labels

bug

Suggested reviewers

wpak-ai

Poem

🐰 I hopped through code both old and new,
Froze the snapshots, made them true,
Prober now knocks at a proper door,
No more races on the shared floor,
A tidy patch — a carrot or two!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and accurately describes the main change: renaming the latest_revision method to get_max_revision, which is the primary refactoring in this PR.
Description check	✅ Passed	The PR description comprehensively covers the main changes, includes test results, and links to related issue `#26`. However, the description lacks the Test plan section from the template.
Linked Issues check	✅ Passed	All acceptance criteria from issue `#26` are met: WG21Index exposes public APIs (get_max_revision, get_known_paper_ids, get_papers_snapshot), ISOProber uses these APIs instead of accessing internals, thread-safety docstring added, public APIs return frozen views/copies, existing tests pass (78 passed), and new snapshot independence test added.
Out of Scope Changes check	✅ Passed	All changes directly support the stated objective of fixing the ISOProber encapsulation breach. The API additions, method renames, and test updates are all in scope and aligned with issue `#26` requirements.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bugfix/ISOProber-Encapsulation-Breach

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/paperscout/sources.py (1)

296-300: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid rebuilding the full known-id snapshot for every probe.

get_known_paper_ids() allocates a new frozenset on every _probe_one() call. Because run_cycle() schedules one probe per URL, this turns a cheap membership check into repeated O(len(index)) work on the hottest path.

💡 Suggested change

 async def run_cycle(self) -> list[ProbeHit]:
     """HEAD all scheduled URLs; return recent hits and persist discovery state."""
     self._cycle += 1
     self._stats = dict(self._STATS_TEMPLATE)
     t0 = time.monotonic()
 
     urls = self._build_probe_list()
+    known_paper_ids = self.index.get_known_paper_ids()
     hot_count = sum(1 for u in urls if u[1] in (Tier.WATCHLIST, Tier.FRONTIER, Tier.RECENT))
     cold_count = sum(1 for u in urls if u[1] == Tier.COLD)
     slice_idx = (self._cycle - 1) % self.cfg.cold_cycle_divisor
     ...
         tasks = [
-            self._probe_one(client, sem, url, prefix, num, rev, ext, tier)
+            self._probe_one(client, sem, url, prefix, num, rev, ext, tier, known_paper_ids)
             for url, tier, prefix, num, rev, ext in urls
         ]

     async def _probe_one(
         self,
         client: httpx.AsyncClient,
         sem: asyncio.Semaphore,
         url: str,
         prefix: str,
         num: int,
         rev: int,
         ext: str,
         tier: Tier,
+        known_paper_ids: frozenset[str],
     ) -> ProbeHit | None:
         ...
         paper_id = f"{prefix}{num:04d}R{rev}"
-        if paper_id in self.index.get_known_paper_ids():
+        if paper_id in known_paper_ids:
             log.debug("SKIP  idx   %s", paper_id)
             self._stats["skipped_in_index"] += 1
             return None

Also applies to: 476-476

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/sources.py` around lines 296 - 300, The code rebuilds the full
frozenset from get_known_paper_ids() inside each _probe_one() call, causing an
expensive allocation per probe; compute known_ids once in run_cycle() before
creating tasks (e.g., known_ids = self.get_known_paper_ids()), change
_probe_one() to accept a known_ids parameter (or read a cached attribute) and
use that for membership checks, update the tasks comprehension to pass known_ids
into self._probe_one(..., known_ids), and apply the same change to the other
call site referenced around line 476.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/paperscout/sources.py`:
- Around line 296-300: The code rebuilds the full frozenset from
get_known_paper_ids() inside each _probe_one() call, causing an expensive
allocation per probe; compute known_ids once in run_cycle() before creating
tasks (e.g., known_ids = self.get_known_paper_ids()), change _probe_one() to
accept a known_ids parameter (or read a cached attribute) and use that for
membership checks, update the tasks comprehension to pass known_ids into
self._probe_one(..., known_ids), and apply the same change to the other call
site referenced around line 476.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f3c09e4-33ea-4f58-b40b-ddb6d0ef341c

📥 Commits

Reviewing files that changed from the base of the PR and between 5c59108 and 497739e.

📒 Files selected for processing (2)

src/paperscout/sources.py
tests/test_sources.py

rename latest_revision with get_max_revision

497739e

henry0816191 self-assigned this May 13, 2026

henry0816191 requested a review from wpak-ai as a code owner May 13, 2026 16:28

coderabbitai Bot reviewed May 13, 2026

View reviewed changes

addressed ai reviews

79fdb4f

wpak-ai approved these changes May 13, 2026

View reviewed changes

wpak-ai merged commit 87a5475 into develop May 13, 2026
7 checks passed

wpak-ai deleted the bugfix/ISOProber-Encapsulation-Breach branch May 13, 2026 19:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rename latest_revision with get_max_revision#27

rename latest_revision with get_max_revision#27
wpak-ai merged 2 commits into
developfrom
bugfix/ISOProber-Encapsulation-Breach

henry0816191 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

henry0816191 commented May 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

src/paperscout/sources.py

tests/test_sources.py

Tests

Related issues

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

henry0816191 commented May 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 13, 2026 •

edited

Loading