Skip to content

rename latest_revision with get_max_revision#27

Merged
wpak-ai merged 2 commits into
developfrom
bugfix/ISOProber-Encapsulation-Breach
May 13, 2026
Merged

rename latest_revision with get_max_revision#27
wpak-ai merged 2 commits into
developfrom
bugfix/ISOProber-Encapsulation-Breach

Conversation

@henry0816191
Copy link
Copy Markdown
Collaborator

@henry0816191 henry0816191 commented May 13, 2026

Summary

src/paperscout/sources.py

  • WG21Index docstring — Includes the issue’s exact sentence (“This class is designed for single-threaded async use. Do not access from the Bolt daemon thread.”) plus the MessageQueue / sync-worker note and refresh() / papers / _max_rev race warning.
  • get_max_revision(paper_number) — Renamed from latest_revision; same behavior and docstring clarified for None / sentinel -1.
  • get_known_paper_ids() — Returns frozenset(self.papers).
  • get_papers_snapshot() — Returns MappingProxyType(dict(self.papers)) as Mapping[str, Paper].
  • Imports: Mapping from collections.abc, MappingProxyType from types.
  • ISOProber._hot_numbers — Uses get_papers_snapshot().values() instead of papers.values().
  • ISOProber._build_hot_list / _build_cold_slice — Call get_max_revision instead of latest_revision.
  • ISOProber._probe_one — Uses paper_id in self.index.get_known_paper_ids() instead of self.index.papers.

There are no remaining self.index.papers or self.index._max_rev references on ISOProber.

tests/test_sources.py

  • Renamed test_get_max_revision_* and updated assertions.
  • test_get_papers_snapshot_and_known_ids_are_independent_snapshots — Uses index.papers = index._parse_and_index(...) (matching how _parse_and_index actually wires papers), checks proxy identity, TypeError on mutation, frozenset / AttributeError on .add, then clears live papers and asserts the earlier snapshot still holds P1000R0.
  • Comment updates in the ISO prober list tests for get_max_revision wording.

Tests

  • tests/test_sources.py: 78 passed, 2 skipped
  • Full suite: 301 passed, 2 skipped

Related issues

close #26

Summary by CodeRabbit

  • New Features

    • Read-only snapshot accessors providing immutable views of index content for safer reads.
  • Behavior Changes / Bug Fixes

    • Probing now uses precomputed snapshots to avoid racey reads; hot/cold list logic clarified (unknown revisions treated as starting at R0; some cold-known items are silently skipped).
  • Tests

    • Test suite updated to verify snapshot immutability and expected probing behavior.
  • Breaking Changes

    • Index query API renamed; callers must adapt to the new method name.

Review Change Stack

@henry0816191 henry0816191 self-assigned this May 13, 2026
@henry0816191 henry0816191 requested a review from wpak-ai as a code owner May 13, 2026 16:28
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b7ea893d-c9e0-43a2-8170-13af1792c64d

📥 Commits

Reviewing files that changed from the base of the PR and between 497739e and 79fdb4f.

📒 Files selected for processing (1)
  • src/paperscout/sources.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/paperscout/sources.py

📝 Walkthrough

Walkthrough

WG21Index now exposes public read-only snapshot methods (get_max_revision(), get_known_paper_ids(), get_papers_snapshot()) with documented thread-safety constraints. ISOProber probe logic is refactored to use these public APIs instead of accessing private _max_rev and papers fields, eliminating the encapsulation breach. Tests verify snapshot independence and immutability.

Changes

WG21Index Encapsulation and Thread-Safety

Layer / File(s) Summary
WG21Index API and thread-safety contract
src/paperscout/sources.py
Imports for Mapping and MappingProxyType are added. Class docstring documents single-threaded async usage and prohibits cross-thread access/mutation. Methods get_max_revision(paper_number), get_known_paper_ids(), and get_papers_snapshot() provide read-only views: get_max_revision() returns the highest revision or None; get_known_paper_ids() returns an immutable frozenset; get_papers_snapshot() returns a MappingProxyType read-only snapshot.
ISOProber probe logic using public API
src/paperscout/sources.py
run_cycle() precomputes known_ids via index.get_known_paper_ids() and passes it into _probe_one(). _hot_numbers() iterates index.get_papers_snapshot().values() instead of index.papers.values(). _build_hot_list() and _build_cold_slice() call index.get_max_revision(num) instead of index.latest_revision(num). _probe_one() uses the provided known_ids for membership checks.
Test coverage for snapshot API and semantic changes
tests/test_sources.py
New/updated tests assert get_max_revision semantics for known/unknown numbers, verify get_papers_snapshot() returns an independent MappingProxyType snapshot that persists after index.papers reassignment, and verify get_known_paper_ids() returns an immutable frozenset. Cold-slice tests assert known papers with get_max_revision() is None are skipped.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cppalliance/paperscout#5: Both PRs modify src/paperscout/sources.py's WG21 index API and probing flow, with this PR introducing snapshot/accessor methods used by the probe logic to eliminate private field access.

Suggested labels

bug

Suggested reviewers

  • wpak-ai

Poem

🐰 I hopped through code both old and new,
Froze the snapshots, made them true,
Prober now knocks at a proper door,
No more races on the shared floor,
A tidy patch — a carrot or two!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately describes the main change: renaming the latest_revision method to get_max_revision, which is the primary refactoring in this PR.
Description check ✅ Passed The PR description comprehensively covers the main changes, includes test results, and links to related issue #26. However, the description lacks the Test plan section from the template.
Linked Issues check ✅ Passed All acceptance criteria from issue #26 are met: WG21Index exposes public APIs (get_max_revision, get_known_paper_ids, get_papers_snapshot), ISOProber uses these APIs instead of accessing internals, thread-safety docstring added, public APIs return frozen views/copies, existing tests pass (78 passed), and new snapshot independence test added.
Out of Scope Changes check ✅ Passed All changes directly support the stated objective of fixing the ISOProber encapsulation breach. The API additions, method renames, and test updates are all in scope and aligned with issue #26 requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/ISOProber-Encapsulation-Breach

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/paperscout/sources.py (1)

296-300: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid rebuilding the full known-id snapshot for every probe.

get_known_paper_ids() allocates a new frozenset on every _probe_one() call. Because run_cycle() schedules one probe per URL, this turns a cheap membership check into repeated O(len(index)) work on the hottest path.

💡 Suggested change
 async def run_cycle(self) -> list[ProbeHit]:
     """HEAD all scheduled URLs; return recent hits and persist discovery state."""
     self._cycle += 1
     self._stats = dict(self._STATS_TEMPLATE)
     t0 = time.monotonic()
 
     urls = self._build_probe_list()
+    known_paper_ids = self.index.get_known_paper_ids()
     hot_count = sum(1 for u in urls if u[1] in (Tier.WATCHLIST, Tier.FRONTIER, Tier.RECENT))
     cold_count = sum(1 for u in urls if u[1] == Tier.COLD)
     slice_idx = (self._cycle - 1) % self.cfg.cold_cycle_divisor
     ...
         tasks = [
-            self._probe_one(client, sem, url, prefix, num, rev, ext, tier)
+            self._probe_one(client, sem, url, prefix, num, rev, ext, tier, known_paper_ids)
             for url, tier, prefix, num, rev, ext in urls
         ]
     async def _probe_one(
         self,
         client: httpx.AsyncClient,
         sem: asyncio.Semaphore,
         url: str,
         prefix: str,
         num: int,
         rev: int,
         ext: str,
         tier: Tier,
+        known_paper_ids: frozenset[str],
     ) -> ProbeHit | None:
         ...
         paper_id = f"{prefix}{num:04d}R{rev}"
-        if paper_id in self.index.get_known_paper_ids():
+        if paper_id in known_paper_ids:
             log.debug("SKIP  idx   %s", paper_id)
             self._stats["skipped_in_index"] += 1
             return None

Also applies to: 476-476

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/paperscout/sources.py` around lines 296 - 300, The code rebuilds the full
frozenset from get_known_paper_ids() inside each _probe_one() call, causing an
expensive allocation per probe; compute known_ids once in run_cycle() before
creating tasks (e.g., known_ids = self.get_known_paper_ids()), change
_probe_one() to accept a known_ids parameter (or read a cached attribute) and
use that for membership checks, update the tasks comprehension to pass known_ids
into self._probe_one(..., known_ids), and apply the same change to the other
call site referenced around line 476.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/paperscout/sources.py`:
- Around line 296-300: The code rebuilds the full frozenset from
get_known_paper_ids() inside each _probe_one() call, causing an expensive
allocation per probe; compute known_ids once in run_cycle() before creating
tasks (e.g., known_ids = self.get_known_paper_ids()), change _probe_one() to
accept a known_ids parameter (or read a cached attribute) and use that for
membership checks, update the tasks comprehension to pass known_ids into
self._probe_one(..., known_ids), and apply the same change to the other call
site referenced around line 476.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0f3c09e4-33ea-4f58-b40b-ddb6d0ef341c

📥 Commits

Reviewing files that changed from the base of the PR and between 5c59108 and 497739e.

📒 Files selected for processing (2)
  • src/paperscout/sources.py
  • tests/test_sources.py

@wpak-ai wpak-ai merged commit 87a5475 into develop May 13, 2026
7 checks passed
@wpak-ai wpak-ai deleted the bugfix/ISOProber-Encapsulation-Breach branch May 13, 2026 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix ISOProber Encapsulation Breach: Add Public API Method

2 participants