Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds asynchronous streaming JSON parsing: an async iterator adapter and AsyncParser using ijson.parse_async, a Future-backed in-memory Store with path-keyed NodeState, an awaitable AsyncJsonNode frontend, dependency additions, and a demo runner simulating chunked streams.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Stream
    participant AsyncIterableFileLike
    participant AsyncParser
    participant ijson
    participant Store
    participant AsyncJsonNode
    Stream->>AsyncIterableFileLike: yield chunk
    AsyncIterableFileLike->>AsyncParser: read(n) -> bytes
    AsyncParser->>ijson: parse_async(stream)
    ijson->>AsyncParser: emit events (start_map, map_key, value, end_map, ...)
    loop handle events
        AsyncParser->>AsyncParser: parse_value / parse_object / parse_array
        AsyncParser->>Store: set(path, value)
        Store->>Store: resolve future[path]
    end
    AsyncJsonNode->>Store: __await__ -> future for path
    Store-->>AsyncJsonNode: future resolves with parsed value
```
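The await-by-path flow in the diagram can be sketched end to end. This is a minimal stand-in: `AsyncJsonNode` follows the shape shown later in this review, while this `Store` is reduced to a plain dict of futures (the real one keeps path-keyed `NodeState` objects), and `fake_parser` is a hypothetical substitute for the real event-driven parser.

```python
import asyncio


class Store:
    def __init__(self):
        self._futures: dict[tuple, asyncio.Future] = {}

    def getdefault(self, path):
        # Readers may await a path before the parser has produced it,
        # so the future is created lazily on first access.
        if path not in self._futures:
            self._futures[path] = asyncio.get_running_loop().create_future()
        return self._futures[path]

    def set(self, path, value):
        f = self.getdefault(path)
        if not f.done():
            f.set_result(value)


class AsyncJsonNode:
    def __init__(self, path, store):
        self.path = path
        self.store = store

    def __getitem__(self, key):
        # Indexing only extends the path; no data is read yet.
        return AsyncJsonNode((*self.path, key), self.store)

    def __await__(self):
        return self.store.getdefault(self.path).__await__()


async def demo():
    store = Store()
    root = AsyncJsonNode((), store)

    async def fake_parser():
        await asyncio.sleep(0)         # pretend chunks arrive later
        store.set(("name",), "Alice")  # resolving the future wakes awaiters

    task = asyncio.create_task(fake_parser())
    value = await root["name"]         # suspends until the path resolves
    await task
    return value


print(asyncio.run(demo()))
```

Note how `await root["name"]` works even though the value is produced after the await starts; the future acts as the rendezvous point between parser and consumer.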
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/jsontap/parser.py (1)

90-111: Consider moving demo streaming code to docs/tests.

Keeping `example()` in the runtime module mixes library and demo concerns; moving it out keeps the module surface cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 90 - 111, Remove the demo streaming example from the runtime module by extracting the example() function (including nested simulate_stream and main) out of src/jsontap/parser.py; instead create a separate docs or tests file that imports AsyncParser and demonstrates usage by calling AsyncParser(simulate_stream(...)), await p.parse_value(()), and inspecting p._result—leave the parser implementation (AsyncParser, parse_value) untouched and ensure any helpers like simulate_stream are relocated to the new demo/test file.

src/jsontap/json_node.py (1)

5-10: Prefer encapsulated fields for path/store to keep node API consistent and safer.

Using public mutable attributes here makes accidental reassignment easy and diverges from the method-style path access used in `src/jsontap/core.py`.

💡 Proposed refactor

```diff
 class AsyncJsonNode:
     def __init__(self, path: tuple[str | int, ...], store: Store):
-        self.path = path
-        self.store = store
+        self._path = path
+        self._store = store
+
+    @property
+    def path(self) -> tuple[str | int, ...]:
+        return self._path

     def __getitem__(self, key: str | int) -> "AsyncJsonNode":
-        return AsyncJsonNode(path=(*self.path, key), store=self.store)
+        return AsyncJsonNode(path=(*self._path, key), store=self._store)

     def __await__(self):
-        return self.store.getdefault(self.path).__await__()
+        return self._store.getdefault(self._path).__await__()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/json_node.py` around lines 5 - 10, Switch the public mutable attributes to encapsulated private fields and expose read-only properties: rename self.path and self.store to self._path and self._store in the __init__ of the JsonNode class, add `@property` methods path(self) -> tuple[...] and store(self) -> Store that return the internal values, and update __getitem__ to reference self._path/self._store (or use the new properties) when constructing the returned AsyncJsonNode (keep AsyncJsonNode(path=(*self.path, key), store=self.store) semantics but source the values from the private fields/properties). This preserves the external API while preventing accidental reassignment of path/store.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/jsontap/parser.py`:
- Around line 32-76: The parser uses recursive calls in parse_value,
parse_object, and parse_array which can hit Python recursion limits for deeply
nested JSON; add an explicit depth guard by threading a depth counter (e.g., a
parameter like depth: int defaulting to 0 or an instance attribute like
self._depth) into parse_value, parse_object, and parse_array, incrementing it on
each recursive descent and checking against a configurable max (e.g.,
self._max_depth) before recursing; when the max is exceeded raise a clear
exception (or return a controlled error) instead of recursing further; update
all recursive call sites (parse_value -> parse_object/parse_array, parse_object
calling parse_value, parse_array calling parse_object/parse_array) to pass the
incremented depth and initialize/configure self._max_depth in the parser
constructor.
- Around line 11-18: The async read method currently returns whole chunks from
self._iter which can exceed the requested n; change it to maintain an internal
byte buffer (e.g., self._buffer: bytes) and implement logic in read(self, n=-1)
to: if n==0 return b""; if n<0 drain the iterator appending each chunk (encoding
str to utf-8) to the buffer and return all bytes; otherwise ensure the buffer
has at least n bytes by repeatedly awaiting anext(self._iter) and appending
encoded chunks until buffer length >= n or iterator is exhausted, then return
buffer[:n] and keep the remainder in self._buffer; on StopAsyncIteration return
whatever is left in the buffer or b""; update any initialization to create
self._buffer = b"" and use read, self._iter, and StopAsyncIteration handling as
described.
In `@src/jsontap/store.py`:
- Around line 14-15: The current Store.set calls
self.getdefault(path).set_result(value) which raises asyncio.InvalidStateError
if the future is already resolved; modify Store.set to fetch the future via f =
self.getdefault(path) and either check f.done() before calling
f.set_result(value) or wrap set_result in a try/except that catches
asyncio.InvalidStateError (or concurrent.futures.InvalidStateError) and silently
ignore or optionally log the duplicate write; this change touches Store.set and
references getdefault and set_result so you can locate and update that logic
accordingly.
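The guarded `Store.set` the prompt above describes can be sketched as follows. This `Store` is a minimal stand-in built only from the behavior named in the review (path-keyed futures, `getdefault`, `set`); the real class's fields and signatures may differ.

```python
import asyncio


class Store:
    """Minimal Future-backed store; only the pieces named in the review."""

    def __init__(self):
        self._futures: dict[tuple, asyncio.Future] = {}

    def getdefault(self, path: tuple) -> asyncio.Future:
        if path not in self._futures:
            self._futures[path] = asyncio.get_running_loop().create_future()
        return self._futures[path]

    def set(self, path: tuple, value) -> None:
        f = self.getdefault(path)
        # Without this guard, a second set() on the same path would raise
        # asyncio.InvalidStateError from set_result().
        if not f.done():
            f.set_result(value)


async def demo():
    store = Store()
    store.set(("name",), "Alice")
    store.set(("name",), "Bob")  # duplicate write: silently ignored
    return await store.getdefault(("name",))


result = asyncio.run(demo())
print(result)
```

A try/except around `set_result` catching `asyncio.InvalidStateError` is the equivalent alternative the prompt mentions; the `done()` check simply avoids raising the exception in the first place.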
---
Nitpick comments:
In `@src/jsontap/json_node.py`:
- Around line 5-10: Switch the public mutable attributes to encapsulated private
fields and expose read-only properties: rename self.path and self.store to
self._path and self._store in the __init__ of the JsonNode class, add `@property`
methods path(self) -> tuple[...] and store(self) -> Store that return the
internal values, and update __getitem__ to reference self._path/self._store (or
use the new properties) when constructing the returned AsyncJsonNode (keep
AsyncJsonNode(path=(*self.path, key), store=self.store) semantics but source the
values from the private fields/properties). This preserves the external API
while preventing accidental reassignment of path/store.
In `@src/jsontap/parser.py`:
- Around line 90-111: Remove the demo streaming example from the runtime module
by extracting the example() function (including nested simulate_stream and main)
out of src/jsontap/parser.py; instead create a separate docs or tests file that
imports AsyncParser and demonstrates usage by calling
AsyncParser(simulate_stream(...)), await p.parse_value(()), and inspecting
p._result—leave the parser implementation (AsyncParser, parse_value) untouched
and ensure any helpers like simulate_stream are relocated to the new demo/test
file.
```python
async def read(self, n=-1):
    if n == 0:
        return b""
    try:
        chunk = await anext(self._iter)
        return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
    except StopAsyncIteration:
        return b""
```
🌐 Web query:
ijson parse_async documentation async file-like read(size) contract
💡 Result:
ijson.parse_async() (and the other *_async entry points) expect an async file-like object whose read method is awaitable and follows the same basic semantics as a normal binary file's read(size).

Required contract for the async source:

- Must have `read(size)` where `read` is an async function (`await source.read(...)`). The library detects an "async file" specifically by checking that `read` is awaitable. [2]
- `read(size)` must accept the `size` argument ijson passes (default `buf_size=64*1024`), and it will be called repeatedly as `await read(buf_size)`. [3]
- `read(0)` must be supported: ijson calls `await source.read(0)` once to determine whether the stream yields `bytes` or `str`. [3]
- Return type must be either `bytes` (preferred) or `str` (ijson will UTF-8 encode it internally, via an async UTF-8 wrapper, and emit a warning). [1] [3]
- EOF must be signaled by returning an empty/falsy chunk (`b''` or `''`), same as normal file objects; ijson stops when the returned chunk is falsy. [3]

Where this is documented/implemented:

- The README states the (sync) file-like object requirement: a `read(size)` method returning `bytes` (preferably) or `str`, and that the same APIs work with async file-like objects. [1]
- The async implementation in `utils35.py` shows the concrete behavior: probing with `read(0)`, then looping on `await read(buf_size)` and feeding chunks into the parser pipeline until EOF. [3]

Sources: GitHub README and source for ijson [1] [3], plus the async-file detection helper in common.py. [2]
Honor the async read(size) contract in the ijson adapter.

`read(n)` currently returns a single chunk as-is, which can exceed `n` and violate the file-like `read(size)` semantics. This breaks ijson's buffering expectations with `ijson.parse_async()`. Implement buffering to cap reads to `n` bytes:
💡 Proposed fix (buffer and cap reads to n)

```diff
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()

     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            # Keep streaming behavior but normalize type.
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                return b""
+            return chunk.encode("utf-8") if isinstance(chunk, str) else bytes(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/parser.py` around lines 11 - 18, The async read method currently
returns whole chunks from self._iter which can exceed the requested n; change it
to maintain an internal byte buffer (e.g., self._buffer: bytes) and implement
logic in read(self, n=-1) to: if n==0 return b""; if n<0 drain the iterator
appending each chunk (encoding str to utf-8) to the buffer and return all bytes;
otherwise ensure the buffer has at least n bytes by repeatedly awaiting
anext(self._iter) and appending encoded chunks until buffer length >= n or
iterator is exhausted, then return buffer[:n] and keep the remainder in
self._buffer; on StopAsyncIteration return whatever is left in the buffer or
b""; update any initialization to create self._buffer = b"" and use read,
self._iter, and StopAsyncIteration handling as described.
```python
async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
    _, event, value = await self._next_event()
    # recurse into object handling
    if event == "start_map":
        node = await self.parse_object(prefix)
        self._result[prefix] = node
        return node
    # recurse into array handling
    elif event == "start_array":
        node = await self.parse_array(prefix)
        self._result[prefix] = node
        return node
    # primitives
    else:
        self._result[prefix] = value
        return value


async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
    obj = {}
    while True:
        _, event, value = await self._next_event()
        # go back to parse_value for handling this key's value
        if event == "map_key":
            obj[value] = await self.parse_value((*prefix, value))
        # done with this object, return the result
        elif event == "end_map":
            break

    self._result[prefix] = obj
    return obj


async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
    arr = []
    index = 0
    while True:
        _, event, value = await self._next_event()

        # recurse into object handling
        if event == "start_map":
            arr.append(await self.parse_object((*prefix, index)))
            index += 1
        # recurse into array handling
        elif event == "start_array":
            arr.append(await self.parse_array((*prefix, index)))
            index += 1
```
Guard recursive descent depth to avoid RecursionError on deeply nested JSON.
This parser now relies on recursive calls across object/array/value parsing. Deeply nested payloads can crash at runtime near Python’s recursion limit.
💡 Proposed fix (explicit depth guard)

```diff
 class AsyncParser:
-    def __init__(self, stream):
+    def __init__(self, stream, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
+        self._max_depth = max_depth
@@
-    async def parse(self):
-        return await self.parse_value(())
+    async def parse(self):
+        return await self.parse_value((), 0)

-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-            obj[value] = await self.parse_value((*prefix, value))
+            obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-            arr.append(await self.parse_object((*prefix, index)))
+            arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-            arr.append(await self.parse_array((*prefix, index)))
+            arr.append(await self.parse_array((*prefix, index), depth + 1))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/parser.py` around lines 32 - 76, The parser uses recursive calls
in parse_value, parse_object, and parse_array which can hit Python recursion
limits for deeply nested JSON; add an explicit depth guard by threading a depth
counter (e.g., a parameter like depth: int defaulting to 0 or an instance
attribute like self._depth) into parse_value, parse_object, and parse_array,
incrementing it on each recursive descent and checking against a configurable
max (e.g., self._max_depth) before recursing; when the max is exceeded raise a
clear exception (or return a controlled error) instead of recursing further;
update all recursive call sites (parse_value -> parse_object/parse_array,
parse_object calling parse_value, parse_array calling parse_object/parse_array)
to pass the incremented depth and initialize/configure self._max_depth in the
parser constructor.
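The depth-threading pattern the prompt describes, reduced to a runnable sketch over plain nested Python structures instead of ijson events (`DepthLimitError` and `max_depth` are illustrative names, not the repo's API):

```python
class DepthLimitError(ValueError):
    """Raised when nesting exceeds the configured maximum."""


def parse_nested(value, depth=0, max_depth=5):
    # Thread the depth counter through every recursive descent, exactly
    # as suggested for parse_value/parse_object/parse_array.
    if depth > max_depth:
        raise DepthLimitError("maximum nesting depth exceeded")
    if isinstance(value, dict):
        return {k: parse_nested(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [parse_nested(v, depth + 1, max_depth) for v in value]
    return value


shallow = parse_nested({"a": [1, {"b": 2}]})
print(shallow)

deep = [[[[[[[0]]]]]]]  # 7 levels of nesting, beyond max_depth=5
try:
    parse_nested(deep)
except DepthLimitError as e:
    print("rejected:", e)
```

The guard turns a hard `RecursionError` crash deep in the interpreter into a controlled, catchable exception raised near the top of the descent.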
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 19-20: Remove the unused and conflicting dependency "attr>=0.3.2"
from pyproject.toml: keep only "attrs>=25.4.0", then regenerate your
lock/install (e.g., poetry lock && poetry install) and run tests; verify imports
like "from attrs import define, field" in src/jsontap/store.py still resolve to
the attrs package and there are no remaining references to the legacy "attr"
package.
In `@src/jsontap/main.py`:
- Around line 32-37: The background parse task created with
asyncio.create_task(parser.parse_value(())) is fire-and-forget and can swallow
exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.
In `@src/jsontap/store.py`:
- Around line 27-29: The raise uses self.nodes[path].value but NodeState has no
value attribute, causing an AttributeError; update the error message to
reference an existing representation — e.g., use getattr(self.nodes[path],
"value", repr(self.nodes[path])) or replace .value with the actual NodeState
attribute name (NodeState) used to hold the stored value (for example .state or
.data) so the ValueError shows the current node contents; adjust the raise in
the method that performs the duplicate-write check to use that safe/reference
attribute instead.
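The safe-attribute error message the prompt suggests can be sketched like this. The `NodeState` field name `data` is hypothetical (the review's point is precisely that the real attribute is not `.value`); what matters is the `getattr` fallback to `repr()`.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical field name; the review notes the real class has no
    # .value, which is why the unguarded f-string raised AttributeError.
    data: object = None


def check_duplicate_write(nodes: dict, path: tuple) -> None:
    # getattr with a repr() fallback never raises, whatever the field is named.
    current = getattr(nodes[path], "value", repr(nodes[path]))
    raise ValueError(f"duplicate write at {path}: node already holds {current}")


nodes = {("a",): NodeState(data=1)}
try:
    check_duplicate_write(nodes, ("a",))
except ValueError as e:
    print(e)
```

This keeps the duplicate-write check informative without coupling the error message to a field that may not exist.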
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `uv.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (5)

- `pyproject.toml`
- `src/jsontap/frontend.py`
- `src/jsontap/main.py`
- `src/jsontap/parser.py`
- `src/jsontap/store.py`
🚧 Files skipped from review as they are similar to previous changes (1)
- src/jsontap/parser.py
```toml
    "attr>=0.3.2",
    "attrs>=25.4.0",
```
Remove `attr>=0.3.2` from dependencies; it is unused and conflicts with `attrs`.

The codebase only imports and uses `attrs` (specifically `from attrs import define, field` in src/jsontap/store.py). The `attr` package is never referenced anywhere in the project. Since `attr` is the legacy, unmaintained package (last updated July 2022) and `attrs` is the modern successor, keeping both creates unnecessary namespace collision risk and increases the supply-chain surface area without providing any value.
🔧 Proposed fix

```diff
 dependencies = [
-    "attr>=0.3.2",
     "attrs>=25.4.0",
     "ijson>=3.5.0",
 ]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 19 - 20, Remove the unused and conflicting
dependency "attr>=0.3.2" from pyproject.toml: keep only "attrs>=25.4.0", then
regenerate your lock/install (e.g., poetry lock && poetry install) and run
tests; verify imports like "from attrs import define, field" in
src/jsontap/store.py still resolve to the attrs package and there are no
remaining references to the legacy "attr" package.
src/jsontap/main.py (outdated)
```python
        asyncio.create_task(parser.parse_value(()))
        return parser.parse()

    stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
    parser = jsontap(stream)
    print(await parser["name"])
```
Retain the parser task and handle exceptions; fire-and-forget parsing loses errors.
Line 32 creates a background task without retaining it. If parse_value() raises an exception, it will be silently lost. If an error occurs before the expected path is resolved, line 37 will block indefinitely with no error indication.
Suggested fix

```diff
 async def main():
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task

     stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
-    parser = jsontap(stream)
-    print(await parser["name"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["name"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
```
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 32-32: Store a reference to the return value of asyncio.create_task
(RUF006)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/main.py` around lines 32 - 37, The background parse task created
with asyncio.create_task(parser.parse_value(())) is fire-and-forget and can
swallow exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.
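The done-callback variant the prompt mentions can be sketched generically. Here `failing_parse` stands in for `parser.parse_value`, and the `errors` list stands in for real logging or re-raising; both names are illustrative.

```python
import asyncio

errors: list[BaseException] = []


def observe(task: asyncio.Task) -> None:
    # Done callback: calling task.exception() retrieves the exception,
    # preventing the "Task exception was never retrieved" warning and
    # letting us log or propagate it.
    if not task.cancelled() and task.exception() is not None:
        errors.append(task.exception())


async def failing_parse():
    raise RuntimeError("parse failed")


async def main():
    parse_task = asyncio.create_task(failing_parse())  # reference is retained
    parse_task.add_done_callback(observe)
    # Two yields: the first lets the task run (and fail), the second lets
    # its done callback run.
    await asyncio.sleep(0)
    await asyncio.sleep(0)
    return list(errors)


print(asyncio.run(main()))
```

Awaiting the task directly (`await parse_task`) is the simpler alternative when the caller can block on completion; the callback form suits genuinely backgrounded work.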
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/jsontap/main.py (1)
32-37: ⚠️ Potential issue | 🟠 Major: Retain and observe the parse task; current fire-and-forget can hide failures.

Line 32 drops the `create_task` handle, so parse exceptions may be unobserved, and Line 37 can wait indefinitely if `"name"` is never produced due to upstream parse failure.

Suggested fix

```diff
-import asyncio
+import asyncio
+import contextlib
@@
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["name"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["name"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
+            with contextlib.suppress(asyncio.CancelledError):
+                await parse_task
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 32 - 37, The code currently fires parser.parse_value via asyncio.create_task without keeping the Task, causing exceptions to be unobserved and leaving parser["name"] awaiting forever on upstream failures; change the fire-and-forget into an observed task by assigning asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task), start the parser (ensure parser.parse() is invoked appropriately), and either await parse_task after you consume parser["name"] or add a done-callback that logs/propagates exceptions so parse errors are not swallowed; reference the functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion and the await of parser["name"] when updating the code.
🧹 Nitpick comments (1)

src/jsontap/main.py (1)

23-24: Avoid unconditional stdout logging in stream iteration.

Line 23 prints raw chunks every iteration. Consider using a debug logger or a `verbose` flag to avoid noisy output and accidental payload exposure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 23 - 24, Replace the unconditional print(repr(chunk), flush=True) in the stream loop with a conditional or debug-level logger; specifically, remove the direct stdout call and either (a) wrap it behind a verbose flag (e.g., if verbose: print(...)) or (b) use a module logger (logger = logging.getLogger(__name__) and logger.debug(repr(chunk))). Keep the await asyncio.sleep(0.1) unchanged; ensure any logging does not accidentally expose payloads in non-debug modes and add a CLI/config option to enable verbose/debug output if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/jsontap/main.py`:
- Around line 10-13: The __init__ of the iterator stores chunk_size without
validation which allows chunk_size <= 0 and causes __anext__ to never advance
(or regress); update the Jsontap iterator __init__ to validate that chunk_size
is an int > 0 and raise a ValueError (or TypeError for non-int) if not, ensuring
self._chunk_size is always a positive integer so the advancement logic in
__anext__ uses a safe, terminating step size.
---
Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 32-37: The code currently fires parser.parse_value via
asyncio.create_task without keeping the Task, causing exceptions to be
unobserved and leaving parser["name"] awaiting forever on upstream failures;
change the fire-and-forget into an observed task by assigning
asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task),
start the parser (ensure parser.parse() is invoked appropriately), and either
await parse_task after you consume parser["name"] or add a done-callback that
logs/propagates exceptions so parse errors are not swallowed; reference the
functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion
and the await of parser["name"] when updating the code.
---
Nitpick comments:
In `@src/jsontap/main.py`:
- Around line 23-24: Replace the unconditional print(repr(chunk), flush=True) in
the stream loop with a conditional or debug-level logger; specifically, remove
the direct stdout call and either (a) wrap it behind a verbose flag (e.g., if
verbose: print(...)) or (b) use a module logger (logger =
logging.getLogger(__name__) and logger.debug(repr(chunk))). Keep the await
asyncio.sleep(0.1) unchanged; ensure any logging does not accidentally expose
payloads in non-debug modes and add a CLI/config option to enable verbose/debug
output if needed.
```python
def __init__(self, data: str, chunk_size: int = 1):
    self._data = data
    self._chunk_size = chunk_size
    self._i = 0
```
Validate chunk_size to prevent non-terminating iteration.
If chunk_size <= 0, Line 22 does not advance correctly, and __anext__ can loop forever (or regress for negatives).
Suggested fix

```diff
 def __init__(self, data: str, chunk_size: int = 1):
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be > 0")
     self._data = data
     self._chunk_size = chunk_size
     self._i = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/main.py` around lines 10-13: the `__init__` of the iterator
stores `chunk_size` without validation, which allows `chunk_size <= 0` and causes
`__anext__` to never advance (or regress); update the iterator's `__init__` to
validate that `chunk_size` is an int > 0 and raise a ValueError (or TypeError for
non-int) if not, ensuring `self._chunk_size` is always a positive integer so the
advancement logic in `__anext__` uses a safe, terminating step size.
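To make the failure mode concrete, here is a standalone sketch of the same chunked-iteration pattern with the validation in place. The `ChunkIterator` class is hypothetical, not the code under review; it only mirrors the advancement logic the comment describes:

```python
import asyncio

class ChunkIterator:
    """Minimal chunked async iterator with a validated chunk_size."""

    def __init__(self, data: str, chunk_size: int = 1):
        # Without this check, chunk_size=0 would make __anext__ loop forever.
        if not isinstance(chunk_size, int) or chunk_size <= 0:
            raise ValueError("chunk_size must be a positive int")
        self._data = data
        self._chunk_size = chunk_size
        self._i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._i >= len(self._data):
            raise StopAsyncIteration
        chunk = self._data[self._i : self._i + self._chunk_size]
        self._i += self._chunk_size
        return chunk

async def collect(it):
    return [c async for c in it]

chunks = asyncio.run(collect(ChunkIterator("abcdef", 2)))
print(chunks)  # ['ab', 'cd', 'ef']
```

With the guard, `ChunkIterator("abc", 0)` raises `ValueError` immediately instead of producing an iterator that never terminates.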
♻️ Duplicate comments (4)
src/jsontap/parser.py (2)
36-83: ⚠️ Potential issue | 🟠 Major
Add an explicit recursion depth guard for nested JSON.
Lines 36-83 recursively descend without a max-depth check, so deeply nested payloads can raise `RecursionError` and terminate parsing unexpectedly.

💡 Proposed fix
```diff
 class AsyncParser:
-    def __init__(self, stream, store: Store):
+    def __init__(self, stream, store: Store, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
         self._store = store
+        self._max_depth = max_depth
@@
-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int = 0) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-                obj[value] = await self.parse_value((*prefix, value))
+                obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-                arr.append(await self.parse_object((*prefix, index)))
+                arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-                arr.append(await self.parse_array((*prefix, index)))
+                arr.append(await self.parse_array((*prefix, index), depth + 1))
```

```shell
#!/bin/bash
# Verify recursive entry points and absence/presence of a depth guard.
rg -nP 'def parse_value|def parse_object|def parse_array|max_depth|RecursionError' src/jsontap/parser.py -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 36 - 83, The parser currently recurses indefinitely in parse_value, parse_object, and parse_array and needs a max-depth guard to prevent RecursionError; add a configurable depth limit (e.g. self._max_depth) and a current depth counter (pass an incremented depth arg or use an internal stack counter) and check it before any recursive descent in parse_value, parse_object, and parse_array, raising a clear exception (or returning an explicit error) when depth > self._max_depth; ensure the top-level entrypoint initializes the depth (or uses 0) so all recursive calls (references: parse_value, parse_object, parse_array) increment/check the counter consistently.
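The guard pattern can be exercised in isolation. The `walk` helper below is a hypothetical stand-in for the parser's `parse_value`/`parse_object`/`parse_array` recursion, not the reviewed code; it shows a depth counter threaded through each recursive call:

```python
def walk(value, depth=0, max_depth=5):
    """Recursive descent with an explicit depth guard (simplified
    stand-in for the parser's mutually recursive methods)."""
    if depth > max_depth:
        raise ValueError("Maximum nesting depth exceeded")
    if isinstance(value, dict):
        return {k: walk(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [walk(v, depth + 1, max_depth) for v in value]
    return value

print(walk({"a": [1, {"b": 2}]}))  # {'a': [1, {'b': 2}]}

# A payload nested deeper than max_depth fails fast with a clear error
# instead of an unexpected RecursionError:
deep = value = []
for _ in range(10):
    inner = []
    value.append(inner)
    value = inner
try:
    walk(deep, max_depth=5)
except ValueError as e:
    print(e)  # Maximum nesting depth exceeded
```

The same shape applies to the async methods: each recursive `await` passes `depth + 1`, and the check sits at the single shared entry point.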
14-21: ⚠️ Potential issue | 🟠 Major
Honor ijson `read(size)` semantics in `AsyncIterableFileLike.read`.

Lines 14-21 return one upstream chunk as-is, so `read(n)` can exceed `n` and break file-like buffering expectations for `ijson.parse_async`.

💡 Proposed fix
```diff
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()

     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            data = bytearray(self._buffer)
+            self._buffer.clear()
+            while True:
+                try:
+                    chunk = await anext(self._iter)
+                except StopAsyncIteration:
+                    return bytes(data)
+                if isinstance(chunk, str):
+                    chunk = chunk.encode("utf-8")
+                data.extend(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
```

For ijson 3.5.0, what is the required contract for async file-like `read(size)` in `parse_async`, including behavior for `read(0)`, size-limited reads, and EOF?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 14 - 21, The AsyncIterableFileLike.read method currently returns a single upstream chunk which can exceed the requested size; change read to honor ijson/typical file-like semantics: read(0) must return b"", read(n>0) must return up to n bytes (never more) by accumulating upstream chunks from self._iter into an internal bytes buffer until at least n bytes are available or the iterator is exhausted, slicing off and returning exactly n bytes (or fewer at EOF), converting str chunks via .encode("utf-8") as you already do; preserve existing behavior for n < 0 (treat as unlimited read by returning all remaining data) and ensure EOF still returns b"" — add an internal buffer attribute (e.g., self._buffer) and use anext(self._iter) to refill it as needed in AsyncIterableFileLike.read.src/jsontap/main.py (2)
11-14: ⚠️ Potential issue | 🟡 Minor
Validate `chunk_size` in `FakeChatCompletion.__init__`.

Lines 11-14 allow `chunk_size <= 0`, which can make iteration non-terminating or regress index movement.

💡 Proposed fix
```diff
 def __init__(self, data: str, chunk_size: int = 1):
+    if not isinstance(chunk_size, int):
+        raise TypeError("chunk_size must be an int")
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be > 0")
     self._data = data
     self._chunk_size = chunk_size
     self._i = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 11 - 14, FakeChatCompletion.__init__ currently accepts chunk_size <= 0 which can break iteration; add validation at the start of FakeChatCompletion.__init__ to raise a ValueError (or TypeError if non-int) when chunk_size is not an int > 0, e.g. check isinstance(chunk_size, int) and chunk_size > 0, and keep existing assignments to self._chunk_size, self._data, and self._i only after validation so iteration logic using _chunk_size cannot enter a non-terminating state.
33-34: ⚠️ Potential issue | 🟠 Major
Don’t discard the parse task; observe completion and exceptions.

Lines 33-34 fire-and-forget `asyncio.create_task(...)`. Parser failures can be lost, and path futures may never resolve.

💡 Proposed fix
```diff
 async def main():
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["user"]["name"])
-    print(await parser["user"]["age"])
-    print(await parser["user"]["scores"][1])
-    print(await parser["user"]["children"][0]["name"])
-    print(await parser["user"]["children"][1]["age"])
-    print(await parser["user"]["address"]["state"])
-    print(await parser["user"]["address"]["city"])
-    print(await parser["user"]["address"]["zip"])
-    print(await parser["user"]["address"])
-    print(await parser["user"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["user"]["name"])
+        print(await parser["user"]["age"])
+        print(await parser["user"]["scores"][1])
+        print(await parser["user"]["children"][0]["name"])
+        print(await parser["user"]["children"][1]["age"])
+        print(await parser["user"]["address"]["state"])
+        print(await parser["user"]["address"]["city"])
+        print(await parser["user"]["address"]["zip"])
+        print(await parser["user"]["address"])
+        print(await parser["user"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
```

```shell
#!/bin/bash
# Verify fire-and-forget task creation sites.
rg -nP 'asyncio\.create_task\(' src/jsontap/main.py -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 33 - 34, The code currently fire-and-forgets asyncio.create_task(parser.parse_value(())) so parse failures can be lost and path futures may never resolve; change this to capture the Task (e.g., task = asyncio.create_task(parser.parse_value(()))) and either await that task before returning from the function that calls parser.parse(), or attach a done callback that logs and re-raises exceptions so they propagate to parser.parse(); update callers around parser.parse_value and parser.parse to ensure the created task is observed and exceptions handled (refer to parser.parse_value and parser.parse).
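The difference between a discarded and an observed task can be shown with a minimal stand-in. The `parse_stub` coroutine below is hypothetical (it only imitates a background parse that may fail); the point is that keeping a reference and awaiting the task lets the failure surface to the caller:

```python
import asyncio

async def parse_stub(fail: bool):
    """Stand-in for a background parse coroutine; may raise."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError("parse failed")
    return "done"

async def main():
    # Fire-and-forget (`asyncio.create_task(parse_stub(...))` with the
    # result discarded) would only surface the error via the event loop's
    # "Task exception was never retrieved" warning, while anything waiting
    # on the parser's futures hangs. Keeping the Task and awaiting it
    # (or attaching a done-callback) propagates the failure instead:
    task = asyncio.create_task(parse_stub(fail=True))
    try:
        await task
    except RuntimeError as e:
        return f"caught: {e}"

print(asyncio.run(main()))  # caught: parse failed
```

A done-callback variant (`task.add_done_callback(lambda t: t.exception())`) at least retrieves the exception so it can be logged, but awaiting the task is the simplest way to tie its lifetime to the consumer.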
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 11-14: FakeChatCompletion.__init__ currently accepts chunk_size <=
0 which can break iteration; add validation at the start of
FakeChatCompletion.__init__ to raise a ValueError (or TypeError if non-int) when
chunk_size is not an int > 0, e.g. check isinstance(chunk_size, int) and
chunk_size > 0, and keep existing assignments to self._chunk_size, self._data,
and self._i only after validation so iteration logic using _chunk_size cannot
enter a non-terminating state.
- Around line 33-34: The code currently fire-and-forgets
asyncio.create_task(parser.parse_value(())) so parse failures can be lost and
path futures may never resolve; change this to capture the Task (e.g., task =
asyncio.create_task(parser.parse_value(()))) and either await that task before
returning from the function that calls parser.parse(), or attach a done callback
that logs and re-raises exceptions so they propagate to parser.parse(); update
callers around parser.parse_value and parser.parse to ensure the created task is
observed and exceptions handled (refer to parser.parse_value and parser.parse).
In `@src/jsontap/parser.py`:
- Around line 36-83: The parser currently recurses indefinitely in parse_value,
parse_object, and parse_array and needs a max-depth guard to prevent
RecursionError; add a configurable depth limit (e.g. self._max_depth) and a
current depth counter (pass an incremented depth arg or use an internal stack
counter) and check it before any recursive descent in parse_value, parse_object,
and parse_array, raising a clear exception (or returning an explicit error) when
depth > self._max_depth; ensure the top-level entrypoint initializes the depth
(or uses 0) so all recursive calls (references: parse_value, parse_object,
parse_array) increment/check the counter consistently.
- Around line 14-21: The AsyncIterableFileLike.read method currently returns a
single upstream chunk which can exceed the requested size; change read to honor
ijson/typical file-like semantics: read(0) must return b"", read(n>0) must
return up to n bytes (never more) by accumulating upstream chunks from
self._iter into an internal bytes buffer until at least n bytes are available or
the iterator is exhausted, slicing off and returning exactly n bytes (or fewer
at EOF), converting str chunks via .encode("utf-8") as you already do; preserve
existing behavior for n < 0 (treat as unlimited read by returning all remaining
data) and ensure EOF still returns b"" — add an internal buffer attribute (e.g.,
self._buffer) and use anext(self._iter) to refill it as needed in
AsyncIterableFileLike.read.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .gitignore
- src/jsontap/main.py
- src/jsontap/parser.py
✅ Files skipped from review due to trivial changes (1)
- .gitignore
Summary by CodeRabbit
New Features
Chores