
Internal refactoring: replacing iterative logic with a recursive implementation. #7

Open
fhalde wants to merge 12 commits into main from impl

Conversation

@fhalde
Owner

@fhalde fhalde commented Mar 3, 2026


Summary by CodeRabbit

  • New Features

    • Asynchronous streaming JSON parsing with awaitable, path-based access to parsed values.
    • In-memory, path-keyed store to deliver parsed values to awaiters.
    • Small demo entrypoint that simulates chunked streaming for local testing.
  • Chores

    • Added attr/attrs to project dependencies.
    • Updated .gitignore to ignore editor settings.

@coderabbitai

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds asynchronous streaming JSON parsing: an async iterator adapter and AsyncParser using ijson.parse_async, a Future-backed in-memory Store with path-keyed NodeState, an awaitable AsyncJsonNode frontend, dependency additions, and a demo runner simulating chunked streams.

Changes

Cohort / File(s) | Summary

  • Async parser & stream adapter (src/jsontap/parser.py): Adds AsyncIterableFileLike (adapts an async iterator to a read(n) interface) and AsyncParser that drives ijson.parse_async, with async methods parse, parse_value, parse_object, and parse_array, and a private _result cache.
  • In-memory Future store (src/jsontap/store.py): Adds a Path type alias, NodeState (attrs) containing an asyncio.Future and optional exception, and Store that lazily provides NodeState per path and resolves futures via set.
  • Awaitable frontend node (src/jsontap/frontend.py): Adds AsyncJsonNode, holding a path and Store; supports __getitem__ for path chaining and __await__ to await the path's future; implements __str__/__repr__.
  • Demo / CLI & helpers (src/jsontap/main.py): Adds FakeChatCompletion (an async iterator yielding chunks), an async main() demonstrating wiring Store with AsyncParser and parsing a streamed JSON, and an if __name__ == "__main__" runner.
  • Project deps (pyproject.toml): Adds attr>=0.3.2 and attrs>=25.4.0 to project dependencies (retains ijson>=3.5.0).
  • Ignore rules (.gitignore): Adds .vscode to .gitignore.
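The store-and-node pattern summarized above can be sketched roughly as follows. This is a minimal illustration assuming the names from the walkthrough (Store, getdefault, set, AsyncJsonNode); the actual jsontap modules may differ in detail.

```python
import asyncio

# Minimal sketch of a path-keyed Future store plus an awaitable node.
# Names follow the walkthrough summary; the real implementation may differ.
class Store:
    def __init__(self):
        self._futures: dict[tuple, asyncio.Future] = {}

    def getdefault(self, path: tuple) -> asyncio.Future:
        # Lazily create exactly one future per JSON path.
        if path not in self._futures:
            self._futures[path] = asyncio.get_running_loop().create_future()
        return self._futures[path]

    def set(self, path: tuple, value) -> None:
        fut = self.getdefault(path)
        if not fut.done():  # ignore duplicate writes instead of raising
            fut.set_result(value)


class AsyncJsonNode:
    def __init__(self, path: tuple, store: Store):
        self._path = path
        self._store = store

    def __getitem__(self, key):
        # Chaining only extends the path; nothing is parsed here.
        return AsyncJsonNode((*self._path, key), self._store)

    def __await__(self):
        # Awaiting a node awaits the future registered for its path.
        return self._store.getdefault(self._path).__await__()
```

A parser task can then call store.set(path, value) as values complete, waking any awaiters of that path.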

Sequence Diagram

sequenceDiagram
    participant Stream
    participant AsyncIterableFileLike
    participant AsyncParser
    participant ijson
    participant Store
    participant AsyncJsonNode

    Stream->>AsyncIterableFileLike: yield chunk
    AsyncIterableFileLike->>AsyncParser: read(n) -> bytes
    AsyncParser->>ijson: parse_async(stream)
    ijson->>AsyncParser: emit events (start_map, map_key, value, end_map, ...)
    loop handle events
        AsyncParser->>AsyncParser: parse_value / parse_object / parse_array
        AsyncParser->>Store: set(path, value)
        Store->>Store: resolve future[path]
    end
    AsyncJsonNode->>Store: __await__ -> future for path
    Store-->>AsyncJsonNode: future resolves with parsed value

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇
Chunks of JSON tumble down the stream,
Futures wake and stitch the data seam,
Parsers hop through trees and nested lanes,
AsyncNodes wait — the values come like rains,
A rabbit applauds: small hops, bright dreams.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check ⚠️ Warning: The title claims 'replacing iterative logic with a recursive implementation', but the changeset adds new async streaming JSON parsing classes (AsyncParser, AsyncIterableFileLike, AsyncJsonNode, Store) with no evidence of iterative-to-recursive refactoring of existing code. Resolution: revise the title to accurately reflect the main change, e.g. 'Add asynchronous streaming JSON parser with AsyncParser and related classes.'
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 4.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (1 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (2)
src/jsontap/parser.py (1)

90-111: Consider moving demo streaming code to docs/tests.

Keeping example() in the runtime module mixes library and demo concerns; moving it out keeps the module surface cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/parser.py` around lines 90 - 111, Remove the demo streaming
example from the runtime module by extracting the example() function (including
nested simulate_stream and main) out of src/jsontap/parser.py; instead create a
separate docs or tests file that imports AsyncParser and demonstrates usage by
calling AsyncParser(simulate_stream(...)), await p.parse_value(()), and
inspecting p._result—leave the parser implementation (AsyncParser, parse_value)
untouched and ensure any helpers like simulate_stream are relocated to the new
demo/test file.
src/jsontap/json_node.py (1)

5-10: Prefer encapsulated fields for path/store to keep node API consistent and safer.

Using public mutable attributes here makes accidental reassignment easy and diverges from the method-style path access used in src/jsontap/core.py.

💡 Proposed refactor
 class AsyncJsonNode:
     def __init__(self, path: tuple[str | int, ...], store: Store):
-        self.path = path
-        self.store = store
+        self._path = path
+        self._store = store
+
+    `@property`
+    def path(self) -> tuple[str | int, ...]:
+        return self._path
 
     def __getitem__(self, key: str | int) -> "AsyncJsonNode":
-        return AsyncJsonNode(path=(*self.path, key), store=self.store)
+        return AsyncJsonNode(path=(*self._path, key), store=self._store)
 
     def __await__(self):
-        return self.store.getdefault(self.path).__await__()
+        return self._store.getdefault(self._path).__await__()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/json_node.py` around lines 5 - 10, Switch the public mutable
attributes to encapsulated private fields and expose read-only properties:
rename self.path and self.store to self._path and self._store in the __init__ of
the JsonNode class, add `@property` methods path(self) -> tuple[...] and
store(self) -> Store that return the internal values, and update __getitem__ to
reference self._path/self._store (or use the new properties) when constructing
the returned AsyncJsonNode (keep AsyncJsonNode(path=(*self.path, key),
store=self.store) semantics but source the values from the private
fields/properties). This preserves the external API while preventing accidental
reassignment of path/store.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/jsontap/parser.py`:
- Around line 32-76: The parser uses recursive calls in parse_value,
parse_object, and parse_array which can hit Python recursion limits for deeply
nested JSON; add an explicit depth guard by threading a depth counter (e.g., a
parameter like depth: int defaulting to 0 or an instance attribute like
self._depth) into parse_value, parse_object, and parse_array, incrementing it on
each recursive descent and checking against a configurable max (e.g.,
self._max_depth) before recursing; when the max is exceeded raise a clear
exception (or return a controlled error) instead of recursing further; update
all recursive call sites (parse_value -> parse_object/parse_array, parse_object
calling parse_value, parse_array calling parse_object/parse_array) to pass the
incremented depth and initialize/configure self._max_depth in the parser
constructor.
- Around line 11-18: The async read method currently returns whole chunks from
self._iter which can exceed the requested n; change it to maintain an internal
byte buffer (e.g., self._buffer: bytes) and implement logic in read(self, n=-1)
to: if n==0 return b""; if n<0 drain the iterator appending each chunk (encoding
str to utf-8) to the buffer and return all bytes; otherwise ensure the buffer
has at least n bytes by repeatedly awaiting anext(self._iter) and appending
encoded chunks until buffer length >= n or iterator is exhausted, then return
buffer[:n] and keep the remainder in self._buffer; on StopAsyncIteration return
whatever is left in the buffer or b""; update any initialization to create
self._buffer = b"" and use read, self._iter, and StopAsyncIteration handling as
described.

In `@src/jsontap/store.py`:
- Around line 14-15: The current Store.set calls
self.getdefault(path).set_result(value) which raises asyncio.InvalidStateError
if the future is already resolved; modify Store.set to fetch the future via f =
self.getdefault(path) and either check f.done() before calling
f.set_result(value) or wrap set_result in a try/except that catches
asyncio.InvalidStateError (or concurrent.futures.InvalidStateError) and silently
ignore or optionally log the duplicate write; this change touches Store.set and
references getdefault and set_result so you can locate and update that logic
accordingly.

---

Nitpick comments:
In `@src/jsontap/json_node.py`:
- Around line 5-10: Switch the public mutable attributes to encapsulated private
fields and expose read-only properties: rename self.path and self.store to
self._path and self._store in the __init__ of the JsonNode class, add `@property`
methods path(self) -> tuple[...] and store(self) -> Store that return the
internal values, and update __getitem__ to reference self._path/self._store (or
use the new properties) when constructing the returned AsyncJsonNode (keep
AsyncJsonNode(path=(*self.path, key), store=self.store) semantics but source the
values from the private fields/properties). This preserves the external API
while preventing accidental reassignment of path/store.

In `@src/jsontap/parser.py`:
- Around line 90-111: Remove the demo streaming example from the runtime module
by extracting the example() function (including nested simulate_stream and main)
out of src/jsontap/parser.py; instead create a separate docs or tests file that
imports AsyncParser and demonstrates usage by calling
AsyncParser(simulate_stream(...)), await p.parse_value(()), and inspecting
p._result—leave the parser implementation (AsyncParser, parse_value) untouched
and ensure any helpers like simulate_stream are relocated to the new demo/test
file.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae4b4ac and dc8852f.

📒 Files selected for processing (3)
  • src/jsontap/json_node.py
  • src/jsontap/parser.py
  • src/jsontap/store.py

Comment on lines +11 to +18
    async def read(self, n=-1):
        if n == 0:
            return b""
        try:
            chunk = await anext(self._iter)
            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
        except StopAsyncIteration:
            return b""

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "*.py" | head -20

Repository: fhalde/jsontap

Length of output: 221


🏁 Script executed:

cat -n src/jsontap/parser.py | head -50

Repository: fhalde/jsontap

Length of output: 1833


🏁 Script executed:

rg "ijson" --type py

Repository: fhalde/jsontap

Length of output: 489


🌐 Web query:

ijson parse_async documentation async file-like read(size) contract

💡 Result:

ijson.parse_async() (and the other *_async entry points) expect an async file-like object whose read method is awaitable and follows the same basic semantics as a normal binary file’s read(size):

Required contract for the async source

  • Must have read(size) where read is an async function (await source.read(...)). The library detects “async file” specifically by checking that read is awaitable. [2]
  • read(size) must accept the size argument ijson passes (default buf_size=64*1024), and it will be called repeatedly as await read(buf_size). [3]
  • read(0) must be supported: ijson calls await source.read(0) once to determine whether the stream yields bytes or str. [3]
  • Return type must be either:
    • bytes (preferred), or
    • str (ijson will UTF-8 encode it internally, via an async UTF-8 wrapper, and emits a warning). [1] [3]
  • EOF must be signaled by returning an empty/falsy chunk (b'' or ''), same as normal file objects. (ijson stops when the returned chunk is falsy). [3]

Where this is documented/implemented

  • README states the (sync) “file-like object” requirement: a read(size) method returning bytes (preferably) or str, and that the same APIs work with async file-like objects. [1]
  • The async implementation in utils35.py shows the concrete behavior: probing with read(0), then looping on await read(buf_size) and feeding chunks into the parser pipeline until EOF. [3]

Sources: GitHub README and source for ijson. [1] [3] plus async-file detection helper in common.py. [2]


Honor the async read(size) contract in the ijson adapter.

read(n) currently returns a single chunk as-is, which can exceed n and violates the file-like read(size) semantics. This breaks ijson's buffering expectations with ijson.parse_async(). Implement buffering to cap reads to n bytes:

💡 Proposed fix (buffer and cap reads to n)
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()
 
     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            # Keep streaming behavior but normalize type.
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                return b""
+            return chunk.encode("utf-8") if isinstance(chunk, str) else bytes(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/parser.py` around lines 11 - 18, The async read method currently
returns whole chunks from self._iter which can exceed the requested n; change it
to maintain an internal byte buffer (e.g., self._buffer: bytes) and implement
logic in read(self, n=-1) to: if n==0 return b""; if n<0 drain the iterator
appending each chunk (encoding str to utf-8) to the buffer and return all bytes;
otherwise ensure the buffer has at least n bytes by repeatedly awaiting
anext(self._iter) and appending encoded chunks until buffer length >= n or
iterator is exhausted, then return buffer[:n] and keep the remainder in
self._buffer; on StopAsyncIteration return whatever is left in the buffer or
b""; update any initialization to create self._buffer = b"" and use read,
self._iter, and StopAsyncIteration handling as described.

Comment on lines +32 to +76
    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
        _, event, value = await self._next_event()
        # recurse into object handling
        if event == "start_map":
            node = await self.parse_object(prefix)
            self._result[prefix] = node
            return node
        # recurse into array handling
        elif event == "start_array":
            node = await self.parse_array(prefix)
            self._result[prefix] = node
            return node
        # primitives
        else:
            self._result[prefix] = value
            return value

    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
        obj = {}
        while True:
            _, event, value = await self._next_event()
            # go back to parse_value for handling this key's value
            if event == "map_key":
                obj[value] = await self.parse_value((*prefix, value))
            # done with this object, return the result
            elif event == "end_map":
                break

        self._result[prefix] = obj
        return obj

    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
        arr = []
        index = 0
        while True:
            _, event, value = await self._next_event()

            # recurse into object handling
            if event == "start_map":
                arr.append(await self.parse_object((*prefix, index)))
                index += 1
            # recurse into array handling
            elif event == "start_array":
                arr.append(await self.parse_array((*prefix, index)))
                index += 1

⚠️ Potential issue | 🟠 Major

Guard recursive descent depth to avoid RecursionError on deeply nested JSON.

This parser now relies on recursive calls across object/array/value parsing. Deeply nested payloads can crash at runtime near Python’s recursion limit.

💡 Proposed fix (explicit depth guard)
 class AsyncParser:
-    def __init__(self, stream):
+    def __init__(self, stream, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
+        self._max_depth = max_depth
@@
-    async def parse(self):
-        return await self.parse_value(())
+    async def parse(self):
+        return await self.parse_value((), 0)
 
-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-                obj[value] = await self.parse_value((*prefix, value))
+                obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-                arr.append(await self.parse_object((*prefix, index)))
+                arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-                arr.append(await self.parse_array((*prefix, index)))
+                arr.append(await self.parse_array((*prefix, index), depth + 1))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/parser.py` around lines 32 - 76, The parser uses recursive calls
in parse_value, parse_object, and parse_array which can hit Python recursion
limits for deeply nested JSON; add an explicit depth guard by threading a depth
counter (e.g., a parameter like depth: int defaulting to 0 or an instance
attribute like self._depth) into parse_value, parse_object, and parse_array,
incrementing it on each recursive descent and checking against a configurable
max (e.g., self._max_depth) before recursing; when the max is exceeded raise a
clear exception (or return a controlled error) instead of recursing further;
update all recursive call sites (parse_value -> parse_object/parse_array,
parse_object calling parse_value, parse_array calling parse_object/parse_array)
to pass the incremented depth and initialize/configure self._max_depth in the
parser constructor.

@coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 19-20: Remove the unused and conflicting dependency "attr>=0.3.2"
from pyproject.toml: keep only "attrs>=25.4.0", then regenerate your
lock/install (e.g., poetry lock && poetry install) and run tests; verify imports
like "from attrs import define, field" in src/jsontap/store.py still resolve to
the attrs package and there are no remaining references to the legacy "attr"
package.

In `@src/jsontap/main.py`:
- Around line 32-37: The background parse task created with
asyncio.create_task(parser.parse_value(())) is fire-and-forget and can swallow
exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.

In `@src/jsontap/store.py`:
- Around line 27-29: The raise uses self.nodes[path].value but NodeState has no
value attribute, causing an AttributeError; update the error message to
reference an existing representation — e.g., use getattr(self.nodes[path],
"value", repr(self.nodes[path])) or replace .value with the actual NodeState
attribute name (NodeState) used to hold the stored value (for example .state or
.data) so the ValueError shows the current node contents; adjust the raise in
the method that performs the duplicate-write check to use that safe/reference
attribute instead.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a536473 and e2089d5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • pyproject.toml
  • src/jsontap/frontend.py
  • src/jsontap/main.py
  • src/jsontap/parser.py
  • src/jsontap/store.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/jsontap/parser.py

Comment on lines +19 to +20
"attr>=0.3.2",
"attrs>=25.4.0",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether legacy attr namespace is used anywhere in the repo
echo "=== Searching for attr imports and usage ==="
rg -n -C2 --type=py '^\s*(import\s+attr\b|from\s+attr\s+import\b|\battr\.)'

echo ""
echo "=== Checking if attr is used in any context (broader search) ==="
rg -n --type=py '\battr\b' | grep -v 'attrs' | head -30

Repository: fhalde/jsontap

Length of output: 168


🏁 Script executed:

#!/bin/bash
echo "=== Reading pyproject.toml ==="
cat -n pyproject.toml | head -50

echo ""
echo "=== Checking if attr package info in comments or requirements ==="
rg -n 'attr' pyproject.toml

Repository: fhalde/jsontap

Length of output: 1296


🏁 Script executed:

#!/bin/bash
echo "=== Searching for attrs usage in Python code ==="
rg -n --type=py 'attrs' | head -20

echo ""
echo "=== Searching for imports of attrs or attr-related classes ==="
rg -n --type=py '(import attrs|from attrs|@attrs|@attr)' | head -20

echo ""
echo "=== Looking for any Python files to understand the project ==="
find . -name '*.py' -type f | head -10

Repository: fhalde/jsontap

Length of output: 529


🏁 Script executed:

#!/bin/bash
echo "=== Checking if there are any lock files or requirements ==="
find . -name 'requirements*.txt' -o -name 'uv.lock' -o -name 'poetry.lock' -o -name 'Pipfile*' 2>/dev/null

echo ""
echo "=== Checking the actual store.py file to confirm attrs usage ==="
cat -n src/jsontap/store.py | head -20

echo ""
echo "=== Searching for attr in all files (not just Python) ==="
rg -i 'attr' --type-list py txt toml | head -30

Repository: fhalde/jsontap

Length of output: 1255


🏁 Script executed:

#!/bin/bash
echo "=== Checking uv.lock for attr and attrs packages ==="
rg 'name = "(attr|attrs)"' uv.lock -A 5 | head -40

Repository: fhalde/jsontap

Length of output: 1752


Remove attr>=0.3.2 from dependencies—it is unused and conflicts with attrs.

The codebase only imports and uses attrs (specifically from attrs import define, field in src/jsontap/store.py). The attr package is never referenced anywhere in the project. Since attr is the legacy, unmaintained package (last updated July 2022) and attrs is the modern successor, keeping both creates unnecessary namespace collision risk and increases the supply-chain surface area without providing any value.

🔧 Proposed fix
 dependencies = [
-    "attr>=0.3.2",
     "attrs>=25.4.0",
     "ijson>=3.5.0",
 ]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"attr>=0.3.2",
"attrs>=25.4.0",
dependencies = [
"attrs>=25.4.0",
"ijson>=3.5.0",
]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 19 - 20, Remove the unused and conflicting
dependency "attr>=0.3.2" from pyproject.toml: keep only "attrs>=25.4.0", then
regenerate your lock/install (e.g., poetry lock && poetry install) and run
tests; verify imports like "from attrs import define, field" in
src/jsontap/store.py still resolve to the attrs package and there are no
remaining references to the legacy "attr" package.

Comment on lines +32 to +37
        asyncio.create_task(parser.parse_value(()))
        return parser.parse()

    stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
    parser = jsontap(stream)
    print(await parser["name"])

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/jsontap/main.py | head -50

Repository: fhalde/jsontap

Length of output: 1387


🏁 Script executed:

cat -n src/jsontap/parser.py | head -100

Repository: fhalde/jsontap

Length of output: 3952


🏁 Script executed:

rg -n "class AsyncJsonNode" -A 20 --type=py

Repository: fhalde/jsontap

Length of output: 1916


🏁 Script executed:

rg -n "class Store" -A 30 --type=py

Repository: fhalde/jsontap

Length of output: 1039


🏁 Script executed:

rg -n 'asyncio\.create_task\(' --type=py -A 2

Repository: fhalde/jsontap

Length of output: 356


🏁 Script executed:

rg -n "_task" -B 5 -A 5 --type=py src/jsontap/core.py

Repository: fhalde/jsontap

Length of output: 389


🏁 Script executed:

cat -n src/jsontap/store.py

Repository: fhalde/jsontap

Length of output: 1171


🏁 Script executed:

cat -n src/jsontap/parser.py | tail -20

Repository: fhalde/jsontap

Length of output: 889


🏁 Script executed:

rg -n "try|except|raise" -A 2 src/jsontap/parser.py

Repository: fhalde/jsontap

Length of output: 269


🏁 Script executed:

rg -n "set_exception|exception" --type=py src/jsontap/

Repository: fhalde/jsontap

Length of output: 1464


🏁 Script executed:

find . -name "*.py" -type f -exec grep -l "jsontap\|AsyncParser" {} \; | head -10

Repository: fhalde/jsontap

Length of output: 193


🏁 Script executed:

cat -n tests/test_jsontap.py | head -100

Repository: fhalde/jsontap

Length of output: 3603


🏁 Script executed:

cat -n examples/run.py

Repository: fhalde/jsontap

Length of output: 5906


Retain the parser task and handle exceptions; fire-and-forget parsing loses errors.

Line 32 creates a background task without retaining it. If parse_value() raises an exception, it will be silently lost. If an error occurs before the expected path is resolved, line 37 will block indefinitely with no error indication.

Suggested fix
async def main():
    def jsontap(s):
        store = Store()
        parser = AsyncParser(s, store)
-       asyncio.create_task(parser.parse_value(()))
-       return parser.parse()
+       parse_task = asyncio.create_task(parser.parse_value(()))
+       return parser.parse(), parse_task

    stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
-   parser = jsontap(stream)
-   print(await parser["name"])
+   parser, parse_task = jsontap(stream)
+   try:
+       print(await parser["name"])
+       await parse_task
+   finally:
+       if not parse_task.done():
+           parse_task.cancel()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
asyncio.create_task(parser.parse_value(()))
return parser.parse()
stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
parser = jsontap(stream)
print(await parser["name"])
parse_task = asyncio.create_task(parser.parse_value(()))
return parser.parse(), parse_task
stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
parser, parse_task = jsontap(stream)
try:
print(await parser["name"])
await parse_task
finally:
if not parse_task.done():
parse_task.cancel()
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 32-32: Store a reference to the return value of asyncio.create_task

(RUF006)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 32 - 37, The background parse task created
with asyncio.create_task(parser.parse_value(())) is fire-and-forget and can
swallow exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.
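The pattern this prompt describes, keeping the task handle and observing its exception via a done callback, can be sketched generically. spawn_observed and on_error are illustrative names for this sketch, not part of jsontap's API.

```python
import asyncio

# Generic helper: create a task, retain it, and observe its outcome so
# exceptions are never silently dropped. Illustrative, not jsontap code.
def spawn_observed(coro, on_error=None):
    task = asyncio.get_running_loop().create_task(coro)

    def _observe(t: asyncio.Task):
        if t.cancelled():
            return
        exc = t.exception()  # retrieving it marks the exception as observed
        if exc is not None and on_error is not None:
            on_error(exc)

    task.add_done_callback(_observe)
    return task
```

The caller still holds the returned task, so it can be awaited or cancelled later; on_error gives a hook for logging or re-raising.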

@coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/jsontap/main.py (1)

32-37: ⚠️ Potential issue | 🟠 Major

Retain and observe the parse task; current fire-and-forget can hide failures.

Line 32 drops the create_task handle, so parse exceptions may be unobserved, and Line 37 can wait indefinitely if "name" is never produced due to upstream parse failure.

Suggested fix
-import asyncio
+import asyncio
+import contextlib
@@
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["name"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["name"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
+            with contextlib.suppress(asyncio.CancelledError):
+                await parse_task
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 32 - 37, The code currently fires
parser.parse_value via asyncio.create_task without keeping the Task, causing
exceptions to be unobserved and leaving parser["name"] awaiting forever on
upstream failures; change the fire-and-forget into an observed task by assigning
asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task),
start the parser (ensure parser.parse() is invoked appropriately), and either
await parse_task after you consume parser["name"] or add a done-callback that
logs/propagates exceptions so parse errors are not swallowed; reference the
functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion
and the await of parser["name"] when updating the code.
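The fix above can be exercised in isolation. A minimal sketch, independent of the PR's `jsontap`/`AsyncParser` (the failing coroutine here is a stand-in, not the real `parse_value`), showing how a retained task handle plus a done callback surfaces background failures:

```python
import asyncio

async def failing_parse():
    # Stand-in for parser.parse_value(()); raises to show propagation.
    raise RuntimeError("parse failed")

def demo() -> list[str]:
    events: list[str] = []

    async def main():
        task = asyncio.create_task(failing_parse())

        def observe(t: asyncio.Task) -> None:
            # Reading t.exception() marks the error as observed, so the
            # event loop never logs "Task exception was never retrieved".
            if not t.cancelled() and t.exception() is not None:
                events.append(f"observed: {t.exception()}")

        task.add_done_callback(observe)
        try:
            await task  # re-raises the task's exception in the caller
        except RuntimeError as exc:
            events.append(f"caller saw: {exc}")

    asyncio.run(main())
    return events
```

Both the callback and the awaiting caller see the same exception; with the fire-and-forget form, only the event loop's "never retrieved" warning would hint at the failure.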
🧹 Nitpick comments (1)
src/jsontap/main.py (1)

23-24: Avoid unconditional stdout logging in stream iteration.

Line 23 prints raw chunks every iteration. Consider using a debug logger or a verbose flag to avoid noisy output and accidental payload exposure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 23 - 24, Replace the unconditional
print(repr(chunk), flush=True) in the stream loop with a conditional or
debug-level logger; specifically, remove the direct stdout call and either (a)
wrap it behind a verbose flag (e.g., if verbose: print(...)) or (b) use a module
logger (logger = logging.getLogger(__name__) and logger.debug(repr(chunk))).
Keep the await asyncio.sleep(0.1) unchanged; ensure any logging does not
accidentally expose payloads in non-debug modes and add a CLI/config option to
enable verbose/debug output if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/jsontap/main.py`:
- Around line 10-13: The __init__ of the iterator stores chunk_size without
validation which allows chunk_size <= 0 and causes __anext__ to never advance
(or regress); update the Jsontap iterator __init__ to validate that chunk_size
is an int > 0 and raise a ValueError (or TypeError for non-int) if not, ensuring
self._chunk_size is always a positive integer so the advancement logic in
__anext__ uses a safe, terminating step size.

---

Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 32-37: The code currently fires parser.parse_value via
asyncio.create_task without keeping the Task, causing exceptions to be
unobserved and leaving parser["name"] awaiting forever on upstream failures;
change the fire-and-forget into an observed task by assigning
asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task),
start the parser (ensure parser.parse() is invoked appropriately), and either
await parse_task after you consume parser["name"] or add a done-callback that
logs/propagates exceptions so parse errors are not swallowed; reference the
functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion
and the await of parser["name"] when updating the code.

---

Nitpick comments:
In `@src/jsontap/main.py`:
- Around line 23-24: Replace the unconditional print(repr(chunk), flush=True) in
the stream loop with a conditional or debug-level logger; specifically, remove
the direct stdout call and either (a) wrap it behind a verbose flag (e.g., if
verbose: print(...)) or (b) use a module logger (logger =
logging.getLogger(__name__) and logger.debug(repr(chunk))). Keep the await
asyncio.sleep(0.1) unchanged; ensure any logging does not accidentally expose
payloads in non-debug modes and add a CLI/config option to enable verbose/debug
output if needed.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2089d5 and ef565ad.

📒 Files selected for processing (1)
  • src/jsontap/main.py

Comment on lines +10 to +13
def __init__(self, data: str, chunk_size: int = 1):
self._data = data
self._chunk_size = chunk_size
self._i = 0

⚠️ Potential issue | 🟡 Minor

Validate chunk_size to prevent non-terminating iteration.

If chunk_size <= 0, Line 22 does not advance correctly, and __anext__ can loop forever (or regress for negatives).

Suggested fix
     def __init__(self, data: str, chunk_size: int = 1):
+        if chunk_size <= 0:
+            raise ValueError("chunk_size must be > 0")
         self._data = data
         self._chunk_size = chunk_size
         self._i = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 10 - 13, The __init__ of the iterator
stores chunk_size without validation which allows chunk_size <= 0 and causes
__anext__ to never advance (or regress); update the Jsontap iterator __init__ to
validate that chunk_size is an int > 0 and raise a ValueError (or TypeError for
non-int) if not, ensuring self._chunk_size is always a positive integer so the
advancement logic in __anext__ uses a safe, terminating step size.

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (4)
src/jsontap/parser.py (2)

36-83: ⚠️ Potential issue | 🟠 Major

Add an explicit recursion depth guard for nested JSON.

Line [36]-Line [83] recursively descends without a max-depth check, so deeply nested payloads can raise RecursionError and terminate parsing unexpectedly.

💡 Proposed fix
 class AsyncParser:
-    def __init__(self, stream, store: Store):
+    def __init__(self, stream, store: Store, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
         self._store = store
+        self._max_depth = max_depth
@@
-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int = 0) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-                obj[value] = await self.parse_value((*prefix, value))
+                obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-                arr.append(await self.parse_object((*prefix, index)))
+                arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-                arr.append(await self.parse_array((*prefix, index)))
+                arr.append(await self.parse_array((*prefix, index), depth + 1))
#!/bin/bash
# Verify recursive entry points and absence/presence of a depth guard.
rg -nP 'def parse_value|def parse_object|def parse_array|max_depth|RecursionError' src/jsontap/parser.py -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/parser.py` around lines 36 - 83, The parser currently recurses
indefinitely in parse_value, parse_object, and parse_array and needs a max-depth
guard to prevent RecursionError; add a configurable depth limit (e.g.
self._max_depth) and a current depth counter (pass an incremented depth arg or
use an internal stack counter) and check it before any recursive descent in
parse_value, parse_object, and parse_array, raising a clear exception (or
returning an explicit error) when depth > self._max_depth; ensure the top-level
entrypoint initializes the depth (or uses 0) so all recursive calls (references:
parse_value, parse_object, parse_array) increment/check the counter
consistently.
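The pattern is easier to see on a plain recursive walker. This toy version (illustrative names, not the PR's `AsyncParser`) threads a depth counter through every descent, as the prompt above describes:

```python
def walk(value, depth: int = 0, max_depth: int = 100):
    # Check before descending so the guard fires deterministically,
    # well before Python's own recursion limit would.
    if depth > max_depth:
        raise ValueError("maximum JSON nesting depth exceeded")
    if isinstance(value, dict):
        return {k: walk(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [walk(v, depth + 1, max_depth) for v in value]
    return value
```

Reasonable documents pass through unchanged; a pathologically nested payload raises a clear `ValueError` instead of a `RecursionError` deep inside the parse.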

14-21: ⚠️ Potential issue | 🟠 Major

Honor ijson read(size) semantics in AsyncIterableFileLike.read.

Line [14]-Line [21] return one upstream chunk as-is, so read(n) can exceed n and breaks file-like buffering expectations for ijson.parse_async.

💡 Proposed fix
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()

     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            data = bytearray(self._buffer)
+            self._buffer.clear()
+            while True:
+                try:
+                    chunk = await anext(self._iter)
+                except StopAsyncIteration:
+                    return bytes(data)
+                if isinstance(chunk, str):
+                    chunk = chunk.encode("utf-8")
+                data.extend(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/parser.py` around lines 14 - 21, The AsyncIterableFileLike.read
method currently returns a single upstream chunk which can exceed the requested
size; change read to honor ijson/typical file-like semantics: read(0) must
return b"", read(n>0) must return up to n bytes (never more) by accumulating
upstream chunks from self._iter into an internal bytes buffer until at least n
bytes are available or the iterator is exhausted, slicing off and returning
exactly n bytes (or fewer at EOF), converting str chunks via .encode("utf-8") as
you already do; preserve existing behavior for n < 0 (treat as unlimited read by
returning all remaining data) and ensure EOF still returns b"" — add an internal
buffer attribute (e.g., self._buffer) and use anext(self._iter) to refill it as
needed in AsyncIterableFileLike.read.
src/jsontap/main.py (2)

11-14: ⚠️ Potential issue | 🟡 Minor

Validate chunk_size in FakeChatCompletion.__init__.

Line [11]-Line [14] allow chunk_size <= 0, which can make iteration non-terminating or regress index movement.

💡 Proposed fix
     def __init__(self, data: str, chunk_size: int = 1):
+        if not isinstance(chunk_size, int):
+            raise TypeError("chunk_size must be an int")
+        if chunk_size <= 0:
+            raise ValueError("chunk_size must be > 0")
         self._data = data
         self._chunk_size = chunk_size
         self._i = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 11 - 14, FakeChatCompletion.__init__
currently accepts chunk_size <= 0 which can break iteration; add validation at
the start of FakeChatCompletion.__init__ to raise a ValueError (or TypeError if
non-int) when chunk_size is not an int > 0, e.g. check isinstance(chunk_size,
int) and chunk_size > 0, and keep existing assignments to self._chunk_size,
self._data, and self._i only after validation so iteration logic using
_chunk_size cannot enter a non-terminating state.

33-34: ⚠️ Potential issue | 🟠 Major

Don’t discard the parse task; observe completion and exceptions.

Line [33]-Line [34] fire-and-forget asyncio.create_task(...). Parser failures can be lost, and path futures may never resolve.

💡 Proposed fix
 async def main():
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["user"]["name"])
-    print(await parser["user"]["age"])
-    print(await parser["user"]["scores"][1])
-    print(await parser["user"]["children"][0]["name"])
-    print(await parser["user"]["children"][1]["age"])
-    print(await parser["user"]["address"]["state"])
-    print(await parser["user"]["address"]["city"])
-    print(await parser["user"]["address"]["zip"])
-    print(await parser["user"]["address"])
-    print(await parser["user"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["user"]["name"])
+        print(await parser["user"]["age"])
+        print(await parser["user"]["scores"][1])
+        print(await parser["user"]["children"][0]["name"])
+        print(await parser["user"]["children"][1]["age"])
+        print(await parser["user"]["address"]["state"])
+        print(await parser["user"]["address"]["city"])
+        print(await parser["user"]["address"]["zip"])
+        print(await parser["user"]["address"])
+        print(await parser["user"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
#!/bin/bash
# Verify fire-and-forget task creation sites.
rg -nP 'asyncio\.create_task\(' src/jsontap/main.py -C2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/jsontap/main.py` around lines 33 - 34, The code currently
fire-and-forgets asyncio.create_task(parser.parse_value(())) so parse failures
can be lost and path futures may never resolve; change this to capture the Task
(e.g., task = asyncio.create_task(parser.parse_value(()))) and either await that
task before returning from the function that calls parser.parse(), or attach a
done callback that logs and re-raises exceptions so they propagate to
parser.parse(); update callers around parser.parse_value and parser.parse to
ensure the created task is observed and exceptions handled (refer to
parser.parse_value and parser.parse).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 11-14: FakeChatCompletion.__init__ currently accepts chunk_size <=
0 which can break iteration; add validation at the start of
FakeChatCompletion.__init__ to raise a ValueError (or TypeError if non-int) when
chunk_size is not an int > 0, e.g. check isinstance(chunk_size, int) and
chunk_size > 0, and keep existing assignments to self._chunk_size, self._data,
and self._i only after validation so iteration logic using _chunk_size cannot
enter a non-terminating state.
- Around line 33-34: The code currently fire-and-forgets
asyncio.create_task(parser.parse_value(())) so parse failures can be lost and
path futures may never resolve; change this to capture the Task (e.g., task =
asyncio.create_task(parser.parse_value(()))) and either await that task before
returning from the function that calls parser.parse(), or attach a done callback
that logs and re-raises exceptions so they propagate to parser.parse(); update
callers around parser.parse_value and parser.parse to ensure the created task is
observed and exceptions handled (refer to parser.parse_value and parser.parse).

In `@src/jsontap/parser.py`:
- Around line 36-83: The parser currently recurses indefinitely in parse_value,
parse_object, and parse_array and needs a max-depth guard to prevent
RecursionError; add a configurable depth limit (e.g. self._max_depth) and a
current depth counter (pass an incremented depth arg or use an internal stack
counter) and check it before any recursive descent in parse_value, parse_object,
and parse_array, raising a clear exception (or returning an explicit error) when
depth > self._max_depth; ensure the top-level entrypoint initializes the depth
(or uses 0) so all recursive calls (references: parse_value, parse_object,
parse_array) increment/check the counter consistently.
- Around line 14-21: The AsyncIterableFileLike.read method currently returns a
single upstream chunk which can exceed the requested size; change read to honor
ijson/typical file-like semantics: read(0) must return b"", read(n>0) must
return up to n bytes (never more) by accumulating upstream chunks from
self._iter into an internal bytes buffer until at least n bytes are available or
the iterator is exhausted, slicing off and returning exactly n bytes (or fewer
at EOF), converting str chunks via .encode("utf-8") as you already do; preserve
existing behavior for n < 0 (treat as unlimited read by returning all remaining
data) and ensure EOF still returns b"" — add an internal buffer attribute (e.g.,
self._buffer) and use anext(self._iter) to refill it as needed in
AsyncIterableFileLike.read.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef565ad and 4a83870.

📒 Files selected for processing (3)
  • .gitignore
  • src/jsontap/main.py
  • src/jsontap/parser.py
✅ Files skipped from review due to trivial changes (1)
  • .gitignore
