Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Adds asynchronous streaming JSON parsing: an async iterator adapter and AsyncParser using ijson.parse_async, a Future-backed in-memory Store with path-keyed NodeState, an awaitable AsyncJsonNode frontend, dependency additions, and a demo runner simulating chunked streams.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Stream
    participant AsyncIterableFileLike
    participant AsyncParser
    participant ijson
    participant Store
    participant AsyncJsonNode
    Stream->>AsyncIterableFileLike: yield chunk
    AsyncIterableFileLike->>AsyncParser: read(n) -> bytes
    AsyncParser->>ijson: parse_async(stream)
    ijson->>AsyncParser: emit events (start_map, map_key, value, end_map, ...)
    loop handle events
        AsyncParser->>AsyncParser: parse_value / parse_object / parse_array
        AsyncParser->>Store: set(path, value)
        Store->>Store: resolve future[path]
    end
    AsyncJsonNode->>Store: __await__ -> future for path
    Store-->>AsyncJsonNode: future resolves with parsed value
```
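The await-by-path flow in the diagram can be sketched end to end. This is a minimal stand-in: `AsyncJsonNode` follows the shape shown later in this review, while this `Store` is reduced to a plain dict of futures (the real one keeps path-keyed `NodeState` objects), and `fake_parser` is a hypothetical substitute for the real event-driven parser.

```python
import asyncio


class Store:
    def __init__(self):
        self._futures: dict[tuple, asyncio.Future] = {}

    def getdefault(self, path):
        # Readers may await a path before the parser has produced it,
        # so the future is created lazily on first access.
        if path not in self._futures:
            self._futures[path] = asyncio.get_running_loop().create_future()
        return self._futures[path]

    def set(self, path, value):
        f = self.getdefault(path)
        if not f.done():
            f.set_result(value)


class AsyncJsonNode:
    def __init__(self, path, store):
        self.path = path
        self.store = store

    def __getitem__(self, key):
        # Indexing only extends the path; no data is read yet.
        return AsyncJsonNode((*self.path, key), self.store)

    def __await__(self):
        return self.store.getdefault(self.path).__await__()


async def demo():
    store = Store()
    root = AsyncJsonNode((), store)

    async def fake_parser():
        await asyncio.sleep(0)         # pretend chunks arrive later
        store.set(("name",), "Alice")  # resolving the future wakes awaiters

    task = asyncio.create_task(fake_parser())
    value = await root["name"]         # suspends until the path resolves
    await task
    return value


print(asyncio.run(demo()))
```

Note how `await root["name"]` works even though the value is produced after the await starts; the future acts as the rendezvous point between parser and consumer.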
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/jsontap/parser.py (1)

90-111: Consider moving demo streaming code to docs/tests.

Keeping `example()` in the runtime module mixes library and demo concerns; moving it out keeps the module surface cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 90 - 111, Remove the demo streaming example from the runtime module by extracting the example() function (including nested simulate_stream and main) out of src/jsontap/parser.py; instead create a separate docs or tests file that imports AsyncParser and demonstrates usage by calling AsyncParser(simulate_stream(...)), await p.parse_value(()), and inspecting p._result—leave the parser implementation (AsyncParser, parse_value) untouched and ensure any helpers like simulate_stream are relocated to the new demo/test file.

src/jsontap/json_node.py (1)

5-10: Prefer encapsulated fields for path/store to keep node API consistent and safer.

Using public mutable attributes here makes accidental reassignment easy and diverges from the method-style path access used in `src/jsontap/core.py`.

💡 Proposed refactor

```diff
 class AsyncJsonNode:
     def __init__(self, path: tuple[str | int, ...], store: Store):
-        self.path = path
-        self.store = store
+        self._path = path
+        self._store = store
+
+    @property
+    def path(self) -> tuple[str | int, ...]:
+        return self._path

     def __getitem__(self, key: str | int) -> "AsyncJsonNode":
-        return AsyncJsonNode(path=(*self.path, key), store=self.store)
+        return AsyncJsonNode(path=(*self._path, key), store=self._store)

     def __await__(self):
-        return self.store.getdefault(self.path).__await__()
+        return self._store.getdefault(self._path).__await__()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/json_node.py` around lines 5 - 10, Switch the public mutable attributes to encapsulated private fields and expose read-only properties: rename self.path and self.store to self._path and self._store in the __init__ of the JsonNode class, add `@property` methods path(self) -> tuple[...] and store(self) -> Store that return the internal values, and update __getitem__ to reference self._path/self._store (or use the new properties) when constructing the returned AsyncJsonNode (keep AsyncJsonNode(path=(*self.path, key), store=self.store) semantics but source the values from the private fields/properties). This preserves the external API while preventing accidental reassignment of path/store.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/jsontap/parser.py`:
- Around line 32-76: The parser uses recursive calls in parse_value,
parse_object, and parse_array which can hit Python recursion limits for deeply
nested JSON; add an explicit depth guard by threading a depth counter (e.g., a
parameter like depth: int defaulting to 0 or an instance attribute like
self._depth) into parse_value, parse_object, and parse_array, incrementing it on
each recursive descent and checking against a configurable max (e.g.,
self._max_depth) before recursing; when the max is exceeded raise a clear
exception (or return a controlled error) instead of recursing further; update
all recursive call sites (parse_value -> parse_object/parse_array, parse_object
calling parse_value, parse_array calling parse_object/parse_array) to pass the
incremented depth and initialize/configure self._max_depth in the parser
constructor.
- Around line 11-18: The async read method currently returns whole chunks from
self._iter which can exceed the requested n; change it to maintain an internal
byte buffer (e.g., self._buffer: bytes) and implement logic in read(self, n=-1)
to: if n==0 return b""; if n<0 drain the iterator appending each chunk (encoding
str to utf-8) to the buffer and return all bytes; otherwise ensure the buffer
has at least n bytes by repeatedly awaiting anext(self._iter) and appending
encoded chunks until buffer length >= n or iterator is exhausted, then return
buffer[:n] and keep the remainder in self._buffer; on StopAsyncIteration return
whatever is left in the buffer or b""; update any initialization to create
self._buffer = b"" and use read, self._iter, and StopAsyncIteration handling as
described.
In `@src/jsontap/store.py`:
- Around line 14-15: The current Store.set calls
self.getdefault(path).set_result(value) which raises asyncio.InvalidStateError
if the future is already resolved; modify Store.set to fetch the future via f =
self.getdefault(path) and either check f.done() before calling
f.set_result(value) or wrap set_result in a try/except that catches
asyncio.InvalidStateError (or concurrent.futures.InvalidStateError) and silently
ignore or optionally log the duplicate write; this change touches Store.set and
references getdefault and set_result so you can locate and update that logic
accordingly.
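The guarded `Store.set` the prompt above describes can be sketched as follows. This `Store` is a minimal stand-in built only from the behavior named in the review (path-keyed futures, `getdefault`, `set`); the real class's fields and signatures may differ.

```python
import asyncio


class Store:
    """Minimal Future-backed store; only the pieces named in the review."""

    def __init__(self):
        self._futures: dict[tuple, asyncio.Future] = {}

    def getdefault(self, path: tuple) -> asyncio.Future:
        if path not in self._futures:
            self._futures[path] = asyncio.get_running_loop().create_future()
        return self._futures[path]

    def set(self, path: tuple, value) -> None:
        f = self.getdefault(path)
        # Without this guard, a second set() on the same path would raise
        # asyncio.InvalidStateError from set_result().
        if not f.done():
            f.set_result(value)


async def demo():
    store = Store()
    store.set(("name",), "Alice")
    store.set(("name",), "Bob")  # duplicate write: silently ignored
    return await store.getdefault(("name",))


result = asyncio.run(demo())
print(result)
```

A try/except around `set_result` catching `asyncio.InvalidStateError` is the equivalent alternative the prompt mentions; the `done()` check simply avoids raising the exception in the first place.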
---
Nitpick comments:
In `@src/jsontap/json_node.py`:
- Around line 5-10: Switch the public mutable attributes to encapsulated private
fields and expose read-only properties: rename self.path and self.store to
self._path and self._store in the __init__ of the JsonNode class, add `@property`
methods path(self) -> tuple[...] and store(self) -> Store that return the
internal values, and update __getitem__ to reference self._path/self._store (or
use the new properties) when constructing the returned AsyncJsonNode (keep
AsyncJsonNode(path=(*self.path, key), store=self.store) semantics but source the
values from the private fields/properties). This preserves the external API
while preventing accidental reassignment of path/store.
In `@src/jsontap/parser.py`:
- Around line 90-111: Remove the demo streaming example from the runtime module
by extracting the example() function (including nested simulate_stream and main)
out of src/jsontap/parser.py; instead create a separate docs or tests file that
imports AsyncParser and demonstrates usage by calling
AsyncParser(simulate_stream(...)), await p.parse_value(()), and inspecting
p._result—leave the parser implementation (AsyncParser, parse_value) untouched
and ensure any helpers like simulate_stream are relocated to the new demo/test
file.
```python
async def read(self, n=-1):
    if n == 0:
        return b""
    try:
        chunk = await anext(self._iter)
        return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
    except StopAsyncIteration:
        return b""
```
🌐 Web query:
ijson parse_async documentation async file-like read(size) contract
💡 Result:
ijson.parse_async() (and the other *_async entry points) expect an async file-like object whose read method is awaitable and follows the same basic semantics as a normal binary file's read(size).

Required contract for the async source:

- Must have `read(size)` where `read` is an async function (`await source.read(...)`). The library detects an "async file" specifically by checking that `read` is awaitable. [2]
- `read(size)` must accept the `size` argument ijson passes (default `buf_size=64*1024`), and it will be called repeatedly as `await read(buf_size)`. [3]
- `read(0)` must be supported: ijson calls `await source.read(0)` once to determine whether the stream yields `bytes` or `str`. [3]
- Return type must be either `bytes` (preferred) or `str` (ijson will UTF-8 encode it internally, via an async UTF-8 wrapper, and emit a warning). [1] [3]
- EOF must be signaled by returning an empty/falsy chunk (`b''` or `''`), same as normal file objects; ijson stops when the returned chunk is falsy. [3]

Where this is documented/implemented:

- The README states the (sync) file-like object requirement: a `read(size)` method returning `bytes` (preferably) or `str`, and that the same APIs work with async file-like objects. [1]
- The async implementation in `utils35.py` shows the concrete behavior: probing with `read(0)`, then looping on `await read(buf_size)` and feeding chunks into the parser pipeline until EOF. [3]

Sources: GitHub README and source for ijson [1] [3], plus the async-file detection helper in common.py. [2]
Honor the async read(size) contract in the ijson adapter.

`read(n)` currently returns a single chunk as-is, which can exceed `n` and violate the file-like `read(size)` semantics. This breaks ijson's buffering expectations with `ijson.parse_async()`. Implement buffering to cap reads to `n` bytes:
💡 Proposed fix (buffer and cap reads to n)

```diff
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()

     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            # Keep streaming behavior but normalize type.
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                return b""
+            return chunk.encode("utf-8") if isinstance(chunk, str) else bytes(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/parser.py` around lines 11 - 18, The async read method currently
returns whole chunks from self._iter which can exceed the requested n; change it
to maintain an internal byte buffer (e.g., self._buffer: bytes) and implement
logic in read(self, n=-1) to: if n==0 return b""; if n<0 drain the iterator
appending each chunk (encoding str to utf-8) to the buffer and return all bytes;
otherwise ensure the buffer has at least n bytes by repeatedly awaiting
anext(self._iter) and appending encoded chunks until buffer length >= n or
iterator is exhausted, then return buffer[:n] and keep the remainder in
self._buffer; on StopAsyncIteration return whatever is left in the buffer or
b""; update any initialization to create self._buffer = b"" and use read,
self._iter, and StopAsyncIteration handling as described.
```python
async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
    _, event, value = await self._next_event()
    # recurse into object handling
    if event == "start_map":
        node = await self.parse_object(prefix)
        self._result[prefix] = node
        return node
    # recurse into array handling
    elif event == "start_array":
        node = await self.parse_array(prefix)
        self._result[prefix] = node
        return node
    # primitives
    else:
        self._result[prefix] = value
        return value


async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
    obj = {}
    while True:
        _, event, value = await self._next_event()
        # go back to parse_value for handling this key's value
        if event == "map_key":
            obj[value] = await self.parse_value((*prefix, value))
        # done with this object, return the result
        elif event == "end_map":
            break

    self._result[prefix] = obj
    return obj


async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
    arr = []
    index = 0
    while True:
        _, event, value = await self._next_event()

        # recurse into object handling
        if event == "start_map":
            arr.append(await self.parse_object((*prefix, index)))
            index += 1
        # recurse into array handling
        elif event == "start_array":
            arr.append(await self.parse_array((*prefix, index)))
            index += 1
```
Guard recursive descent depth to avoid RecursionError on deeply nested JSON.
This parser now relies on recursive calls across object/array/value parsing. Deeply nested payloads can crash at runtime near Python’s recursion limit.
💡 Proposed fix (explicit depth guard)

```diff
 class AsyncParser:
-    def __init__(self, stream):
+    def __init__(self, stream, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
+        self._max_depth = max_depth
@@
-    async def parse(self):
-        return await self.parse_value(())
+    async def parse(self):
+        return await self.parse_value((), 0)

-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-            obj[value] = await self.parse_value((*prefix, value))
+            obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-            arr.append(await self.parse_object((*prefix, index)))
+            arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-            arr.append(await self.parse_array((*prefix, index)))
+            arr.append(await self.parse_array((*prefix, index), depth + 1))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/parser.py` around lines 32 - 76, The parser uses recursive calls
in parse_value, parse_object, and parse_array which can hit Python recursion
limits for deeply nested JSON; add an explicit depth guard by threading a depth
counter (e.g., a parameter like depth: int defaulting to 0 or an instance
attribute like self._depth) into parse_value, parse_object, and parse_array,
incrementing it on each recursive descent and checking against a configurable
max (e.g., self._max_depth) before recursing; when the max is exceeded raise a
clear exception (or return a controlled error) instead of recursing further;
update all recursive call sites (parse_value -> parse_object/parse_array,
parse_object calling parse_value, parse_array calling parse_object/parse_array)
to pass the incremented depth and initialize/configure self._max_depth in the
parser constructor.
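The depth-threading pattern the prompt describes, reduced to a runnable sketch over plain nested Python structures instead of ijson events (`DepthLimitError` and `max_depth` are illustrative names, not the repo's API):

```python
class DepthLimitError(ValueError):
    """Raised when nesting exceeds the configured maximum."""


def parse_nested(value, depth=0, max_depth=5):
    # Thread the depth counter through every recursive descent, exactly
    # as suggested for parse_value/parse_object/parse_array.
    if depth > max_depth:
        raise DepthLimitError("maximum nesting depth exceeded")
    if isinstance(value, dict):
        return {k: parse_nested(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [parse_nested(v, depth + 1, max_depth) for v in value]
    return value


shallow = parse_nested({"a": [1, {"b": 2}]})
print(shallow)

deep = [[[[[[[0]]]]]]]  # 7 levels of nesting, beyond max_depth=5
try:
    parse_nested(deep)
except DepthLimitError as e:
    print("rejected:", e)
```

The guard turns a hard `RecursionError` crash deep in the interpreter into a controlled, catchable exception raised near the top of the descent.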
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 19-20: Remove the unused and conflicting dependency "attr>=0.3.2"
from pyproject.toml: keep only "attrs>=25.4.0", then regenerate your
lock/install (e.g., poetry lock && poetry install) and run tests; verify imports
like "from attrs import define, field" in src/jsontap/store.py still resolve to
the attrs package and there are no remaining references to the legacy "attr"
package.
In `@src/jsontap/main.py`:
- Around line 32-37: The background parse task created with
asyncio.create_task(parser.parse_value(())) is fire-and-forget and can swallow
exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.
In `@src/jsontap/store.py`:
- Around line 27-29: The raise uses self.nodes[path].value but NodeState has no
value attribute, causing an AttributeError; update the error message to
reference an existing representation — e.g., use getattr(self.nodes[path],
"value", repr(self.nodes[path])) or replace .value with the actual NodeState
attribute name (NodeState) used to hold the stored value (for example .state or
.data) so the ValueError shows the current node contents; adjust the raise in
the method that performs the duplicate-write check to use that safe/reference
attribute instead.
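The safe-attribute error message the prompt suggests can be sketched like this. The `NodeState` field name `data` is hypothetical (the review's point is precisely that the real attribute is not `.value`); what matters is the `getattr` fallback to `repr()`.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    # Hypothetical field name; the review notes the real class has no
    # .value, which is why the unguarded f-string raised AttributeError.
    data: object = None


def check_duplicate_write(nodes: dict, path: tuple) -> None:
    # getattr with a repr() fallback never raises, whatever the field is named.
    current = getattr(nodes[path], "value", repr(nodes[path]))
    raise ValueError(f"duplicate write at {path}: node already holds {current}")


nodes = {("a",): NodeState(data=1)}
try:
    check_duplicate_write(nodes, ("a",))
except ValueError as e:
    print(e)
```

This keeps the duplicate-write check informative without coupling the error message to a field that may not exist.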
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `uv.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (5)

- `pyproject.toml`
- `src/jsontap/frontend.py`
- `src/jsontap/main.py`
- `src/jsontap/parser.py`
- `src/jsontap/store.py`
🚧 Files skipped from review as they are similar to previous changes (1)
- src/jsontap/parser.py
```toml
    "attr>=0.3.2",
    "attrs>=25.4.0",
```
Remove `attr>=0.3.2` from dependencies; it is unused and conflicts with `attrs`.

The codebase only imports and uses `attrs` (specifically `from attrs import define, field` in src/jsontap/store.py). The `attr` package is never referenced anywhere in the project. Since `attr` is the legacy, unmaintained package (last updated July 2022) and `attrs` is the modern successor, keeping both creates unnecessary namespace collision risk and increases the supply-chain surface area without providing any value.
🔧 Proposed fix

```diff
 dependencies = [
-    "attr>=0.3.2",
     "attrs>=25.4.0",
     "ijson>=3.5.0",
 ]
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pyproject.toml` around lines 19 - 20, Remove the unused and conflicting
dependency "attr>=0.3.2" from pyproject.toml: keep only "attrs>=25.4.0", then
regenerate your lock/install (e.g., poetry lock && poetry install) and run
tests; verify imports like "from attrs import define, field" in
src/jsontap/store.py still resolve to the attrs package and there are no
remaining references to the legacy "attr" package.
src/jsontap/main.py (outdated)
```python
        asyncio.create_task(parser.parse_value(()))
        return parser.parse()

    stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
    parser = jsontap(stream)
    print(await parser["name"])
```
Retain the parser task and handle exceptions; fire-and-forget parsing loses errors.
Line 32 creates a background task without retaining it. If parse_value() raises an exception, it will be silently lost. If an error occurs before the expected path is resolved, line 37 will block indefinitely with no error indication.
Suggested fix

```diff
 async def main():
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task

     stream = SimulatedStream('{"name": "Alice"}', chunk_size=3)
-    parser = jsontap(stream)
-    print(await parser["name"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["name"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
```
🧰 Tools
🪛 Ruff (0.15.2)
[warning] 32-32: Store a reference to the return value of asyncio.create_task
(RUF006)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/main.py` around lines 32 - 37, The background parse task created
with asyncio.create_task(parser.parse_value(())) is fire-and-forget and can
swallow exceptions; change it to keep a reference (e.g., parse_task =
asyncio.create_task(parser.parse_value(()))) and ensure its exceptions are
observed before awaiting parser["name"] — either await parse_task (or await
asyncio.shield/asyncio.gather with return_exceptions=False) or add a done
callback to log/raise exceptions; update the jsontap usage around
SimulatedStream and parser.parse() so the parse task is never discarded and any
errors from parser.parse_value are propagated to the caller.
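The done-callback variant the prompt mentions can be sketched generically. Here `failing_parse` stands in for `parser.parse_value`, and the `errors` list stands in for real logging or re-raising; both names are illustrative.

```python
import asyncio

errors: list[BaseException] = []


def observe(task: asyncio.Task) -> None:
    # Done callback: calling task.exception() retrieves the exception,
    # preventing the "Task exception was never retrieved" warning and
    # letting us log or propagate it.
    if not task.cancelled() and task.exception() is not None:
        errors.append(task.exception())


async def failing_parse():
    raise RuntimeError("parse failed")


async def main():
    parse_task = asyncio.create_task(failing_parse())  # reference is retained
    parse_task.add_done_callback(observe)
    # Two yields: the first lets the task run (and fail), the second lets
    # its done callback run.
    await asyncio.sleep(0)
    await asyncio.sleep(0)
    return list(errors)


print(asyncio.run(main()))
```

Awaiting the task directly (`await parse_task`) is the simpler alternative when the caller can block on completion; the callback form suits genuinely backgrounded work.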
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/jsontap/main.py (1)
32-37: ⚠️ Potential issue | 🟠 Major: Retain and observe the parse task; current fire-and-forget can hide failures.

Line 32 drops the `create_task` handle, so parse exceptions may be unobserved, and Line 37 can wait indefinitely if `"name"` is never produced due to upstream parse failure.

Suggested fix

```diff
-import asyncio
+import asyncio
+import contextlib
@@
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["name"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["name"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
+            with contextlib.suppress(asyncio.CancelledError):
+                await parse_task
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 32 - 37, The code currently fires parser.parse_value via asyncio.create_task without keeping the Task, causing exceptions to be unobserved and leaving parser["name"] awaiting forever on upstream failures; change the fire-and-forget into an observed task by assigning asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task), start the parser (ensure parser.parse() is invoked appropriately), and either await parse_task after you consume parser["name"] or add a done-callback that logs/propagates exceptions so parse errors are not swallowed; reference the functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion and the await of parser["name"] when updating the code.
🧹 Nitpick comments (1)

src/jsontap/main.py (1)

23-24: Avoid unconditional stdout logging in stream iteration.

Line 23 prints raw chunks every iteration. Consider using a debug logger or a `verbose` flag to avoid noisy output and accidental payload exposure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 23 - 24, Replace the unconditional print(repr(chunk), flush=True) in the stream loop with a conditional or debug-level logger; specifically, remove the direct stdout call and either (a) wrap it behind a verbose flag (e.g., if verbose: print(...)) or (b) use a module logger (logger = logging.getLogger(__name__) and logger.debug(repr(chunk))). Keep the await asyncio.sleep(0.1) unchanged; ensure any logging does not accidentally expose payloads in non-debug modes and add a CLI/config option to enable verbose/debug output if needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/jsontap/main.py`:
- Around line 10-13: The __init__ of the iterator stores chunk_size without
validation which allows chunk_size <= 0 and causes __anext__ to never advance
(or regress); update the Jsontap iterator __init__ to validate that chunk_size
is an int > 0 and raise a ValueError (or TypeError for non-int) if not, ensuring
self._chunk_size is always a positive integer so the advancement logic in
__anext__ uses a safe, terminating step size.
---
Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 32-37: The code currently fires parser.parse_value via
asyncio.create_task without keeping the Task, causing exceptions to be
unobserved and leaving parser["name"] awaiting forever on upstream failures;
change the fire-and-forget into an observed task by assigning
asyncio.create_task(parser.parse_value(())) to a variable (e.g., parse_task),
start the parser (ensure parser.parse() is invoked appropriately), and either
await parse_task after you consume parser["name"] or add a done-callback that
logs/propagates exceptions so parse errors are not swallowed; reference the
functions/classes parser.parse_value, parser.parse, jsontap, FakeChatCompletion
and the await of parser["name"] when updating the code.
---
Nitpick comments:
In `@src/jsontap/main.py`:
- Around line 23-24: Replace the unconditional print(repr(chunk), flush=True) in
the stream loop with a conditional or debug-level logger; specifically, remove
the direct stdout call and either (a) wrap it behind a verbose flag (e.g., if
verbose: print(...)) or (b) use a module logger (logger =
logging.getLogger(__name__) and logger.debug(repr(chunk))). Keep the await
asyncio.sleep(0.1) unchanged; ensure any logging does not accidentally expose
payloads in non-debug modes and add a CLI/config option to enable verbose/debug
output if needed.
```python
def __init__(self, data: str, chunk_size: int = 1):
    self._data = data
    self._chunk_size = chunk_size
    self._i = 0
```
Validate chunk_size to prevent non-terminating iteration.
If chunk_size <= 0, Line 22 does not advance correctly, and __anext__ can loop forever (or regress for negatives).
Suggested fix

```diff
 def __init__(self, data: str, chunk_size: int = 1):
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be > 0")
     self._data = data
     self._chunk_size = chunk_size
     self._i = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/jsontap/main.py` around lines 10-13: the `__init__` of the iterator
stores `chunk_size` without validation, which allows `chunk_size <= 0` and causes
`__anext__` to never advance (or regress); update the iterator's `__init__` to
validate that `chunk_size` is an int > 0 and raise a ValueError (or TypeError for
non-int) if not, ensuring `self._chunk_size` is always a positive integer so the
advancement logic in `__anext__` uses a safe, terminating step size.
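To make the failure mode concrete, here is a standalone sketch of the same chunked-iteration pattern with the validation in place. The `ChunkIterator` class is hypothetical, not the code under review; it only mirrors the advancement logic the comment describes:

```python
import asyncio

class ChunkIterator:
    """Minimal chunked async iterator with a validated chunk_size."""

    def __init__(self, data: str, chunk_size: int = 1):
        # Without this check, chunk_size=0 would make __anext__ loop forever.
        if not isinstance(chunk_size, int) or chunk_size <= 0:
            raise ValueError("chunk_size must be a positive int")
        self._data = data
        self._chunk_size = chunk_size
        self._i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._i >= len(self._data):
            raise StopAsyncIteration
        chunk = self._data[self._i : self._i + self._chunk_size]
        self._i += self._chunk_size
        return chunk

async def collect(it):
    return [c async for c in it]

chunks = asyncio.run(collect(ChunkIterator("abcdef", 2)))
print(chunks)  # ['ab', 'cd', 'ef']
```

With the guard, `ChunkIterator("abc", 0)` raises `ValueError` immediately instead of producing an iterator that never terminates.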
♻️ Duplicate comments (4)
src/jsontap/parser.py (2)
36-83: ⚠️ Potential issue | 🟠 Major
Add an explicit recursion depth guard for nested JSON.
Lines 36-83 recursively descend without a max-depth check, so deeply nested payloads can raise `RecursionError` and terminate parsing unexpectedly.

💡 Proposed fix
```diff
 class AsyncParser:
-    def __init__(self, stream, store: Store):
+    def __init__(self, stream, store: Store, max_depth: int = 900):
         self._events = ijson.parse_async(AsyncIterableFileLike(stream))
         self._result = {}
         self._store = store
+        self._max_depth = max_depth
@@
-    async def parse_value(self, prefix: tuple[str | int, ...]) -> Any:
+    async def parse_value(self, prefix: tuple[str | int, ...], depth: int = 0) -> Any:
+        if depth > self._max_depth:
+            raise ValueError("Maximum JSON nesting depth exceeded")
         _, event, value = await self._next_event()
@@
-            node = await self.parse_object(prefix)
+            node = await self.parse_object(prefix, depth + 1)
@@
-            node = await self.parse_array(prefix)
+            node = await self.parse_array(prefix, depth + 1)
@@
-    async def parse_object(self, prefix: tuple[str | int, ...]) -> dict[str | int, Any]:
+    async def parse_object(self, prefix: tuple[str | int, ...], depth: int) -> dict[str | int, Any]:
@@
-                obj[value] = await self.parse_value((*prefix, value))
+                obj[value] = await self.parse_value((*prefix, value), depth + 1)
@@
-    async def parse_array(self, prefix: tuple[str | int, ...]) -> list[Any]:
+    async def parse_array(self, prefix: tuple[str | int, ...], depth: int) -> list[Any]:
@@
-                arr.append(await self.parse_object((*prefix, index)))
+                arr.append(await self.parse_object((*prefix, index), depth + 1))
@@
-                arr.append(await self.parse_array((*prefix, index)))
+                arr.append(await self.parse_array((*prefix, index), depth + 1))
```

```shell
#!/bin/bash
# Verify recursive entry points and absence/presence of a depth guard.
rg -nP 'def parse_value|def parse_object|def parse_array|max_depth|RecursionError' src/jsontap/parser.py -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 36 - 83, The parser currently recurses indefinitely in parse_value, parse_object, and parse_array and needs a max-depth guard to prevent RecursionError; add a configurable depth limit (e.g. self._max_depth) and a current depth counter (pass an incremented depth arg or use an internal stack counter) and check it before any recursive descent in parse_value, parse_object, and parse_array, raising a clear exception (or returning an explicit error) when depth > self._max_depth; ensure the top-level entrypoint initializes the depth (or uses 0) so all recursive calls (references: parse_value, parse_object, parse_array) increment/check the counter consistently.
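The guard pattern can be exercised in isolation. The `walk` helper below is a hypothetical stand-in for the parser's `parse_value`/`parse_object`/`parse_array` recursion, not the reviewed code; it shows a depth counter threaded through each recursive call:

```python
def walk(value, depth=0, max_depth=5):
    """Recursive descent with an explicit depth guard (simplified
    stand-in for the parser's mutually recursive methods)."""
    if depth > max_depth:
        raise ValueError("Maximum nesting depth exceeded")
    if isinstance(value, dict):
        return {k: walk(v, depth + 1, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [walk(v, depth + 1, max_depth) for v in value]
    return value

print(walk({"a": [1, {"b": 2}]}))  # {'a': [1, {'b': 2}]}

# A payload nested deeper than max_depth fails fast with a clear error
# instead of an unexpected RecursionError:
deep = value = []
for _ in range(10):
    inner = []
    value.append(inner)
    value = inner
try:
    walk(deep, max_depth=5)
except ValueError as e:
    print(e)  # Maximum nesting depth exceeded
```

The same shape applies to the async methods: each recursive `await` passes `depth + 1`, and the check sits at the single shared entry point.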
14-21: ⚠️ Potential issue | 🟠 Major
Honor ijson `read(size)` semantics in `AsyncIterableFileLike.read`.

Lines 14-21 return one upstream chunk as-is, so `read(n)` can exceed `n` and break file-like buffering expectations for `ijson.parse_async`.

💡 Proposed fix
```diff
 class AsyncIterableFileLike:
     def __init__(self, stream):
         self._iter = stream.__aiter__()
+        self._buffer = bytearray()

     async def read(self, n=-1):
         if n == 0:
             return b""
-        try:
-            chunk = await anext(self._iter)
-            return chunk.encode("utf-8") if isinstance(chunk, str) else chunk
-        except StopAsyncIteration:
-            return b""
+        if n < 0:
+            data = bytearray(self._buffer)
+            self._buffer.clear()
+            while True:
+                try:
+                    chunk = await anext(self._iter)
+                except StopAsyncIteration:
+                    return bytes(data)
+                if isinstance(chunk, str):
+                    chunk = chunk.encode("utf-8")
+                data.extend(chunk)
+
+        while len(self._buffer) < n:
+            try:
+                chunk = await anext(self._iter)
+            except StopAsyncIteration:
+                break
+            if isinstance(chunk, str):
+                chunk = chunk.encode("utf-8")
+            self._buffer.extend(chunk)
+
+        out = bytes(self._buffer[:n])
+        del self._buffer[:n]
+        return out
```

For ijson 3.5.0, what is the required contract for async file-like `read(size)` in `parse_async`, including behavior for `read(0)`, size-limited reads, and EOF?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/parser.py` around lines 14 - 21, The AsyncIterableFileLike.read method currently returns a single upstream chunk which can exceed the requested size; change read to honor ijson/typical file-like semantics: read(0) must return b"", read(n>0) must return up to n bytes (never more) by accumulating upstream chunks from self._iter into an internal bytes buffer until at least n bytes are available or the iterator is exhausted, slicing off and returning exactly n bytes (or fewer at EOF), converting str chunks via .encode("utf-8") as you already do; preserve existing behavior for n < 0 (treat as unlimited read by returning all remaining data) and ensure EOF still returns b"" — add an internal buffer attribute (e.g., self._buffer) and use anext(self._iter) to refill it as needed in AsyncIterableFileLike.read.src/jsontap/main.py (2)
11-14: ⚠️ Potential issue | 🟡 Minor
Validate `chunk_size` in `FakeChatCompletion.__init__`.

Lines 11-14 allow `chunk_size <= 0`, which can make iteration non-terminating or regress index movement.

💡 Proposed fix
```diff
 def __init__(self, data: str, chunk_size: int = 1):
+    if not isinstance(chunk_size, int):
+        raise TypeError("chunk_size must be an int")
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be > 0")
     self._data = data
     self._chunk_size = chunk_size
     self._i = 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 11 - 14, FakeChatCompletion.__init__ currently accepts chunk_size <= 0 which can break iteration; add validation at the start of FakeChatCompletion.__init__ to raise a ValueError (or TypeError if non-int) when chunk_size is not an int > 0, e.g. check isinstance(chunk_size, int) and chunk_size > 0, and keep existing assignments to self._chunk_size, self._data, and self._i only after validation so iteration logic using _chunk_size cannot enter a non-terminating state.
33-34: ⚠️ Potential issue | 🟠 Major
Don’t discard the parse task; observe completion and exceptions.

Lines 33-34 fire-and-forget `asyncio.create_task(...)`. Parser failures can be lost, and path futures may never resolve.

💡 Proposed fix
```diff
 async def main():
     def jsontap(s):
         store = Store()
         parser = AsyncParser(s, store)
-        asyncio.create_task(parser.parse_value(()))
-        return parser.parse()
+        parse_task = asyncio.create_task(parser.parse_value(()))
+        return parser.parse(), parse_task
@@
-    parser = jsontap(stream)
-    print(await parser["user"]["name"])
-    print(await parser["user"]["age"])
-    print(await parser["user"]["scores"][1])
-    print(await parser["user"]["children"][0]["name"])
-    print(await parser["user"]["children"][1]["age"])
-    print(await parser["user"]["address"]["state"])
-    print(await parser["user"]["address"]["city"])
-    print(await parser["user"]["address"]["zip"])
-    print(await parser["user"]["address"])
-    print(await parser["user"])
+    parser, parse_task = jsontap(stream)
+    try:
+        print(await parser["user"]["name"])
+        print(await parser["user"]["age"])
+        print(await parser["user"]["scores"][1])
+        print(await parser["user"]["children"][0]["name"])
+        print(await parser["user"]["children"][1]["age"])
+        print(await parser["user"]["address"]["state"])
+        print(await parser["user"]["address"]["city"])
+        print(await parser["user"]["address"]["zip"])
+        print(await parser["user"]["address"])
+        print(await parser["user"])
+        await parse_task
+    finally:
+        if not parse_task.done():
+            parse_task.cancel()
```

```shell
#!/bin/bash
# Verify fire-and-forget task creation sites.
rg -nP 'asyncio\.create_task\(' src/jsontap/main.py -C2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/jsontap/main.py` around lines 33 - 34, The code currently fire-and-forgets asyncio.create_task(parser.parse_value(())) so parse failures can be lost and path futures may never resolve; change this to capture the Task (e.g., task = asyncio.create_task(parser.parse_value(()))) and either await that task before returning from the function that calls parser.parse(), or attach a done callback that logs and re-raises exceptions so they propagate to parser.parse(); update callers around parser.parse_value and parser.parse to ensure the created task is observed and exceptions handled (refer to parser.parse_value and parser.parse).
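The difference between a discarded and an observed task can be shown with a minimal stand-in. The `parse_stub` coroutine below is hypothetical (it only imitates a background parse that may fail); the point is that keeping a reference and awaiting the task lets the failure surface to the caller:

```python
import asyncio

async def parse_stub(fail: bool):
    """Stand-in for a background parse coroutine; may raise."""
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError("parse failed")
    return "done"

async def main():
    # Fire-and-forget (`asyncio.create_task(parse_stub(...))` with the
    # result discarded) would only surface the error via the event loop's
    # "Task exception was never retrieved" warning, while anything waiting
    # on the parser's futures hangs. Keeping the Task and awaiting it
    # (or attaching a done-callback) propagates the failure instead:
    task = asyncio.create_task(parse_stub(fail=True))
    try:
        await task
    except RuntimeError as e:
        return f"caught: {e}"

print(asyncio.run(main()))  # caught: parse failed
```

A done-callback variant (`task.add_done_callback(lambda t: t.exception())`) at least retrieves the exception so it can be logged, but awaiting the task is the simplest way to tie its lifetime to the consumer.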
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/jsontap/main.py`:
- Around line 11-14: FakeChatCompletion.__init__ currently accepts chunk_size <=
0 which can break iteration; add validation at the start of
FakeChatCompletion.__init__ to raise a ValueError (or TypeError if non-int) when
chunk_size is not an int > 0, e.g. check isinstance(chunk_size, int) and
chunk_size > 0, and keep existing assignments to self._chunk_size, self._data,
and self._i only after validation so iteration logic using _chunk_size cannot
enter a non-terminating state.
- Around line 33-34: The code currently fire-and-forgets
asyncio.create_task(parser.parse_value(())) so parse failures can be lost and
path futures may never resolve; change this to capture the Task (e.g., task =
asyncio.create_task(parser.parse_value(()))) and either await that task before
returning from the function that calls parser.parse(), or attach a done callback
that logs and re-raises exceptions so they propagate to parser.parse(); update
callers around parser.parse_value and parser.parse to ensure the created task is
observed and exceptions handled (refer to parser.parse_value and parser.parse).
In `@src/jsontap/parser.py`:
- Around line 36-83: The parser currently recurses indefinitely in parse_value,
parse_object, and parse_array and needs a max-depth guard to prevent
RecursionError; add a configurable depth limit (e.g. self._max_depth) and a
current depth counter (pass an incremented depth arg or use an internal stack
counter) and check it before any recursive descent in parse_value, parse_object,
and parse_array, raising a clear exception (or returning an explicit error) when
depth > self._max_depth; ensure the top-level entrypoint initializes the depth
(or uses 0) so all recursive calls (references: parse_value, parse_object,
parse_array) increment/check the counter consistently.
- Around line 14-21: The AsyncIterableFileLike.read method currently returns a
single upstream chunk which can exceed the requested size; change read to honor
ijson/typical file-like semantics: read(0) must return b"", read(n>0) must
return up to n bytes (never more) by accumulating upstream chunks from
self._iter into an internal bytes buffer until at least n bytes are available or
the iterator is exhausted, slicing off and returning exactly n bytes (or fewer
at EOF), converting str chunks via .encode("utf-8") as you already do; preserve
existing behavior for n < 0 (treat as unlimited read by returning all remaining
data) and ensure EOF still returns b"" — add an internal buffer attribute (e.g.,
self._buffer) and use anext(self._iter) to refill it as needed in
AsyncIterableFileLike.read.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .gitignore
- src/jsontap/main.py
- src/jsontap/parser.py
✅ Files skipped from review due to trivial changes (1)
- .gitignore
Summary by CodeRabbit
New Features
Chores