
feat: Docs, benchmarking and optimisation #6

Draft
daconjurer wants to merge 7 commits into main from chore/document-and-benchmark

Conversation

@daconjurer
Owner

@daconjurer daconjurer commented Mar 1, 2026

Changes in this PR add doc comments, a benchmark function (added to the Python CLI) and a few optimisations:

  • changing the type of the input member of Tokenizer here
  • using slices of that input instead of String instances when consuming numbers and strings in tokenize()
  • moving the format! macro call into the err_on_missing_expected_comma() function here, so that the String is not eagerly allocated through Debug
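The last optimisation above can be sketched as follows (a minimal, self-contained sketch; the error enum and exact signatures are illustrative, not the crate's actual API). The point is that format! only runs when an error value is actually constructed, so the success path never allocates a String:

```rust
// Illustrative error type; the real crate's error enum likely differs.
#[derive(Debug, PartialEq)]
enum JsonError {
    MissingExpectedComma(String),
}

fn err_on_missing_expected_comma(found: char) -> JsonError {
    // format! executes only inside the error path, never on the happy path
    JsonError::MissingExpectedComma(format!("expected ',', found '{found}'"))
}

fn expect_comma(c: char) -> Result<(), JsonError> {
    if c == ',' {
        Ok(())
    } else {
        Err(err_on_missing_expected_comma(c))
    }
}

fn main() {
    assert!(expect_comma(',').is_ok());
    assert!(matches!(
        expect_comma('}'),
        Err(JsonError::MissingExpectedComma(_))
    ));
    println!("ok");
}
```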

The parser (through the Python bindings) is still slower than the Python implementations, so I added the pure-Rust time to the benchmark output, just for fun and as a reference for how much overhead the Python bindings add.

@daconjurer daconjurer self-assigned this Mar 1, 2026
@daconjurer daconjurer requested a review from jhodapp March 2, 2026 22:35
@daconjurer
Owner Author

The output of the benchmark flag looks a bit like this:

[screenshot of the benchmark output]

@jhodapp jhodapp left a comment
Collaborator

Looking really, really good Victor! The documentation in particular is thorough, helpful, and professional.

You can definitely be proud of this complete Rust JSON parser with a really slick Python CLI wrapper on top.

I'm going to intuit that you are feeling pretty well prepared now to mix Rust with Python for future endeavors, and possibly even spearhead some Rust-only projects. You seem ready from what I can tell, and I look forward to hearing your thoughts on this.

Look at this performance on my MacBook Air M1:

  ┌─────────────┬──────────┬─────────────┬─────────────────┬─────────────────┬────────────┐
  │    File     │   Size   │ Rust (pure) │ Rust + bindings │ Python json (C) │ simplejson │
  ├─────────────┼──────────┼─────────────┼─────────────────┼─────────────────┼────────────┤
  │ small.json  │ 85 B     │ 1.2 µs      │ 1.6 µs          │ 1.2 µs          │ 9.2 µs     │
  ├─────────────┼──────────┼─────────────┼─────────────────┼─────────────────┼────────────┤
  │ medium.json │ 16.6 KB  │ 126 µs      │ 161 µs          │ 90 µs           │ 1.1 ms     │
  ├─────────────┼──────────┼─────────────┼─────────────────┼─────────────────┼────────────┤
  │ large.json  │ 178.4 KB │ 735 µs      │ 937 µs          │ 782 µs          │ 7.8 ms     │
  └─────────────┴──────────┴─────────────┴─────────────────┴─────────────────┴────────────┘

What's Working Well

  1. Extensive documentation beyond the minimum: You documented every public item, not just the required ones, with working examples and Python-style docstrings on all PyO3 functions.

  2. Sophisticated benchmarking design: You independently added a 4th "pure Rust" benchmark column to isolate binding overhead, and built in warmup rounds, auto-scaling, and with_capacity for the timing vectors.

  3. Thoughtful performance optimizations in the tokenizer: The refactor to &'input str with byte-level scanning, the fast-path/slow-path split in consume_string(), and slice-based consumption in consume_number()/consume_keyword() are well-executed.

  4. Polished CLI experience: Using argparse with --rounds, --warmup, directory-based benchmarking, auto-discovery, and human-readable file sizes goes well beyond the curriculum's sys.argv suggestion.

Areas for Improvement

  1. Clippy warning: redundant field name in Tokenizer
  2. Use of unwrap() in the median() function

Comment on lines +160 to +161
let result = parse_file(path)?;
result.into_pyobject(py)
Collaborator

You could improve this even further into a one-liner that I think doesn't make it any less understandable: parse_file(path)?.into_pyobject(py)
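A standalone sketch of the pattern (plain Result types standing in for PyResult; the names parse_len and wrapped are illustrative): `?` propagates the error first, and the success value is converted in the same expression.

```rust
// Hypothetical stand-in for parse_file: returns a Result just like PyResult would.
fn parse_len(path: &str) -> Result<usize, String> {
    if path.is_empty() {
        Err("empty path".into())
    } else {
        Ok(path.len())
    }
}

fn wrapped(path: &str) -> Result<String, String> {
    // One-liner equivalent of:
    //   let result = parse_len(path)?;
    //   Ok(result.to_string())
    Ok(parse_len(path)?.to_string())
}

fn main() {
    assert_eq!(wrapped("abc"), Ok("3".to_string()));
    assert!(wrapped("").is_err());
    println!("ok");
}
```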

) -> PyResult<Bound<'py, PyDict>> {
let n = rounds as usize;

// --- Rust (with no bindings) ---
Collaborator

Not major, but this method benchmark_performance() could read even better if you split the 4-5 sections of it up into separate smaller methods. You wouldn't even need the comments then if you name your methods effectively enough.
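The suggested structure might look something like this (an illustrative sketch only; the method names are invented and the timing bodies are placeholders, not the PR's benchmark code):

```rust
struct Bench {
    rounds: usize,
}

impl Bench {
    // Each former comment-delimited section becomes a small named method.
    fn run_pure_rust(&self) -> Vec<f64> {
        // Placeholder timings; the real method would time the pure-Rust parser
        (0..self.rounds).map(|i| i as f64).collect()
    }

    fn run_with_bindings(&self) -> Vec<f64> {
        // Placeholder timings; the real method would go through the bindings
        (0..self.rounds).map(|i| i as f64 * 1.3).collect()
    }

    fn benchmark_performance(&self) -> (Vec<f64>, Vec<f64>) {
        // No section comments needed: the method names carry the meaning
        (self.run_pure_rust(), self.run_with_bindings())
    }
}

fn main() {
    let b = Bench { rounds: 3 };
    let (pure, bound) = b.benchmark_performance();
    assert_eq!(pure.len(), 3);
    assert_eq!(bound.len(), 3);
    println!("ok");
}
```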

Some(b)
}

fn _input_slice_to_string(&self, start: usize, end: usize) -> String {
Collaborator

You don't need to prefix this method name with a '_'. In Rust, a leading underscore conventionally marks an intentionally unused variable or function (it silences unused warnings), so a helper that is actually called should use a plain name.
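A quick illustration of the convention (the helper here is a simplified stand-in for the PR's method, taking the input explicitly rather than via &self):

```rust
// A called helper gets a plain name; no leading underscore.
fn input_slice_to_string(input: &str, start: usize, end: usize) -> String {
    input[start..end].to_string()
}

fn main() {
    // The leading underscore is for values you deliberately don't use:
    // it suppresses the unused-variable warning, nothing more.
    let _unused = 42;

    assert_eq!(input_slice_to_string("hello", 1, 4), "ell");
    println!("ok");
}
```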

'"' => {
self.advance(); // consume closing quote
return Ok(consumed_string);
fn consume_string_slow(&mut self, s: &mut String) -> JsonResult<String> {
Collaborator

Is this named consume_string_slow because it's a pretty unoptimized method?
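For context, a common shape for this kind of split (a sketch under assumptions, not the PR's actual implementation): the fast path scans bytes and copies the span in one shot when no escape appears, and only the slow path decodes escapes character by character into an owned buffer.

```rust
// Sketch of a fast-path/slow-path string scanner. `input` is the text
// after the opening quote; names and escape coverage are illustrative.
fn consume_string(input: &str) -> Option<String> {
    // Fast path: look for a plain closing quote with no escapes in between.
    for (i, &b) in input.as_bytes().iter().enumerate() {
        match b {
            b'"' => return Some(input[..i].to_string()),
            b'\\' => return consume_string_slow(input),
            _ => {}
        }
    }
    None // unterminated string
}

// Slow path: decode escape sequences into an owned buffer.
fn consume_string_slow(input: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Some(out),
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                '"' => out.push('"'),
                '\\' => out.push('\\'),
                other => out.push(other),
            },
            other => out.push(other),
        }
    }
    None
}

fn main() {
    assert_eq!(consume_string("abc\"rest"), Some("abc".to_string()));
    assert_eq!(consume_string("a\\nb\"rest"), Some("a\nb".to_string()));
    assert_eq!(consume_string("unterminated"), None);
    println!("ok");
}
```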

Comment on lines +201 to +208
times.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
if times.len() % 2 == 1 {
times[mid]
} else {
let left = *times[..mid]
.iter()
.max_by(|a, b| a.partial_cmp(b).unwrap())
.unwrap();
Collaborator

Can you figure out a way of getting rid of these unwrap() uses? Hint for you: are there error cases to handle or surface, or would a default value suffice, or mapping to something else?
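One way to drop the unwrap() calls (a sketch, not the only answer to the hint): f64::total_cmp gives a total ordering over floats, including NaN, so no partial_cmp/unwrap is needed, and an empty input is surfaced as None instead of a panic.

```rust
// Median of an f64 slice without unwrap(): total_cmp replaces
// partial_cmp, and the empty case returns None.
fn median(times: &mut [f64]) -> Option<f64> {
    if times.is_empty() {
        return None;
    }
    let mid = times.len() / 2;
    times.select_nth_unstable_by(mid, |a, b| a.total_cmp(b));
    if times.len() % 2 == 1 {
        Some(times[mid])
    } else {
        // Even length: average the two middle elements.
        let left = times[..mid].iter().copied().max_by(|a, b| a.total_cmp(b))?;
        Some((left + times[mid]) / 2.0)
    }
}

fn main() {
    assert_eq!(median(&mut [3.0, 1.0, 2.0]), Some(2.0));
    assert_eq!(median(&mut [4.0, 1.0, 2.0, 3.0]), Some(2.5));
    let mut empty: [f64; 0] = [];
    assert_eq!(median(&mut empty), None);
    println!("ok");
}
```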

