Merged
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -37,6 +37,9 @@ jobs:
- name: Test (release)
run: cargo test --release

- name: Test scalar-only (no AVX2 feature)
run: cargo test --release --no-default-features

- name: Test with test-panic feature
run: cargo test --features test-panic --release

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -9,6 +9,8 @@ name = "quickdecode"
crate-type = ["cdylib", "rlib"]

[features]
default = ["avx2"]
avx2 = []
test-panic = []

[dependencies]
5 changes: 3 additions & 2 deletions README.md
@@ -2,11 +2,11 @@

Rust-implemented fast JSON decoder exposed to LuaJIT via FFI. Optimized for the common case where a large JSON document is parsed once and only a small number of fields are extracted before the document is discarded.

Design document: `docs/superpowers/specs/2026-05-15-rust-quick-json-decode-design.md` (in progress).
Design document: `docs/superpowers/specs/2026-05-15-rust-quick-json-decode-design.md`.

## Status

Currently in design phase. No implementation yet.
Initial implementation complete: scalar + AVX2/PCLMUL structural scanner, root-path and cursor APIs, escape-decoded strings, integer/float/bool/typeof/len, FFI panic barrier, and a LuaJIT wrapper. Rust unit/integration tests and Lua busted tests run in CI. The benchmark harness compares against lua-cjson but tuning is pending — see `Roadmap / Deferred` below.

## Building

@@ -68,3 +68,4 @@ Items intentionally pushed out of the first implementation. Each will be picked
- **Skip-cache LRU eviction** — only if memory pressure on huge documents proves problematic in practice.
- **Path-position info on Phase 1 errors** — currently only an opaque `QJD_PARSE_ERROR`.
- **AVX2 tail-bypass optimization** — current implementation falls back to whole-buffer scalar when a tail exists; could be optimized by emitting tail structural offsets directly.
- **Large bench fixtures** — spec §9.3 lists `large_dump.json` (~20 MB) and `deep_nest.json` (depth stress test); not yet committed. Only `small_api.json` and `medium_resp.json` ship today.
27 changes: 15 additions & 12 deletions docs/superpowers/specs/2026-05-15-rust-quick-json-decode-design.md
@@ -62,16 +62,18 @@ It does so by performing a **single fast SIMD structural scan** in Phase 1 (only
src/
├── lib.rs — crate root, re-exports
├── ffi.rs — pub extern "C" symbols (C ABI layer)
├── doc.rs — Document & Cursor (internal Rust API)
├── doc.rs — Document type (Phase 1 + container helpers)
├── cursor.rs — Cursor, path resolution, skip-cache walk
├── path.rs — path string parse (zero-alloc iterator)
├── error.rs — error / type enums
├── scan/
│ ├── mod.rs — StructScanner trait, dispatch
│ ├── mod.rs — Scanner trait + runtime dispatch (OnceCell-cached)
│ ├── scalar.rs — scalar fallback
│ ├── avx2.rs — x86_64 AVX2 + PCLMUL
│ └── runtime_dispatch.rs
│ └── avx2.rs — x86_64 AVX2 + PCLMUL (gated by `avx2` feature)
├── decode/
│ ├── mod.rs
│ ├── number.rs — lazy i64/f64 parse
│ ├── string.rs — lazy escape decode + UTF-8 check on \u
│ └── path.rs — path string parse (zero-alloc iterator)
│ └── string.rs — lazy escape decode + UTF-8 check on \u
└── skip_cache.rs — Phase 2 sibling-skip cache

lua/
@@ -144,8 +146,8 @@ typedef struct {
const qjd_doc* doc;
uint32_t idx_start; /* opener position in doc.indices */
uint32_t idx_end; /* one past closer */
uint32_t cache_slot; /* skip-cache slot; 0 if not populated */
uint32_t _pad;
uint32_t _reserved0; /* reserved for future fast-path */
uint32_t _reserved1; /* reserved / padding */
} qjd_cursor; /* 24 bytes, by-value, no allocation */
```

@@ -337,11 +339,11 @@ pub(crate) struct Cursor<'d> {
/// idx_start points at '{' or '['; idx_end points one past matching '}' / ']'.
idx_start: u32,
idx_end: u32,
/// Skip-cache slot for this range (0 = not yet built).
cache_slot: u32,
}
```

The published `qjd_cursor` carries two `_reservedN` slots beyond `idx_start`/`idx_end`; they are unused in v1 but reserved so a future per-cursor skip-cache fast-path can be added without breaking the ABI.

`Cursor` is `Copy` and never allocates. `open()`, `field()`, `index()` return new cursors by value.
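
A minimal sketch of how the reserved words keep the by-value layout fixed, assuming a 64-bit target. `QjdDoc` and `QjdCursor` are stand-in names that mirror the published `qjd_doc` / `qjd_cursor`, and the two helper functions are hypothetical; none of this is taken from the crate itself.

```rust
use std::mem::{align_of, size_of};

/// Stand-in for the opaque `qjd_doc` handle (hypothetical name for this sketch).
#[repr(C)]
pub struct QjdDoc {
    _private: [u8; 0],
}

/// Mirror of the published `qjd_cursor` field order.
/// `#[repr(C)]` keeps the fields in declaration order with C padding rules.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct QjdCursor {
    pub doc: *const QjdDoc,
    pub idx_start: u32,
    pub idx_end: u32,
    pub _reserved0: u32, // unused in v1, kept so the struct size never changes
    pub _reserved1: u32, // unused in v1, also rounds the struct to pointer alignment
}

/// Size of the cursor as a C caller would see it.
pub fn cursor_size() -> usize {
    size_of::<QjdCursor>()
}

/// Alignment of the cursor as a C caller would see it.
pub fn cursor_align() -> usize {
    align_of::<QjdCursor>()
}
```

On a 64-bit target this comes to one 8-byte pointer plus four `u32` words, which matches the 24 bytes stated in the header comment. Giving a reserved word a meaning later would change neither the size nor the offsets of the existing fields.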

### 6.2 Resolution Algorithm
@@ -375,8 +377,9 @@ pub(crate) struct SkipSlot {
/// (for object: pointing at the key's opening '"';
/// for array: pointing at the value's first token).
child_starts: Vec<u32>,
/// Position of the closing '}' / ']' in doc.indices.
closer_idx: u32,
/// child_ends[i] = idx_end for a Cursor pointing at the i-th child's value.
/// Storing this lets cache-hit lookups skip the brace-counting walk.
child_ends: Vec<u32>,
}
```
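
A rough sketch of what storing `child_ends` buys on a cache hit, limited to the array case where `child_starts[i]` already points at the value's first token. This `SkipSlot` is a simplified stand-in and `child_span` is a hypothetical helper, not a function from the crate.

```rust
/// Simplified stand-in for the spec's `SkipSlot`.
pub struct SkipSlot {
    /// Index into `doc.indices` of each child's first token.
    pub child_starts: Vec<u32>,
    /// Matching `idx_end` for a cursor over each child's value.
    pub child_ends: Vec<u32>,
}

impl SkipSlot {
    /// Return the cached `(idx_start, idx_end)` span of the k-th array child.
    /// Both bounds come straight from the cache, so no brace-counting walk
    /// over the child's contents is needed.
    pub fn child_span(&self, k: usize) -> Option<(u32, u32)> {
        let start = *self.child_starts.get(k)?;
        let end = *self.child_ends.get(k)?;
        Some((start, end))
    }
}
```

With only `child_starts` cached, the end of the matched child would have to be recomputed by scanning forward to its closer on every lookup. The second vector trades one extra `u32` per child for a constant-time answer.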

4 changes: 2 additions & 2 deletions include/lua_quick_decode.h
@@ -31,8 +31,8 @@ typedef struct {
const qjd_doc* doc;
uint32_t idx_start;
uint32_t idx_end;
uint32_t cache_slot;
uint32_t _pad;
uint32_t _reserved0;
uint32_t _reserved1;
} qjd_cursor;

const char* qjd_strerror(int code);
17 changes: 7 additions & 10 deletions lua/quickdecode.lua
@@ -4,7 +4,7 @@ ffi.cdef[[
typedef struct qjd_doc qjd_doc;
typedef struct {
const qjd_doc* doc;
uint32_t idx_start, idx_end, cache_slot, _pad;
uint32_t idx_start, idx_end, _reserved0, _reserved1;
} qjd_cursor;

const char* qjd_strerror(int code);
@@ -161,24 +161,21 @@ function Cursor:len(path)
end

function Cursor:open(path)
local out = ffi.new("qjd_cursor[1]")
local rc = C.qjd_cursor_open(self._cur, path, #path, out)
local rc = C.qjd_cursor_open(self._cur, path, #path, cur_box)
if not check_err(rc) then return nil end
return setmetatable({ _cur = out[0], _doc = self._doc }, Cursor)
return setmetatable({ _cur = cur_box[0], _doc = self._doc }, Cursor)
end

function Cursor:field(key)
local out = ffi.new("qjd_cursor[1]")
local rc = C.qjd_cursor_field(self._cur, key, #key, out)
local rc = C.qjd_cursor_field(self._cur, key, #key, cur_box)
if not check_err(rc) then return nil end
return setmetatable({ _cur = out[0], _doc = self._doc }, Cursor)
return setmetatable({ _cur = cur_box[0], _doc = self._doc }, Cursor)
end

function Cursor:index(i)
local out = ffi.new("qjd_cursor[1]")
local rc = C.qjd_cursor_index(self._cur, i, out)
local rc = C.qjd_cursor_index(self._cur, i, cur_box)
if not check_err(rc) then return nil end
return setmetatable({ _cur = out[0], _doc = self._doc }, Cursor)
return setmetatable({ _cur = cur_box[0], _doc = self._doc }, Cursor)
end

return _M
19 changes: 12 additions & 7 deletions src/cursor.rs
@@ -61,14 +61,17 @@ fn walk_children(doc: &Document, cur: Cursor, seg: &PathSeg) -> Result<Cursor, q
let (slot_n, was_cached) = cache.get_or_insert(cur.idx_start);

if was_cached {
// Fast path: iterate cached child_starts.
let starts = cache.slot(slot_n).child_starts.clone();
// Fast path: iterate cached (start, end) pairs. No brace counting.
let slot = cache.slot(slot_n);
let starts = slot.child_starts.clone();
let ends = slot.child_ends.clone();
drop(cache);
return resolve_in_known_children(doc, &starts, is_obj, seg);
return resolve_in_known_children(doc, &starts, &ends, is_obj, seg);
}

// Slow path: walk all children, populate cache fully, record match if any.
let mut starts: Vec<u32> = Vec::new();
let mut ends: Vec<u32> = Vec::new();
let mut i = cur.idx_start + 1;
let end = cur.idx_end;
let mut arr_idx: u32 = 0;
@@ -79,6 +82,7 @@ fn walk_children(doc: &Document, cur: Cursor, seg: &PathSeg) -> Result<Cursor, q

let value_idx_start = if is_obj { i + 3 } else { i };
let (cursor_end, skip_end) = find_value_span(doc, value_idx_start)?;
ends.push(cursor_end);

// Match check (we keep walking after a match to populate the cache).
if result.is_none() {
@@ -108,7 +112,9 @@ fn walk_children(doc: &Document, cur: Cursor, seg: &PathSeg) -> Result<Cursor, q
}
}

cache.slot_mut(slot_n).child_starts = starts;
let slot = cache.slot_mut(slot_n);
slot.child_starts = starts;
slot.child_ends = ends;

match result {
Some(c) => Ok(c),
@@ -117,9 +123,9 @@ fn walk_children(doc: &Document, cur: Cursor, seg: &PathSeg) -> Result<Cursor, q
}

fn resolve_in_known_children(
doc: &Document, starts: &[u32], is_obj: bool, seg: &PathSeg,
doc: &Document, starts: &[u32], ends: &[u32], is_obj: bool, seg: &PathSeg,
) -> Result<Cursor, qjd_err> {
for (k, &i) in starts.iter().enumerate() {
for (k, (&i, &cursor_end)) in starts.iter().zip(ends.iter()).enumerate() {
let matched = if is_obj {
let key_open = doc.indices[i as usize] as usize;
let key_close = doc.indices[(i + 1) as usize] as usize;
@@ -130,7 +136,6 @@ fn resolve_in_known_children(
};
if matched {
let value_idx_start = if is_obj { i + 3 } else { i };
let (cursor_end, _) = find_value_span(doc, value_idx_start)?;
return Ok(Cursor { idx_start: value_idx_start, idx_end: cursor_end });
}
}
7 changes: 3 additions & 4 deletions src/doc.rs
@@ -3,11 +3,10 @@ use std::cell::RefCell;
use crate::error::qjd_err;
use crate::skip_cache::SkipCache;

#[allow(dead_code)]
pub struct Document<'a> {
pub(crate) buf: &'a [u8],
pub(crate) indices: Vec<u32>,
pub(crate) scratch: Vec<u8>,
pub(crate) scratch: RefCell<Vec<u8>>,
pub(crate) skip: RefCell<SkipCache>,
}

@@ -20,8 +19,8 @@ impl<'a> Document<'a> {
Ok(Self {
buf,
indices,
scratch: Vec::new(),
skip: RefCell::new(SkipCache::new()),
scratch: RefCell::new(Vec::new()),
skip: RefCell::new(SkipCache::new()),
})
}
}
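
For readers following the `scratch` change, here is a small sketch of the interior-mutability pattern it relies on: a `RefCell<Vec<u8>>` can be filled through a shared `&self`, with the borrow checked at run time. `Doc`, `fill_scratch`, and `scratch_is_free` are hypothetical names for illustration and do not reproduce the crate's `Document`.

```rust
use std::cell::RefCell;

/// Hypothetical, pared-down document that owns only a scratch buffer.
pub struct Doc {
    scratch: RefCell<Vec<u8>>,
}

impl Doc {
    pub fn new() -> Self {
        Doc { scratch: RefCell::new(Vec::new()) }
    }

    /// Overwrite the scratch buffer through a shared reference and return
    /// the number of bytes now stored. The mutable borrow ends when
    /// `scratch` goes out of scope at the end of the function.
    pub fn fill_scratch(&self, src: &[u8]) -> usize {
        let mut scratch = self.scratch.borrow_mut();
        scratch.clear();
        scratch.extend_from_slice(src);
        scratch.len()
    }

    /// True when no other borrow of the scratch buffer is currently alive.
    pub fn scratch_is_free(&self) -> bool {
        self.scratch.try_borrow_mut().is_ok()
    }
}
```

Compared with casting a shared pointer to `*mut`, an overlapping borrow here panics instead of silently aliasing. Note that `RefCell` only tracks the guard: a raw pointer into the buffer that outlives the guard is valid only until the next call that mutates the buffer.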
27 changes: 10 additions & 17 deletions src/ffi.rs
@@ -20,7 +20,6 @@ macro_rules! ffi_catch {
}

/// Opaque type exported to C as `qjd_doc*`.
#[allow(dead_code)]
pub struct qjd_doc(pub(crate) Document<'static>);

#[no_mangle]
@@ -169,12 +168,8 @@ pub unsafe extern "C" fn qjd_get_str(
// String ends at the close quote, whose indices position is idx_start + 1.
let close = d.indices[(cur.idx_start + 1) as usize] as usize;

// SAFETY: scratch is owned by the qjd_doc; we obtain a mutable reference
// to it through the raw *mut qjd_doc pointer (not through the shared &Document
// alias `d`). Lua-side callers consume the returned ptr before any further
// FFI calls. Single-threaded use enforced by C ABI contract.
let scratch = &mut (*doc).0.scratch;
match string::decode_string(d.buf, pos + 1, close, scratch) {
let mut scratch = d.scratch.borrow_mut();
match string::decode_string(d.buf, pos + 1, close, &mut scratch) {
Ok((p, n)) => { *out_ptr = p; *out_len = n; qjd_err::QJD_OK as c_int }
Err(e) => e as c_int,
}
@@ -262,8 +257,8 @@ pub struct qjd_cursor {
pub doc: *const qjd_doc,
pub idx_start: u32,
pub idx_end: u32,
pub cache_slot: u32,
pub _pad: u32,
pub _reserved0: u32,
pub _reserved1: u32,
}

/// Turn a `*const qjd_cursor` into `(&'static Document<'static>, Cursor)` for Rust use.
@@ -278,10 +273,10 @@ unsafe fn cursor_to_internal(c: *const qjd_cursor) -> Result<(&'static Document<
fn internal_to_cursor(doc: *const qjd_doc, cur: Cursor) -> qjd_cursor {
qjd_cursor {
doc,
idx_start: cur.idx_start,
idx_end: cur.idx_end,
cache_slot: 0,
_pad: 0,
idx_start: cur.idx_start,
idx_end: cur.idx_end,
_reserved0: 0,
_reserved1: 0,
}
}

@@ -372,10 +367,8 @@ pub unsafe extern "C" fn qjd_cursor_get_str(
}
let close = d.indices[(cur.idx_start + 1) as usize] as usize;

// Access scratch via raw pointer through doc to avoid aliasing the &Document.
let doc_ptr = (*c).doc as *mut qjd_doc;
let scratch = &mut (*doc_ptr).0.scratch;
match string::decode_string(d.buf, pos + 1, close, scratch) {
let mut scratch = d.scratch.borrow_mut();
match string::decode_string(d.buf, pos + 1, close, &mut scratch) {
Ok((p, n)) => { *out_ptr = p; *out_len = n; qjd_err::QJD_OK as c_int }
Err(e) => e as c_int,
}
2 changes: 1 addition & 1 deletion src/lib.rs
@@ -13,6 +13,6 @@ pub mod ffi;
#[doc(hidden)]
pub mod __test_api {
pub use crate::scan::{Scanner, ScalarScanner};
#[cfg(target_arch = "x86_64")]
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
pub use crate::scan::avx2::Avx2Scanner;
}
9 changes: 9 additions & 0 deletions src/scan/avx2.rs
@@ -170,6 +170,10 @@ mod tests {
use super::*;
use crate::scan::{Scanner, scalar::ScalarScanner};

fn host_supports_avx2() -> bool {
std::is_x86_feature_detected!("avx2") && std::is_x86_feature_detected!("pclmulqdq")
}

fn parity(input: &[u8]) {
let mut a = Vec::new();
let mut b = Vec::new();
@@ -180,6 +184,7 @@ mod tests {

#[test]
fn no_strings_matches_scalar() {
if !host_supports_avx2() { return; }
parity(b"{}");
parity(b"[]");
parity(b"[{}]");
@@ -190,6 +195,7 @@ mod tests {

#[test]
fn within_chunk_strings_match_scalar() {
if !host_supports_avx2() { return; }
// These are <64 bytes so they go through the scalar tail path only;
// they still verify Avx2Scanner does not corrupt the output for these
// inputs, but they do NOT exercise the AVX2 string handling.
@@ -203,6 +209,7 @@ mod tests {
/// within a single 64-byte chunk.
#[test]
fn chunked_path_with_string() {
if !host_supports_avx2() { return; }
// Build a 64-byte input where bytes 0..64 are a single AVX2 chunk
// containing a string, and there is no tail.
// Layout: `{"k":"<48 a's>"}` = 1 + 4 + 1 + 48 + 1 + 1 = 56 bytes. Need 64.
@@ -219,6 +226,7 @@ mod tests {
/// String with internal escapes inside a 64-byte chunk.
#[test]
fn chunked_path_with_escapes() {
if !host_supports_avx2() { return; }
// Bytes: {"k":"aa\"bb\\cc<padding>"}
// Need exactly 64 bytes. Build it carefully.
let mut buf = Vec::with_capacity(64);
@@ -235,6 +243,7 @@ mod tests {
/// for multiple strings in a single 64-byte chunk.
#[test]
fn pclmul_inside_string_correct() {
if !host_supports_avx2() { return; }
// {"a":"foo","b":"bar"}<padding to 64>
// Strings "foo" and "bar" both fully within the chunk.
let mut buf = Vec::with_capacity(64);
4 changes: 2 additions & 2 deletions src/scan/mod.rs
@@ -1,5 +1,5 @@
pub(crate) mod scalar;
#[cfg(target_arch = "x86_64")]
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
pub(crate) mod avx2;

use once_cell::sync::OnceCell;
@@ -21,7 +21,7 @@ static SCAN_FN: OnceCell<ScanFn> = OnceCell::new();

pub(crate) fn scan(buf: &[u8], out: &mut Vec<u32>) -> Result<(), usize> {
let f = *SCAN_FN.get_or_init(|| {
#[cfg(target_arch = "x86_64")]
#[cfg(all(target_arch = "x86_64", feature = "avx2"))]
{
if std::is_x86_feature_detected!("avx2")
&& std::is_x86_feature_detected!("pclmulqdq")
Expand Down
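
The hunk above combines two gates: the `avx2` Cargo feature decides at compile time whether the AVX2 module exists, and CPU detection decides at run time whether it is used. A self-contained sketch of the run-time half follows. It uses `std::sync::OnceLock` in place of the crate's `once_cell::sync::OnceCell`, leaves out the Cargo feature check, and substitutes a placeholder for the SIMD kernel. `scan_scalar`, `scan_simd_placeholder`, and `pick_scan_fn` are hypothetical names, and the scalar scan is a simplification rather than the crate's scanner.

```rust
use std::sync::OnceLock;

type ScanFn = fn(&[u8], &mut Vec<u32>);

/// Simplified scalar scan: records the byte offsets of string quotes and of
/// `{ } [ ] : ,` that appear outside strings. Backslash escapes are honoured.
fn scan_scalar(buf: &[u8], out: &mut Vec<u32>) {
    let mut in_string = false;
    let mut escaped = false;
    for (i, &b) in buf.iter().enumerate() {
        if in_string {
            if escaped {
                escaped = false; // byte after a backslash is always literal
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
                out.push(i as u32); // closing quote
            }
        } else {
            match b {
                b'"' => {
                    in_string = true;
                    out.push(i as u32); // opening quote
                }
                b'{' | b'}' | b'[' | b']' | b':' | b',' => out.push(i as u32),
                _ => {}
            }
        }
    }
}

/// Placeholder for a SIMD kernel; it simply delegates to the scalar scan.
#[cfg(target_arch = "x86_64")]
fn scan_simd_placeholder(buf: &[u8], out: &mut Vec<u32>) {
    scan_scalar(buf, out)
}

static SCAN_FN: OnceLock<ScanFn> = OnceLock::new();

/// Choose an implementation once, based on what the host CPU reports.
fn pick_scan_fn() -> ScanFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2")
            && std::is_x86_feature_detected!("pclmulqdq")
        {
            return scan_simd_placeholder;
        }
    }
    scan_scalar
}

/// Public entry point: the detection cost is paid on the first call only.
pub fn scan(buf: &[u8], out: &mut Vec<u32>) {
    let f = *SCAN_FN.get_or_init(pick_scan_fn);
    f(buf, out)
}
```

Building with `--no-default-features`, as the new CI step does, corresponds to compiling out the SIMD branch entirely, so the scalar path is exercised even on AVX2-capable runners.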