Merged
22 commits
1c3da23
feat(error): add 6 RFC 8259 audit error codes synced across Rust/C/Lua
membphis May 17, 2026
a872d5c
feat(options): introduce Options + Document::parse_with_options scaffold
membphis May 17, 2026
d5aaaec
feat(ffi): add qjd_parse_ex symbol with qjd_options struct
membphis May 17, 2026
80c1358
docs(ffi): clarify qjd_parse err_out contract and dead_code rationale
membphis May 17, 2026
1e8b55b
feat(lua): accept opts table in qd.parse(json, { lazy, max_depth })
membphis May 17, 2026
16a149b
fix(lua): reject fractional max_depth; add combined-opts test
membphis May 17, 2026
c8dfd84
test(rfc8259): scaffold compliance suite with cross-mode helper macros
membphis May 17, 2026
75c7244
fix(test): assert_rejects_eager macro now actually matches by variant
membphis May 17, 2026
1b86918
feat(validate): enforce max_depth in both eager and lazy modes
membphis May 17, 2026
1f93104
feat(validate): reject trailing content after root value (eager only)
membphis May 17, 2026
e9a2b57
feat(validate): strict RFC 8259 number ABNF (lazy decode + lazy entry…
membphis May 17, 2026
5e0eb26
feat(validate): reject control chars and invalid UTF-8 in string spans
membphis May 17, 2026
3eb8082
feat(validate): wire eager pass — full RFC 8259 number+string validation
membphis May 17, 2026
69e1b97
fix(validate): check_gap distinguishes wrong-case literals from numbe…
membphis May 17, 2026
33d8522
test(rfc8259): exhaustive RFC 8259 conformance corpus
membphis May 17, 2026
da99b7d
test(json_test_suite): vendor JSONTestSuite and add cross-mode walker
membphis May 17, 2026
469b3bb
fix(test): clippy doc_overindented_list_items in json_test_suite
membphis May 17, 2026
b56f93d
docs: update two-phase invariants for eager/lazy modes and RFC 8259 a…
membphis May 17, 2026
02a4fcd
Merge remote-tracking branch 'origin/main' into worktree-audit-json-v…
membphis May 17, 2026
1a3a4b7
ci: init JSONTestSuite submodule on Rust matrix checkouts
membphis May 17, 2026
4aac34e
perf(validate): single-pass string validator with SIMD ASCII fast path
membphis May 18, 2026
d0999de
feat(validate): grammar-aware eager pass closes structural gaps
membphis May 18, 2026
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -17,6 +17,8 @@ jobs:
os: [ubuntu-latest, macos-14]
steps:
- uses: actions/checkout@v4
with:
submodules: recursive

- name: Install Rust (stable)
run: |
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "vendor/lua-cjson"]
path = vendor/lua-cjson
url = https://github.com/openresty/lua-cjson.git
[submodule "tests/vendor/JSONTestSuite"]
path = tests/vendor/JSONTestSuite
url = https://github.com/nst/JSONTestSuite
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -45,12 +45,12 @@ cargo test --features test-panic --release

### Two-phase parse

**Phase 1** (`src/scan/`, called from `Document::parse`): a structural scanner walks the input once and writes the byte offset of every non-string-interior `{ } [ ] : , "` into `doc.indices: Vec<u32>`. A `u32::MAX` sentinel is appended. The scanner is selected at first use via `OnceCell` in `src/scan/mod.rs`:
**Phase 1** (`src/scan/`, called from `Document::parse_with_options`): a structural scanner walks the input once and writes the byte offset of every non-string-interior `{ } [ ] : , "` into `doc.indices`, and a `u32::MAX` sentinel is appended. `validate_depth` then runs unconditionally; in EAGER mode, `validate_trailing` and `validate_eager_values` (number ABNF + string content + UTF-8) follow. In LAZY mode, value-level checks are skipped at parse time and deferred to the lazy decode path at field-access time. The scanner is selected at first use via `OnceCell` in `src/scan/mod.rs`:

- `Avx2Scanner` (gated by the `avx2` cargo feature, default-on) when both `avx2` and `pclmulqdq` are detected at runtime.
- `ScalarScanner` otherwise.

Validation is shallow — bracket/quote balance only. Value-level errors (bad escapes, malformed numbers, invalid UTF-8 in `\u`) are deferred to Phase 2 and surface only if that field is accessed.
Validation level depends on `qjd_options.mode`. **EAGER** (default): a post-scan pass walks `indices` and validates RFC 8259 number ABNF, string content (no unescaped control chars), and UTF-8 — parse fails on any value-level violation. **LAZY** (opt-in): bracket/quote balance + max-depth only; value-level errors surface when the offending field is accessed (lua-cjson-equivalent behavior). Trailing-content rejection and value-level validation are eager-only; max-depth (default 1024, configurable up to 4096) is enforced in both modes.

**Phase 2** (`src/cursor.rs`, `src/path.rs`, `src/decode/`): path strings are parsed by a zero-alloc `PathIter` into `PathSeg::Key | Idx`. A `Cursor` (a `(idx_start, idx_end)` pair into `doc.indices`) is walked to the target, optionally caching sibling spans in `doc.skip` (`SkipCache`) so repeated lookups on the same container skip brace-counting. Strings are decoded into `doc.scratch` only when they contain escapes; otherwise the original buffer slice is handed back.
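
The Phase 1 structural index described above can be illustrated with a scalar sketch. This is not the crate's `ScalarScanner`: `scan_structural` and its error type are hypothetical, bracket balance is not checked here, and inputs are assumed to be under 4 GiB so offsets fit in `u32`.

```rust
/// Hypothetical scalar stand-in for a Phase 1 scan: records the byte offset
/// of every `{ } [ ] : , "` that is not inside a string, then appends a
/// `u32::MAX` sentinel. Bracket balance is deliberately not checked.
pub fn scan_structural(buf: &[u8]) -> Result<Vec<u32>, &'static str> {
    let mut indices = Vec::new();
    let mut in_string = false;
    let mut i = 0usize;
    while i < buf.len() {
        let b = buf[i];
        if in_string {
            match b {
                // Skip the escaped byte so that `\"` does not close the string.
                b'\\' => i += 1,
                b'"' => {
                    indices.push(i as u32);
                    in_string = false;
                }
                _ => {}
            }
        } else {
            match b {
                b'"' => {
                    indices.push(i as u32);
                    in_string = true;
                }
                b'{' | b'}' | b'[' | b']' | b':' | b',' => indices.push(i as u32),
                _ => {}
            }
        }
        i += 1;
    }
    if in_string {
        return Err("unterminated string");
    }
    // Sentinel marking the end of the index array.
    indices.push(u32::MAX);
    Ok(indices)
}
```

Note that the comma inside `"x,y"` in an input such as `{"a":[1,"x,y"]}` is not recorded, because it sits in a string interior; only the opening and closing quotes of that string are.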

34 changes: 34 additions & 0 deletions README.md
@@ -116,3 +116,37 @@ methodology + reproduction command.
```sh
make bench # quickdecode vs cjson
```

## RFC 8259 conformance

This crate implements RFC 8259 with both strict and lenient modes; the strict
(eager) mode is the default and is required by API-gateway use cases that must
reject malformed payloads before forwarding them upstream.

- Strict-mode acceptance corpus: `tests/rfc8259_compliance.rs`
- Industry corpus: `tests/json_test_suite.rs` (against the
[JSONTestSuite](https://github.com/nst/JSONTestSuite) submodule at
`tests/vendor/JSONTestSuite`)
- Behavior on implementation-defined (`i_*`) cases: `docs/rfc8259-conformance.md`

### Switching modes

From Lua:

```lua
local doc = qd.parse(json) -- eager (default)
local doc = qd.parse(json, { lazy = true }) -- lazy mode
local doc = qd.parse(json, { max_depth = 256 }) -- stricter depth limit
local doc = qd.parse(json, { lazy = true, max_depth = 256 })
```

From C:

```c
qjd_options opts = { .mode = QJD_MODE_LAZY, .max_depth = 256 };
qjd_doc* doc = qjd_parse_ex(buf, len, &opts, &err);
```
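
On the Rust side, the same switch goes through `Options` and `Document::parse_with_options`. The diff does not show `src/options.rs`, so the following is only a sketch of what such a type could look like: the field names and the `is_eager` / `effective_max_depth` method names appear in this PR, but their bodies here are assumptions, and any upper cap on `max_depth` is omitted.

```rust
// Constant values mirror the C header in this PR.
pub const QJD_MODE_EAGER: u32 = 0;
pub const QJD_MODE_LAZY: u32 = 1;
pub const QJD_DEFAULT_MAX_DEPTH: u32 = 1024;

/// Sketch of a parse-options struct with the same two fields as `qjd_options`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Options {
    pub mode: u32,
    pub max_depth: u32,
}

impl Default for Options {
    /// Eager mode; a `max_depth` of 0 means "use the default limit".
    fn default() -> Self {
        Options { mode: QJD_MODE_EAGER, max_depth: 0 }
    }
}

impl Options {
    /// Assumed behavior: only the lazy constant selects lazy mode.
    pub fn is_eager(&self) -> bool {
        self.mode != QJD_MODE_LAZY
    }

    /// Assumed behavior: 0 resolves to `QJD_DEFAULT_MAX_DEPTH`.
    pub fn effective_max_depth(&self) -> u32 {
        if self.max_depth == 0 {
            QJD_DEFAULT_MAX_DEPTH
        } else {
            self.max_depth
        }
    }
}
```

With a type like this, `Options { mode: QJD_MODE_LAZY, max_depth: 256 }` would be the Rust counterpart of the Lua and C calls above.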

### Known gaps

Three structural-grammar checks are deferred to a follow-up — they require a grammar-aware walk beyond the current heuristic. See `tests/rfc8259_compliance.rs` for the specific `#[ignore]`d cases, and `tests/json_test_suite.rs::KNOWN_N_FAILURES` for the corresponding JSONTestSuite files.
19 changes: 19 additions & 0 deletions docs/rfc8259-conformance.md
@@ -0,0 +1,19 @@
# RFC 8259 conformance: implementation-defined cases

JSONTestSuite categorizes some inputs as `i_*` — the spec allows either
acceptance or rejection. This file records `lua-quick-decode`'s behavior on
each, so changes show up in `git diff`.

Behavior is recorded for the default **EAGER** mode unless noted.

| File pattern | Our verdict | Rationale |
|---|---|---|
| `i_number_huge_exp` | REJECT (`QJD_NUMBER_OUT_OF_RANGE`) | f64 overflow surfaces at decode. |
| `i_number_very_big_negative_int` | varies (see the live verdict printed by `document_i_files_behavior`) | ABNF-valid; representational, not structural. |
| `i_string_*` (UTF-16 surrogate halves in `\u` escapes) | REJECT (`QJD_DECODE_FAILED`) | We require well-formed surrogate pairs. |
| `i_structure_500_nested_arrays` | ACCEPT (within default 1024 max_depth) | Configurable. |

Run `cargo test --release --test json_test_suite -- --nocapture` to print the
live verdict for every `i_*` file via the `document_i_files_behavior` test.
That is the source of truth for these entries; update this table when a
verdict changes (e.g. after a validator gap is closed).
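
The depth limit behind the `i_structure_500_nested_arrays` verdict can be sketched as a walk over the structural index. `check_depth` and `DepthError` are hypothetical names rather than the crate's `validate_depth`, and treating a depth exactly equal to the limit as acceptable is an assumption.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DepthError {
    NestingTooDeep,
}

/// Hypothetical depth check: `indices` holds offsets of structural bytes in
/// `buf`, terminated by a `u32::MAX` sentinel. Nesting deeper than
/// `max_depth` is rejected.
pub fn check_depth(buf: &[u8], indices: &[u32], max_depth: u32) -> Result<(), DepthError> {
    let mut depth: u32 = 0;
    for &pos in indices {
        if pos == u32::MAX {
            break; // sentinel reached
        }
        match buf[pos as usize] {
            b'{' | b'[' => {
                depth += 1;
                if depth > max_depth {
                    return Err(DepthError::NestingTooDeep);
                }
            }
            // Balance is assumed to be checked elsewhere; never underflow.
            b'}' | b']' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

Under these assumptions, 500 nested arrays pass with the default limit of 1024 and fail once the caller lowers the limit to 256.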
35 changes: 26 additions & 9 deletions include/lua_quick_decode.h
@@ -9,22 +9,37 @@ extern "C" {
#endif

typedef enum {
QJD_OK = 0,
QJD_PARSE_ERROR = 1,
QJD_NOT_FOUND = 2,
QJD_TYPE_MISMATCH = 3,
QJD_OUT_OF_RANGE = 4,
QJD_DECODE_FAILED = 5,
QJD_INVALID_PATH = 6,
QJD_INVALID_ARG = 7,
QJD_OOM = 8
QJD_OK = 0,
QJD_PARSE_ERROR = 1,
QJD_NOT_FOUND = 2,
QJD_TYPE_MISMATCH = 3,
QJD_OUT_OF_RANGE = 4,
QJD_DECODE_FAILED = 5,
QJD_INVALID_PATH = 6,
QJD_INVALID_ARG = 7,
QJD_OOM = 8,
QJD_NESTING_TOO_DEEP = 9,
QJD_TRAILING_CONTENT = 10,
QJD_NUMBER_OUT_OF_RANGE = 11,
QJD_INVALID_NUMBER = 12,
QJD_INVALID_STRING = 13,
QJD_INVALID_UTF8 = 14
} qjd_err;

typedef enum {
QJD_T_NULL = 0, QJD_T_BOOL = 1, QJD_T_NUM = 2,
QJD_T_STR = 3, QJD_T_ARR = 4, QJD_T_OBJ = 5
} qjd_type;

#define QJD_MODE_EAGER 0u
#define QJD_MODE_LAZY 1u
#define QJD_DEFAULT_MAX_DEPTH 1024u

typedef struct {
uint32_t mode; /* QJD_MODE_EAGER (0) or QJD_MODE_LAZY (1) */
uint32_t max_depth; /* 0 = use QJD_DEFAULT_MAX_DEPTH */
} qjd_options;

typedef struct qjd_doc qjd_doc;

typedef struct {
@@ -38,6 +53,8 @@ typedef struct {
const char* qjd_strerror(int code);

qjd_doc* qjd_parse(const uint8_t* buf, size_t len, int* err_out);
qjd_doc* qjd_parse_ex(const uint8_t* buf, size_t len,
const qjd_options* opts, int* err_out);
void qjd_free (qjd_doc* doc);

int qjd_get_str (qjd_doc*, const char* path, size_t path_len,
58 changes: 54 additions & 4 deletions lua/quickdecode.lua
@@ -7,9 +7,16 @@ typedef struct {
uint32_t idx_start, idx_end, _reserved0, _reserved1;
} qjd_cursor;

typedef struct {
uint32_t mode;
uint32_t max_depth;
} qjd_options;

const char* qjd_strerror(int code);
qjd_doc* qjd_parse(const uint8_t* buf, size_t len, int* err_out);
void qjd_free(qjd_doc* doc);
qjd_doc* qjd_parse (const uint8_t* buf, size_t len, int* err_out);
qjd_doc* qjd_parse_ex(const uint8_t* buf, size_t len,
const qjd_options* opts, int* err_out);
void qjd_free (qjd_doc* doc);

int qjd_get_str (qjd_doc*, const char* path, size_t path_len, const uint8_t** p, size_t* n);
int qjd_get_i64 (qjd_doc*, const char* path, size_t path_len, int64_t* out);
@@ -48,11 +55,31 @@ local strp_box = ffi.new("const uint8_t*[1]")
local cur_box = ffi.new("qjd_cursor[1]")

local NOT_FOUND = 2
-- Error codes mirrored from include/lua_quick_decode.h. Kept in sync manually;
-- src/error.rs has the authoritative numbering.
local ERR = {
OK = 0,
PARSE_ERROR = 1,
NOT_FOUND = 2,
TYPE_MISMATCH = 3,
OUT_OF_RANGE = 4,
DECODE_FAILED = 5,
INVALID_PATH = 6,
INVALID_ARG = 7,
OOM = 8,
NESTING_TOO_DEEP = 9,
TRAILING_CONTENT = 10,
NUMBER_OUT_OF_RANGE = 11,
INVALID_NUMBER = 12,
INVALID_STRING = 13,
INVALID_UTF8 = 14,
}

local _M = {
T_NULL = 0, T_BOOL = 1, T_NUM = 2,
T_STR = 3, T_ARR = 4, T_OBJ = 5,
}
_M.ERR = ERR

local Doc = {}; Doc.__index = Doc
local Cursor = {}; Cursor.__index = Cursor
@@ -63,8 +90,31 @@ local function check_err(rc)
error("quickdecode: " .. ffi.string(C.qjd_strerror(rc)))
end

function _M.parse(json_str)
local ptr = C.qjd_parse(json_str, #json_str, err_box)
local opts_box = ffi.new("qjd_options[1]")

local MODE_EAGER = 0
local MODE_LAZY = 1

function _M.parse(json_str, opts)
local ptr
if opts == nil then
ptr = C.qjd_parse(json_str, #json_str, err_box)
else
if type(opts) ~= "table" then
error("quickdecode.parse: opts must be a table")
end
local lazy = opts.lazy
if lazy ~= nil and type(lazy) ~= "boolean" then
error("quickdecode.parse: opts.lazy must be a boolean")
end
local max_depth = opts.max_depth or 0
if type(max_depth) ~= "number" or max_depth < 0 or max_depth ~= math.floor(max_depth) then
error("quickdecode.parse: opts.max_depth must be a non-negative integer")
end
opts_box[0].mode = lazy and MODE_LAZY or MODE_EAGER
opts_box[0].max_depth = max_depth
ptr = C.qjd_parse_ex(json_str, #json_str, opts_box, err_box)
end
if ptr == nil then
error("quickdecode: " .. ffi.string(C.qjd_strerror(err_box[0])))
end
24 changes: 11 additions & 13 deletions src/decode/number.rs
@@ -1,20 +1,16 @@
use crate::error::qjd_err;

pub(crate) fn parse_i64(bytes: &[u8]) -> Result<i64, qjd_err> {
if bytes.is_empty() {
return Err(qjd_err::QJD_DECODE_FAILED);
}
// Reject non-integer JSON numbers (with decimal point or exponent).
crate::validate::validate_number(bytes)?;
// After ABNF validation, integer-only inputs have no `.`/`e`/`E`.
if bytes.iter().any(|&b| b == b'.' || b == b'e' || b == b'E') {
return Err(qjd_err::QJD_TYPE_MISMATCH);
}
let (neg, rest) = match bytes[0] {
b'-' => (true, &bytes[1..]),
_ => (false, bytes),
};
if rest.is_empty() || !rest.iter().all(|c| c.is_ascii_digit()) {
return Err(qjd_err::QJD_DECODE_FAILED);
}
// ABNF guarantees `rest` is non-empty and digit-only here.
let mut v: i64 = 0;
for &c in rest {
let d = (c - b'0') as i64;
@@ -29,11 +25,13 @@ pub(crate) fn parse_i64(bytes: &[u8]) -> Result<i64, qjd_err> {
}

pub(crate) fn parse_f64(bytes: &[u8]) -> Result<f64, qjd_err> {
if bytes.is_empty() {
return Err(qjd_err::QJD_DECODE_FAILED);
}
crate::validate::validate_number(bytes)?;
let s = std::str::from_utf8(bytes).map_err(|_| qjd_err::QJD_DECODE_FAILED)?;
s.parse::<f64>().map_err(|_| qjd_err::QJD_DECODE_FAILED)
match s.parse::<f64>() {
Ok(v) if v.is_finite() => Ok(v),
Ok(_) => Err(qjd_err::QJD_NUMBER_OUT_OF_RANGE),
Err(_) => Err(qjd_err::QJD_DECODE_FAILED),
}
}

#[cfg(test)]
@@ -63,7 +61,7 @@ mod tests {

#[test]
fn i64_rejects_empty() {
assert_eq!(parse_i64(b""), Err(qjd_err::QJD_DECODE_FAILED));
assert_eq!(parse_i64(b""), Err(qjd_err::QJD_INVALID_NUMBER));
}

#[test] fn f64_zero() { assert_eq!(parse_f64(b"0.0").unwrap(), 0.0); }
@@ -73,6 +71,6 @@ mod tests {

#[test]
fn f64_rejects_garbage() {
assert_eq!(parse_f64(b"hello"), Err(qjd_err::QJD_DECODE_FAILED));
assert_eq!(parse_f64(b"hello"), Err(qjd_err::QJD_INVALID_NUMBER));
}
}
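
The number path in this file validates against the RFC 8259 grammar first, then parses, then rejects non-finite results. A self-contained sketch of that sequence follows. `is_rfc8259_number`, `parse_f64_strict` and `NumError` are hypothetical names and do not reproduce `crate::validate::validate_number`; only the grammar itself comes from RFC 8259 §6.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum NumError {
    InvalidNumber,
    OutOfRange,
}

/// Advances `i` past a run of ASCII digits and returns the new position.
fn skip_digits(bytes: &[u8], mut i: usize) -> usize {
    while matches!(bytes.get(i), Some(b'0'..=b'9')) {
        i += 1;
    }
    i
}

/// number = [ "-" ] int [ frac ] [ exp ]   (RFC 8259 section 6)
pub fn is_rfc8259_number(bytes: &[u8]) -> bool {
    let mut i = 0;
    if bytes.get(i) == Some(&b'-') {
        i += 1;
    }
    // int = "0" / ( digit1-9 *DIGIT ): no leading zeros.
    match bytes.get(i) {
        Some(b'0') => i += 1,
        Some(b'1'..=b'9') => i = skip_digits(bytes, i + 1),
        _ => return false,
    }
    // frac = "." 1*DIGIT
    if bytes.get(i) == Some(&b'.') {
        let start = i + 1;
        i = skip_digits(bytes, start);
        if i == start {
            return false;
        }
    }
    // exp = ( "e" / "E" ) [ "-" / "+" ] 1*DIGIT
    if matches!(bytes.get(i), Some(b'e' | b'E')) {
        i += 1;
        if matches!(bytes.get(i), Some(b'+' | b'-')) {
            i += 1;
        }
        let start = i;
        i = skip_digits(bytes, start);
        if i == start {
            return false;
        }
    }
    i == bytes.len()
}

/// Grammar check first, then parse, then reject infinities from overflow.
pub fn parse_f64_strict(bytes: &[u8]) -> Result<f64, NumError> {
    if !is_rfc8259_number(bytes) {
        return Err(NumError::InvalidNumber);
    }
    let s = std::str::from_utf8(bytes).map_err(|_| NumError::InvalidNumber)?;
    match s.parse::<f64>() {
        Ok(v) if v.is_finite() => Ok(v),
        Ok(_) => Err(NumError::OutOfRange),
        Err(_) => Err(NumError::InvalidNumber),
    }
}
```

Running the grammar check before `str::parse` matters because Rust's `f64` parser accepts spellings that JSON forbids, such as `+1`, `.5` and `inf`.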
12 changes: 9 additions & 3 deletions src/decode/string.rs
@@ -7,6 +7,7 @@ pub(crate) fn decode_string(
buf: &[u8], start: usize, end: usize, scratch: &mut Vec<u8>,
) -> Result<(*const u8, usize), qjd_err> {
let slice = &buf[start..end];
crate::validate::validate_string_span(slice)?;
if memchr::memchr(b'\\', slice).is_none() {
return Ok((slice.as_ptr(), slice.len()));
}
@@ -163,16 +164,21 @@ mod tests {

#[test]
fn invalid_hex_in_unicode_fails() {
assert_eq!(d(b"\\uZZZZ").unwrap_err(), qjd_err::QJD_DECODE_FAILED);
// validate_string_span (called first) catches non-hex digits as
// QJD_INVALID_STRING; the decode loop would also catch it as
// QJD_DECODE_FAILED, but we never reach it.
assert_eq!(d(b"\\uZZZZ").unwrap_err(), qjd_err::QJD_INVALID_STRING);
}

#[test]
fn unknown_escape_fails() {
assert_eq!(d(b"\\q").unwrap_err(), qjd_err::QJD_DECODE_FAILED);
// validate_string_span catches unknown escape introducers first.
assert_eq!(d(b"\\q").unwrap_err(), qjd_err::QJD_INVALID_STRING);
}

#[test]
fn dangling_backslash_fails() {
assert_eq!(d(b"a\\").unwrap_err(), qjd_err::QJD_DECODE_FAILED);
// validate_string_span catches a trailing lone backslash first.
assert_eq!(d(b"a\\").unwrap_err(), qjd_err::QJD_INVALID_STRING);
}
}
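
The updated tests expect bad escapes to be caught by a validation pass before the decode loop runs. A rough sketch of such a pass is below. `check_string_span` and `StrError` are hypothetical, not the crate's `validate_string_span`: this version checks UTF-8 up front with the standard library instead of in a single pass, the relative priority of the two error kinds is an assumption, and surrogate pairing in `\u` escapes is not checked.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StrError {
    InvalidString,
    InvalidUtf8,
}

/// Validates the bytes between a string's quotes: well-formed UTF-8, no raw
/// control characters, and only the escapes permitted by RFC 8259 section 7.
pub fn check_string_span(span: &[u8]) -> Result<(), StrError> {
    if std::str::from_utf8(span).is_err() {
        return Err(StrError::InvalidUtf8);
    }
    let mut i = 0;
    while i < span.len() {
        match span[i] {
            // U+0000 through U+001F must be escaped.
            0x00..=0x1F => return Err(StrError::InvalidString),
            b'\\' => match span.get(i + 1) {
                Some(b'"' | b'\\' | b'/' | b'b' | b'f' | b'n' | b'r' | b't') => i += 2,
                Some(b'u') => {
                    // `\u` must be followed by exactly four hex digits.
                    let hex = span.get(i + 2..i + 6).ok_or(StrError::InvalidString)?;
                    if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                        return Err(StrError::InvalidString);
                    }
                    i += 6;
                }
                // Unknown escape, or a lone backslash at the end of the span.
                _ => return Err(StrError::InvalidString),
            },
            _ => i += 1,
        }
    }
    Ok(())
}
```

The three failing inputs from the tests above (`\uZZZZ`, `\q`, and a trailing backslash) all take the `InvalidString` branches in this sketch.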
38 changes: 37 additions & 1 deletion src/doc.rs
@@ -12,10 +12,31 @@ pub struct Document<'a> {

impl<'a> Document<'a> {
pub fn parse(buf: &'a [u8]) -> Result<Self, qjd_err> {
Self::parse_with_options(buf, &crate::options::Options::default())
}

pub fn parse_with_options(
buf: &'a [u8],
opts: &crate::options::Options,
) -> Result<Self, qjd_err> {
// RFC 8259 §2: "A JSON text is a serialized value."
// Empty input and whitespace-only input contain no value.
if buf.iter().all(|&b| matches!(b, b' ' | b'\t' | b'\n' | b'\r')) {
return Err(qjd_err::QJD_PARSE_ERROR);
}

let max_depth = opts.effective_max_depth();
let mut indices = Vec::new();
crate::scan::scan(buf, &mut indices).map_err(|_| qjd_err::QJD_PARSE_ERROR)?;
// Sentinel simplifies boundary checks during Phase 2.
indices.push(u32::MAX);

crate::validate::validate_depth(buf, &indices, max_depth)?;

if opts.is_eager() {
crate::validate::validate_trailing(buf, &indices)?;
crate::validate::validate_eager_values(buf, &indices)?;
}

Ok(Self {
buf,
indices,
@@ -169,4 +190,19 @@ mod tests {
fn parse_error_on_malformed() {
assert!(Document::parse(b"{").is_err());
}

#[test]
fn parse_with_options_defaults_match_parse() {
let opts = crate::options::Options::default();
let a = Document::parse(b"{\"a\":1}").unwrap();
let b = Document::parse_with_options(b"{\"a\":1}", &opts).unwrap();
assert_eq!(a.indices, b.indices);
}

#[test]
fn parse_with_lazy_skips_eager_validation() {
// Trailing content is an eager-only check; lazy must accept it.
let opts = crate::options::Options { mode: crate::options::QJD_MODE_LAZY, max_depth: 0 };
assert!(Document::parse_with_options(b"{}garbage", &opts).is_ok());
}
}
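
The eager-only trailing-content check exercised by the last test can be sketched as follows. `check_trailing` and `TrailingContent` are hypothetical and this is not the crate's `validate_trailing`: the sketch only handles object and array roots and simply accepts scalar roots, which a real implementation would also need to cover.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct TrailingContent;

/// The four whitespace bytes RFC 8259 allows around values.
fn is_json_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\n' | b'\r')
}

/// Finds the bracket that closes the root container and requires that only
/// JSON whitespace follows it. `indices` holds structural offsets into `buf`
/// and ends with a `u32::MAX` sentinel.
pub fn check_trailing(buf: &[u8], indices: &[u32]) -> Result<(), TrailingContent> {
    // Scalar roots are out of scope for this sketch.
    match buf.iter().copied().find(|&b| !is_json_ws(b)) {
        Some(b'{') | Some(b'[') => {}
        _ => return Ok(()),
    }
    let mut depth = 0u32;
    for &pos in indices {
        if pos == u32::MAX {
            break; // sentinel reached
        }
        match buf[pos as usize] {
            b'{' | b'[' => depth += 1,
            b'}' | b']' => {
                depth = depth.saturating_sub(1);
                if depth == 0 {
                    // Root container closed: the rest must be whitespace.
                    let rest = &buf[pos as usize + 1..];
                    return if rest.iter().copied().all(is_json_ws) {
                        Ok(())
                    } else {
                        Err(TrailingContent)
                    };
                }
            }
            _ => {}
        }
    }
    Ok(())
}
```

In lazy mode this step would be skipped entirely, which is why `{}garbage` parses successfully in `parse_with_lazy_skips_eager_validation`.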