4 changes: 3 additions & 1 deletion CLAUDE.md
@@ -37,7 +37,9 @@ A new module must satisfy **all three**, otherwise it goes to `starpkg/*`:
2. **Universally needed.** Broad, domain-neutral utility. Domain modules (sqlite, web, llm, mq, s3…) are starpkg's job no matter how clean they are.
3. **Zero third-party dependencies.** Stdlib-only (or an extension of an existing core module). Any `go.sum` entry is inherited by every downstream — one third-party requirement sends the module to starpkg.

-**The vendoring exception** (the `lib/json/internal/jsonrepair` precedent): a frozen, permissively-licensed, **stdlib-only** third-party runtime may be vendored under `lib/<m>/internal/<pkg>/` to keep `go.sum` clean, when the capability is judged worth it. Requirements: pin to a specific upstream release (record it), copy runtime `.go` files only (no `_test.go`, no upstream test deps), keep the upstream LICENSE in the directory, add a `doc.go` stating provenance + "do not edit by hand; re-vendor to update", golden-lock the observed behavior in our tests, and exclude the path in `codecov.yml` and `.codacy.yml`. Measure the binary delta in the go1.19 container before committing to it.
+**The vendoring exception** (the `lib/json/internal/jsonrepair` precedent): a frozen, **same-license (MIT)**, **stdlib-only** third-party runtime may be vendored under `lib/<m>/internal/<pkg>/` to keep `go.sum` clean, when the capability is judged worth it. Requirements: pin to a specific upstream release (record it), copy runtime `.go` files only (no `_test.go`, no upstream test deps), keep the upstream LICENSE in the directory, add a `doc.go` stating provenance + "do not edit by hand; re-vendor to update", golden-lock the observed behavior in our tests, and exclude the path in `codecov.yml` and `.codacy.yml`. Measure the binary delta in the go1.19 container before committing to it.
+
+**License hygiene caps vendoring.** Never vendor differently-licensed source — even permissive (Apache-2.0) — into this MIT repository: the copied files keep their license and the repo becomes mixed-license. For a capability worth a differently-licensed library, use a **module dependency** instead, and only when it passes the evaluation bar: its go.mod must not exceed this repo's Go floor, it should bring zero (or near-zero) transitive dependencies into `go.sum`, the binary delta is measured in the go1.19 container, and its panic surface is audited + hostile-input tested (the `lib/json` jsonschema decision: Apache-2.0, go1.19 exactly, zero requires, +256 KiB measured → module dep, repo stays pure MIT).

**Python-parity rule.** If a module mirrors a Python stdlib API (`regex` ⇒ `re`), the shapes must match CPython exactly — signatures, return types (`findall`/`split` return **lists**, not tuples — a real bug class), group shaping, flag values. Where the Go engine genuinely can't (RE2: lookaround, backreferences), **fail to compile with a clear error**; never silently approximate. Same-name-different-shape is worse than absent.

1 change: 1 addition & 0 deletions go.mod
@@ -7,6 +7,7 @@ require (
github.com/google/uuid v1.6.0
github.com/h2so5/here v0.0.0-20200815043652-5e14eb691fae
github.com/montanaflynn/stats v0.7.1
+github.com/santhosh-tekuri/jsonschema/v5 v5.3.1
github.com/spyzhov/ajson v0.9.6
go.starlark.net v0.0.0-20260324133313-ffb3f39dd27a
go.uber.org/atomic v1.11.0
2 changes: 2 additions & 0 deletions go.sum
@@ -15,6 +15,8 @@ github.com/montanaflynn/stats v0.7.1 h1:etflOAAHORrCC44V+aR6Ftzort912ZU+YLiSTuV8
github.com/montanaflynn/stats v0.7.1/go.mod h1:etXPPgVO6n31NxCd9KQUMvCM+ve0ruNzt6R8Bnaayow=
github.com/pkg/errors v0.8.1 h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
+github.com/santhosh-tekuri/jsonschema/v5 v5.3.1 h1:lZUw3E0/J3roVtGQ+SCrUrg3ON6NgVqpn3+iol9aGu4=
+github.com/santhosh-tekuri/jsonschema/v5 v5.3.1/go.mod h1:uToXkOrWAZ6/Oc07xWQrPOhJotwFIyu2bBVN41fcDUY=
github.com/spyzhov/ajson v0.9.6 h1:iJRDaLa+GjhCDAt1yFtU/LKMtLtsNVKkxqlpvrHHlpQ=
github.com/spyzhov/ajson v0.9.6/go.mod h1:a6oSw0MMb7Z5aD2tPoPO+jq11ETKgXUr2XktHdT8Wt8=
github.com/stretchr/testify v1.8.0 h1:pSgiaMZlXftHpm5L7V1+rVB+AZJydKsMxsQBIJw4PKk=
44 changes: 44 additions & 0 deletions lib/json/README.md
@@ -329,3 +329,47 @@ print("Error:", error)
# Result: {"a": 1}
# Error: None
```

### `validate(data, schema) None`

The validate function checks a JSON document against a [JSON Schema](https://json-schema.org) (drafts 4, 6, 7, 2019-09 and 2020-12, detected from the `$schema` keyword; 2020-12 by default). It accepts two positional arguments — both may be a JSON string, bytes, or a Starlark value (dict, list, etc.):
- data: the document to check
- schema: the JSON Schema

It returns None when the data conforms. When the data is invalid, it fails with a message listing each violation prefixed by its [JSON Pointer](https://datatracker.ietf.org/doc/html/rfc6901) location, e.g. `at /age: must be >= 0 but found -3`.
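The location prefix in that message follows RFC 6901. As a rough illustration in plain Python (not the module's Go implementation), the sketch below builds a pointer from a path of keys and indexes and formats one violation line. The helper names `json_pointer` and `format_violation` are hypothetical and exist only for this example.

```python
def json_pointer(path):
    """Build an RFC 6901 JSON Pointer from a list of object keys and array indexes."""
    tokens = []
    for part in path:
        # RFC 6901 escaping: "~" becomes "~0" first, then "/" becomes "~1".
        token = str(part).replace("~", "~0").replace("/", "~1")
        tokens.append(token)
    return "".join("/" + token for token in tokens)


def format_violation(path, message):
    # Hypothetical helper mirroring the "at <pointer>: <message>" shape shown above.
    return "at " + json_pointer(path) + ": " + message


print(format_violation(["age"], "must be >= 0 but found -3"))
# at /age: must be >= 0 but found -3
print(json_pointer(["items", 2, "a/b"]))
# /items/2/a~1b
```

The escaping order matters: replacing `/` before `~` would turn a literal `/` into `~01` instead of `~1`.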

Schemas must be **self-contained**: a `$ref` to an external resource (a file or the network) is an error. Compiled schemas are cached, so repeated validation against the same schema text has no recompilation cost.
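To get a feel for what "self-contained" rules out, here is a naive plain-Python sketch that walks a decoded schema and lists every `$ref` that leaves the document. It assumes that only fragment-only references (those starting with `#`) count as internal, and the function name `external_refs` is hypothetical. How the module itself detects and rejects external references is not shown in this README, so treat this as an approximation for pre-checking schemas, not as the actual mechanism.

```python
def external_refs(schema):
    """Collect every "$ref" string that does not point inside the same document."""
    found = []
    stack = [schema]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            ref = node.get("$ref")
            # Assumption: only fragment-only references ("#", "#/$defs/x") are internal.
            if isinstance(ref, str) and not ref.startswith("#"):
                found.append(ref)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return found


print(external_refs({"$ref": "#"}))
# []
print(external_refs({"$ref": "file:///etc/passwd"}))
# ['file:///etc/passwd']
```

Because the walk is purely structural, it can also flag a `$ref` key that appears inside literal data such as `const` or `enum` values. That is acceptable for a pre-check but is one reason this is only a sketch.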

#### Examples

**Basic**

Validate a decoded value against a schema written as a Starlark dict.

```python
load('json', 'validate')
schema = {'type': 'object', 'required': ['name'], 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer', 'minimum': 0}}}
print(validate({'name': 'Ann', 'age': 3}, schema))
# Output: None
```
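The caching note in the description of validate (same schema text, no recompilation) can be pictured with a bounded memoizing function. The plain-Python sketch below uses `functools.lru_cache` keyed on the schema text, with `json.loads` standing in for real schema compilation. The capacity of 64 and the LRU policy are assumptions: the tests in this change churn through 70 distinct schemas to exercise eviction, which implies a bounded cache, but the module's actual limit and eviction policy are not stated here.

```python
import functools
import json


@functools.lru_cache(maxsize=64)  # capacity is an assumption, not the module's real limit
def compile_schema(schema_text):
    # Stand-in for schema compilation: the real work would happen here once per text.
    return json.loads(schema_text)


text = '{"type":"integer"}'
first = compile_schema(text)
second = compile_schema(text)  # served from the cache, no second "compile"

info = compile_schema.cache_info()
print(first is second)
# True
print(info.hits, info.misses)
# 1 1
```

Keying on the exact text means two schemas that differ only in whitespace or key order are cached separately in this sketch.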

### `try_validate(data, schema) tuple`

The try_validate function is a variant of validate that distinguishes three outcomes instead of aborting:
- `(True, None)` — the data conforms to the schema.
- `(False, details)` — the data was checked and is invalid; details lists the violations with their JSON Pointer locations.
- `(None, error)` — validation could not run at all (invalid schema, malformed JSON text, or bad arguments).
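Because `False` and `None` are both falsy, a bare `if not ok` would merge "invalid data" with "validation could not run". The plain-Python sketch below handles a pair shaped like the result listed above by comparing against each value explicitly. The function name `describe` and the label strings are hypothetical, and the tuples are hand-written stand-ins for real results.

```python
def describe(result):
    """Turn a (status, detail) pair shaped like the three outcomes above into a label."""
    ok, detail = result
    # Check each value explicitly: False and None are both falsy, so
    # "if not ok" cannot tell "invalid data" from "could not run".
    if ok is True:
        return "valid"
    if ok is False:
        return "invalid: " + detail
    return "could not validate: " + detail


print(describe((True, None)))
# valid
print(describe((False, "at /age: must be >= 0 but found -3")))
# invalid: at /age: must be >= 0 but found -3
print(describe((None, "invalid schema")))
# could not validate: invalid schema
```

In a Starlark script the same branches would be written with `==` (for example `ok == None`), since Starlark has no `is` operator.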

#### Examples

**Basic**

```python
load('json', 'try_validate')
ok, err = try_validate('{"age": -3}', '{"type":"object","properties":{"age":{"type":"integer","minimum":0}}}')
print("OK:", ok)
print("Error:", err)
# Output:
# OK: False
# Error: at /age: must be >= 0 but found -3
```
24 changes: 13 additions & 11 deletions lib/json/json.go
@@ -31,17 +31,19 @@ func LoadModule() (starlark.StringDict, error) {
 	mod := starlarkstruct.Module{
 		Name: ModuleName,
 		Members: starlark.StringDict{
-			"dumps":      starlark.NewBuiltin(ModuleName+".dumps", dumps),
-			"try_dumps":  starlark.NewBuiltin(ModuleName+".try_dumps", tryDumps),
-			"try_encode": starlark.NewBuiltin(ModuleName+".try_encode", tryEncode),
-			"try_decode": starlark.NewBuiltin(ModuleName+".try_decode", tryDecode),
-			"try_indent": starlark.NewBuiltin(ModuleName+".try_indent", tryIndent),
-			"path":       starlark.NewBuiltin(ModuleName+".path", generateJsonPath(false)),
-			"try_path":   starlark.NewBuiltin(ModuleName+".try_path", generateJsonPath(true)),
-			"eval":       starlark.NewBuiltin(ModuleName+".eval", generateJsonEval(false)),
-			"try_eval":   starlark.NewBuiltin(ModuleName+".try_eval", generateJsonEval(true)),
-			"repair":     starlark.NewBuiltin(ModuleName+".repair", generateRepair(false)),
-			"try_repair": starlark.NewBuiltin(ModuleName+".try_repair", generateRepair(true)),
+			"dumps":        starlark.NewBuiltin(ModuleName+".dumps", dumps),
+			"try_dumps":    starlark.NewBuiltin(ModuleName+".try_dumps", tryDumps),
+			"try_encode":   starlark.NewBuiltin(ModuleName+".try_encode", tryEncode),
+			"try_decode":   starlark.NewBuiltin(ModuleName+".try_decode", tryDecode),
+			"try_indent":   starlark.NewBuiltin(ModuleName+".try_indent", tryIndent),
+			"path":         starlark.NewBuiltin(ModuleName+".path", generateJsonPath(false)),
+			"try_path":     starlark.NewBuiltin(ModuleName+".try_path", generateJsonPath(true)),
+			"eval":         starlark.NewBuiltin(ModuleName+".eval", generateJsonEval(false)),
+			"try_eval":     starlark.NewBuiltin(ModuleName+".try_eval", generateJsonEval(true)),
+			"repair":       starlark.NewBuiltin(ModuleName+".repair", generateRepair(false)),
+			"try_repair":   starlark.NewBuiltin(ModuleName+".try_repair", generateRepair(true)),
+			"validate":     starlark.NewBuiltin(ModuleName+".validate", generateValidate(false)),
+			"try_validate": starlark.NewBuiltin(ModuleName+".try_validate", generateValidate(true)),
 		},
 	}
 	for k, v := range stdjson.Module.Members {
188 changes: 188 additions & 0 deletions lib/json/json_test.go
@@ -978,3 +978,191 @@ func TestJSONRepair(t *testing.T) {
})
}
}

func TestJSONValidate(t *testing.T) {
tests := []struct {
name string
script string
wantErr string
}{
{
name: `validate: conforming data returns None`,
script: itn.HereDoc(`
load('json', 'validate')
schema = '{"type":"object","required":["name"],"properties":{"name":{"type":"string"},"age":{"type":"integer","minimum":0}}}'
assert.eq(validate('{"name":"Ann","age":3}', schema), None)
`),
},
{
name: `validate: schema and data as starlark values`,
script: itn.HereDoc(`
load('json', 'validate')
schema = {'type': 'object', 'required': ['name'], 'properties': {'name': {'type': 'string'}}}
assert.eq(validate({'name': 'Ann'}, schema), None)
assert.eq(validate([1, 2, 3], {'type': 'array', 'items': {'type': 'integer'}}), None)
`),
},
{
name: `validate: violation message carries the JSON pointer`,
script: itn.HereDoc(`
load('json', 'validate')
schema = '{"type":"object","properties":{"age":{"type":"integer","minimum":0}}}'
validate('{"age":-3}', schema)
`),
wantErr: `at /age: must be >= 0`,
},
{
name: `try_validate: the three outcomes`,
script: itn.HereDoc(`
load('json', 'try_validate')
schema = '{"type":"object","required":["name"]}'
ok, err = try_validate('{"name":"a"}', schema)
assert.eq(ok, True)
assert.eq(err, None)
bad, err2 = try_validate('{}', schema)
assert.eq(bad, False)
assert.true('missing properties' in err2)
cant, err3 = try_validate('{}', 'not a schema at all')
assert.eq(cant, None)
assert.true(err3 != None)
`),
},
{
name: `validate: draft-7 schema via $schema`,
script: itn.HereDoc(`
load('json', 'validate')
schema = '{"$schema":"http://json-schema.org/draft-07/schema#","type":"string"}'
assert.eq(validate('"hello"', schema), None)
`),
},
{
name: `validate: external file $ref is blocked`,
script: itn.HereDoc(`
load('json', 'validate')
validate('{}', '{"$ref":"file:///etc/passwd"}')
`),
wantErr: `not allowed`,
},
{
name: `validate: external http $ref is blocked`,
script: itn.HereDoc(`
load('json', 'validate')
validate('{}', '{"$ref":"http://example.com/s.json"}')
`),
wantErr: `not allowed`,
},
{
name: `validate: malformed data text cannot run`,
script: itn.HereDoc(`
load('json', 'validate')
validate('{not json', '{"type":"object"}')
`),
wantErr: `invalid data`,
},
{
name: `validate: bad schema cannot run`,
script: itn.HereDoc(`
load('json', 'validate')
validate('{}', '{"type":"nope"}')
`),
wantErr: `invalid schema`,
},
{
name: `validate: long violation list is capped`,
script: itn.HereDoc(`
load('json', 'try_validate')
schema = '{"type":"array","items":{"type":"integer"}}'
data = '["a","b","c","d","e","f","g","h","i","j","k","l"]'
ok, err = try_validate(data, schema)
assert.eq(ok, False)
assert.true('and' in err and 'more' in err)
`),
},
{
name: `validate: compiled schema cache hit`,
script: itn.HereDoc(`
load('json', 'validate')
schema = '{"type":"integer"}'
assert.eq(validate('1', schema), None)
assert.eq(validate('2', schema), None)
`),
},
{
name: `validate: missing arguments`,
script: itn.HereDoc(`
load('json', 'validate')
validate('{}')
`),
wantErr: `json.validate: missing argument for schema`,
},
{
name: `try_validate: missing arguments`,
script: itn.HereDoc(`
load('json', 'try_validate')
v, err = try_validate()
assert.eq(v, None)
assert.true('missing argument' in err)
`),
},
{
name: `validate: unserializable data value cannot run`,
script: itn.HereDoc(`
load('json', 'validate')
validate(lambda x: x, '{"type":"object"}')
`),
wantErr: `json.validate:`,
},
{
name: `validate: many distinct schemas exercise cache eviction`,
script: itn.HereDoc(`
load('json', 'validate')
def churn():
    for i in range(70):
        schema = '{"type":"object","maxProperties":' + str(i + 1) + '}'
        assert.eq(validate('{}', schema), None)
churn()
`),
},
{
name: `robustness: self-referential $ref errors, no panic`,
script: itn.HereDoc(`
load('json', 'try_validate')
ok, err = try_validate('{"a":{"a":{"a":{}}}}', '{"$ref":"#"}')
assert.true(ok == None or ok == True or ok == False)
`),
},
{
name: `robustness: invalid pattern regex errors, no panic`,
script: itn.HereDoc(`
load('json', 'validate')
validate('"x"', '{"type":"string","pattern":"("}')
`),
wantErr: `invalid schema`,
},
{
name: `robustness: deeply nested data validates, no panic`,
script: itn.HereDoc(`
load('json', 'validate')
deep = '[' * 200 + ']' * 200
assert.eq(validate(deep, '{}'), None)
`),
},
{
name: `robustness: uniqueItems over objects, no panic`,
script: itn.HereDoc(`
load('json', 'try_validate')
ok, err = try_validate('[{"a":1},{"a":1}]', '{"type":"array","uniqueItems":true}')
assert.eq(ok, False)
assert.true(err != None)
`),
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
res, err := itn.ExecModuleWithErrorTest(t, json.ModuleName, json.LoadModule, tt.script, tt.wantErr, nil)
if (err != nil) != (tt.wantErr != "") {
t.Errorf("json(%q) expects error = '%v', actual error = '%v', result = %v", tt.name, tt.wantErr, err, res)
}
})
}
}