Background
PR #1457 landed gfql_validate() with structured GFQLValidationError / GFQLSchemaError exceptions that .to_dict() to {code, message, field, value, suggestion, operation_index}. The error structure is LLM-parseable.
But:
- suggestion is often None for schema errors (e.g., unknown label, unknown relationship type, unknown property)
- Errors do not include available_labels / available_relationship_types / available_properties[label] in the error context, despite GraphSchema.node_columns_by_label / edge_columns_by_type already exposing them
- No closest_match / fuzzy-suggestion (Levenshtein or difflib.get_close_matches) for typos
- collect_all=False is the default, forcing AI-synthesizer agents to round-trip per error
For an AI query synthesizer producing GFQL via LLM, the difference between unknown label 'Persn' (current) and unknown label 'Persn' — available: ['Person', 'Company', 'Project']; closest match: 'Person' (proposed) is the difference between 5 round-trips of guessing and 1 successful repair.
Goal
Enrich GFQLSchemaError.context (and equivalents) with the schema catalog the validator already has access to, so consumers (especially LLM agents) get actionable repair information on the first failure.
Scope
- Extend GFQLSchemaError.context (and per-error-type context) with:
  - available_labels: list[str] when the error involves a label
  - available_relationship_types: list[str] when the error involves a relationship
  - available_properties: dict[str, list[str]] (per-label property map) when the error involves a property
  - closest_match: Optional[str] populated via stdlib difflib.get_close_matches(name, available, n=1, cutoff=0.6) when applicable
- Update .to_dict() serialization to include the new context fields
- Anchored regression tests for each enrichment case (unknown label, unknown relationship, unknown property, typo→closest_match)
- Document the enriched error shape in docstrings + RTD
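
A minimal sketch of how the closest_match and available_labels fields could be computed, assuming the validator can hand over the list of known labels. The helper names closest_match and label_error_context are hypothetical and not part of the existing API; the only real call here is the stdlib difflib.get_close_matches.

```python
from difflib import get_close_matches
from typing import Any, Dict, List, Optional


def closest_match(name: str, available: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the best fuzzy candidate for `name`, or None if nothing clears the cutoff."""
    matches = get_close_matches(name, available, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def label_error_context(label: str, known_labels: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build the additive context for an unknown-label error."""
    return {
        "available_labels": list(known_labels),
        "closest_match": closest_match(label, known_labels),
    }


ctx = label_error_context("Persn", ["Person", "Company", "Project"])
print(ctx["closest_match"])  # Person
```

difflib compares characters exactly, so differences in case lower the similarity score. Whether to normalize case before matching is a design choice this sketch leaves open.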
Non-scope
- No change to collect_all default (separate ergonomics improvement, may follow as gfql_validate_all() shorthand)
- No fuzzy matching beyond stdlib difflib (no rapidfuzz dependency)
- No changes to error codes or message strings (additive context only)
- No changes to Cypher binder error path beyond context enrichment (separate concern from message wording)
Suggested context shape
{
    "code": "UNKNOWN_LABEL",
    "message": "unknown label 'Persn' in MATCH clause",
    "field": "label",
    "value": "Persn",
    "operation_index": 0,
    "suggestion": None,  # existing
    # NEW:
    "available_labels": ["Person", "Company", "Project"],
    "closest_match": "Person",
}
For property errors:
{
    "code": "UNKNOWN_PROPERTY",
    "message": "unknown property 'naem' on label 'Person'",
    "field": "property",
    "value": "naem",
    "operation_index": 0,
    # NEW:
    "available_properties": {"Person": ["id", "name", "age"]},
    "closest_match": "name",
}
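
To illustrate why the enriched shape helps a downstream consumer, here is a rough sketch of how an agent might turn one of these dicts into a repair instruction without a second round-trip. The functions candidates_from_error and repair_hint are hypothetical consumer-side helpers, and the sketch assumes the new fields appear at the top level of the .to_dict() output as in the shapes proposed here.

```python
from typing import Any, Dict, List, Optional


def candidates_from_error(err: Dict[str, Any]) -> List[str]:
    """Flatten whichever available_* field is present into one candidate list."""
    if "available_labels" in err:
        return list(err["available_labels"])
    if "available_relationship_types" in err:
        return list(err["available_relationship_types"])
    props = err.get("available_properties", {})
    return [p for names in props.values() for p in names]


def repair_hint(err: Dict[str, Any]) -> str:
    """Hypothetical helper: turn an enriched error dict into a repair instruction."""
    bad = err.get("value")
    best: Optional[str] = err.get("closest_match")
    if best is not None:
        return f"Replace {bad!r} with {best!r}."
    options = candidates_from_error(err)
    if options:
        return f"{bad!r} is not valid; choose one of {options}."
    return f"{bad!r} is not valid; no candidates were provided."


property_error = {
    "code": "UNKNOWN_PROPERTY",
    "field": "property",
    "value": "naem",
    "available_properties": {"Person": ["id", "name", "age"]},
    "closest_match": "name",
}
print(repair_hint(property_error))  # Replace 'naem' with 'name'.
```

When closest_match is None, the helper falls back to listing the catalog, which is the case the available_* fields are meant to cover.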
Acceptance
- Unknown-label error includes available_labels + closest_match
- Unknown-relationship error includes available_relationship_types + closest_match
- Unknown-property error includes available_properties (per-label) + closest_match
- closest_match returns None when no close-enough candidate exists (cutoff=0.6)
- Anchored regression tests cover each enrichment case
- .to_dict() includes the new fields
- Docs reflect the enriched shape
- Experimental marking preserved per #1457
- Compiler-plan surface touched: no
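
The acceptance items could be encoded as a shared assertion helper that each regression test calls on the serialized error. This is only a sketch: check_enriched and EXPECTED_KEY are hypothetical names, the "relationship" field value is a guess (only "label" and "property" appear in the suggested shapes), and the dicts below are hand-written stand-ins rather than real validator output.

```python
from typing import Any, Dict

# Assumed mapping from the error's "field" to the context key that should accompany it.
EXPECTED_KEY = {
    "label": "available_labels",
    "relationship": "available_relationship_types",  # field name is a guess
    "property": "available_properties",
}


def check_enriched(err: Dict[str, Any]) -> None:
    """Assert that a serialized schema error carries the proposed enrichment."""
    key = EXPECTED_KEY[err["field"]]
    assert key in err, f"missing {key}"
    assert "closest_match" in err, "missing closest_match"
    available = err[key]
    if isinstance(available, dict):
        # Per-label property map: flatten to a single candidate list.
        candidates = [p for names in available.values() for p in names]
    else:
        candidates = list(available)
    match = err["closest_match"]
    # closest_match must be None or one of the advertised candidates.
    assert match is None or match in candidates, f"{match!r} not in candidates"


# Stand-in dicts shaped like the proposed .to_dict() output.
check_enriched({
    "field": "label",
    "value": "Persn",
    "available_labels": ["Person", "Company", "Project"],
    "closest_match": "Person",
})
check_enriched({
    "field": "property",
    "value": "zzz",
    "available_properties": {"Person": ["id", "name", "age"]},
    "closest_match": None,
})
```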
Cross-refs
- Landed in #1457 / #1337
- Surfaced by AI-synthesizer user-testing 2026-05-25 (P1 finding)
- Related to existing pygraphistry#1338 (inference): once inference lands, "unknown label" errors may decrease because more labels are auto-discovered; this enrichment still helps for declared-strict-mode and inference-disagreement cases
- Metaissue: #1058, #1046
- Downstream consumer: AI query synthesizers (graphistrygpt etc) need first-try-valid query generation; this enrichment is the difference between guessing and structured repair
Effort
Small-to-medium (~100 prod LOC + ~80 tests). No new dependencies (stdlib difflib). Self-contained, focused on the validator's existing error-context surface.