
feat(gfql): enrich GFQLSchemaError context with available_labels/relationship_types/properties + closest_match #1634

Description

@lmeyerov

Background

PR #1457 landed gfql_validate() with structured GFQLValidationError / GFQLSchemaError exceptions whose .to_dict() returns {code, message, field, value, suggestion, operation_index}. That error structure is LLM-parseable.

But:

  • suggestion is often None for schema errors (e.g., unknown label, unknown relationship type, unknown property)
  • Errors do not include available_labels / available_relationship_types / available_properties[label] in the error context, despite GraphSchema.node_columns_by_label / edge_columns_by_type already exposing them
  • No closest_match / fuzzy suggestion (e.g., Levenshtein distance or difflib.get_close_matches) for typos
  • collect_all=False is the default, forcing AI-synthesizer agents to round-trip per error
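Because GraphSchema already exposes per-label and per-type column maps, the proposed catalog fields could be derived from them directly. A minimal sketch, assuming both maps are plain dicts from a label (or relationship type) to a list of column names; the real GraphSchema attribute types may differ, and schema_catalog is a hypothetical helper name:

```python
from typing import Any, Dict, Mapping, Sequence


def schema_catalog(
    node_columns_by_label: Mapping[str, Sequence[str]],
    edge_columns_by_type: Mapping[str, Sequence[str]],
) -> Dict[str, Any]:
    """Flatten per-label / per-type column maps into the proposed context fields.

    Hypothetical helper: assumes both inputs map a name to its column names.
    """
    return {
        # Dict iteration preserves insertion (declaration) order.
        "available_labels": list(node_columns_by_label),
        "available_relationship_types": list(edge_columns_by_type),
        # Per-label property map, as proposed for property errors.
        "available_properties": {
            label: list(cols) for label, cols in node_columns_by_label.items()
        },
    }
```

Declaration order is kept rather than sorted so the output mirrors the example shapes in this issue; sorting would instead make payloads stable across schema rebuilds.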

For an AI query synthesizer producing GFQL via an LLM, the difference between unknown label 'Persn' (current) and unknown label 'Persn'; available: ['Person', 'Company', 'Project']; closest match: 'Person' (proposed) is the difference between 5 round-trips of guessing and 1 successful repair.
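difflib.get_close_matches scores each candidate with SequenceMatcher.ratio(), which is 2*M/T (M matched characters, T total characters in both strings), and keeps candidates scoring at or above cutoff. A quick stdlib-only check of the example typo against the example labels:

```python
from difflib import SequenceMatcher, get_close_matches

labels = ["Person", "Company", "Project"]
typo = "Persn"

# Print each candidate's similarity score; only "Person" clears 0.6.
for candidate in labels:
    score = SequenceMatcher(None, typo, candidate).ratio()
    print(f"{candidate}: {score:.2f}")

# Same parameters as proposed in Scope: n=1, cutoff=0.6.
print(get_close_matches(typo, labels, n=1, cutoff=0.6))  # ['Person']
```

Matching is case-sensitive; normalizing case before matching would be a possible refinement, not something this issue proposes.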

Goal

Enrich GFQLSchemaError.context (and equivalents) with the schema catalog the validator already has access to, so consumers (especially LLM agents) get actionable repair information on the first failure.

Scope

  • Extend GFQLSchemaError.context (and per-error-type context) with:
    • available_labels: list[str] when the error involves a label
    • available_relationship_types: list[str] when the error involves a relationship
    • available_properties: dict[str, list[str]] (per-label property map) when the error involves a property
    • closest_match: Optional[str] populated via stdlib difflib.get_close_matches(name, available, n=1, cutoff=0.6) when applicable
  • Update .to_dict() serialization to include the new context fields
  • Anchored regression tests for each enrichment case (unknown label, unknown relationship, unknown property, typo→closest_match)
  • Document the enriched error shape in docstrings + RTD
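One way to carry the new fields without touching codes or messages is a context dict that .to_dict() merges into the flat payload. A sketch of that shape; SchemaErrorSketch is a hypothetical stand-in, not the actual GFQLSchemaError, whose constructor and attributes may differ:

```python
from typing import Any, Dict, Optional


class SchemaErrorSketch(Exception):
    """Hypothetical stand-in for GFQLSchemaError (not the real class)."""

    def __init__(
        self,
        code: str,
        message: str,
        field: Optional[str] = None,
        value: Any = None,
        suggestion: Optional[str] = None,
        operation_index: Optional[int] = None,
        context: Optional[Dict[str, Any]] = None,
    ) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.field = field
        self.value = value
        self.suggestion = suggestion
        self.operation_index = operation_index
        # Additive enrichment: available_* catalogs and closest_match.
        self.context: Dict[str, Any] = dict(context or {})

    def to_dict(self) -> Dict[str, Any]:
        out: Dict[str, Any] = {
            "code": self.code,
            "message": self.message,
            "field": self.field,
            "value": self.value,
            "suggestion": self.suggestion,
            "operation_index": self.operation_index,
        }
        # Merge context at top level without overwriting the core keys.
        for key, val in self.context.items():
            out.setdefault(key, val)
        return out
```

Using setdefault keeps context from clobbering existing keys, in line with the additive-only constraint under Non-scope.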

Non-scope

  • No change to collect_all default (separate ergonomics improvement, may follow as gfql_validate_all() shorthand)
  • No fuzzy matching beyond stdlib difflib (no rapidfuzz dependency)
  • No changes to error codes or message strings (additive context only)
  • No changes to Cypher binder error path beyond context enrichment (separate concern from message wording)

Suggested context shape

{
  "code": "UNKNOWN_LABEL",
  "message": "unknown label 'Persn' in MATCH clause",
  "field": "label",
  "value": "Persn",
  "operation_index": 0,
  "suggestion": None,  # existing
  # NEW:
  "available_labels": ["Person", "Company", "Project"],
  "closest_match": "Person",
}
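The label case can be assembled from the validator's label list plus difflib. A sketch that reproduces the suggested payload above; unknown_label_payload is a hypothetical helper, and its message template is copied from the example rather than from the real validator:

```python
from difflib import get_close_matches
from typing import Any, Dict, List


def unknown_label_payload(
    label: str, available_labels: List[str], operation_index: int = 0
) -> Dict[str, Any]:
    """Build the proposed UNKNOWN_LABEL payload (hypothetical helper)."""
    # Best single candidate at the proposed cutoff, or None if nothing is close.
    matches = get_close_matches(label, available_labels, n=1, cutoff=0.6)
    return {
        "code": "UNKNOWN_LABEL",
        "message": f"unknown label '{label}' in MATCH clause",
        "field": "label",
        "value": label,
        "operation_index": operation_index,
        "suggestion": None,  # existing field, unchanged
        "available_labels": list(available_labels),
        "closest_match": matches[0] if matches else None,
    }
```

An unknown-relationship payload would follow the same pattern, swapping in available_relationship_types.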

For property errors:

{
  "code": "UNKNOWN_PROPERTY",
  "message": "unknown property 'naem' on label 'Person'",
  "field": "property",
  "value": "naem",
  "operation_index": 0,
  # NEW:
  "available_properties": {"Person": ["id", "name", "age"]},
  "closest_match": "name",
}
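For property errors, fuzzy matching should run against the properties of the offending label only, and the example payload likewise scopes available_properties to that label. A sketch under that assumption; unknown_property_payload is a hypothetical helper and properties_by_label is assumed to be a plain dict of label to column names:

```python
from difflib import get_close_matches
from typing import Any, Dict, List, Mapping


def unknown_property_payload(
    prop: str,
    label: str,
    properties_by_label: Mapping[str, List[str]],
    operation_index: int = 0,
) -> Dict[str, Any]:
    """Build the proposed UNKNOWN_PROPERTY payload (hypothetical helper)."""
    # Only this label's properties are candidates; an unknown label yields [].
    candidates = list(properties_by_label.get(label, []))
    matches = get_close_matches(prop, candidates, n=1, cutoff=0.6)
    return {
        "code": "UNKNOWN_PROPERTY",
        "message": f"unknown property '{prop}' on label '{label}'",
        "field": "property",
        "value": prop,
        "operation_index": operation_index,
        "available_properties": {label: candidates},
        "closest_match": matches[0] if matches else None,
    }
```

Scoping the map to one label keeps the payload small for an LLM context window; including the full per-label map would be the alternative if cross-label hints prove useful.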

Acceptance

  • Unknown-label error includes available_labels + closest_match
  • Unknown-relationship error includes available_relationship_types + closest_match
  • Unknown-property error includes available_properties (per-label) + closest_match
  • closest_match returns None when no close-enough candidate exists (cutoff=0.6)
  • Anchored regression tests cover each enrichment case
  • .to_dict() includes the new fields
  • Docs reflect the enriched shape
  • Experimental marking preserved per #1457
  • Compiler-plan surface touched: no
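The closest_match acceptance items reduce to a few anchored cases on the matching policy. A pytest-style sketch that tests a hypothetical closest_match helper in isolation; the real regression tests would presumably drive gfql_validate() against a fixture schema and assert on .to_dict(), and that fixture setup is not specified here:

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def closest_match(name: str, available: Sequence[str], cutoff: float = 0.6) -> Optional[str]:
    """Hypothetical helper: best fuzzy candidate, or None below the cutoff."""
    matches = get_close_matches(name, list(available), n=1, cutoff=cutoff)
    return matches[0] if matches else None


LABELS = ["Person", "Company", "Project"]
REL_TYPES = ["WORKS_AT", "KNOWS"]
PROPS = {"Person": ["id", "name", "age"]}


def test_label_typo_suggests_closest() -> None:
    assert closest_match("Persn", LABELS) == "Person"


def test_relationship_typo_suggests_closest() -> None:
    assert closest_match("WORKSAT", REL_TYPES) == "WORKS_AT"


def test_property_typo_suggests_closest() -> None:
    assert closest_match("naem", PROPS["Person"]) == "name"


def test_no_close_candidate_returns_none() -> None:
    # Nothing reaches cutoff=0.6, so closest_match must be None.
    assert closest_match("Zzz", LABELS) is None
```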

Cross-refs

  • Landed in #1457 / #1337
  • Surfaced by AI-synthesizer user-testing 2026-05-25 (P1 finding)
  • Related to existing pygraphistry#1338 (inference): once inference lands, "unknown label" errors may become less frequent because more labels will be auto-discovered, but this enrichment still helps in declared strict mode and in cases where inference disagrees
  • Metaissue: #1058, #1046
  • Downstream consumer: AI query synthesizers (graphistrygpt, etc.) need first-try-valid query generation; this enrichment is the difference between guessing and structured repair

Effort

Small-to-medium (~100 prod LOC + ~80 tests). No new dependencies (stdlib difflib). Self-contained, focused on the validator's existing error-context surface.
