feature - RFC 106: incan_codegraph code-index export for agentic DX #573

@dannymeijer

Area

  • Tooling (CLI/formatter/test runner)
  • Editor integration (LSP/VS Code extension)
  • Documentation
  • Architecture-advice tooling
  • Agentic DX / MCP integration
  • Risk/process intelligence

Status

Target this for the v0.4 line. Do not pull this into v0.3.

Related draft RFC: RFC 106 (Compiler-backed agent context graph).

Problem statement

Agents and maintainers currently have to reconstruct Incan code relationships from source searches, compiler internals, RFCs, and docs. That makes impact analysis expensive for questions such as “what calls this?”, “what code depends on this language surface?”, “which docs/tests mention this module?”, “which architecture rule produced this finding?”, or “which area should be inspected first before a risky edit?”. External CodeGraph-style tools are promising, but Incan is not just another tree-sitter syntax: the authoritative facts only exist once the Incan compiler pipeline has run parsing, import resolution, and typechecking, and they live in its checked API metadata, diagnostics, stdlib/package metadata, and future architecture-advice analysis.

This should not be confused with RFC 047 / std.graph, which is the runtime graph data-structure surface. This issue tracks code-intelligence graph facts for tooling, IDE support, architecture advice, process-risk evidence, and agentic DX.

Proposed solution

Build an Incan-owned compiler-backed agent context graph. The first implementation can keep the existing incan_codegraph name, but the product shape is broader than a raw code-index export:

  • Add/extend crates/incan_codegraph as a dependency-light schema/helper crate for nodes, edges, stable ids, spans/ranges, schema versioning, provenance, JSON/JSONL serialization, and compact context helpers.
  • Add a CLI export command, likely under incan tools codegraph export initially, with the RFC deciding whether the stable spelling becomes agent-graph, context-graph, codegraph, or similar.
  • Export facts such as files/modules, declarations, imports, containment, public API members, source spans, diagnostics, stdlib/package metadata, references/calls where available, and generated/artifact metadata where available.
  • Add Incan-specific body/advisory facts where available: call_site, reference_site, match_dispatch, dispatch/pattern metadata, declaration metadata annotations, and derived architecture findings with evidence links back to source facts.
  • Add optional process-risk facts where available: churn, ownership/congestion, co-change/hidden coupling, coverage gaps, stale decisions, trend snapshots, and other local deterministic evidence.
  • Support strict checked export and tolerant --allow-errors export. Tolerant export should preserve parseable facts and diagnostics while clearly marking unchecked or syntax-derived provenance.
  • Make LSP/editor features, CLI export, MCP context tools, and architecture-advice tooling share the same graph fact model instead of creating separate source extractors that drift.
  • Add task-ranked context packing as the real agent-facing surface: one call should return a compact, deterministic, token-budgeted context pack for a concrete task.
  • Add workspaces/codegraph/ docs/examples/config showing how external CodeGraph/Knowing/Repowise-style importers can consume or compare against the exported facts.
  • Keep external graph storage, embeddings, SurrealDB, hosted services, and direct CodeGraph/Knowing/Repowise dependencies out of Incan core/compiler crates.

The intended integration path is “Incan exports authoritative facts; external tools ingest or compare against them,” not “an external graph tool reparses Incan and becomes the semantic source of truth.”
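
To make the fact model above more concrete, here is a minimal sketch of what node and edge records with stable ids, provenance, and deterministic JSONL output could look like. It is written in Python purely for illustration; the real incan_codegraph crate would presumably be Rust, and every name here (Span, Node, Edge, to_jsonl, the "checked" / "syntax_only" provenance values, the schema version string) is an assumption, not the RFC's schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

SCHEMA_VERSION = "0.1-sketch"  # hypothetical version string


@dataclass(frozen=True)
class Span:
    file: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Node:
    kind: str            # e.g. "module", "function" (illustrative kinds)
    qualified_name: str
    span: Span
    provenance: str      # assumed values: "checked" or "syntax_only"

    @property
    def id(self) -> str:
        # The id depends only on kind + qualified name, so it survives
        # edits that merely move the declaration within or across files.
        key = f"{self.kind}\x00{self.qualified_name}".encode()
        return f"{self.kind}:{hashlib.sha256(key).hexdigest()[:12]}"


@dataclass(frozen=True)
class Edge:
    kind: str            # e.g. "contains", "imports"
    source: str          # node id
    target: str          # node id
    provenance: str


def to_jsonl(nodes, edges) -> str:
    """Serialize facts as JSONL with sorted records and sorted keys."""
    records = [{"record": "schema", "version": SCHEMA_VERSION}]
    for node in sorted(nodes, key=lambda n: n.id):
        records.append({"record": "node", "id": node.id, **asdict(node)})
    for edge in sorted(edges, key=lambda e: (e.kind, e.source, e.target)):
        records.append({"record": "edge", **asdict(edge)})
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


module = Node("module", "app.main", Span("app/main.incn", 1, 10), "checked")
func = Node("function", "app.main.run", Span("app/main.incn", 3, 8), "syntax_only")
contains = Edge("contains", module.id, func.id, "checked")
print(to_jsonl([module, func], [contains]), end="")
```

Sorting records and keys before writing is one simple way to get byte-identical output regardless of discovery order; an external importer can then diff two exports directly.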

Prior art to account for

  • CodeGraph: local indexed graph queries and MCP ergonomics.
  • Knowing: content-addressed graph snapshots, task-ranked context packing, compact formats, feedback expiry, and integrity/staleness ideas.
  • Repowise: deterministic codebase intelligence across graph, git, docs, decisions, code health, MCP, auto-sync, process metrics, and public health/defect benchmarking. Particularly relevant for Incan architect: use churn, ownership, co-change, coverage, and decision staleness as evidence, while preserving caveats such as file-size confounds.
  • Aider repo map: token-budgeted summaries and graph ranking, but too file-level/coarse for Incan’s compiler facts.
  • codebase-memory and GitNexus: persistent MCP graph workflows, impact tools, stale-index guidance, and agent instructions.
  • SCIP/LSIF/LSP: code-intelligence interchange and editor navigation lineage.
  • Repomix-style full repo packing: useful fallback baseline, but too blunt as the default agent path.

Alternatives considered

  • Put this in std.graph / stdlib: rejected because RFC 047 owns runtime graph structures for generated programs, while this feature is tooling metadata.
  • Put everything in incan_core: rejected because incan_core should remain dependency-light semantic policy, not tool export/storage plumbing.
  • Start with tree-sitter-Incan support in external tools: useful later for editor-grade parsing, but weaker as a first step because syntax alone cannot represent imports, aliases, stdlib activations, checked metadata, diagnostics, or typechecked relationships.
  • Fork CodeGraph first: possible as an integration experiment, but it risks shaping Incan around another project’s parser/storage assumptions before Incan has a stable fact export contract.
  • Adopt Repowise-style health scoring directly: useful prior art, but not a default because Incan should tie risk signals to compiler-backed declarations, dispatches, packages, and architecture findings rather than only file-level tree-sitter metrics. Any predictive claim needs an Incan-specific benchmark.
  • Use only LSP: rejected because LSP is live/editor-oriented and does not solve durable graph snapshots, task context packing, feedback expiry, or offline graph interchange.

Scope / acceptance criteria

In scope for the v0.4 design/implementation slice:

  • RFC 106 drafted and accepted before widening the prototype into public behavior.
  • incan_codegraph crate with documented schema types, provenance, source ranges, stable ids, and serialization.
  • A CLI export path that produces deterministic JSON/JSONL for at least files/modules, top-level declarations, imports, containment, diagnostics, and source spans.
  • Directory input support that recursively discovers .incn files.
  • Tolerant export mode for work-in-progress packages, with diagnostics included as graph records and unchecked facts clearly marked.
  • Focused tests for schema serialization and CLI output shape.
  • At least one multi-file/import fixture and one tolerant/broken-source fixture.
  • Docs explaining the distinction from RFC 047 std.graph, the intended external ingestion path, and initial limitations.
  • Design notes for LSP consumption and architecture-advice consumption using the same fact model.
  • Schema hooks for optional process-risk facts with explicit provenance and evidence.
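
The directory-input criterion above can be sketched roughly as follows. This is only an illustration in Python of recursive .incn discovery with a deterministic ordering; the function name discover_incn_files and the choice of root-relative POSIX paths are assumptions, not decisions made in this issue.

```python
import tempfile
from pathlib import Path


def discover_incn_files(root: Path) -> list[str]:
    """Return root-relative POSIX paths of all .incn files, sorted."""
    found = (p for p in root.rglob("*.incn") if p.is_file())
    # Sorting makes the result independent of filesystem iteration order.
    return sorted(p.relative_to(root).as_posix() for p in found)


# Small throwaway project layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg" / "nested").mkdir(parents=True)
    (root / "main.incn").write_text("")
    (root / "pkg" / "b.incn").write_text("")
    (root / "pkg" / "nested" / "a.incn").write_text("")
    (root / "README.md").write_text("")  # not an Incan source, ignored
    discovered = discover_incn_files(root)

print(discovered)
```

Emitting root-relative paths keeps the exported document free of machine-specific prefixes, which helps when comparing exports produced on different checkouts.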

Stretch scope after the baseline export is stable:

  • Reference/call/body facts such as call_site, reference_site, and match_dispatch.
  • Derived architecture findings with evidence links to graph facts.
  • Deterministic process-risk facts: churn, ownership/congestion, hidden coupling/co-change, coverage gaps, stale decisions, and trend snapshots.
  • Task-ranked compact context packing for agents.
  • MCP tools/resources for graph summary, neighbors, stale status, risk/context triage, and context_for_task.
  • External importer/comparison experiments for CodeGraph/Knowing/Repowise-style stores.
  • A small Incan-specific benchmark for context quality and risk-signal usefulness, including at least one size-control check before making predictive claims.
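
As a rough illustration of the task-ranked, token-budgeted context packing item above, the sketch below greedily selects the highest-scoring facts that fit a budget, with a deterministic tie-break on id. It is Python for illustration only; Fact, estimate_tokens, pack_context, the relevance scores, and the four-characters-per-token estimate are all assumptions rather than anything specified by the RFC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    id: str
    text: str
    score: float  # hypothetical task-relevance score, higher is better


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly four characters per token, minimum one.
    return max(1, (len(text) + 3) // 4)


def pack_context(facts, budget: int) -> list[Fact]:
    """Greedy packing: best score first, ties broken by id, skip what overflows."""
    ranked = sorted(facts, key=lambda f: (-f.score, f.id))
    packed, used = [], 0
    for fact in ranked:
        cost = estimate_tokens(fact.text)
        if used + cost <= budget:
            packed.append(fact)
            used += cost
    return packed


facts = [
    Fact("a", "fn run()", 0.9),   # 8 characters, about 2 tokens
    Fact("b", "x" * 40, 0.8),     # 40 characters, about 10 tokens
    Fact("c", "mod.", 0.5),       # 4 characters, about 1 token
]
pack = pack_context(facts, budget=5)
print([f.id for f in pack])
```

Greedy selection is not optimal in the knapsack sense, but it is cheap and, with the explicit tie-break, returns the same pack for the same inputs regardless of input order.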

Out of scope for the first slice:

  • Embeddings, vector search, hosted indexing, or remote model inference.
  • Direct SurrealDB/Neo4j/KuzuDB/CodeGraph/Knowing/Repowise dependency inside Incan compiler crates.
  • Tree-sitter-Incan grammar work.
  • Full semantic call graph precision for every expression form.
  • Defect-prediction claims without an Incan-specific benchmark and visible caveats.
  • Runtime stdlib APIs.
  • Shipping this in v0.3.

Done when:

  • incan can export a stable, deterministic codegraph fact document for a small Incan project and for the stdlib directory.
  • Tests cover the exported shape, a multi-file/import scenario, and tolerant export behavior.
  • Docs and RFC text make the agent-context, LSP, architecture-advice, process-risk, and external ingestion boundaries explicit.
  • The implementation can be parked or advanced for v0.4 without misleading v0.3 release notes.
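
The strict versus tolerant export behavior that the tests need to cover could look something like the sketch below. It is a Python illustration under stated assumptions: export_records, the allow_errors parameter (mirroring the proposed --allow-errors flag), the record shapes, and the provenance labels are hypothetical, and toy_check is a stand-in for the real compiler front end, operating on placeholder text rather than real Incan syntax.

```python
from typing import Callable


def export_records(
    sources: dict[str, str],
    check: Callable[[str], list[str]],
    allow_errors: bool = False,
) -> list[dict]:
    """Emit file and diagnostic records; strict mode refuses broken input."""
    records = []
    for path in sorted(sources):  # sorted for deterministic output
        diagnostics = check(sources[path])
        if diagnostics and not allow_errors:
            raise ValueError(f"{path}: {len(diagnostics)} diagnostic(s) in strict mode")
        # Files with diagnostics are kept but marked as not fully checked.
        provenance = "syntax_only" if diagnostics else "checked"
        records.append({"record": "file", "path": path, "provenance": provenance})
        for message in diagnostics:
            records.append({"record": "diagnostic", "path": path, "message": message})
    return records


def toy_check(text: str) -> list[str]:
    # Stand-in checker: only reports unbalanced parentheses.
    return [] if text.count("(") == text.count(")") else ["unbalanced parentheses"]


sources = {"ok.incn": "main()", "broken.incn": "main("}
tolerant = export_records(sources, toy_check, allow_errors=True)
for record in tolerant:
    print(record)
```

In this sketch the strict call fails on the first file with diagnostics, while the tolerant call keeps every file, carries diagnostics as graph records, and marks the affected file's provenance so consumers can tell checked facts from syntax-derived ones.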

Labels

  • documentation
  • editor integration
  • feature
  • tooling