kg-extract-agent is a single-pass extractor, not an iterative agent — rename or restructure #911

@aviyashchin

Observation

The processor at trustgraph-flow/trustgraph/extract/kg/agent/extract.py is named kg-extract-agent, which suggests an iterative ReAct-style extraction agent (gather → analyze → refine → re-extract).

In practice, the implementation is a single LLM call per chunk with the same shape as kg-extract-definitions and kg-extract-relationships. There's no iteration, no refinement loop, no tool use. It's a third single-pass extractor with a more generic prompt.
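To make the shape concrete, here is a minimal sketch of what "single LLM call per chunk" means. It is not the code in extract.py; `extract_single_pass` and the `llm_extract` callable are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def extract_single_pass(
    chunks: Iterable[str],
    llm_extract: Callable[[str], list[Triple]],  # hypothetical LLM wrapper
) -> list[Triple]:
    """One LLM call per chunk: no loop, no refinement, no tool use."""
    triples: list[Triple] = []
    for chunk in chunks:
        # Exactly one call per chunk; the output is never fed back in.
        triples.extend(llm_extract(chunk))
    return triples
```

Swapping the prompt behind `llm_extract` changes what gets extracted, but not the control flow, which is the point of the observation above.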

Why this matters

When tuning extraction quality, operators (us included) read the name kg-extract-agent and assume it is the heavy-hitting "smart" extractor that outperforms the single-pass ones. We then waste time wiring it in, expecting big yield gains. It doesn't deliver them: it adds a parallel per-chunk LLM cost for marginal yield.

The actual ReAct agent code lives in trustgraph-flow/trustgraph/agent/react/, which is the retrieval-side tool harness used by tg-invoke-agent. The extraction-side "agent" is a different component whose name overlaps with it.

Proposal

One of:

  1. Rename to kg-extract-generic or kg-extract-comprehensive to remove the iteration-loop implication. Document explicitly: "single-pass extractor with a generic prompt; equivalent shape to definitions/relationships extractors."
  2. Promote it to an actually iterative extractor. Implement a gather → extract → refine → re-extract loop (pick out weakly connected entities, fetch their neighborhoods, re-extract). This is the iterative extraction pattern described in the research literature, and it would be a real differentiator over LangChain-style single-pass extraction.
  3. Document the current shape clearly in docs.trustgraph.ai/guides/ontology-rag/ so operators don't expect what's not there.
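
For option 2, the loop could look roughly like the sketch below. Everything here is an assumption for discussion: `llm_extract`, `fetch_neighborhood`, the degree-based definition of "weakly connected", and the `max_rounds` cap are hypothetical, not existing TrustGraph APIs.

```python
from collections import Counter
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def weakly_connected(triples: set[Triple], min_degree: int = 2) -> set[str]:
    """Entities that appear in fewer than min_degree triples (assumed heuristic)."""
    degree: Counter[str] = Counter()
    for subj, _, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
    return {entity for entity, d in degree.items() if d < min_degree}


def extract_iterative(
    chunk: str,
    llm_extract: Callable[[str, str], list[Triple]],  # hypothetical: (chunk, context)
    fetch_neighborhood: Callable[[str], str],         # hypothetical graph lookup
    max_rounds: int = 3,
) -> set[Triple]:
    """Extract, find weak spots, gather context for them, re-extract."""
    triples = set(llm_extract(chunk, ""))  # initial pass, no extra context
    for _ in range(max_rounds):
        weak = weakly_connected(triples)
        if not weak:
            break  # every entity is well connected; nothing to refine
        context = "\n".join(fetch_neighborhood(e) for e in sorted(weak))
        new = set(llm_extract(chunk, context)) - triples
        if not new:
            break  # re-extraction found nothing new; stop paying for calls
        triples |= new
    return triples
```

The two early exits and the `max_rounds` cap bound the extra LLM cost per chunk, which matters given that cost versus yield is the complaint with the current processor.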

Option 1 is cheap and unambiguous. Option 2 is a lot of work but a real product win. Either beats the status quo.

Stack

TrustGraph 2.3.21.
