feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback by Marc-cn · Pull Request #137 · kusari-oss/darnit

Marc-cn · 2026-03-25T18:20:55Z

Summary

Extends Darnit into a self-driving agentic orchestrator. Adds a LangGraph state machine that drives the full audit pipeline autonomously, bring-your-own LLM key support for standalone mode, a darnit run CLI command, and a pluggable human feedback mechanism.

Type of Change

Bug fix (non-breaking change fixing an issue)
New feature (non-breaking change adding functionality)
Breaking change (fix or feature causing existing functionality to change)
Documentation update
Refactoring (no functional changes)

Framework Changes Checklist

If this PR modifies the darnit framework (packages/darnit/):

Updated framework spec (openspec/specs/framework-design/spec.md) if behavior changed
Ran uv run python scripts/validate_sync.py --verbose and it passes
Ran uv run python scripts/generate_docs.py and committed any doc changes

Control/TOML Changes Checklist

Not applicable — no controls or TOML modified in this PR.

If this PR modifies controls or TOML configuration:

Control metadata defined in TOML (not Python code)
SARIF fields (description, severity, help_url) included where appropriate
Ran validation to confirm TOML schema compliance

Testing

Tests pass locally (uv run pytest tests/ -v)
Added tests for new functionality (if applicable)
Linting passes (uv run ruff check .)

What was built

LangGraph state machine — darnit/agent/graph.py drives: load context → run checks → collect context → remediate → finish
Bring-your-own LLM — Anthropic, OpenAI, Ollama backends. API key from environment variable, never hardcoded
darnit run CLI command — triggers the full pipeline from the terminal
Pluggable human feedback — interactive (CLI prompts mid-run) and noninteractive (collects questions, prints at end). Auto-detects mode based on whether running in a terminal or CI
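The auto-detection in the last bullet could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: the helper name `detect_feedback_mode` and the use of a `CI` environment variable are hypothetical, since the description only says the mode depends on whether the process runs in a terminal or in CI.

```python
import os
import sys
from typing import Mapping, Optional


def detect_feedback_mode(
    is_tty: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "interactive" or "noninteractive" (hypothetical helper)."""
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    if env is None:
        env = os.environ
    # Assumption: CI systems commonly export CI=true; any non-empty value counts.
    in_ci = bool(env.get("CI"))
    return "interactive" if is_tty and not in_ci else "noninteractive"
```

Passing `is_tty` and `env` explicitly keeps the decision testable without a real terminal.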

Usage

darnit run .
darnit run . --llm-backend openai
darnit run . --feedback interactive
darnit run . --feedback noninteractive

Verified on this repo

62 controls checked: 27 passed, 13 failed (gh CLI not installed), 1 warned
4 plugins discovered: openssf-baseline, example-hygiene, gittuf, reproducibility

Known gaps

Remediate node logs fixes but does not call RemediationExecutor yet
Human feedback answers stored but do not trigger a re-audit of the control

Additional Notes

validate_sync.py and generate_docs.py fail on Windows due to a pre-existing cp1252 encoding issue unrelated to this PR

mlieberman85

PR Review: LangGraph Agentic Orchestrator

Tested end-to-end — darnit run . --feedback noninteractive successfully discovers 2 implementations, checks 62 controls (41 pass, 15 fail, 6 warn), queues feedback questions, and logs remediation candidates. The state machine works.

Bugs

DarnitState.check_results type mismatch (state.py:22): Annotated as list but default_factory=dict. Doesn't crash because run_checks overwrites it, but any code using check_results before that node runs will get a dict.

plugin.py indentation (lines 114, 136, 151, 165): Same issue as #136 — register_controls and the 3 new handler methods are dedented out of the ComplianceImplementation Protocol class. They become orphaned module-level functions.
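The check_results mismatch from the first bug can be reproduced in isolation. This sketch assumes the state class is a dataclass (the review only mentions `default_factory`); `BuggyState`, `FixedState`, and the control ID are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class BuggyState:
    # Annotated as list, but the factory builds a dict. Python does not
    # enforce the annotation at runtime, so no error is raised here.
    check_results: list = field(default_factory=dict)


@dataclass
class FixedState:
    # Annotation and factory agree.
    check_results: list = field(default_factory=list)


buggy = BuggyState()
fixed = FixedState()
# List operations only work on the fixed variant.
fixed.check_results.append({"control_id": "EXAMPLE-01"})
```

Any node that calls `.append()` on `buggy.check_results` before it is overwritten would fail with `AttributeError`, which is the latent problem the review describes.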

Design issues

langgraph is a hard dependency for all CLI commands. cli.py:34 imports darnit_graph at module level, so darnit serve, darnit audit, darnit list all require langgraph installed. This also means uvx darnit run fails unless langgraph happens to be in the environment. Fix: lazy import inside cmd_run() + move langgraph to [project.optional-dependencies] as an agent extra.

darnit_graph compiles at import time (graph.py:226): build_graph() runs as a module-level side effect. Makes testing harder and means any LangGraph initialization failure breaks the entire module.
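One possible shape for the lazy-import fix is sketched below. `load_optional` and `cmd_run_sketch` are hypothetical names, not functions from darnit; the point is only that the optional dependency is resolved when the command runs instead of when the CLI module is imported.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module on demand; return None if its package is not installed."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Swallow the error only when the requested top-level package is
        # missing; any other missing module is a real bug and should surface.
        if exc.name == module_name.split(".")[0]:
            return None
        raise


def cmd_run_sketch(dependency: str = "langgraph") -> int:
    """Hypothetical stand-in for cmd_run(): resolve the extra only when called."""
    module = load_optional(dependency)
    if module is None:
        print(f"{dependency} is not installed; try: pip install darnit[agent]")
        return 1
    # The graph would be built here, per call, rather than at import time.
    return 0
```

With this shape, commands that never reach `cmd_run_sketch` do not need the optional package at all.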

Scope gaps

These are acknowledged in the PR description, but they should either be fixed before merge or have tracking issues opened so they don't get lost:

remediate node is a placeholder — logs what it would fix but doesn't call RemediationExecutor
collect_context doesn't act on answers — human feedback is stored in state but doesn't trigger re-audit
Feedback answers are write-only — answers are collected but never read back by any downstream node

Without these, the darnit run pipeline discovers problems but can't close the loop on any of them. If the intent is to merge now and iterate, please open issues for each so they're tracked.

Minor

run_checks catches all exceptions broadly (except Exception) — bugs in the audit pipeline get swallowed into state.errors
plugin.py, loader.py, detectors.py changes are identical to #136 — should be a shared base PR
No tests in the diff. I wrote 61 tests covering state, feedback, LLM backends, graph nodes, and routing (all pass). Happy to contribute them.

What's good

Clean state machine with clear node separation
LLM backend abstraction is solid (prompt building, response parsing, 3 backends + factory)
Feedback system nicely handles interactive vs CI with auto-detection
Conditional routing logic is simple and correct

Marc-cn · 2026-03-28T20:27:53Z

Fixes pushed:

check_results type mismatch: changed default_factory=dict to default_factory=list
plugin.py indentation: same fix as in #136 (feat: Gittuf plugin — policy checks and commit signing); optional handlers moved out of the Protocol to avoid breaking isinstance checks on existing plugins
Lazy langgraph import: moved import inside cmd_run(), moved langgraph to [project.optional-dependencies] as darnit[agent]
darnit_graph at module level: removed the singleton, build_graph() now called lazily inside cmd_run()

For the scope gaps (remediate placeholder, feedback answers not triggering re-audit, answers write-only), agreed these need tracking. Should I open issues on the main repo or would you prefer to track them differently?
Tests are now in the diff.

Marc-cn · 2026-04-02T15:26:46Z

Merge conflicts resolved and uv.lock regenerated. Opened tracking issues for the three scope gaps:
#144 — remediate node does not call RemediationExecutor
#145 — human feedback answers do not trigger re-audit
#146 — feedback answers are write-only

mlieberman85 · 2026-04-04T17:54:42Z

Review: Rebased & Fixed Test Failures

I've rebased this branch onto upstream/main (resolved 5 merge conflicts) and fixed a test collection failure caused by a top-level langgraph import in graph.py. All 25 PR tests now pass, and the full suite is green (1225/1226 — the 1 failure is a pre-existing upstream spec hash drift).

Fixes applied

Lazy langgraph import — moved from langgraph.graph import END, StateGraph from module-level into build_graph(). Without this, any test importing routing functions from graph.py crashes with ModuleNotFoundError since langgraph is optional.
langgraph dependency group — moved from [attestation] extras to a new [agent] extras group so pip install darnit[agent] works correctly.
Protocol conflict — kept optional handlers as comments (per commit 4358400 "move optional handlers out of Protocol to avoid breaking isinstance checks") rather than adding them as concrete Protocol methods.
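The isinstance concern behind the Protocol decision can be shown with a small sketch. It assumes the plugin protocol is decorated with `runtime_checkable`, which the thread implies but does not show; `NarrowProtocol`, `WideProtocol`, and `register_handlers` are hypothetical names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class NarrowProtocol(Protocol):
    def register_controls(self) -> None: ...


@runtime_checkable
class WideProtocol(Protocol):
    def register_controls(self) -> None: ...

    # A newly added handler that older plugins do not implement.
    def register_handlers(self) -> None: ...


class ExistingPlugin:
    """A plugin written before the extra handler existed."""

    def register_controls(self) -> None:
        pass


# Runtime protocol checks require every protocol member to be present.
narrow_ok = isinstance(ExistingPlugin(), NarrowProtocol)
wide_ok = isinstance(ExistingPlugin(), WideProtocol)
```

Here `narrow_ok` is true and `wide_ok` is false, which is why adding optional handlers as Protocol members would reject plugins that were previously valid.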

Two bugs still present in the code

Bug 1: cmd_run() crashes on error (cli.py:541–548)

If graph.invoke() raises an exception, the except block logs the error but falls through to line 548 which accesses final_state — a variable that was never assigned. This will crash with UnboundLocalError.

try:
    graph = build_graph()
    final_state = graph.invoke(state)
except Exception as e:
    logger.error(f"Agent run failed: {e}")
    # BUG: falls through, final_state is unbound → UnboundLocalError

# line 548 — uses final_state unconditionally
check_results = final_state.get("check_results") or []

Fix: add return 1 in the except block, or initialize final_state = {} before the try.
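A self-contained reproduction of the failure and of the first suggested fix, for reference. `run_buggy`, `run_fixed`, and the `invoke` callable are hypothetical stand-ins for `cmd_run()` and `graph.invoke(state)`.

```python
import logging
from typing import Callable

logger = logging.getLogger("darnit.sketch")


def run_buggy(invoke: Callable[[], dict]) -> int:
    try:
        final_state = invoke()
    except Exception as e:
        logger.error("Agent run failed: %s", e)
        # Falls through: final_state was never assigned.
    check_results = final_state.get("check_results") or []
    return 0 if check_results is not None else 1


def run_fixed(invoke: Callable[[], dict]) -> int:
    try:
        final_state = invoke()
    except Exception as e:
        logger.error("Agent run failed: %s", e)
        return 1  # Stop here so final_state is never read unbound.
    check_results = final_state.get("check_results") or []
    logger.info("%d results collected", len(check_results))
    return 0
```

Returning early keeps the error path and the summary path separate, which is arguably clearer than initializing `final_state = {}` and printing an empty summary.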

Bug 2: collect_context() uses wrong key name (graph.py:122)

run_checks() stores results with key "control_id" (line 94), but collect_context() reads result.get("id") (line 122). This means control_id will always be "unknown" in feedback messages.

# graph.py:122 — should be "control_id", not "id"
control_id = result.get("id", "unknown")
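The effect of the key mismatch is easy to see on a single result dict. The dict contents and the helper `feedback_label` are hypothetical; only the two key names come from the review.

```python
def feedback_label(result: dict, key: str) -> str:
    """Look up the control identifier, falling back to "unknown"."""
    return result.get(key, "unknown")


# Shape assumed from the review: run_checks() stores the ID under "control_id".
result = {"control_id": "EXAMPLE-01", "status": "fail"}

wrong = feedback_label(result, "id")          # key never written by run_checks()
right = feedback_label(result, "control_id")  # key that is actually stored
```

Because `dict.get` silently returns the default, the wrong key produces no error, only the "unknown" label in every feedback message.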

mlieberman85

Re-review: rebase regressions block `darnit run`

Thanks for the fixes since the last round — the prior bugs are addressed cleanly. New problem: the rebase didn't reconcile cli.py with the post-refactor agent/ module shape, so cmd_run won't run.

Fixed since last review ✅

cmd_run UnboundLocalError — return 1 added in except.
collect_context "id" vs "control_id" — moot, function refactored to take an answers dict.
DarnitState.check_results type mismatch — moot, class restructured into AuditState.
plugin.py optional handlers — moved to comments; no longer break Protocol isinstance checks.
langgraph hard dependency — [agent] extra + lazy import.
Tracking issues opened for scope gaps (#144, #145, #146).
Tests added under tests/darnit/llm/.

Blockers ❌

See inline comments. In short:

cli.py imports build_graph and DarnitState from darnit.agent.* — neither exists in the rebased agent module (graph.py is now plain functions, state.py exports AuditState). darnit run will hit ImportError, which the except ImportError block misdiagnoses as missing optional deps.
cmd_run docstring contains orphaned cmd_profiles code from the merge conflict.
final_state.get("check_results") reads a field name that doesn't exist on AuditState (audit_results).
Stray attestation file with placeholder data was committed.
Duplicate DarnitState import inside cmd_run.

Architectural question worth surfacing 🧭

llm/backends.py introduces direct Anthropic / OpenAI / Ollama API calls inside darnit, with the docstring honestly noting "there is no Claude Code sitting there ... this module lets Darnit call an LLM directly." That crosses the "darnit is an MCP server / skill provider, not an LLM client" boundary we've held to date. I'm not opposed — standalone-agent mode is a legitimate third deployment shape — but please:

Get explicit architectural sign-off before merge.
Update CLAUDE.md and the README so the boundary (when this code path is allowed to run) is documented.
Consider gating the import behind the [agent] extra so plain MCP-server users never load LLM SDKs.

Lower-severity

Vestigial langgraph dep: [agent] extra still pulls langgraph>=0.2.0, but the refactor removed every langgraph import from the codebase. Either restore the graph builder or drop the dep.
pyproject.toml lost its trailing newline (\ No newline at end of file in diff).
Indentation typo cli.py:632 (3-space comment) — ruff will flag.
cmd_profiles rewrite: switched from impl.get_audit_profiles() (method) + core.discovery to impl.audit_profiles (attribute) + core.plugin. Worth verifying every implementation in the wild exposes the attribute form, or add a getattr(impl, "audit_profiles", None) or (hasattr(impl, "get_audit_profiles") and impl.get_audit_profiles()) shim.
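The compatibility shim suggested in the last item could be written as a small helper. This is a sketch: `resolve_audit_profiles` and the two example classes are hypothetical, and the return type of the real `audit_profiles` is not stated in the thread.

```python
from typing import Any


def resolve_audit_profiles(impl: Any) -> Any:
    """Support both the attribute form and the older method form."""
    profiles = getattr(impl, "audit_profiles", None)
    if profiles is not None:
        return profiles
    getter = getattr(impl, "get_audit_profiles", None)
    if callable(getter):
        return getter()
    return None


class AttributeStyle:
    audit_profiles = {"quick": ["EXAMPLE-01"]}


class MethodStyle:
    def get_audit_profiles(self) -> dict:
        return {"full": ["EXAMPLE-01", "EXAMPLE-02"]}
```

Unlike the one-line `or` expression in the review, this version treats an empty but present `audit_profiles` as a valid answer and does not fall through to the method.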

Verdict: Blockers 1–5 must be fixed; darnit run needs to be exercised end-to-end before merge (a CI smoke test that runs darnit run --feedback noninteractive against a fixture repo would catch all of these). The architectural question on standalone LLM calling deserves an explicit decision, not a quiet merge.
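A CI smoke test along these lines might be built on a helper like the one sketched here. `run_smoke` and `looks_healthy` are hypothetical names, and the health check deliberately ignores the exit code, because whether `darnit run` exits non-zero when controls fail is not stated in this thread.

```python
import subprocess
import sys
from typing import Sequence


def run_smoke(cmd: Sequence[str], cwd: str = ".", timeout: float = 300.0) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text."""
    return subprocess.run(
        list(cmd), cwd=cwd, capture_output=True, text=True, timeout=timeout
    )


def looks_healthy(result: subprocess.CompletedProcess) -> bool:
    """Flag crashes such as ImportError or UnboundLocalError via the traceback."""
    return "Traceback" not in result.stderr


# Stand-in command so the sketch runs anywhere; in CI this would be something
# like ["darnit", "run", ".", "--feedback", "noninteractive"] with cwd set to
# a fixture repository.
demo = run_smoke([sys.executable, "-c", "print('ok')"])
```

A check this coarse would still have caught the import failure and the unbound variable described in the blockers, since both surface as uncaught tracebacks or error logs at startup.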

mlieberman85 · 2026-05-05T16:30:06Z

+def cmd_run(args: argparse.Namespace) -> int:
+    """Run the full agentic workflow autonomously.

    impls = discover_implementations()


Merge artifact: this looks like the body of the old cmd_profiles got tangled into cmd_run's docstring during conflict resolution. The impls = discover_implementations() block here is dead text inside the triple-quoted string. Strip lines 571–574 so the docstring reads cleanly:

def cmd_run(args: argparse.Namespace) -> int:
    """Run the full agentic workflow autonomously.

    Requires a configured LLM backend and API key.
    Install agent dependencies with: pip install darnit[agent]
    """

mlieberman85 · 2026-05-05T16:30:06Z

+    """
+    try:
+        from darnit.agent.graph import build_graph
+        from darnit.agent.state import DarnitState


Blocker — neither symbol exists in the rebased agent module.

agent/graph.py was refactored to plain functions (audit, collect_context, remediate, route) with no StateGraph and no build_graph().

agent/state.py exports AuditState, not DarnitState.

This import will raise ImportError, which the except ImportError clause then misdiagnoses as "agent dependencies not installed" — a confusing UX even if the imports were valid.
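One way to avoid the misdiagnosis is to tell a missing optional package apart from a broken internal import. The sketch below is an assumption about how that could be done; `classify_import_error` is a hypothetical helper and the set of optional package names is illustrative.

```python
from typing import Iterable


def classify_import_error(exc: ImportError, optional_deps: Iterable[str] = ("langgraph",)) -> str:
    """Return "missing-extra" only when an optional package is absent."""
    # ModuleNotFoundError carries the name of the module that was not found.
    if isinstance(exc, ModuleNotFoundError) and exc.name in set(optional_deps):
        return "missing-extra"
    # Anything else (for example a symbol that no longer exists in an
    # installed module) is a bug in the codebase, not an install problem.
    return "internal-import-bug"
```

A `from module import name` statement whose module exists but whose name does not raises a plain `ImportError`, so it would be reported as an internal bug rather than as missing agent dependencies.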

Fix: either (a) restore build_graph() in agent/graph.py (returning a real StateGraph if you're keeping LangGraph, or a lightweight orchestrator otherwise) and rename AuditState → DarnitState (or update this import), or (b) inline the orchestration in cmd_run against the existing audit / collect_context / remediate / route functions and drop build_graph entirely. Option (b) also lets you drop langgraph from the [agent] extra, since nothing in the codebase uses it after the refactor.
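Option (b) could be as small as a loop over plain functions. The sketch below is illustrative only: the real signatures of audit, collect_context, remediate, and route are not shown in this thread, so the dict-based state, the `last` key, and the node bodies are all hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, object]
Node = Callable[[State], State]


def run_pipeline(state: State, nodes: Dict[str, Node], route: Callable[[State], str],
                 start: str = "audit", max_steps: int = 20) -> State:
    """Call nodes in the order chosen by route() until it returns "end"."""
    current, steps = start, 0
    while current != "end":
        if steps >= max_steps:
            raise RuntimeError(f"pipeline did not finish within {max_steps} steps")
        state = nodes[current](state)
        current = route(state)
        steps += 1
    return state


# Hypothetical stand-in nodes; each records its name in state["last"].
def audit(state: State) -> State:
    return {**state, "audit_results": [{"control_id": "EXAMPLE-01"}], "last": "audit"}

def collect_context(state: State) -> State:
    return {**state, "last": "collect_context"}

def remediate(state: State) -> State:
    return {**state, "last": "remediate"}

def route(state: State) -> str:
    order = {"audit": "collect_context", "collect_context": "remediate", "remediate": "end"}
    return order[str(state["last"])]


final = run_pipeline({}, {"audit": audit, "collect_context": collect_context,
                          "remediate": remediate}, route)
```

The `max_steps` guard is a design choice for the sketch: a routing bug then fails loudly instead of looping forever.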

mlieberman85 · 2026-05-05T16:30:06Z

+
+    # Lazy imports — langgraph is optional (darnit[agent])
+    from darnit.agent.state import DarnitState



Duplicate import — DarnitState is already imported on line 581. (Also still broken; see comment above.) Remove this line.

mlieberman85 · 2026-05-05T16:30:06Z

+        return 1
+
+   # LangGraph returns a dict, not a DarnitState object
+    check_results = final_state.get("check_results") or []


Two issues on this block:

Wrong field name. AuditState has audit_results (and error singular), not check_results / errors. Even with the imports fixed, the summary printed below would always show zeros.

Indentation typo. Three leading spaces on the # LangGraph returns a dict comment — ruff will catch it, and the comment itself is now stale (no LangGraph involved post-refactor).

Replace with:

# AuditState fields
check_results = final_state.get("audit_results") or []
human_messages = final_state.get("human_messages") or []
error = final_state.get("error")

and thread error (single string, not list) through the rest of the summary.
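Threading the single error string through the summary might look like the sketch below. The `summarize` helper, the `status` key, and its `pass` / `fail` / `warn` values are assumptions for illustration; only `audit_results` and `error` are field names taken from the review.

```python
from collections import Counter
from typing import Any, Mapping


def summarize(final_state: Mapping[str, Any]) -> str:
    """Build a one-line summary from AuditState-style fields."""
    audit_results = final_state.get("audit_results") or []
    error = final_state.get("error")  # a single string or None, not a list
    counts = Counter(r.get("status", "unknown") for r in audit_results)
    line = (
        f"{len(audit_results)} controls checked: "
        f"{counts.get('pass', 0)} passed, "
        f"{counts.get('fail', 0)} failed, "
        f"{counts.get('warn', 0)} warned"
    )
    if error:
        line += f"; error: {error}"
    return line
```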

mlieberman85 · 2026-05-05T16:30:06Z

@@ -0,0 +1,54 @@
+{


Stray test artifact — org/repo and abc123def456 are placeholder values. This was almost certainly committed by accident from a local darnit run. Delete the file and add .darnit/ (or at minimum .darnit/attestations/) to .gitignore so it can't sneak in again.

mlieberman85 · 2026-05-05T16:30:07Z

 ]
+agent = [
+    "langgraph>=0.2.0",
+]


Two nits on this hunk:

langgraph is currently vestigial. The refactor removed every langgraph import from the codebase, so this extra installs a heavy dep nothing uses. Either restore the LangGraph orchestrator or drop the dep. (If you're keeping standalone-agent mode but using a hand-rolled state machine, list only the LLM SDK deps that the [agent] mode actually needs — anthropic, openai, httpx for ollama, etc.)

Trailing newline got stripped from this file (see \ No newline at end of file in the diff). Add it back.

Marc-cn added 7 commits March 25, 2026 14:18

Add forge, CI, and build system detectors

f86858a

Wire detectors into init_project_config

7bd4cf0

Extend plugin protocol with check, context and remediation handlers

5ec25ae

Add LangGraph state machine for agentic workflow

7dffff6

Add bring-your-own LLM backends and wire into agent graph

33bc4df

Fix graph.py key names and add darnit run CLI command

a58cdd7

Add pluggable human feedback mechanism — interactive and noninteractive modes

7770a90

Marc-cn requested a review from mlieberman85 as a code owner March 25, 2026 18:43

Marc-cn added 2 commits March 25, 2026 14:44

Fix lint: remove unused variable, fix exception chaining

d295c40

Fix remaining lint issues: trailing newlines, whitespace, f-strings, imports

8fc39e6

mlieberman85 reviewed Mar 28, 2026

mlieberman85 mentioned this pull request Mar 28, 2026

feat: scientific reproducibility plugin #138

Open

10 tasks

Marc-cn added 3 commits March 28, 2026 16:15

Fix plugin.py indentation: move methods inside ComplianceImplementation Protocol

d3d5502

Fix Protocol: move optional handlers out of Protocol to avoid breaking isinstance checks

cf306ef

Fix bugs and design issues from Mike's review: type mismatch, lazy langgraph import, graph compiles lazily, plugin protocol fixes

953477a

Marc-cn added a commit to Marc-cn/darnit that referenced this pull request Mar 28, 2026

Proactive fixes from kusari-oss#136/kusari-oss#137 reviews: plugin protocol, loader forge/build storage, add tests

2d7d334

Marc-cn mentioned this pull request Mar 28, 2026

feat: pluggable storage backends — file, Archivista, memory #139

Merged

7 tasks

Marc-cn and others added 3 commits April 2, 2026 11:17

Merge branch 'main' into feature/langgraph-agent

82d9095

Resolve merge conflicts with upstream/main, regenerate uv.lock

56ddab7

Resolve pyproject.toml conflict — keep agent extra for langgraph

e377933

Marc-cn added 8 commits April 4, 2026 13:48

Extend plugin protocol with check, context and remediation handlers

49ad3cf

Add LangGraph state machine for agentic workflow

df81d97

Add bring-your-own LLM backends and wire into agent graph

30610ee

Fix graph.py key names and add darnit run CLI command

bbf3537

Add pluggable human feedback mechanism — interactive and noninteractive modes

0760fda

Fix lint: remove unused variable, fix exception chaining

a41352b

Fix remaining lint issues: trailing newlines, whitespace, f-strings, imports

561d5f6

Fix bugs and design issues from Mike's review: type mismatch, lazy langgraph import, graph compiles lazily, plugin protocol fixes

9b44608

mlieberman85 force-pushed the feature/langgraph-agent branch from e377933 to 9b44608 Compare April 4, 2026 17:54

mlieberman85 mentioned this pull request Apr 4, 2026

feat: agentic orchestrator — forge detector, plugin protocol, LangGraph agent, Gittuf (example) and reproducibility plugins #130

Closed

11 tasks

Marc-cn added 3 commits April 9, 2026 13:41

Merge branch 'feature/langgraph-agent' of https://github.com/Marc-cn/darnit into feature/langgraph-agent

3ec28c7

Fix lazy langgraph import in cmd_run and wrong key name in collect_context

c4f1d65

feat: LangGraph agentic orchestrator — fixes from review (lazy import, key name, conflict resolution)

f609473

Marc-cn force-pushed the feature/langgraph-agent branch from c4f1d65 to f609473 Compare April 9, 2026 18:05

Fix storage wiring: guard None config, pass dict directly, fix unsigned scope, add TODO for hardcoded GitHub URL

62b9953

Jaydeep869 mentioned this pull request Apr 15, 2026

fix(agent): resolve graph compilation and cli bugs from review Marc-cn/darnit#2

Open

14 tasks

Marc-cn and others added 5 commits May 1, 2026 09:02

Merge branch 'main' into feature/langgraph-agent

f6ff464

Fix merge conflicts, restore cmd_profiles, fix cmd_run undefined vars, fix lint

4e31b19

Skip threat_model tests when tree-sitter-language-pack not installed

246b9f2

Fix conftest.py: guard imports behind tree-sitter availability check

656945b

Skip tree-sitter tests when package not installed, fix conftest newline

54bb4f5

mlieberman85 reviewed May 5, 2026

Marc-cn wants to merge 32 commits into kusari-oss:main from Marc-cn:feature/langgraph-agent

2 participants

