feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback#137
feat: LangGraph agentic orchestrator: state machine, LLM backends, CLI run, human feedback#137Marc-cn wants to merge 32 commits into
Conversation
mlieberman85
left a comment
There was a problem hiding this comment.
PR Review: LangGraph Agentic Orchestrator
Tested end-to-end — darnit run . --feedback noninteractive successfully discovers 2 implementations, checks 62 controls (41 pass, 15 fail, 6 warn), queues feedback questions, and logs remediation candidates. The state machine works.
Bugs
DarnitState.check_results type mismatch (state.py:22): Annotated as list but default_factory=dict. Doesn't crash because run_checks overwrites it, but any code using check_results before that node runs will get a dict.
plugin.py indentation (lines 114, 136, 151, 165): Same issue as #136 — register_controls and the 3 new handler methods are dedented out of the ComplianceImplementation Protocol class. They become orphaned module-level functions.
Design issues
langgraph is a hard dependency for all CLI commands. cli.py:34 imports darnit_graph at module level, so darnit serve, darnit audit, darnit list all require langgraph installed. This also means uvx darnit run fails unless langgraph happens to be in the environment. Fix: lazy import inside cmd_run() + move langgraph to [project.optional-dependencies] as an agent extra.
darnit_graph compiles at import time (graph.py:226): build_graph() runs as a module-level side effect. Makes testing harder and means any LangGraph initialization failure breaks the entire module.
Scope gaps
These are acknowledged in the PR description, but they should either be fixed before merge or have tracking issues opened so they don't get lost:
remediatenode is a placeholder — logs what it would fix but doesn't callRemediationExecutorcollect_contextdoesn't act on answers — human feedback is stored in state but doesn't trigger re-audit- Feedback answers are write-only — answers are collected but never read back by any downstream node
Without these, the darnit run pipeline discovers problems but can't close the loop on any of them. If the intent is to merge now and iterate, please open issues for each so they're tracked.
Minor
run_checkscatches all exceptions broadly (except Exception) — bugs in the audit pipeline get swallowed intostate.errorsplugin.py,loader.py,detectors.pychanges are identical to #136 — should be a shared base PR- No tests in the diff — I wrote 61 covering state, feedback, LLM backends, graph nodes, and routing (all pass). Happy to contribute.
What's good
- Clean state machine with clear node separation
- LLM backend abstraction is solid (prompt building, response parsing, 3 backends + factory)
- Feedback system nicely handles interactive vs CI with auto-detection
- Conditional routing logic is simple and correct
…g isinstance checks
…nggraph import, graph compiles lazily, plugin protocol fixes
|
Fixes pushed:
For the scope gaps (remediate placeholder, feedback answers not triggering re-audit, answers write-only), agreed these need tracking. Should I open issues on the main repo or would you prefer to track them differently? |
…otocol, loader forge/build storage, add tests
…nggraph import, graph compiles lazily, plugin protocol fixes
e377933 to
9b44608
Compare
Review: Rebased & Fixed Test FailuresI've rebased this branch onto Fixes applied
Two bugs still present in the codeBug 1: If try:
graph = build_graph()
final_state = graph.invoke(state)
except Exception as e:
logger.error(f"Agent run failed: {e}")
# BUG: falls through, final_state is unbound → UnboundLocalError
# line 548 — uses final_state unconditionally
check_results = final_state.get("check_results") or []Fix: add Bug 2:
# graph.py:122 — should be "control_id", not "id"
control_id = result.get("id", "unknown") |
…darnit into feature/langgraph-agent
…, key name, conflict resolution)
c4f1d65 to
f609473
Compare
…ed scope, add TODO for hardcoded GitHub URL
mlieberman85
left a comment
There was a problem hiding this comment.
Re-review: rebase regressions block darnit run
Thanks for the fixes since the last round — the prior bugs are addressed cleanly. New problem: the rebase didn't reconcile cli.py with the post-refactor agent/ module shape, so cmd_run won't run.
Fixed since last review ✅
cmd_runUnboundLocalError —return 1added in except.collect_context"id" vs "control_id" — moot, function refactored to take ananswersdict.DarnitState.check_resultstype mismatch — moot, class restructured intoAuditState.plugin.pyoptional handlers — moved to comments; no longer break Protocolisinstancechecks.langgraphhard dependency —[agent]extra + lazy import.- Tracking issues opened for scope gaps (#144, #145, #146).
- Tests added under
tests/darnit/llm/.
Blockers ❌
See inline comments. The shortest path summary:
cli.pyimportsbuild_graphandDarnitStatefromdarnit.agent.*— neither exists in the rebased agent module (graph.pyis now plain functions,state.pyexportsAuditState).darnit runwill hitImportError, which theexcept ImportErrorblock misdiagnoses as missing optional deps.cmd_rundocstring contains orphanedcmd_profilescode from the merge conflict.final_state.get("check_results")reads a field name that doesn't exist onAuditState(audit_results).- Stray attestation file with placeholder data was committed.
- Duplicate
DarnitStateimport insidecmd_run.
Architectural question worth surfacing 🧭
llm/backends.py introduces direct Anthropic / OpenAI / Ollama API calls inside darnit, with the docstring honestly noting "there is no Claude Code sitting there ... this module lets Darnit call an LLM directly." That crosses the "darnit is an MCP server / skill provider, not an LLM client" boundary we've held to date. I'm not opposed — standalone-agent mode is a legitimate third deployment shape — but please:
- Get explicit architectural sign-off before merge.
- Update
CLAUDE.mdand the README so the boundary (when this code path is allowed to run) is documented. - Consider gating the import behind the
[agent]extra so plain MCP-server users never load LLM SDKs.
Lower-severity
- Vestigial
langgraphdep:[agent]extra still pullslanggraph>=0.2.0, but the refactor removed everylanggraphimport from the codebase. Either restore the graph builder or drop the dep. pyproject.tomllost its trailing newline (\ No newline at end of filein diff).- Indentation typo cli.py:632 (3-space comment) — ruff will flag.
cmd_profilesrewrite: switched fromimpl.get_audit_profiles()(method) +core.discoverytoimpl.audit_profiles(attribute) +core.plugin. Worth verifying every implementation in the wild exposes the attribute form, or add agetattr(impl, "audit_profiles", None) or (hasattr(impl, "get_audit_profiles") and impl.get_audit_profiles())shim.
Verdict: Blockers 1–5 must be fixed; darnit run needs to be exercised end-to-end before merge (a CI smoke test that runs darnit run --feedback noninteractive against a fixture repo would catch all of these). The architectural question on standalone LLM calling deserves an explicit decision, not a quiet merge.
| def cmd_run(args: argparse.Namespace) -> int: | ||
| """Run the full agentic workflow autonomously. | ||
|
|
||
| impls = discover_implementations() |
There was a problem hiding this comment.
Merge artifact: this looks like the body of the old cmd_profiles got tangled into cmd_run's docstring during conflict resolution. The impls = discover_implementations() block here is dead text inside the triple-quoted string. Strip lines 571–574 so the docstring reads cleanly:
def cmd_run(args: argparse.Namespace) -> int:
"""Run the full agentic workflow autonomously.
Requires a configured LLM backend and API key.
Install agent dependencies with: pip install darnit[agent]
"""| """ | ||
| try: | ||
| from darnit.agent.graph import build_graph | ||
| from darnit.agent.state import DarnitState |
There was a problem hiding this comment.
Blocker — neither symbol exists in the rebased agent module.
agent/graph.pywas refactored to plain functions (audit,collect_context,remediate,route) with noStateGraphand nobuild_graph().agent/state.pyexportsAuditState, notDarnitState.
This import will raise ImportError, which the except ImportError clause then misdiagnoses as "agent dependencies not installed" — a confusing UX even if the imports were valid.
Fix: either (a) restore build_graph() in agent/graph.py (returning a real StateGraph if you're keeping LangGraph, or a lightweight orchestrator otherwise) and rename AuditState → DarnitState (or update this import), or (b) inline the orchestration in cmd_run against the existing audit / collect_context / remediate / route functions and drop build_graph entirely. Option (b) also lets you drop langgraph from the [agent] extra, since nothing in the codebase uses it after the refactor.
|
|
||
| # Lazy imports — langgraph is optional (darnit[agent]) | ||
| from darnit.agent.state import DarnitState | ||
|
|
There was a problem hiding this comment.
Duplicate import — DarnitState is already imported on line 581. (Also still broken; see comment above.) Remove this line.
| return 1 | ||
|
|
||
| # LangGraph returns a dict, not a DarnitState object | ||
| check_results = final_state.get("check_results") or [] |
There was a problem hiding this comment.
Two issues on this block:
- Wrong field name.
AuditStatehasaudit_results(anderrorsingular), notcheck_results/errors. Even with the imports fixed, the summary printed below would always show zeros. - Indentation typo. Three leading spaces on the
# LangGraph returns a dictcomment — ruff will catch it, and the comment itself is now stale (no LangGraph involved post-refactor).
Replace with:
# AuditState fields
check_results = final_state.get("audit_results") or []
human_messages = final_state.get("human_messages") or []
error = final_state.get("error")and thread error (single string, not list) through the rest of the summary.
| @@ -0,0 +1,54 @@ | |||
| { | |||
There was a problem hiding this comment.
Stray test artifact — org/repo and abc123def456 are placeholder values. This was almost certainly committed by accident from a local darnit run. Delete the file and add .darnit/ (or at minimum .darnit/attestations/) to .gitignore so it can't sneak in again.
| ] | ||
| agent = [ | ||
| "langgraph>=0.2.0", | ||
| ] |
There was a problem hiding this comment.
Two nits on this hunk:
langgraphis currently vestigial. The refactor removed everylanggraphimport from the codebase, so this extra installs a heavy dep nothing uses. Either restore the LangGraph orchestrator or drop the dep. (If you're keeping standalone-agent mode but using a hand-rolled state machine, list only the LLM SDK deps that the[agent]mode actually needs —anthropic,openai,httpxfor ollama, etc.)- Trailing newline got stripped from this file (see
\ No newline at end of filein the diff). Add it back.
Summary
Extends Darnit into a self-driving agentic orchestrator. Adds a LangGraph state machine that drives the full audit pipeline autonomously, bring-your-own LLM key support for standalone mode, a
darnit runCLI command, and a pluggable human feedback mechanism.Type of Change
Framework Changes Checklist
If this PR modifies the darnit framework (
packages/darnit/):openspec/specs/framework-design/spec.md) if behavior changeduv run python scripts/validate_sync.py --verboseand it passesuv run python scripts/generate_docs.pyand committed any doc changesControl/TOML Changes Checklist
Not applicable — no controls or TOML modified in this PR.
If this PR modifies controls or TOML configuration:
Testing
uv run pytest tests/ -v)uv run ruff check .)What was built
darnit/agent/graph.pydrives: load context → run checks → collect context → remediate → finishdarnit runCLI command — triggers the full pipeline from the terminalUsage
Verified on this repo
Known gaps
Additional Notes