refactor: improve domain/credential logic and result processing for multi-domain ops #322

Merged
l50 merged 3 commits into main from fix/llm-state-taint-hardening on May 15, 2026

l50 commented on May 15, 2026

Key Changes:

  • Unified parent-child domain handling and credential selection logic across orchestrator modules
  • Hardened result processing to avoid trusting legacy scalar outputs from LLM workers
  • Improved detection and handling of tool-output signals such as account lockout, SeImpersonate, NTLMv1, and ccache
  • Extended operation completion metadata to record red team/blue team boundaries
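
A minimal sketch of the unified parent-child credential rule, assuming domains are compared as case-insensitive DNS names and that a parent/child relationship means one name is a dot-separated suffix of the other. `is_same_or_child`, `cred_usable_for`, and `select_credential` are hypothetical helpers, not the orchestrator's actual functions, and treating unrelated DNS names as cross-forest is a simplification (a real forest can contain several trees with unrelated names).

```rust
/// Returns true when `domain` equals `ancestor` or is a DNS subdomain of it,
/// e.g. "child.contoso.local" under "contoso.local". Case-insensitive.
fn is_same_or_child(domain: &str, ancestor: &str) -> bool {
    let domain = domain.trim_end_matches('.').to_ascii_lowercase();
    let ancestor = ancestor.trim_end_matches('.').to_ascii_lowercase();
    if domain.is_empty() || ancestor.is_empty() {
        return false;
    }
    domain == ancestor || domain.ends_with(&format!(".{ancestor}"))
}

/// Hypothetical rule: a credential's domain is usable for a target domain when
/// the two are the same, or one is the parent of the other. Anything else is
/// treated as unrelated (a simplification of the cross-forest check).
fn cred_usable_for(cred_domain: &str, target_domain: &str) -> bool {
    is_same_or_child(cred_domain, target_domain) || is_same_or_child(target_domain, cred_domain)
}

/// Picks a (username, domain) credential, preferring an exact domain match and
/// falling back to a parent/child match.
fn select_credential<'a>(
    creds: &'a [(&'a str, &'a str)],
    target_domain: &str,
) -> Option<&'a (&'a str, &'a str)> {
    creds
        .iter()
        .find(|(_, domain)| domain.eq_ignore_ascii_case(target_domain))
        .or_else(|| creds.iter().find(|(_, domain)| cred_usable_for(domain, target_domain)))
}
```

Checking for an exact match first keeps the fallback from choosing a child-domain or parent-domain credential when a same-domain one is available.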

Added:

  • red_completed_at, red_completion_reason, and red_blocked_on_blue fields to operation metadata (in both state and reports)
  • Direct ACL enumeration step after inter-realm ticket forging to accelerate SID-filtered trust analysis
  • Dedicated functions to build and dispatch a direct krbtgt extraction and confirm it by parsing the tool output
  • Test coverage for new domain/credential selection and evidence parsing policies
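
The krbtgt confirmation step can be sketched as parse-then-mark. This assumes impacket secretsdump's `[DOMAIN\]user:rid:lmhash:nthash:::` line format; `find_krbtgt_nt_hash`, `mark_krbtgt_dedup`, and the `krbtgt:<domain>` dedup key are hypothetical names for illustration, not ares-cli's parser or dedup store.

```rust
use std::collections::HashSet;

/// True if `s` is exactly 32 ASCII hex characters.
fn is_hex32(s: &str) -> bool {
    s.len() == 32 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Scans secretsdump-style output for a `[DOMAIN\]krbtgt:<rid>:<lm>:<nt>:::`
/// line and returns the lowercased NT hash. Kerberos key lines such as
/// `krbtgt:aes256-cts-hmac-sha1-96:...` are skipped because their second
/// field is not a numeric RID.
fn find_krbtgt_nt_hash(output: &str) -> Option<String> {
    output.lines().find_map(|line| {
        let mut fields = line.trim().split(':');
        let account = fields.next()?.rsplit('\\').next()?;
        if !account.eq_ignore_ascii_case("krbtgt") {
            return None;
        }
        let rid = fields.next()?;
        let lm = fields.next()?;
        let nt = fields.next()?;
        (rid.parse::<u32>().is_ok() && is_hex32(lm) && is_hex32(nt)).then(|| nt.to_ascii_lowercase())
    })
}

/// Hypothetical dedup step: the key is recorded only after the hash is
/// confirmed, so a failed or partial dump leaves the extraction eligible for retry.
fn mark_krbtgt_dedup(dedup: &mut HashSet<String>, domain: &str, output: &str) -> Option<String> {
    let nt_hash = find_krbtgt_nt_hash(output)?;
    dedup.insert(format!("krbtgt:{}", domain.to_ascii_lowercase()));
    Some(nt_hash)
}
```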

Changed:

  • Parent-child domain logic: replaced references to north.contoso.local with child.contoso.local for consistency in tests, documentation, and code comments
  • Credential selection: consistently allow child-domain credentials for parent-domain operations and vice versa, including fallback and forest-level matching
  • Result processing:
    • Only trust tool output arrays for critical signals (e.g. golden ticket, ccache, seimpersonate, lockout)
    • Ignore agent-generated summary/output fields from LLM workers (rust-llm-runner) for all evidence detection
    • New include_legacy_scalar_outputs policy flag to gate trust of legacy output fields
    • Hardened extract_locked_usernames_from_result, result_has_seimpersonate_signal, result_has_ccache_evidence, and related functions against LLM hallucinations
  • Task result shape: added optional worker_pod provenance field for downstream policy decisions
  • Hash publishing: accept both bare NT hashes and LM:NT pairs as NTLM hash values, rejecting malformed entries
  • Secretsdump krbtgt extraction: dispatch the tool call directly and mark dedup only once the parser output confirms the krbtgt hash
  • Orchestrator completion: persist red and blue completion markers separately and expose them to reporting and display
  • Trust automation: run ACL enumeration directly, with the correct Kerberos context, after ticket forging
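
The hash-publishing rule can be expressed as a small validator. A minimal sketch, assuming each half is exactly 32 hex characters and that an accepted value keeps its original shape (bare NT or LM:NT), lowercased; `normalize_ntlm_hash` is a hypothetical name, and the PR does not say how ares-cli stores the accepted value.

```rust
/// True if `s` is exactly 32 ASCII hex characters (one NT or LM hash).
fn is_hex32(s: &str) -> bool {
    s.len() == 32 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical validator: accepts a bare `NT` hash or an `LM:NT` pair and
/// returns it lowercased in the same shape. Anything else (truncated values,
/// extra colon-separated fields) is rejected with `None`.
fn normalize_ntlm_hash(value: &str) -> Option<String> {
    let parts: Vec<&str> = value.trim().split(':').collect();
    match parts.as_slice() {
        [nt] if is_hex32(nt) => Some(nt.to_ascii_lowercase()),
        [lm, nt] if is_hex32(lm) && is_hex32(nt) => Some(format!(
            "{}:{}",
            lm.to_ascii_lowercase(),
            nt.to_ascii_lowercase()
        )),
        _ => None,
    }
}
```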

Removed:

  • report_cracked_credential callback and agent tool definition; all cracked credentials must now be extracted from structured tool output, not LLM summaries
  • Legacy fallback logic that trusted LLM-provided scalar fields for evidence detection or credential publishing

l50 added 2 commits May 15, 2026 14:56
… domain logic

**Added:**

- New test cases for child domain preservation when the parent is valid, and for
  domain/hostname overlap scenarios in `normalize_state_domains` logic
- Tests for correct lockout detection filtering based on worker type and output
  field in result processing
- Tests for privilege and NTLMv1/lockout signals extracted only from trusted
  tool outputs

**Changed:**

- `normalize_state_domains` now retains child domains whose parent is valid,
  and confirms real domains by checking them against the FQDN suffixes of hosts
- `merge_result_extras` now drops only trusted evidence keys from merged extras,
  with an expanded list of dropped keys so agent-supplied values cannot shadow them
- Evidence detection (e.g., SeImpersonate, NTLMv1, ccache, lockouts, locked
  usernames) now only considers trusted tool outputs and ignores summary fields
- `payload_contains_golden_ticket_marker` ignores summary and explicit flag
  fields, using only trusted tool outputs
- `check_golden_ticket_completion` now prefers the domain provided with the task
  over the payload's domain field
- Test suite and payload construction updated throughout to use
  `tool_outputs`/trusted fields instead of summary or legacy output fields
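
The two retention rules for `normalize_state_domains` can be illustrated with a simplified filter. This is a sketch under stated assumptions: a candidate domain is confirmed when some known host FQDN ends with `.<candidate>`, and an unconfirmed candidate is kept when the name left after dropping its first label is a kept domain. `retain_domains` and `parent_of` are hypothetical and do not reflect the real function's signature or state types.

```rust
use std::collections::BTreeSet;

/// Drops the first DNS label ("child.contoso.local" -> "contoso.local").
/// Returns None for two-label names so a TLD is never treated as a parent.
fn parent_of(domain: &str) -> Option<&str> {
    domain
        .split_once('.')
        .map(|(_, rest)| rest)
        .filter(|rest| rest.contains('.'))
}

/// Hypothetical simplification of the two retention rules.
fn retain_domains(candidates: &[&str], host_fqdns: &[&str]) -> BTreeSet<String> {
    let hosts: Vec<String> = host_fqdns.iter().map(|h| h.to_ascii_lowercase()).collect();
    let candidates: BTreeSet<String> = candidates.iter().map(|d| d.to_ascii_lowercase()).collect();

    // Rule 1: a domain is confirmed when some host FQDN ends with ".<domain>".
    let mut kept: BTreeSet<String> = candidates
        .iter()
        .filter(|d| hosts.iter().any(|h| h.ends_with(&format!(".{d}"))))
        .cloned()
        .collect();

    // Rule 2: keep children of kept domains; repeat so grandchildren follow.
    loop {
        let added: Vec<String> = candidates
            .iter()
            .filter(|d| !kept.contains(*d) && parent_of(d).map_or(false, |p| kept.contains(p)))
            .cloned()
            .collect();
        if added.is_empty() {
            break;
        }
        kept.extend(added);
    }
    kept
}
```

Under these rules, a name that only matches a host FQDN exactly, such as `dc01.contoso.local`, is not confirmed as a domain.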

**Removed:**

- Detection logic that previously relied on summary or explicit agent fields for
  evidence or privilege signals
- Legacy test payloads and field usage in favor of trusted output arrays

codecov bot commented on May 15, 2026

Codecov Report

❌ Patch coverage is 76.88022% with 249 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.81%. Comparing base (3b4d36e) to head (a161dae).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ares-cli/src/orchestrator/result_processing/mod.rs | 38.09% | 65 Missing ⚠️ |
| ares-cli/src/orchestrator/automation/trust.rs | 0.00% | 58 Missing ⚠️ |
| ...res-cli/src/orchestrator/automation/secretsdump.rs | 67.22% | 39 Missing ⚠️ |
| ares-cli/src/orchestrator/completion.rs | 16.66% | 35 Missing ⚠️ |
| ...src/orchestrator/result_processing/admin_checks.rs | 63.23% | 25 Missing ⚠️ |
| ares-cli/src/ops/loot/format/display.rs | 76.59% | 11 Missing ⚠️ |
| ...li/src/orchestrator/automation/mssql_link_pivot.rs | 68.42% | 6 Missing ⚠️ |
| ares-cli/src/ops/loot/format/json.rs | 0.00% | 3 Missing ⚠️ |
| ...rchestrator/result_processing/impacket_recovery.rs | 92.10% | 3 Missing ⚠️ |
| ...rchestrator/result_processing/discovery_polling.rs | 97.43% | 2 Missing ⚠️ |
| ... and 2 more | | |
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #322      +/-   ##
==========================================
+ Coverage   78.78%   78.81%   +0.02%     
==========================================
  Files         439      439              
  Lines      124726   125356     +630     
==========================================
+ Hits        98271    98799     +528     
- Misses      26455    26557     +102     
| Files with missing lines | Coverage Δ | |
|---|---|---|
| ares-cli/src/dedup/tests.rs | `100.00% <100.00%> (ø)` | |
| ares-cli/src/orchestrator/automation/crack.rs | `71.07% <100.00%> (ø)` | |
| ares-cli/src/orchestrator/automation/gpp_sysvol.rs | `84.13% <100.00%> (+2.77%)` | ⬆️ |
| ...i/src/orchestrator/automation/group_enumeration.rs | `78.85% <100.00%> (ø)` | |
| ...li/src/orchestrator/automation/ntlmv1_downgrade.rs | `74.83% <100.00%> (+3.67%)` | ⬆️ |
| ...cli/src/orchestrator/automation/password_policy.rs | `85.35% <100.00%> (+2.25%)` | ⬆️ |
| ares-cli/src/orchestrator/automation/s4u.rs | `89.56% <100.00%> (+0.90%)` | ⬆️ |
| ...-cli/src/orchestrator/callback_handler/dispatch.rs | `27.80% <ø> (-6.25%)` | ⬇️ |
| ares-cli/src/orchestrator/callback_handler/mod.rs | `40.90% <ø> (ø)` | |
| ...res-cli/src/orchestrator/callback_handler/tests.rs | `100.00% <100.00%> (ø)` | |
| ... and 34 more | | |

... and 1 file with indirect coverage changes


…controls

**Added:**

- New test cases for policy-based exclusion of scalar output fields in result
  parsing, ensuring that only tool-emitted data is consumed for automations
- Tests verifying LM:NT pair acceptance in hash publishing, and that krbtgt
  LM:NT hashes grant domain admin status
- Tests for red completion metadata population and operation meta parsing
- Documentation and comments clarifying the rationale for parsing policies and
  the removal of the cracked credential callback
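
The red completion markers can be pictured as a small struct that round-trips through string fields, as it would for a Redis hash. The field names come from this PR; the `RedCompletion` struct, the `to_fields`/`from_fields` helpers, and the choice to store values as plain strings with `"true"`/`"false"` for the flag are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Red-team completion markers, recorded separately from overall completion.
/// Field names follow the PR; the string encoding is an assumption.
#[derive(Debug, Default, PartialEq)]
struct RedCompletion {
    red_completed_at: Option<String>,
    red_completion_reason: Option<String>,
    red_blocked_on_blue: bool,
}

impl RedCompletion {
    /// Flattens the markers into (key, value) pairs, e.g. for a Redis hash write.
    /// Unset optional markers are omitted rather than written as empty strings.
    fn to_fields(&self) -> Vec<(&'static str, String)> {
        let mut fields = Vec::new();
        if let Some(at) = &self.red_completed_at {
            fields.push(("red_completed_at", at.clone()));
        }
        if let Some(reason) = &self.red_completion_reason {
            fields.push(("red_completion_reason", reason.clone()));
        }
        fields.push(("red_blocked_on_blue", self.red_blocked_on_blue.to_string()));
        fields
    }

    /// Parses markers from a string map; a missing flag or any value other
    /// than "true" reads as false.
    fn from_fields(map: &HashMap<String, String>) -> Self {
        RedCompletion {
            red_completed_at: map.get("red_completed_at").cloned(),
            red_completion_reason: map.get("red_completion_reason").cloned(),
            red_blocked_on_blue: map
                .get("red_blocked_on_blue")
                .map_or(false, |v| v.as_str() == "true"),
        }
    }
}
```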

**Changed:**

- Replaced test and documentation references from `north.contoso.local` to
  `child.contoso.local` for consistency in domain hierarchy examples across
  all modules and test fixtures
- Updated all domain trust, host, and parsing logic to use `child.contoso.local`
  as the canonical child domain example, including in test data, assertions,
  and string construction
- Introduced new policies to control parsing of legacy scalar output fields in
  orchestrator result processing, with explicit toggles based on worker
  provenance (e.g., excluding LLM-runner model-authored narrative from tool
  output parsing)
- Refactored result parsing helpers (e.g., NTLMv1 detection, impersonation,
  ccache evidence, lockout extraction) to centralize text part collection and
  support the new policy controls for legacy output
- Updated domain credential selection in orchestrator automations to allow
  matching on parent/child domain relationships and to skip credentials from
  unrelated forests
- Adjusted secretsdump krbtgt extraction automation to dispatch the tool
  directly, only marking dedup after successful krbtgt hash parsing
- Modified MSSQL link pivot automation to always set `windows_auth` to true when a
  credential domain is present, and to support impersonation hints in tool args
- Improved operation completion metadata: now records `red_completed_at`,
  `red_completion_reason`, and `red_blocked_on_blue` to Redis and operation
  state, with new display and JSON output in loot reporting
- Enhanced domain SID and golden ticket marker extraction to prefer trusted
  task context over payload fields, and made result processing robust against
  LLM-generated summaries and legacy payload shapes
- Hardened NTLM hash value validation to accept both standard 32-hex-character NT
  hashes and LM:NT pairs, rejecting malformed relay artifacts
- Removed the `report_cracked_credential` callback and tool definition,
  centralizing cracked credential extraction to automated stdout parsing only;
  hallucinated calls are now deterministically ignored
- Updated tool registry and callback handler logic to trap removed callback
  names, returning a deterministic "tool removed" response for hallucinations
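
The parsing policy and centralized text-part collection can be sketched as one collector that every detector calls. Everything here is illustrative: the `TaskResult` shape, the `ParsePolicy` type, the assumption that LLM-runner pods are recognizable by a `rust-llm-runner` name prefix, and the `has_seimpersonate_marker` detector are hypothetical stand-ins for the ares-cli types and functions described above.

```rust
/// Hypothetical, simplified task result. Field names are illustrative only.
#[allow(dead_code)]
struct TaskResult {
    tool_outputs: Vec<String>,  // raw stdout captured from tool executions
    output: Option<String>,     // legacy scalar output field
    summary: Option<String>,    // agent-authored narrative; never parsed below
    worker_pod: Option<String>, // provenance of the worker that produced the result
}

/// Hypothetical parsing policy carrying the legacy-scalar toggle.
struct ParsePolicy {
    include_legacy_scalar_outputs: bool,
}

impl ParsePolicy {
    /// Assumption: LLM-runner pods are identified by a "rust-llm-runner" name
    /// prefix, and their legacy scalar fields are never trusted.
    fn for_result(result: &TaskResult) -> Self {
        let from_llm_runner = result
            .worker_pod
            .as_deref()
            .map_or(false, |pod| pod.starts_with("rust-llm-runner"));
        ParsePolicy { include_legacy_scalar_outputs: !from_llm_runner }
    }
}

/// Gathers the only text detectors may inspect: tool outputs, plus the legacy
/// `output` scalar when the policy allows it. `summary` is always excluded.
fn collect_text_parts<'a>(result: &'a TaskResult, policy: &ParsePolicy) -> Vec<&'a str> {
    let mut parts: Vec<&'a str> = result.tool_outputs.iter().map(String::as_str).collect();
    if policy.include_legacy_scalar_outputs {
        parts.extend(result.output.as_deref());
    }
    parts
}

/// Hypothetical detector built on the collector; the marker is the privilege
/// name printed by `whoami /priv`.
fn has_seimpersonate_marker(result: &TaskResult) -> bool {
    let policy = ParsePolicy::for_result(result);
    collect_text_parts(result, &policy)
        .iter()
        .any(|text| text.contains("SeImpersonatePrivilege"))
}
```

Because `summary` is never read, a model-authored claim placed there cannot trigger a detector under any policy setting.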

**Removed:**

- Legacy `report_cracked_credential` callback handler, tool definition, and
  associated tests, as cracked credentials are now reliably extracted from
  stdout without LLM summarization fallback
- Callback tool references for removed/unsupported reporting callbacks from
  agent tool registry, with trap logic for deterministic handling
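
The trap for removed callback names can be sketched as a check that runs before normal dispatch, so a hallucinated `report_cracked_credential` call gets a fixed reply without mutating state. `REMOVED_CALLBACKS`, `CallbackResponse`, `dispatch_callback`, and the reply text are hypothetical; only the callback name comes from this PR.

```rust
/// Hypothetical list of callback names removed from the agent tool registry.
const REMOVED_CALLBACKS: &[&str] = &["report_cracked_credential"];

/// Hypothetical outcome of routing a callback from an agent.
#[derive(Debug, PartialEq)]
enum CallbackResponse {
    Handled(String),
    ToolRemoved(String),
    Unknown(String),
}

/// Checks removed names before normal dispatch, so a hallucinated call to a
/// removed tool always receives the same reply and never reaches a handler.
fn dispatch_callback(name: &str, known_callbacks: &[&str]) -> CallbackResponse {
    if REMOVED_CALLBACKS.contains(&name) {
        return CallbackResponse::ToolRemoved(format!("tool removed: {name}"));
    }
    if known_callbacks.contains(&name) {
        // A real handler would run here.
        CallbackResponse::Handled(name.to_string())
    } else {
        CallbackResponse::Unknown(name.to_string())
    }
}
```
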
l50 changed the title from "refactor: restrict evidence parsing to trusted tool outputs and standardize result payloads" to "refactor: improve domain/credential logic and result processing for multi-domain ops" on May 15, 2026
l50 merged commit 716599a into main on May 15, 2026
14 checks passed
l50 deleted the fix/llm-state-taint-hardening branch on May 15, 2026 at 22:32

1 participant