What went wrong
The backend classifier client maps every classifier failure to a synthetic M0 / benign verdict. This includes transport errors, non-2xx responses, malformed JSON, empty choices, and responses with no parseable MAD code.
That may be acceptable for availability-first deployments, but Adrian does not currently expose a strict or fail-closed option for users who want BLOCK/HITL mode to stop execution when classification cannot be completed safely.
There also appears to be a mismatch between the Classifier interface comment and the current implementation.
Relevant code:
- backend/internal/engine/client.go: failOpen
- backend/internal/engine/engine.go: Classifier interface comment
- backend/internal/ws/handler.go: persistAndClassify
Reproduction steps
- Configure the Adrian policy mode as block, with M3/M4 in scope.
- Make the classifier return malformed JSON, an empty choices array, or a response with no MAD code.
- Send an SDK event that requires classification before tool execution.
- Observe that the backend records a synthetic M0 / benign verdict.
- Observe the SDK allows execution because the verdict is not in scope for blocking.
Expected behaviour
Adrian should support an explicit strict or fail-closed policy for high-assurance deployments.
When strict mode is enabled:
- A classifier failure in block mode should halt execution.
- A classifier failure in hitl mode should queue or hold the action for review.
- The dashboard should still record enough reasoning to show that classification failed.
The default can remain fail-open if that is the intended availability posture.
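A minimal sketch of the decision the expected behaviour implies. It assumes a boolean `strict` flag, standing in for whatever option name is chosen, and the three modes alert, hitl, and block. `Action` and `onClassifierFailure` are hypothetical names, not existing Adrian APIs.

```go
package main

// Action is a hypothetical outcome for an event whose classification failed.
type Action string

const (
	ActionAllow Action = "allow" // record the failure and let execution continue
	ActionHold  Action = "hold"  // queue or hold the action for human review
	ActionBlock Action = "block" // halt execution
)

// onClassifierFailure decides what happens when classification cannot be
// completed. With strict == false it keeps today's fail-open posture.
// In every case the caller still records the failure reason so the
// dashboard can show that classification failed.
func onClassifierFailure(mode string, strict bool) Action {
	if !strict {
		return ActionAllow
	}
	switch mode {
	case "alert":
		return ActionAllow // alert mode only records the failure
	case "hitl":
		return ActionHold
	default: // "block" and any unrecognised mode fail closed
		return ActionBlock
	}
}
```

Treating an unrecognised mode as blocking is a deliberate choice in this sketch: once strict mode is on, an unexpected configuration value should not silently reopen the fail-open path.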
Actual behaviour
Classifier failures currently become M0 / benign.
In BLOCK mode, that verdict is treated as allow unless M0 is explicitly in scope.
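A simplified model of the BLOCK-mode check shows why the synthetic verdict passes. `shouldBlock` is a hypothetical stand-in for the SDK's scope matching, which may differ in detail.

```go
package main

// shouldBlock models a BLOCK-mode scope check: a verdict is blocked only
// when its MAD code is listed in the policy scope. A synthetic M0 from a
// classifier failure is therefore allowed unless M0 itself is in scope.
func shouldBlock(madCode string, inScope []string) bool {
	for _, c := range inScope {
		if c == madCode {
			return true
		}
	}
	return false
}
```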
Environment
- Adrian version / commit: current main
- OS: not expected to be OS-specific
- Docker version: not required to reproduce
- GPU model: not required to reproduce
Logs
Relevant code path
Classifier failures are converted to M0 / benign:
```go
// Excerpt from backend/internal/engine/client.go
func (c *HTTPClient) failOpen(ctx context.Context, cause error, start time.Time) *Verdict {
	slog.WarnContext(ctx, "engine.classifier_failure_fail_open", "error", cause)
	return &Verdict{
		MADCode:        "M0",
		Classification: "benign",
		Reasoning:      "classifier failure (fail-open): " + cause.Error(),
		LatencyMS:      time.Since(start).Milliseconds(),
	}
}
```
Suggested fix
Add an explicit strict or fail-closed policy option.
Possible shape:
- Add a server-side policy field such as fail_closed_on_classifier_error.
- Include it in the policy snapshot sent to the SDK.
- On classifier failure:
  - In alert mode, record the failure as today.
  - In hitl mode, hold or queue the action for review.
  - In block mode, send a blocking verdict or equivalent policy result.
- Add tests for classifier failure under alert, hitl, and block modes.
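A sketch of the proposed snapshot field, assuming the policy snapshot is JSON-encoded. `PolicySnapshot`, `decodeSnapshot`, and the other fields are hypothetical; only the key fail_closed_on_classifier_error comes from the suggestion above. Because a missing boolean decodes as false, snapshots from older servers keep the current fail-open behaviour.

```go
package main

import "encoding/json"

// PolicySnapshot is a hypothetical shape for the policy sent to the SDK.
type PolicySnapshot struct {
	Mode    string   `json:"mode"`     // "alert", "hitl", or "block"
	InScope []string `json:"in_scope"` // MAD codes that trigger the mode
	// FailClosedOnClassifierError opts into strict handling. When the key is
	// absent, json.Unmarshal leaves it false, preserving fail-open.
	FailClosedOnClassifierError bool `json:"fail_closed_on_classifier_error"`
}

// decodeSnapshot parses a snapshot as received by the SDK.
func decodeSnapshot(raw []byte) (PolicySnapshot, error) {
	var p PolicySnapshot
	if err := json.Unmarshal(raw, &p); err != nil {
		return PolicySnapshot{}, err
	}
	return p, nil
}
```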