[bug] Classifier failures always fail open with no strict policy option #46

@Muhammad-usman92

What went wrong

The backend classifier maps classifier failures to a synthetic M0 / benign verdict. This includes transport errors, non-2xx responses, malformed JSON, empty choices, and responses with no parseable MAD code.

That may be acceptable for availability-first deployments, but Adrian does not currently expose a strict or fail-closed option for users who want BLOCK/HITL mode to stop execution when classification cannot be completed safely.

There also appears to be a mismatch between the Classifier interface comment and the current implementation.

Relevant code:

  • backend/internal/engine/client.go: failOpen
  • backend/internal/engine/engine.go: Classifier interface comment
  • backend/internal/ws/handler.go: persistAndClassify

Reproduction steps

  1. Configure Adrian policy mode as block, with M3/M4 in scope.
  2. Make the classifier return malformed JSON, an empty choices array, or a response with no MAD code.
  3. Send an SDK event that requires classification before tool execution.
  4. Observe that the backend records a synthetic M0 / benign verdict.
  5. Observe that the SDK allows execution because the verdict is not in scope for blocking.
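
For step 2, one way to force these failures is a stub classifier that always returns a broken body. The sketch below is not taken from the Adrian repository: the helper names (brokenBodies, brokenClassifier, probe, newBrokenServer) are hypothetical, and the "choices" response shape is an assumption inferred from the "empty choices" wording above.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// brokenBodies maps each failure case from step 2 to a response body.
// The exact JSON shape the real classifier returns is an assumption.
var brokenBodies = map[string]string{
	"malformed_json": `{"choices": [`,
	"empty_choices":  `{"choices": []}`,
	"no_mad_code":    `{"choices": [{"message": {"content": "no verdict here"}}]}`,
}

// brokenClassifier returns a handler that answers every request with
// HTTP 200 and the given body, so the failure is in the payload only.
func brokenClassifier(body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_, _ = io.WriteString(w, body)
	})
}

// probe calls the handler in-process and returns the status code and body.
func probe(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/", strings.NewReader("{}"))
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

// newBrokenServer exposes the stub on a local URL. The caller must Close it.
func newBrokenServer(body string) *httptest.Server {
	return httptest.NewServer(brokenClassifier(body))
}
```

Pointing the backend's classifier endpoint at the URL of newBrokenServer(brokenBodies["malformed_json"]) should then exercise the failOpen path. How that endpoint is configured is not stated in this issue.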

Expected behaviour

Adrian should support an explicit strict or fail-closed policy for high-assurance deployments.

When strict mode is enabled:

  • Classifier failure in block mode should halt execution.
  • Classifier failure in hitl mode should queue or hold for review.
  • The dashboard should still record enough reasoning to show that classification failed.

The default can remain fail-open if that is the intended availability posture.

Actual behaviour

Classifier failures currently become M0 / benign.

In BLOCK mode, that verdict is treated as allow unless M0 is explicitly in scope.

Environment

  • Adrian version / commit: current main
  • OS: not expected to be OS-specific
  • Docker version: not required to reproduce
  • GPU model: not required to reproduce

Logs

Relevant code path

Classifier failures are converted to M0 / benign:

func (c *HTTPClient) failOpen(ctx context.Context, cause error, start time.Time) *Verdict {
    slog.WarnContext(ctx, "engine.classifier_failure_fail_open", "error", cause)
    return &Verdict{
        MADCode:        "M0",
        Classification: "benign",
        Reasoning:      "classifier failure (fail-open): " + cause.Error(),
        LatencyMS:      time.Since(start).Milliseconds(),
    }
}

Suggested fix

Add an explicit strict or fail-closed policy option.

Possible shape:

  1. Add a server-side policy field such as fail_closed_on_classifier_error.
  2. Include it in the policy snapshot sent to the SDK.
  3. On classifier failure:
       • In alert mode, record the failure as today.
       • In hitl mode, hold or queue the action for review.
       • In block mode, send a blocking verdict or equivalent policy result.
  4. Add tests for classifier failure under alert, hitl, and block modes.
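
The shape above could be sketched roughly as follows. Only the fail_closed_on_classifier_error name comes from this issue. The Policy struct, its mode field, the Action values, and both functions are assumptions made for illustration and may not match Adrian's actual policy snapshot.

```go
package main

import "encoding/json"

// Policy is a hypothetical policy snapshot. Only the JSON name of the
// second field is taken from the suggestion in this issue.
type Policy struct {
	Mode                        string `json:"mode"` // "alert", "hitl", or "block"
	FailClosedOnClassifierError bool   `json:"fail_closed_on_classifier_error"`
}

// Action is what the backend does when classification cannot complete.
type Action string

const (
	ActionRecord Action = "record" // log the failure and let execution continue
	ActionHold   Action = "hold"   // queue the action for human review
	ActionBlock  Action = "block"  // send a blocking result to the SDK
)

// onClassifierFailure maps a policy to the action for a classifier failure.
// With the flag off, every mode keeps the current fail-open behaviour.
func onClassifierFailure(p Policy) Action {
	if !p.FailClosedOnClassifierError {
		return ActionRecord
	}
	switch p.Mode {
	case "hitl":
		return ActionHold
	case "block":
		return ActionBlock
	default: // alert mode and unknown modes only record the failure
		return ActionRecord
	}
}

// snapshotJSON encodes the policy as it might be sent to the SDK.
func snapshotJSON(p Policy) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}
```

Keeping the decision in one pure function makes the three-mode test table straightforward: each case is a Policy value and the expected Action.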

Labels: enhancement (New feature or request)
