Confirm before destructive Galaxy history delete/purge (#338)#344
Merged
dannon merged 2 commits intoJun 20, 2026
Merged
Conversation
…#338) A weak model asked to delete two datasets reached for the history-level delete tool and a raw curl instead, and nothing gated it: the exec-guard's catch-all allowed every Galaxy MCP call and the web-mode gate allowed every galaxy_ tool, so a whole ~10GB history got marked deleted with no are-you-sure. This adds a shared classifier (shared/galaxy-destructive.js) that both tool_call gates consume, so "what counts as destructive" lives in one place. A whole-history delete/purge now always prompts regardless of model tier -- it deliberately skips the usual weak->deny downgrade, since the danger was silent execution, not the model proposing it -- and a non-interactive session denies because there's no one to approve. Orbit/CLI get a yes/no confirm that's never cached and never offered "allow for this session", with honest wording that distinguishes a recoverable soft-delete from an irreversible purge. Remote/web mode hard-blocks these for now; its gate has no confirmation UI, so a remote-confirm flow is left as a follow-up. The classifier covers the direct tool call, the adapter's generic mcp({tool,args}) proxy, and code mode's run_galaxy_tool(code=...) script, plus a best-effort guardrail for a raw curl/wget DELETE against /api/histories/. The curl and code matchers are guardrails, not boundaries -- they catch the obvious reach-for-the-nearest-tool case, not deliberate obfuscation. This doesn't add a per-dataset delete tool (galaxyproject#228). That missing tool is what drove the dangerous workaround in the first place, so a proper delete_dataset op upstream, plus populating galaxy-ops destructive metadata the classifier can later defer to, are the complementary root-cause fixes.
A review pass over the gate turned up a few cheap gaps in the best-effort
matchers worth closing: string-quoted booleans (deleted='True',
{"purge":"true"}) slipped past the bare true checks; the code-mode matcher
only caught a positional call_tool('update_history', ...), so the kwargs form
and the galaxy_-prefixed name got through; and normalize() didn't trim, so a
whitespace-padded tool name defeated the mcp-proxy unwrap. Also lists the new
galaxy:destructive category in the PolicyResult comment.
The reliable structured paths were already solid -- this just tightens the
free-form curl/code guardrails against ordinary (non-adversarial) shapes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #338.
The problem
A weak model, asked to delete two datasets, reached for the history-level delete tool (
galaxy_update_history(deleted=true)) and then rawcurl-- and nothing gated either. The brain exec-guard's policy ends in a catch-all that allows every non-local tool (Galaxy MCP included), and the web-mode gate allows everygalaxy_*tool by prefix. So a whole ~10GB history was marked deleted with no are-you-sure. There's also no per-dataset delete tool (#228), which is what pushed the model toward the history-level tool and curl in the first place.The fix
A shared classifier,
shared/galaxy-destructive.js, that bothtool_callgates consume so "what counts as destructive" has one home:exec-guard/policy.ts+gate.ts): a whole-history delete/purge routes to a newgalaxy:destructivedecision that always confirms, regardless of model tier -- it deliberately bypasses the usual weak->deny downgrade, because the danger in Destructive Galaxy history delete/purge runs with no confirmation -- agent purged a whole history when asked to delete two datasets #338 was silent execution, not the model proposing the action. Non-interactive sessions deny (no one to approve). The confirm is a yes/no that is never cached and never offered "allow for this session", with honest wording that distinguishes a recoverable soft-delete from an irreversible purge.web/extensions/web-mode-gate.ts): destructive ops are hard-blocked -- that gate has no confirmation UI. A remote-confirm flow is a follow-up.The classifier reads two reliable, structured shapes exactly -- a direct tool call and the adapter's generic
mcp({tool, args})proxy -- and adds best-effort guardrails for the free-form paths: a rawcurl/wgetDELETE against/api/histories/, and code mode'srun_galaxy_tool(code=...)Python script. (Code mode isn't wired into this stack yet -- galaxy-mcp ships it behind--discovery-mode codeand loom's adoption is separate -- so that branch is anticipatory.) The curl and code matchers are guardrails, not security boundaries: they catch the obvious reach-for-the-nearest-tool case, not deliberate obfuscation or a different HTTP client.Review
The diff went through an adversarial review (a cross-family pass plus several focused lenses). It confirmed end-to-end that the gate fires against the real emitted tool names and that the always-ask-even-weak / non-interactive-deny / uncacheable / honest-wording properties hold on the reliable paths. It also drove a round of hardening to the best-effort matchers, now in this PR: string-quoted booleans (
deleted='True',{"purge":"true"}), the code-mode kwargs /galaxy_-prefixedcall_toolforms, purge carried in the DELETE body (not just the query), shell-variable history IDs, a whitespace-padded proxied tool name, and curl-regex over/under-matching (requires a curl/wget verb, excludes dataset-level/contents/, handleswget --method=DELETE).Scope / follow-ups
delete_user_tool) are not yet classified -- a small follow-up to extend the catalog.delete_datasetop (Orbit/Galaxy MCP: no way to delete/purge datasets from a history #228) plus populating galaxy-opsdestructiveHintmetadata are the root-cause fixes the classifier can later defer to instead of a hand-maintained catalog.Tests
TDD throughout.
shared/galaxy-destructiveunit tests (direct / mcp-proxy / code-mode / curl vectors, including the review findings);exec-guard-policy(asks even weak, denies non-interactive, proxy + code-mode, curl);exec-guard-gate(uncacheable confirm, purge vs delete wording);web-mode-gate(hard-block incl. code-mode). Full suite green; root + app typecheck clean.