test(integration): purge orphan datasets in annotation cleanup fixtures #165

Draft
henrycgbaker wants to merge 2 commits into main from fix/annotation-teardown-clears-all-datasets

henrycgbaker (Collaborator) commented Apr 25, 2026

Goal

Tests that exercise full teardown (empty dataset_id) can accumulate orphan datasets across runs, for example when a dataset uses a non-canonical name that teardown does not specify, or when a run exits midway. Argilla refuses to delete a workspace while any dataset is still linked to it, so the next teardown_resources call hits a 409 ConflictError and the suite fails.

This PR cleans up orphaned datasets in the test fixtures only; production behaviour is unchanged.

Scope

Test fixtures only - no production code changes. Production callers own their dataset_id and don't accumulate orphans, so teardown_resources keeps its current targeted behaviour.

Implementation

New helper tests/integration/_argilla_cleanup.purge_workspace_datasets(client, ws_base) deletes every dataset in a workspace.

clean_slate (test_annotation_setup.py) and clean_environment (test_annotation_import.py) call the purge for each configured workspace before teardown_resources. This keeps the fix at the right layer - test-induced state pollution is handled by test fixtures, not by widening the production teardown contract.
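A minimal sketch of the fixture-side flow described above, using in-memory stand-ins rather than the real Argilla client. FakeClient, FakeDataset, the workspace_name attribute, ConflictError and the simplified teardown_workspace are all hypothetical; only the names purge_workspace_datasets, client, ws_base and clean_slate come from this PR, and the real helper may well look different.

```python
class ConflictError(Exception):
    """Stand-in for the 409 raised when a workspace still has datasets."""


class FakeDataset:
    """Hypothetical dataset: knows its workspace and can delete itself."""

    def __init__(self, name, workspace_name, store):
        self.name = name
        self.workspace_name = workspace_name
        self._store = store  # the client's dataset list

    def delete(self):
        self._store.remove(self)


class FakeClient:
    """Hypothetical client exposing an iterable `datasets` collection."""

    def __init__(self):
        self.datasets = []
        self.workspaces = set()

    def create_dataset(self, name, workspace_name):
        self.workspaces.add(workspace_name)
        ds = FakeDataset(name, workspace_name, self.datasets)
        self.datasets.append(ds)
        return ds


def purge_workspace_datasets(client, ws_base):
    """Delete every dataset in workspace `ws_base`; return the number deleted."""
    # Snapshot first: deleting while iterating the live list would skip items.
    doomed = [ds for ds in client.datasets if ds.workspace_name == ws_base]
    for ds in doomed:
        ds.delete()
    return len(doomed)


def teardown_workspace(client, ws_base):
    """Simplified stand-in for the workspace-deletion step of teardown."""
    if any(ds.workspace_name == ws_base for ds in client.datasets):
        raise ConflictError(f"workspace {ws_base!r} still has linked datasets")
    client.workspaces.discard(ws_base)


def clean_slate(client, workspaces):
    """Fixture-style cleanup: purge orphans first, then tear down."""
    for ws in workspaces:
        purge_workspace_datasets(client, ws)
        teardown_workspace(client, ws)
```

In this sketch, calling teardown_workspace on a workspace that still holds an orphan raises ConflictError, while clean_slate succeeds because the purge runs first. Datasets in workspaces that are not passed to clean_slate are left alone.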

An earlier draft of this PR pushed the orphan purge into core/annotation/setup.py, so that a full teardown would delete every dataset in each workspace before deleting the workspace itself. That approach was rejected because:

  • Multi-tenant blast radius: in shared Argilla deployments, that wipes datasets owned by other runs/users
  • Silent contract change: same call signature, different (more destructive) behaviour
  • Wrong layer: the symptom is test-fixture state, not a production-API gap

If a production-side purge is wanted later, the right shape is an explicit purge_workspace(ws_base, *, confirm=True) function, not an implicit branch in teardown_resources.
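To illustrate what an explicit, opt-in purge could look like, here is a rough sketch. It reads the confirm flag as a required opt-in (so it defaults to False here, which is one interpretation of the signature above), and it adds a keyword-only client parameter that the PR text does not mention. InMemoryArgilla and delete_all_datasets are hypothetical stand-ins, not Argilla APIs.

```python
class InMemoryArgilla:
    """Hypothetical stand-in: maps workspace name -> set of dataset names."""

    def __init__(self, datasets_by_workspace):
        self.datasets_by_workspace = {
            ws: set(names) for ws, names in datasets_by_workspace.items()
        }

    def delete_all_datasets(self, workspace):
        # Return the removed names (sorted) so callers can log what was wiped.
        removed = sorted(self.datasets_by_workspace.get(workspace, set()))
        self.datasets_by_workspace[workspace] = set()
        return removed


def purge_workspace(ws_base, *, client, confirm=False):
    """Delete every dataset in `ws_base`, but only on explicit opt-in."""
    if confirm is not True:
        raise ValueError(
            f"refusing to purge workspace {ws_base!r}: pass confirm=True to proceed"
        )
    return client.delete_all_datasets(ws_base)
```

The point of the separate function is that the destructive path has its own name and its own guard, so existing teardown_resources callers cannot trigger it by accident.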

Testing

References

henrycgbaker force-pushed the fix/annotation-teardown-clears-all-datasets branch from bc12ab9 to 681db9c April 25, 2026 12:04
henrycgbaker added the bug and annotation labels Apr 25, 2026
henrycgbaker force-pushed the fix/annotation-teardown-clears-all-datasets branch from 681db9c to cb9af28 April 26, 2026 11:46
henrycgbaker changed the title from "fix(annotation): teardown deletes all linked datasets before workspace" to "test(integration): purge orphan datasets in annotation cleanup fixtures" Apr 26, 2026
henrycgbaker marked this pull request as ready for review April 26, 2026 12:05
henrycgbaker marked this pull request as draft April 26, 2026 12:21
henrycgbaker (Collaborator, Author) commented Apr 26, 2026

Open question (@saschagobel, let me know if you have thoughts; otherwise I'll return to this this evening):

Would it be better for the tests to simply remove ALL datasets (global, not workspace-scoped):

for ds in client.datasets:
    ds.delete()

rather than scoping the purge to AnnotationSettings.workspace_dataset_map?

NB: pending this decision, update the #166 workflow accordingly.
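The difference between the two options can be made concrete with a small sketch. It assumes, purely for illustration, that workspace_dataset_map is a dict keyed by workspace name and that each dataset can be described by a name and a workspace; the helper names scoped_targets and global_targets are hypothetical.

```python
def scoped_targets(datasets, workspace_dataset_map):
    """Datasets a map-scoped purge would delete: configured workspaces only."""
    return [d for d in datasets if d["workspace"] in workspace_dataset_map]


def global_targets(datasets):
    """Datasets a global purge would delete: everything the client can see."""
    return list(datasets)


# Hypothetical server state: two configured workspaces plus one unrelated one.
datasets = [
    {"name": "task_a", "workspace": "ws_a"},
    {"name": "orphan_a", "workspace": "ws_a"},
    {"name": "task_b", "workspace": "ws_b"},
    {"name": "scratch", "workspace": "someone_elses_ws"},
]
workspace_dataset_map = {"ws_a": "task_a", "ws_b": "task_b"}

scoped = [d["name"] for d in scoped_targets(datasets, workspace_dataset_map)]
everything = [d["name"] for d in global_targets(datasets)]
```

Here the scoped purge still catches orphan_a, because it filters by workspace rather than by dataset name, but leaves scratch untouched. The global loop removes scratch as well, which is simpler and only safe if the test Argilla instance is never shared.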

Base automatically changed from test/dev-stack-integration-slim to test/annotation-stack-preflight April 30, 2026 09:48
henrycgbaker changed the base branch from test/annotation-stack-preflight to main April 30, 2026 10:13
henrycgbaker force-pushed the fix/annotation-teardown-clears-all-datasets branch from 493f0ab to 8e4e535 April 30, 2026 10:13