
feat: defer item tagging to an external agent (phase 2) #120

Open
jansitarski wants to merge 4 commits into Anyesh:main from jansitarski:jansitarski/external-item-tagging



@jansitarski (Contributor) commented Jul 2, 2026

Description

This is phase 2 of making the internal AI optional, so the backend can run with internal generation disabled and defer that work to an external agent (e.g. Claude via an MCP server). Phase 1 (#113, merged in v1.4.0) added the capability switches and the GET /capabilities endpoint; this phase makes item tagging a first-class surface that an external agent can own.

Today tagging happens implicitly via the internal vision model, and there is no way to (a) leave an item untagged for something else to tag, or (b) record whether tags came from the machine or a person. This adds an explicit tagging lifecycle and a server-derived write origin:

  • New fields on clothing_items: tagging_status (pending|tagged), tagged_by (auto|manual), tagged_at, with native PG enum types. Existing rows are backfilled to tagged/auto so nothing changes for current data.
  • Worker auto-tag stamps tagged/auto on success.
  • POST /items gains an auto_tag flag. Every enqueue site (single create, bulk create, re-analyze) is vision-guarded: when internal vision is off (or auto_tag=false), the item is marked ready and left with tagging_status=pending for an external tagger, instead of queuing a no-op job.
  • GET /items?tagging_status=pending exposes the external tagger's work queue.
  • A PATCH /items/{id} that fills in a still-pending item's tags marks it tagged with a server-derived origin (manual). The transition is gated on pending, so it is one-way and never re-stamps an already-tagged item. It also requires actual content: a PATCH carrying only empty or null tag values leaves the item pending. A tags write-back additionally projects its attributes onto their first-class columns (pattern, material, style, season, formality, colors, primary_color), keeping the columns in sync with the tags JSONB. This matches what the internal worker does, so externally tagged items stay visible to column-based filters and scoring.
  • POST /items/{id}/retag resets an item to the pending queue (clears origin, keeps tag content).
  • GET /capabilities now advertises features.external_tagging: true, per the contract established in phase 1 (flags flip in the PR that ships the write-back surface). external_suggestions/external_pairings stay false until phase 3.
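The enqueue guard and the worker's origin stamp can be sketched as follows. This is a minimal illustration, not the PR's code: `Item`, `should_enqueue_tagging`, `enqueue_or_defer`, `worker_succeeded`, and the initial `"processing"` status are hypothetical names chosen for the sketch. Only `auto_tag`, `tagging_status`, `tagged_by`, and the ready/pending outcome come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    """Hypothetical stand-in for a clothing_items row (only the relevant fields)."""

    status: str = "processing"  # assumed initial processing state
    tagging_status: str = "pending"
    tagged_by: Optional[str] = None


def should_enqueue_tagging(vision_enabled: bool, auto_tag: bool = True) -> bool:
    """Queue an internal job only when vision is on and the caller did not opt out."""
    return vision_enabled and auto_tag


def enqueue_or_defer(item: Item, vision_enabled: bool, auto_tag: bool = True) -> bool:
    """Guard applied at each enqueue site (create, bulk create, re-analyze).

    Returns True if an internal tagging job would be queued. Otherwise the
    item is made usable immediately and left pending for an external tagger.
    """
    if should_enqueue_tagging(vision_enabled, auto_tag):
        return True  # placeholder for handing the item to the worker queue
    item.status = "ready"
    item.tagging_status = "pending"
    return False


def worker_succeeded(item: Item) -> None:
    """What the internal worker stamps after a successful auto-tag."""
    item.tagging_status = "tagged"
    item.tagged_by = "auto"
```

With vision on and `auto_tag` left at its default, behavior is unchanged (a job is queued). Turning off either one yields a ready item that shows up in the pending work queue.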

Everything is additive and defaults to current behavior (internal vision on → items auto-tag exactly as before). The motivating consumers are external MCP servers that front this backend for an LLM; the design is provider-agnostic.

Related Issue

Related to #99; builds on #113 (phase 1)

Type of Change

  • New feature (non-breaking change that adds functionality)

Checklist

  • I have read the CONTRIBUTING guide
  • My code follows the project's coding style
  • I have added tests that prove my fix/feature works
  • New and existing tests pass locally
  • I have updated documentation as needed
  • My changes don't introduce new warnings or errors

Testing

Test Environment

  • Docker Compose
  • Kubernetes
  • Local development

Tests Performed

  • New backend/tests/test_item_tagging.py (14 tests): pending default + auto-tag worker origin; auto_tag/vision enqueue guards; pending work-queue filter; PATCH write-back origin; empty write-backs stay pending; no body forgery of origin; no re-stamp of an already-tagged item; tags→column projection; retag reset.
  • Full backend suite green (349 passed).
  • Migration upgrade and downgrade verified; ruff check and ruff format clean.
  • End-to-end validated against a live Kubernetes deployment (internal vision off) through an MCP client: create→pending, work-queue listing, tag write-back with origin stamping and column population, one-way origin gate under re-edits, retag re-queue, and image retrieval via the signed URLs.

Additional Notes

  • Write origin is server-derived, never trusted from the body. tagged_by is derived from the write path: the internal worker stamps auto; API write-backs stamp manual. tagging_status/tagged_by/tagged_at are intentionally absent from the writable schemas.
  • Scope note: tagged_by records auto vs manual only. An earlier iteration added a third origin, agent (a signed JWT actor claim plus a shared-secret mint path). It was dropped because no feature consumes write provenance and tagged_by grants no authority, so the machinery to make it unforgeable wasn't worth the added surface. Besides, the backend cannot verify whether an API client is a human-facing app or an agent. If a feature ever needs to trust provenance, the enum can gain a value through an additive migration.
  • Additive response change: ItemResponse now includes tagging_status, tagged_by, and tagged_at. No existing field changes shape.
  • The branch is organized as four independently reviewable commits: schema + migration → deferral and write surface → capabilities flip → tests.
  • The consuming MCP-side changes ship in jansitarski/wardrowbe-mcp#20 ("feat(items): expose Phase 2 external-tagging surface"): the pending work-queue tools (list_untagged_items, tagging_status filter), auto_tag on the create tools, retag_item, and the tag write-back, all validated end-to-end against this branch.
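Keeping the origin server-derived by construction can be sketched as a writable-field whitelist that simply never contains the lifecycle fields, so a forged `tagged_by` in a request body is dropped before it reaches the model. `WRITABLE_FIELDS` and `writable_subset` are hypothetical, and the exact field list (including `name`) is an assumption. The PR states only that `tagging_status`, `tagged_by`, and `tagged_at` are absent from the writable schemas, and it does not say whether such keys are ignored or the request is rejected. This sketch ignores them.

```python
from typing import Any, Dict

# Hypothetical writable set for PATCH /items/{id}; "name" is an assumed field.
# tagging_status, tagged_by and tagged_at are deliberately absent, so no
# request body can ever set them.
WRITABLE_FIELDS = frozenset({"name", "tags"})


def writable_subset(body: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only client-writable keys; forged origin fields are silently dropped."""
    return {key: value for key, value in body.items() if key in WRITABLE_FIELDS}
```

If the schemas are Pydantic models (an assumption here), leaving the fields off the model has the same effect, since Pydantic ignores unknown keys by default.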

Add TaggingStatus (pending|tagged) and TaggedBy (auto|manual) enums with
tagging_status/tagged_by/tagged_at columns on clothing_items, exposed on
ItemResponse and filterable via ItemFilter. tagged_by records how the current
tags were produced and is only ever set server-side: auto (internal AI worker)
or manual (supplied through the API). Existing rows are backfilled as
tagged/auto so they do not appear as pending external work.
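On the Python side, the two enums map naturally onto `str` enums, which compare equal to their raw values and can parse the `tagging_status` query parameter directly. This is a sketch assuming that representation. `BACKFILL` and `parse_tagging_status` are illustrative helpers, not names from the PR, and the actual ORM and native PG enum mapping is not shown.

```python
from enum import Enum
from typing import Optional


class TaggingStatus(str, Enum):
    """Lifecycle of an item's tags."""

    PENDING = "pending"
    TAGGED = "tagged"


class TaggedBy(str, Enum):
    """How the current tags were produced; only ever set server-side."""

    AUTO = "auto"  # internal AI worker
    MANUAL = "manual"  # supplied through the API


# What the migration backfills onto pre-existing rows so they do not
# show up as pending external work.
BACKFILL = {"tagging_status": TaggingStatus.TAGGED, "tagged_by": TaggedBy.AUTO}


def parse_tagging_status(raw: Optional[str]) -> Optional[TaggingStatus]:
    """Map a ?tagging_status= query value onto the enum; None means no filter.

    Raises ValueError for anything other than "pending" or "tagged".
    """
    return None if raw is None else TaggingStatus(raw)
```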

…rface

With internal vision disabled (or auto_tag=false on upload), items skip the
tagging queue and are left ready + tagging_status=pending, immediately usable
while untagged. GET /items?tagging_status=pending is the external tagger's
work queue. A content-bearing PATCH on a pending item marks it tagged with a
server-derived manual origin — a one-way transition that never rewrites an
existing origin — and projects tag attributes onto their first-class columns
(parity with the worker's dual-write). POST /items/{id}/retag resets an item
to the queue. The internal worker stamps tagged/auto on successful auto-tag
and leaves skipped items pending.
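The write-back rules in this commit (content check, one-way pending→tagged gate, tag-to-column projection, retag reset) can be sketched over a plain in-memory record. Everything here is hypothetical: `Item`, `has_content`, `apply_tag_patch`, `retag`, the flat `tags` dict, and merge-by-update semantics are assumptions for illustration. Clearing `tagged_at` on retag is also assumed, since the description says only that the origin is cleared and the tag content is kept.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# First-class columns kept in sync with the tags JSONB (list taken from the PR).
PROJECTED_COLUMNS = (
    "pattern", "material", "style", "season", "formality", "colors", "primary_color",
)


@dataclass
class Item:
    """Hypothetical in-memory stand-in for a clothing_items row."""

    tags: Dict[str, Any] = field(default_factory=dict)
    columns: Dict[str, Any] = field(default_factory=dict)
    tagging_status: str = "pending"
    tagged_by: Optional[str] = None
    tagged_at: Optional[datetime] = None


def has_content(value: Any) -> bool:
    """None and empty strings/lists/dicts do not count as tag content."""
    if value is None:
        return False
    if isinstance(value, (str, list, dict)):
        return len(value) > 0
    return True


def apply_tag_patch(item: Item, tags: Dict[str, Any]) -> None:
    """Apply a PATCH's tags: merge, project onto columns, maybe stamp origin."""
    item.tags.update(tags)  # assumption: merge rather than replace
    for key in PROJECTED_COLUMNS:
        if key in tags:
            item.columns[key] = tags[key]
    # One-way gate: only a pending item receiving real content transitions,
    # so an already-tagged item (auto or manual) is never re-stamped.
    if item.tagging_status == "pending" and any(has_content(v) for v in tags.values()):
        item.tagging_status = "tagged"
        item.tagged_by = "manual"  # server-derived; never read from the body
        item.tagged_at = datetime.now(timezone.utc)


def retag(item: Item) -> None:
    """POST /items/{id}/retag: back to the queue, origin cleared, tags kept."""
    item.tagging_status = "pending"
    item.tagged_by = None
    item.tagged_at = None  # assumption: the timestamp is cleared with the origin
```

Gating on `tagging_status == "pending"` rather than on `tagged_by is None` is what keeps the transition one-way: an already-tagged item keeps its original origin and timestamp no matter how often it is re-edited.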

The features block shipped false pending the write-back surface; the
tagging surface now exists, so flip external_tagging on. Suggestions and
pairings stay false until their authoring endpoints land.
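On the consuming side, an external agent might check the flag before taking over tagging and then poll the pending work queue. This is a standard-library sketch. The helper names and the `base_url` layout (no API path prefix) are assumptions; only `features.external_tagging` and the `tagging_status=pending` filter come from this PR.

```python
import json
from typing import Any, Dict
from urllib.parse import urlencode


def external_tagging_enabled(capabilities_body: str) -> bool:
    """Read a GET /capabilities JSON body and report features.external_tagging.

    A missing or null features block counts as False, so a server that does
    not advertise the flag is never assumed to accept tag write-backs.
    """
    data: Dict[str, Any] = json.loads(capabilities_body)
    features = data.get("features") or {}
    return bool(features.get("external_tagging", False))


def pending_queue_url(base_url: str) -> str:
    """Build the external tagger's work-queue URL (GET /items?tagging_status=pending)."""
    query = urlencode({"tagging_status": "pending"})
    return f"{base_url.rstrip('/')}/items?{query}"
```

Treating an absent flag as false keeps a client safe against older servers: it only starts writing tags back once the server explicitly advertises the surface.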

Tests cover the AI-on and AI-off paths: the auto_tag / vision enqueue
guards, the pending work-queue filter, write-back origin stamping and its
one-way transition, origin forgery rejection, empty write-backs staying
pending, tag-to-column projection, and the retag reset. Default behavior
(vision on) is asserted unchanged.