cerefox_ingest (local MCP) destroys multi-project membership on update #38

@fstamatelopoulos

Summary

When an agent updates a document's content through the local Python MCP (cerefox_ingest with update_if_exists: true and a project_name), the document's project memberships are silently reset to the single project passed in. Project links added via the web UI or REST API are wiped on the next agent content update.

The remote MCP and the cerefox-ingest Edge Function have a related but different problem: they silently ignore project_name on their update branches entirely. That is non-destructive, but it also means project assignment cannot be changed through the tool at all.

The underlying data model is already many-to-many (cerefox_document_projects), and the web UI's POST /documents/{id}/edit endpoint already accepts a full project_ids[] set. This is purely an MCP write-surface gap.

Originally reported by an agent working with the local MCP path.

Reproduction (local Python MCP path)

  1. Create a document via cerefox_ingest with project_name: "Cerefox". The response contains a Project IDs: <uuid> line (array-shaped output from src/cerefox/mcp_server.py).
  2. In the web app, add a second project (cfcf) to the same document. Confirm with a project-filtered search on cfcf — the doc is returned.
  3. Update the document's content via cerefox_ingest with update_if_exists: true and project_name: "Cerefox".
  4. Result: the doc is associated with Cerefox only. A project-filtered search on cfcf returns no results. The cfcf membership was silently dropped.

Root cause

The destructive behavior is in the Python ingestion pipeline, not the RPC.

  • mcp_server.py:537 (_handle_ingest) calls pipeline.ingest_text(project_name=...).
  • pipeline.ingest_text resolves project_name="Cerefox" to [<cerefox-uuid>] and, on the update_existing match branch (src/cerefox/ingestion/pipeline.py:170-180), forwards it to update_document(project_ids=[<cerefox-uuid>]).
  • update_document (pipeline.py:340-345, 473-478) treats a non-None project_ids as the explicit full set and calls assign_document_projects(...).
  • assign_document_projects (src/cerefox/db/client.py:474-489) is a DELETE-then-INSERT replace on cerefox_document_projects — all existing rows for that document are removed before the new set is inserted.
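The replace semantics are easy to reproduce in isolation. The following is a minimal stand-in, not the project's code. It assumes a two-column junction table and a hypothetical assign_document_projects(conn, document_id, project_ids) signature built on Python's sqlite3. The real client in src/cerefox/db/client.py targets Postgres, and its signature may differ. Plain strings stand in for project UUIDs.

```python
import sqlite3

# Hypothetical minimal junction table; the real cerefox_document_projects
# schema (schema.sql:120) may carry more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cerefox_document_projects ("
    " document_id TEXT NOT NULL,"
    " project_id TEXT NOT NULL,"
    " PRIMARY KEY (document_id, project_id))"
)


def assign_document_projects(conn, document_id, project_ids):
    """Stand-in with replace semantics: delete every link, then insert the new set."""
    with conn:  # commit both statements together, roll back on error
        conn.execute(
            "DELETE FROM cerefox_document_projects WHERE document_id = ?",
            (document_id,),
        )
        conn.executemany(
            "INSERT INTO cerefox_document_projects (document_id, project_id) VALUES (?, ?)",
            [(document_id, pid) for pid in project_ids],
        )


def memberships(conn, document_id):
    rows = conn.execute(
        "SELECT project_id FROM cerefox_document_projects"
        " WHERE document_id = ? ORDER BY project_id",
        (document_id,),
    )
    return [row[0] for row in rows]


# Reproduction steps 1-2: the document belongs to both projects.
assign_document_projects(conn, "doc-1", ["cerefox", "cfcf"])
print(memberships(conn, "doc-1"))  # ['cerefox', 'cfcf']

# Step 3: the update branch forwards a one-element list as the full set.
assign_document_projects(conn, "doc-1", ["cerefox"])
print(memberships(conn, "doc-1"))  # ['cerefox']  (cfcf silently dropped)
```

Any caller that passes a partial list through this path loses every link not in that list. That is why the fix belongs upstream in ingest_text rather than in the RPCs.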

The update_document docstring states the intended contract ("None = unchanged, [] = clear"), and the web UI's POST /documents/{id}/edit path respects it correctly. The problem is that pipeline.ingest_text's update branch eagerly resolves a single project_name into a one-element list and then hands it to update_document as if it were an explicit full set.
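The mismatch between the documented contract and the update branch can be modeled in plain Python. In this sketch, apply_project_update encodes the "None = unchanged, [] = clear" contract as quoted from the docstring. The function update_branch_as_reported mirrors the reported ingest_text behavior. Both names, and the dict-based resolver, are hypothetical illustrations rather than the pipeline's actual code.

```python
from typing import Callable, Optional


def apply_project_update(current: set[str], project_ids: Optional[list[str]]) -> set[str]:
    """Documented update_document contract: None = unchanged, [] = clear,
    any other list = the explicit full membership set."""
    if project_ids is None:
        return set(current)
    return set(project_ids)


def update_branch_as_reported(
    current: set[str],
    project_name: Optional[str],
    resolve: Callable[[str], str],
) -> set[str]:
    # Mirrors the reported ingest_text update branch: a single project_name
    # is eagerly resolved into a one-element list...
    project_ids = [resolve(project_name)] if project_name is not None else None
    # ...which the contract above then treats as the explicit full set.
    return apply_project_update(current, project_ids)


resolve = {"Cerefox": "uuid-cerefox", "cfcf": "uuid-cfcf"}.__getitem__
current = {"uuid-cerefox", "uuid-cfcf"}
print(sorted(update_branch_as_reported(current, "Cerefox", resolve)))  # ['uuid-cerefox']
print(sorted(update_branch_as_reported(current, None, resolve)))       # ['uuid-cerefox', 'uuid-cfcf']
```

The contract itself is sound. The loss happens because a "single add" intent is translated into a "full set" value before it reaches update_document.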

For the record, the database RPCs are innocent: cerefox_ingest_document (src/cerefox/db/rpcs.sql:1045) and cerefox_snapshot_version (rpcs.sql:744) never touch cerefox_document_projects.

Cross-path behavior matrix

| Path | Behavior on update with project_name |
|---|---|
| Local Python MCP (cerefox_ingest) | Destructive: replaces all memberships with [project_name] |
| Remote MCP (supabase/functions/cerefox-mcp/tools/ingest.ts) | Silent: project_name ignored on both update branches (lines 223-288 and 291-352) |
| Edge Function cerefox-ingest (supabase/functions/cerefox-ingest/index.ts) | Silent: project_name ignored on both update branches (lines 372-468 and 471-568) |
| Web UI → POST /documents/{id}/edit | Correct: explicit project_ids[] set, respects None = unchanged |

This inconsistency between local and remote MCP for the same tool name is itself part of the problem.

Impact

  • Routine agent content updates strip human-curated project links without warning.
  • No MCP tool exposes add / remove / set-multiple-projects on an existing document; cerefox_ingest (single project_name) is the only project write path agents have.
  • Undercuts the agent-first, human-on-the-loop governance model: the human curates project membership, the agent quietly erases it.

The audit log already enumerates update-content and update-metadata as distinct operation types, so the backend has a notion of attribute edits independent of content. The MCP layer just doesn't expose it.
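To illustrate how a metadata-only edit can be told apart from a content edit, here is a hypothetical classifier. The operation names update-content and update-metadata come from the issue. The helper audit_operations and its inputs are assumptions, and the real audit logic may decide differently.

```python
from typing import Iterable


def audit_operations(
    old_content: str,
    new_content: str,
    old_projects: Iterable[str],
    new_projects: Iterable[str],
) -> list[str]:
    """Hypothetical: map an edit onto the audit operation types named in the issue."""
    ops = []
    if new_content != old_content:
        ops.append("update-content")
    # Membership is compared as a set, so order and duplicates do not matter.
    if set(new_projects) != set(old_projects):
        ops.append("update-metadata")
    return ops


print(audit_operations("v1", "v2", ["Cerefox", "cfcf"], ["cfcf", "Cerefox"]))  # ['update-content']
print(audit_operations("v1", "v1", ["Cerefox"], ["Cerefox", "cfcf"]))          # ['update-metadata']
```

Under this split, a content-only agent update should produce only update-content. A dedicated project tool would produce only update-metadata.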

Proposed fix (coordinated set)

Fixing only the local-MCP destruction leaves remote MCP still unable to set any project on an existing document. The four changes below should ship together:

  1. Make the update path non-destructive in pipeline.ingest_text. On the update_existing match branch (pipeline.py:157-180), do not push a single project_name into assign_document_projects. Either ignore it on update or treat it as "ensure this membership exists, leave others alone."
  2. Bring remote MCP and Edge Function to the same contract. Once (1) lands, the local and remote paths should behave identically — "content update via cerefox_ingest never touches memberships unless the caller asks."
  3. Accept project_names: string[] on cerefox_ingest with explicit "full set" semantics only when supplied; absent = unchanged. Keep project_name as a non-destructive single add for backward compatibility.
  4. Add a dedicated metadata-only tool (e.g. cerefox_set_document_projects(document_id, project_names[]), optionally cerefox_add_project / cerefox_remove_project), mapped to the existing update-metadata audit operation so agents get the same multi-project capability the web app already has.
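A minimal sketch of the proposed cerefox_ingest update contract (items 1 and 3) follows. It assumes membership is expressed as project names; the real pipeline would resolve names to UUIDs first. The function resolve_memberships_on_update is hypothetical. The proposal does not say what happens when both project_name and project_names are supplied. This sketch adds project_name on top of the full set, though rejecting the combination would be an equally reasonable choice.

```python
from typing import Optional, Sequence


def resolve_memberships_on_update(
    current: set[str],
    project_name: Optional[str] = None,
    project_names: Optional[Sequence[str]] = None,
) -> set[str]:
    """Hypothetical resolver for the proposed cerefox_ingest update contract."""
    # project_names, when supplied, is the explicit full set (item 3);
    # when absent, start from the memberships the document already has.
    result = set(project_names) if project_names is not None else set(current)
    # project_name is a non-destructive "ensure this membership exists" (item 1).
    if project_name is not None:
        result.add(project_name)
    return result


current = {"Cerefox", "cfcf"}
print(sorted(resolve_memberships_on_update(current)))                             # ['Cerefox', 'cfcf']
print(sorted(resolve_memberships_on_update(current, project_name="Cerefox")))     # ['Cerefox', 'cfcf']
print(sorted(resolve_memberships_on_update(current, project_names=["Cerefox"])))  # ['Cerefox']
print(sorted(resolve_memberships_on_update(current, project_names=[])))           # []
```

The result is always a full set. Once resolved to UUIDs, it fits the existing update_document(project_ids=...) contract without changes. Skipping the write when the result equals the current set also avoids a needless DELETE-then-INSERT.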

Acceptance check

After a content-only update via cerefox_ingest (any MCP path), a document previously in two projects still returns under both project filters.
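The acceptance check can be written once and run against each path. In this sketch, acceptance_check takes two callables. The hooks content_update and search_by_project are hypothetical: for a real run they would wrap a cerefox_ingest call with update_if_exists: true and a project-filtered search on the MCP path under test. The in-memory links dict is only a stand-in to show that the check separates the fixed behavior from the reported one.

```python
from typing import Callable, Iterable


def acceptance_check(
    doc_id: str,
    content_update: Callable[[str], None],
    search_by_project: Callable[[str], Iterable[str]],
    projects: tuple[str, str] = ("Cerefox", "cfcf"),
) -> None:
    # Precondition: the document is already in both projects.
    for p in projects:
        assert doc_id in search_by_project(p), f"precondition: {doc_id} not in {p}"
    content_update(doc_id)
    # After a content-only update, it must still be returned under both filters.
    for p in projects:
        assert doc_id in search_by_project(p), f"{doc_id} lost membership in {p}"


# In-memory stand-in for the junction table (hypothetical, for illustration).
links = {"doc-1": {"Cerefox", "cfcf"}}


def search(project: str) -> list[str]:
    return [doc for doc, projs in links.items() if project in projs]


def fixed_update(doc_id: str) -> None:
    links[doc_id].add("Cerefox")  # ensure-only, leaves other links alone


def buggy_update(doc_id: str) -> None:
    links[doc_id] = {"Cerefox"}  # full replace, as in the reported bug


acceptance_check("doc-1", fixed_update, search)  # passes
try:
    acceptance_check("doc-1", buggy_update, search)
except AssertionError as err:
    print(err)  # doc-1 lost membership in cfcf
```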

Pointers

  • Local MCP: src/cerefox/mcp_server.py:537 (_handle_ingest → pipeline.ingest_text)
  • Pipeline: src/cerefox/ingestion/pipeline.py:157-180 (update_existing branch), pipeline.py:276-488 (update_document)
  • Destructive replace: src/cerefox/db/client.py:474-489 (assign_document_projects)
  • Remote MCP: supabase/functions/cerefox-mcp/tools/ingest.ts:223-288, 291-352, 407-431
  • Edge Function: supabase/functions/cerefox-ingest/index.ts:372-468, 471-568, 645-669
  • RPC (no change needed): src/cerefox/db/rpcs.sql:1045 (cerefox_ingest_document), rpcs.sql:744 (cerefox_snapshot_version)
  • Junction table: src/cerefox/db/schema.sql:120 (cerefox_document_projects)

Labels: bug (Something isn't working)