Summary
When an agent updates a document's content through the local Python MCP (cerefox_ingest with update_if_exists: true and a project_name), the document's project memberships are silently reset to the single project passed in. Project links added via the web UI or REST API are wiped on the next agent content update.
The remote MCP and cerefox-ingest Edge Function have a related but different problem: they silently ignore project_name on the update branches entirely. That is non-destructive, but it also means project assignment cannot be changed through the tool.
The underlying data model is already many-to-many (cerefox_document_projects), and the web UI / REST endpoint POST /documents/{id}/edit already accepts a full project_ids[] set. This is purely an MCP write-surface gap.
Originally reported by an agent working with the local MCP path.
Reproduction (local Python MCP path)
- Create a document via cerefox_ingest with project_name: "Cerefox". The response contains a Project IDs: <uuid> line (array-shaped output from src/cerefox/mcp_server.py).
- In the web app, add a second project (cfcf) to the same document. Confirm with a project-filtered search on cfcf: the doc is returned.
- Update the document's content via cerefox_ingest with update_if_exists: true and project_name: "Cerefox".
- Result: the doc is associated with Cerefox only. A project-filtered search on cfcf returns no results. The cfcf membership was silently dropped.
Root cause
The destructive behavior is in the Python ingestion pipeline, not the RPC.
mcp_server.py:537 (_handle_ingest) calls pipeline.ingest_text(project_name=...).
pipeline.ingest_text resolves project_name="Cerefox" to [<cerefox-uuid>] and, on the update_existing match branch (src/cerefox/ingestion/pipeline.py:170-180), forwards it to update_document(project_ids=[<cerefox-uuid>]).
update_document (pipeline.py:340-345, 473-478) treats a non-None project_ids as the explicit full set and calls assign_document_projects(...).
assign_document_projects (src/cerefox/db/client.py:474-489) is a DELETE-then-INSERT replace on cerefox_document_projects — all existing rows for that document are removed before the new set is inserted.
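To make the failure mode concrete, here is a minimal, self-contained sketch of a DELETE-then-INSERT replace on a two-column junction table. SQLite stands in for Postgres, and the table name document_projects, its columns, and the helpers replace_document_projects / projects_of are assumptions for illustration, not the actual schema.sql definition or client.py code.

```python
import sqlite3

# Hypothetical junction table loosely mirroring cerefox_document_projects.
SCHEMA = """
CREATE TABLE document_projects (
    document_id TEXT NOT NULL,
    project_id  TEXT NOT NULL,
    PRIMARY KEY (document_id, project_id)
)
"""


def replace_document_projects(conn, document_id, project_ids):
    """Full-set replace in the style described for assign_document_projects."""
    with conn:  # single transaction: commits on success, rolls back on error
        # Step 1: every existing membership row for the document is removed...
        conn.execute(
            "DELETE FROM document_projects WHERE document_id = ?", (document_id,)
        )
        # Step 2: ...and only the caller-supplied set is written back.
        conn.executemany(
            "INSERT INTO document_projects (document_id, project_id) VALUES (?, ?)",
            [(document_id, pid) for pid in project_ids],
        )


def projects_of(conn, document_id):
    rows = conn.execute(
        "SELECT project_id FROM document_projects "
        "WHERE document_id = ? ORDER BY project_id",
        (document_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# A human curates two memberships (web UI path).
replace_document_projects(conn, "doc-1", ["Cerefox", "cfcf"])
print(projects_of(conn, "doc-1"))  # ['Cerefox', 'cfcf']

# An agent content update forwards the single resolved project as the full set.
replace_document_projects(conn, "doc-1", ["Cerefox"])
print(projects_of(conn, "doc-1"))  # ['Cerefox'] (cfcf silently dropped)
```

The replace is correct for a caller that really means "this is the complete set"; the bug is that a single-project ingest call is routed into it.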
The update_document docstring states the intended contract ("None = unchanged, [] = clear"), and the web UI's POST /documents/{id}/edit path respects it correctly. The problem is that pipeline.ingest_text's update branch eagerly resolves a single project_name into a one-element list and then hands it to update_document as if it were an explicit full set.
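The layering can be sketched as plain functions over sets. This is an illustration under stated assumptions, not the pipeline code: project names stand in for resolved UUIDs, all function names are hypothetical, and the handling of project_name=None in the "as reported" branch is assumed, since the issue only describes the non-None case.

```python
from __future__ import annotations

from typing import Optional, Sequence


def apply_project_update(current: set[str], project_ids: Optional[Sequence[str]]) -> set[str]:
    """Documented update_document contract: None = unchanged, [] = clear, list = full set."""
    if project_ids is None:
        return set(current)   # leave memberships alone
    return set(project_ids)   # explicit full set ([] clears everything)


def ingest_update_branch_as_reported(current: set[str], project_name: Optional[str]) -> set[str]:
    # Reported behavior: the single name is eagerly wrapped in a one-element list,
    # which the contract above then honors as the complete membership set.
    project_ids = [project_name] if project_name is not None else None
    return apply_project_update(current, project_ids)


def ingest_update_branch_ignoring_name(current: set[str], project_name: Optional[str]) -> set[str]:
    # One of the two options in the proposed fix: do not forward project_name on update.
    # project_name is accepted only for signature parity and is intentionally unused.
    return apply_project_update(current, None)


curated = {"Cerefox", "cfcf"}
print(sorted(ingest_update_branch_as_reported(curated, "Cerefox")))    # ['Cerefox']
print(sorted(ingest_update_branch_ignoring_name(curated, "Cerefox")))  # ['Cerefox', 'cfcf']
```

The contract function itself needs no change; only the caller's translation of project_name into project_ids does.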
For the record, the database RPCs are innocent: cerefox_ingest_document (src/cerefox/db/rpcs.sql:1045) and cerefox_snapshot_version (rpcs.sql:744) never touch cerefox_document_projects.
Cross-path behavior matrix
| Path | Behavior on update with project_name |
|---|---|
| Local Python MCP (cerefox_ingest) | Destructive: replaces all memberships with [project_name] |
| Remote MCP (supabase/functions/cerefox-mcp/tools/ingest.ts) | Silent: project_name ignored on both update branches (lines 223-288 and 291-352) |
| Edge Function cerefox-ingest (supabase/functions/cerefox-ingest/index.ts) | Silent: project_name ignored on both update branches (lines 372-468 and 471-568) |
| Web UI → POST /documents/{id}/edit | Correct: explicit project_ids[] set, respects None = unchanged |
This inconsistency between local and remote MCP for the same tool name is itself part of the problem.
Impact
- Routine agent content updates strip human-curated project links without warning.
- No MCP tool exposes add / remove / set-multiple-projects on an existing document; cerefox_ingest (single project_name) is the only project write path agents have.
- Undercuts the agent-first, human-on-the-loop governance model: the human curates project membership, the agent quietly erases it.
The audit log already enumerates update-content and update-metadata as distinct operation types, so the backend has a notion of attribute edits independent of content. The MCP layer just doesn't expose it.
Proposed fix (coordinated set)
Fixing only the local-MCP destruction leaves remote MCP still unable to set any project on an existing document. The four changes below should ship together:
1. Make the update path non-destructive in pipeline.ingest_text. On the update_existing match branch (pipeline.py:157-180), do not push a single project_name into assign_document_projects. Either ignore it on update or treat it as "ensure this membership exists, leave others alone."
2. Bring remote MCP and Edge Function to the same contract. Once (1) lands, the local and remote paths should behave identically: "content update via cerefox_ingest never touches memberships unless the caller asks."
3. Accept project_names: string[] on cerefox_ingest with explicit "full set" semantics only when supplied; absent = unchanged. Keep project_name as a non-destructive single add for backward compatibility.
4. Add a dedicated metadata-only tool (e.g. cerefox_set_document_projects(document_id, project_names[]), optionally cerefox_add_project / cerefox_remove_project), mapped to the existing update-metadata audit operation so agents get the same multi-project capability the web app already has.
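The membership semantics proposed for cerefox_ingest and for the dedicated tools can be pinned down as pure set functions. This is a sketch under assumptions: the function names are hypothetical, names stand in for project UUIDs, and rejecting a call that passes both project_name and project_names is one possible design choice that the proposal does not specify.

```python
from __future__ import annotations

from typing import Iterable, Optional


def resolve_ingest_memberships(
    current: set[str],
    project_name: Optional[str] = None,
    project_names: Optional[Iterable[str]] = None,
) -> set[str]:
    """Proposed cerefox_ingest update semantics (sketch)."""
    if project_name is not None and project_names is not None:
        # Combining both is unspecified; rejecting the call is one safe choice.
        raise ValueError("pass project_name or project_names, not both")
    if project_names is not None:
        return set(project_names)              # explicit full set, [] clears
    if project_name is not None:
        return set(current) | {project_name}   # ensure membership, keep the rest
    return set(current)                        # absent = unchanged


# Hypothetical shapes for the dedicated metadata-only tools.
def set_document_projects(current: set[str], project_names: Iterable[str]) -> set[str]:
    return set(project_names)  # current is ignored: this call is an explicit full set


def add_project(current: set[str], name: str) -> set[str]:
    return set(current) | {name}


def remove_project(current: set[str], name: str) -> set[str]:
    return set(current) - {name}


curated = {"Cerefox", "cfcf"}
print(sorted(resolve_ingest_memberships(curated)))                          # ['Cerefox', 'cfcf']
print(sorted(resolve_ingest_memberships(curated, project_name="Cerefox")))  # ['Cerefox', 'cfcf']
print(sorted(resolve_ingest_memberships(curated, project_names=["cfcf"])))  # ['cfcf']
```

When persisted, the additive single-name case could map onto Postgres INSERT ... ON CONFLICT (document_id, project_id) DO NOTHING, assuming the junction table has a unique constraint on that pair (not verified here against schema.sql). That avoids the DELETE step entirely for the common path.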
Acceptance check
After a content-only update via cerefox_ingest (any MCP path), a document previously in two projects still returns under both project filters.
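The acceptance check can be phrased as a test. A real version would drive each of the three MCP paths against a live database; the FakeDocumentStore here is a hypothetical in-memory stand-in that only shows the shape of the assertion, with ingest_update modeling the target (fixed) behavior rather than the current one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class FakeDocumentStore:
    """Hypothetical in-memory stand-in for documents plus the project junction table."""

    content: Dict[str, str] = field(default_factory=dict)
    memberships: Dict[str, Set[str]] = field(default_factory=dict)

    def ingest_update(self, doc_id: str, text: str, project_name: Optional[str] = None) -> None:
        # Target behavior: a content update may add project_name, but never replaces the set.
        self.content[doc_id] = text
        if project_name is not None:
            self.memberships.setdefault(doc_id, set()).add(project_name)

    def search(self, project: str) -> List[str]:
        # Project-filtered search: ids of documents that belong to `project`.
        return sorted(doc for doc, projects in self.memberships.items() if project in projects)


def test_content_update_keeps_both_project_filters() -> None:
    store = FakeDocumentStore()
    store.content["doc-1"] = "v1"
    store.memberships["doc-1"] = {"Cerefox", "cfcf"}  # curated via the web UI

    store.ingest_update("doc-1", "v2", project_name="Cerefox")

    assert store.search("Cerefox") == ["doc-1"]
    assert store.search("cfcf") == ["doc-1"]


test_content_update_keeps_both_project_filters()
```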
Pointers
- Local MCP: src/cerefox/mcp_server.py:537 (_handle_ingest → pipeline.ingest_text)
- Pipeline: src/cerefox/ingestion/pipeline.py:157-180 (update_existing branch), pipeline.py:276-488 (update_document)
- Destructive replace: src/cerefox/db/client.py:474-489 (assign_document_projects)
- Remote MCP: supabase/functions/cerefox-mcp/tools/ingest.ts:223-288, 291-352, 407-431
- Edge Function: supabase/functions/cerefox-ingest/index.ts:372-468, 471-568, 645-669
- RPC (no change needed): src/cerefox/db/rpcs.sql:1045 (cerefox_ingest_document), rpcs.sql:744 (cerefox_snapshot_version)
- Junction table: src/cerefox/db/schema.sql:120 (cerefox_document_projects)