Feature request: native path-injection convention for non-multimodal attachments in the web composer
Summary
The web composer (apps/app) currently has a single attachment-to-prompt shape: convert the uploaded file to a base64 data: URL and emit a type: "file" content part. This works cleanly for media types that downstream providers accept as multimodal inputs — images, PDFs. It does not provide a path for the increasingly common case where the agent should reason about the file via tools (Read, MCP tool calls, Bash) rather than ingest its bytes as multimodal content.
The result today is that workspaces wiring up MCP servers for spreadsheet / CSV / document analysis (a use case that OpenWork's docs put front and center via the MCP server feature) have no upstream-supported way to surface "the user attached file X; it's at path Y; decide what to do with it" to the agent.
This issue proposes opening a design conversation about a native path-injection convention in the web composer.
Why the current shape doesn't fit this case
The attachmentToFilePart path in apps/app/src/react-app/domains/session/sync/actions-store.ts emits:
{
  type: "file",
  url: await fileToDataUrl(attachment.file), // data:<mime>;base64,...
  filename: attachment.name,
  mime: attachment.mimeType,
}
For attachments outside the multimodal-supported set:
- Anthropic's Messages API rejects non-PDF MIME types in document.source.base64 (only application/pdf accepted; non-PDF files are explicitly redirected to the Files API, a separate beta endpoint).
- OpenRouter passes the same constraint through unchanged (its file content type is documented exclusively for PDFs in its multimodal/pdfs guide).
- Other providers vary, but the multimodal-bytes-in-payload shape is fundamentally about model-ingestible content, not file-as-path.
So in the web composer, attachments outside the model-ingestible set effectively can't reach the agent at all today — they're encoded into a payload the downstream provider drops.
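As a rough illustration of the split described above, the composer could classify an attachment by MIME type before choosing a shape. This is a minimal sketch, not OpenWork code: routeAttachment and AttachmentRoute are hypothetical names, and the ingestible set (images plus application/pdf) is an assumption taken from the "images, PDFs" wording in the summary.

```typescript
// Hypothetical helper; not part of OpenWork. The ingestible set below is an
// assumption based on the "images, PDFs" wording in this issue.
type AttachmentRoute = "multimodal-bytes" | "path-injection";

function routeAttachment(mime: string): AttachmentRoute {
  const normalized = mime.trim().toLowerCase();
  // Images and PDFs can travel as base64 bytes in a type: "file" part.
  const ingestible =
    /^image\//.test(normalized) || normalized === "application/pdf";
  // Everything else needs to reach the agent as a path, not as bytes.
  return ingestible ? "multimodal-bytes" : "path-injection";
}
```

Under these assumptions a CSV or XLSX upload would be routed to the path-based flow instead of being encoded into a payload the provider will not accept.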
The deeper design gap
Even if every provider accepted every MIME, the multimodal-bytes-in-payload shape is the wrong primitive for many real workflows. Consider:
- User attaches a 200-row CAFM asset spreadsheet. The agent's job is to call an MCP tool (excel_analysis(file_path=...)), not to read 5000 cells of Excel as model context.
- User attaches a 50MB log file. The agent should grep / tail with a tool — not blow the context window.
- User attaches a CSV the agent will iterate over via a structured tool.
In each case, the agent needs the path as substrate input, then uses its own tool palette to reason about what to do with it. This is exactly the natural shape in claude-code (filesystem-as-substrate, paths in prompts, Read tool fires on Bash). OpenWork-desktop has a structurally similar primitive via file:// references for client-local files. OpenWork-web has no equivalent — there's no path on the server unless something uploads the file first, and no documented convention for the SPA to do that upload + emit the path as substrate context.
Current workaround in production usage
A production deployment of OpenWork on dev SHA 674a0373 ran into this gap and shipped a workaround (relevant artifact: spa-attach-as-path.patch). The shape:
- SPA composer detects a non-image attachment, POSTs to a project-owned /upload endpoint (a FastMCP server with a Docker-volume-shared writable path).
- Endpoint writes the file to a workspace-scoped path and returns the path.
- SPA emits a type: "text" content part with body:
  [Attached file: <name>] Available at orchestrator-readable path `<path>`
  (MIME: <mime>). Pass this path to tools that accept absolute file paths
  (e.g., excel_analysis(file_path="<path>")).
- Agent reads the text part, calls its MCP tool with the path.
This works — the agent receives the path and acts on it. But the side effect is that the substrate metadata renders visibly in the user message bubble alongside the user's typed text, because TEXT parts are the user-message rail. For a "Susan-grade" workspace where the implementer is a non-developer end user, that substrate text leaking into the user bubble breaks the chat metaphor. The workaround works mechanically but trades UX integrity for it.
The point isn't that this specific workaround is right or wrong — it's that the absence of an upstream-blessed mechanism forces every team hitting this case to invent their own and pay the same UX tax.
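For reference, the text part the workaround emits can be sketched as a small pure function. This is an illustrative reconstruction from the message body quoted above, not the contents of spa-attach-as-path.patch: buildAttachmentTextPart is a hypothetical name, and the `text` field name for the part body is an assumption.

```typescript
// Assumed shape of a text content part; the `text` field name is a guess.
interface TextPart {
  type: "text";
  text: string;
}

// Hypothetical helper that mirrors the workaround's message body.
function buildAttachmentTextPart(
  filename: string,
  path: string,
  mime: string
): TextPart {
  const text =
    `[Attached file: ${filename}] Available at orchestrator-readable path \`${path}\` ` +
    `(MIME: ${mime}). Pass this path to tools that accept absolute file paths ` +
    `(e.g., excel_analysis(file_path="${path}")).`;
  return { type: "text", text: text };
}
```

Because the result is an ordinary text part, it renders in the user bubble like any typed text, which is the UX leak described above.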
Proposed approaches (sketches, not prescriptions)
Three rough shapes a native solution could take, in increasing order of upstream surface area:
1. SPA-renderer-hides-tagged-text-parts (smallest)
Add a documented metadata.kind: "substrate-injection" (or similar marker) on text content parts. The SPA renderer recognizes the marker and either:
- omits the part from the user-message bubble render entirely, OR
- collapses it into a small chip (📎 <filename>) that expands on click for inspection.
The model still receives the full text part. Agents continue using it as-is. No new API endpoints, no new content shapes — just a render-side convention.
This would let workarounds like the one above keep working while removing the UX leak.
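A minimal sketch of what the render-side convention might look like, assuming text parts can carry a metadata object. The names toBubbleItems, BubbleItem, and metadata.filename are hypothetical; only the metadata.kind: "substrate-injection" marker comes from the proposal above.

```typescript
// Assumed part shape; metadata.filename is a hypothetical extra field.
interface TextPart {
  type: "text";
  text: string;
  metadata?: { kind?: string; filename?: string };
}

// What the user-message bubble would render for each part.
type BubbleItem =
  | { kind: "text"; text: string }
  | { kind: "chip"; label: string; detail: string };

const SUBSTRATE_KIND = "substrate-injection";

function toBubbleItems(parts: TextPart[]): BubbleItem[] {
  return parts.map((part): BubbleItem => {
    if (part.metadata !== undefined && part.metadata.kind === SUBSTRATE_KIND) {
      const filename =
        part.metadata.filename !== undefined ? part.metadata.filename : "attachment";
      // Collapse tagged parts into a chip; the full text stays available on expand.
      return { kind: "chip", label: "📎 " + filename, detail: part.text };
    }
    // Untagged parts render as ordinary user text.
    return { kind: "text", text: part.text };
  });
}
```

The parts sent to the model are untouched in this sketch; only the bubble rendering changes.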
2. Native upload-and-inject (medium)
A documented OpenWork-web convention: a configurable upload target (workspaces.uploads.endpoint in workspace config, or similar) that the SPA composer POSTs non-image attachments to. The response carries a path; the SPA emits a chip in the bubble and injects the path as system / substrate context (out-of-band from the user message, e.g., via a system: "[Attached files: ...]" clause prepended to the next request).
This makes path-injection a first-class web-mode primitive.
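To make the upload-and-inject flow more concrete, here is a sketch of the step after the upload endpoint has responded. Everything in it is an assumption for discussion: UploadedAttachment, PreparedRequest, and prepareAttachments are hypothetical names, and the exact wording inside the "[Attached files: ...]" clause is invented here since the proposal leaves it open.

```typescript
// Hypothetical result of POSTing one attachment to the configured upload target.
interface UploadedAttachment {
  filename: string;
  path: string; // server-side path returned by the upload endpoint
  mime: string;
}

interface Chip {
  label: string;
}

interface PreparedRequest {
  system: string; // out-of-band substrate context for the next request
  chips: Chip[];  // what the user-message bubble renders
}

function prepareAttachments(uploaded: UploadedAttachment[]): PreparedRequest {
  // The entry format is an assumption; the proposal does not specify it.
  const entries = uploaded.map(
    (a) => a.filename + " at " + a.path + " (" + a.mime + ")"
  );
  return {
    system: entries.length > 0 ? "[Attached files: " + entries.join("; ") + "]" : "",
    chips: uploaded.map((a) => ({ label: "📎 " + a.filename })),
  };
}
```

Keeping the path in a separate system field means the user bubble only ever shows the chip, while the agent still receives the path.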
3. New content-part shape (largest)
Introduce a type: "attachment-path" or type: "file-reference" part that carries a path (not bytes), is forwarded to the model as substrate, and is rendered SPA-side as a chip in the bubble.
Likely too invasive without a wider design pass; sketched for completeness.
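For discussion purposes, the new part could sit next to the existing shapes in a discriminated union. This is a sketch under stated assumptions: FilePart mirrors the fields shown earlier in this issue, while FileReferencePart, its filename and mime fields, and both helper functions are hypothetical.

```typescript
interface TextPart {
  type: "text";
  text: string;
}

// Existing bytes-in-payload shape, as emitted today.
interface FilePart {
  type: "file";
  url: string; // data:<mime>;base64,...
  filename: string;
  mime: string;
}

// Hypothetical new shape: carries a path, not bytes.
interface FileReferencePart {
  type: "file-reference";
  path: string;
  filename: string;
  mime: string;
}

type ContentPart = TextPart | FilePart | FileReferencePart;

// What the bubble shows for each part.
function toBubbleLabel(part: ContentPart): string {
  switch (part.type) {
    case "text":
      return part.text;
    case "file":
      return "📎 " + part.filename;
    case "file-reference":
      return "📎 " + part.filename;
  }
}

// What the model would receive for a path-carrying part (wording is an assumption).
function toModelContext(part: FileReferencePart): string {
  return (
    "Attached file " + part.filename + " is available at " + part.path +
    " (MIME: " + part.mime + ")"
  );
}
```

The union makes the cost of this approach visible: every consumer that switches on part.type would need a new case.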
Concrete asks
- Confirm the gap exists as described (or correct the framing if a native mechanism does exist and is just undocumented).
- Open a design conversation on which of the above shapes (or a fourth) fits OpenWork's direction.
- If approach 1 or 2 fits — share a marker / config convention so downstream teams can align workarounds against the eventual upstream shape.
Documentation observation
The current docs site covers LLM providers, MCP servers, browser control, and skill sharing — but does not document file-attachment behavior, what file types are supported, or whether the design intent is "bytes-only via multimodal" or "future path-injection mode is in scope." Whatever the answer ends up being, a docs entry on attachment handling would prevent each new team from rediscovering the constraint.