feat: add embedded ms3t S3 listener backed by Forge#34
Conversation
Adds an S3-compatible HTTP listener that runs inside sprue, gated by `config.MS3T.Enabled`. When enabled, sprue exposes a path-style S3 API on a separate port; PUT/GET/HEAD/DELETE/LIST translate into mutations on a Merkle Search Tree whose blocks ship to piri via sprue's existing piriclient/routing/indexerclient (no UCAN-over-HTTP loopback).

In `ms3t.forge.no_cache` mode (the smelt-deployed shape):

- All block reads go through indexer queries + UCAN-authorized ranged retrieves on piri
- Writes are synchronous to Forge — three round trips per S3 PUT
- Local state is the registry SQLite (bucket → root CID) and a generated space keypair; ms3t is its own UCAN root authority

When `ms3t.forge.enabled` is false, ms3t falls back to a local-disk uploader for development without Forge connectivity.

See pkg/ms3t/architectural.md for prototype-level design notes, the choice points, and open questions for the team.

Wired into the fx graph via internal/fx/ms3t.go; configuration lives under the new `ms3t:` block in config.example.yaml.
> - **Why it's awkward**: `aws s3 sync` of many small files is slow.
>   An MST traversal during a PUT pays N network round trips for N
>   existing nodes on the path, even though those nodes are
>   deterministic.
question: Can we pipeline these requests? Or rather, a) can we support pipelining, and b) do S3 clients typically support it? It wouldn't be slow to hold the PUT open until completion if the next PUT could start before the previous one closed.
> - **Why we picked this**: zero out-of-band provisioning. The first
>   time sprue starts with `forge.enabled`, ms3t writes a key and
>   uses it. No "go ask the delegator for a delegation, paste it
>   here."
thought: I think the whole provisioning story is pretty undefined right now. Personally, I'd be comfortable with (and recommend) leaving that to a separate decision process (which I think is more or less the idea as written). I think we have room in the options discussed here for whatever that outcome is. But that investigation is going to bring up all sorts of questions of identity and authorization. If we bridge S3 auth to UCAN, what S3 auth are we even bridging? We have product questions here to resolve as much as technical ones.
I'm going to flag that we need that conversation as well, just to make sure that happens.
Full agreement here, this was another quick-and-dirty decision targeting an MVP. I know @alanshaw has some ideas on placing bucket metadata (the MST) in its own space and such.
> Body chunks ride in the same CAR as the structural blocks. The
> indexer maps inner CIDs to byte ranges within the outer CAR. One
> data-CAR upload + one index-blob upload per PUT.
question/thought: This is addressed by fil-one/RFC#2, correct? Specifically, under that proposal, this would still be two uploads per PUT, but the data upload would be a raw chunk of data, and the index CAR would contain the UnixFS metadata nodes over it. That's nearly the best of both worlds, although we still can't quite do direct passthrough. But we can do relaying by chunks with a fixed buffer size, which is nearly as good.
> - **Why it's awkward**: the operator running sprue + piri pays
>   bandwidth twice (client→sprue, sprue→piri) when conceptually
>   the bytes only need to move once. In a federated model where
>   piri storage is run by different operators, this becomes
>   structurally wrong (sprue's operator pays to deposit bytes onto
>   someone else's hardware).
thought: There's another reason: trust. Under direct passthrough, we trust the Piri operator to hash and store the data correctly. Under the system here, that trust is placed in the facade only, and the use of Piri remains trustless. I think that's the correct alignment. The S3 facade, like the HTTP gateway, requires trust as it bridges from IPLD and UCAN to the outside world. So that layer should ideally be what the customer has to trust, and ideally everything behind it remains just as trustless as before.
> - **Why we picked this**: zero auth coordination — ms3t is sprue,
>   it has all sprue's identities and clients in-process. One binary
>   to ship, one config file.
issue: I'm not a fan of this identity conflation. I think it makes sense for the moment to put them in the same process/deployment for convenience, but using the same identity smells wrong to me. But this will likely/hopefully be driven out by the full auth story.
Although, reading the architecture closer, it looks like they're not conflated after all, so maybe I misunderstood what this choice was about?
> The current code assumes a single ms3t instance per bucket, via the
> in-process `sync.Mutex` per-bucket lock. There is no cross-instance
> coordination.
question: What are multiple "instances"? Multiple processes? Does that mean multiple instances of Sprue as well?
```go
type Body struct {
	Size      int64
	ChunkSize int64
	Chunks    []cid.Cid
```
thought: Under fil-one/RFC#2, I think we'll have a single root we can store here instead of individual chunks.
Yup! Next steps here would be adopting fil-one/RFC#2 once it's settled on. This was just my quick and dirty: "make this work for an MVP"
- versitygw S3 compatibility tests wired in with pass/fail sets
To review, start here: https://github.com/storacha/sprue/pull/34/changes#diff-c1e3f8006e0cc6f137969167b1d125433db5f4941d0975f8a7b53abdef81f954