Spun out from #10 (point 11, raised by @mx5kevin).
Problem
Today every site stores its own copy of a file, keyed by (site_address, sha512). If the same large file (video, dataset, PDF) is referenced from multiple sites, each peer ends up storing N copies on disk and uploading it N times across the network.
This is especially wasteful for:
- Video sharing sites re-uploading the same clips
- Mirror sites and content aggregators
- Popular optional-file downloads
Proposal
Introduce a global content-addressed store where files are stored once per local peer, keyed by SHA512, and multiple sites reference them by hash.
Approach sketch
- Files are stored once per peer under data/_content/<sha512_prefix>/<sha512>, with a refcount per site
- content.json file entries can reference a global hash instead of (or in addition to) a site-relative path
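A minimal sketch of the dedup idea: content lands under a hash-keyed path, and a second put of the same bytes is a no-op. The `STORE_ROOT` path, the two-character shard prefix, and the `put`/`store_path` names are illustrative assumptions, not existing EpixNet APIs.

```python
import hashlib
import os
import shutil

STORE_ROOT = "data/_content"  # assumed layout from the sketch above


def store_path(sha512_hex):
    # Shard by a short prefix so no single directory grows huge
    return os.path.join(STORE_ROOT, sha512_hex[:2], sha512_hex)


def put(src_path):
    """Copy a file into the global store, keyed by its SHA512; return the hash."""
    h = hashlib.sha512()
    with open(src_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    dest = store_path(digest)
    if not os.path.exists(dest):  # dedupe: the blob is stored once per peer
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(src_path, dest)
    return digest
```

Because the key is the content hash itself, any site referencing the same bytes resolves to the same on-disk blob, which is also what makes downloaded data verifiable before it enters the store.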
Design questions
- Backwards compatibility: old sites keep working, new sites opt into global hashes
- Security: a malicious site can't "poison" the global store because hashes are verified
- Storage accounting: does a 1GB file count once or N times against the user's storage budget?
- Discovery: how do peers announce they have a hash that isn't tied to a specific site they've joined?
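One possible answer to the storage-accounting question is to count each unique hash once against the budget while tracking per-site references, so the blob can be garbage-collected when the last site drops it. This is a hypothetical sketch; the `RefCounter` class and its method names are not part of EpixNet.

```python
from collections import defaultdict


class RefCounter:
    """Per-hash site references: a file counts once against the storage
    budget no matter how many sites reference it."""

    def __init__(self):
        self.refs = defaultdict(set)  # sha512 -> set of referencing site addresses
        self.sizes = {}               # sha512 -> file size in bytes

    def add_ref(self, sha512, site, size):
        self.refs[sha512].add(site)
        self.sizes[sha512] = size

    def del_ref(self, sha512, site):
        """Drop one site's reference; return the freed size once the last
        reference is gone (the caller then deletes the blob), else None."""
        self.refs[sha512].discard(site)
        if not self.refs[sha512]:
            del self.refs[sha512]
            return self.sizes.pop(sha512)
        return None

    def budget_used(self):
        # Each unique hash counted once, regardless of reference count
        return sum(self.sizes.values())
```

Under this scheme a 1GB file referenced by ten sites costs the user 1GB, and deleting nine of those sites frees nothing until the tenth reference goes away.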
Non-goals
- Not a full CAS redesign of EpixNet storage; this is additive and opt-in
- Not required for small files where the overhead wouldn't pay off
Related
Credit: @mx5kevin in #10.