
Global content-addressed file dedup across sites #16

@MudDev

Description


Spun out from #10 (point 11, raised by @mx5kevin).

Problem

Today every site stores its own copy of a file, keyed by (site_address, sha512). If the same large file (a video, dataset, or PDF) is referenced by multiple sites, each peer ends up storing N copies on disk and uploading it N times over the network.

This is especially wasteful for:

  • Video sharing sites re-uploading the same clips
  • Mirror sites and content aggregators
  • Popular optional-file downloads
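
To make the current keying concrete, here is a toy illustration of the per-site layout; the path scheme, function name and site addresses are hypothetical, not EpixNet's actual storage code:

```python
import hashlib
from pathlib import Path

def site_file_path(data_dir: Path, site_address: str, sha512: str) -> Path:
    # Hypothetical layout: each site keeps its own copy, so the effective
    # key is (site_address, sha512).
    return data_dir / site_address / "files" / sha512

content = b"x" * 1_000_000                    # stand-in for a large shared file
digest = hashlib.sha512(content).hexdigest()

sites = ["1VideoSiteAAA", "1MirrorSiteBBB", "1AggregatorCCC"]
paths = [site_file_path(Path("data"), s, digest) for s in sites]

# Same bytes, same hash, but N distinct on-disk copies (and N uploads).
assert len(set(paths)) == len(sites)
```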

Proposal

Introduce a global content-addressed store: each peer stores a given file at most once, keyed by its SHA512 hash, and multiple sites reference it by that hash.

Approach sketch
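
A minimal sketch of what the global store could look like, assuming a flat hash-keyed blob directory plus a per-hash list of referencing sites; all names here (GlobalStore, __hashes__, hash_refs.json) are hypothetical, not an existing EpixNet API:

```python
import hashlib
import json
from pathlib import Path

class GlobalStore:
    """Hypothetical content-addressed store: one copy per peer, keyed by SHA512."""

    def __init__(self, root: Path):
        self.blob_dir = root / "__hashes__"       # global blobs, named by their hash
        self.refs_path = root / "hash_refs.json"  # which sites reference which hash
        self.blob_dir.mkdir(parents=True, exist_ok=True)
        self.refs = (
            json.loads(self.refs_path.read_text()) if self.refs_path.exists() else {}
        )

    def put(self, content: bytes, site_address: str) -> str:
        sha512 = hashlib.sha512(content).hexdigest()
        blob = self.blob_dir / sha512
        if not blob.exists():                     # write the bytes at most once
            blob.write_bytes(content)
        sites = self.refs.setdefault(sha512, [])
        if site_address not in sites:             # record the referencing site
            sites.append(site_address)
        self.refs_path.write_text(json.dumps(self.refs))
        return sha512

    def get(self, sha512: str) -> bytes:
        data = (self.blob_dir / sha512).read_bytes()
        # Verify on read so a corrupted or "poisoned" blob is rejected.
        if hashlib.sha512(data).hexdigest() != sha512:
            raise ValueError("hash mismatch for %s" % sha512)
        return data
```

Two sites calling put() with the same bytes end up with a single blob on disk and two entries in the reference list; deleting a site would drop its references, and blobs whose reference list goes empty could be garbage-collected.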

Design questions

  • Backwards compatibility: old sites must keep working; new sites opt in to global hashes
  • Security: a malicious site must not be able to "poison" the global store; content is verified against its hash before being stored or served
  • Storage accounting: does a 1 GB file count once or N times against the user's storage budget? (one possible policy is sketched after this list)
  • Discovery: how do peers announce that they have a hash that isn't tied to a specific site they've joined?
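
On the storage-accounting question, one possible answer is to charge a deduplicated file once, no matter how many sites reference it. A rough sketch of that policy, reusing the hypothetical per-hash reference list from the store sketch above:

```python
def storage_used(refs: dict[str, list[str]], blob_sizes: dict[str, int]) -> int:
    """Charge each globally stored blob once, however many sites reference it.

    refs: sha512 -> referencing site addresses (hypothetical structure)
    blob_sizes: sha512 -> blob size in bytes
    """
    return sum(blob_sizes[h] for h, sites in refs.items() if sites)

# A 1 GB file referenced by three sites counts once against the budget.
refs = {"ab12...": ["1SiteA", "1SiteB", "1SiteC"]}
blob_sizes = {"ab12...": 1_000_000_000}
assert storage_used(refs, blob_sizes) == 1_000_000_000
```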

Non-goals

  • Not a full CAS redesign of EpixNet storage; this is additive and opt-in
  • Not required for small files where the overhead wouldn't pay off

Related

Credit: @mx5kevin in #10.
