The coupling between erigon and erigon-snapshot has historical reasons, but IMHO nowadays it:
- is unnecessary
- creates confusion, because it is tightly coupled to the infrastructure layout of erigon (the company)
- requires changes in multiple places every time we need to change something
Recent example (this is the 2nd time it has happened to me, and each time I spend an afternoon remembering, now with AI assistance, how the code works and which code change I forgot):
the performance branch was created in erigon-snapshot, and then:
- sync stops working
- torrents are being served correctly
- the manifest looks fine in erigon-snapshot
- the actual fix is to update a constant inside the erigon code
That is only one variation/possibility, but in general I think we should (in no particular order, only the things I remember right now):
- remove the dependency on erigon-snapshot (as a go mod).
  - the historical reason was fallback (as I remember): if fetching the remote hashes fails, use the embedded hashes.
  - but in the past we only had github (now we have R2, with github as a fallback).
  - even if both R2 and github fail, because of merging, etc., (1) there is a high probability that some files in the embedded hashes no longer exist in the torrent network, and (2) if the user manages to get such a torrent, it will be far from the tip. In both cases the UX is better if the user just waits for connectivity to R2/github in order to get an up-to-date-to-tip chain.toml.
  - last time I tested, the fallback was not working, i.e., R2/github down => erigon halts.
  - there is some code inside erigon-snapshot, but AFAIK all of it either interprets the embedded hashes or is a temporary hack to connect to R2 (which we added ASAP when github started to block us and never cleaned up), hence it can be removed or internalized into erigon.
- internalize the webseed URLs from erigon-snapshot into erigon (they are constants and there are no fallbacks; AFAIK they don't need to live inside erigon-snapshot).
- turn erigon-snapshot into a release-target-only .toml repo for our automated process.
- remove the hardcoded constant with the erigon-snapshot branch name.
  - this is a source of confusion: erigon does a lookup using the webseed URL (from the embedded erigon-snapshot go mod) + the hardcoded branch name. If you create a branch in erigon-snapshot, you need to remember to point the erigon binary to the new branch name. In practice we never published multiple branches for the same chain. If we forget to update the constant, we publish to the correct place and erigon "feels" like it is working, but it isn't, because it downloads torrents from the wrong branch. It does not fail fast.
  - we should publish to/read from only 1 place: the webseed location determined by the domain hardcoded in the erigon binary plus the chain name, which is dynamic.
  - if we need some variation, we can provide an env var manifest override that takes a full URL.
- remove invalid chain.toml files from the erigon-snapshot repo: e.g., the performance branch should contain only bloatnet.toml, and everything else can be removed. We should apply a similar policy to all active branches, so that we fail fast if any process tries by mistake to publish/read the wrong chain in the wrong branch.