elixir-vibe/hex-playground

Hex Playground

Corpus playground for running local tools against large sets of Hex.pm packages.

Setup

cd ~/Development/hex-playground
mix deps.get

You can run it as a Mix task:

mix hex_playground.fetch --mode latest --limit 300 --concurrency 8

Or build a standalone escript:

mix escript.build
./hex_playground fetch --mode latest --limit 300

Fetch a corpus

Fetch and extract the latest release of packages from the signed Hex repository registry:

mix hex_playground.fetch --mode latest --limit 300 --concurrency 8

This creates:

  • manifest.json — package metadata, paths, mirror used, and file-extension counts
  • sources/<package>-<version>/ — extracted package sources
  • tarballs/<package>-<version>.tar — cached Hex tarballs
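The manifest records file-extension counts for each extracted package. The sketch below shows one way such counts could be computed for a sources/<package>-<version>/ directory. It is an illustration only: files_under and extension_counts are hypothetical helpers, and the real manifest may key or normalize extensions differently.

```python
from collections import Counter
from pathlib import Path, PurePath
from typing import Dict, Iterable, List


def files_under(source_dir: str) -> List[str]:
    """List regular files below an extracted package directory, relative to it."""
    root = Path(source_dir)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def extension_counts(file_paths: Iterable[str]) -> Dict[str, int]:
    """Count files by their last suffix; names without one (LICENSE, Makefile) go under "(none)"."""
    counts: Counter = Counter()
    for file_path in file_paths:
        counts[PurePath(file_path).suffix or "(none)"] += 1
    return dict(counts)
```

For example, extension_counts(files_under("sources/a1-0.25.0")). Only the last suffix is counted, so a file like priv/data.tar.gz is tallied under ".gz".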

Useful modes:

# Latest release of every public Hex package
mix hex_playground.fetch --mode latest --concurrency 16 --prune-non-elixir

# Every public package version. Large: currently ~150k releases.
mix hex_playground.fetch --mode all --concurrency 16

# Top packages by downloads, using the Hex HTTP API for ranking
mix hex_playground.fetch --mode top --limit 1000 --concurrency 16

The latest and all modes read the package list from the Hex repository endpoint:

https://repo.hex.pm/versions

Tarballs are downloaded from:

https://repo.hex.pm/tarballs/<name>-<version>.tar

and unpacked with hex_core.
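To show what "unpacked" involves, here is a minimal sketch of the tarball URL and the Hex package tarball layout. A Hex tarball is an uncompressed outer tar that contains, among other members, VERSION, metadata.config and a gzipped contents.tar.gz holding the package sources. The sketch reads everything in memory and skips the integrity checks a real unpacker such as hex_core performs. tarball_url and source_files are hypothetical names, not hex_playground functions.

```python
import io
import tarfile
from typing import Dict


def tarball_url(repo: str, name: str, version: str) -> str:
    """Build <repo>/tarballs/<name>-<version>.tar, tolerating a trailing slash on repo."""
    return f"{repo.rstrip('/')}/tarballs/{name}-{version}.tar"


def _regular_files(tf: tarfile.TarFile) -> Dict[str, bytes]:
    """Read every regular file in an open tar archive into memory."""
    out = {}
    for info in tf.getmembers():
        if info.isfile():
            out[info.name] = tf.extractfile(info).read()
    return out


def source_files(tar_bytes: bytes) -> Dict[str, bytes]:
    """Return the package sources from contents.tar.gz (no checksum verification)."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as outer:
        members = _regular_files(outer)
    with tarfile.open(fileobj=io.BytesIO(members["contents.tar.gz"]), mode="r:gz") as inner:
        return _regular_files(inner)
```

Nothing is written to disk here, which sidesteps path-traversal concerns; a real extractor must sanitize member names before writing files.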

Mirror balancing

Tarball downloads can be balanced across multiple repository mirrors. Registry discovery still uses --registry-url, so the signed Hex.pm registry remains the source of truth.

mix hex_playground.fetch \
  --mode latest \
  --limit 1000 \
  --concurrency 16 \
  --mirror https://repo.hex.pm \
  --mirror https://cdn.jsdelivr.net/hex \
  --mirror-strategy round_robin

You can also pass mirrors comma-separated:

mix hex_playground.fetch \
  --mirror https://repo.hex.pm,https://cdn.jsdelivr.net/hex \
  --mirror-strategy random

Available strategies:

  • round_robin — distribute package tarball attempts across mirrors
  • random — pick a random starting mirror per package

If a mirror fails for a tarball, the downloader falls back to the remaining mirrors. Only https://repo.hex.pm is the official Hex.pm mirror; other mirrors are useful for public tarballs but should be treated as untrusted.
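The two strategies and the fallback behaviour can be sketched as follows. This is an assumption about how the ordering might work, not hex_playground's code: mirror_order and download_with_fallback are hypothetical, fetch is a caller-supplied download function, and failures are modelled as OSError (which includes urllib's URLError).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def mirror_order(
    mirrors: Sequence[str],
    strategy: str,
    package_index: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return the mirrors in the order they would be tried for one package."""
    if not mirrors:
        raise ValueError("at least one mirror is required")
    if strategy == "round_robin":
        # Rotate the starting mirror by the package's position in the work list.
        start = package_index % len(mirrors)
    elif strategy == "random":
        # Pick a random starting mirror for this package.
        start = (rng or random.Random()).randrange(len(mirrors))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # The remaining mirrors follow in list order, wrapping around, as fallbacks.
    return list(mirrors[start:]) + list(mirrors[:start])


def download_with_fallback(
    ordered_mirrors: Sequence[str], fetch: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Try each mirror in turn and return (mirror, body) from the first success."""
    errors = []
    for mirror in ordered_mirrors:
        try:
            return mirror, fetch(mirror)
        except OSError as exc:
            errors.append(f"{mirror}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

Because every ordering is a rotation of the full list, each package can still reach every mirror; the strategy only decides which one is tried first.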

Build a servable Hex.pm-compatible mirror

Mirror the signed Hex registry files and package tarballs into a static-file layout compatible with Hex clients:

mix hex_playground.mirror \
  --out mirror \
  --concurrency 32 \
  --package-concurrency 16 \
  --mirror https://repo.hex.pm \
  --mirror https://cdn.jsdelivr.net/hex

This creates:

  • mirror/names
  • mirror/versions
  • mirror/public_key
  • mirror/packages/<name>
  • mirror/tarballs/<name>-<version>.tar
  • mirror/.hex_playground/manifest.ndjson
  • mirror/.hex_playground/failures.ndjson when downloads fail
  • mirror/.hex_playground/summary.json

Registry metadata is always fetched from --registry-url, defaulting to the official https://repo.hex.pm. Tarball downloads are balanced across --mirror URLs with fallback when a mirror fails. Existing valid tarballs are reused unless --force is passed.

For a small test run:

mix hex_playground.mirror --out mirror-test --limit 20 --concurrency 4

Serve the mirror with any static HTTP server rooted at mirror/:

cd mirror
python3 -m http.server 8080

The served paths must match Hex's repository paths exactly:

/names
/versions
/public_key
/packages/<name>
/tarballs/<name>-<version>.tar
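If you put a custom server in front of the mirror instead of a plain static file server, it has to answer exactly these paths. Below is a minimal sketch of mapping a request path to a file under the mirror root while rejecting anything outside the Hex layout. resolve is a hypothetical helper, not part of hex_playground.

```python
import re
from pathlib import Path
from typing import Optional

# The repository paths listed above.
_REGISTRY_FILES = {"/names", "/versions", "/public_key"}
_PACKAGE_RE = re.compile(r"/packages/[^/]+")
_TARBALL_RE = re.compile(r"/tarballs/[^/]+\.tar")


def resolve(root: str, url_path: str) -> Optional[Path]:
    """Map a request path to a file under root, or None if it is not a Hex repository path."""
    # Refuse "." and ".." segments so a request cannot escape the mirror directory.
    if any(segment in (".", "..") for segment in url_path.split("/")):
        return None
    if (
        url_path in _REGISTRY_FILES
        or _PACKAGE_RE.fullmatch(url_path)
        or _TARBALL_RE.fullmatch(url_path)
    ):
        return Path(root) / url_path.lstrip("/")
    return None
```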

Verify a completed or partial mirror:

mix hex_playground.mirror.verify --out mirror

The verifier checks required registry files, package metadata files referenced by the manifest, tarball presence, and tarball unpacking. Hex tarballs whose metadata files exceed hex_core's in-memory unpack safety limit are still treated as valid, because a mirror can serve them and Hex clients can fetch them. The verifier writes its results to mirror/.hex_playground/verify-summary.json.
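The presence part of those checks can be sketched in a few lines. This sketch does not attempt tarball unpacking, and it takes the releases as a plain dict, whereas the real verifier derives them from the manifest. check_mirror and its result shape are hypothetical and are not the format of verify-summary.json.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED_REGISTRY_FILES = ("names", "versions", "public_key")


def check_mirror(root: str, releases: Dict[str, List[str]]) -> dict:
    """Report which registry, package and tarball files are missing under root."""
    base = Path(root)
    missing = [f for f in REQUIRED_REGISTRY_FILES if not (base / f).is_file()]
    for name, versions in sorted(releases.items()):
        if not (base / "packages" / name).is_file():
            missing.append(f"packages/{name}")
        for version in versions:
            rel = f"tarballs/{name}-{version}.tar"
            if not (base / rel).is_file():
                missing.append(rel)
    return {"ok": not missing, "missing": missing}
```

For example, check_mirror("mirror", {"a1": ["0.25.0"]}) returns {"ok": True, "missing": []} once all four files for that release are present.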

To use the mirror as a drop-in replacement for the default Hex repo in an isolated Mix home:

MIX_HOME=/tmp/hex-mirror-mix \
  mix hex.repo set hexpm \
  --url http://localhost:8080 \
  --public-key mirror/public_key

Then ordinary Hex commands use the mirror:

MIX_HOME=/tmp/hex-mirror-mix mix hex.package fetch a1 0.25.0

If you add the mirror under a new repo name instead of overriding hexpm, Hex will reject upstream registry metadata unless you set HEX_NO_VERIFY_REPO_ORIGIN=1, because the signed package records still declare their origin as hexpm.

Run tools against every package

Pass the command to run to scripts/run_tool.exs after --. The command may contain these placeholders:

  • {name} — Hex package name
  • {version} — package version
  • {path} — relative source path
  • {abs_path} — absolute source path

Examples:

./scripts/run_tool.exs --limit 20 -- elixir -e 'IO.puts(System.get_env("HEX_PLAYGROUND_PACKAGE"))'

./scripts/run_tool.exs --limit 300 -- bash -lc 'find lib src -type f 2>/dev/null | wc -l'

./scripts/run_tool.exs --limit 300 -- bash -lc 'mix ex_dna --format json 2>/dev/null || true'
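The placeholders can be pictured as a per-argument substitution done once per package. The sketch below assumes that each argument after -- is expanded independently in a single pass; expand_command is a hypothetical name, and the sample values in the usage line are made up.

```python
import re
from typing import List, Mapping, Sequence

# The four documented placeholders; any other text in braces is left untouched.
_PLACEHOLDER = re.compile(r"\{(name|version|path|abs_path)\}")


def expand_command(argv: Sequence[str], values: Mapping[str, str]) -> List[str]:
    """Replace placeholders in every argument using one regex pass per argument."""
    return [_PLACEHOLDER.sub(lambda m: values[m.group(1)], arg) for arg in argv]
```

For example, expand_command(["bash", "-lc", "ls {path}"], {"name": "a1", "version": "0.25.0", "path": "sources/a1-0.25.0", "abs_path": "/tmp/sources/a1-0.25.0"}) yields ["bash", "-lc", "ls sources/a1-0.25.0"]. Values are inserted verbatim, so inside a bash -lc string any quoting is up to you. The single pass also means a value that happens to contain "{path}" is not expanded again.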

Each run writes:

  • runs/<timestamp>/results.ndjson
  • runs/<timestamp>/summary.json
  • one log file per package
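results.ndjson holds one JSON object per line, so it can be tallied with a few lines of code. The record schema is not documented here: the status_field default of "exit_code" below is a hypothetical field name, so check an actual results.ndjson and adjust it. summarize is likewise a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str], status_field: str = "exit_code") -> Counter:
    """Count NDJSON records by the value of status_field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records without the field are counted under None.
        counts[record.get(status_field)] += 1
    return counts
```

Usage: open runs/<timestamp>/results.ndjson and pass the file object directly, since iterating a text file yields its lines.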

Notes

This directory is intentionally data-heavy. Keep generated corpus data out of git unless explicitly needed.
