From e0ada6477722f631be86c7e61492757162efef26 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 13:58:35 -0700 Subject: [PATCH 01/10] docs(user): add NERSC (Perlmutter) guide under user/ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reorganized away from the now-removed docs/hpc/ tree and into the new docs/user/ user-guide structure. The page is a focused, site-specific walkthrough: install Claude Code → pick a Python env (with the 40 GB home-quota → \$SCRATCH symlink note) → install lightcone-cli from PyPI or source → init a project → run on compute nodes via salloc + claude. Wired into the User Guide nav as "NERSC (Perlmutter)" right after "Running on a Cluster" (which it complements with site-specific overlays), and cross-linked from cluster.md so NERSC users land here naturally. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/cluster.md | 5 + docs/user/nersc.md | 229 +++++++++++++++++++++++++++++++++++++++++++ zensical.toml | 1 + 3 files changed, 235 insertions(+) create mode 100644 docs/user/nersc.md diff --git a/docs/user/cluster.md b/docs/user/cluster.md index 5da22935..2b7e25fa 100644 --- a/docs/user/cluster.md +++ b/docs/user/cluster.md @@ -5,6 +5,11 @@ a SLURM HPC system. There's no separate configuration to learn — the same `lc run` command works inside an allocation, just with more hardware to spread across. +> On NERSC Perlmutter, the filesystem layout (DVS-mounted home, Lustre +> scratch) and the `module load conda` workflow add a few site-specific +> considerations. See [NERSC (Perlmutter)](nersc.md) for a focused +> walkthrough. + ## The big picture `lc run` always dispatches through a Dask cluster. 
Three branches: diff --git a/docs/user/nersc.md b/docs/user/nersc.md new file mode 100644 index 00000000..a88279cd --- /dev/null +++ b/docs/user/nersc.md @@ -0,0 +1,229 @@ +# Installing and Using lightcone-cli on NERSC + +A practical guide for running [`lightcone-cli`](https://github.com/LightconeResearch/lightcone-cli) on Perlmutter. The CLI works the same as anywhere else, but the filesystem layout, container runtime, and SLURM submission have NERSC-specific quirks that are worth knowing about up front. + +--- + +## 0. Install Claude Code + +`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses a coding agent (e.g. Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. For now the project is built around **Claude Code**, which can be installed via: + +```bash +curl -fsSL claude.ai/install.sh | bash # installs to ~/.local/bin/claude +``` + +Add `~/.local/bin` to your `PATH` if it isn't already, then verify and authenticate: + +```bash +claude --version +claude # first run prompts for login (claude.ai or API key) +``` + +Other install routes (npm, native package managers) are documented in the [Claude Code installation docs](https://docs.claude.com/en/docs/claude-code/setup). + +--- + +## 1. Pick a Python environment + +Next, set up a Python environment for `lightcone-cli` (Python 3.11+ required). There are two practical options on Perlmutter: + +### Option A — conda env (recommended) + +```bash +module load conda # NERSC's miniconda +conda create -n your-env-name python=3.11 -y +conda activate your-env-name +``` + +Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent across sessions; just `conda activate your-env-name` next time. 
+ +> The home disk quota on NERSC is capped at 40 GB, so for larger envs it's worth moving the env to `$SCRATCH` and pointing the original location at it via a symlink: +> +> ```bash +> # Move the env once it's created, then symlink the original location +> conda deactivate +> mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/ +> ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name +> ``` +> +> Caveats: `$SCRATCH` is purged on a 12-week rolling window — the env will silently disappear. If you go this route, set up a periodic `touch` job or use `/global/cfs/cdirs//conda-envs/` instead. +> +> See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy and [the `ln(1)` man page](https://man7.org/linux/man-pages/man1/ln.1.html) for the symlink syntax. + +### Option B — venv inside an existing conda env + +If you already have a project conda env (e.g. `lightcone`) and just want `lc` available alongside it without polluting the conda env: + +```bash +module load conda +conda activate lightcone +python -m venv ~/.lightcone/.venv # or wherever you prefer +source ~/.lightcone/.venv/bin/activate +``` + +**Pitfall:** if `lc` ends up installed in more than one env (e.g. both the conda env and a venv), the wrong one can shadow the other on `PATH`. After install, always run `which lc` to confirm you're getting the binary you expect. + +--- + +## 2. Install lightcone-cli + +With the environment ready, install the package itself. + +### From PyPI (recommended) + +`lightcone-cli` and its companion package `astra-tools` are both published to PyPI, so a single command does it: + +```bash +pip install lightcone-cli astra-tools +``` + +### From source + +You're also welcome to install from source — useful if you want to follow the latest commits or contribute back to the repo. 
Note the GitHub repo for `astra-tools` is named `ASTRA`: + +```bash +cd ~/.lightcone # or wherever you keep clones + +git clone https://github.com/LightconeResearch/lightcone-cli.git +pip install -e ./lightcone-cli # editable install, follows local edits + +git clone https://github.com/LightconeResearch/ASTRA.git +pip install -e ./ASTRA # same for astra-tools +``` + +For development work, add the dev extras: + +```bash +pip install -e "./lightcone-cli[dev]" # adds pytest, ruff, mypy +``` + +### Verify + +```bash +which lc # should be inside your active env's bin/ +lc --version +lc --help +``` + +--- + +## 3. Initialize a new project + +Now you're ready to start working with it: + +```bash +lc init your-analysis # scaffolds a new folder with everything lightcone needs +cd your-analysis +claude # launch Claude Code inside the project +``` + +--- + +## 4. Start your research with lightcone! + +Once Claude Code is open, you can use the lightcone skillset to start a fresh analysis or migrate one from existing code — all driven by natural-language prompts to the agent. + +For example, to start from scratch: + +```text +/lc-new Please sample a standard Gaussian distribution using numpy. +``` + +Or to migrate from existing code in another directory: + +```text +/lc-migrate I have code that samples a standard Gaussian distribution using numpy at @../gaussian_sampling. Please create an analysis based on it. +``` + +After initialization, just keep talking to the agent in plain English about what you want to build next. Note that your job will all run on **login node**, see the next section on how to run jobs on computing node. + +--- + +## 5. Running on compute nodes + +Everything up to this point ran on a Perlmutter **login node** — fine for installation, scaffolding, and `lc status`, but anything heavy belongs on a compute node. Login nodes are shared and should not be abused. 
+ +The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a Slurm allocation: + +```bash +salloc -A -q interactive -C gpu --nodes=1 -t 00:30:00 +# allocation drops you onto a compute node; from there: +cd /path/to/your-analysis +claude +``` + +Now anything the agent decides to run (`lc run`, scripts, etc.) executes on the allocated node, not the login node. + +The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, other QoS queues will be supported in the future. + +> Unattended batch submission (`sbatch`-style runs of `lc`) is not yet supported — for now, every analysis runs interactively under an allocation that's open while you work. + + + + +### Storage gotcha: Snakemake state must live on `$SCRATCH` + +`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which silently ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. + +`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly per project: + +```bash +lc init your-analysis --scratch '$SCRATCH' # expands at run time, kept verbatim in config +``` + +Or after the fact, add to `/.lightcone/lightcone.yaml`: + +```yaml +scratch_root: $SCRATCH +``` + +`$SCRATCH` is purged on a 12-week rolling window, so for outputs you want to keep, copy or symlink to `/global/cfs/cdirs//`. 
+ +### Further reading + +- [NERSC interactive jobs](https://docs.nersc.gov/jobs/interactive/) — `salloc` patterns and reservation queues +- [Perlmutter system overview](https://docs.nersc.gov/systems/perlmutter/) — node types and partitions + +--- + +## 6. Common troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `lc: command not found` | Wrong env active | `which lc`; reinstall in the active env | +| `lc` runs but uses unexpected code | Two installs across two envs shadowing each other on `PATH` | `which lc` and uninstall the stale one | +| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script | +| Snakemake locking errors / silent rule rerun loops | `.snakemake/` ended up on DVS-mounted storage | Set `scratch_root: $SCRATCH` in the project's `.lightcone/lightcone.yaml` | +| `ImportError: cannot import name 'resolve_analysis_tree' from 'astra.helpers'` | Stale `astra-tools` (pre-0.2.5) | `pip install -U astra-tools` | +| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests you need into your own scratch | +| `pip install` hangs or times out on a compute node | Compute nodes have no public internet | Always install from a login node | + +--- + +## 7. Updating + +For source installs: + +```bash +cd ~/.lightcone/lightcone-cli +git pull +pip install -e . # only needed if pyproject.toml changed +``` + +Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in `lc`. Re-run `pip install -e .` only when `pyproject.toml` adds a new dependency or changes the `[project.scripts]` table. + +For PyPI installs: + +```bash +pip install -U lightcone-cli astra-tools +``` + +--- + +## 8. 
Uninstalling + +```bash +pip uninstall lightcone-cli # remove from the active env +rm -rf ~/.lightcone/lightcone-cli # remove source clone (only for source installs) +# Keep ~/.lightcone/config.yaml and ~/.lightcone/targets/ unless you want to start fresh. +``` diff --git a/zensical.toml b/zensical.toml index f38bc093..04164e13 100644 --- a/zensical.toml +++ b/zensical.toml @@ -16,6 +16,7 @@ nav = [ {"Tutorial: Your First Analysis" = "user/tutorial.md"}, {"Multiverse Analyses" = "user/multiverse.md"}, {"Running on a Cluster" = "user/cluster.md"}, + {"NERSC (Perlmutter)" = "user/nersc.md"}, {"Troubleshooting" = "user/troubleshooting.md"}, {"Glossary" = "user/glossary.md"}, ]}, From b66bb9042655a903044448a3e763a96164930018 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 14:09:31 -0700 Subject: [PATCH 02/10] docs(nersc): align with install.md and cluster.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The NERSC page was duplicating content already covered by the generic install.md (Python env, lc install, Claude Code, lc setup) and cluster.md (lc build, podman-hpc, salloc + lc run pattern). Refactor into a tight site-specific overlay that links out to those pages and documents only what's actually different on Perlmutter: 1. module load conda (NERSC distributes Python via modules) 2. 40 GB home-quota workaround: env on \$SCRATCH + symlink 3. DVS doesn't honor flock() — Snakemake state must live on \$SCRATCH 4. salloc invocation needs -A -q 5. podman-hpc just works (no NERSC-specific config beyond cluster.md) Result: 184 lines deleted, 28 added. The page now reads as a companion to install.md and cluster.md rather than a parallel universe of them. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 212 ++++++--------------------------------------- 1 file changed, 28 insertions(+), 184 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index a88279cd..91377f91 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -1,33 +1,17 @@ -# Installing and Using lightcone-cli on NERSC +# NERSC (Perlmutter) -A practical guide for running [`lightcone-cli`](https://github.com/LightconeResearch/lightcone-cli) on Perlmutter. The CLI works the same as anywhere else, but the filesystem layout, container runtime, and SLURM submission have NERSC-specific quirks that are worth knowing about up front. +Site-specific overlays for running lightcone-cli on NERSC Perlmutter. The generic [Install](install.md), [Getting Started](getting-started.md), and [Running on a Cluster](cluster.md) pages cover the main flow — this page documents only what's different on NERSC. ---- +If you're new, the recommended order is: -## 0. Install Claude Code +1. [Install](install.md) — `lc` and Claude Code (skip the Python step; see below) +2. [Getting Started](getting-started.md) — `lc init my-analysis` and the agent workflow +3. This page — Perlmutter-specific overlays +4. [Running on a Cluster](cluster.md) — `lc build`, allocations, `lc run` inside SLURM -`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses a coding agent (e.g. Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. For now the project is built around **Claude Code**, which can be installed via: +## 1. 
Python environment -```bash -curl -fsSL claude.ai/install.sh | bash # installs to ~/.local/bin/claude -``` - -Add `~/.local/bin` to your `PATH` if it isn't already, then verify and authenticate: - -```bash -claude --version -claude # first run prompts for login (claude.ai or API key) -``` - -Other install routes (npm, native package managers) are documented in the [Claude Code installation docs](https://docs.claude.com/en/docs/claude-code/setup). - ---- - -## 1. Pick a Python environment - -Next, set up a Python environment for `lightcone-cli` (Python 3.11+ required). There are two practical options on Perlmutter: - -### Option A — conda env (recommended) +[Install](install.md#1-python-311) assumes Python is already on your `PATH`. On Perlmutter, Python comes via modules: ```bash module load conda # NERSC's miniconda @@ -35,195 +19,55 @@ conda create -n your-env-name python=3.11 -y conda activate your-env-name ``` -Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent across sessions; just `conda activate your-env-name` next time. +Then continue with [Install §2](install.md#2-lightcone-cli) (`pip install lightcone-cli`). -> The home disk quota on NERSC is capped at 40 GB, so for larger envs it's worth moving the env to `$SCRATCH` and pointing the original location at it via a symlink: +> **Home quota is 40 GB on Perlmutter.** For larger envs, move the env to `$SCRATCH` and symlink the original location: > > ```bash -> # Move the env once it's created, then symlink the original location > conda deactivate > mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/ > ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name > ``` > -> Caveats: `$SCRATCH` is purged on a 12-week rolling window — the env will silently disappear. If you go this route, set up a periodic `touch` job or use `/global/cfs/cdirs//conda-envs/` instead. 
-> -> See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy and [the `ln(1)` man page](https://man7.org/linux/man-pages/man1/ln.1.html) for the symlink syntax. - -### Option B — venv inside an existing conda env - -If you already have a project conda env (e.g. `lightcone`) and just want `lc` available alongside it without polluting the conda env: - -```bash -module load conda -conda activate lightcone -python -m venv ~/.lightcone/.venv # or wherever you prefer -source ~/.lightcone/.venv/bin/activate -``` +> `$SCRATCH` is purged on a 12-week rolling window — for a more permanent location, use `/global/cfs/cdirs//`. See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy. -**Pitfall:** if `lc` ends up installed in more than one env (e.g. both the conda env and a venv), the wrong one can shadow the other on `PATH`. After install, always run `which lc` to confirm you're getting the binary you expect. +## 2. Snakemake state must live on `$SCRATCH` ---- +This is the one Perlmutter gotcha that breaks lightcone-cli silently: -## 2. Install lightcone-cli +`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. -With the environment ready, install the package itself. - -### From PyPI (recommended) - -`lightcone-cli` and its companion package `astra-tools` are both published to PyPI, so a single command does it: +`lc` redirects state automatically when it detects Perlmutter, so this usually just works. 
To pin it explicitly per project, either pass `--scratch` at init time: ```bash -pip install lightcone-cli astra-tools +lc init your-analysis --scratch '$SCRATCH' # expanded at run time, kept verbatim in config ``` -### From source - -You're also welcome to install from source — useful if you want to follow the latest commits or contribute back to the repo. Note the GitHub repo for `astra-tools` is named `ASTRA`: - -```bash -cd ~/.lightcone # or wherever you keep clones - -git clone https://github.com/LightconeResearch/lightcone-cli.git -pip install -e ./lightcone-cli # editable install, follows local edits - -git clone https://github.com/LightconeResearch/ASTRA.git -pip install -e ./ASTRA # same for astra-tools -``` - -For development work, add the dev extras: - -```bash -pip install -e "./lightcone-cli[dev]" # adds pytest, ruff, mypy -``` - -### Verify - -```bash -which lc # should be inside your active env's bin/ -lc --version -lc --help -``` - ---- - -## 3. Initialize a new project - -Now you're ready to start working with it: - -```bash -lc init your-analysis # scaffolds a new folder with everything lightcone needs -cd your-analysis -claude # launch Claude Code inside the project -``` - ---- - -## 4. Start your research with lightcone! - -Once Claude Code is open, you can use the lightcone skillset to start a fresh analysis or migrate one from existing code — all driven by natural-language prompts to the agent. - -For example, to start from scratch: +…or after the fact, edit `/.lightcone/lightcone.yaml`: -```text -/lc-new Please sample a standard Gaussian distribution using numpy. -``` - -Or to migrate from existing code in another directory: - -```text -/lc-migrate I have code that samples a standard Gaussian distribution using numpy at @../gaussian_sampling. Please create an analysis based on it. +```yaml +scratch_root: $SCRATCH ``` -After initialization, just keep talking to the agent in plain English about what you want to build next. 
Note that your job will all run on **login node**, see the next section on how to run jobs on computing node. - ---- +## 3. Allocations -## 5. Running on compute nodes - -Everything up to this point ran on a Perlmutter **login node** — fine for installation, scaffolding, and `lc status`, but anything heavy belongs on a compute node. Login nodes are shared and should not be abused. - -The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a Slurm allocation: +Follow [Running on a Cluster](cluster.md) for the general pattern. The Perlmutter-specific bit is the allocation invocation — Perlmutter requires `-A ` and a QoS: ```bash salloc -A -q interactive -C gpu --nodes=1 -t 00:30:00 # allocation drops you onto a compute node; from there: cd /path/to/your-analysis -claude +claude # or: lc run, if running directly without the agent ``` -Now anything the agent decides to run (`lc run`, scripts, etc.) executes on the allocated node, not the login node. - -The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, other QoS queues will be supported in the future. - -> Unattended batch submission (`sbatch`-style runs of `lc`) is not yet supported — for now, every analysis runs interactively under an allocation that's open while you work. - +The `interactive` QoS is appropriate for development. For longer or larger sessions, see [NERSC's queue policy](https://docs.nersc.gov/jobs/policy/) for the full table. +## 4. Container runtime +Compute nodes ship `podman-hpc`. The `lc build` step from [cluster.md → Pre-flight](cluster.md#pre-flight-pick-the-right-container-runtime) just works — no NERSC-specific config needed beyond what that page describes. 
-### Storage gotcha: Snakemake state must live on `$SCRATCH` - -`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which silently ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. - -`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly per project: - -```bash -lc init your-analysis --scratch '$SCRATCH' # expands at run time, kept verbatim in config -``` - -Or after the fact, add to `/.lightcone/lightcone.yaml`: - -```yaml -scratch_root: $SCRATCH -``` - -`$SCRATCH` is purged on a 12-week rolling window, so for outputs you want to keep, copy or symlink to `/global/cfs/cdirs//`. - -### Further reading +## Further reading - [NERSC interactive jobs](https://docs.nersc.gov/jobs/interactive/) — `salloc` patterns and reservation queues - [Perlmutter system overview](https://docs.nersc.gov/systems/perlmutter/) — node types and partitions - ---- - -## 6. 
Common troubleshooting - -| Symptom | Cause | Fix | -|---|---|---| -| `lc: command not found` | Wrong env active | `which lc`; reinstall in the active env | -| `lc` runs but uses unexpected code | Two installs across two envs shadowing each other on `PATH` | `which lc` and uninstall the stale one | -| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script | -| Snakemake locking errors / silent rule rerun loops | `.snakemake/` ended up on DVS-mounted storage | Set `scratch_root: $SCRATCH` in the project's `.lightcone/lightcone.yaml` | -| `ImportError: cannot import name 'resolve_analysis_tree' from 'astra.helpers'` | Stale `astra-tools` (pre-0.2.5) | `pip install -U astra-tools` | -| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests you need into your own scratch | -| `pip install` hangs or times out on a compute node | Compute nodes have no public internet | Always install from a login node | - ---- - -## 7. Updating - -For source installs: - -```bash -cd ~/.lightcone/lightcone-cli -git pull -pip install -e . # only needed if pyproject.toml changed -``` - -Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in `lc`. Re-run `pip install -e .` only when `pyproject.toml` adds a new dependency or changes the `[project.scripts]` table. - -For PyPI installs: - -```bash -pip install -U lightcone-cli astra-tools -``` - ---- - -## 8. Uninstalling - -```bash -pip uninstall lightcone-cli # remove from the active env -rm -rf ~/.lightcone/lightcone-cli # remove source clone (only for source installs) -# Keep ~/.lightcone/config.yaml and ~/.lightcone/targets/ unless you want to start fresh. 
-``` +- [Best practices for running jobs](https://docs.nersc.gov/jobs/best-practices/) — when to pick which QoS, GPU vs CPU sizing From 3b237ef105aa076f919f7ec113b153ec40985477 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 14:12:51 -0700 Subject: [PATCH 03/10] Revert "docs(nersc): align with install.md and cluster.md" This reverts commit b66bb9042655a903044448a3e763a96164930018. --- docs/user/nersc.md | 212 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 184 insertions(+), 28 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index 91377f91..a88279cd 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -1,17 +1,33 @@ -# NERSC (Perlmutter) +# Installing and Using lightcone-cli on NERSC -Site-specific overlays for running lightcone-cli on NERSC Perlmutter. The generic [Install](install.md), [Getting Started](getting-started.md), and [Running on a Cluster](cluster.md) pages cover the main flow — this page documents only what's different on NERSC. +A practical guide for running [`lightcone-cli`](https://github.com/LightconeResearch/lightcone-cli) on Perlmutter. The CLI works the same as anywhere else, but the filesystem layout, container runtime, and SLURM submission have NERSC-specific quirks that are worth knowing about up front. -If you're new, the recommended order is: +--- -1. [Install](install.md) — `lc` and Claude Code (skip the Python step; see below) -2. [Getting Started](getting-started.md) — `lc init my-analysis` and the agent workflow -3. This page — Perlmutter-specific overlays -4. [Running on a Cluster](cluster.md) — `lc build`, allocations, `lc run` inside SLURM +## 0. Install Claude Code -## 1. Python environment +`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses a coding agent (e.g. Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. 
For now the project is built around **Claude Code**, which can be installed via: -[Install](install.md#1-python-311) assumes Python is already on your `PATH`. On Perlmutter, Python comes via modules: +```bash +curl -fsSL claude.ai/install.sh | bash # installs to ~/.local/bin/claude +``` + +Add `~/.local/bin` to your `PATH` if it isn't already, then verify and authenticate: + +```bash +claude --version +claude # first run prompts for login (claude.ai or API key) +``` + +Other install routes (npm, native package managers) are documented in the [Claude Code installation docs](https://docs.claude.com/en/docs/claude-code/setup). + +--- + +## 1. Pick a Python environment + +Next, set up a Python environment for `lightcone-cli` (Python 3.11+ required). There are two practical options on Perlmutter: + +### Option A — conda env (recommended) ```bash module load conda # NERSC's miniconda @@ -19,55 +35,195 @@ conda create -n your-env-name python=3.11 -y conda activate your-env-name ``` -Then continue with [Install §2](install.md#2-lightcone-cli) (`pip install lightcone-cli`). +Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent across sessions; just `conda activate your-env-name` next time. -> **Home quota is 40 GB on Perlmutter.** For larger envs, move the env to `$SCRATCH` and symlink the original location: +> The home disk quota on NERSC is capped at 40 GB, so for larger envs it's worth moving the env to `$SCRATCH` and pointing the original location at it via a symlink: > > ```bash +> # Move the env once it's created, then symlink the original location > conda deactivate > mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/ > ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name > ``` > -> `$SCRATCH` is purged on a 12-week rolling window — for a more permanent location, use `/global/cfs/cdirs//`. See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy. 
+> Caveats: `$SCRATCH` is purged on a 12-week rolling window — the env will silently disappear. If you go this route, set up a periodic `touch` job or use `/global/cfs/cdirs//conda-envs/` instead. +> +> See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy and [the `ln(1)` man page](https://man7.org/linux/man-pages/man1/ln.1.html) for the symlink syntax. + +### Option B — venv inside an existing conda env + +If you already have a project conda env (e.g. `lightcone`) and just want `lc` available alongside it without polluting the conda env: + +```bash +module load conda +conda activate lightcone +python -m venv ~/.lightcone/.venv # or wherever you prefer +source ~/.lightcone/.venv/bin/activate +``` -## 2. Snakemake state must live on `$SCRATCH` +**Pitfall:** if `lc` ends up installed in more than one env (e.g. both the conda env and a venv), the wrong one can shadow the other on `PATH`. After install, always run `which lc` to confirm you're getting the binary you expect. -This is the one Perlmutter gotcha that breaks lightcone-cli silently: +--- -`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. +## 2. Install lightcone-cli -`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin it explicitly per project, either pass `--scratch` at init time: +With the environment ready, install the package itself. 
+ +### From PyPI (recommended) + +`lightcone-cli` and its companion package `astra-tools` are both published to PyPI, so a single command does it: ```bash -lc init your-analysis --scratch '$SCRATCH' # expanded at run time, kept verbatim in config +pip install lightcone-cli astra-tools ``` -…or after the fact, edit `/.lightcone/lightcone.yaml`: +### From source -```yaml -scratch_root: $SCRATCH +You're also welcome to install from source — useful if you want to follow the latest commits or contribute back to the repo. Note the GitHub repo for `astra-tools` is named `ASTRA`: + +```bash +cd ~/.lightcone # or wherever you keep clones + +git clone https://github.com/LightconeResearch/lightcone-cli.git +pip install -e ./lightcone-cli # editable install, follows local edits + +git clone https://github.com/LightconeResearch/ASTRA.git +pip install -e ./ASTRA # same for astra-tools ``` -## 3. Allocations +For development work, add the dev extras: + +```bash +pip install -e "./lightcone-cli[dev]" # adds pytest, ruff, mypy +``` -Follow [Running on a Cluster](cluster.md) for the general pattern. The Perlmutter-specific bit is the allocation invocation — Perlmutter requires `-A ` and a QoS: +### Verify + +```bash +which lc # should be inside your active env's bin/ +lc --version +lc --help +``` + +--- + +## 3. Initialize a new project + +Now you're ready to start working with it: + +```bash +lc init your-analysis # scaffolds a new folder with everything lightcone needs +cd your-analysis +claude # launch Claude Code inside the project +``` + +--- + +## 4. Start your research with lightcone! + +Once Claude Code is open, you can use the lightcone skillset to start a fresh analysis or migrate one from existing code — all driven by natural-language prompts to the agent. + +For example, to start from scratch: + +```text +/lc-new Please sample a standard Gaussian distribution using numpy. 
+``` + +Or to migrate from existing code in another directory: + +```text +/lc-migrate I have code that samples a standard Gaussian distribution using numpy at @../gaussian_sampling. Please create an analysis based on it. +``` + +After initialization, just keep talking to the agent in plain English about what you want to build next. Note that all of your jobs will run on a **login node** at this point; see the next section for how to run them on compute nodes. + +--- + +## 5. Running on compute nodes + +Everything up to this point ran on a Perlmutter **login node** — fine for installation, scaffolding, and `lc status`, but anything heavy belongs on a compute node. Login nodes are shared and should not be abused. + +The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a Slurm allocation: ```bash salloc -A -q interactive -C gpu --nodes=1 -t 00:30:00 # allocation drops you onto a compute node; from there: cd /path/to/your-analysis -claude # or: lc run, if running directly without the agent +claude ``` -The `interactive` QoS is appropriate for development. For longer or larger sessions, see [NERSC's queue policy](https://docs.nersc.gov/jobs/policy/) for the full table. +Now anything the agent decides to run (`lc run`, scripts, etc.) executes on the allocated node, not the login node. + +The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, other QoS queues will be supported in the future. + +> Unattended batch submission (`sbatch`-style runs of `lc`) is not yet supported — for now, every analysis runs interactively under an allocation that's open while you work. + -## 4. Container runtime -Compute nodes ship `podman-hpc`.
The `lc build` step from [cluster.md → Pre-flight](cluster.md#pre-flight-pick-the-right-container-runtime) just works — no NERSC-specific config needed beyond what that page describes. -## Further reading +### Storage gotcha: Snakemake state must live on `$SCRATCH` + +`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which silently ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. + +`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly per project: + +```bash +lc init your-analysis --scratch '$SCRATCH' # expands at run time, kept verbatim in config +``` + +Or after the fact, add to `/.lightcone/lightcone.yaml`: + +```yaml +scratch_root: $SCRATCH +``` + +`$SCRATCH` is purged on a 12-week rolling window, so for outputs you want to keep, copy or symlink to `/global/cfs/cdirs//`. + +### Further reading - [NERSC interactive jobs](https://docs.nersc.gov/jobs/interactive/) — `salloc` patterns and reservation queues - [Perlmutter system overview](https://docs.nersc.gov/systems/perlmutter/) — node types and partitions -- [Best practices for running jobs](https://docs.nersc.gov/jobs/best-practices/) — when to pick which QoS, GPU vs CPU sizing + +--- + +## 6. 
Common troubleshooting + +| Symptom | Cause | Fix | +|---|---|---| +| `lc: command not found` | Wrong env active | `which lc`; reinstall in the active env | +| `lc` runs but uses unexpected code | Two installs across two envs shadowing each other on `PATH` | `which lc` and uninstall the stale one | +| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script | +| Snakemake locking errors / silent rule rerun loops | `.snakemake/` ended up on DVS-mounted storage | Set `scratch_root: $SCRATCH` in the project's `.lightcone/lightcone.yaml` | +| `ImportError: cannot import name 'resolve_analysis_tree' from 'astra.helpers'` | Stale `astra-tools` (pre-0.2.5) | `pip install -U astra-tools` | +| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests you need into your own scratch | +| `pip install` hangs or times out on a compute node | Compute nodes have no public internet | Always install from a login node | + +--- + +## 7. Updating + +For source installs: + +```bash +cd ~/.lightcone/lightcone-cli +git pull +pip install -e . # only needed if pyproject.toml changed +``` + +Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in `lc`. Re-run `pip install -e .` only when `pyproject.toml` adds a new dependency or changes the `[project.scripts]` table. + +For PyPI installs: + +```bash +pip install -U lightcone-cli astra-tools +``` + +--- + +## 8. Uninstalling + +```bash +pip uninstall lightcone-cli # remove from the active env +rm -rf ~/.lightcone/lightcone-cli # remove source clone (only for source installs) +# Keep ~/.lightcone/config.yaml and ~/.lightcone/targets/ unless you want to start fresh. 
+``` From b14b7b44cc3741ec5abe6656efe344873dfae8e1 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 14:14:51 -0700 Subject: [PATCH 04/10] docs(nersc): align command details with install.md and cluster.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Keep the full step-by-step recipe but bring its commands into line with how the generic pages describe them: - §0: add the missing https:// prefix on the Claude Code curl one-liner (matches install.md §4) - §2: drop the redundant astra-tools from the pip install command — it's a transitive dep, install.md §2 just says "pip install lightcone-cli". Mention uv as the modern alternative. - §2: add the lc setup step (matches install.md §3) so the global config is created. - §2: clarify that source install is a contributor route (most users should stick with PyPI); the ASTRA clone is only needed if you also want to hack on astra-tools itself. - §5: add the missing pre-flight (lc build) step before running on compute nodes, with a link to cluster.md's mechanics. - §5: reconcile the batch-not-supported caveat with cluster.md (which has a working sbatch template). The truth is: agent-driven runs are interactive-only (Claude Code can't run from a non- interactive sbatch), but raw lc run from a sbatch script works fine — that's what cluster.md describes. Document both paths. Net: 62 insertions, 9 deletions. Complete recipe preserved; only divergent details corrected. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 71 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 62 insertions(+), 9 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index a88279cd..7ebe8b05 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -9,7 +9,7 @@ A practical guide for running [`lightcone-cli`](https://github.com/LightconeRese `lightcone-cli` is the execution layer of the `lightcone` project — it harnesses a coding agent (e.g. 
Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. For now the project is built around **Claude Code**, which can be installed via: ```bash -curl -fsSL claude.ai/install.sh | bash # installs to ~/.local/bin/claude +curl -fsSL https://claude.ai/install.sh | bash # installs to ~/.local/bin/claude ``` Add `~/.local/bin` to your `PATH` if it isn't already, then verify and authenticate: @@ -71,24 +71,34 @@ With the environment ready, install the package itself. ### From PyPI (recommended) -`lightcone-cli` and its companion package `astra-tools` are both published to PyPI, so a single command does it: +```bash +pip install lightcone-cli +``` + +If you use [`uv`](https://docs.astral.sh/uv/) (faster, no daemon): ```bash -pip install lightcone-cli astra-tools +uv pip install lightcone-cli ``` -### From source +`astra-tools` is a transitive dependency, so a single `pip install lightcone-cli` pulls it in automatically. + +### From source (contributor route) -You're also welcome to install from source — useful if you want to follow the latest commits or contribute back to the repo. Note the GitHub repo for `astra-tools` is named `ASTRA`: +If you want to track the latest commits or contribute back, clone the repo and install editably. This is **optional** — most users should stick with PyPI. 
```bash cd ~/.lightcone # or wherever you keep clones git clone https://github.com/LightconeResearch/lightcone-cli.git pip install -e ./lightcone-cli # editable install, follows local edits +``` + +If you also want to hack on `astra-tools` itself, clone the `ASTRA` repo (the package is published to PyPI as `astra-tools` but the GitHub repo is named `ASTRA`): +```bash git clone https://github.com/LightconeResearch/ASTRA.git -pip install -e ./ASTRA # same for astra-tools +pip install -e ./ASTRA ``` For development work, add the dev extras: @@ -97,6 +107,14 @@ For development work, add the dev extras: pip install -e "./lightcone-cli[dev]" # adds pytest, ruff, mypy ``` +### One-time setup + +```bash +lc setup +``` + +This creates `~/.lightcone/config.yaml` with a default container runtime of `auto`. You can pin the runtime later (see [§5](#5-running-on-compute-nodes) — Perlmutter compute nodes need `podman-hpc`). + ### Verify ```bash @@ -143,7 +161,28 @@ After initialization, just keep talking to the agent in plain English about what Everything up to this point ran on a Perlmutter **login node** — fine for installation, scaffolding, and `lc status`, but anything heavy belongs on a compute node. Login nodes are shared and should not be abused. -The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a Slurm allocation: +### Pre-flight: pin the container runtime and build images + +On Perlmutter, compute nodes ship `podman-hpc`. 
Pin it once in your global config: + +```yaml +# ~/.lightcone/config.yaml +container: + runtime: podman-hpc +``` + +Then build and migrate the images for your project on a login node (`lc build` runs `podman-hpc build` then `podman-hpc migrate`, which copies the image into the per-node container cache): + +```bash +cd /path/to/your-analysis +lc build +``` + +See [Running on a Cluster → Pre-flight](cluster.md#pre-flight-pick-the-right-container-runtime) for the underlying mechanics. + +### Interactive runs (agent-driven) + +The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a SLURM allocation: ```bash salloc -A -q interactive -C gpu --nodes=1 -t 00:30:00 @@ -154,12 +193,26 @@ claude Now anything the agent decides to run (`lc run`, scripts, etc.) executes on the allocated node, not the login node. -The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, other QoS queues will be supported in the future. +The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, see [NERSC's queue policy reference](https://docs.nersc.gov/jobs/policy/). -> Unattended batch submission (`sbatch`-style runs of `lc`) is not yet supported — for now, every analysis runs interactively under an allocation that's open while you work. +### Unattended batch runs (no agent in the loop) +If you want to submit `lc run` as an unattended batch job — i.e., without Claude Code in the loop — that path also works. 
See [Running on a Cluster → A typical SLURM workflow](cluster.md#a-typical-slurm-workflow) for the generic `sbatch` template; on Perlmutter, the only addition is the `-A`/`-q` directives: +```bash +#!/bin/bash +#SBATCH -A +#SBATCH -q regular +#SBATCH -C gpu +#SBATCH -N 4 +#SBATCH -t 04:00:00 + +cd $SCRATCH/your-analysis +module load python && conda activate your-env-name # or activate your venv +lc run -j 16 +``` +> Note: this path runs `lc run` directly, not through the agent — useful for production sweeps where you've already nailed down the recipes interactively. The agent-driven flow above is the right tool for development. ### Storage gotcha: Snakemake state must live on `$SCRATCH` From 82bc2a6c17f3a108b733e2542227ea1ed956784f Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 14:55:21 -0700 Subject: [PATCH 05/10] docs(nersc): use `module load python` as the default MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NERSC's canonical entry point is `module load python` (the Python distribution that ships with conda); `module load conda` loads Miniconda directly. Both work — switch the recommended command to `module load python` to match the rest of NERSC's user-facing docs, and keep a one-line note that `module load conda` is equivalent. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index 7ebe8b05..c659883e 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -30,11 +30,13 @@ Next, set up a Python environment for `lightcone-cli` (Python 3.11+ required). T ### Option A — conda env (recommended) ```bash -module load conda # NERSC's miniconda +module load python # NERSC's Python distribution; ships conda conda create -n your-env-name python=3.11 -y conda activate your-env-name ``` +(`module load conda` works too — it loads Miniconda directly.
Either gives you a working `conda` on `PATH`; `module load python` is the canonical NERSC default.) + Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent across sessions; just `conda activate your-env-name` next time. > The home disk quota on NERSC is capped at 40 GB, so for larger envs it's worth moving the env to `$SCRATCH` and pointing the original location at it via a symlink: @@ -55,7 +57,7 @@ Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent If you already have a project conda env (e.g. `lightcone`) and just want `lc` available alongside it without polluting the conda env: ```bash -module load conda +module load python conda activate lightcone python -m venv ~/.lightcone/.venv # or wherever you prefer source ~/.lightcone/.venv/bin/activate From 9df8917f5519a9f70c72ad752a21779c0d916bb5 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 15:00:41 -0700 Subject: [PATCH 06/10] docs(nersc): `module load python` already gives a usable env MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per docs.nersc.gov/development/languages/python/nersc-python/: `module load python` provides a ready-to-use Python distribution with conda, pip, and many scientific packages pre-installed. Users who just want to install lightcone-cli on top don't need to `conda create` first — `pip install --user lightcone-cli` works straight against the module. Restructure §1 to lead with the simpler path: - Default: `module load python` + `pip install --user lightcone-cli` - Fallback (when isolation or a different Python version is needed): conda env on top, which is also NERSC's recommended path when pip-installing custom packages Update §2 to match — two install paths corresponding to the two §1 choices. The dropped "Option B venv inside conda" was redundant once the default no longer requires a conda env.
The 40 GB home-quota note remains, plus a pointer to NERSC's recommended `/global/common/software//` for larger envs. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 53 ++++++++++++++++++++++------------------------ 1 file changed, 23 insertions(+), 30 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index c659883e..98764f8e 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -25,64 +25,57 @@ Other install routes (npm, native package managers) are documented in the [Claud ## 1. Pick a Python environment -Next, set up a Python environment for `lightcone-cli` (Python 3.11+ required). There are two practical options on Perlmutter: - -### Option A — conda env (recommended) +NERSC's `python` module gives you a ready-to-use Python distribution with `conda`, `pip`, and many common scientific packages already installed — no env creation needed for the basics: ```bash -module load python # NERSC's Python distribution; ships conda -conda create -n your-env-name python=3.11 -y -conda activate your-env-name +module load python # NERSC Python (3.11+); brings conda and pip onto PATH ``` -(`module load conda` works too — it loads Miniconda directly. Either gives you a working `conda` on `PATH`; `module load python` is the canonical NERSC default.) +That's enough for installing `lightcone-cli` straight into your user site-packages with `pip install --user`, which is the simplest path. See [§2](#2-install-lightcone-cli). -Conda envs land under `~/.conda/envs/` (your home, not CFS). They're persistent across sessions; just `conda activate your-env-name` next time. +> **When to create your own conda env.** The NERSC python module is shared and read-only — you can install user-level packages on top of it (`pip install --user`), but you can't pin a different Python version or guarantee dependency isolation.
If you want either, create a conda env on top: +> +> ```bash +> module load python +> conda create -n your-env-name python=3.11 -y +> conda activate your-env-name +> ``` +> +> This is also NERSC's [recommended path for `pip install`](https://docs.nersc.gov/development/languages/python/nersc-python/) when you need custom packages: pip-into-conda-env rather than pip-into-base. -> The home disk quota on NERSC is capped at 40 GB, so for larger envs it's worth moving the env to `$SCRATCH` and pointing the original location at it via a symlink: +> **Storage note.** Conda envs land under `~/.conda/envs/`. The Perlmutter home quota is 40 GB; for larger envs NERSC recommends installing to `/global/common/software//` instead. If you really want them on `$SCRATCH` (12-week purge!), move and symlink: > > ```bash -> # Move the env once it's created, then symlink the original location > conda deactivate > mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/ > ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name > ``` > -> Caveats: `$SCRATCH` is purged on a 12-week rolling window — the env will silently disappear. If you go this route, set up a periodic `touch` job or use `/global/cfs/cdirs//conda-envs/` instead. -> > See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy and [the `ln(1)` man page](https://man7.org/linux/man-pages/man1/ln.1.html) for the symlink syntax. -### Option B — venv inside an existing conda env - -If you already have a project conda env (e.g. `lightcone`) and just want `lc` available alongside it without polluting the conda env: - -```bash -module load python -conda activate lightcone -python -m venv ~/.lightcone/.venv # or wherever you prefer -source ~/.lightcone/.venv/bin/activate -``` - -**Pitfall:** if `lc` ends up installed in more than one env (e.g. both the conda env and a venv), the wrong one can shadow the other on `PATH`. 
After install, always run `which lc` to confirm you're getting the binary you expect. - --- ## 2. Install lightcone-cli -With the environment ready, install the package itself. +With the environment ready, install the package itself. Pick the path that matches your §1 setup: -### From PyPI (recommended) +### Into NERSC's python module (no conda env) ```bash -pip install lightcone-cli +pip install --user lightcone-cli ``` -If you use [`uv`](https://docs.astral.sh/uv/) (faster, no daemon): +`--user` puts it under `~/.local/`. Make sure `~/.local/bin` is on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). + +### Into a conda env ```bash -uv pip install lightcone-cli +conda activate your-env-name +pip install lightcone-cli ``` +If you use [`uv`](https://docs.astral.sh/uv/) (faster, no daemon), `uv pip install lightcone-cli` works in either flow. + `astra-tools` is a transitive dependency, so a single `pip install lightcone-cli` pulls it in automatically. ### From source (contributor route) From 1dbd6f00a90541f94291ba62091c54bd48ba431c Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 15:21:34 -0700 Subject: [PATCH 07/10] docs(nersc): align with the improved install.md from alex/improve-install MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mirror the conventions from the new install.md: - Rename §0 "Install Claude Code" → "Agentic CLI" (matches install.md §4's broader framing — Claude Code is the current agent, not the only one we'll ever support). - Rename §1 "Pick a Python environment" → just "Python", matching install.md §1. - Add a uv "Recommendation" tip admonition in §1, mirroring the install.md style. uv install one-liner included. 
- §2 install commands switched to tabbed `=== "uv"` / `=== "pip"` syntax matching install.md, and use the modern command spellings: - `uv tool install lightcone-cli` (not `uv pip install`) for the no-env path — uv tool creates an isolated venv with a ~/.local/bin wrapper, which is exactly right for a shared NERSC python module - `python -m pip install --user lightcone-cli` (not bare `pip`) for the pip path, matching the new install.md - `uv pip install` / `python -m pip install` inside a conda env The site-specific structure (NERSC module + conda overlay + DVS/scratch gotcha + interactive/batch SLURM walkthrough) is unchanged — only the command details and section names migrated to match install.md's conventions. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 49 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 36 insertions(+), 13 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index 98764f8e..8c6f9588 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -4,9 +4,9 @@ A practical guide for running [`lightcone-cli`](https://github.com/LightconeRese --- -## 0. Install Claude Code +## 0. Agentic CLI -`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses a coding agent (e.g. Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. For now the project is built around **Claude Code**, which can be installed via: +`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses an agent-based CLI (currently Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent.
```bash curl -fsSL https://claude.ai/install.sh | bash # installs to ~/.local/bin/claude @@ -23,7 +23,7 @@ Other install routes (npm, native package managers) are documented in the [Claud --- -## 1. Pick a Python environment +## 1. Python NERSC's `python` module gives you a ready-to-use Python distribution with `conda`, `pip`, and many common scientific packages already installed — no env creation needed for the basics: @@ -31,9 +31,17 @@ NERSC's `python` module gives you a ready-to-use Python distribution with `conda module load python # NERSC Python (3.11+); brings conda and pip onto PATH ``` -That's enough for installing `lightcone-cli` straight into your user site-packages with `pip install --user`, which is the simplest path. See [§2](#2-install-lightcone-cli). +That's enough for installing `lightcone-cli` on top. See [§2](#2-install-lightcone-cli). -> **When to create your own conda env.** The NERSC python module is shared and read-only — you can install user-level packages on top of it (`pip install --user`), but you can't pin a different Python version or guarantee dependency isolation. If you want either, create a conda env on top: +!!! tip "Recommendation" + Like the generic [Install](install.md#1-python) page, we recommend [`uv`](https://docs.astral.sh/uv/) for managing Python installations and virtual environments — it's faster than pip and gives you a Python independent of the loaded module. + + ```bash + curl -LsSf https://astral.sh/uv/install.sh | sh + uv python install 3.12 + ``` + +> **When to create your own conda env.** The NERSC python module is shared and read-only — you can install user-level packages on top of it, but you can't pin a different Python version or guarantee dependency isolation. If you want either, create a conda env on top: > > ```bash > module load python @@ -57,26 +65,41 @@ That's enough for installing `lightcone-cli` straight into your user site-packag ## 2. 
Install lightcone-cli -With the environment ready, install the package itself. Pick the path that matches your §1 setup: +With the environment ready, install the package itself. ### Into NERSC's python module (no conda env) -```bash -pip install --user lightcone-cli -``` +`uv tool install` is the recommended path — it isolates `lc` in its own venv under `~/.local/share/uv/tools/` with a wrapper at `~/.local/bin/lc`, so the shared NERSC python module stays untouched. -`--user` puts it under `~/.local/`. Make sure `~/.local/bin` is on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). +=== "uv" + ```bash + uv tool install lightcone-cli + ``` + +=== "pip" + ```bash + python -m pip install --user lightcone-cli + ``` + +Make sure `~/.local/bin` is on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). ### Into a conda env ```bash conda activate your-env-name -pip install lightcone-cli ``` -If you use [`uv`](https://docs.astral.sh/uv/) (faster, no daemon), `uv pip install lightcone-cli` works in either flow. +=== "uv" + ```bash + uv pip install lightcone-cli + ``` + +=== "pip" + ```bash + python -m pip install lightcone-cli + ``` -`astra-tools` is a transitive dependency, so a single `pip install lightcone-cli` pulls it in automatically. +`astra-tools` is a transitive dependency, so a single `lightcone-cli` install pulls it in automatically. ### From source (contributor route) From f2837f95f85758b1afe3ecd7604b96b37da888e5 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 15:26:33 -0700 Subject: [PATCH 08/10] docs(nersc): pip-first install, uv as optional alternative MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit NERSC's official documentation (docs.nersc.gov) doesn't mention `uv` at all — their Python guidance covers conda, mamba, pip, and Spack. 
Recommending `uv` as the default path for NERSC users implies it's part of the supported stack, which it isn't. - Drop the §1 "Recommendation" tip pushing `uv` for NERSC. - §2 leads with `python -m pip install --user lightcone-cli` against the stock `module load python`, which is what NERSC actually documents and supports. - `uv tool install` is mentioned afterward as a cleaner alternative for users who already have uv installed (with the install one-liner inline so it's not a dead reference), but it's not framed as "recommended" for NERSC. - Drop the `=== "uv"` / `=== "pip"` tabbed syntax — install.md doesn't actually use that style; it just runs `pip install` and then offers uv as a follow-on. Match that simpler shape. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 41 +++++++++++++++-------------------------- 1 file changed, 15 insertions(+), 26 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index 8c6f9588..560e3d9f 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -33,14 +33,6 @@ module load python # NERSC Python (3.11+); brings conda and pip onto PATH That's enough for installing `lightcone-cli` on top. See [§2](#2-install-lightcone-cli). -!!! tip "Recommendation" - Like the generic [Install](install.md#1-python) page, we recommend [`uv`](https://docs.astral.sh/uv/) for managing Python installations and virtual environments — it's faster than pip and gives you a Python independent of the loaded module. - - ```bash - curl -LsSf https://astral.sh/uv/install.sh | sh - uv python install 3.12 - ``` - > **When to create your own conda env.** The NERSC python module is shared and read-only — you can install user-level packages on top of it, but you can't pin a different Python version or guarantee dependency isolation. If you want either, create a conda env on top: > > ```bash @@ -69,35 +61,32 @@ With the environment ready, install the package itself. 
### Into NERSC's python module (no conda env) -`uv tool install` is the recommended path — it isolates `lc` in its own venv under `~/.local/share/uv/tools/` with a wrapper at `~/.local/bin/lc`, so the shared NERSC python module stays untouched. +The shared NERSC `python` module is read-only, so install with `--user` to land into your home dir's site-packages: -=== "uv" - ```bash - uv tool install lightcone-cli - ``` +```bash +python -m pip install --user lightcone-cli +``` -=== "pip" - ```bash - python -m pip install --user lightcone-cli - ``` +This drops the `lc` console script into `~/.local/bin/`. Make sure that's on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). -Make sure `~/.local/bin` is on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). +If you already use [`uv`](https://docs.astral.sh/uv/) (NERSC doesn't ship it, but you can install it yourself with `curl -LsSf https://astral.sh/uv/install.sh | sh`), `uv tool install` is a cleaner alternative — it isolates `lc` in its own venv and drops the same `~/.local/bin/lc` wrapper: + +```bash +uv tool install lightcone-cli +``` ### Into a conda env ```bash conda activate your-env-name +python -m pip install lightcone-cli ``` -=== "uv" - ```bash - uv pip install lightcone-cli - ``` +If you use `uv`: -=== "pip" - ```bash - python -m pip install lightcone-cli - ``` +```bash +uv pip install lightcone-cli +``` `astra-tools` is a transitive dependency, so a single `lightcone-cli` install pulls it in automatically. From 627982acfad29c8ff4af4ac5c4af5d560d2c0bf0 Mon Sep 17 00:00:00 2001 From: dkn16 Date: Thu, 7 May 2026 15:37:35 -0700 Subject: [PATCH 09/10] docs(nersc): beautify — admonitions, tabs, tighter section flow MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Visual + structural polish on the NERSC guide.
No content was added or removed; existing prose was rephrased for cadence and the formatting was migrated to the same admonition / tab vocabulary that index.md and install.md already use. - Title shortened to "lightcone-cli on NERSC (Perlmutter)" (matches the nav entry shape). - Intro: added a quick-orientation `!!! tip` pointing at install.md / cluster.md so readers know this page is the site-specific overlay. - §1: converted the two `>` blockquote callouts ("When to create your own conda env" / "Storage note") into proper `!!! note` and `!!! warning` admonitions. They were already callouts in spirit. - §2: renamed subsections to "Path A / B / C" — same ordering, but the labeling makes the branching obvious. Inlined the conda-env uv alternative as a comment so it doesn't need its own subsection. - §4: switched the two example-prompt code blocks to `=== "Start fresh"` / `=== "Migrate existing code"` tabs (matches the "Quick start" tabs in user/index.md), and converted the trailing "you're on a login node" reminder into a `!!! warning` — it's the most common foot-gun in this section, easy to miss in plain prose. - §5: same treatment — inline reminders ("Picking a QoS", "When to use this path") promoted to `!!! note`; the DVS-flock gotcha is now a `!!! danger`, since silent corruption is the right severity level; the 12-week purge is a `!!! warning`. - §7: switched updating section to `=== "PyPI install"` / `=== "Source install"` tabs. Most users only care about one. - §8: kept config-survives-uninstall as a `!!! note` instead of a trailing comment line. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user/nersc.md | 211 ++++++++++++++++++++++++--------------------- 1 file changed, 111 insertions(+), 100 deletions(-) diff --git a/docs/user/nersc.md b/docs/user/nersc.md index 560e3d9f..3f8a9955 100644 --- a/docs/user/nersc.md +++ b/docs/user/nersc.md @@ -1,22 +1,25 @@ -# Installing and Using lightcone-cli on NERSC +# lightcone-cli on NERSC (Perlmutter) -A practical guide for running [`lightcone-cli`](https://github.com/LightconeResearch/lightcone-cli) on Perlmutter. The CLI works the same as anywhere else, but the filesystem layout, container runtime, and SLURM submission have NERSC-specific quirks that are worth knowing about up front. +A practical guide for running [`lightcone-cli`](https://github.com/LightconeResearch/lightcone-cli) on **Perlmutter**. The CLI itself behaves the same as on a laptop — the wrinkles are in the filesystem layout (DVS-mounted home, Lustre scratch), the container runtime (`podman-hpc`), and SLURM submission. This page covers all three. + +!!! tip "Already familiar with the basics?" + The generic [Install](install.md) and [Running on a Cluster](cluster.md) pages cover the cross-platform story. This page is the NERSC-specific overlay — read it first if Perlmutter is your home base. --- ## 0. Agentic CLI -`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses an agent-based CLI (currently Claude Code) to follow the `astra` standard while building and running an analysis. So the very first step, even before touching `lightcone-cli` itself, is to install the agent. +`lightcone-cli` is the execution layer of the `lightcone` project — it harnesses an agent-based CLI (currently [Claude Code](https://docs.claude.com/en/docs/claude-code/setup)) to follow the `astra` standard while building and running an analysis. 
So the very first step, even before touching `lightcone-cli` itself, is to install the agent: ```bash curl -fsSL https://claude.ai/install.sh | bash # installs to ~/.local/bin/claude ``` -Add `~/.local/bin` to your `PATH` if it isn't already, then verify and authenticate: +Make sure `~/.local/bin` is on your `PATH`, then verify and authenticate: ```bash claude --version -claude # first run prompts for login (claude.ai or API key) +claude # first run prompts for login (claude.ai or API key) ``` Other install routes (npm, native package managers) are documented in the [Claude Code installation docs](https://docs.claude.com/en/docs/claude-code/setup). @@ -31,101 +34,103 @@ NERSC's `python` module gives you a ready-to-use Python distribution with `conda module load python # NERSC Python (3.11+); brings conda and pip onto PATH ``` -That's enough for installing `lightcone-cli` on top. See [§2](#2-install-lightcone-cli). - -> **When to create your own conda env.** The NERSC python module is shared and read-only — you can install user-level packages on top of it, but you can't pin a different Python version or guarantee dependency isolation. If you want either, create a conda env on top: -> -> ```bash -> module load python -> conda create -n your-env-name python=3.11 -y -> conda activate your-env-name -> ``` -> -> This is also NERSC's [recommended path for `pip install`](https://docs.nersc.gov/development/languages/python/nersc-python/) when you need custom packages: pip-into-conda-env rather than pip-into-base. - -> **Storage note.** Conda envs land under `~/.conda/envs/`. The Perlmutter home quota is 40 GB; for larger envs NERSC recommends installing to `/global/common/software//` instead. 
If you really want them on `$SCRATCH` (12-week purge!), move and symlink:
->
-> ```bash
-> conda deactivate
-> mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/
-> ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name
-> ```
->
-> See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy and [the `ln(1)` man page](https://man7.org/linux/man-pages/man1/ln.1.html) for the symlink syntax.
+That's enough for installing `lightcone-cli` on top. Skip ahead to [§2](#2-install-lightcone-cli).
+
+!!! note "When you'd want your own conda env"
+    The NERSC python module is shared and read-only. You *can* layer user-level packages on top, but you can't pin a different Python version or guarantee dependency isolation. If you need either, build a conda env on top of the module:
+
+    ```bash
+    module load python
+    conda create -n your-env-name python=3.11 -y
+    conda activate your-env-name
+    ```
+
+    This is also NERSC's [recommended path for `pip install`](https://docs.nersc.gov/development/languages/python/nersc-python/) when you need custom packages: pip-into-conda-env rather than pip-into-base.
+
+!!! warning "Storage note: 40 GB home quota"
+    Conda envs land under `~/.conda/envs/` by default. The Perlmutter home quota is **40 GB**, which gets eaten quickly. NERSC recommends `/global/common/software//` for larger envs. If you really want them on `$SCRATCH` (note: 12-week purge!), move and symlink:
+
+    ```bash
+    conda deactivate
+    mkdir -p $SCRATCH/conda-envs && mv ~/.conda/envs/your-env-name $SCRATCH/conda-envs/
+    ln -s $SCRATCH/conda-envs/your-env-name ~/.conda/envs/your-env-name
+    ```
+
+    See [NERSC's Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) for the full storage strategy.

 ---

 ## 2. Install lightcone-cli

-With the environment ready, install the package itself.
+With Python in place, install the package itself.
Pick the path that matches your environment: -### Into NERSC's python module (no conda env) +### Path A — On top of NERSC's `python` module (no conda env) -The shared NERSC `python` module is read-only, so install with `--user` to land into your home dir's site-packages: +The module is read-only, so install with `--user` to land into your home directory's site-packages: ```bash python -m pip install --user lightcone-cli ``` -This drops the `lc` console script into `~/.local/bin/`. Make sure that's on your `PATH` (Perlmutter usually has this by default — check with `echo $PATH | tr : '\n' | grep .local/bin`). - -If you already use [`uv`](https://docs.astral.sh/uv/) (NERSC doesn't ship it, but you can install it yourself with `curl -LsSf https://astral.sh/uv/install.sh | sh`), `uv tool install` is a cleaner alternative — it isolates `lc` in its own venv and drops the same `~/.local/bin/lc` wrapper: +This drops the `lc` console script into `~/.local/bin/`. Make sure that's on your `PATH` — Perlmutter usually has it by default; check with: ```bash -uv tool install lightcone-cli +echo $PATH | tr : '\n' | grep .local/bin ``` -### Into a conda env +!!! tip "Already use `uv`?" + [`uv`](https://docs.astral.sh/uv/) isn't shipped by NERSC, but if you've installed it yourself (`curl -LsSf https://astral.sh/uv/install.sh | sh`), `uv tool install` is a cleaner alternative — it isolates `lc` in its own venv and exposes the same `~/.local/bin/lc` wrapper: -```bash -conda activate your-env-name -python -m pip install lightcone-cli -``` + ```bash + uv tool install lightcone-cli + ``` -If you use `uv`: +### Path B — Inside a conda env ```bash -uv pip install lightcone-cli +conda activate your-env-name +python -m pip install lightcone-cli # or: uv pip install lightcone-cli ``` `astra-tools` is a transitive dependency, so a single `lightcone-cli` install pulls it in automatically. 
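The `PATH` check from Path A can also be written as a small shell function. This is a minimal sketch and not part of `lightcone-cli`: `path_has_dir` is a hypothetical helper name, and the suggested `export` line assumes a Bourne-style shell such as bash. Unlike the `grep` one-liner, it only matches whole `PATH` entries.

```bash
# Hypothetical helper (not shipped by lightcone-cli).
# Returns 0 if directory $1 is an exact entry of PATH, or of $2 when a
# colon-separated list is passed explicitly; returns 1 otherwise.
path_has_dir() {
    local dir="$1" path_value="${2:-$PATH}"
    # Wrapping both sides in colons makes first and last entries match too.
    case ":$path_value:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_has_dir "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is on PATH"
else
    echo "$HOME/.local/bin is missing; try: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

If the check fails, adding the printed `export` line to your shell startup file and opening a new shell is usually enough for `lc` to resolve.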
-### From source (contributor route) +### Path C — From source (contributors only) -If you want to track the latest commits or contribute back, clone the repo and install editably. This is **optional** — most users should stick with PyPI. +If you want to track the latest commits or contribute back, clone the repo and install editably. **Most users should stick with PyPI** and skip this section. ```bash -cd ~/.lightcone # or wherever you keep clones - +cd ~/.lightcone # or wherever you keep clones git clone https://github.com/LightconeResearch/lightcone-cli.git -pip install -e ./lightcone-cli # editable install, follows local edits +pip install -e ./lightcone-cli # editable: tracks local edits ``` -If you also want to hack on `astra-tools` itself, clone the `ASTRA` repo (the package is published to PyPI as `astra-tools` but the GitHub repo is named `ASTRA`): +If you also want to hack on `astra-tools` (note: PyPI name `astra-tools`, GitHub repo name `ASTRA`): ```bash git clone https://github.com/LightconeResearch/ASTRA.git pip install -e ./ASTRA ``` -For development work, add the dev extras: +For development tooling (pytest, ruff, mypy), add the `dev` extras: ```bash -pip install -e "./lightcone-cli[dev]" # adds pytest, ruff, mypy +pip install -e "./lightcone-cli[dev]" ``` ### One-time setup +After install, run setup once: + ```bash lc setup ``` -This creates `~/.lightcone/config.yaml` with a default container runtime of `auto`. You can pin the runtime later (see [§5](#5-running-on-compute-nodes) — Perlmutter compute nodes need `podman-hpc`). +This creates `~/.lightcone/config.yaml` with `runtime: auto`. You'll pin it to `podman-hpc` for compute nodes in [§5](#5-running-on-compute-nodes). ### Verify ```bash -which lc # should be inside your active env's bin/ +which lc # should resolve inside your active env's bin/ lc --version lc --help ``` @@ -134,43 +139,44 @@ lc --help ## 3. 
Initialize a new project -Now you're ready to start working with it: +Scaffold a project directory and drop into it with the agent: ```bash -lc init your-analysis # scaffolds a new folder with everything lightcone needs +lc init your-analysis # scaffolds a fresh project tree cd your-analysis -claude # launch Claude Code inside the project +claude # launch Claude Code inside the project ``` --- -## 4. Start your research with lightcone! - -Once Claude Code is open, you can use the lightcone skillset to start a fresh analysis or migrate one from existing code — all driven by natural-language prompts to the agent. +## 4. Start your research -For example, to start from scratch: +Once Claude Code is open, drive everything from there. The `lc-*` skills are how you tell the agent what to build: -```text -/lc-new Please sample a standard Gaussian distribution using numpy. -``` +=== "Start fresh" + ```text + /lc-new Please sample a standard Gaussian distribution using numpy. + ``` -Or to migrate from existing code in another directory: +=== "Migrate existing code" + ```text + /lc-migrate I have code that samples a standard Gaussian distribution using numpy at @../gaussian_sampling. Please create an analysis based on it. + ``` -```text -/lc-migrate I have code that samples a standard Gaussian distribution using numpy at @../gaussian_sampling. Please create an analysis based on it. -``` +After that, just keep talking to the agent in plain English about what you want to build next. -After initialization, just keep talking to the agent in plain English about what you want to build next. Note that your job will all run on **login node**, see the next section on how to run jobs on computing node. +!!! warning "You're still on a login node" + Everything from `lc init` through your first `/lc-new` runs on a Perlmutter **login node**. That's fine for scaffolding and small recipes, but anything heavyweight needs a compute node — see [§5](#5-running-on-compute-nodes). --- ## 5. 
Running on compute nodes -Everything up to this point ran on a Perlmutter **login node** — fine for installation, scaffolding, and `lc status`, but anything heavy belongs on a compute node. Login nodes are shared and should not be abused. +Login nodes are shared and rate-limited — fine for `lc init`, `lc status`, and small `lc build` calls, but anything heavyweight belongs on a compute node. ### Pre-flight: pin the container runtime and build images -On Perlmutter, compute nodes ship `podman-hpc`. Pin it once in your global config: +Perlmutter compute nodes ship `podman-hpc`. Pin it once globally: ```yaml # ~/.lightcone/config.yaml @@ -178,33 +184,34 @@ container: runtime: podman-hpc ``` -Then build and migrate the images for your project on a login node (`lc build` runs `podman-hpc build` then `podman-hpc migrate`, which copies the image into the per-node container cache): +Then, on a login node, build and migrate your project's images: ```bash cd /path/to/your-analysis lc build ``` -See [Running on a Cluster → Pre-flight](cluster.md#pre-flight-pick-the-right-container-runtime) for the underlying mechanics. +`lc build` runs `podman-hpc build` followed by `podman-hpc migrate`, which copies the image into each compute node's local container cache. See [Running on a Cluster → Pre-flight](cluster.md#pre-flight-pick-the-right-container-runtime) for the underlying mechanics. ### Interactive runs (agent-driven) -The agent (Claude Code) will invoke `lc run` for you when it decides recipes need to materialize — you don't call it directly. What you control is *where Claude Code is running*: it inherits whatever shell environment you started it from. To get the agent's `lc run` calls onto a compute node, start `claude` from inside a SLURM allocation: +The agent (Claude Code) calls `lc run` for you whenever a recipe needs to materialize — you never call it directly. 
What you *do* control is **where Claude Code is running**: it inherits the shell environment you launched it from. To put the agent's recipes onto a compute node, simply launch `claude` from inside a SLURM allocation: ```bash salloc -A -q interactive -C gpu --nodes=1 -t 00:30:00 -# allocation drops you onto a compute node; from there: +# salloc drops you onto a compute node; from there: cd /path/to/your-analysis claude ``` -Now anything the agent decides to run (`lc run`, scripts, etc.) executes on the allocated node, not the login node. +Now everything the agent triggers (`lc run`, scripts, etc.) executes on the allocated node. -The `interactive` QoS on the GPU partition is appropriate for development. For longer or larger sessions, see [NERSC's queue policy reference](https://docs.nersc.gov/jobs/policy/). +!!! note "Picking a QoS" + The `interactive` QoS on the GPU partition is right for development. For longer or larger sessions, see [NERSC's queue policy reference](https://docs.nersc.gov/jobs/policy/). ### Unattended batch runs (no agent in the loop) -If you want to submit `lc run` as an unattended batch job — i.e., without Claude Code in the loop — that path also works. See [Running on a Cluster → A typical SLURM workflow](cluster.md#a-typical-slurm-workflow) for the generic `sbatch` template; on Perlmutter, the only addition is the `-A`/`-q` directives: +For production sweeps where the recipes are already nailed down, you can submit `lc run` directly as a batch job. See [Running on a Cluster → A typical SLURM workflow](cluster.md#a-typical-slurm-workflow) for the generic template; on Perlmutter, the only addition is the `-A` / `-q` directives: ```bash #!/bin/bash @@ -219,71 +226,75 @@ source ~/.conda/envs/your-env-name/bin/activate # or your venv lc run -j 16 ``` -> Note: this path runs `lc run` directly, not through the agent — useful for production sweeps where you've already nailed down the recipes interactively. 
The agent-driven flow above is the right tool for development. +!!! note "When to use this path" + The agent-driven flow above is the right tool during development. Reach for batch submission when you've finished iterating and want a hands-off sweep. ### Storage gotcha: Snakemake state must live on `$SCRATCH` -`$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which silently ignores `flock()`. Snakemake (and any sane locking system) uses `flock`, so its `.snakemake/` directory and Dask spill files must go on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. +!!! danger "DVS silently ignores `flock()`" + `$HOME` and `/global/cfs/` are mounted on compute nodes via DVS, which silently ignores `flock()`. Snakemake (and any sane locking system) relies on `flock`, so its `.snakemake/` directory and Dask spill files **must** live on Lustre (`$SCRATCH`), which honors `flock`. Otherwise you get intermittent silent rule-rerun loops or hangs. -`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly per project: +`lc` redirects state automatically when it detects Perlmutter, so this usually just works. To pin explicitly at project creation: ```bash -lc init your-analysis --scratch '$SCRATCH' # expands at run time, kept verbatim in config +lc init your-analysis --scratch '$SCRATCH' # kept verbatim, expanded at run time ``` -Or after the fact, add to `/.lightcone/lightcone.yaml`: +Or, after the fact, edit `/.lightcone/lightcone.yaml`: ```yaml scratch_root: $SCRATCH ``` -`$SCRATCH` is purged on a 12-week rolling window, so for outputs you want to keep, copy or symlink to `/global/cfs/cdirs//`. +!!! warning "12-week purge on `$SCRATCH`" + Perlmutter purges `$SCRATCH` on a rolling 12-week window. For outputs you need to keep, copy or symlink to `/global/cfs/cdirs//`. 
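Copying results off scratch before the purge can be scripted. The sketch below is an illustration under stated assumptions, not an `lc` feature: `archive_results` is a hypothetical helper, and both paths in the commented usage line are placeholders for your own analysis and CFS project directories.

```bash
# Hypothetical helper: copy a results directory off purge-prone scratch.
# $1 = source directory (e.g. somewhere under $SCRATCH)
# $2 = destination directory on persistent storage (created if missing)
archive_results() {
    local src="$1" dest="$2"
    if [ ! -d "$src" ]; then
        echo "archive_results: no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -a keeps modes, timestamps and symlinks; "$src/." copies the
    # directory's contents (dotfiles included) rather than the directory itself.
    cp -a "$src/." "$dest/"
}

# Example call; both paths are placeholders, adjust before uncommenting:
# archive_results "$SCRATCH/your-analysis/results" "/path/to/persistent/your-analysis/results"
```

A plain copy is the conservative choice here: a symlink into `$SCRATCH` keeps the name alive but the data still disappears when the purge runs.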
 ### Further reading

 - [NERSC interactive jobs](https://docs.nersc.gov/jobs/interactive/) — `salloc` patterns and reservation queues
 - [Perlmutter system overview](https://docs.nersc.gov/systems/perlmutter/) — node types and partitions
+- [NERSC Python guide](https://docs.nersc.gov/development/languages/python/nersc-python/) — module, conda, and pip layering

 ---

 ## 6. Common troubleshooting

-| Symptom | Cause | Fix |
+| Symptom | Likely cause | Fix |
 |---|---|---|
-| `lc: command not found` | Wrong env active | `which lc`; reinstall in the active env |
+| `lc: command not found` | Wrong env active, or `~/.local/bin` not on `PATH` | `which lc`; reinstall in the active env, or fix `PATH` |
 | `lc` runs but uses unexpected code | Two installs across two envs shadowing each other on `PATH` | `which lc` and uninstall the stale one |
-| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script |
+| `ModuleNotFoundError: lightcone.cli.__main__` | Tried `python -m lightcone.cli` (the package isn't directly executable) | Use the `lc` console script instead |
 | Snakemake locking errors / silent rule rerun loops | `.snakemake/` ended up on DVS-mounted storage | Set `scratch_root: $SCRATCH` in the project's `.lightcone/lightcone.yaml` |
 | `ImportError: cannot import name 'resolve_analysis_tree' from 'astra.helpers'` | Stale `astra-tools` (pre-0.2.5) | `pip install -U astra-tools` |
-| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests you need into your own scratch |
+| `PermissionError` reading another user's symlinked `results/` | Cross-user scratch path without group ACLs | Request access from the data owner, or copy the manifests into your own scratch |
 | `pip install` hangs or times out on a compute node | Compute nodes have no public internet | Always install from a login node |

 ---

 ## 7. Updating

-For source installs:
-
-```bash
-cd ~/.lightcone/lightcone-cli
-git pull
-pip install -e . # only needed if pyproject.toml changed
-```
-
-Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in `lc`. Re-run `pip install -e .` only when `pyproject.toml` adds a new dependency or changes the `[project.scripts]` table.
+=== "PyPI install"
+    ```bash
+    pip install -U lightcone-cli astra-tools
+    ```

-For PyPI installs:
+=== "Source install"
+    ```bash
+    cd ~/.lightcone/lightcone-cli
+    git pull
+    pip install -e . # only needed if pyproject.toml changed
+    ```

-```bash
-pip install -U lightcone-cli astra-tools
-```
+    Editable installs auto-follow source edits — switching branches or pulling new commits is reflected immediately in `lc`. Re-run `pip install -e .` only when `pyproject.toml` adds a new dependency or changes the `[project.scripts]` table.

 ---

 ## 8. Uninstalling

 ```bash
-pip uninstall lightcone-cli # remove from the active env
-rm -rf ~/.lightcone/lightcone-cli # remove source clone (only for source installs)
-# Keep ~/.lightcone/config.yaml and ~/.lightcone/targets/ unless you want to start fresh.
+pip uninstall lightcone-cli # remove from the active env
+rm -rf ~/.lightcone/lightcone-cli # only for source installs
 ```
+
+!!! note "Keep your config?"
+    `~/.lightcone/config.yaml` and `~/.lightcone/targets/` survive the uninstall. Delete them too if you want to start fresh.

From 08d33b267b351b15d6bdaa97929f3427a0f9b309 Mon Sep 17 00:00:00 2001
From: dkn16
Date: Thu, 7 May 2026 15:42:08 -0700
Subject: [PATCH 10/10] =?UTF-8?q?docs(nersc):=20drop=20reference=20to=20~/?=
 =?UTF-8?q?.lightcone/targets/=20=E2=80=94=20folder=20removed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The targets/ directory is no longer part of ~/.lightcone — only config.yaml lives there now.
The §8 uninstall admonition was suggesting users would want to preserve a path that doesn't exist.
(Separately, docs/architecture.md:278 still mentions targets/ alongside the old dagster.yaml — that's a stale Prism-era paragraph and a separate cleanup.)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/user/nersc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user/nersc.md b/docs/user/nersc.md
index 3f8a9955..eeb41505 100644
--- a/docs/user/nersc.md
+++ b/docs/user/nersc.md
@@ -297,4 +297,4 @@ rm -rf ~/.lightcone/lightcone-cli # only for source installs
 ```

 !!! note "Keep your config?"
-    `~/.lightcone/config.yaml` and `~/.lightcone/targets/` survive the uninstall. Delete them too if you want to start fresh.
+    `~/.lightcone/config.yaml` survives the uninstall. Delete it too if you want to start fresh.
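
After uninstalling, or when `lc` seems to run unexpected code, it can help to list every `lc` executable that `PATH` can reach. The following is a minimal sketch assuming bash; `find_all_on_path` is a hypothetical helper, not something `lightcone-cli` ships.

```bash
# Hypothetical helper: print every executable named $1 found on PATH
# (or on the colon-separated list $2, if given), in lookup order.
find_all_on_path() {
    local name="$1" path_value="${2:-$PATH}" dir
    local IFS=':'
    # Unquoted on purpose: IFS=':' splits the list into directories.
    for dir in $path_value; do
        # Empty entries mean "current directory"; skip them here.
        [ -n "$dir" ] || continue
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}

find_all_on_path lc
```

No output means no `lc` remains on `PATH`. More than one line suggests two installs are shadowing each other; the first line is normally the one the shell runs.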