ramses-organisation/ramses-job-scripts

ramses-job-scripts

RAMSES SLURM DATABASE

  • Draft document by Pierre OCVIRK, 19/09/2024
  • Updated 04/12/2024

QUICK SUMMARY: HOW TO SUBMIT YOUR SCRIPTS TO THE DATABASE

If you are in a hurry, read only this!
The scripts are to be deposited here:
https://seafile.unistra.fr/d/7cf0c55d4a0447a8aba5/

With the minimal mandatory header (only 11 easy header fields), the script should look like this:

########
#HEADER
########

ori_file_name=run_ramsesaton_summit_nobb_4096.lsf
cc_main_name=OLCF
sc_main_name=Summit
exec_year=2021
pi_name=Pierre OCVIRK
sim_queue=bsub
sim_nnodes=4096
sim_nmpi=131072
sim_ncpu=131072
sim_nthreads_total=131072
sim_cpu_compiler=pgi

#######
#SCRIPT
#######
#Your script goes here
#! /bin/bash
#begin LSF directives

#BSUB -P AST031
#BSUB -W 24:00
#BSUB -nnodes 4096
#BSUB -alloc_flags gpumps
...

A longer, more detailed version of this document, covering all possible metadata, follows below for the more patient reader.

Context

The purpose of this document is to help organize the RAMSES Slurm DB (RSDB) working group.

The RSDB is an attempt to improve the user experience when deploying RAMSES on computing centers. Depending on the queue system, the heterogeneity of the compute resources, the quality of the documentation, the complexity of the simulation at hand, and its data management, adapting the generic SLURM script provided by the HPC center documentation can be time-consuming and frustrating, particularly for new users. The RSDB working group (WG) aims to build a database of such scripts to support users deploying RAMSES and speed up progress towards production.

Structure / Content / Metadata

The main structure of the DB is given further below, and a number of repositories have been set up for this purpose. A number of principles and requirements have also been formulated:

The DB is curated by RSDB steward(s). The role of these stewards is threefold:

  • Retrieve the user-uploaded scripts from a TBD repo, on a regular basis (once per week / every other week / once a month)
    • => AT RUM2025
  • Review the scripts to assess the provenance and conformity of the metadata and, if necessary, contact the author for clarification.
  • Ingest into the main DB.

There is a Seafile repo for upload from users, and a GitHub repo for download to users. The steward makes the link between the two.

Data Model

Several metadata items have been identified and are presented in the table below:

| Item | Keyword | Type, Format | Mandatory/Desirable/Optional |
|---|---|---|---|
| Computing center name main acronym | cc_main_name | String, "" | Mandatory |
| Computing center developed acronym | cc_main_full | String, "" | Desirable |
| Computing center name alt acronym | cc_alt_name | String, "" | Optional |
| Computing center developed alt acronym | cc_alt_full | String, "" | Optional |
| Supercomputer name main | sc_main_name | String, "" | Mandatory |
| Supercomputer name alt | sc_alt_name | String, "" | Optional |
| Execution date | exec_date | String, "dd-mm-yyyy" | Optional |
| Execution year | exec_year | String, "yyyy" | Mandatory |
| Simulation project name main (e.g. short) | proj_main_name | String, "" | Desirable |
| Simulation project name alt (e.g. long) | proj_alt_name | String, "" | Desirable |
| Principal Investigator / author / uploader | pi_name | String, "" | Mandatory |
| Principal Investigator email | pi_email | String, "" | Optional |
| Associated publication bibcode | sim_bibcode | String, "" | Desirable |
| Queue management system (slurm / sbatch / bsub / other) | sim_queue | String, "" | Mandatory |
| NNODES | sim_nnodes | int | Mandatory |
| NMPI domains | sim_nmpi | int | Mandatory |
| NCPU | sim_ncpu | int | Mandatory |
| NGPU | sim_ngpu | int | Optional (0) if no GPU |
| NTHREADS_TOTAL | sim_nthreads_total | int | Mandatory |
| NVECTOR | sim_nvector | int | Optional |
| Cpu Compiler: pgi/intel/gnu/cray | sim_cpu_compiler | String, "" | Mandatory |
| Accelerator compiler: nvidia/amd/intel | sim_accel_compiler | String, "" | Optional |
| Modules | sim_modules | Comma-separated list,"," | Desirable |
| Original Filename | ori_file_name | String, "" | Mandatory |
| Alt filename | alt_file_name | String, "" | Leave blank / Workflow-assigned |
| Database filename | db_file_name | String, "" | Leave blank / Workflow-assigned |
| Static DB path | db_path | String, "" | Leave blank / Workflow-assigned |
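
As an illustration of how a steward might check the conformity of a submitted header against the table above, here is a minimal Python sketch. The function name `check_header` and the idea of passing the header as a plain dictionary of strings are assumptions made for this example; they are not part of any existing RSDB tooling.

```python
# Hypothetical helper, not part of the RSDB repositories.
# Keyword lists are copied from the data model table.
MANDATORY_KEYS = (
    "ori_file_name", "cc_main_name", "sc_main_name", "exec_year", "pi_name",
    "sim_queue", "sim_nnodes", "sim_nmpi", "sim_ncpu", "sim_nthreads_total",
    "sim_cpu_compiler",
)
INT_KEYS = (
    "sim_nnodes", "sim_nmpi", "sim_ncpu", "sim_ngpu",
    "sim_nthreads_total", "sim_nvector",
)


def check_header(header):
    """Return a list of problems; an empty list means the header passes."""
    problems = []
    # Every mandatory keyword must be present and non-empty.
    for key in MANDATORY_KEYS:
        if not str(header.get(key, "")).strip():
            problems.append(f"missing mandatory field: {key}")
    # Integer fields, when filled in, must contain digits only.
    for key in INT_KEYS:
        value = str(header.get(key, "")).strip()
        if value and not value.isdigit():
            problems.append(f"{key} should be an integer, got {value!r}")
    # exec_year uses the "yyyy" format.
    year = str(header.get("exec_year", "")).strip()
    if year and not (len(year) == 4 and year.isdigit()):
        problems.append(f"exec_year should be 'yyyy', got {year!r}")
    return problems


# The 11 mandatory fields from the quick summary.
minimal = {
    "ori_file_name": "run_ramsesaton_summit_nobb_4096.lsf",
    "cc_main_name": "OLCF",
    "sc_main_name": "Summit",
    "exec_year": "2021",
    "pi_name": "Pierre OCVIRK",
    "sim_queue": "bsub",
    "sim_nnodes": "4096",
    "sim_nmpi": "131072",
    "sim_ncpu": "131072",
    "sim_nthreads_total": "131072",
    "sim_cpu_compiler": "pgi",
}
print(check_header(minimal))
```

Returning a list of messages rather than raising on the first problem lets the steward report everything to the author in one go.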

Example SLURM Script with Metadata

### RAMSES SLURM DB

########
#HEADER
########
ori_file_name=run_ramsesaton_summit_nobb_4096.lsf
alt_file_name=
db_file_name=
db_path=
cc_main_name=OLCF
cc_main_full=Oak Ridge Leadership Computing Facility
cc_alt_name=
cc_alt_full=
sc_main_name=Summit
sc_alt_name=
exec_date=21-01-2021
exec_year=2021
proj_main_name=Cosmic Dawn III
proj_alt_name=CoDaIII
pi_name=Pierre OCVIRK
pi_email=
sim_bibcode=2022MNRAS.516.3389L
sim_queue=bsub
sim_nnodes=4096
sim_nmpi=131072
sim_ncpu=131072
sim_ngpu=24576
sim_nthreads_total=131072
sim_nvector=32
sim_cpu_compiler=pgi
sim_accel_compiler=nvidia
sim_modules=pgi/20.1,cuda/9.2.148

#######
#SCRIPT
#######
#! /bin/bash
# begin LSF directives

#BSUB -P AST031
#BSUB -W 24:00
#BSUB -nnodes 4096
#BSUB -alloc_flags gpumps
#BSUB -J 8192_131k
#BSUB -o 8192_131ko.%J
#BSUB -e 8192_131ke.%J
#BSUB -q batch
#BSUB -B
#BSUB -N

module load pgi/20.1
module load cuda/9.2.148
cd /gpfs/alpine/ast031/proj-shared/pocvirk/CoDaIII/prod_sr/
module list

# set last snap as restart
./set_last_snap_as_restart_sr

# set ramses log file name using date
now=$(date +"%Y-%m-%d-%Hh%Mm%Ss")
printf "%s\n" $now
rlogname="ramseslog-${now}"
echo "ramseslogname = $rlogname"

# Cosmic Dawn III 4096 nodes run
jsrun --nrs 8192 --tasks_per_rs 16 --cpu_per_rs 16 --gpu_per_rs 3 --rs_per_host 2 --latency_priority GPU-CPU stdbuf -o0 \
./ramses_aton_128x128x256_noHe_noZ_DS_BPASS_6DIGITS_LONGINT_IOGS-R=32-8192_ULM=4_NOMKDIR_PARTSP2 ramses.nml > $rlogname

echo "finished"
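
Since the header in the example above is a flat list of `keyword=value` lines placed between the `#HEADER` and `#SCRIPT` markers, it can be read with a few lines of Python. The sketch below is one possible way to do it; the function name `parse_header` is an assumption for this example, and the embedded sample is a shortened copy of the header above.

```python
def parse_header(text):
    """Collect keyword=value lines found between the #HEADER and #SCRIPT markers."""
    header = {}
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "#HEADER":
            in_header = True
        elif line == "#SCRIPT":
            # Everything after this marker is the job script itself.
            break
        elif in_header and line and not line.startswith("#") and "=" in line:
            # Split on the first "=" only, so values may contain "=".
            key, _, value = line.partition("=")
            header[key.strip()] = value.strip()
    return header


# Shortened sample in the format described in this document.
example = """\
### RAMSES SLURM DB

########
#HEADER
########
ori_file_name=run_ramsesaton_summit_nobb_4096.lsf
cc_main_name=OLCF
sim_nnodes=4096
pi_email=

#######
#SCRIPT
#######
#! /bin/bash
now=$(date +"%Y-%m-%d")
"""

header = parse_header(example)
print(header["cc_main_name"])
```

All values are kept as strings, and fields left blank (such as `pi_email=` here) come back as empty strings, so converting the integer fields is left to the caller.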

Indexation and Repository Structure

1 - STATIC_DB

The static DB will be a repository with the following structure:

  • Computing Center
    • Computer
      • Execution year
        • Compiler

This structure is useful for a web-based exploration of the DB in a browser.
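
The hierarchy above maps directly onto four of the metadata keywords, so the directory of a script could be derived from its header. The following Python sketch shows one way to do this; the function name `static_db_dir`, the root folder name `STATIC_DB`, and the replacement of spaces by underscores are assumptions for illustration, not decisions taken by the working group.

```python
from pathlib import PurePosixPath

# Keywords for the four levels:
# Computing Center / Computer / Execution year / Compiler.
LEVEL_KEYS = ("cc_main_name", "sc_main_name", "exec_year", "sim_cpu_compiler")


def static_db_dir(header, root="STATIC_DB"):
    """Build the directory for a script from its header (hypothetical layout)."""
    # Spaces are replaced so that names such as "Jean Zay" give tidy folder names.
    parts = [str(header[key]).strip().replace(" ", "_") for key in LEVEL_KEYS]
    return PurePosixPath(root, *parts)


print(static_db_dir({
    "cc_main_name": "OLCF",
    "sc_main_name": "Summit",
    "exec_year": "2021",
    "sim_cpu_compiler": "pgi",
}))
```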

2 - Dynamic, Python-queryable DB

The full DB is anticipated to be small enough to be wholly downloadable by users in one tarball. A registry file will contain the metadata of all the SLURM files, in a Python dictionary structure. Using this file it should be easy to make queries such as:

  • What is the most recent SLURM script executed on Jean Zay?
  • Show me the 3 largest runs in terms of MPI parallelism
  • Show me the SPHINX production run SLURM script

The structure for this will be a GitHub repo containing the DB data and a few Python scripts to read the metadata registry.
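
To give an idea of what such queries could look like, here is a minimal Python sketch. It assumes the registry is a dictionary mapping each script filename to its metadata dictionary; that layout and the three function names are assumptions for this example. The first entry reuses values from the example header in this document, while the two `placeholder_*` entries contain made-up values and do not describe real runs.

```python
# Assumed registry layout: {script filename: {keyword: value}}.
registry = {
    "run_ramsesaton_summit_nobb_4096.lsf": {
        "sc_main_name": "Summit", "exec_year": "2021",
        "proj_main_name": "Cosmic Dawn III", "sim_nmpi": 131072,
    },
    # Placeholder entries with invented values, for illustration only.
    "placeholder_a.slurm": {
        "sc_main_name": "Jean Zay", "exec_year": "2022",
        "proj_main_name": "SPHINX", "sim_nmpi": 2048,
    },
    "placeholder_b.slurm": {
        "sc_main_name": "Jean Zay", "exec_year": "2024",
        "proj_main_name": "Placeholder project", "sim_nmpi": 512,
    },
}


def most_recent_on(registry, sc_name):
    """Filename of the latest script run on a given supercomputer, or None."""
    hits = [name for name, meta in registry.items()
            if meta["sc_main_name"] == sc_name]
    if not hits:
        return None
    return max(hits, key=lambda name: int(registry[name]["exec_year"]))


def largest_by_mpi(registry, count=3):
    """Filenames of the `count` runs with the most MPI domains, largest first."""
    ranked = sorted(registry, key=lambda name: int(registry[name]["sim_nmpi"]),
                    reverse=True)
    return ranked[:count]


def by_project(registry, project):
    """Filenames of all scripts whose main project name matches exactly."""
    return [name for name, meta in registry.items()
            if meta.get("proj_main_name") == project]


print(most_recent_on(registry, "Jean Zay"))
print(largest_by_mpi(registry, count=3))
print(by_project(registry, "SPHINX"))
```

With only `exec_year` being mandatory, "most recent" is resolved to the year here; a finer ordering would need the optional `exec_date` field.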
