1_getting_started_hpc
This is a short guide designed to get you acquainted with working on Saga, part of the Sigma2 high performance computing infrastructure for Norway. Sigma2 has a lot of clear, high-level documentation of its own; this tutorial is simply meant to help you get started.
Saga is where you will do the majority of your research computing while in the group. It is where all the genomic data we have produced is stored. It is also accessible for projects using large datasets that need either high capacity storage or high compute power to process. If you’ve not used a computing cluster before, the best (and perhaps simplest) way to think of it is as an external resource that you are able to log in to and access in addition to your own computer. This means any highly demanding analyses you need to do can be run on Saga without using your own machine directly.
There are two ways to work with Saga. The first is interactively via the Unix command line - i.e. by typing commands into the prompt and seeing how it responds. The second way is as a job scheduler which means you are able to submit jobs to the cluster specifying certain levels of resources and then it will work in the background while you wait for it to finish. We will return to these two ways of working later. First, you need to know how to get access and log on!
If you have not already done so, you need to go to the Metacenter page and apply for a user account. You should apply for both storage access and HPC access (i.e. two separate applications).
When you apply, you will need to ensure you do the following:
- Choose NIRD under resources; set project to NS10082K (storage)
- Choose Saga; set project to NN10082K
Notify me when you have done this and I will grant you access when your application comes to me for approval.
If you are using a Mac or Linux machine, then all you need to do now is use the built-in terminal (or you can use a modified terminal app like iTerm2, my personal preference).
There are even more options with a Windows machine. You can use PuTTY or MobaXterm, for example. These are harder to configure, so you will need to come and see me if you want to use one of them. However, assuming you have the latest version of Windows, you should have the Windows Terminal app. This means you can follow the generic terminal instructions below.
NB: Everything that follows here will require you to have basic working knowledge of bash and Unix - see here and here for tutorials on that.
To log on, we will use the secure shell protocol, or ssh.
Remember that for all the examples below, you need to replace username
with your assigned username from Sigma2.
ssh username@login.saga.sigma2.no
When you run this command, you will be asked for your password and then you might get a prompt about remembering the host. Just say yes and you should be logged on to Saga! To tell for certain, try the following:
pwd
If you were successful, you will see something like the following (it will vary based on your username):
/cluster/home/username
And that’s it - you’re in and ready to start working. But next there is a very important step you must complete!
For most of your time on the cluster, you will be working in $USERWORK - i.e. the working directory. The path is typically:
/cluster/work/users/username
Where username is your own username. To make it accessible to others (i.e. me, other group members who want to help you), do the following:
chmod -R 775 /cluster/work/users/username
This has the slight disadvantage of letting everyone on the cluster read your files, but only you, members of the directory's group and the system admins can write to or delete anything in it. If one of us needs to help you, this will at least let us see what is going on.
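To see what mode 775 actually grants, here is a small sketch that applies it to a throwaway directory rather than your real work area. The scratch directory and file names are made up for illustration, and `stat -c` assumes GNU coreutils (as found on Linux; macOS uses a different flag).

```shell
# Scratch directory standing in for /cluster/work/users/username (hypothetical)
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/results"
touch "$demo_dir/results/output.txt"

# Same command as above, applied recursively to the scratch copy
chmod -R 775 "$demo_dir"

# The permission string for the directory should begin with drwxrwxr-x
ls -ld "$demo_dir"

# Numeric form of the mode (GNU stat)
perms=$(stat -c '%a' "$demo_dir")
echo "mode: $perms"
```

The three digits are owner, group and others: 7 is read, write and execute, while 5 is read and execute without write.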
When you log on to Saga, your shell checks several files for custom features or aliases you have set up. In your $HOME directory, you should use nano to add the following line to your .bash_profile file:
export PATH=$PATH:/cluster/projects/nn10082k/bin
You can also do it like so - this will add the line to your .bash_profile without you needing to open an editor.
echo 'export PATH=$PATH:/cluster/projects/nn10082k/bin' >> ~/.bash_profile
You can add other things to this file too if you wish. Aliases, a custom path colour and more. However, you should also add these lines:
echo 'CPU_AVAIL=$(cost -p nn10082k --parsable | grep "Available" | cut -f 3 -d "|")' >> ~/.bash_profile
echo 'echo "There are ${CPU_AVAIL} remaining CPU hours for our group - remember to check your job memory requests carefully!"' >> ~/.bash_profile
This will ensure you get a little reminder about memory use on login.
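One thing to watch with `echo ... >> ~/.bash_profile` is that running it twice adds the line twice. A minimal sketch of a guard against that is below. The helper name `add_line_once` is hypothetical, and the example writes to a scratch file so it can be tried safely; swap in `~/.bash_profile` when you are happy with it.

```shell
# Hypothetical helper: append a line to a file only if it is not already there
add_line_once() {
    # $1 = line to add, $2 = target file
    touch "$2"
    # -x matches whole lines only, -F treats the line as plain text, not a regex
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Scratch file standing in for ~/.bash_profile
profile=$(mktemp)

# Calling the helper twice with the same line leaves a single copy
add_line_once 'export PATH=$PATH:/cluster/projects/nn10082k/bin' "$profile"
add_line_once 'export PATH=$PATH:/cluster/projects/nn10082k/bin' "$profile"

cat "$profile"
profile_lines=$(wc -l < "$profile")
```

The single quotes matter: they stop `$PATH` being expanded when the line is written, so it is expanded at login instead.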
If you are like me, you will either forget or get very bored of typing
these commands to log in and typing your password every time you use ssh
or scp (more on this later). Luckily there are shortcuts.
First, we need to set up a login profile for ssh. This is very
straightforward to do from our home directory on our local machine
(i.e. your own laptop or computer):
cd ~
mkdir -p .ssh
Here we have made a directory for ssh - note that the . makes it a
hidden directory. Inside this, we need to make a config file that sets
out our ssh configuration. We will use nano, the command line text
editor for this:
nano ~/.ssh/config
Next, we need to add the details below. Remember to change username to
your own.
Host saga
Hostname saga.sigma2.no
User username
The Host is the shortcut name we will use every time we log in. The
Hostname is just the address of the cluster - i.e. saga.sigma2.no
here.
Now, you should be able to log in just by typing the following:
ssh saga
You can use this for any cluster you might have access to; you just need to change the details!
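If you would rather not open an editor, the same config entry can be written from the command line. This is a sketch only: it writes to a scratch directory (a stand-in for your real `~/.ssh`) so that it cannot overwrite an existing config, and `username` is still the placeholder for your own Sigma2 username.

```shell
# Scratch location standing in for ~/.ssh (hypothetical path)
ssh_dir=$(mktemp -d)/.ssh
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"          # only you can enter the directory

# Write the Host entry; the quoted 'EOF' stops the shell expanding anything inside
cat > "$ssh_dir/config" <<'EOF'
Host saga
    Hostname saga.sigma2.no
    User username
EOF
chmod 600 "$ssh_dir/config"   # only you can read or change the config

cat "$ssh_dir/config"
```

The indentation under `Host` is optional but makes it easier to read once you have several clusters in the file. With the real config in place, `ssh -G saga` prints the settings ssh would use for the shortcut without actually connecting, which is a handy way to check for typos.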
The host shortcut makes life a lot easier for logging in but we can also
create a key that means we can log in to the cluster without having to
type our password each time. This also works for scp and rsync,
making it much easier to migrate data on and off the cluster.
To do this, we need to create what is known as an ssh key. The key
basically has two forms, a public one and a private one. You put the
public one on the server or cluster you are logging on to and then it
always checks against your private key to ensure it is you logging on.
This is very easy to do from your home directory on your local machine:
cd ~
ssh-keygen
You will be prompted to enter a passphrase - you can add one to be secure but
you can also leave this blank. When complete, you will see a message
telling you that you have created a key. If you type ls .ssh you
should see it stored as id_rsa or something similar.
Next you need to copy the public key to Saga. You do this like so:
ssh-copy-id -i ~/.ssh/id_rsa.pub saga
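Note that this assumes you have already set up the saga hostname shortcut described above. If your key was saved under a different name (newer versions of ssh-keygen may create id_ed25519 rather than id_rsa), point -i at that .pub file instead.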
You will be prompted for your password this one last time but then you
can try logging in again (ssh saga) and you should enter the cluster
without having to type a password in again.
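If you want to see the two halves of a key pair without touching your real `~/.ssh`, the sketch below generates a throwaway pair in a scratch directory. The file name `id_demo`, the comment and the choice of the ed25519 key type are assumptions made for illustration, and the empty passphrase (`-N ''`) is only there so the example runs without prompting.

```shell
# Scratch directory so the real ~/.ssh is left alone
key_dir=$(mktemp -d)

# Only run if ssh-keygen is installed on this machine
if command -v ssh-keygen >/dev/null 2>&1; then
    # -q quiet, -t key type, -N passphrase (empty here), -C comment, -f output file
    ssh-keygen -q -t ed25519 -N '' -C 'demo key' -f "$key_dir/id_demo"

    # Two files appear: id_demo (private, stays on your machine)
    # and id_demo.pub (public, the part that gets copied to the cluster)
    ls "$key_dir"

    # Print the fingerprint of the public key
    ssh-keygen -l -f "$key_dir/id_demo.pub"
fi
```

Only the `.pub` file should ever leave your machine; the private key is what proves it is you logging on.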