---
title: "Basic Command Line Utilities on the HPC"
site: bookdown::bookdown_site
output: bookdown::gitbook
documentclass: book
bibliography: [packages.bib]
biblio-style: apalike
link-citations: yes
github-repo: vari-bbc/Intro_to_Linux_for_HPC
theme: "yeti"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# **Basic Command Line Utilities on the HPC**
## **Access HPC**
Make sure your laptop is connected to the "vai" Wi-Fi network, not "vai-guest".
<details>
<summary>Click here for platform-specific (Mac and Windows) instructions</summary>
**Note:**
The platform-specific instructions below are hidden by default because the newer
HPC OnDemand platform lets all of us access the HPC from our laptops using a
web browser.
For archival purposes, we still provide the platform-specific instructions for
accessing the HPC from your laptop.
### **Mac users**
1. Open the Terminal app (you can search for it with Spotlight)
2. Type the following command to connect to the HPC:
```bash
ssh firstname.lastname@access.hpc.vai.org
```
Replace `firstname.lastname` with your actual VAI username.
3. Enter your VAI password. (Note: you will not see the password as you type it,
but it *is* working! Press your enter/return key when you're done.)
### **Windows users**
If MobaXterm is not installed, download the [free version of MobaXterm](https://mobaxterm.mobatek.net/download.html), run the installer, and accept the license agreement to complete the installation.
1. Open MobaXterm
2. Click on `Session` in the top left corner
3. Click on `SSH` (Secure Shell)
4. Enter `access.hpc.vai.org` in the Remote host field
5. Check the box to the left of "Specify username", then enter your VAI `username` and `password`
6. Click `OK`
</details>
### **OnDemand Instructions**
HPC OnDemand is a web-based interface that allows you to access the HPC without needing to connect from a terminal or SSH client on your laptop.
You can read more about it on the [HPC OnDemand SharePoint page](https://vanandelinstitute.sharepoint.com/sites/SC/SitePages/HPC-OnDemand.aspx).
Below are the steps to access the HPC command line using OnDemand:
1. Open a web browser (Chrome, Firefox, Safari, etc.)
2. Go to the following URL: [https://ondemand3.vai.zone/](https://ondemand3.vai.zone/)
    - If you see an error, try this instead: [https://ondemandlocal.hpc.vai.org](https://ondemandlocal.hpc.vai.org/)
    - If neither of those links works, please let us know!
3. Select "hpc Shell Access" from the list of "pinned apps". This opens a command line on the HPC, similar to the terminal on your laptop.
## **File navigation**
Practice directory: `/varidata/researchtemp/hpctmp/BBC_workshop_June2023_I`.
### **Navigate to a directory (folder)**
`cd` - Change Directory. Allows you to navigate to a different directory (folder).
Note the following special symbols:
- `.` is the current working directory.
- `..` is the parent directory (one level up).
- `/` is the root directory (the top of the file system).
- `~` is your home directory.

`pwd` - Print Working Directory. Displays the directory (folder) you are currently in.
```{bash, eval=TRUE, engine="sh"}
# Display the current directory
pwd
# Change to the practice directory
cd /varidata/researchtemp/hpctmp/BBC_workshop_June2023_I
# Display the current directory again to confirm the change
pwd
```
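To see `.`, `..`, and `/` in action without touching the shared practice directory, here is a minimal sketch that builds a throwaway folder tree with `mktemp -d` (a standard utility that creates an empty temporary directory and prints its path). The folder names `project` and `data` are hypothetical.

```bash
# Build a small throwaway folder tree so nothing real is changed
# (mkdir, covered later in this section, creates directories)
workdir=$(mktemp -d)
mkdir "$workdir/project" "$workdir/project/data"

cd "$workdir/project/data"
pwd          # ends in /project/data

cd ..        # ".." is the parent directory: move up one level
pwd          # ends in /project

cd ./data    # "." is the current directory, so ./data is the "data" folder inside it
pwd          # ends in /project/data again

cd /         # "/" is the root directory, the top of the file system
pwd          # /
```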
### **List content in a directory**
`ls` - list contents (files and folders). With nothing after `ls`, it lists what's in the current directory; you can also give it the path of another directory to list that instead.
`ls -lht` - list contents with added options: `-l` shows more details (long format), `-h` shows human-readable file sizes, and `-t` sorts by modification time, newest first.
```{bash, eval=FALSE, engine="sh"}
ls
ls -lht
ls -lht /varidata/researchtemp/hpctmp/BBC_workshop_June2023_I
```
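The `-t` option is easiest to see when files have clearly different modification times. This sketch assumes a throwaway folder and two hypothetical empty files; `touch -t` (a standard option, with the timestamp written as `YYYYMMDDhhmm`) sets each file's modification time so the sort order is predictable.

```bash
# Make a throwaway folder to experiment in
workdir=$(mktemp -d)
cd "$workdir"

# Create two empty files and set their modification times explicitly
touch -t 202401010900 first.txt    # 09:00 on 1 Jan 2024
touch -t 202406011200 second.txt   # 12:00 on 1 Jun 2024

ls          # alphabetical order: first.txt, then second.txt
ls -t       # newest first: second.txt, then first.txt
ls -lht     # long listing, human-readable sizes, newest first
```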
### **Navigate to home directory**
Without anything after `cd`, you will change to your home directory, which is `/home/username` (same as `~`).
```{bash, eval=FALSE, engine="sh"}
cd
# check: where are you now?
pwd
```
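As a quick check that `cd` on its own and `cd ~` lead to the same place, this minimal sketch compares both with the `HOME` environment variable, which a normal login shell sets to your home directory. `/tmp` is used only as a convenient "somewhere else" and is assumed to exist, as it does on typical Linux systems.

```bash
cd /tmp        # start somewhere other than home
cd             # no argument: go to the home directory
pwd            # same path as the echo below

cd /tmp
cd ~           # "~" expands to the same home directory
pwd

echo "$HOME"   # the home directory, for comparison
```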
### **Create a directory**
`mkdir dir_name` - creates a directory called "dir_name" in your current directory.
You will create the new directory in your home directory (because we just navigated to it).
Make sure you are in your home directory first (use `pwd`), then create a directory called "hpc_workshop_2024".
```{bash, eval=FALSE, engine="sh"}
pwd
mkdir hpc_workshop_2024
```
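A minimal sketch of `mkdir` in a throwaway folder (created with `mktemp -d`, so your real home directory is untouched) also shows what happens if you run the same command twice:

```bash
# Work in a throwaway folder instead of your real home directory
workdir=$(mktemp -d)
cd "$workdir"

mkdir hpc_workshop_2024   # create the directory
ls                        # hpc_workshop_2024

# mkdir refuses to create a directory that already exists and prints an error;
# "||" runs the echo only when mkdir fails
mkdir hpc_workshop_2024 || echo "hpc_workshop_2024 already exists"
```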
### **Copy a file**
`cp` - copy one or more files. The last argument is the destination (a new file name or an existing directory).
Copy `metadata.tsv` from the practice directory (`/varidata/researchtemp/hpctmp/BBC_workshop_June2023_I`) to the directory you just created.
```{bash, eval=FALSE, engine="sh"}
pwd
cp /varidata/researchtemp/hpctmp/BBC_workshop_June2023_I/metadata.tsv hpc_workshop_2024
ls hpc_workshop_2024
```
## **View and manipulate files**
### **Display content**
`cat` - display the entire contents of a file.
`head` - display the *first* 10 lines (rows) of a file.
`tail` - display the *last* 10 lines (rows) of a file.
With either `head` or `tail`, use `-n` to choose how many lines to show (the default is 10).
```{bash, eval=TRUE, engine="sh"}
cd ~/hpc_workshop_2024
cat metadata.tsv
head metadata.tsv
tail metadata.tsv
head -n 2 metadata.tsv
```
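To make the default of 10 lines concrete, this sketch generates a hypothetical 15-line file with `seq` (a standard utility that prints one number per line) and compares the commands:

```bash
workdir=$(mktemp -d)
cd "$workdir"

seq 1 15 > numbers.txt   # lines "1" through "15"

head numbers.txt         # 1 to 10 (the default 10 lines)
tail numbers.txt         # 6 to 15
head -n 2 numbers.txt    # 1 and 2
tail -n 3 numbers.txt    # 13, 14, 15
```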
The `cat` command can also be used to combine files; its name comes from con*cat*enate.
Note that we won't run the command below, but keep this in mind for future reference.
```{bash, eval=FALSE, engine="sh"}
cat file1.txt file2.txt > combined.txt
```
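Here is a small, self-contained version of that command, using two hypothetical files created with `printf` in a throwaway folder. The files are joined in the order they are listed:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Two tiny stand-in files (printf writes the text, "\n" starts a new line)
printf 'line A1\nline A2\n' > file1.txt
printf 'line B1\n' > file2.txt

cat file1.txt file2.txt > combined.txt   # file1's lines first, then file2's
cat combined.txt                         # line A1, line A2, line B1
```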
### **Pattern search**
`grep` - search for a pattern and print every line that contains it.
```{bash, eval=TRUE, engine="sh"}
grep "13" metadata.tsv
```
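Because `grep` matches the pattern anywhere in a line, `"13"` also matches values such as `113`. This sketch uses a small hypothetical tab-separated file (not the workshop's `metadata.tsv`) to show the effect:

```bash
workdir=$(mktemp -d)
cd "$workdir"

# A tiny stand-in table: a header line plus three samples
printf 'sample\tcount\ns1\t13\ns2\t113\ns3\t31\n' > example.tsv

grep "13" example.tsv   # prints the s1 and s2 lines ("13" appears inside "113"), not s3
```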
### **Display the number of words, lines, and characters**
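`wc` - word count. By default it prints the number of lines, words, and bytes in a file (for plain ASCII text, bytes and characters are the same), followed by the file name. `-l` prints only the line count.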
```{bash, eval=TRUE, engine="sh"}
wc metadata.tsv
wc -l metadata.tsv
```
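To check what each number means, this sketch counts a tiny hypothetical file whose contents are known in advance (2 lines, 3 words, 14 bytes). `-w` (words) and `-c` (bytes) are standard `wc` options alongside `-l`:

```bash
workdir=$(mktemp -d)
cd "$workdir"

printf 'one two\nthree\n' > tiny.txt   # 2 lines, 3 words, 14 bytes

wc tiny.txt      # 2 (lines), 3 (words), 14 (bytes), then the file name
wc -l tiny.txt   # 2, then the file name
wc -w tiny.txt   # 3, then the file name
wc -c tiny.txt   # 14, then the file name
```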
### **Pipe - redirection**
A pipe is a form of redirection (instead of printing output to the screen,
it sends it to other destinations). You can send output from one command/program
to another for further processing, such as `command 1 | command 2 | command 3`.
The output from command 1 is used as input for command 2, and the output from command 2 is used as input for command 3.
In the example below, we will chain 3 commands to count the number of lines in the file that contain the number "13".
```{bash, eval=TRUE, engine="sh"}
cat metadata.tsv | grep "13" | wc -l
```
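The same count can be reached in more than one way. In this sketch (hypothetical file, throwaway folder), all three commands report 2 matching lines: `grep` can read a file directly, so `cat` at the start of the pipe is optional, and `grep -c` counts matching lines on its own.

```bash
workdir=$(mktemp -d)
cd "$workdir"
printf 'id\tvalue\nA\t13\nB\t7\nC\t130\n' > example.tsv

cat example.tsv | grep "13" | wc -l   # 2: cat feeds the file into the pipe
grep "13" example.tsv | wc -l         # 2: grep reads the file directly
grep -c "13" example.tsv              # 2: -c makes grep count matching lines itself
```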
### **Output redirection**
Instead of printing output to your screen (typical command output) or
another command (pipe), you can redirect the output to a file.
`>` - redirect output to a file. Note that it will **overwrite** the file if it already exists -- be careful!
`>>` - append output to a file. It will add the output to the end of the file.
```{bash, eval=TRUE, engine="sh"}
# note that no output will be displayed on the screen -- it's saved in the file instead
grep "13" metadata.tsv > lines_with_13.tsv
# check: do we see the file we just created?
ls
# display the content of the file
cat lines_with_13.tsv
```
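The difference between `>` and `>>` is easiest to see on a file you can inspect after each step. A minimal sketch, using a hypothetical `log.txt` in a throwaway folder:

```bash
workdir=$(mktemp -d)
cd "$workdir"

echo "first run" > log.txt     # creates log.txt
echo "second run" > log.txt    # ">" overwrites: only "second run" remains
cat log.txt

echo "third run" >> log.txt    # ">>" appends to the end of the file
cat log.txt                    # second run, then third run
```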
## **Exercise**
Below, we'll be doing something similar to the exercises above, but with real genomic data!
We'll be using FASTQ files (a text format that stores raw sequencing reads together with their per-base quality scores).
We will:
1. copy fastq files from the practice directory to the folder you created
2. combine the two fastq files into one
3. count the number of reads in the combined file
See if you can do each step on your own -- if you get stuck, don't worry!
Try to remember the commands we've learned so far, and you can always refer back to the examples above.
To check your work, you can view the commands below.
### Step 1: Copy the files
Once you're in your `hpc_workshop_2024` directory, copy the files `test_01_R1.fq` and `test_54_R1.fq`
from the practice directory (`/varidata/researchtemp/hpctmp/BBC_workshop_Oct2024_II`) into your folder.
<details>
<summary>Click here to see a solution</summary>
```{bash, eval=FALSE, engine="sh"}
cd ~/hpc_workshop_2024
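# the "." at the end means "copy into the current directory"
cp /varidata/researchtemp/hpctmp/BBC_workshop_Oct2024_II/test_01_R1.fq .
cp /varidata/researchtemp/hpctmp/BBC_workshop_Oct2024_II/test_54_R1.fq .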
```
</details>
### Step 2: Combine the files
Combine the two fastq files into one file called `combined.fq`.
<details>
<summary>Click here to see a solution</summary>
```{bash, eval=TRUE, engine="sh"}
cat test_01_R1.fq test_54_R1.fq > combined.fq
ls
```
</details>
### Step 3: Count the number of reads
Count the number of reads in the combined file.
<details>
<summary>Hint</summary>
In a FASTQ file, each read starts with a header line that begins with the "@" symbol.
</details>
<details>
<summary>Hint 2</summary>
See if you can combine a few commands together!
</details>
<details>
<summary>Click here to see a solution</summary>
```{bash, eval=TRUE, engine="sh"}
# use the combined file; "^@" matches "@" only at the start of a line
grep "^@" combined.fq | wc -l
# a different way to do the same thing
cat combined.fq | grep "^@" | wc -l
# or, skip the combined file and chain all the commands together
cat test_01_R1.fq test_54_R1.fq | grep "^@" | wc -l
```
</details>
Can you explain why your solution works?
Congratulations! You've successfully completed the exercise.
**BONUS:** If you have extra time, use the skills you've learned to explore the fastq file format further:
1. Count the total number of lines in the combined file.
2. Using your answer from the exercise above (the number of reads), how is the number of reads related to the total number of lines in the file?
3. What do you notice about the structure of the file? Can you find any patterns? What do you think each line represents?
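<details>
<summary>Click here to see a solution sketch</summary>

A minimal sketch on a tiny, hand-made FASTQ file rather than the workshop data (the file name `tiny.fq` and its two reads are hypothetical). Each read takes 4 lines: a header starting with "@", the sequence, a "+" separator, and a quality line of the same length as the sequence. Because "@" is also a valid quality character, a quality line can start with "@" too, so dividing the total line count by 4 is a more reliable read count than searching for "@".

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical reads; the second read's quality line happens to start with "@"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > tiny.fq

wc -l < tiny.fq                       # 8 lines = 2 reads x 4 lines
echo $(( $(wc -l < tiny.fq) / 4 ))    # 2 reads

grep "@" tiny.fq | wc -l              # 3: the quality line "@III" also contains "@"
grep "^@" tiny.fq | wc -l             # also 3, because "@III" starts with "@"
```

</details>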
## **Summary**
In this section, we've covered the basic commands for navigating directories, viewing and manipulating files, and using pipes and output redirection.
You've learned how to:
- Navigate to a directory
- List content in a directory
- Create a directory
- Copy a file
- Display content
- Search for patterns
- Display the number of words, lines, and characters
- Use pipes for redirection
- Use output redirection
These are the fundamental commands you'll need to work with files and directories on the HPC.
In the next section, we'll work through a real bioinformatics "mini-project" that builds on these commands, so you can see how they can be used together to solve a problem.
[Next: Bioinformatics Mini Project](02_bioinfx_example.Rmd)