18 changes: 18 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,18 @@
# 🚀 Pull Request

## Description
Short summary of the changes and related issue(s).

## Type of change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update

## How Has This Been Tested?
Explain how you tested your changes. Screenshots if needed.

## Checklist
- [ ] Code follows project style
- [ ] Self-reviewed my code
- [ ] Documentation updated if needed
- [ ] No secrets or temp files committed
44 changes: 44 additions & 0 deletions .github/workflows/github-actions-demo.yml
@@ -0,0 +1,44 @@
name: GitHub Actions Demo
run-name: ${{ github.actor }} is testing out GitHub Actions 🚀

on:
  push:
  workflow_dispatch: {}

jobs:
  Explore-GitHub-Actions:
    runs-on: ubuntu-latest
    steps:
      - run: echo "🎉 The job was automatically triggered by a ${{ github.event_name }} event."
      - run: echo "🐧 This job is now running on a ${{ runner.os }} server hosted by GitHub!"
      - run: echo "🔎 The name of your branch is ${{ github.ref }} and your repository is ${{ github.repository }}."
      - name: Check out repository code
        uses: actions/checkout@v5
      - run: echo "💡 The ${{ github.repository }} repository has been cloned to the runner."
      - run: echo "🖥️ The workflow is now ready to test your code on the runner."
      - name: List files in the repository
        run: |
          ls ${{ github.workspace }}
      - run: echo "🍏 This job's status is ${{ job.status }}."

  Gather-System-Info:
    runs-on: ubuntu-latest
    needs: Explore-GitHub-Actions
    steps:
      - name: Gather Runner System Info
        run: |
          echo "🖥️ OS and kernel info:"
          uname -a
          echo "💻 CPU cores:"
          nproc
          echo "🧠 Memory info:"
          free -h
          echo "💽 Disk space:"
          df -h
          echo "🔧 Detailed CPU info:"
          head -n 20 /proc/cpuinfo
          echo "🏷️ GitHub Runner Info:"
          echo "Runner OS: ${{ runner.os }}"
          echo "Runner Temp Dir: ${{ runner.temp }}"
          echo "Workflow Run ID: ${{ github.run_id }}"
          echo "Job ID: ${{ github.job }}"
Binary file added labs/imgs/alerts_settings.png
Binary file added labs/imgs/api_checks.png
Binary file added labs/imgs/assertion.png
Binary file added labs/imgs/dashboard.png
Binary file added labs/imgs/freq.png
Binary file added labs/imgs/successful_run.png
166 changes: 166 additions & 0 deletions labs/submission8.md
@@ -0,0 +1,166 @@
# Task 1 — System metrics

Commands were run on `big-data-vm` (8 vCPUs, ~31 GB RAM), all in the same shell session shown in the log below.

```
chrnegor@big-data-vm:~$ htop

1 [ 0.0%] 5 [ 0.0%]
2 [ 0.0%] 6 [ 0.0%]
3 [ 0.0%] 7 [||||||||||||||||||||||||100.0%]
4 [ 0.0%] 8 [ 0.0%]
Mem[|||||||||||||||||||18.3G/31.3G] Tasks: 45, 374 thr; 1 running
Swp[ 0K/0K] Load average: 0.07 0.04 0.01
Uptime: 1 day, 01:26:15

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
1 root 20 0 165M 12916 8520 S 0.0 0.0 0:04.44 /sbin/init
406 root 19 -1 60336 24728 23592 S 0.0 0.1 0:02.10 /lib/systemd/sy
439 root 20 0 20316 6020 4020 S 0.0 0.0 0:00.90 /lib/systemd/sy
531 root RT 0 209M 17948 8208 S 0.0 0.1 0:00.53 /sbin/multipath
532 root RT 0 209M 17948 8208 S 0.0 0.0 0:00.00 /sbin/multipath
533 root RT 0 209M 17948 8208 S 0.0 0.1 0:00.08 /sbin/multipath
534 root RT 0 209M 17948 8208 S 0.0 0.1 0:03.84 /sbin/multipath
535 root RT 0 209M 17948 8208 S 0.0 0.1 0:00.00 /sbin/multipath
536 root RT 0 209M 17948 8208 S 0.0 0.1 0:00.00 /sbin/multipath
530 root RT 0 209M 17948 8208 S 0.0 0.1 0:07.91 /sbin/multipath
565 systemd-t 20 0 90888 6036 5260 S 0.0 0.0 0:00.00 /lib/systemd/sy
563 systemd-t 20 0 90888 6036 5260 S 0.0 0.0 0:00.16 /lib/systemd/sy
618 systemd-r 20 0 25616 13988 8968 S 0.0 0.0 0:08.34 /lib/systemd/sy
F1Help F2Setup F3SearchF4FilterF5Tree F6SortByF7Nice -F8Nice +F9Kill F10Quit


chrnegor@big-data-vm:~$ iostat -x 1 5
Linux 5.4.0-216-generic (big-data-vm) 03/24/26 _x86_64_ (8 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
0.29 0.00 0.39 0.05 0.01 99.27

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz aqu-sz %util
vda 0.17 8.93 0.04 21.01 12.55 53.30 0.74 15.12 0.49 39.95 10.75 20.39 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.19
vdb 0.54 24.29 0.08 13.35 1.59 44.59 1.24 50.05 1.05 45.87 3.37 40.46 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.33

chrnegor@big-data-vm:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 16G 0 16G 0% /dev
tmpfs 3.2G 1.3M 3.2G 1% /run
/dev/vda1 19G 7.4G 12G 40% /
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda15 599M 6.1M 593M 2% /boot/efi
/dev/vdb1 194G 14G 181G 7% /mnt/data
tmpfs 3.2G 0 3.2G 0% /run/user/1000

chrnegor@big-data-vm:~$ sudo du -h /var | sort -rh | head -n 10

12G /var
11G /var/lib/docker/overlay2
11G /var/lib/docker
11G /var/lib
5.1G /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85
3.9G /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85/merged/usr
3.9G /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85/merged
2.9G /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85/merged/usr/local
2.5G /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85/merged/usr/local/hadoop
1.3G /var/lib/docker/overlay2/8cfbf33d8510e9065e24ff6bc42c0379460798bb34e66eeef81af425e0e5e1c1/diff/usr/local/hadoop
chrnegor@big-data-vm:~$ sudo find /var -type f -exec du -h {} + | sort -rh | head -n 3
383M /var/lib/docker/overlay2/c1ee043002554aa24a18ded845596c1b859f49677ae44df395bdabc99e776ac7/diff/spark.tar.gz
184M /var/lib/docker/overlay2/8cfbf33d8510e9065e24ff6bc42c0379460798bb34e66eeef81af425e0e5e1c1/diff/usr/local/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.901.jar
184M /var/lib/docker/overlay2/6495a703f913dbec5887ded2fdcbe8c679c6b5cd71e4ea472c916ee9d64a5f85/merged/usr/local/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-1.11.901.jar
chrnegor@big-data-vm:~$
```

What stood out from the tools:

- CPU: the load average was low and almost every visible process in htop sat at 0% CPU. One core briefly showed 100% on the bar graph, but the process list did not reveal the culprit, which is typical for an idle VM with a short spike.

- Memory: ~18 GB in use out of 31 GB. Among the visible processes, the largest RSS values belonged to systemd (PID 406), several multipathd threads (~17 MB each), and systemd-resolved (PID 618).

- I/O: `iostat` is per disk, not per process. `vdb` had slightly higher read/write throughput than `vda` in the first sample; after that both disks were basically quiet with near-zero `%util`.
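
Since `iostat` reports per device rather than per process, it helps to map device names to mount points before reading its output. A quick sketch using GNU `df` (the `tmpfs` filter is my own choice to cut noise):

```shell
# Show which mount each device backs, so iostat's vda/vdb rows can be read
# as "root disk" vs "/mnt/data" (matching this VM's df output above).
df -h --output=source,size,target | grep -v tmpfs
```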

Largest files under `/var` from `find`: `spark.tar.gz` (~383 MB) and two ~184 MB `aws-java-sdk-bundle` jars under Docker overlay paths. Most of `/var` is Docker image layers (Hadoop/Spark related).

## Analysis

The machine wasn’t under stress: high `%idle`, low load, no swap. Disk activity was minimal in the window I captured. `/var` growth is almost entirely `overlay2`, which matches a host that runs big-data containers.

## Reflection

If `/var` keeps growing I’d prune unused Docker data (`docker system prune` and similar) and delete old images I don’t need. For a deeper CPU investigation next time I’d sort htop by CPU % or run `pidstat` while something is actually busy. For I/O troubleshooting I’d watch `%iowait` and map `vda` / `vdb` to mounts (root vs `/mnt/data` here).
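
To catch that growth early rather than notice it by hand, the `df` check itself could be scripted. A minimal sketch (the 80% threshold is my own arbitrary choice, not a value from the lab):

```shell
# Warn when any filesystem crosses a usage threshold.
THRESHOLD=80
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                       # "40%" -> "40"
    if (use + 0 >= t)
        printf "WARN: %s at %s%% (%s)\n", $6, use, $1
}'
echo "disk check done"
```

Dropped into cron, this would flag `/` or `/mnt/data` before Docker layers fill them up.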

# Task 2

I used Checkly with two targets: an API check against `https://huggingface.co` (simple availability) and a separate browser check against Aviasales that verifies actual page content.

## API check (Hugging Face)

For the API check I pointed the request at the Hugging Face homepage and added an assertion that the response status code equals 200.

![Assertion: status code 200](imgs/assertion.png)
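
The same pass/fail logic can be reproduced locally with `curl` for debugging. A small sketch (the `check_status` helper is my own, not part of Checkly):

```shell
# Treat only HTTP 200 as healthy, mirroring the Checkly assertion.
check_status() {
    # $1 = observed status code
    if [ "$1" = "200" ]; then echo "PASS"; else echo "FAIL ($1)"; fi
}

# Against the live site this would be:
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://huggingface.co)
#   check_status "$code"
check_status 200   # PASS
check_status 503   # FAIL (503)
```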

I set the run interval to every 10 seconds.

![Check frequency 10s](imgs/freq.png)

On the checks list / request view everything looked healthy after the runs.

![API checks dashboard](imgs/api_checks.png)

## Browser check (Aviasales)

For the browser check I used a Playwright script against `https://www.aviasales.ru/?params=MOW1`. Besides checking that the HTTP response is OK, it waits for the main hero heading “Тут покупают дешёвые авиабилеты” (“cheap flight tickets are bought here”), so we know the real page content rendered rather than a blank or error page served with a 200.

```javascript
/**
 * Checkly browser check — Aviasales
 * URL: https://www.aviasales.ru/?params=MOW1
 * Assert: visible h1 "Тут покупают дешёвые авиабилеты"
 */
const { expect, test } = require('@playwright/test')

test.setTimeout(210000)
test.use({ actionTimeout: 10000 })

test('Aviasales: hero heading visible', async ({ page }) => {
  const url =
    process.env.ENVIRONMENT_URL ||
    'https://www.aviasales.ru/?params=MOW1'

  const response = await page.goto(url, { waitUntil: 'domcontentloaded' })
  expect(response.status(), 'HTTP status should be success').toBeLessThan(400)

  const heading = page.getByRole('heading', {
    name: 'Тут покупают дешёвые авиабилеты',
    level: 1,
  })
  await expect(heading, 'Hero heading visible').toBeVisible()
  await page.screenshot({ path: 'screenshot.jpg' })
})
```

The run completed successfully (screenshot below).

![Successful browser check run](imgs/successful_run.png)

## Alerts

I configured email alerts so I get a notification when something breaks instead of having to open Checkly every time.

![Alert settings (email)](imgs/alerts_settings.png)

## Dashboard

Overall dashboard view with the checks and recent status:

![Checkly dashboard](imgs/dashboard.png)

## Analysis

Hugging Face was a good fit for the API check: the main URL is stable, returns 200 when the service is up, and 10s frequency is a reasonable balance between fast feedback and load.

Aviasales is a different kind of check: the same status code could still hide a broken UI, so asserting on the visible heading adds a layer that is closer to what a user sees.

## Reflection

Having both an API check and a browser check, plus email alerts, means I get early warning both on outright outages and on “looks broken but HTTP is fine” cases. That is essentially what SRE-style monitoring is about: knowing when users are impacted and reacting before they complain.