The code for the SQL, Python, and data model sections is written using Spark SQL. To run the code, you will need the prerequisites listed below.
Prerequisites
Windows users: please set up WSL and a local Ubuntu virtual machine by following the instructions here.
Install the above prerequisites in your Ubuntu terminal; if you have trouble installing Docker, follow the steps here (only Step 1 is necessary).
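Before moving on, it can help to confirm the required tools are actually on your PATH. A minimal sketch using only the Python standard library — the tool names below are the ones this setup relies on; adjust the list if yours differs:

```python
import shutil

def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# Tools this setup relies on; adjust if your list differs.
required = ["git", "docker"]
for tool in missing_tools(required):
    print(f"Missing: {tool} -- install it before continuing")
```

If the script prints nothing, everything on the list was found.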
Fork the data_engineering_for_beginners_code repository.
After forking, clone the repo to your local machine and start the containers as shown below:
# Replace your-user-name with your GitHub username
git clone https://github.com/your-user-name/data_engineering_for_beginners_code.git
cd data_engineering_for_beginners_code
docker compose up -d --build
sleep 30
Open Jupyter Lab at http://localhost:8888 and run the code in ./notebooks/starter-notebook.ipynb to create the data and check that your setup worked.
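The fixed `sleep 30` is usually enough, but on slower machines the containers may need longer to start. One alternative is to poll the Jupyter endpoint until it answers, as in this sketch using only the Python standard library (the URL is the one from the step above):

```python
import time
import urllib.error
import urllib.request

def wait_until(check, timeout=120, interval=2):
    """Call `check()` every `interval` seconds until it returns True
    or `timeout` seconds elapse; return True on success."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

def jupyter_is_up(url="http://localhost:8888"):
    """True if the Jupyter Lab endpoint returns any HTTP response."""
    try:
        urllib.request.urlopen(url, timeout=5)
        return True
    except urllib.error.HTTPError:
        return True   # the server responded, even if with an error status
    except OSError:
        return False  # connection refused: not up yet

# Usage against the live stack (uncomment once the containers are up):
# print("ready" if wait_until(jupyter_is_up) else "timed out")
```

`wait_until` is deliberately generic, so the same helper can poll the Airflow UI in the next step.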
After the data is created, open the Airflow UI at http://localhost:8080/, trigger the DAG, and ensure that it runs successfully.
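Airflow also exposes a stable REST API (POST /api/v1/dags/{dag_id}/dagRuns), so the DAG can be triggered from code instead of the UI. A minimal sketch using only the standard library — the credentials (airflow/airflow) and the DAG id are assumptions; check your docker-compose file and the Airflow UI for the real values:

```python
import base64
import json
import urllib.request

def build_trigger_request(dag_id, base_url="http://localhost:8080",
                          user="airflow", password="airflow"):
    """Build a POST request for Airflow's stable REST API endpoint
    that creates a new DAG run: POST /api/v1/dags/{dag_id}/dagRuns.
    The default credentials here are an assumption -- verify yours."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),  # empty run configuration
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

# Usage against the live stack (uncomment once the containers are up);
# "my_dag" is a placeholder -- use the DAG id shown in the Airflow UI:
# with urllib.request.urlopen(build_trigger_request("my_dag")) as resp:
#     print(resp.status)
```

A successful call returns the created DAG run as JSON; you can still watch its progress in the UI.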
After you are done, shut down the containers with:
# -v also removes the volumes, deleting the data created above
docker compose down -v