I recently started learning data engineering and have finished Week 1 of my course. These are the topics I’ve covered:
- Overview of Distributed Systems
- Hadoop Ecosystem Core Components (HDFS, MapReduce, YARN)
- Hadoop Architecture
- Hadoop Daemons
- Hadoop Cluster Architecture
- O/S vs U/S topic
- Data Engineering Flow
- HDFS Architecture: DataNode, NameNode, Replication Factor, Heartbeat, Secondary NameNode
- ETL and ELT
- Importance of Data Engineers – Why are they needed?