DataEngineeringUnboxed

Welcome to DataEngineeringUnboxed! This repository is a resource for data engineers, covering tools, topics, and best practices in data engineering. Our goal is to provide practical, hands-on guidance for both beginners and experienced professionals. For more information, see the accompanying blog articles.

Bonus tip: boost your productivity today by visiting DataUnboxed.

🚀 What's Inside

This repository covers various aspects of data engineering, including but not limited to:

  1. AWS Services: Glue, ECS, CloudFormation
  2. Development Environments: Jupyter, VSCode
  3. Data Formats: Parquet
  4. Machine Learning: Project setup and best practices
  5. Infrastructure as Code: CloudFormation templates
  6. Local Development: Running cloud services locally

📚 Repository Structure

Our repository is organized into the following main directories:

  • /aws: AWS-specific guides and resources
  • /MLKickstart_repo: Machine Learning project template and best practices
  • /parquet: Tutorials on Parquet file format
  • /llm_structured_outputs: Examples of structured outputs from language models

🛠 Getting Started

  1. Clone this repository
  2. Explore the directories that interest you most
  3. Follow the README files in each subdirectory for specific instructions

📖 Key Tutorials

AWS Glue Local Development

AWS ECS and Infrastructure

  • Check out the CloudFormation sample templates in the aws/ecs_global infrastructure/ directory for ECS cluster and service setups.

Parquet Optimization

  • Explore the parquet/parquet_encoding_secrets.ipynb notebook for insights on Parquet file optimization. For more information, see the accompanying blog article.

Machine Learning Project Setup

  • The MLKickstart_repo/ directory contains a template for setting up machine learning projects with best practices.

LLM and Gen AI

  • The llm_structured_outputs/ directory contains a template for making the best use of LLMs. For more details, see the accompanying blog article.

🤝 Contributing

We welcome contributions from the community! If you have knowledge to share:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/new-content)
  3. Add your content or make your changes
  4. Commit your changes (git commit -am 'Add some new content')
  5. Push to the branch (git push origin feature/new-content)
  6. Open a Pull Request
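
The branch-and-commit steps above can be rehearsed in a throwaway repository before touching your fork. The sketch below is only an illustration: the scratch directory, the demo-repo name, the new_content.md file, and the placeholder author identity are all hypothetical. It also shows one detail worth knowing: git commit -am only picks up changes to files Git already tracks, so a brand-new file has to be staged with git add first.

```shell
set -eu

# Work in a scratch directory so nothing in a real clone is touched (hypothetical setup).
workdir="$(mktemp -d)"
cd "$workdir"
git init -q demo-repo
cd demo-repo

# Step 2: create the feature branch.
git checkout -q -b feature/new-content

# Step 3: add new content (new_content.md is a placeholder file name).
echo "# New tutorial" > new_content.md

# Step 4: a new file is untracked, so stage it explicitly before committing;
# `git commit -am` alone would skip it.
git add new_content.md
git -c user.name="Example Contributor" -c user.email="contributor@example.com" \
    commit -q -m 'Add some new content'

# Inspect the result: the current branch and the files recorded in the commit.
current_branch="$(git rev-parse --abbrev-ref HEAD)"
committed_files="$(git ls-files)"
echo "branch: $current_branch"
echo "files:  $committed_files"
```

In a real fork, the same sequence is followed by git push origin feature/new-content and opening the Pull Request, as described in steps 5 and 6.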

Please ensure your contributions align with our Contribution Guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

  • All the amazing contributors to this project
  • The broader data engineering community for continuous inspiration

Happy Data Engineering!

DataEngineeringUnboxed Team
