DataEngineeringUnboxed

Welcome to DataEngineeringUnboxed! This repository is a resource for data engineers, covering topics, tools, and best practices in data engineering. Our goal is to provide practical, hands-on guidance for both beginners and experienced professionals. For more info, you can visit my blog articles here.

Bonus tip: to boost your productivity today, visit DataUnboxed.

🚀 What's Inside

This repository covers various aspects of data engineering, including but not limited to:

AWS Services: Glue, ECS, CloudFormation
Development Environments: Jupyter, VSCode
Data Formats: Parquet
Machine Learning: Project setup and best practices
Infrastructure as Code: CloudFormation templates
Local Development: Running cloud services locally

📚 Repository Structure

Our repository is organized into the following main directories:

/aws: AWS-specific guides and resources
/MLKickstart_repo: Machine Learning project template and best practices
/parquet: Tutorials on Parquet file format
/llm_structured_outputs: Examples of structured outputs from language models

🛠 Getting Started

Clone this repository
Explore the directories that interest you most
Follow the README files in each subdirectory for specific instructions
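
As a rough sketch of the last step, the snippet below lists every README under a local clone so you can see which guides exist. The function name find_readmes is hypothetical and not part of this repository; the match is case-insensitive because README file names vary in capitalization.

```python
from pathlib import Path


def find_readmes(root="."):
    """Return README files found in root and its subdirectories, sorted by path."""
    root_path = Path(root)
    return sorted(
        p for p in root_path.rglob("*")
        if p.is_file() and p.stem.lower() == "readme"
    )


if __name__ == "__main__":
    # Run from the top of your local clone; prints one README path per line.
    for readme in find_readmes("."):
        print(readme)
```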

📖 Key Tutorials

AWS Glue Local Development

AWS ECS and Infrastructure

Check out the CloudFormation sample templates in the aws/ecs_global infrastructure/ directory for ECS cluster and service setups.
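
For orientation, here is a minimal sketch of what a CloudFormation template for an ECS cluster plus a Fargate task definition can look like, built as a Python dictionary and printed as JSON. It is not one of the sample templates in this repository: the function name, logical IDs, cluster name, and container image are all placeholder assumptions, and a service resource is left out because it also needs networking (subnets, security groups).

```python
import json


def build_ecs_template(cluster_name, image, container_port=80):
    """Return a minimal CloudFormation template as a plain dict.

    All names and values are illustrative placeholders (assumptions),
    not taken from the repository's sample templates.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal ECS cluster and Fargate task definition (sketch)",
        "Resources": {
            # Logical ID "Cluster" is an arbitrary choice for this sketch.
            "Cluster": {
                "Type": "AWS::ECS::Cluster",
                "Properties": {"ClusterName": cluster_name},
            },
            "TaskDefinition": {
                "Type": "AWS::ECS::TaskDefinition",
                "Properties": {
                    "Family": f"{cluster_name}-task",
                    "RequiresCompatibilities": ["FARGATE"],
                    "NetworkMode": "awsvpc",
                    "Cpu": "256",
                    "Memory": "512",
                    "ContainerDefinitions": [
                        {
                            "Name": "app",
                            "Image": image,
                            "PortMappings": [{"ContainerPort": container_port}],
                        }
                    ],
                },
            },
        },
        "Outputs": {
            # Ref on an ECS cluster resolves to the cluster name.
            "ClusterName": {"Value": {"Ref": "Cluster"}},
        },
    }


template = build_ecs_template("demo-cluster", "nginx:latest")
print(json.dumps(template, indent=2))
```

Saving the printed JSON to a file gives you a template to inspect and validate with your usual CloudFormation tooling before deploying anything.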

Parquet Optimization

Explore the parquet/parquet_encoding_secrets.ipynb notebook for insights on Parquet file optimization. For more info, you can visit my blog article here.
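
To give a feel for why encodings matter, here is a small pure-Python sketch of two ideas Parquet relies on, dictionary encoding and run-length encoding. It is an illustration only, not the contents of the notebook and not how a Parquet writer is implemented internally; the function names and the sample column are made up for the example.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer index, in first-seen order."""
    dictionary = []
    index_of = {}
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(indices)]


def decode(dictionary, runs):
    """Reverse both steps to recover the original column."""
    return [dictionary[index] for index, length in runs for _ in range(length)]


# Hypothetical low-cardinality column, e.g. a country code.
column = ["DE", "DE", "DE", "FR", "FR", "DE", "US", "US", "US", "US"]

dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

print(dictionary)  # ['DE', 'FR', 'US']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
assert decode(dictionary, runs) == column
```

Ten strings shrink to a three-entry dictionary and four runs. Sorting the column first would merge the two DE runs, which is why row ordering can affect how well a low-cardinality column compresses.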

Machine Learning Project Setup

The MLKickstart_repo/ directory contains a template for setting up machine learning projects with best practices.

LLM and Gen AI

The llm_structured_outputs/ directory contains a template for making the best use of LLMs, with examples of structured outputs. For more details, you can visit my blog article here.
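
One common way to work with structured outputs is to ask the model for JSON and validate it before using it. The sketch below shows that validation step with the standard library only. It is an assumption about the general pattern, not the template in this directory: the Ticket schema, its fields, and parse_ticket are hypothetical, and the raw string stands in for a model response.

```python
import json
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical target schema for a model response."""
    title: str
    priority: int
    tags: list


def parse_ticket(raw):
    """Parse a JSON string into a Ticket, raising ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    title = data.get("title")
    priority = data.get("priority")
    tags = data.get("tags")

    if not isinstance(title, str):
        raise ValueError("title must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(priority, int) or isinstance(priority, bool):
        raise ValueError("priority must be an integer")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")

    return Ticket(title=title, priority=priority, tags=tags)


# Stand-in for text returned by a language model.
raw_response = '{"title": "Fix nightly job", "priority": 2, "tags": ["glue", "etl"]}'
print(parse_ticket(raw_response))
```

Failing loudly on a malformed response lets the calling code retry or fall back instead of passing bad data downstream.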

🤝 Contributing

We welcome contributions from the community! If you have knowledge to share:

Fork the repository
Create a new branch (git checkout -b feature/new-content)
Add your content or make your changes
Commit your changes (git commit -am 'Add some new content')
Push to the branch (git push origin feature/new-content)
Open a Pull Request

Please ensure your contributions align with our Contribution Guidelines.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgements

All the amazing contributors to this project
The broader data engineering community for continuous inspiration

Happy Data Engineering!
DataEngineeringUnboxed Team
