Releases: seriallink/datamaster
Initial Release
Initial public release of the Data Master project, delivered as part of the Santander Data Master certification program.
This version includes a fully functional, end-to-end, event-driven and serverless Data Lake implementation following the Medallion Architecture (raw, bronze, silver, gold).
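To make the layer ordering concrete, here is a minimal Go sketch of how the four Medallion layers could be modeled as an ordered type, where data is promoted one step at a time from raw to gold. The `Layer` type and its `Next` method are hypothetical illustrations, not code taken from the repository.

```go
package main

import "fmt"

// Layer identifies one zone of the Medallion Architecture.
// Hypothetical type for illustration; not the project's own definition.
type Layer int

const (
	Raw Layer = iota
	Bronze
	Silver
	Gold
)

var layerNames = [...]string{"raw", "bronze", "silver", "gold"}

// String returns the lowercase layer name, or a fallback for invalid values.
func (l Layer) String() string {
	if l < Raw || l > Gold {
		return fmt.Sprintf("Layer(%d)", int(l))
	}
	return layerNames[l]
}

// Next returns the layer that data is promoted to after l,
// and false when l is already the final (gold) layer.
func (l Layer) Next() (Layer, bool) {
	if l >= Gold {
		return l, false
	}
	return l + 1, true
}

func main() {
	// Walk the promotion path from raw through gold.
	for l, ok := Raw, true; ok; l, ok = l.Next() {
		fmt.Println(l)
	}
}
```

Modeling the layers as an ordered enum keeps "promote to the next layer" a single, checkable operation and makes gold an explicit terminal state.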
Key features:
- Infrastructure as Code: Complete AWS stack provisioning via CloudFormation.
- Event-driven ingestion: Aurora PostgreSQL (CDC) → Kinesis → Firehose → S3 (raw layer).
- Data processing: Serverless pipelines to transform data across bronze, silver, and gold layers using Go, PySpark, and ECS.
- Governance & Security: IAM-based fine-grained access, Lake Formation integration, encryption at rest and in transit.
- Observability: Grafana dashboards with operational and analytical views.
- Benchmark module: Performance comparison of Go, AWS Glue jobs, and EMR Serverless.
- Fully reproducible: All code, templates, and artifacts needed to replicate the environment are included.