Releases: seriallink/datamaster

Initial Release

11 Aug 23:13

Initial public release of the Data Master project, delivered as part of the Santander Data Master certification program.

This version includes a fully functional, end-to-end Data Lake implementation that is event-driven, serverless, and organized according to the Medallion Architecture (raw, bronze, silver, gold).

Key features:

  • Infrastructure as Code: Complete AWS stack provisioning via CloudFormation.
  • Event-driven ingestion: Aurora PostgreSQL (CDC) → Kinesis → Firehose → S3 (raw layer).
  • Data processing: Serverless pipelines to transform data across bronze, silver, and gold layers using Go, PySpark, and ECS.
  • Governance & Security: Fine-grained IAM-based access control, Lake Formation integration, and encryption at rest and in transit.
  • Observability: Grafana dashboards with operational and analytical views.
  • Benchmark module: Performance comparison between Go, Glue Jobs, and EMR Serverless.
  • Fully reproducible: All code, templates, and artifacts included to replicate the environment.
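
To make the layer flow above more concrete, here is a minimal Python sketch of one step: replaying CDC events that have landed in the raw layer as newline-delimited JSON to get the latest row per key, which is roughly the kind of work a raw-to-bronze job might do. This is an illustration only, not the project's actual code. The event shape (`op`, `id`, `data`) and the helper names `next_layer` and `apply_cdc_events` are assumptions made for this example.

```python
import json
from typing import Dict, Iterable, Optional

# Layer order as named in the release notes.
LAYERS = ["raw", "bronze", "silver", "gold"]


def next_layer(layer: str) -> Optional[str]:
    """Return the layer that follows `layer`, or None for the last one."""
    idx = LAYERS.index(layer)  # raises ValueError for unknown layers
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None


def apply_cdc_events(lines: Iterable[str]) -> Dict[int, dict]:
    """Replay CDC events in order and keep the latest row per key.

    Each line is assumed (hypothetically) to be a JSON object with:
      op   -- "insert", "update", or "delete"
      id   -- primary key of the affected row
      data -- full row image (absent for deletes)
    """
    state: Dict[int, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        event = json.loads(line)
        op, key = event["op"], event["id"]
        if op in ("insert", "update"):
            state[key] = event["data"]
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return state


# Hypothetical raw-layer content: one JSON event per line.
RAW_LINES = [
    '{"op": "insert", "id": 1, "data": {"name": "ale", "abv": 5.0}}',
    '{"op": "insert", "id": 2, "data": {"name": "stout", "abv": 6.5}}',
    '{"op": "update", "id": 1, "data": {"name": "ale", "abv": 5.2}}',
    '{"op": "delete", "id": 2}',
]

if __name__ == "__main__":
    print(next_layer("raw"))
    print(apply_cdc_events(RAW_LINES))
```

Replaying events in arrival order keeps the sketch simple; a real pipeline would also need to handle out-of-order delivery and schema changes, which this example does not attempt.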