
fin_db: My Personal Financial Database

Technologies used: Python, PostgreSQL, Docker, Bash scripts, Linux

This project is a financial database (750+ instruments) hosted on my Raspberry Pi. I developed this to ensure data quality and availability for future projects (faster, more predictable queries). The system features:

  • Daily, self-refreshing market data ingestion for equities, ETFs, and currencies.
  • A containerised environment, allowing for easy portability and reproducibility.
  • Daily backups using pgBackRest.
  • Robust API ingestion: batching, retries, and validation.
  • A complete and scalable SQL schema (indices, views, internal IDs).
  • Support for multiple sources through source prioritisation.
  • A modular design allowing for future expansions.

Skills developed during this project

Data Engineering 🔗

This was the main skill I developed during this project. From what to store to how to store it, there were a lot of questions that needed answering. This was my first time creating a relational database from scratch, and I had a lot to learn. I opted to keep the following as my guiding principle:

Always store data in its rawest form possible.

For example, I do not store adjusted close, as it is a derived field. Instead, I retrieve raw closing prices (in their original currency) and total returns, which allow me to construct adjusted closes if needed. I also did my best to keep this project simple and efficient, which is why I limited the fields I ingest in favour of covering more instruments. I used LSEG (Refinitiv) to bootstrap my data, and I use Yahoo Finance (their recent data is okay, but older datapoints are not great) for daily ingest jobs.
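
To make that concrete, here is a minimal sketch of how raw closes plus daily total returns can be turned back into an adjusted close series. This is an illustration of the idea, not code from the repo; the pandas layout and the sample numbers are made up.

```python
import pandas as pd

def adjusted_close(close: pd.Series, daily_total_return: pd.Series) -> pd.Series:
    """Rebuild an adjusted close series from raw closes and daily total returns.

    Anchors on the most recent raw close and scales earlier prices by the
    cumulative total-return index, so dividends and splits are reflected.
    """
    close = close.sort_index()
    tr = daily_total_return.sort_index().reindex(close.index).fillna(0.0)
    growth = (1.0 + tr).cumprod()  # cumulative total-return index
    return close.iloc[-1] * growth / growth.iloc[-1]

# Tiny example with made-up numbers: a dividend lands on 2024-01-03,
# so the adjusted close on 2024-01-02 drops below the raw close.
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"])
close = pd.Series([100.0, 102.0, 101.0], index=dates)
tr = pd.Series([0.0, 0.03, -0.0098], index=dates)
print(adjusted_close(close, tr))
```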

For more information (including my schema), I invite you to look through /db (excluding /db/ops), where you'll find my migrations and other database files.

DevOps 🔗

I wanted the database to run in a Docker container so it could be easily moved and deployed to other machines. Docker lets me avoid environment issues and ensures I can run this project on a wide variety of systems. For example, the main database runs on a Raspberry Pi (Debian Linux), but I also run a test copy locally on my Mac. Standing up a new instance is simply a matter of pulling the git repo, downloading my Google Drive backup, and starting a new container. This portability also helps with reproducibility and facilitates testing!

The biggest challenge I faced in this respect was integrating pgBackRest (the backup system) into the setup. Running it alongside a dockerised Postgres is notoriously painful, and my solution was to build my own image. I also faced a lot of permission issues, and to avoid fixing them repeatedly, I set up simple Bash scripts.

In the end, it was worth it! My database backs up daily and syncs with Google Drive, allowing for easy restores if (when 😅) I make mistakes.

Operational Automation 🔗

To avoid ingesting data manually, I had to develop robust automations. These run frequently via cron jobs (a Linux feature for scheduling tasks). Accordingly, I do not check the outputs of these jobs regularly and instead rely on the system itself. My scripts catch potential failures so that they don't crash unexpectedly, and to protect against silent issues, the system reports problems and feedback to me via Telegram, which I can view on my phone.
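
A minimal sketch of that alerting pattern follows. The environment variable names, messages, and the ingest stub are hypothetical; only the sendMessage endpoint is the real Telegram Bot API.

```python
import os
import traceback

import requests

# Hypothetical variable names; set via the environment, never hard-coded.
TOKEN = os.environ["TELEGRAM_BOT_TOKEN"]
CHAT_ID = os.environ["TELEGRAM_CHAT_ID"]

def notify(text: str) -> None:
    """Push a short status message to my phone via the Telegram Bot API."""
    requests.post(
        f"https://api.telegram.org/bot{TOKEN}/sendMessage",
        json={"chat_id": CHAT_ID, "text": text},
        timeout=10,
    )

def run_daily_ingest() -> None:
    """Stand-in for the actual ingest job."""
    ...

if __name__ == "__main__":
    # Scheduled by cron, e.g.: 0 6 * * * python3 /path/to/daily_ingest.py
    try:
        run_daily_ingest()
        notify("fin_db: daily ingest OK")
    except Exception:
        # Never fail silently: report the traceback instead of just crashing.
        notify("fin_db: daily ingest FAILED\n" + traceback.format_exc()[:3500])
```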

API Integration 🔗

To facilitate work with outside sources, I developed wrappers around existing APIs and libraries. These ensure I don't hit usage/rate limits and provide simpler functions for obtaining data. Furthermore, they perform basic validation checks that help ensure consistent outputs.
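
A condensed sketch of the wrapper pattern (batch, throttle, retry, validate). The function names, batch size, and validation rules here are illustrative placeholders, not the repo's actual code; fetch_batch stands in for whatever underlying library call is used (e.g. a yfinance download).

```python
import time
from typing import Iterator

BATCH_SIZE = 50      # assumed provider batch limit
MIN_INTERVAL = 2.0   # assumed seconds to wait between requests

def chunks(items: list[str], size: int) -> Iterator[list[str]]:
    """Split a ticker list into provider-sized batches."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def fetch_batch(tickers: list[str]) -> dict:
    """Placeholder for the underlying provider call."""
    raise NotImplementedError

def validate(data: dict, tickers: list[str]) -> None:
    """Basic consistency checks before anything reaches the database."""
    missing = [t for t in tickers if t not in data]
    if missing:
        raise ValueError(f"missing tickers: {missing}")

def fetch_with_retries(tickers: list[str], retries: int = 3) -> dict:
    """Retry transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            data = fetch_batch(tickers)
            validate(data, tickers)
            return data
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

def fetch_all(tickers: list[str]) -> dict:
    """Batch, throttle, retry, and validate: the wrapper's whole job."""
    out: dict = {}
    for batch in chunks(tickers, BATCH_SIZE):
        out.update(fetch_with_retries(batch))
        time.sleep(MIN_INTERVAL)  # crude rate limiting between batches
    return out
```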
