This repository contains my own implementation of the encoder-decoder Transformer architecture from the original "Attention Is All You Need" paper. I train the model on randomly generated data and also test its inference process. This was done with some help from these two great tutorials: 1 and 2. You can also find some of my personal notes on Transformers here.
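The core building block of the paper's architecture is scaled dot-product attention. As a rough illustration (a minimal NumPy sketch, not the code in this repo), it can be written as:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    # Numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Random toy inputs, shaped (batch, seq_len, d_k)
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 5, 8))
K = rng.normal(size=(2, 5, 8))
V = rng.normal(size=(2, 5, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 5, 8)
```

The full model stacks multi-head versions of this block (plus feed-forward layers, residual connections, and layer norm) in both the encoder and the decoder.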
To train the model and test inference, simply run:

```shell
python main.py
```