An implementation of Soft Actor-Critic (SAC) using PyTorch. This implementation doesn't include automatic temperature optimization, which is left as future work. It was trained for 1000 games (1 million steps) on the Gymnasium MuJoCo environment `HalfCheetah-v5`.
The average return curve had not yet saturated by the end of training, so the agent would likely improve further if trained for a few thousand more games.
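
Without automatic temperature optimization, the entropy temperature is a fixed hyperparameter in the soft Bellman target used to train the critics. The snippet below is a minimal sketch of that target, not the code from this repository: the function name `soft_q_target` and the values `alpha=0.2` and `gamma=0.99` are assumptions chosen for illustration.

```python
import torch


def soft_q_target(reward, done, next_q1, next_q2, next_log_prob,
                  alpha=0.2, gamma=0.99):
    """Soft Bellman target with a fixed entropy temperature.

    alpha and gamma are illustrative defaults (assumptions), not the
    values used by this repository.
    """
    with torch.no_grad():
        # Clipped double-Q: take the smaller of the two target critics.
        min_next_q = torch.min(next_q1, next_q2)
        # Entropy bonus: subtract alpha * log pi(a'|s') from the Q-value.
        soft_value = min_next_q - alpha * next_log_prob
        # Terminal transitions (done == 1) do not bootstrap.
        return reward + gamma * (1.0 - done) * soft_value


# Toy batch of two transitions; the second one is terminal.
reward = torch.tensor([[1.0], [0.5]])
done = torch.tensor([[0.0], [1.0]])
next_q1 = torch.tensor([[2.0], [3.0]])
next_q2 = torch.tensor([[1.5], [4.0]])
next_log_prob = torch.tensor([[-1.0], [-2.0]])

target = soft_q_target(reward, done, next_q1, next_q2, next_log_prob)
print(target)
```

With automatic temperature optimization, `alpha` would become a learned parameter instead of a constant; here it stays fixed for the whole run.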

