When running for a fixed number of epochs, the benchmark still prints something like "Trained to dice score 0.95 after X seconds", even if training stopped before convergence because the epoch limit was reached. We need to adjust the message so that it clearly says when training did not run to convergence.
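
One possible way to distinguish the two cases is sketched below. This is only an illustration, not the benchmark's actual code: the function name `format_result_message`, its parameters, and the wording of the non-converged message are all hypothetical.

```python
def format_result_message(
    dice_score: float,
    elapsed_seconds: float,
    converged: bool,
    epochs_run: int,
) -> str:
    """Build the final benchmark message (hypothetical helper).

    `converged` is assumed to be True only when training reached the
    target dice score, and False when it stopped at the epoch limit.
    """
    if converged:
        # Same wording as the existing message.
        return (
            f"Trained to dice score {dice_score:.2f} "
            f"after {elapsed_seconds:.0f} seconds"
        )
    # Assumed wording for a run that hit the epoch limit first.
    return (
        f"Stopped after {epochs_run} epochs ({elapsed_seconds:.0f} seconds) "
        f"without training to convergence; final dice score {dice_score:.2f}"
    )


if __name__ == "__main__":
    print(format_result_message(0.95, 120.0, converged=True, epochs_run=10))
    print(format_result_message(0.80, 60.0, converged=False, epochs_run=5))
```

Passing an explicit `converged` flag keeps the decision with the training loop, which already knows why it stopped, rather than having the message code infer it from the score.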