Currently, evals require a VS Code workspace that is already set up. We should set up a temporary workspace on demand instead. To avoid cloning the repo for each eval, we can cache the workspace and reuse it. This means each eval needs a teardown method that cleans up the files added during that eval, so the cached workspace is clean for the next one.
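A minimal sketch of the idea, not the actual eval harness. The `CachedWorkspace` class, its method names, and the `populate` callback (which stands in for cloning the repo) are all hypothetical. It only removes files and directories added during an eval; it does not restore files that an eval modified or deleted.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Set


class CachedWorkspace:
    """Hypothetical helper: create a temp workspace once, reuse it across evals."""

    def __init__(self, populate: Callable[[Path], None]) -> None:
        # `populate` is an assumed hook that fills the directory,
        # e.g. by cloning the repo into it.
        self._populate = populate
        self._root: Optional[Path] = None
        self._baseline: Set[Path] = set()

    def _snapshot(self) -> Set[Path]:
        # Relative paths of everything currently in the workspace.
        assert self._root is not None
        return {p.relative_to(self._root) for p in self._root.rglob("*")}

    def setup(self) -> Path:
        # Create and populate the workspace on first use only.
        if self._root is None:
            self._root = Path(tempfile.mkdtemp(prefix="eval-workspace-"))
            self._populate(self._root)
        # Record what exists before the eval runs.
        self._baseline = self._snapshot()
        return self._root

    def teardown(self) -> None:
        # Remove only the paths that were added since setup().
        if self._root is None:
            return
        added = self._snapshot() - self._baseline
        # Deepest paths first, so a new directory is empty when we reach it.
        for rel in sorted(added, key=lambda p: len(p.parts), reverse=True):
            full = self._root / rel
            if full.is_symlink() or full.is_file():
                full.unlink()
            elif full.is_dir():
                full.rmdir()

    def dispose(self) -> None:
        # Drop the cached workspace entirely, e.g. at the end of the eval run.
        if self._root is not None:
            shutil.rmtree(self._root, ignore_errors=True)
            self._root = None
```

Each eval would call `setup()` before running and `teardown()` afterwards, with a single `dispose()` once all evals finish. Restoring modified files would need something stronger, such as resetting the working tree with the repo's version control.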