diff --git a/datasets/dea-tools/README.md b/datasets/dea-tools/README.md
index f7def9a0..e784697d 100644
--- a/datasets/dea-tools/README.md
+++ b/datasets/dea-tools/README.md
@@ -32,6 +32,13 @@ EVAL_DEA_WORKSPACE_ID= \
 * `EVAL_DEA_WORKSPACE_ID`: The target Dataform workspace ID (short name).
 
 ## 4. Inspect Results
-Upon completion, results will be generated under the `results/` folder:
+Upon completion, results will be generated in two locations:
+
+### Local Files (under the `results/` folder):
 * `evals.csv`: Contains the full conversation history.
 * `scores.csv`: Contains LLM-Judge scores and detailed reasoning for the rubric checks.
+
+### Google BigQuery (Cloud Database):
+If enabled in `example_run_config.yaml`, results are automatically uploaded to your GCP project under the table `.evalbench.results`.
+
+A clickable **Looker Studio Dashboard** link will be printed in the terminal console upon completion to visually inspect the conversation flows and scores.
diff --git a/datasets/dea-tools/example_run_config.yaml b/datasets/dea-tools/example_run_config.yaml
index f99f24ae..b2fd6ad9 100644
--- a/datasets/dea-tools/example_run_config.yaml
+++ b/datasets/dea-tools/example_run_config.yaml
@@ -21,5 +21,7 @@ scorers:
 reporting:
   csv:
     output_directory: results
+  bigquery:
+    gcp_project_id: !ENV ${EVAL_GCP_PROJECT_ID}
 runners:
   agent_runners: 1