diff --git a/.gitignore b/.gitignore
index 86f36e6..5c2169d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,7 @@
 .venv/
 __pycache__/
 pipeline_config.yml
+reports/
 config/pipeline_config.yml
 config/local_pipeline_config.yml
 config/docker_pipeline_config.yml
diff --git a/README.md b/README.md
index 4244ca3..6fda121 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,10 @@
-## Eval Coordinator
+## Autoeval Coordinator
 ### Description
-The coordinator is the entry point to the evaluation pipeline. It takes a gpkg containing either a polygon or multipolygon geometry and then uses that to run and monitor batch jobs for each step along the evaluation pipeline for all the polygons submitted by the user.
+This repository contains an evaluation pipeline that works with HashiCorp Nomad to evaluate HAND-generated flood inundation map (FIM) extents against benchmark FIMs. It takes a gpkg containing either a polygon or multipolygon geometry and then uses that to run and monitor batch jobs for each step along a FIM evaluation pipeline, which evaluates flood scenarios for benchmark sources that intersect the AOI submitted by the user.
-The current evaluation pipeline is primarily designed to generate HAND FIM extents or depths and then evaluate these against relevant benchmark sources.
+The repository also contains a `tools/` directory that assists the user in running batches of evaluation pipelines, evaluating the results of a batch, and working with the Nomad API.
+
+While the current evaluation pipeline is primarily designed to generate HAND FIM extents or depths and then evaluate these against relevant benchmark sources, more pipelines may be added in the future to allow for evaluations of more types of FIMs or for different types of FIM evaluations.
 ### Getting Started Locally
 1. Create `.env` file
@@ -50,9 +52,36 @@ This version can also be adapted to dispatch jobs to non-local Nomad servers.
 - The HAND version argument allows the user to specify a specific version of HAND to generate extents for. This argument is required.
- **Benchmark Source**
  - This is a string that selects which source will be used to evaluate HAND against. For example, 'ripple-mip' will select FEMA MIP data produced by ripple. This argument is required.
+- **hand_index_path**
+  - This argument provides the location of the HAND index used to spatially query a given set of HAND outputs. The NGWPC hand-index repo contains more information about generating a HAND index for use in an evaluation.
- **Date Range**
  - Certain benchmark sources contain flood scenarios that have a time component. For example, high water mark data is associated with the flood event for a given survey. This argument allows for filtering a benchmark source to only return benchmark data within a certain date range.

### Inputs
- **AOI**
  - This input is a geopackage that must contain either a polygon or multipolygon geometry. For every polygon the coordinator will generate a HAND extent and find benchmark data that lies within the polygon for the source selected by the user. The coordinator will then run all the rest of the jobs described in this repository to generate an evaluation for that polygon.
+
+### Outputs
+- **output_path**
+  This is the directory where the outputs of a pipeline will be written. The outputs written to this directory follow this format (here `<test_case>` is synonymous with the AOI identifier):
+
+  - `<test_case>/`: the unique identifier, or test case, for the category of benchmark data used to generate metrics. For the PI7 ripple eval data this corresponds to a STAC item id in a given benchmark STAC collection. This could also be an ID for an AOI that returns multiple STAC items from the Benchmark STAC when used as a query AOI. In the example output the `<test_case>` is the STAC item id "11090202-ble" from the "ble-collection" benchmark STAC collection.
+    - `<test_case>__agg_metrics.csv`: aggregated metrics for the test case.
+    - `<test_case>__logs.txt`: test case logs generated by the pipeline.
+    - `<test_case>__results.json`: A file containing metadata and references to written output file locations.
+    - catchment_data_indices/: This directory contains files that point to catchment HAND data for each HAND catchment that will be inundated for comparison against the benchmark scenarios being evaluated.
+      - catchment-`<catchment_id>`.parquet: Files in this directory are parquet files that contain the UUID assigned to that catchment in the HAND index.
+    - `<benchmark_collection>/`: This directory name is a shortened reference to the benchmark STAC collection that the benchmark data for this evaluation was queried from. It is possible for a single AOI or test case to be evaluated against multiple benchmark collections, so in some cases there could be multiple directories of this type, each containing evaluation results for one benchmark collection.
+      - `<scenario>/`: Test case scenario, e.g., "ble-100yr".
+        - `<scenario>-<test_case>__agreement.tif`: The agreement raster for this scenario.
+        - `<scenario>-<test_case>__benchmark_mosiac.tif`: The mosaiced benchmark raster used as the benchmark raster.
+        - `<scenario>-<test_case>__flowfile.csv`: The merged flowfile used for this scenario.
+        - `<scenario>-<test_case>__inundate_mosiac.tif`: The mosaiced HAND extent used as the candidate raster for this scenario.
+        - `<scenario>-<test_case>__metrics.csv`: A single-row CSV containing the metrics for this scenario. These CSVs are aggregated together along with additional metadata to create the test case's agg_metrics.csv file.
+        - catchment_extents/
+          - `<catchment_id>__<scenario>.tif`: The HAND extents for a single HAND catchment. These are merged together to form the inundate mosaic for the scenario.
+
+
+### Running a batch of pipelines
+
+The above instructions are for running a single test evaluation pipeline using a local Nomad cluster. If you know which HAND outputs you want to evaluate and where their HAND index is located, and you have access to the FIM Benchmark STAC, this should be sufficient to run single pipelines.
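After a single pipeline finishes, a quick way to confirm it produced its top-level outputs is a check like the following. This is a hypothetical helper, not part of the repo; it assumes the per-test-case file names described in the Outputs section are prefixed with the test case id.

```python
from pathlib import Path


def missing_outputs(output_path: str, test_case: str) -> list:
    """Return the expected top-level output files for a test case that are absent.

    Assumes the layout described in the Outputs section:
    <output_path>/<test_case>/<test_case>__agg_metrics.csv, __logs.txt, __results.json.
    """
    root = Path(output_path) / test_case
    expected = [
        root / f"{test_case}__agg_metrics.csv",
        root / f"{test_case}__logs.txt",
        root / f"{test_case}__results.json",
    ]
    return [p for p in expected if not p.exists()]
```

For outputs written to S3 the same idea applies, but you would list keys with your S3 client of choice instead of using `Path.exists()`.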
This repository also contains functionality for running batches of dozens to thousands of pipelines using either a local Nomad cluster running within the Parallel Works environment or a Nomad cluster deployed to the NGWPC AWS Test account. For more information on running batches please refer to `docs/batch-run-guide-ParallelWorks.md` and `docs/batch-run-guide-AWS-Test.md`.
diff --git a/docs/batch-run-guide-AWS-Test.md b/docs/batch-run-guide-AWS-Test.md
new file mode 100644
index 0000000..3137db1
--- /dev/null
+++ b/docs/batch-run-guide-AWS-Test.md
@@ -0,0 +1,101 @@
+This document contains instructions for running a batch of autoeval pipelines in the AWS test account.
+
+The test account instructions assume that you are interacting with a Nomad API that is set up in a way similar to the [nomad-runner](https://github.com/NGWPC/nomad-runner) deployment. This deployment uses a single Nomad server that sends jobs to a group of EC2 instances in an Auto Scaling group (ASG).
+
+# Running a pipeline batch in the AWS test account
+
+## Size the cluster
+
+The ripple batch runs were executed using a c5.9xlarge server and between 10-40 r5a.xlarge clients. The more clients you are communicating with, and the larger their instance size, the larger your server instance needs to be. At these instance sizes, and for the pipeline batch jobs being run, the maximum number of clients that the server could effectively communicate with was ~40. Currently the pipeline code is not designed to work with a Nomad API that autoscales clients, because autoscaling causes jobs to be cancelled or lost and then rescheduled with a new dispatched job id. A recovery mechanism for this event is planned but has not yet been implemented. Because of this, the number of clients needs to be fixed at the start of a batch run by setting the "desired capacity" of the AWS autoscaling group.
For this approach to work, the autoscaler job also needs to be turned off before the desired capacity is set. A good rule of thumb is to set the number of clients/desired capacity to half the number of pipelines that you want to run. If that number is higher than the maximum number of clients supported by the Nomad API, then use the maximum number of clients.
+
+Eval pipeline batches are not designed to be run on a Nomad cluster being used by other workloads. In the future, once the pipeline has implemented more robust job tracking, and with a more robust Nomad API, it could be possible for a batch to be run alongside other workloads.
+
+## Export your NOMAD_TOKEN and NOMAD_ADDR
+
+These environment variables are used by the batch submission code to determine which Nomad API to send requests to and to authenticate to that API. They can be set with:
+
+```
+export NOMAD_ADDR="http://localhost:4646"
+export NOMAD_TOKEN="token"
+```
+
+These variables will be read from your environment when you start the autoeval container.
+
+## Refresh your S3 credentials
+
+The batch run script needs access to the S3 bucket that the pipelines will output data to, because it uploads the AOIs used by the pipelines to S3. You should refresh your credentials in the environment that you will start your autoeval container from.
+
+## Start the autoeval container
+
+Start the autoeval container by running the following from the repo root:
+
+```
+docker compose -f docker-compose-dev.yml up -d
+docker compose -f docker-compose-dev.yml exec autoeval-dev bash
+```
+
+You should execute the batch code from this container's shell.
+
+## Configure Nomad Job definitions
+
+Most of the environment variables in ./job_defs/test/ should already be configured, but if you are using a different NOMAD_ADDR from the one used by NGWPC then you should set that as well.
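For orientation, the memory setting referred to below lives in the Nomad `resources` block of each task in the job definition; the values here are hypothetical and should be sized per your data:

```hcl
resources {
  cpu    = 2000 # MHz
  memory = 8192 # MiB; choose per docs/job_sizing_guide.md
}
```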
Depending on the data being evaluated you might also want to adjust the job memory requirements in the "resources" block of the job definition. Please refer to docs/job_sizing_guide.md in this repo for guidance on how much memory to allocate to each autoeval job based on the resolution of the data being evaluated.
+
+## Start the Nomad memory monitor script
+
+Open another autoeval-dev container shell, separate from the one you will use to run the batch of pipelines, using another `docker compose exec autoeval-dev bash` command, and then from the local repo root start the script that monitors the Nomad server's memory usage with `tools/nomad_memory_monitor.sh`. This terminal also needs to have valid NOMAD_ADDR and NOMAD_TOKEN environment variables. The memory monitor script will create a log file at `nomad_memory_usage.log` and will run the command `nomad system gc` whenever the Nomad server's active memory allocation exceeds the value of `MEMORY_THRESHOLD_GIB` hardcoded at the top of the script. The `MEMORY_THRESHOLD_GIB` value should be set to about 25-30% of your Nomad server's max memory.
+
+Memory monitoring is necessary because after running jobs Nomad keeps old allocations and evaluations in memory for a configurable amount of time. If memory use gets too high, the server slows down and becomes unresponsive. The memory can be cleared on a set schedule by configuring the server (for example, every 15 minutes), but it was observed that the API could lose jobs during garbage collection events. So, to minimize the number of garbage collection events while also ensuring that the server stays responsive, a dynamic approach was taken that monitors the memory usage of the server from the client side and clears the memory only when it exceeds the threshold.
+
+To run this script you need the Nomad CLI installed.
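The monitor's core decision is just a threshold check on the server's active allocation. A minimal sketch of that logic (hypothetical; the real `tools/nomad_memory_monitor.sh` is a shell script that reads allocation stats from the Nomad API and shells out to `nomad system gc`):

```python
# Hypothetical threshold value; set to ~25-30% of the Nomad server's max memory.
MEMORY_THRESHOLD_GIB = 8


def should_run_gc(allocated_bytes: int, threshold_gib: float = MEMORY_THRESHOLD_GIB) -> bool:
    """True when the server's active memory allocation exceeds the threshold.

    When this holds, the monitor script triggers `nomad system gc` and logs
    the usage to nomad_memory_usage.log.
    """
    return allocated_bytes / 1024**3 > threshold_gib
```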
Instructions for installing the CLI [can be found here](https://developer.hashicorp.com/nomad/tutorials/get-started/gs-install).
+
+## Submit a batch of pipelines
+
+A batch of pipelines can be submitted using this command, after it has been modified for the specifics of your batch:
+
+```
+ python tools/submit_stac_batch.py --batch_name fim100_huc12_3m_2025-08-21-15 --output_root s3://fimc-data/autoeval/batches/fim100_huc12_3m_non_calibrated/ --hand_index_path s3://fimc-data/autoeval/hand_output_indices/fim100_huc12_3m_index/ --benchmark_sources "ripple-fim-collection" --item_list /home/dylan.lee/autoeval-coordinator/inputs/ripple-fim-collection-3m-run4.txt --wait_seconds 10 --stop_threshold 30 --resume_threshold 15
+```
+
+The arguments are:
+
+* --batch_name: The name of the batch that will be included in the Nomad job definitions. It is usually timestamped to the hour to make it possible to query different batches in CloudWatch.
+* --output_root: The directory that the batch outputs will be written to.
+* --hand_index_path: The directory that contains the HAND index that will be used to assemble the necessary HAND outputs to run each pipeline.
+* --benchmark_sources: The list of benchmark STAC collections that you want evaluated.
+* --item_list: A list of the specific STAC item IDs that will be evaluated. A pipeline job will be submitted for each item on this list.
+* --wait_seconds: The number of seconds to wait between submitting pipeline jobs to the Nomad API. This should be 10 or more seconds to avoid hammering the Nomad API.
+* --stop_threshold: The maximum number of pipelines that should be running on Nomad at once. Tests revealed that this shouldn't be more than 30 pipelines in parallel or else the server will be overwhelmed. Once this threshold is reached, `submit_stac_batch.py` will pause pipeline job submission and wait until the resume_threshold is reached.
+* --resume_threshold: The threshold at which pipelines will start being submitted again by `submit_stac_batch.py`. The resume threshold should be 10-15 pipeline jobs below the stop_threshold. Having two thresholds introduces a pause that ensures each pipeline job is able to submit jobs at the inundate, mosaic, and agreement stages without getting crowded out by other, newer jobs. This pause is necessary because of how Nomad's job scheduling works: without it, the Nomad scheduler would tend to preferentially place jobs with lower resource requirements, which leaves pipelines hung up for unreasonable lengths of time and increases the risk of pipeline failure.
+
+## Evaluate the batch outcome
+
+After a batch has run, the script tools/cloudwatch_reports.py should be run from the autoeval-dev shell that you ran tools/submit_stac_batch.py from.
+
+The Test account that stores the CloudWatch logs currently needs different credentials from those used by the S3 bucket. You can update the credentials by exporting new `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` variables that work for the Test account CloudWatch service and then run the script using a modified form of:
+
+```
+./tools/cloudwatch_reports.py inputs/ripple-fim-collection-item-list.txt fim100_huc12_10m_2025-08-20-09 reports/ripple-10m-run3
+```
+
+The first argument is the list of STAC item IDs that were submitted by submit_stac_batch.py. The second argument is the batch name used by submit_stac_batch.py. The third argument is where the report files will be written.
+
+The file unique_fail_aoi_names.txt in the results directory has the list of failed pipeline AOIs. Usually these pipelines failed because of Nomad API, S3, or credential errors and will succeed if the failed AOIs are resubmitted.
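The --stop_threshold/--resume_threshold behaviour described above amounts to a simple hysteresis loop: stop submitting at the high-water mark, and only resume once the running count has drained to the low-water mark. A minimal sketch of that logic (hypothetical; not the repo's actual implementation):

```python
class SubmissionGate:
    """Hysteresis gate mirroring --stop_threshold/--resume_threshold (sketch only)."""

    def __init__(self, stop_threshold: int = 30, resume_threshold: int = 15):
        self.stop_threshold = stop_threshold
        self.resume_threshold = resume_threshold
        self.paused = False

    def may_submit(self, running: int) -> bool:
        # Pause once the running-pipeline count reaches the stop threshold...
        if not self.paused and running >= self.stop_threshold:
            self.paused = True
        # ...and resume only after it drains back down to the resume threshold.
        elif self.paused and running <= self.resume_threshold:
            self.paused = False
        return not self.paused
```

The gap between the two thresholds is what gives in-flight pipelines room to schedule their inundate, mosaic, and agreement jobs before new pipelines compete for placement.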
To resubmit, all you have to do is copy the contents of unique_fail_aoi_names.txt to a batch item list file and then start another batch using that item list as an input to submit_stac_batch.py.
+
+Refer to docs/intepreting-reports.md for more information on using a batch's reports to inspect its outcome.
+
+## Shutdown the memory monitor and purge jobs
+
+Kill the nomad_memory_monitor.sh script and then from that container run:
+
+```
+nomad system gc
+python tools/purge_dispatch_jobs.py
+```
+
+This will clear all the jobs associated with the batch from the Nomad server's memory. This step ensures that the Nomad server stays responsive and makes it easier to use the Nomad UI to monitor the progress of the next batch that will be run.
+
+If this is the last batch you will run, you can now kill the instance of the autoeval-dev shell that was being used to monitor memory.
+
+## Set the ASG to 1 client and turn the autoscaler job back on
+
+After you have successfully run your batches, you should set the desired capacity of the ASG back to 1 client to save on costs. The autoscaler job should also be turned back on in case the next user has a workload for which it is useful.
diff --git a/docs/batch-run-guide-ParallelWorks.md b/docs/batch-run-guide-ParallelWorks.md
new file mode 100644
index 0000000..59e3bbf
--- /dev/null
+++ b/docs/batch-run-guide-ParallelWorks.md
@@ -0,0 +1,157 @@
+This document contains instructions for running a batch of autoeval pipelines in Parallel Works on a single instance using a local Nomad cluster.
+
+# Running a pipeline batch in Parallel Works
+
+## Start the `fimsinglenode` cluster and attach a desktop
+
+Eval pipeline batches are not designed to be run on a Parallel Works cluster being used by other workloads. If another user is running a large workload on `fimsinglenode`, it is recommended to create a clone of the cluster on which to execute the batch.
+
+## Start a terminal and navigate to the repo root
+
+Unless otherwise noted, all the commands below should be run from the root of the Parallel Works clone of the `autoeval-coordinator` repository. Currently this clone is located at `/efs/demonstrations/pi7/autoeval-coordinator` when using the `fimsinglenode` cluster.
+
+## Export your NOMAD_ADDR and AWS credentials
+
+These environment variables are used by the batch submission code to determine which Nomad API to send requests to and to authenticate to the S3 bucket that outputs will be written to.
+
+The NOMAD_ADDR can be set with:
+
+```
+export NOMAD_ADDR="http://localhost:4646"
+```
+
+The domain is localhost since we are using a local Nomad cluster.
+
+The AWS credentials for the NGWPC Data account also need to be exported. They can be obtained from the NGWPC AWS Access Portal. When you sign into that portal, if you have a role with access to the Data account, the appropriate credentials can be copied from the popup that appears when you click "Access Keys". The credentials should look something like:
+
+```
+export AWS_ACCESS_KEY_ID=""
+export AWS_SECRET_ACCESS_KEY="