Merged
2 changes: 1 addition & 1 deletion README.md
@@ -179,4 +179,4 @@ Annotated version history:
- `0.0.2`: License under BSD.
- `0.0.1`: Initial release.

The community files were last updated on Jan 7, 2025.
The community files were last updated on Oct 31, 2025.
12 changes: 11 additions & 1 deletion snapshot/README.md
@@ -22,6 +22,13 @@ Note that, while provided as a service to the community, these Avro files and di
### Manual execution
To build the Avro files yourself by requesting, joining, and indexing the original upstream API data, simply execute `bash execute_all.sh` after local setup. This builds the files on S3; they may then be deployed to an SFTP server trivially.

For updating just a specific year, use `get_year.sh`:
```bash
bash get_year.sh 2025
```

This will fetch species, catch, and haul data for the specified year and upload them to S3. Afterwards, run the remaining pipeline steps (`render_flat.py`, indexing, etc.) to complete the update.

## Local setup
Local environment setup varies depending on how these files are used.

@@ -37,8 +44,11 @@ To perform manual execution, these scripts expect to use [AWS S3](https://aws.am
- `AWS_ACCESS_KEY`: This is the access key used to upload completed payloads to AWS S3 or to request those data as part of distributed indexing and processing.
- `AWS_ACCESS_SECRET`: This is the secret associated with the access key used to upload completed payloads to AWS S3 or to request those data as part of distributed indexing and processing.
- `BUCKET_NAME`: This is the name of the S3 bucket to which completed payloads are uploaded or from which they are requested.
- `SFTP_HOST`: The SFTP server hostname for deploying files to data.pyafscgap.org.
- `SFTP_USER`: The SFTP username for authentication.
- `SFTP_PASS`: The SFTP password for authentication.

These may be set within `.bashrc` files or similar through `EXPORT` commands. Finally, these scripts expect [Coiled](https://www.coiled.io/) to perform distributed tasks.
These may be set within `.bashrc` files or similar through `EXPORT` commands. A `setup_env.sh` file in the parent directory can also be used (it should not be committed to version control). Finally, these scripts expect [Coiled](https://www.coiled.io/) to perform distributed tasks.
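
As a sketch, a hypothetical `setup_env.sh` might look like the following. All values here are placeholders (the variable names come from the list above); the real file must not be committed to version control:

```shell
# Hypothetical setup_env.sh sketch: replace placeholder values with real
# credentials and keep this file out of version control.
export AWS_ACCESS_KEY="your-access-key"
export AWS_ACCESS_SECRET="your-access-secret"
export BUCKET_NAME="your-bucket-name"
export SFTP_HOST="your-sftp-host"
export SFTP_USER="your-sftp-user"
export SFTP_PASS="your-sftp-password"
```

Source it with `source setup_env.sh` in the shell where the pipeline scripts will run.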

## Testing
Unit tests can be executed by running `nose2` within the `snapshot` directory.
32 changes: 32 additions & 0 deletions snapshot/get_year.sh
@@ -0,0 +1,32 @@
#!/bin/bash

# Check if BUCKET_NAME is set
if [ -z "$BUCKET_NAME" ]; then
echo "Error: BUCKET_NAME environment variable is not set"
echo "Please run: source ../../setup_env.sh"
exit 1
fi

# Check if year argument is provided
if [ -z "$1" ]; then
echo "Error: Year argument is required"
echo "Usage: bash get_year.sh <year>"
echo "Example: bash get_year.sh 2025"
exit 1
fi

YEAR=$1

echo "-- Getting species --"
python request_source.py species "$BUCKET_NAME" species || exit $?

echo "-- Getting catch --"
python request_source.py catch "$BUCKET_NAME" catch || exit $?

echo "-- Getting $YEAR --"
python request_source.py haul "$BUCKET_NAME" haul "$YEAR" || exit $?

echo "Done with getting $YEAR."
7 changes: 6 additions & 1 deletion snapshot/render_flat.py
@@ -448,7 +448,12 @@ def get_haul_record(year: int, survey: str, haul: int) -> typing.Optional[dict]:
return None

haul_records = get_avro(haul_loc)
assert len(haul_records) == 1
if len(haul_records) != 1:
raise ValueError(
f"Expected exactly 1 haul record but found "
f"{len(haul_records)} records for year={year}, "
f"survey={survey}, haul={haul}, file={haul_loc}"
)
haul_record = haul_records[0]
return haul_record
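
The check above can be sketched as a standalone helper (hypothetical names; the source inlines this logic inside `get_haul_record`). A `ValueError` with context is preferred over a bare `assert`, since assertions are stripped when Python runs with `-O` and an `AssertionError` carries no diagnostic detail:

```python
def require_single_record(records, year, survey, haul, loc):
    """Return the sole record in records, raising ValueError otherwise.

    Mirrors the validation that replaced the bare assert: the error
    message names the year, survey, haul, and file for debugging.
    """
    if len(records) != 1:
        raise ValueError(
            f"Expected exactly 1 haul record but found "
            f"{len(records)} records for year={year}, "
            f"survey={survey}, haul={haul}, file={loc}"
        )
    return records[0]
```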
