Scripts to explore and load data into the CalCOFI database.
See rendered notebooks at https://calcofi.github.io/workflows/.
The CalCOFI data workflow uses the `targets` R package for dependency management and reproducibility.

```r
# from the repository root, move into the workflows/ directory
setwd("workflows")

# run the full pipeline
targets::tar_make()

# visualize the dependency graph
targets::tar_visnetwork()

# check which targets are outdated
targets::tar_outdated()
```

Data flow:

```
Google Drive → rclone → GCS (calcofi-files) → targets → Parquet → DuckDB
                                 ↓
                        archive/ (versioned)
```
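To make the flow concrete, a minimal `_targets.R` along these lines is sketched below. The target names and file paths are hypothetical, and it assumes the conversion helpers (from `calcofi4db`, described later in this README) return the path of the file they write, as `format = "file"` targets require; it is not the actual pipeline definition.

```r
# Sketch of a minimal _targets.R for the CSV → Parquet → DuckDB flow.
# Target names and paths are illustrative; csv_to_parquet() and
# create_duckdb_from_parquet() are assumed to return the output file path.
library(targets)

list(
  # track the source CSV synced down from gs://calcofi-files/
  tar_target(bottle_csv, "data/bottle.csv", format = "file"),

  # convert the CSV to Parquet
  tar_target(
    bottle_parquet,
    calcofi4db::csv_to_parquet(bottle_csv, "data/bottle.parquet"),
    format = "file"),

  # load the Parquet file into the DuckDB database
  tar_target(
    calcofi_db,
    calcofi4db::create_duckdb_from_parquet(bottle_parquet, "calcofi.duckdb"),
    format = "file")
)
```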
Key files:

- `_targets.R` - pipeline definition
- `scripts/sync_gdrive_to_gcs.sh` - rclone sync with versioning
- `scripts/run_pipeline.R` - pipeline runner script
GCS buckets:

- `gs://calcofi-files/` - versioned source CSV files
- `gs://calcofi-db/` - Parquet files and DuckDB database
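To spot-check bucket contents from an R session, one option is to shell out to the standard `gsutil` CLI; this assumes `gsutil` is installed and authenticated, and the `current/` prefix follows the helper examples below.

```r
# list the current source CSVs and the database artifacts
# (gsutil must be installed and authenticated for the project)
system2("gsutil", c("ls", "gs://calcofi-files/current/"))
system2("gsutil", c("ls", "gs://calcofi-db/"))
```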
- Install rclone:

  ```sh
  brew install rclone
  ```

- Configure remotes with `rclone config`:
  - `gdrive`: Google Drive (read-only scope)
  - `gcs`: Google Cloud Storage (project: `ucsd-sio-calcofi`)

- Run the sync (a sketch of the versioning idea follows this list):

  ```sh
  ./scripts/sync_gdrive_to_gcs.sh
  ```
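The versioning logic inside `sync_gdrive_to_gcs.sh` is not shown here. One common rclone pattern is the `--backup-dir` flag, which moves any file the sync would overwrite or delete into an archive path instead of discarding it. Below is a rough sketch of that idea driven from R; the remote names match the config above, but the `current/` and `archive/` bucket layout is an assumption based on the diagram and helper examples in this README.

```r
# Sketch of an rclone sync with versioning via --backup-dir: files that the
# sync would overwrite or delete are moved to a timestamped archive path.
# The bucket layout is an assumption; the real logic lives in
# scripts/sync_gdrive_to_gcs.sh.
ts <- format(Sys.time(), "%Y-%m-%d_%H%M%S", tz = "UTC")
system2("rclone", c(
  "sync",
  "gdrive:",                            # source: Google Drive remote
  "gcs:calcofi-files/current",          # destination: current files
  "--backup-dir", paste0("gcs:calcofi-files/archive/", ts),
  "--verbose"))
```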
The `calcofi4db` package provides helper functions:

```r
# cloud operations
get_gcs_file("gs://calcofi-files/current/file.csv")
put_gcs_file("local.csv", "gs://calcofi-files/current/file.csv")
list_gcs_versions("path/to/file.csv")

# parquet operations
csv_to_parquet("data.csv", "output.parquet")
read_parquet_table("data.parquet")

# duckdb operations
con <- get_duckdb_con("calcofi.duckdb")
create_duckdb_from_parquet(parquet_files, "calcofi.duckdb")
```

See the full implementation plan at README_PLAN.md.
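Putting the helpers together, a single-table refresh might look like the following sketch. The file and table names are placeholders, and it assumes `get_gcs_file()` downloads the CSV to the working directory and that the helpers accept the arguments shown above.

```r
# Illustrative end-to-end refresh of one table with the calcofi4db helpers;
# names are placeholders, and get_gcs_file() is assumed to write the CSV
# to the working directory.
library(calcofi4db)

get_gcs_file("gs://calcofi-files/current/bottle.csv")           # fetch current CSV
csv_to_parquet("bottle.csv", "bottle.parquet")                  # CSV → Parquet
create_duckdb_from_parquet("bottle.parquet", "calcofi.duckdb")  # Parquet → DuckDB

# sanity-check the load (table name assumed to follow the file name)
con <- get_duckdb_con("calcofi.duckdb")
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM bottle")
DBI::dbDisconnect(con, shutdown = TRUE)
```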