CW3E Surface Meteorology Downloader

Download, parse, and aggregate CW3E SurfaceMetObs hourly station files into a single time-indexed dataset.

This tool automates retrieval of surface meteorological data hosted by the Center for Western Weather and Water Extremes (CW3E) at UC San Diego/SIO. It supports auto‑discovering per‑station schemas by reading each station’s DataFormat.txt, robustly handles missing files, and exports combined results to CSV and Parquet.

✨ Features

Auto‑schema: Reads <SITE>/DataFormat.txt to build column names and missing‑value tokens per station (fallback schema provided if missing).
Flexible ranges: Fetches a continuous range from start year/Julian day to end year/Julian day (inclusive), across years with leap‑year handling.
404‑tolerant: Quietly skips missing hours (HTTP 404) and hours that fail with transient network errors.
Memory‑only mode: Optionally skips saving raw hourly files and parses directly from the HTTP response (--no_save_raw).
Clean timeseries: Converts Year + Julian_Day + HHMM (end of averaging) to a proper DatetimeIndex.
Exports: Writes one combined dataset to CSV and Parquet.
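
The inclusive year/Julian‑day range described above can be expanded into individual hours roughly as follows. This is a minimal sketch, not the script's actual code: the name `iter_hours` is hypothetical, and it assumes one file per hour (00 to 23) for every day.

```python
from datetime import date, timedelta


def iter_hours(start_year, start_jday, end_year, end_jday):
    """Yield (year, julian_day, hour) for every hour in the inclusive range."""
    # date(year, 1, 1) + (jday - 1) days turns a day-of-year into a calendar
    # date; the datetime module takes care of leap years.
    day = date(start_year, 1, 1) + timedelta(days=start_jday - 1)
    last = date(end_year, 1, 1) + timedelta(days=end_jday - 1)
    while day <= last:
        jday = day.timetuple().tm_yday
        for hour in range(24):  # assumption: hourly files numbered 00..23
            yield day.year, jday, hour
        day += timedelta(days=1)
```

For example, `list(iter_hours(2025, 365, 2026, 2))` covers three days (72 hours), crossing the year boundary without special casing.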

📦 Requirements

Python 3.9+
Packages:
```
pip install pandas requests pyarrow
```
pyarrow (or fastparquet) is required for Parquet output.

📁 File/URL Conventions

Remote path pattern:

https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/{SITE_UPPER}/{YYYY}/{JJJ}/{site_lower}{YY}{JJJ}.{HH}m

Example:

https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/SIO/2026/001/sio26001.00m

Station metadata/schemas:

https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/{SITE_UPPER}/DataFormat.txt
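
As an illustration of the two patterns above, the URLs could be assembled like this. The function names `hourly_url` and `dataformat_url` are hypothetical and are not taken from the script; only the path layout comes from this README.

```python
BASE_URL = "https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs"


def hourly_url(site, year, jday, hour):
    """Build the URL of one hourly file from the documented path pattern."""
    yy = year % 100  # two-digit year used in the file name
    return (
        f"{BASE_URL}/{site.upper()}/{year:04d}/{jday:03d}/"
        f"{site.lower()}{yy:02d}{jday:03d}.{hour:02d}m"
    )


def dataformat_url(site):
    """Build the URL of the station's DataFormat.txt."""
    return f"{BASE_URL}/{site.upper()}/DataFormat.txt"
```

Calling `hourly_url("sio", 2026, 1, 0)` reproduces the example URL shown above.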

🚀 Quick Start

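Assuming the script is saved as cw3e_surface_download.py (the repository file listing shows cw3e_surfacemet_download.py; substitute whichever name your copy uses):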

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --out_folder downloads

Outputs:

downloads/sio_2026j001_to_2026j002.csv
downloads/sio_2026j001_to_2026j002.parquet

🧠 Auto‑schema via `DataFormat.txt`

The script tries to fetch and parse DataFormat.txt from the site folder to:

build column names (leaving the first 4 time columns fixed: Datalogger_ID, Year, Julian_Day, HHMM)
detect missing‑value tokens (e.g., 99999, -7999, -99.99)

If a site lacks DataFormat.txt or the file cannot be parsed, the tool falls back to a 13‑field schema made up of the four fixed time columns followed by nine common CW3E variables:

MSLP_mb, Temperature_C, Relative_Humidity_pct, Wind_Speed_mps, Wind_Direction_deg, Solar_Radiation_Wm2, Battery_Voltage_V, Precipitation_mm, Max_Wind_Speed_mps
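
To make the fallback concrete, here is a rough sketch of how one record could be mapped onto those 13 columns, with missing‑value tokens replaced by `None`. It is illustrative only: `parse_record` is a hypothetical helper, the comma delimiter is an assumption about the file layout, and the NA tokens are just the examples quoted above.

```python
TIME_COLUMNS = ["Datalogger_ID", "Year", "Julian_Day", "HHMM"]
FALLBACK_VARIABLES = [
    "MSLP_mb", "Temperature_C", "Relative_Humidity_pct",
    "Wind_Speed_mps", "Wind_Direction_deg", "Solar_Radiation_Wm2",
    "Battery_Voltage_V", "Precipitation_mm", "Max_Wind_Speed_mps",
]
FALLBACK_COLUMNS = TIME_COLUMNS + FALLBACK_VARIABLES  # 4 + 9 = 13 fields

# Example tokens named in this README; a real run would also use DataFormat.txt.
NA_TOKENS = {"99999", "-7999", "-99.99"}


def parse_record(line, columns=FALLBACK_COLUMNS, na_tokens=NA_TOKENS):
    """Map one text record onto column names; return None if the width differs."""
    # Assumption: fields are comma-separated.
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != len(columns):
        return None  # layout mismatch: treat the record as unparseable
    # Values stay as strings; NA tokens are matched on their exact text.
    return {
        name: (None if value in na_tokens else value)
        for name, value in zip(columns, fields)
    }
```

A record whose field count does not match the schema is rejected rather than guessed at, which mirrors the conservative behavior this README describes for unparseable files.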

Print and inspect the schema (no download)

python cw3e_surface_download.py --site_name sio --print_schema

Disable auto‑schema and use fallback

python cw3e_surface_download.py --site_name sio --print_schema --no_auto_schema

🧰 Command‑line Usage

--site_name SITE        Station code (e.g., "sio"). Case‑insensitive.
--start_year YYYY       Start year (e.g., 2026).
--start_jday DDD        Start Julian day, i.e. day of year (1..365, or 366 in a leap year).
--end_year YYYY         End year (>= start_year).
--end_jday DDD          End Julian day.

--out_folder PATH       Folder for raw files and CSV/Parquet (default: ./downloads).
--timeout SECONDS       HTTP timeout (default: 30).

--delete_unparsed       If saving raw files, delete any hourly file that fails to parse.
--no_save_raw           Do not save raw hourly files (parse directly from memory).

--no_auto_schema        Do not read DataFormat.txt; use built‑in fallback schema.
--dataformat_url URL    Override the DataFormat.txt URL (advanced).
--extra_na TOKENS ...   Additional NA tokens (e.g., --extra_na -8888 -9999).

--print_schema          Print discovered schema & NA tokens, then exit (no download).
-h, --help              Show full help and examples.
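
For readers who want to wrap or reimplement the interface, the options listed above could be declared with `argparse` roughly like this. It is a reconstruction from this README, not the script's own parser: which arguments are required, the argument types, and the function name `build_parser` are all assumptions.

```python
import argparse


def build_parser():
    """Declare the documented options (a sketch; the real parser may differ)."""
    p = argparse.ArgumentParser(
        description="Download CW3E SurfaceMetObs hourly station files."
    )
    p.add_argument("--site_name", required=True, help='Station code, e.g. "sio".')
    # Range arguments are left optional here so --print_schema works alone.
    p.add_argument("--start_year", type=int)
    p.add_argument("--start_jday", type=int)
    p.add_argument("--end_year", type=int)
    p.add_argument("--end_jday", type=int)
    p.add_argument("--out_folder", default="./downloads")
    p.add_argument("--timeout", type=float, default=30)
    p.add_argument("--delete_unparsed", action="store_true")
    p.add_argument("--no_save_raw", action="store_true")
    p.add_argument("--no_auto_schema", action="store_true")
    p.add_argument("--dataformat_url")
    # Tokens are kept as strings so they can be compared with raw file text.
    p.add_argument("--extra_na", nargs="*", default=[])
    p.add_argument("--print_schema", action="store_true")
    return p
```

Because none of the options look like negative numbers, `argparse` accepts values such as `-8888` after `--extra_na` without extra quoting.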

💡 Examples

1) Download SIO 2026 JD 001–002 (save raw files)

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --out_folder downloads

2) Same range, memory‑only (no raw files)

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --no_save_raw

3) Cross‑year range; add custom NA tokens

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2025 --start_jday 365 \
  --end_year   2026 --end_jday   2 \
  --extra_na -8888 -9999

4) Inspect schema only

python cw3e_surface_download.py --site_name sio --print_schema

📤 Outputs

CSV: <site>_<startyear>j<startday>_to_<endyear>j<endday>.csv, with days zero‑padded to three digits (includes time column)
Parquet: same tag, DatetimeIndex preserved
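
The naming scheme can be reproduced with a few lines of Python, for instance to locate the outputs from another script. This is a sketch under assumptions: `output_paths` is a hypothetical helper, and lowercasing the site code is inferred from the Quick Start output names.

```python
from pathlib import Path


def output_paths(out_folder, site, start_year, start_jday, end_year, end_jday):
    """Return the (csv_path, parquet_path) pair for a given range."""
    # Days are zero-padded to three digits, as in sio_2026j001_to_2026j002.
    tag = (
        f"{site.lower()}_{start_year}j{start_jday:03d}"
        f"_to_{end_year}j{end_jday:03d}"
    )
    folder = Path(out_folder)
    return folder / f"{tag}.csv", folder / f"{tag}.parquet"
```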

The DataFrame columns include Datalogger_ID plus the discovered data variables for that site. The time index reflects the end time of averaging.
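
The conversion from Year, Julian_Day, and HHMM to a timestamp can be sketched with the standard library as below. The helper name `record_timestamp` is hypothetical. Treating an HHMM of 2400 as midnight of the following day is an assumption (some dataloggers label the end of the last averaging period that way); this README does not say whether CW3E files use it.

```python
from datetime import datetime, timedelta


def record_timestamp(year, julian_day, hhmm):
    """Combine Year + Julian_Day + HHMM into a datetime (end of averaging)."""
    hours, minutes = divmod(int(hhmm), 100)  # e.g. 1230 -> (12, 30)
    # Adding a timedelta lets 2400 roll over to 00:00 of the next day.
    return datetime(int(year), 1, 1) + timedelta(
        days=int(julian_day) - 1, hours=hours, minutes=minutes
    )
```

In pandas, a list of such values can be passed to `pd.DatetimeIndex(...)` to build the index.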

🔎 Notes & Behavior

The downloader silently skips hours that are missing (HTTP 404) or that fail with a transient network error.
Hourly tables are concatenated only if their column layouts are identical; otherwise, no combined output is written.
NA tokens include those found in DataFormat.txt plus safe defaults. You can append more via --extra_na.
The first four fields are always: Datalogger_ID, Year, Julian_Day, HHMM.
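
The all‑or‑nothing concatenation rule above can be illustrated without pandas. In this sketch each hourly table is a `(columns, rows)` pair and `combine_hourly_tables` is a hypothetical name; the script itself works with DataFrames, where the equivalent would be comparing each frame's columns before calling `pd.concat`.

```python
def combine_hourly_tables(tables):
    """Concatenate (columns, rows) tables only if every layout is identical.

    Returns (columns, rows) for the combined table, or None when there is
    nothing to combine or the column layouts differ.
    """
    if not tables:
        return None
    first_columns = list(tables[0][0])
    # Any table with a different column list blocks the combined output.
    if any(list(columns) != first_columns for columns, _ in tables):
        return None
    combined_rows = [row for _, rows in tables for row in rows]
    return first_columns, combined_rows
```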

🧪 Tips & Extensions

Add plotting/QA routines (e.g., temp/wind/precip) in a notebook using the Parquet output.
For large ranges, consider parallelizing by day/hour (e.g., concurrent.futures).
If your environment lacks pyarrow, comment out Parquet writing or install fastparquet.
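
As a starting point for the parallelization tip, the following sketch fetches many hourly URLs with a thread pool and drops the ones that fail, in the same spirit as the 404‑tolerant behavior described earlier. The names `fetch_all` and `fetch` are hypothetical; `fetch` stands for whatever callable downloads one URL and raises on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch URLs concurrently; return {url: payload} for the successes only."""
    urls = list(urls)  # allow generators; the list is walked twice below

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            # A missing hour or a network error is skipped, not fatal.
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        payloads = list(pool.map(safe_fetch, urls))  # map keeps input order
    return {url: data for url, data in zip(urls, payloads) if data is not None}
```

With `requests`, `fetch` could be a small function that calls `requests.get(url, timeout=30)`, then `raise_for_status()`, and returns the response text. Keep `max_workers` modest to avoid putting unnecessary load on the CW3E server.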

🤝 Contributing

Issues and pull requests are welcome. If you have stations with unusual DataFormat.txt styles, please share examples so we can improve the parser.

📄 License

See LICENSE.md in this repository for the license terms.

🙏 Acknowledgments

Data provided by CW3E (Scripps Institution of Oceanography, UC San Diego). Please cite CW3E appropriately in your work and follow their data policies.
