CW3E/datashare_tools

CW3E Surface Meteorology Downloader

Download, parse, and aggregate CW3E SurfaceMetObs hourly station files into a single time-indexed dataset.

This tool automates retrieval of surface meteorological data hosted by the Center for Western Weather and Water Extremes (CW3E) at UC San Diego/SIO. It auto‑discovers per‑station schemas by reading each station’s DataFormat.txt, tolerates missing files, and exports the combined results to CSV and Parquet.


✨ Features

  • Auto‑schema: Reads <SITE>/DataFormat.txt to build column names and missing‑value tokens per station (fallback schema provided if missing).
  • Flexible ranges: Fetches a continuous range from start year/Julian day (day of year) to end year/Julian day, inclusive, spanning year boundaries with leap‑year handling.
  • 404‑tolerant: Quietly skips missing hours (HTTP 404) and requests that fail with transient network errors.
  • Memory‑only mode: Optionally skips saving raw hourly files and parses each one directly from the HTTP response.
  • Clean timeseries: Converts Year + Julian_Day + HHMM (end of averaging) to a proper DatetimeIndex.
  • Exports: Writes one combined dataset to CSV and Parquet.
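
The inclusive day range with leap‑year handling can be sketched as follows. This is a minimal illustration, not the script's actual code; the function name iter_days is hypothetical.

```python
import calendar
from typing import Iterator, Tuple


def iter_days(start_year: int, start_jday: int,
              end_year: int, end_jday: int) -> Iterator[Tuple[int, int]]:
    """Yield (year, day_of_year) pairs from start to end, inclusive.

    Hypothetical helper: the real script may walk the range differently.
    """
    if (start_year, start_jday) > (end_year, end_jday):
        raise ValueError("start must not be after end")
    year, jday = start_year, start_jday
    while (year, jday) <= (end_year, end_jday):
        yield year, jday
        # Leap years have 366 days, so day 366 is only valid in those years.
        days_in_year = 366 if calendar.isleap(year) else 365
        if jday >= days_in_year:
            year, jday = year + 1, 1  # roll over to January 1 of the next year
        else:
            jday += 1


days = list(iter_days(2025, 365, 2026, 2))
```

Here days covers the last day of 2025 followed by the first two days of 2026, matching the cross‑year example in this README.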

📦 Requirements

  • Python 3.9+
  • Packages:
    pip install pandas requests pyarrow

    pyarrow (or fastparquet) is required for Parquet output.


📁 File/URL Conventions

  • Remote path pattern:
    https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/{SITE_UPPER}/{YYYY}/{JJJ}/{site_lower}{YY}{JJJ}.{HH}m
    
    Example:
    https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/SIO/2026/001/sio26001.00m
    
  • Station metadata/schemas:
    https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs/{SITE_UPPER}/DataFormat.txt
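
As a sketch of how these patterns expand, the helpers below build both URLs from a site code and a timestamp. The function names are hypothetical, and the 0..23 hour range is an assumption inferred from the two‑digit {HH} field.

```python
BASE_URL = "https://cw3e-datashare.ucsd.edu/CW3E_SurfaceMetObs"


def hourly_url(site: str, year: int, jday: int, hour: int) -> str:
    """Build the URL of one hourly file (hypothetical helper)."""
    if not 0 <= hour <= 23:  # assumed range for the {HH} field
        raise ValueError("hour must be in 0..23")
    folder = f"{site.upper()}/{year:04d}/{jday:03d}"
    # File name: lowercase site + two-digit year + three-digit day + ".HHm"
    filename = f"{site.lower()}{year % 100:02d}{jday:03d}.{hour:02d}m"
    return f"{BASE_URL}/{folder}/{filename}"


def dataformat_url(site: str) -> str:
    """Build the URL of a station's DataFormat.txt (hypothetical helper)."""
    return f"{BASE_URL}/{site.upper()}/DataFormat.txt"


example = hourly_url("sio", 2026, 1, 0)
```

With these inputs, example reproduces the SIO example URL shown above.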
    

🚀 Quick Start

Assuming the script filename is cw3e_surface_download.py:

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --out_folder downloads

Outputs:

  • downloads/sio_2026j001_to_2026j002.csv
  • downloads/sio_2026j001_to_2026j002.parquet

🧠 Auto‑schema via DataFormat.txt

The script tries to fetch and parse DataFormat.txt from the site folder to:

  • build column names (the first four columns are fixed: Datalogger_ID, Year, Julian_Day, HHMM)
  • detect missing‑value tokens (e.g., 99999, -7999, -99.99)

If a site lacks DataFormat.txt or the file cannot be parsed, the tool falls back to a 13‑field schema made up of the four fixed leading columns plus nine common CW3E variables:

  • MSLP_mb, Temperature_C, Relative_Humidity_pct, Wind_Speed_mps, Wind_Direction_deg, Solar_Radiation_Wm2, Battery_Voltage_V, Precipitation_mm, Max_Wind_Speed_mps
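
To illustrate how the fallback schema and missing‑value tokens fit together, here is a minimal sketch using pandas. It assumes the hourly files are comma‑separated with no header row, which this README does not state; the sample rows are invented for illustration, and parse_hourly_text is a hypothetical name.

```python
import io

import pandas as pd

TIME_COLUMNS = ["Datalogger_ID", "Year", "Julian_Day", "HHMM"]
FALLBACK_VARIABLES = [
    "MSLP_mb", "Temperature_C", "Relative_Humidity_pct", "Wind_Speed_mps",
    "Wind_Direction_deg", "Solar_Radiation_Wm2", "Battery_Voltage_V",
    "Precipitation_mm", "Max_Wind_Speed_mps",
]
FALLBACK_COLUMNS = TIME_COLUMNS + FALLBACK_VARIABLES  # 4 + 9 = 13 fields
NA_TOKENS = ["99999", "-7999", "-99.99"]  # example tokens named in this README


def parse_hourly_text(text, columns=FALLBACK_COLUMNS, na_tokens=NA_TOKENS):
    """Parse one hourly file's text into a DataFrame (hypothetical helper).

    Assumes comma-separated values without a header line.
    """
    return pd.read_csv(io.StringIO(text), header=None, names=columns,
                       na_values=na_tokens)


# Invented sample rows; the second row uses missing-value tokens.
SAMPLE = (
    "101,2026,1,100,1013.2,15.4,80.0,3.2,270,0.0,12.6,0.0,5.1\n"
    "101,2026,1,200,1013.0,-99.99,79.0,-7999,265,0.0,12.6,0.0,99999\n"
)
df = parse_hourly_text(SAMPLE)
```

In the second sample row, Temperature_C, Wind_Speed_mps, and Max_Wind_Speed_mps come back as NaN because their values match the NA tokens.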

Print and inspect the schema (no download)

python cw3e_surface_download.py --site_name sio --print_schema

Disable auto‑schema and use fallback

python cw3e_surface_download.py --site_name sio --print_schema --no_auto_schema

🧰 Command‑line Usage

--site_name SITE        Station code (e.g., "sio"). Case‑insensitive.
--start_year YYYY       Start year (e.g., 2026).
--start_jday DDD        Start Julian day (1..365/366).
--end_year YYYY         End year (>= start_year).
--end_jday DDD          End Julian day.

--out_folder PATH       Folder for raw files and CSV/Parquet (default: ./downloads).
--timeout SECONDS       HTTP timeout (default: 30).

--delete_unparsed       If saving raw files, delete any hourly file that fails to parse.
--no_save_raw           Do not save raw hourly files (parse directly from memory).

--no_auto_schema        Do not read DataFormat.txt; use built‑in fallback schema.
--dataformat_url URL    Override the DataFormat.txt URL (advanced).
--extra_na TOKENS ...   Additional NA tokens (e.g., --extra_na -8888 -9999).

--print_schema          Print discovered schema & NA tokens, then exit (no download).
-h, --help              Show full help and examples.

💡 Examples

1) Download SIO 2026 JD 001–002 (save raw files)

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --out_folder downloads

2) Same range, memory‑only (no raw files)

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2026 --start_jday 1 \
  --end_year   2026 --end_jday   2 \
  --no_save_raw

3) Cross‑year range; add custom NA tokens

python cw3e_surface_download.py \
  --site_name sio \
  --start_year 2025 --start_jday 365 \
  --end_year   2026 --end_jday   2 \
  --extra_na -8888 -9999

4) Inspect schema only

python cw3e_surface_download.py --site_name sio --print_schema

📤 Outputs

  • CSV: <site>_<startyear>j<startday>_to_<endyear>j<endday>.csv (includes time column)
  • Parquet: same tag, DatetimeIndex preserved

The DataFrame columns include Datalogger_ID plus the discovered data variables for that site. The time index reflects the end time of averaging.
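
The conversion from Year, Julian_Day, and HHMM to a DatetimeIndex might look like the sketch below. The function name build_time_index and the index name "time" are assumptions. Computing the time as an offset from January 1 means an HHMM of 2400, if a station ever reports one, rolls over to midnight of the next day.

```python
import pandas as pd


def build_time_index(df: pd.DataFrame) -> pd.DatetimeIndex:
    """Combine Year, Julian_Day and HHMM into timestamps (hypothetical helper)."""
    # Start from January 1 of each row's year.
    year_start = pd.to_datetime(df["Year"].astype(int).astype(str) + "-01-01")
    # Day of year 1 is January 1, so subtract one before adding days.
    day_offset = pd.to_timedelta(df["Julian_Day"].astype(int) - 1, unit="D")
    # Split HHMM into hours and minutes with integer arithmetic.
    hhmm = df["HHMM"].astype(int)
    clock_offset = (pd.to_timedelta(hhmm // 100, unit="h")
                    + pd.to_timedelta(hhmm % 100, unit="min"))
    return pd.DatetimeIndex(year_start + day_offset + clock_offset, name="time")


rows = pd.DataFrame({
    "Year": [2026, 2026, 2024],
    "Julian_Day": [1, 1, 366],
    "HHMM": [100, 2400, 1230],
})
index = build_time_index(rows)
```

The third row exercises the leap‑year case: day 366 of 2024 is December 31.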


🔎 Notes & Behavior

  • The downloader silently skips missing hours (HTTP 404) and transient network errors.
  • Hourly tables are concatenated only if their column layouts are identical; otherwise, the combined output is omitted.
  • NA tokens include those found in DataFormat.txt plus safe defaults. You can append more via --extra_na.
  • The first four fields are always: Datalogger_ID, Year, Julian_Day, HHMM.
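
The identical‑layout rule for concatenation can be sketched as below. This is an illustration under the assumption that "layout" means the same column names in the same order; combine_hourly is a hypothetical name, not the script's function.

```python
from typing import List, Optional

import pandas as pd


def combine_hourly(frames: List[pd.DataFrame]) -> Optional[pd.DataFrame]:
    """Concatenate hourly tables only when every column layout matches.

    Returns None when there is nothing to combine or the layouts differ,
    mirroring the "combined output is omitted" behavior described above.
    """
    if not frames:
        return None
    layout = list(frames[0].columns)
    if any(list(frame.columns) != layout for frame in frames[1:]):
        return None
    # Sort so the combined table is ordered by its index (time).
    return pd.concat(frames).sort_index()


hour_a = pd.DataFrame({"Temperature_C": [15.4]}, index=[2])
hour_b = pd.DataFrame({"Temperature_C": [15.1]}, index=[1])
hour_c = pd.DataFrame({"Wind_Speed_mps": [3.2]}, index=[3])

combined = combine_hourly([hour_a, hour_b])
mismatched = combine_hourly([hour_a, hour_c])
```

Here combined holds both rows in index order, while mismatched is None because hour_c has a different column layout.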

🧪 Tips & Extensions

  • Add plotting/QA routines (e.g., temp/wind/precip) in a notebook using the Parquet output.
  • For large ranges, consider parallelizing by day/hour (e.g., concurrent.futures).
  • If your environment lacks pyarrow, comment out Parquet writing or install fastparquet.
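
For the parallelization tip, a thread pool from concurrent.futures is one possible approach. The sketch below is not part of the script: fetch_all is a hypothetical helper, and the download function is passed in as a parameter so the pattern stays independent of any particular HTTP client. Failed downloads are recorded as None, in the spirit of the skip‑on‑error behavior described above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(urls: Iterable[str],
              fetch_one: Callable[[str], str],
              max_workers: int = 8) -> Dict[str, Optional[str]]:
    """Run fetch_one over many URLs concurrently (hypothetical helper)."""

    def safe_fetch(url: str) -> Optional[str]:
        try:
            return fetch_one(url)
        except Exception:
            # Treat any failure as a skipped hour; a real downloader would
            # likely catch narrower exception types.
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        results = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, results))


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP download, used only for this example."""
    if url == "missing":
        raise RuntimeError("simulated 404")
    return url.upper()


results = fetch_all(["a", "missing", "b"], fake_fetch)
```

In real use, fetch_one would wrap an HTTP request for one hourly file. Keeping max_workers modest avoids putting undue load on the data server.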

🤝 Contributing

Issues and pull requests are welcome. If you have stations with unusual DataFormat.txt styles, please share examples so we can improve the parser.


📄 License

This software is Copyright © 2026 The Regents of the University of California. All Rights Reserved.


🙏 Acknowledgments

Data provided by CW3E (Scripps Institution of Oceanography, UC San Diego). Please cite CW3E appropriately in your work and follow their data policies.
