Merged
74 commits
463c0ac
rename caption folders
gabrieletijunaityte Mar 4, 2026
ba10999
Introduce concept captions and validation logic with top-k metrics
gabrieletijunaityte Mar 4, 2026
ed48c48
Add constructive concept eval for test split
gabrieletijunaityte Mar 4, 2026
f486bc4
add concept caption v1
gabrieletijunaityte Mar 4, 2026
4f9467a
add contrastive concept top-k metrics
gabrieletijunaityte Mar 4, 2026
cd9b840
test fixes
gabrieletijunaityte Mar 4, 2026
7b792a8
Merge branch 'develop' into feature/contrastive_setup
gabrieletijunaityte Mar 4, 2026
0aaa7e0
Change column ids
gabrieletijunaityte Mar 5, 2026
b02edd6
Move auc out of if statement and rename k_threshold
gabrieletijunaityte Mar 5, 2026
4bf4f0f
Rename k_threshold
gabrieletijunaityte Mar 5, 2026
ddf30fc
Add no_grad for text embedding
gabrieletijunaityte Mar 5, 2026
c19a304
Move data bound model configuration into setup methods
gabrieletijunaityte Mar 5, 2026
a529674
return dic of scores and average them later
gabrieletijunaityte Mar 5, 2026
339ef88
Fix tests
gabrieletijunaityte Mar 5, 2026
a4a164e
Resolve threshold_k naming confusion and remove redundant dictionary
gabrieletijunaityte Mar 5, 2026
d492299
Merge pull request #57 from WUR-AI/feature/contrastive_setup
vdplasthijs Mar 5, 2026
dcb998e
Merge branch 'develop' into feature/refactor-model-setup
vdplasthijs Mar 5, 2026
a8da730
Add setup to basemodel
gabrieletijunaityte Mar 5, 2026
c04cc06
abstracting setup method
gabrieletijunaityte Mar 5, 2026
8530874
Merge branch 'feature/refactor-model-setup' of github.com:WUR-AI/aeth…
gabrieletijunaityte Mar 5, 2026
cb58282
Merge pull request #59 from WUR-AI/feature/refactor-model-setup
gabrieletijunaityte Mar 5, 2026
de8b340
Merge remote-tracking branch 'WUR-AI/develop' into feature/butterfly-…
vdplasthijs Mar 6, 2026
2f118bd
minor gee update
vdplasthijs Mar 6, 2026
ccd7ac0
Crop Yield Africa use case - dataset and example tabular regression o…
robknapen Mar 7, 2026
8fa2a9f
Crop Yield Africa use case - example coords and fusion experiments; i…
robknapen Mar 7, 2026
a03d228
The eo_encoders to geo_encoders renaming and refactoring
robknapen Mar 7, 2026
1a20366
Missed renames
robknapen Mar 7, 2026
9c205b0
Crop Yield use case: tessera embeddings download script
robknapen Mar 8, 2026
2ef555a
Crop Yield use case: tessera modality added to dataset
robknapen Mar 8, 2026
90b6a49
Configurable coords encoder in MultiModalEncoder
robknapen Mar 8, 2026
20e7d51
Crop Yield use case: leave-one-country-out split script
robknapen Mar 8, 2026
3e1aec0
Crop Yield use case: Hydra configs for various experiments
robknapen Mar 8, 2026
eb7d802
Fixed the freezer
robknapen Mar 8, 2026
e6e00b5
Adds tessera embeddings directory to .env.example
robknapen Mar 8, 2026
7e71170
Makes L2 normalization optional in predictive model. Switching it off…
robknapen Mar 10, 2026
c4f75b8
Crop Yield use case: Consolidated / clarified cache use for downloade…
robknapen Mar 10, 2026
c24ce39
Merge pull request #60 from WUR-AI/feature/crop-yield-africa
gabrieletijunaityte Mar 10, 2026
829dd67
Create tabular encoder
gabrieletijunaityte Mar 10, 2026
a215e27
Makes DBScan clustering more efficient and much faster.
robknapen Mar 11, 2026
221a495
Create mlp projector/adapter
gabrieletijunaityte Mar 11, 2026
6e083b1
Replace multi-modal encoder with wrapper
gabrieletijunaityte Mar 11, 2026
f4aa97b
Crop Yield use case: spatial splitting
robknapen Mar 11, 2026
ac57242
Crop Yield use case: configs for various experiments
robknapen Mar 11, 2026
944814d
Adds RRMSE loss function for crop yield error comparison
robknapen Mar 11, 2026
5c90f4f
Crop Yield use case: Adds Fourier harmonics as engineered location fe…
robknapen Mar 11, 2026
5c40f07
concept captions
vdplasthijs Mar 11, 2026
1a46947
Merge branch 'develop' into feature/butterfly-data
vdplasthijs Mar 11, 2026
7a82a18
Merge pull request #61 from WUR-AI/feature/spatial-splits
vdplasthijs Mar 12, 2026
dd679e0
Merge branch 'develop' into feature/butterfly-data
gabrieletijunaityte Mar 12, 2026
79e3339
Merge pull request #62 from WUR-AI/feature/butterfly-data
vdplasthijs Mar 12, 2026
8782f37
Fix processor to clip to max sequence length of CLIP text encoder
gabrieletijunaityte Mar 12, 2026
9af0d5c
Add setup method to all geo-encoders
gabrieletijunaityte Mar 12, 2026
d85d329
Add setup method and fix devices/dtypes
gabrieletijunaityte Mar 12, 2026
ffdf985
Add setup method and simplify architecture
gabrieletijunaityte Mar 12, 2026
62422ff
Change how trainable parts are reported/printed
gabrieletijunaityte Mar 12, 2026
aea39af
Add setup to prediction heads + docs
gabrieletijunaityte Mar 12, 2026
ca43ec8
pre-commit hook changes
gabrieletijunaityte Mar 12, 2026
afef0b6
Setup method for mlp projector
gabrieletijunaityte Mar 12, 2026
6200a78
Introduce encoder wrapper to remove multi-modal encoder
gabrieletijunaityte Mar 12, 2026
58cf1eb
Merge branch 'develop' of github.com:WUR-AI/aether into feature/encod…
gabrieletijunaityte Mar 12, 2026
94832e0
Fix encoder tests
gabrieletijunaityte Mar 12, 2026
004b803
fix tests
gabrieletijunaityte Mar 12, 2026
0855b4a
fix tests
gabrieletijunaityte Mar 12, 2026
215e603
Fix depth of summary report for modules
gabrieletijunaityte Mar 12, 2026
eb571bc
Crop Yield use case: Reduced MLP projector, equal contribution of spa…
robknapen Mar 12, 2026
e27ecdc
fix value 0 being ignored
gabrieletijunaityte Mar 15, 2026
e6115bf
Add model set up print statements
gabrieletijunaityte Mar 15, 2026
673fc0a
Merge branch 'feature/encoder-wrapper' of github.com:WUR-AI/aether in…
gabrieletijunaityte Mar 15, 2026
dbdd476
Guatemala UC tessera
gabrieletijunaityte Mar 15, 2026
6891a51
Alignment training
gabrieletijunaityte Mar 15, 2026
0c710d9
De-duplicate geotessera requirements
gabrieletijunaityte Mar 19, 2026
9e406a4
Create input and output dimensions as attributes
gabrieletijunaityte Mar 19, 2026
d358066
Fix broken tests
gabrieletijunaityte Mar 23, 2026
4023d96
Merge pull request #63 from WUR-AI/feature/encoder-wrapper
gabrieletijunaityte Mar 23, 2026
6 changes: 6 additions & 0 deletions .env.example
@@ -10,6 +10,12 @@ TRAINER_PROFILE="gpu" # cpu/gpu/mps/ddp
HF_HOME="${PROJECT_ROOT}/.cache/huggingface/" # set or will default to './.cache/huggingface/'
DATA_DIR="${PROJECT_ROOT}/data/" # set to your local data folder (for aether), or will default to '${PROJECT_ROOT}/data/'

+# Base cache directory for TESSERA.
+# GeoTessera registry/metadata is stored here; large raw source tiles go in the
+# raw/ subfolder. This folder can get very large; point it at an external drive
+# if needed.
+TESSERA_EMBEDDINGS_DIR="${PROJECT_ROOT}/data/cache/tessera/"
+
# Working directories
# STORAGE_MODE=# or "shared"
# SHARED_CACHE=# or "/path/to/shared/.cache"
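
A minimal sketch of how code could resolve the TESSERA cache location described in the comment above. The function name `resolve_tessera_dirs` and the fallback logic are assumptions for illustration; only the variable name `TESSERA_EMBEDDINGS_DIR`, its default path, and the `raw/` subfolder come from the diff.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def resolve_tessera_dirs(
    project_root: str, env: Optional[Mapping[str, str]] = None
) -> Tuple[Path, Path]:
    """Return (base_cache_dir, raw_tiles_dir) for TESSERA data.

    Hypothetical helper: falls back to <project_root>/data/cache/tessera
    when TESSERA_EMBEDDINGS_DIR is unset or empty.
    """
    env = os.environ if env is None else env
    default = Path(project_root) / "data" / "cache" / "tessera"
    base = Path(env.get("TESSERA_EMBEDDINGS_DIR") or default)
    # Registry/metadata live in the base folder; raw source tiles in raw/.
    return base, base / "raw"


if __name__ == "__main__":
    base_dir, raw_dir = resolve_tessera_dirs(os.getcwd())
    print(base_dir, raw_dir)
```

Passing `env` explicitly keeps the helper testable without touching the real environment.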
1 change: 1 addition & 0 deletions .gitignore
@@ -228,3 +228,4 @@ notebooks/01-TvdP-tmp.ipynb
*/source/*
*.tif # for now
..env.swp
+/data/yield_africa/
2 changes: 1 addition & 1 deletion configs/callbacks/default.yaml
@@ -19,4 +19,4 @@ early_stopping:
  mode: "min"

model_summary:
-  max_depth: 2
+  max_depth: 1
1 change: 1 addition & 0 deletions configs/data/butterfly_coords_text.yaml
@@ -14,6 +14,7 @@ dataset:
caption_builder:
_target_: src.data.butterfly_caption_builder.ButterflyCaptionBuilder
templates_fname: v3.json
+concepts_fname: v1.json
data_dir: ${paths.data_dir}/s2bms
seed: ${seed}

3 changes: 2 additions & 1 deletion configs/data/butterfly_full_param_example.yaml
@@ -22,7 +22,8 @@ dataset:

caption_builder:
_target_: src.data.butterfly_caption_builder.ButterflyCaptionBuilder
-templates_fname: caption_templates.json
+templates_fname: v3.json
+concepts_fname: v1.json
data_dir: ${paths.data_dir}/s2bms
seed: ${seed}

33 changes: 33 additions & 0 deletions configs/data/yield_africa_all.yaml
@@ -0,0 +1,33 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    coords: {}
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  # Country/year filters: set to a list to restrict, null to include all.
  # countries and years select only the listed values;
  # exclude_countries and exclude_years drop the listed values.
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

# todo - use spatial split (pre-calculate and then load from file)
# - hold out country/year block for validation
# - or leave one country out for validation
# - normalize data by country (after filtering)

split_mode: "random"
train_val_test_split: [0.7, 0.15, 0.15]
save_split: false
seed: ${seed}
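
The include/exclude semantics stated in the config comments can be sketched as a single predicate. This is an illustration under assumptions, not the code in `YieldAfricaDataset`: the function name `keep_record` is hypothetical, and only the four option names and their described behaviour come from the config.

```python
from typing import Optional, Sequence


def keep_record(
    country: str,
    year: int,
    countries: Optional[Sequence[str]] = None,
    years: Optional[Sequence[int]] = None,
    exclude_countries: Optional[Sequence[str]] = None,
    exclude_years: Optional[Sequence[int]] = None,
) -> bool:
    """Return True if a record passes the country/year filters.

    None (YAML null) means "no restriction" for that filter.
    """
    # Include-lists: keep only the listed values.
    if countries is not None and country not in countries:
        return False
    if years is not None and year not in years:
        return False
    # Exclude-lists: drop the listed values.
    if exclude_countries is not None and country in exclude_countries:
        return False
    if exclude_years is not None and year in exclude_years:
        return False
    return True


if __name__ == "__main__":
    records = [("KEN", 2019), ("RWA", 2015), ("ETH", 2021)]
    kept = [r for r in records if keep_record(*r, years=[2019, 2021])]
    print(kept)
```

In this sketch an exclude-list takes effect even when the same value appears in an include-list; whether the dataset resolves such conflicts the same way is not stated in the diff.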
33 changes: 33 additions & 0 deletions configs/data/yield_africa_loco.yaml
@@ -0,0 +1,33 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    coords: {}
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  # Include all countries and years so the split file determines the partition.
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

# Leave-one-country-out split loaded from a pre-generated file.
# Generate split files first:
# python src/data_preprocessing/yield_africa_loco_splits.py --data_dir <data_dir>
#
# Override saved_split_file_name at the command line to change the held-out country:
# python src/train.py experiment=yield_africa_tabular_loco \
# data.saved_split_file_name=split_loco_RWA.pth
split_mode: "from_file"
saved_split_file_name: "split_loco_KEN.pth"
save_split: false
seed: ${seed}
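
A rough sketch of what a leave-one-country-out partition looks like, assuming the held-out country forms the test set and the remaining records are divided into train and validation. The function names, the dictionary keys, and the 15% validation fraction are assumptions; the contents of `yield_africa_loco_splits.py` and the layout of the `.pth` files are not shown in this diff. Only the `split_loco_<COUNTRY>.pth` naming pattern comes from the config.

```python
import random
from typing import Dict, List, Sequence


def loco_split(
    countries: Sequence[str],
    held_out: str,
    val_fraction: float = 0.15,
    seed: int = 12345,
) -> Dict[str, List[int]]:
    """Split record indices so that `held_out` appears only in the test set."""
    test = [i for i, c in enumerate(countries) if c == held_out]
    rest = [i for i, c in enumerate(countries) if c != held_out]
    # Shuffle the remaining records reproducibly, then carve off validation.
    random.Random(seed).shuffle(rest)
    n_val = int(round(val_fraction * len(rest)))
    return {
        "train_indices": sorted(rest[n_val:]),
        "val_indices": sorted(rest[:n_val]),
        "test_indices": test,
    }


def split_file_name(held_out: str) -> str:
    """File name pattern used in the config, e.g. split_loco_KEN.pth."""
    return f"split_loco_{held_out}.pth"


if __name__ == "__main__":
    record_countries = ["KEN", "RWA", "KEN", "ETH", "RWA", "ETH", "BF", "BF"]
    print(split_file_name("KEN"), loco_split(record_countries, "KEN"))
```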
33 changes: 33 additions & 0 deletions configs/data/yield_africa_spatial.yaml
@@ -0,0 +1,33 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    coords: {}
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  # Include all countries and years so the split file determines the partition.
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

# Spatial-cluster split loaded from a pre-generated file.
# Generate split files first (produces 10 km, 25 km, and 50 km variants):
# python src/data_preprocessing/yield_africa_spatial_splits.py --data_dir <data_dir>
#
# Override saved_split_file_name at the command line to change the cluster distance:
# python src/train.py experiment=yield_africa_tabular_spatial \
# data.saved_split_file_name=split_spatial_10km.pth
split_mode: "from_file"
saved_split_file_name: "split_spatial_25km.pth"
save_split: false
seed: ${seed}
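
To illustrate the idea behind a spatial-cluster split: records closer together than a chosen distance are grouped, and whole groups are assigned to train, validation, or test so that near neighbours never straddle a split boundary. The sketch below swaps in a plain single-linkage grouping (union-find over haversine distances) rather than the DBSCAN clustering mentioned in the commit list, and all function names and the assignment rule are assumptions; it is not the logic of `yield_africa_spatial_splits.py`.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_labels(points: Sequence[Tuple[float, float]], max_km: float) -> List[int]:
    """Label points so that any two within max_km share a cluster (O(n^2))."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= max_km:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def spatial_split(points, max_km, fractions=(0.7, 0.15, 0.15), seed=12345) -> Dict[str, List[int]]:
    """Assign whole clusters to train/val/test, filling each up to its target."""
    labels = cluster_labels(points, max_km)
    clusters = sorted(set(labels))
    random.Random(seed).shuffle(clusters)
    split: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for c in clusters:
        members = [i for i, lab in enumerate(labels) if lab == c]
        # First split that is still below its target size; default to train.
        name = next((k for k, f in zip(split, fractions)
                     if len(split[k]) < f * len(points)), "train")
        split[name].extend(members)
    return split


if __name__ == "__main__":
    pts = [(-1.00, 36.80), (-1.05, 36.85), (9.00, 38.70)]
    print(cluster_labels(pts, 25.0), spatial_split(pts, 25.0))
```

The pairwise loop is quadratic and only suitable for small examples; a spatial index or a library clustering routine would be the practical choice for a full dataset.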
31 changes: 31 additions & 0 deletions configs/data/yield_africa_tessera.yaml
@@ -0,0 +1,31 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    tessera:
      # size must match the tile_size used when running the preprocessing script.
      # Default: 9 pixels (set by yield_africa_tessera_preprocess.py --tile_size).
      size: 9
      format: npy
      # year is intentionally omitted: yield_africa fetches per-record year tiles
      # via the preprocessing script rather than a single bulk-year download.
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

split_mode: "random"
train_val_test_split: [0.7, 0.15, 0.15]
save_split: false
seed: ${seed}
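
Because `size` has to agree with the `--tile_size` used during preprocessing, a cheap shape check at load time can surface a mismatch early. The sketch below is a hypothetical guard, not part of the dataset class; it assumes tiles are stored as `(height, width, channels)`, which the diff does not confirm.

```python
from typing import Sequence


def validate_tile_shape(shape: Sequence[int], size: int) -> None:
    """Raise ValueError if a tile's spatial dimensions differ from `size`.

    Assumes a (height, width, channels) layout; adjust if tiles are
    stored channels-first.
    """
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D tile, got shape {tuple(shape)}")
    height, width, _channels = shape
    if (height, width) != (size, size):
        raise ValueError(
            f"tile is {height}x{width} but config expects {size}x{size}; "
            "re-run preprocessing or change modalities.tessera.size"
        )


if __name__ == "__main__":
    # The channel count here is an arbitrary example value.
    validate_tile_shape((9, 9, 128), size=9)
    print("tile shape matches config")
```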
39 changes: 39 additions & 0 deletions configs/data/yield_africa_tessera_loco.yaml
@@ -0,0 +1,39 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    tessera:
      # size must match the tile_size used when running the preprocessing script.
      # Default: 9 pixels (set by yield_africa_tessera_preprocess.py --tile_size).
      size: 9
      format: npy
      # year is intentionally omitted: yield_africa fetches per-record year tiles
      # via the preprocessing script rather than a single bulk-year download.
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  # Include all countries and years so the split file determines the partition.
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

# Leave-one-country-out split loaded from a pre-generated file.
# Generate split files first:
# python src/data_preprocessing/yield_africa_loco_splits.py --data_dir <data_dir>
#
# Override saved_split_file_name at the command line to change the held-out country:
# python src/train.py experiment=yield_africa_tessera_fusion_loco \
# data.saved_split_file_name=split_loco_RWA.pth
split_mode: "from_file"
saved_split_file_name: "split_loco_KEN.pth"
save_split: false
seed: ${seed}
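
Running the full leave-one-country-out evaluation means repeating the override shown in the comments once per country. A small sketch that builds those command lines is given below; the helper name `loco_command` is hypothetical, while the country codes, the experiment name, and the override key are taken from the config above. It only prints the commands and does not launch training.

```python
import shlex
from typing import List

COUNTRIES = ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]


def loco_command(experiment: str, country: str) -> List[str]:
    """Build the train.py invocation that holds out `country`."""
    return [
        "python",
        "src/train.py",
        f"experiment={experiment}",
        f"data.saved_split_file_name=split_loco_{country}.pth",
    ]


if __name__ == "__main__":
    for code in COUNTRIES:
        # shlex.join quotes arguments safely for copy-pasting into a shell.
        print(shlex.join(loco_command("yield_africa_tessera_fusion_loco", code)))
```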
39 changes: 39 additions & 0 deletions configs/data/yield_africa_tessera_spatial.yaml
@@ -0,0 +1,39 @@
_target_: src.data.base_datamodule.BaseDataModule

dataset:
  _target_: src.data.yield_africa_dataset.YieldAfricaDataset
  data_dir: ${paths.data_dir}
  modalities:
    tessera:
      # size must match the tile_size used when running the preprocessing script.
      # Default: 9 pixels (set by yield_africa_tessera_preprocess.py --tile_size).
      size: 9
      format: npy
      # year is intentionally omitted: yield_africa fetches per-record year tiles
      # via the preprocessing script rather than a single bulk-year download.
  use_target_data: true
  use_features: true
  use_aux_data: none
  seed: ${seed}
  cache_dir: ${paths.cache_dir}
  # Include all countries and years so the split file determines the partition.
  countries: ["BF", "BUR", "ETH", "KEN", "MAL", "RWA", "TAN", "ZAM"]
  years: [2014, 2016, 2017, 2018, 2019, 2020, 2021, 2023, 2024]
  exclude_countries: null
  exclude_years: null

batch_size: 64
num_workers: 0
pin_memory: false

# Spatial-cluster split loaded from a pre-generated file.
# Generate split files first (produces 10 km, 25 km, and 50 km variants):
# python src/data_preprocessing/yield_africa_spatial_splits.py --data_dir <data_dir>
#
# Override saved_split_file_name at the command line to change the cluster distance:
# python src/train.py experiment=yield_africa_tessera_fusion_spatial \
# data.saved_split_file_name=split_spatial_10km.pth
split_mode: "from_file"
saved_split_file_name: "split_spatial_25km.pth"
save_split: false
seed: ${seed}
25 changes: 25 additions & 0 deletions configs/experiment/yield_africa_coords_reg.yaml
@@ -0,0 +1,25 @@
# @package _global_
# configs/experiment/yield_africa_coords_reg.yaml
# Variant: coordinates only (GeoClip), full dataset

defaults:
  - override /model: yield_geoclip_reg
  - override /data: yield_africa_all
  - override /metrics: yield_africa_regression

tags: ["yield_africa", "coords_only", "regression"]
seed: 12345

trainer:
  min_epochs: 1
  max_epochs: 150

data:
  batch_size: 64

logger:
  wandb:
    tags: ${tags}
    group: "yield_africa"
  aim:
    experiment: "yield_africa"
33 changes: 33 additions & 0 deletions configs/experiment/yield_africa_fusion_loco.yaml
@@ -0,0 +1,33 @@
# @package _global_
# configs/experiment/yield_africa_fusion_loco.yaml
# GeoClip + tabular fusion model evaluated with leave-one-country-out split.
# Default held-out country: KEN (largest, most representative test set).
#
# Generate split files first:
# python src/data_preprocessing/yield_africa_loco_splits.py --data_dir <data_dir>
#
# To evaluate on a different held-out country:
# python src/train.py experiment=yield_africa_fusion_loco \
# data.saved_split_file_name=split_loco_RWA.pth

defaults:
  - override /model: yield_fusion_reg
  - override /data: yield_africa_loco
  - override /metrics: yield_africa_regression

tags: ["yield_africa", "fusion", "regression", "loco"]
seed: 12345

trainer:
  min_epochs: 1
  max_epochs: 150

data:
  batch_size: 64

logger:
  wandb:
    tags: ${tags}
    group: "yield_africa"
  aim:
    experiment: "yield_africa"
25 changes: 25 additions & 0 deletions configs/experiment/yield_africa_fusion_reg.yaml
@@ -0,0 +1,25 @@
# @package _global_
# configs/experiment/yield_africa_fusion_reg.yaml
# Variant C: GeoClip + tabular fusion

defaults:
  - override /model: yield_fusion_reg
  - override /data: yield_africa_all
  - override /metrics: yield_africa_regression

tags: ["yield_africa", "fusion", "regression"]
seed: 12345

trainer:
  min_epochs: 1
  max_epochs: 150

data:
  batch_size: 64

logger:
  wandb:
    tags: ${tags}
    group: "yield_africa"
  aim:
    experiment: "yield_africa"
33 changes: 33 additions & 0 deletions configs/experiment/yield_africa_fusion_spatial.yaml
@@ -0,0 +1,33 @@
# @package _global_
# configs/experiment/yield_africa_fusion_spatial.yaml
# GeoClip + tabular fusion model evaluated with a spatial-cluster split.
# Default cluster distance: 25 km (split_spatial_25km.pth).
#
# Generate split files first:
# python src/data_preprocessing/yield_africa_spatial_splits.py --data_dir <data_dir>
#
# To evaluate at a different cluster distance:
# python src/train.py experiment=yield_africa_fusion_spatial \
# data.saved_split_file_name=split_spatial_10km.pth

defaults:
  - override /model: yield_fusion_reg
  - override /data: yield_africa_spatial
  - override /metrics: yield_africa_regression

tags: ["yield_africa", "fusion", "regression", "spatial"]
seed: 12345

trainer:
  min_epochs: 1
  max_epochs: 150

data:
  batch_size: 64

logger:
  wandb:
    tags: ${tags}
    group: "yield_africa"
  aim:
    experiment: "yield_africa"
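
Each experiment file above sets only a few keys (`trainer`, `data`, `logger`) on top of the groups chosen in `defaults`. As a simplified mental model, the effect resembles a recursive dictionary merge in which the experiment's values win and untouched keys survive. The sketch below shows that merge with plain dictionaries; it is not Hydra's implementation, and the base values are invented for illustration.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `override` values replace `base` values key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge their contents recursively.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical base config; the real defaults are not part of this diff.
    base = {"trainer": {"min_epochs": 1, "max_epochs": 10, "accelerator": "gpu"},
            "data": {"batch_size": 32}}
    # Values taken from the experiment files above.
    experiment = {"trainer": {"min_epochs": 1, "max_epochs": 150},
                  "data": {"batch_size": 64}}
    print(deep_merge(base, experiment))
```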