📟 PaGeR: Panorama Geometry Estimation using Single-Step Diffusion Models

This project implements PaGeR, a computer vision method for estimating geometry from monocular panoramic images in equirectangular projection (ERP), introduced in the paper Panorama Geometry Estimation using Single-Step Diffusion Models.


Team: Vukasin Bozic, Isidora Slavkovic, Dominik Narnhofer, Nando Metzger, Denis Rozumny, Konrad Schindler, Nikolai Kalischek

We present PaGeR, a diffusion-based model for panoramic geometry reconstruction that extends monocular depth estimation to full 360° scenes. PaGeR is a one-step diffusion model trained directly in pixel space, capable of predicting high-resolution panoramic depth and surface normals with strong generalization to unseen environments. Leveraging advances in panorama generation and diffusion fine-tuning, PaGeR is trained on PanoInfinigen, a newly introduced synthetic dataset of indoor and outdoor scenes with metric depth and normals, producing coherent, metrically accurate geometry. It outperforms prior approaches across standard, few-shot, and zero-shot scenarios.

📢 News

05-02-2026: Released the full training, inference, and evaluation code, together with the arXiv paper, an interactive demo, and model checkpoints for depth, metric depth, and normals. The full dataset release is coming soon.

🚀 Usage

There are several ways to interact with PaGeR:

A quick start is to use our HF-hosted demo:
Run the demo locally (requires a GPU with 24 GB of VRAM); see the instructions below.
Some interactive examples are also available at our project page:
Finally, local development instructions with this codebase are given below.

🛠️ Setup

The code was tested on:

Debian GNU/Linux 12, Python 3.10.16, PyTorch 2.2.0, and CUDA 12.1.

📦 Repository

Clone the repository (requires git):

git clone https://github.com/prs-eth/PaGeR.git
cd PaGeR

💻 Dependencies

Create the Conda environment and install the dependencies:

conda env create -f environment.yml

🏁 Prepare the checkpoints

The model checkpoints are hosted on Hugging Face:

Depth: prs-eth/PaGeR-depth
Metric Depth: prs-eth/PaGeR-metric-depth
Normals: prs-eth/PaGeR-normals

Models specialized for indoor scenes are also available:

Depth Indoor: prs-eth/PaGeR-depth-indoor
Metric Depth Indoor: prs-eth/PaGeR-metric-depth-indoor

A surface normals estimation model fine-tuned on Structured3D after pretraining is also available:

Normals-Structured3D: prs-eth/PaGeR-normals-Structured3D

You can either download them automatically by specifying the HF checkpoint name in the arguments, or download them manually and load from a local path. If you choose the latter, please preserve the original folder structure, as in the Hugging Face repository.
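
The two loading options above can be sketched as a small helper. This is a minimal illustration, not the repository's own loading code: `resolve_checkpoint` is a hypothetical name, and it assumes the `huggingface_hub` package is installed when a repo ID is passed. `snapshot_download` fetches the whole repository, so the original folder structure is preserved.

```python
import os


def resolve_checkpoint(checkpoint: str) -> str:
    """Return a local directory for a checkpoint given a local path or a Hub repo ID."""
    if os.path.isdir(checkpoint):
        # Local folder: use it as-is, keeping the original layout.
        return os.path.abspath(checkpoint)
    # Otherwise treat the string as a Hugging Face repo ID and fetch a full snapshot.
    # Imported lazily so the local-path branch works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=checkpoint)
```

For example, `resolve_checkpoint("prs-eth/PaGeR-depth")` would download the depth checkpoint into the local Hugging Face cache and return its path.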

📥 Download the datasets

For training, testing, or evaluation, you need to download one or more of the following datasets:

For download instructions, terms of use, and dataset descriptions, please refer to the webpages of the respective datasets. We provide dataloaders for all of these datasets; you only need to select the dataset in the config file or through the --dataset command-line argument.

📷 Local Gradio Demo

The easiest way to test PaGeR locally is to run the Gradio demo. Make sure you have installed the dependencies as described above, then run:

python app.py --enable_xformers

Now you can test the model, explore interactive 3D visualizations on both provided examples and your own images, or download the results.

🔧 Configuration settings

We use OmegaConf and argparse for configuration management in all our scripts and models. Each parameter can be set either in the config file or directly on the command line; when both are given, the command-line value takes precedence. Note that the model loading parameters are always read from the YAML config file stored alongside the model checkpoint and are not overridden by the local config or CLI arguments. Feel free to set up your own configuration files; a template is provided in configs/base.yaml.
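
The precedence rule described above can be illustrated with a small standard-library sketch. It deliberately uses a plain dict in place of OmegaConf and is not the repository's implementation; `merge_config` and the two example arguments are chosen only for illustration.

```python
import argparse


def merge_config(file_cfg: dict, cli_args: argparse.Namespace) -> dict:
    """Overlay CLI values on top of config-file values; the CLI wins when given."""
    merged = dict(file_cfg)
    for key, value in vars(cli_args).items():
        if value is not None:  # None means "not passed on the command line"
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
# default=None lets us tell "not provided" apart from an explicit value.
parser.add_argument("--dataset", default=None)
parser.add_argument("--batch_size", type=int, default=None)

file_cfg = {"dataset": "PanoInfinigen", "batch_size": 4}  # stand-in for a parsed YAML file
cli_args = parser.parse_args(["--batch_size", "2"])
config = merge_config(file_cfg, cli_args)
print(config)  # {'dataset': 'PanoInfinigen', 'batch_size': 2}
```

Using `None` as the argparse default is the design choice that matters here: a non-`None` default would silently override the value from the config file even when the user passed nothing.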

🚀 Run inference

To run the models in the regular inference regime:

# Depth
python inference.py \
    --config "path/to/config" \
    --checkpoint_path "path/to/checkpoint" \
    --enable_xformers \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --results_path "path/to/save/results" \
    --pred_only

⚙️ Inference settings

The behavior of the code can be customized in the following ways:

Argument	Description
`config`	Path to the YAML configuration file.
`checkpoint_path`	Model checkpoint to load (local path or HuggingFace repo ID).
`results_path`	Output directory where predictions are saved.
`dataset`	Dataset to use (list given above).
`data_path`	Root directory of the dataset.
`scenes`	Scene type to use: `indoor`, `outdoor`, or `both` (if supported).
`img_report_frequency`	Save an example output image every N samples.
`pred_only`	Save only the prediction image (otherwise saves an RGB + prediction mosaic).
`generate_eval`	Save predictions as `.npz` files for later evaluation or Point Cloud Generation.
`enable_xformers`	Enable memory-efficient attention (recommended).

🧊 Point Cloud Generation

Once the inference results are generated, you can also build a 3D point cloud colored by either RGB or surface normals:

python generate_point_cloud.py \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --color_modality "rgb-or-normals" \
    --depth_path "path/to/depth/predictions" \
    --normals_path "path/to/normals/predictions"

Note that inference must first be run with generate_eval enabled, since this script loads the raw predictions from the eval folder that is written for evaluation.
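
For intuition, turning an ERP depth map into a point cloud amounts to scaling each pixel's viewing direction on the sphere by its depth. The sketch below shows the idea in pure Python. It is not the code in generate_point_cloud.py, and the axis convention (y up, z forward at the image center) and the interpretation of depth as distance along the ray are assumptions.

```python
import math


def erp_pixel_to_point(u, v, depth, width, height):
    """Unproject one equirectangular pixel with ray distance `depth` to a 3D point."""
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi    # [-pi, pi), 0 at the image center
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi   # +pi/2 at the top, -pi/2 at the bottom
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return x, y, z


def erp_depth_to_points(depth_map):
    """Convert a 2D list of depths into a list of 3D points, skipping invalid (<= 0) pixels."""
    height, width = len(depth_map), len(depth_map[0])
    return [erp_pixel_to_point(u, v, depth_map[v][u], width, height)
            for v in range(height) for u in range(width) if depth_map[v][u] > 0]
```

Coloring the cloud then only requires attaching the RGB value or the predicted normal of the same pixel to each point.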

📊 Run Evaluation

To evaluate depth predictions from our model (or any other) with the standard set of depth estimation metrics, run:

# Depth
python evaluation/depth_evaluation.py \
    --pred_path "path/to/preds/folder" \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice" \
    --alignment_type "alignment-type-to-apply" \
    --save_error_maps

Surface normals predictions can be evaluated, following the protocol of the PanoNormal paper, by running:

# Normals
python evaluation/normals_estimation.py \
    --pred_path "path/to/preds/folder" \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice"
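
As a rough illustration of what a normals evaluation typically reports, the sketch below computes the mean and median angular error and the fraction of pixels below a few angle thresholds. The thresholds and the function names are assumptions, and the repository's script may compute a different set of metrics.

```python
import math


def angular_error_deg(pred, gt):
    """Angle in degrees between two 3D vectors (normalized internally)."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp for numerical safety
    return math.degrees(math.acos(cos))


def normals_metrics(preds, gts, thresholds=(11.25, 22.5, 30.0)):
    """Mean/median angular error and the fraction of samples under each threshold."""
    errors = sorted(angular_error_deg(p, g) for p, g in zip(preds, gts))
    n = len(errors)
    median = errors[n // 2] if n % 2 else 0.5 * (errors[n // 2 - 1] + errors[n // 2])
    metrics = {"mean": sum(errors) / n, "median": median}
    for t in thresholds:
        metrics[f"within_{t}"] = sum(e < t for e in errors) / n
    return metrics
```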

Finally, edge sharpness evaluation is run as:

# Edges
python evaluation/edge_estimation.py \
    --pred_path "path/to/preds/folder" \
    --data_path "path/to/dataset" \
    --dataset "dataset-choice"

Evaluation Settings

The behavior of the code can be customized in the following ways:

Argument	Description
`data_path`	Root directory of the dataset.
`dataset`	Dataset to use (list given above).
`pred_path`	Directory containing the predicted depth maps to be evaluated.
`alignment_type`	Alignment strategy applied between prediction and ground truth before evaluation.
`save_error_maps`	If set, saves per-sample error maps during evaluation.
`error_maps_saving_frequency`	Frequency (in number of batches) at which error maps are saved.
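
To make the role of alignment_type concrete, here is a minimal sketch of one common strategy, least-squares scale-and-shift alignment, followed by three widely used depth metrics. Which alignment types and metrics the evaluation script actually supports is not stated above, so treat the choices below as assumptions.

```python
def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t such that s * pred + t approximates gt."""
    n = len(pred)
    mean_p, mean_g = sum(pred) / n, sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    scale = cov / var_p if var_p > 0 else 1.0
    shift = mean_g - scale * mean_p
    return [scale * p + shift for p in pred]


def depth_metrics(pred, gt):
    """AbsRel, RMSE and delta1 over pixels where both depths are positive."""
    # Non-positive values are skipped to avoid division by zero in the ratios.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = (sum((p - g) ** 2 for p, g in pairs) / n) ** 0.5
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

A relative-depth prediction would be passed through `align_scale_shift` before `depth_metrics`, whereas a metric-depth prediction would normally be evaluated without alignment.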

🏋🏻 Run training

Training for both the depth and surface normals models is run from a single script, for example:

python train.py \
    --config "path/to/config" \
    --modality "depth" \
    --enable_xformers \
    --data_path "path/to/dataset" \
    --dataset "PanoInfinigen" \
    --log_scale \
    ...

Note again that CLI arguments override the values given in the config file.

Training settings

Here we provide an exhaustive list of training arguments, each with a short description:

Global Settings

Argument	Description
`debug`	Use a small subset of the dataset; useful for quick debugging.
`seed`	A seed for reproducible training.
`enable_xformers`	Enable memory-efficient attention (recommended).

Training Configuration

Argument	Description
`num_train_epochs`	Total number of training epochs to perform.
`max_train_steps`	Total number of training steps (overrides `num_train_epochs`).
`gradient_accumulation_steps`	Number of steps over which gradients are accumulated before each optimizer update.
`only_train_attention_layers`	Train only the attention parameters of the UNet model.
`gradient_checkpointing`	Enable to save memory (slower backward pass).
`resume_path`	Training checkpoint to resume from (expects an `Accelerator` folder).
`use_EMA`	Enable Exponential Moving Average (EMA) for model weights.
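
For readers unfamiliar with the use_EMA option, the update rule behind an exponential moving average of the weights is short enough to sketch. The decay value and the dict-of-floats representation are illustrative assumptions; the training script presumably operates on model tensors.

```python
def ema_update(ema_weights, model_weights, decay=0.999):
    """In-place EMA step: ema <- decay * ema + (1 - decay) * model."""
    for name, value in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights
```

The step is applied after every optimizer update, and the averaged weights are the ones typically used for validation and export.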

Model Configuration

Argument	Description
`modality`	Modality to use for training: `depth` or `normals`.
`pretrained_path`	Path to pretrained model or HuggingFace repo ID.
`checkpoint_path`	UNet checkpoint to load (loads `.safetensors` weights only).
`unet_positional_encoding`	Type of positional encoding: `uv`, `RoPE`, or `none`.
`vae_use_RoPE`	Whether or not to use RoPE positional encoding in the VAE.
`metric_depth`	Use metric depth instead of relative depth. Depth only.
`log_scale`	Use log scale depth instead of linear. Depth only.
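
The idea behind log_scale can be illustrated as follows: depth is normalized in log space, so that nearby structure gets more of the value range than distant structure. The depth range and the [-1, 1] target interval below are assumptions chosen for the sketch, not values taken from the repository.

```python
import math


def encode_log_depth(depth, d_min=0.1, d_max=100.0):
    """Map depth in [d_min, d_max] to [-1, 1] in log space (range is an assumption)."""
    depth = min(max(depth, d_min), d_max)  # clamp to the supported range
    unit = (math.log(depth) - math.log(d_min)) / (math.log(d_max) - math.log(d_min))
    return 2.0 * unit - 1.0


def decode_log_depth(value, d_min=0.1, d_max=100.0):
    """Inverse of encode_log_depth for values in [-1, 1]."""
    unit = (value + 1.0) / 2.0
    return math.exp(unit * (math.log(d_max) - math.log(d_min)) + math.log(d_min))
```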

Data Configuration

Argument	Description
`data_path`	Root directory of the training dataset.
`dataset`	Dataset selection (e.g., `PanoInfinigen`, `Matterport3D360`).
`scenes`	Scene type to use: `indoor`, `outdoor`, or `both`.
`batch_size`	Training batch size per device.
`use_data_augmentation`	Enable data augmentation (horizontal random rotation).
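
On an ERP panorama, a horizontal rotation of the camera corresponds to a circular shift of the image columns, which is why it makes a convenient augmentation. The sketch below shows this on a nested-list image; it is an illustration of the principle rather than the repository's dataloader code, and the function names are hypothetical.

```python
import random


def rotate_erp_horizontally(image, shift):
    """Circularly shift every row by `shift` columns (a rotation about the vertical axis)."""
    width = len(image[0])
    shift %= width
    return [row[-shift:] + row[:-shift] if shift else list(row) for row in image]


def random_horizontal_rotation(image, rng=random):
    """Apply a uniformly random column shift."""
    return rotate_erp_horizontally(image, rng.randrange(len(image[0])))
```

Depth values are unchanged by the shift and only move with their pixels. Normal vectors expressed in the camera frame would additionally have to be rotated about the vertical axis by the same angle.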

Optimization

Argument	Description
`learning_rate`	Initial learning rate.
`lr_exp_warmup_steps`	Ratio of steps for exponential LR warmup (e.g., 0.03 = 3%).
`adam_beta1` / `beta2`	Beta parameters for the Adam optimizer.
`adam_weight_decay`	Weight decay to use for optimization.
`adam_epsilon`	Epsilon value for the Adam optimizer.
`clip_grad_norm`	Enable gradient clipping.
`max_grad_norm`	Max gradient norm threshold.
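
Since lr_exp_warmup_steps is given as a ratio, the number of warmup steps depends on the total number of training steps. The sketch below shows one possible exponential warmup, a geometric ramp from a small fraction of the learning rate up to its full value. The exact schedule used by the training script is not specified above, so the formula and the `start_factor` parameter are assumptions.

```python
def warmup_lr(step, total_steps, base_lr, warmup_ratio=0.03, start_factor=0.01):
    """Learning rate at `step`, ramping geometrically from start_factor * base_lr to base_lr."""
    warmup_steps = max(1, int(round(warmup_ratio * total_steps)))
    if step >= warmup_steps:
        return base_lr
    progress = step / warmup_steps  # 0 at the first step, approaching 1 at the end of warmup
    return base_lr * start_factor ** (1.0 - progress)
```

With 1000 total steps and a ratio of 0.03, warmup covers the first 30 steps, after which the learning rate stays at `base_lr` in this sketch.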

Loss Weights (depth training only)

Argument	Description
`l1_loss_weight`	Weight for the L1 loss term.
`grad_loss_weight`	Weight for the gradient loss term.
`normals_consistency_loss_weight`	Weight for the normals consistency loss.
`invalid_mask_weight`	Weight for the invalid mask loss.
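
The loss weights combine the individual terms into a single training objective as a weighted sum. The sketch below shows this for the L1 and gradient terms on a single image row; the term definitions, the default weights, and the omission of the normals consistency and invalid mask terms are simplifying assumptions.

```python
def l1_loss(pred, gt, valid):
    """Mean absolute error over valid samples."""
    terms = [abs(p - g) for p, g, m in zip(pred, gt, valid) if m]
    return sum(terms) / max(1, len(terms))


def gradient_loss(pred, gt, valid):
    """Mean absolute difference between finite differences of neighbouring valid samples."""
    terms = [abs((pred[i + 1] - pred[i]) - (gt[i + 1] - gt[i]))
             for i in range(len(pred) - 1) if valid[i] and valid[i + 1]]
    return sum(terms) / max(1, len(terms))


def total_loss(pred, gt, valid, l1_loss_weight=1.0, grad_loss_weight=0.5):
    """Weighted sum of the two terms; the weights here are illustrative defaults."""
    return (l1_loss_weight * l1_loss(pred, gt, valid)
            + grad_loss_weight * gradient_loss(pred, gt, valid))
```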

Validation & Logging

Argument	Description
`run_validation`	Whether to use the full validation set.
`run_tiny_validation`	Whether to use a smaller validation set for mid-training checks.
`tiny_val_frequency`	Frequency for running the tiny validation (in steps).
`tracker_project_name`	Project name for the experiment tracker.
`save_path`	Directory where predictions and checkpoints are saved.
`save_frequency`	Save the model every X epochs.
`loss_report_frequency`	How often to report loss (in steps).
`img_report_frequency`	How often to report/save image examples (in steps).
`report_to`	Logging backend: `tensorboard` or `wandb`.
`run_name`	Name for the WandB run.

Resuming training

In addition to the regular model checkpoints, a full Accelerate checkpoint is saved in the training_checkpoint subfolder of the checkpoint folder. To continue training from it, pass that folder through the resume_path parameter.

✏️ Contributing

Please refer to the instructions in CONTRIBUTING.md.

🎓 Citation

Please cite our paper:

Put citations here

🎫 License

The code of this work is licensed under the Apache License, Version 2.0 (as defined in the LICENSE file).

The models are licensed under the RAIL++-M License (as defined in the LICENSE-MODEL file).

By downloading and using the code and models, you agree to the terms in LICENSE and LICENSE-MODEL, respectively.

Acknowledgements

This project builds upon and is inspired by the following repositories and works:

Marigold-e2e-ft, based on the paper Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think.
Marigold, based on the paper Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation.

We thank the authors and maintainers for making their code publicly available.
