VARDiff is a novel vision-guided diffusion framework for uncertainty-aware stock forecasting, combining the complementary strengths of diffusion models and vision-based retrieval.
- Historical time series are transformed into image representations and embedded using a pretrained vision encoder to capture rich spatial features.
- Using cosine similarity matching, we retrieve semantically similar historical patterns that serve as conditional guidance during the diffusion denoising process.
- This retrieval-guided conditioning mechanism enables the model to generate more accurate, contextually informed forecasts while producing well-calibrated predictive distributions that better quantify uncertainty.
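The paper's abstract describes the image representation as a GAF-based visual encoding. As a hedged illustration of that step only (a minimal numpy sketch, not the repository's actual implementation; the function name is hypothetical), a Gramian Angular Summation Field image can be built like this:

```python
import numpy as np

def gasf_image(series: np.ndarray) -> np.ndarray:
    """Encode a 1-D price series as a Gramian Angular Summation Field image.

    Values are min-max rescaled to [-1, 1], mapped to polar angles via
    arccos, and pairwise-summed: G[i, j] = cos(phi_i + phi_j).
    """
    lo, hi = series.min(), series.max()
    x = 2 * (series - lo) / (hi - lo) - 1        # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)                    # guard against rounding error
    phi = np.arccos(x)                           # one angle per time step
    return np.cos(phi[:, None] + phi[None, :])   # L x L GASF image

prices = np.array([10.0, 10.5, 11.2, 10.8, 11.5])
img = gasf_image(prices)
print(img.shape)  # (5, 5)
```

The resulting square image can then be fed to a pretrained vision encoder (the paper uses a VGG backbone) to obtain the semantic embedding used for retrieval.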
Clone the repository:

```bash
git clone https://github.com/AppliedAI-Lab/VARDiff.git
cd VARDiff
```

Install dependencies: we provide a `requirements.yaml` file for a Conda environment configured to run the model:

```bash
conda env create -f requirements.yaml
conda activate VARDiff
```

A quick & visually appealing guide to run the Retrieval → Diffusion pipeline for both univariate and multivariate time series.
📈 Univariate Time Series

```bash
cd retrieval
python univariate_embedding.py \
    --symbol_list <desired_dataset> \
    --his_len_list 20 40 60 80 100 \
    --step_size_list 5 \
    --num_first_layers 4
```

Notes:
• symbol_list → list of datasets/symbols (9 symbols in this paper)
• his_len_list → historical lengths for the benchmark (future length = historical length)
• num_first_layers → number of first layers taken from the pretrained vision encoder
• step_size_list → step sizes (details in Section 6.4 of the paper)
• ⚡ Default: the number of retrieved references is k = 10, since a k = 10 retrieval can be reused for smaller values of k
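Under the hood, retrieval is a cosine-similarity search over the vision-encoder embeddings. A minimal sketch of top-k retrieval (hypothetical names and a random toy database, not the repo's API):

```python
import numpy as np

def retrieve_top_k(query_emb: np.ndarray, db_embs: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k database embeddings most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q                    # cosine similarity against every entry
    return np.argsort(-sims)[:k]     # highest similarity first

rng = np.random.default_rng(0)
db = rng.normal(size=(500, 128))                 # toy database of 500 embeddings
query = db[42] + 0.01 * rng.normal(size=128)     # near-duplicate of entry 42
idx = retrieve_top_k(query, db, k=10)
print(idx[0])  # 42 — the near-duplicate is retrieved first
```

This also shows why retrieving with k = 10 once is enough: results for any smaller k are just a prefix of the sorted index list.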
Or simply use the provided script:

```bash
cd scripts
./retriever.sh
```

📊 Multivariate Time Series (e.g., ETT dataset)
We implement independent feature retrieval:

```bash
cd retrieval
python multivariate_embedding.py \
    --symbol <desired_dataset> \
    --his_len_list 20 40 60 80 100 \
    --step_size_list 5 \
    --num_first_layers 4
```
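"Independent feature retrieval" means each channel of the multivariate window is embedded and matched separately. A hedged sketch of the per-channel loop (the `embed` helper is a z-score stand-in for the pretrained vision encoder, and all names here are hypothetical):

```python
import numpy as np

def retrieve_per_feature(window: np.ndarray, db: np.ndarray, k: int = 10) -> list:
    """Retrieve references independently for each feature/channel.

    window: (L, F) target window; db: (N, L, F) historical windows.
    """
    def embed(x):  # placeholder for the vision-encoder embedding
        return (x - x.mean()) / (x.std() + 1e-8)

    refs = []
    for f in range(window.shape[1]):                  # one search per channel
        q = embed(window[:, f])
        cand = np.stack([embed(db[n, :, f]) for n in range(db.shape[0])])
        q = q / np.linalg.norm(q)
        cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
        refs.append(np.argsort(-(cand @ q))[:k])      # top-k indices per feature
    return refs

rng = np.random.default_rng(1)
db = rng.normal(size=(200, 60, 7))    # toy history; 7 features as in ETT
window = db[17]                       # query window is itself in the database
refs = retrieve_per_feature(window, db, k=5)
print(refs[0][0])  # 17 — each channel retrieves the matching window first
```

Each feature thus gets its own reference set, which is what the diffusion stage consumes as conditioning.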
🔹 Diffusion Process (to generate forecasts)

Works for both univariate & multivariate:

```bash
python run_conditional.py --config ./configs/extrapolation/<desired_dataset>.yaml
```

⚙️ You can run with the default settings or tune the hyperparameters. Or simply use the provided script:

```bash
cd scripts
./diffusion.sh
```

If you find this work useful, please consider citing:
```bibtex
@article{NGUYEN2026123113,
  title = {VARDiff: vision-augmented retrieval-guided diffusion for stock forecasting},
  journal = {Information Sciences},
  pages = {123113},
  year = {2026},
  issn = {0020-0255},
  doi = {10.1016/j.ins.2026.123113},
  url = {https://www.sciencedirect.com/science/article/pii/S0020025526000447},
  author = {Thi-Thu Nguyen and Xuan-Thong Truong and Thai-Binh Nguyen and Nhat-Hai Nguyen},
  keywords = {Diffusion, Image retrieval, Stock forecasting},
  abstract = {Stock price forecasting is a critical yet inherently difficult task in quantitative finance due to the volatile and non-stationary nature of financial time series. While diffusion models have emerged as promising tools for capturing predictive uncertainty, their effectiveness is often limited by insufficient data and the absence of informative guidance during generation. To address these challenges, we propose VARDiff, a diffusion forecasting architecture conditioned on visual-semantic references retrieved from a historical database. Our core novelty is a cross-attention-based denoising network that operates on delay embedding (DE) image representations of time series, fusing the target trajectory with its visually similar historical counterparts retrieved via a GAF-based visual encoding pipeline using a pre-trained VGG backbone to provide structured guidance during iterative denoising. VARDiff transforms historical price sequences into image representations and extracts semantic embeddings using a pre-trained vision encoder. These embeddings facilitate the retrieval of visually similar historical trajectories, which serve as external references to guide the denoising process of the diffusion model. Extensive experiments on nine benchmark stock datasets show that VARDiff reduces forecasting errors by an average of 16.27% (MSE) and 8.12% (MAE) compared to state-of-the-art baselines. The results underscore the effectiveness of integrating vision-based retrieval into diffusion forecasting, leading to more robust and data-efficient financial prediction.}
}
```