RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation
This repository contains the official implementation of RAG-GFM, proposed in the paper RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation (WWW 2026).
RAG-GFM addresses the fundamental in-memory bottleneck of Graph Foundation Models (GFMs) by externalizing graph knowledge into a unified retrieval system. Instead of compressing heterogeneous semantic and structural knowledge into model parameters, RAG-GFM leverages retrieval-augmented generation to enable scalable, interpretable, and efficient cross-domain graph learning.
- **Dual-Modal Knowledge Externalization**
  - Semantic Store: Prefix-structured node texts stored in a vector database for controllable semantic retrieval.
  - Structural Store: Centrality-based graph motifs encoded via Walk-Spectrum Encoding (WSE) to capture higher-order structural patterns.
- **Cross-View Knowledge Alignment**
  - Self-supervised alignment between semantic and structural views during multi-domain pre-training to learn transferable priors.
- **In-Context Retrieval Augmentation**
  - Retrieved texts and motifs are injected as contextual evidence during few-shot adaptation, enabling efficient downstream learning without updating backbone parameters.
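As a rough illustration of the in-context augmentation step (all names here are illustrative, not the repository's actual API): retrieved texts and motif summaries are concatenated into the input context alongside the query, so the frozen backbone conditions on external evidence rather than parameter-stored knowledge.

```python
# Hedged sketch: compose retrieved evidence into an in-context prompt.
# build_context is a hypothetical helper, not part of this repository.
def build_context(query_text, retrieved_texts, retrieved_motifs):
    """Concatenate retrieved texts and motif summaries before the query."""
    evidence = "\n".join(
        [f"[text] {t}" for t in retrieved_texts]
        + [f"[motif] {m}" for m in retrieved_motifs]
    )
    return f"{evidence}\n[query] {query_text}"

ctx = build_context(
    "Paper about graph neural networks",
    retrieved_texts=["Neighboring paper on GNN pre-training"],
    retrieved_motifs=["triangle-rich ego-graph around a hub node"],
)
```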
Install dependencies:

    pip install -r requirements.txt

We recommend Python ≥ 3.9 and a CUDA-enabled environment for efficient training and retrieval.
The overall workflow follows three stages:
(1) Knowledge Externalization → (2) Cross-View Pre-training → (3) Retrieval-Augmented Adaptation

    python build_nano_db.py

Creates a semantic vector database from prefix-structured node texts using dense embeddings.
This database supports top-k semantic retrieval during both pre-training and fine-tuning.
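To make the semantic store concrete, here is a minimal sketch of a flat in-memory vector index with cosine top-k search. The `NanoVectorDB` class and random toy vectors are illustrative only; the actual `build_nano_db.py` uses learned text embeddings and its own storage format.

```python
import numpy as np

class NanoVectorDB:
    """Illustrative flat vector store with cosine top-k retrieval."""

    def __init__(self, dim: int):
        self.vecs = np.empty((0, dim), dtype=np.float32)
        self.texts: list[str] = []

    def add(self, vec: np.ndarray, text: str) -> None:
        v = vec / (np.linalg.norm(vec) + 1e-12)  # L2-normalize once at insert
        self.vecs = np.vstack([self.vecs, v.astype(np.float32)])
        self.texts.append(text)

    def topk(self, query: np.ndarray, k: int = 3) -> list[str]:
        q = query / (np.linalg.norm(query) + 1e-12)
        sims = self.vecs @ q                      # cosine similarity scores
        idx = np.argsort(-sims)[:k]               # indices of k best matches
        return [self.texts[i] for i in idx]

db = NanoVectorDB(dim=4)
db.add(np.array([1.0, 0.0, 0.0, 0.0]), "[domain: citation] node 0 text")
db.add(np.array([0.0, 1.0, 0.0, 0.0]), "[domain: citation] node 1 text")
db.add(np.array([0.9, 0.1, 0.0, 0.0]), "[domain: citation] node 2 text")
hits = db.topk(np.array([1.0, 0.05, 0.0, 0.0]), k=2)
```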

    python train_all_motif_finders.py

Trains motif encoders based on Walk-Spectrum Encoding (WSE) to identify structurally important nodes and subgraphs.
These motifs form the basis of the structural retrieval store.
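One plausible reading of a walk-based structural encoding (this is a hedged sketch, not the paper's exact WSE formulation): describe each node by the return probabilities of random walks of lengths 1..K, i.e. the diagonals of powers of the row-normalized adjacency matrix.

```python
import numpy as np

def walk_spectrum_encoding(adj: np.ndarray, k_max: int = 4) -> np.ndarray:
    """Per-node structural features from k-step random-walk return probabilities."""
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.maximum(deg, 1)       # row-stochastic transition matrix
    feats = []
    Pk = np.eye(adj.shape[0])
    for _ in range(k_max):
        Pk = Pk @ P                     # P^k
        feats.append(np.diag(Pk))       # probability of returning in k steps
    return np.stack(feats, axis=1)      # shape: (n_nodes, k_max)

# Triangle graph: all three nodes are structurally identical.
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)
wse = walk_spectrum_encoding(adj, k_max=3)
```

Structurally equivalent nodes receive identical encodings, which is what makes such features transferable across graphs from different domains.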

    python build_all_motif_dbs.py

Constructs motif-level vector databases from trained motif finders, enabling efficient retrieval of transferable structural patterns.

    python execute_cora.py

Evaluates RAG-GFM on node classification under few-shot and cross-domain (LODO) settings using the Cora dataset.

    python execute_graph_cora.py

Evaluates graph-level classification by reformulating node-centered ego-graphs as graph instances.
- **Tasks**
  - Few-shot Node Classification
  - Few-shot Graph Classification
- **Evaluation Protocols**
  - Leave-One-Dataset-Out (LODO-dataset)
  - Leave-One-Domain-Out (LODO-domain)
- **Domains**
  - Citation Networks (Cora, CiteSeer, PubMed)
  - E-Commerce Graphs (Ogbn-Products)
  - Web Link Graphs (Wiki-CS)
Download the required datasets, then place all of them in the project root directory before running experiments.
If you find this work useful, please cite:
@inproceedings{yuan2026rag,
author = {Haonan Yuan and Qingyun Sun and Jiacheng Tao and Xingcheng Fu and Jianxin Li},
title = {RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation},
booktitle = {Proceedings of the ACM Web Conference 2026 (WWW '26)},
year = {2026},
publisher = {ACM},
address = {New York, NY, USA},
doi = {10.1145/3774904.3792139},
url = {https://doi.org/10.1145/3774904.3792139}
}
For questions or discussions, please contact Haonan Yuan or open an issue in this repository.
Enjoy exploring retrieval-augmented graph foundation models! 🚀