This repository is the official implementation of "RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation" accepted by WWW 2026.


RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation

This repository contains the official implementation of RAG-GFM, proposed in the paper RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation (WWW 2026).

RAG-GFM addresses the fundamental in-memory bottleneck of Graph Foundation Models (GFMs) by externalizing graph knowledge into a unified retrieval system. Instead of compressing heterogeneous semantic and structural knowledge into model parameters, RAG-GFM leverages retrieval-augmented generation to enable scalable, interpretable, and efficient cross-domain graph learning.


🔍 Key Ideas

  • Dual-Modal Knowledge Externalization

    • Semantic Store: Prefix-structured node texts stored in a vector database for controllable semantic retrieval.
    • Structural Store: Centrality-based graph motifs encoded via Walk-Spectrum Encoding (WSE) to capture higher-order structural patterns.
  • Cross-View Knowledge Alignment

    • Self-supervised alignment between semantic and structural views during multi-domain pre-training to learn transferable priors.
  • In-Context Retrieval Augmentation

    • Retrieved texts and motifs are injected as contextual evidence during few-shot adaptation, enabling efficient downstream learning without updating backbone parameters.
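The cross-view alignment idea can be illustrated with a minimal, self-contained sketch: an InfoNCE-style contrastive loss that pulls each node's semantic embedding toward its own structural embedding and away from the others. This is not the paper's exact objective; `info_nce`, `sem`, `struc`, and `temperature` are illustrative names, and the real implementation lives in the training scripts of this repo.

```python
import numpy as np

def info_nce(sem, struc, temperature=0.1):
    """InfoNCE-style loss: row i of `sem` should match row i of `struc`."""
    # Normalize both views so the dot product is a cosine similarity
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    struc = struc / np.linalg.norm(struc, axis=1, keepdims=True)
    logits = (sem @ struc.T) / temperature          # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # The matching (diagonal) pair is the positive; everything else is negative
    diag = probs[np.arange(len(sem)), np.arange(len(sem))]
    return float(-np.log(diag).mean())
```

Perfectly aligned views yield a much lower loss than mismatched ones, which is the training signal that makes the two stores mutually transferable.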

📦 Environment Setup

Install dependencies:

pip install -r requirements.txt

We recommend using Python ≥ 3.9 and a CUDA-enabled environment for efficient training and retrieval.


🚀 Running Pipeline

The overall workflow follows three stages:
(1) Knowledge Externalization → (2) Cross-View Pre-training → (3) Retrieval-Augmented Adaptation

1. Build Semantic (Text) Vector Database

python build_nano_db.py

Creates a semantic vector database from prefix-structured node texts using dense embeddings.
This database supports top-k semantic retrieval during both pre-training and fine-tuning.
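For intuition, top-k semantic retrieval over prefix-structured texts can be sketched as below. The real `build_nano_db.py` uses dense embeddings; here a toy hashed bag-of-words encoder stands in for the text encoder, and `NanoDB`, `embed`, and the prefix format shown are illustrative, not the repo's actual API.

```python
import zlib
import numpy as np

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding (stand-in for a dense text encoder)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class NanoDB:
    """Minimal in-memory vector store with cosine top-k retrieval."""
    def __init__(self, dim=64):
        self.dim, self.texts, self.vecs = dim, [], []

    def add(self, text):
        self.texts.append(text)
        self.vecs.append(embed(text, self.dim))

    def topk(self, query, k=2):
        sims = np.stack(self.vecs) @ embed(query, self.dim)
        idx = np.argsort(-sims)[:k]
        return [(self.texts[i], float(sims[i])) for i in idx]
```

A query then returns the most semantically similar node texts, which become contextual evidence downstream.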


2. Train Graph Motif Finders

python train_all_motif_finders.py

Trains motif encoders based on Walk-Spectrum Encoding (WSE) to identify structurally important nodes and subgraphs.
These motifs form the basis of the structural retrieval store.
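To convey the flavor of walk-based structural encoding, here is a minimal sketch that uses k-step random-walk return probabilities as per-node structural features. The paper's actual Walk-Spectrum Encoding differs in its details; `walk_spectrum` and its `steps` parameter are illustrative names for this stand-in.

```python
import numpy as np

def walk_spectrum(adj, steps=4):
    """Per-node return-probability features from k-step random walks.

    An illustrative stand-in for Walk-Spectrum Encoding: the diagonal of
    the k-step transition matrix captures higher-order local structure.
    """
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.where(deg == 0, 1, deg)   # row-stochastic transition matrix
    feats, Pk = [], np.eye(len(adj))
    for _ in range(steps):
        Pk = Pk @ P
        feats.append(np.diag(Pk))          # probability of returning home
    return np.stack(feats, axis=1)         # shape: (n_nodes, steps)
```

On a triangle, for example, every node has a 1-step return probability of 0 and a 2-step return probability of 0.5, so the encoding distinguishes cyclic from tree-like neighborhoods.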


3. Build Structural Motif Databases

python build_all_motif_dbs.py

Constructs motif-level vector databases from trained motif finders, enabling efficient retrieval of transferable structural patterns.
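A motif database of this kind reduces to two operations: pooling per-node structural features into a single motif-level key, and nearest-neighbor lookup over those keys. The sketch below assumes unit-norm keys and mean pooling; `motif_vector` and `nearest_motifs` are illustrative names, not the repo's API.

```python
import numpy as np

def motif_vector(node_feats):
    """Mean-pool per-node structural features into one motif-level key."""
    v = np.asarray(node_feats).mean(axis=0)
    n = np.linalg.norm(v)
    return v / n if n else v

def nearest_motifs(db_keys, query_key, k=1):
    """Cosine retrieval over unit-norm motif keys."""
    sims = np.stack(db_keys) @ query_key
    return np.argsort(-sims)[:k].tolist()
```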


4. Run Node Classification (Few-Shot)

python execute_cora.py

Evaluates RAG-GFM on node classification under few-shot and cross-domain (LODO) settings using the Cora dataset.
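The few-shot protocol itself is standard N-way K-shot episodic evaluation. The sketch below samples one episode and classifies query nodes by nearest class prototype in feature space; this is a generic prototype-based baseline, not RAG-GFM's adaptation mechanism, and all names (`few_shot_episode`, `n_way`, `k_shot`) are illustrative.

```python
import numpy as np

def few_shot_episode(feats, labels, n_way=2, k_shot=3, seed=0):
    """Sample one N-way K-shot episode and score a prototype classifier."""
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(labels), n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        support.append(idx[:k_shot])   # k labeled examples per class
        query.append(idx[k_shot:])     # the rest are evaluated
    # Class prototype = mean support embedding; predict the nearest prototype
    protos = np.stack([feats[s].mean(axis=0) for s in support])
    q_idx = np.concatenate(query)
    dists = ((feats[q_idx, None] - protos[None]) ** 2).sum(-1)
    pred = classes[np.argmin(dists, axis=1)]
    return float((pred == labels[q_idx]).mean())
```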


5. Run Graph Classification (Few-Shot)

python execute_graph_cora.py

Evaluates graph-level classification by treating node-centered ego-graphs as standalone graph instances.
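Extracting a node-centered ego-graph is a plain breadth-first expansion out to a fixed hop count. A minimal sketch, assuming an adjacency-list representation (the repo's actual extraction code may differ):

```python
from collections import deque

def ego_graph(adj_list, center, hops=2):
    """BFS out to `hops` steps; returns the node set of the induced ego-graph."""
    seen, frontier = {center}, deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue                      # stop expanding at the hop limit
        for nb in adj_list[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen
```

Each such node set, together with its induced edges, is then treated as one graph instance for classification.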


📊 Supported Tasks & Settings

  • Tasks

    • Few-shot Node Classification
    • Few-shot Graph Classification
  • Evaluation Protocols

    • Leave-One-Dataset-Out (LODO-dataset)
    • Leave-One-Domain-Out (LODO-domain)
  • Domains

    • Citation Networks (Cora, CiteSeer, PubMed)
    • E-Commerce Graphs (Ogbn-Products)
    • Web Link Graphs (Wiki-CS)
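The leave-one-out protocols above reduce to a simple split generator: each dataset (or domain) is held out in turn for few-shot adaptation while the rest form the pre-training pool. A minimal sketch, with `lodo_splits` as an illustrative name:

```python
def lodo_splits(datasets):
    """Leave-One-Dataset-Out: yield (pre-training pool, held-out target) pairs."""
    for held_out in datasets:
        train = [d for d in datasets if d != held_out]
        yield train, held_out
```

The domain-level variant works the same way with groups of datasets (e.g. all citation networks) held out together.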

📂 Dataset Download

Download the required datasets from the following links:

After downloading, place all datasets in the project root directory before running experiments.


📖 Citation

If you find this work useful, please cite:

@inproceedings{yuan2026rag,
  author    = {Haonan Yuan and Qingyun Sun and Jiacheng Tao and Xingcheng Fu and Jianxin Li},
  title     = {RAG-GFM: Overcoming In-Memory Bottlenecks in Graph Foundation Models via Retrieval-Augmented Generation},
  booktitle = {Proceedings of the ACM Web Conference 2026 (WWW '26)},
  year      = {2026},
  publisher = {ACM},
  address   = {New York, NY, USA},
  doi       = {10.1145/3774904.3792139},
  url       = {https://doi.org/10.1145/3774904.3792139}
}

📬 Contact

For questions or discussions, please contact Haonan Yuan or open an issue in this repository.

Enjoy exploring retrieval-augmented graph foundation models! 🚀
