Paper URL: arxiv:2507.15245
Dataset URL: SPARBench
An intelligent academic paper retrieval system based on Large Language Model (LLM) agents, providing search, retrieval, and re-ranking capabilities.
Install dependencies:

```bash
pip install -r requirements.txt
```
Configure API Keys:
- Edit `global_config.py` to set GPT-related keys and search parameters
- For local models, refer to `local_request_v2.py` to configure `MODEL_CONFIGS`
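A common pattern is to read secrets from environment variables instead of hard-coding them in `global_config.py`. The sketch below shows this under stated assumptions. The environment variable names (`OPENAI_API_KEY`, `OPENAI_BASE_URL`) and the helper `load_llm_settings` are hypothetical. Check `global_config.py` for the names SPAR actually expects.

```python
import os
from typing import Mapping

# Hypothetical variable names; SPAR's global_config.py may use different ones.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def load_llm_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect LLM connection settings from environment variables (illustrative helper)."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early instead of sending unauthenticated requests later.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        "api_key": api_key,
        # Override to point at an OpenAI-compatible server, e.g. a locally hosted model.
        "base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
    }
```

Passing an explicit mapping such as `load_llm_settings({"OPENAI_API_KEY": "..."})` makes the function easy to test without touching the real environment.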
Launch Web Interface:

```bash
python3 demo_app_with_front.py
```

Search result details can be found here.
Use Service Interface:

```bash
python3 run_spr_agent.py $benchname
```

Supported `benchname` values: `OwnBenchmark` | `AutoScholarQuery`
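To run every supported benchmark in one go, a small wrapper could call `run_spr_agent.py` once per benchmark. The `run_all` and `build_command` helpers below are hypothetical and not part of SPAR. The sketch assumes it is run from the repository root and that the script takes the benchmark name as its only argument, as in the command above. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List

# Benchmark names listed as supported in this README.
SUPPORTED_BENCHMARKS = ("OwnBenchmark", "AutoScholarQuery")


def build_command(benchname: str) -> List[str]:
    """Equivalent of `python3 run_spr_agent.py $benchname`, using the current interpreter."""
    if benchname not in SUPPORTED_BENCHMARKS:
        raise ValueError(f"Unsupported benchname: {benchname!r}")
    return [sys.executable, "run_spr_agent.py", benchname]


def run_all(dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: run (or just print) one command per supported benchmark."""
    commands = [build_command(name) for name in SUPPORTED_BENCHMARKS]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError at the first benchmark that fails.
            subprocess.run(cmd, check=True)
    return commands
```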
| File | Description |
|---|---|
| `search_engine.py` | Main entry point for the retrieval system |
| `pipeline_spar.py` | Complete SPAR processing pipeline |
| `search_node.py` | Implementations of the individual pipeline functions |
| `rerank.py` | Result re-ranking module |
| `global_config.py` | Global configuration file |
| `demo_app_with_front.py` | Visual frontend application |
Download the local cache database to improve query speed:

```bash
# Download database file
mkdir -p database
wget "http://flagchat.ks3-cn-beijing.ksyuncs.com/shixiaofeng/project/SPAR/arxiv_data.db?KSSAccessKeyId=AKLTkqVnZwpfTBiiu7O6iQHnA&Expires=7753081204&Signature=gaj8%2F5rJ%2BUQWp6wSr0f5KKuJdqs%3D" -O database/arxiv_data.db
```

Install Graphviz to generate tree diagrams of the retrieval process:
```bash
# Ubuntu/Debian
sudo apt-get install graphviz
pip install graphviz

# macOS
brew install graphviz
pip install graphviz

# Windows
# 1. Download and install Graphviz: https://graphviz.org/download/
# 2. pip install graphviz
```

- Retrieval results are saved to `./figs/search_results_2025-07-22.json`
- Visualization charts are saved in the `./figs/` directory
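SPAR's own tree-drawing code is not shown in this README. As a hedged, stdlib-only sketch, the function below turns a search tree into Graphviz DOT source that the installed `dot` binary can render. It assumes a hypothetical nested `{"label": ..., "children": [...]}` structure; the function name and tree shape are illustrative assumptions, not SPAR's data format.

```python
from typing import Any, Dict


def tree_to_dot(root: Dict[str, Any], name: str = "search_tree") -> str:
    """Convert a nested {"label": str, "children": [...]} tree into Graphviz DOT source."""
    lines = [f"digraph {name} {{", "  node [shape=box];"]
    counter = 0

    def escape(text: str) -> str:
        # DOT string literals need backslashes and double quotes escaped.
        return text.replace("\\", "\\\\").replace('"', '\\"')

    def visit(node: Dict[str, Any]) -> str:
        nonlocal counter
        node_id = f"n{counter}"  # unique IDs in depth-first (pre-order) order
        counter += 1
        lines.append(f'  {node_id} [label="{escape(str(node["label"]))}"];')
        for child in node.get("children", []):
            child_id = visit(child)
            lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(root)
    lines.append("}")
    return "\n".join(lines)
```

For example, save the returned string to `figs/search_tree.dot` and render it with `dot -Tpng figs/search_tree.dot -o figs/search_tree.png`.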
- Query Rewriting: Automatic query expansion and refinement
- Intent Analysis: Understanding search intent for better results
- Reference Search: Follow citation networks for comprehensive coverage
- Advanced Re-ranking: Multi-layer relevance scoring
- Multi-source Search: Integration with arXiv, OpenAlex, and PubMed
- Basic Re-ranking: Fast relevance scoring
- Batch Processing: Efficient parallel processing
- Interactive UI: User-friendly search interface
- Real-time Results: Live search progress and results
- Export Options: JSON export for further analysis
- Search Tree Visualization: Visual representation of search process
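To make "fast relevance scoring" concrete, here is a deliberately simple lexical re-ranker. It scores each paper by the fraction of query terms that appear in its title and abstract, then sorts by that score. This is a stand-in for illustration, not the scoring implemented in `rerank.py`. The `Paper` class and the function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Paper:
    title: str
    abstract: str = ""


def tokenize(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, paper: Paper) -> float:
    """Fraction of distinct query terms found in the paper's title or abstract."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    doc_terms = tokenize(f"{paper.title} {paper.abstract}")
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(query: str, papers: List[Paper], top_k: Optional[int] = None) -> List[Paper]:
    # sorted() is stable, so papers with equal scores keep their original order.
    ranked = sorted(papers, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

A lexical score like this is cheap but ignores synonyms and paraphrases. That gap is what the LLM-based "Advanced Re-ranking" layer is meant to address.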
If you use the SPAR system, please cite our work:
```bibtex
@misc{shi2025sparscholarpaperretrieval,
      title={SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search},
      author={Xiaofeng Shi and Yuduo Li and Qian Kou and Longbin Yu and Jinxin Xie and Hua Zhou},
      year={2025},
      eprint={2507.15245},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2507.15245},
}
```

This project is licensed under the MIT License.
Issues and Pull Requests are welcome to help improve the SPAR system!
- API Quota Exceeded: Ensure you have sufficient API quota for LLM calls
- Slow Performance: Use the local cache database to speed up queries
- Network Issues: Check network connectivity for external API calls
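When quota or rate-limit errors are transient, retrying with exponential backoff and jitter usually helps. The sketch below is generic. `RateLimitError` is a placeholder for whichever exception your LLM client actually raises, and `call_with_backoff` is a hypothetical helper, not part of SPAR.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Placeholder: substitute the rate-limit/quota exception your LLM client raises."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with capped exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter in [50%, 100%] of the delay spreads out concurrent retries.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("max_retries must be >= 0")
```

Backoff only helps with short-lived rate limits. If the account quota itself is exhausted, retries will keep failing until the quota is raised.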
- Test on small datasets first to evaluate system performance
- Use local models when possible to reduce API costs
- Enable database caching for frequently accessed papers
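One way to cache frequently accessed papers is a small SQLite table keyed by paper ID and consulted before any remote lookup. The `PaperCache` class, its table layout, and the `fetch` callback below are hypothetical. They are unrelated to the schema of `database/arxiv_data.db`.

```python
import json
import sqlite3
from typing import Any, Callable, Dict


class PaperCache:
    """Hypothetical SQLite cache mapping paper_id -> JSON metadata (not SPAR's schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS papers (paper_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def get_or_fetch(self, paper_id: str, fetch: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
        row = self.conn.execute(
            "SELECT data FROM papers WHERE paper_id = ?", (paper_id,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no remote call
        data = fetch(paper_id)  # cache miss: call the (slow) source once
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT OR REPLACE INTO papers (paper_id, data) VALUES (?, ?)",
                (paper_id, json.dumps(data)),
            )
        return data
```

Pass a file path instead of `":memory:"` to keep the cache across runs.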
Note: Ensure you have sufficient API quota for Large Language Model calls. It's recommended to test the system performance on small-scale data first.


