SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

An intelligent academic paper retrieval system based on Large Language Model (LLM) agents, providing search, retrieval, and re-ranking capabilities.

🚀 Quick Start

Requirements

```
pip install -r requirements.txt
```

Basic Configuration

Configure API Keys
- Edit `global_config.py` to set GPT-related keys and search parameters
- For local models, refer to `local_request_v2.py` to configure `MODEL_CONFIGS`
Launch Web Interface
```
python3 demo_app_with_front.py
```
Search result details can be found here.
Use Service Interface
```
python3 run_spr_agent.py $benchname
```
Supported values for `benchname`: `OwnBenchmark` or `AutoScholarQuery`
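
To run both supported benchmarks in sequence, the command above can be wrapped in a small script. This is a minimal sketch: `build_command` and `run_all` are hypothetical helper names, and only the script name and the two benchmark names come from this README.

```python
import subprocess
import sys
from typing import List

# Benchmark names listed in this README.
SUPPORTED_BENCHMARKS = ("OwnBenchmark", "AutoScholarQuery")


def build_command(benchname: str) -> List[str]:
    """Return the argv list for one benchmark run, rejecting unknown names."""
    if benchname not in SUPPORTED_BENCHMARKS:
        raise ValueError(
            f"Unsupported benchname {benchname!r}; "
            f"expected one of {SUPPORTED_BENCHMARKS}"
        )
    # sys.executable keeps the run inside the current Python environment.
    return [sys.executable, "run_spr_agent.py", benchname]


def run_all() -> None:
    """Run every supported benchmark, stopping at the first failure."""
    for name in SUPPORTED_BENCHMARKS:
        subprocess.run(build_command(name), check=True)
```

Calling `run_all()` from the repository root would launch the runs one after another; validating the name first avoids spending API quota on a typo.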

Project Structure

| File | Description |
| --- | --- |
| `search_engine.py` | Main entry point for the retrieval system |
| `pipeline_spar.py` | Complete SPAR processing pipeline |
| `search_node.py` | Implementation of specific pipeline functionality |
| `rerank.py` | Result re-ranking module |
| `global_config.py` | Global configuration file |
| `demo_app_with_front.py` | Visual frontend application |

🔧 Advanced Configuration

Local Database Acceleration (Optional)

Download the local cache database to improve query speed:

```
# Download database file
mkdir -p database
wget "http://flagchat.ks3-cn-beijing.ksyuncs.com/shixiaofeng/project/SPAR/arxiv_data.db?KSSAccessKeyId=AKLTkqVnZwpfTBiiu7O6iQHnA&Expires=7753081204&Signature=gaj8%2F5rJ%2BUQWp6wSr0f5KKuJdqs%3D" -O database/arxiv_data.db
```
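
After the download, it can be worth confirming that the file opens before running a long search. The sketch below assumes `arxiv_data.db` is a SQLite database, which the `.db` extension suggests but this README does not confirm. The table names are not documented here, so the hypothetical `list_tables` helper only reports whatever it finds.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in a SQLite file, or raise if it is missing."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"No database file at {path}")
    # Open read-only so the check cannot modify the cache.
    uri = path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("database/arxiv_data.db")` would raise `sqlite3.DatabaseError` if the download was truncated or the file is not SQLite after all.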

Graphical Visualization (Optional)

Install Graphviz to generate tree diagrams of the retrieval process:

```
# Ubuntu/Debian
sudo apt-get install graphviz
pip install graphviz

# macOS
brew install graphviz
pip install graphviz

# Windows
# 1. Download and install Graphviz: https://graphviz.org/download/
# 2. pip install graphviz
```


📈 Experimental Results

Output Description

Retrieval results are saved to `./figs/search_results_2025-07-22.json`
Visualization charts are saved in the `./figs/` directory
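
This README does not describe the schema of the results file, so the sketch below only loads a JSON file and reports its top-level shape. `summarize_results` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_results(path: str) -> str:
    """Describe the top-level structure of a results JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return f"list with {len(data)} items"
    if isinstance(data, dict):
        # JSON object keys are always strings, so they sort cleanly.
        return f"dict with keys: {sorted(data)}"
    return f"scalar of type {type(data).__name__}"
```

Running it on a file under `./figs/` is a cheap first look before writing any analysis code against specific fields.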

Features

🎯 Advanced Search Mode

- Query Rewriting: Automatic query expansion and refinement
- Intent Analysis: Understanding search intent for better results
- Reference Search: Follow citation networks for comprehensive coverage
- Advanced Re-ranking: Multi-layer relevance scoring
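
Following citation networks can be pictured as a breadth-first expansion over a citation graph with a depth limit. The sketch below is a toy illustration on an in-memory graph. The `expand_references` helper and the paper IDs are made up for the example, and SPAR's actual strategy may differ.

```python
from collections import deque
from typing import Dict, List


def expand_references(
    graph: Dict[str, List[str]], seeds: List[str], max_depth: int
) -> List[str]:
    """Collect papers reachable from the seeds within max_depth citation hops."""
    order = list(dict.fromkeys(seeds))  # Drop duplicate seeds, keep order.
    seen = set(order)
    queue = deque((seed, 0) for seed in order)
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not follow references past the depth limit.
        for ref in graph.get(paper, []):
            if ref not in seen:
                seen.add(ref)
                order.append(ref)
                queue.append((ref, depth + 1))
    return order


# Toy graph: each key cites the papers in its list.
citations = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
```

With this graph, `expand_references(citations, ["A"], 2)` reaches `D` but not `E`. The depth limit is what keeps the candidate set, and any downstream LLM scoring cost, bounded.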

⚡ Simple Search Mode

- Multi-source Search: ArXiv, OpenAlex, PubMed integration
- Basic Re-ranking: Fast relevance scoring
- Batch Processing: Efficient parallel processing
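
One way to picture this mode is to query every source in parallel, merge duplicate titles, and apply a cheap score. The sketch below uses stub functions in place of real ArXiv, OpenAlex, or PubMed clients, and the term-overlap score is a placeholder rather than SPAR's re-ranker. `search_all` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Source = Callable[[str], List[str]]


def search_all(query: str, sources: Dict[str, Source]) -> List[str]:
    """Query every source in parallel, deduplicate titles, rank by term overlap."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # map() yields results in source order, so the merge is deterministic.
        batches = list(pool.map(lambda fn: fn(query), sources.values()))
    merged: Dict[str, str] = {}
    for batch in batches:
        for title in batch:
            # Case-insensitive key: the same paper from two sources is kept once.
            merged.setdefault(title.casefold().strip(), title)
    terms = set(query.casefold().split())

    def score(title: str) -> int:
        return len(terms & set(title.casefold().split()))

    # sorted() is stable, so ties keep their first-seen order.
    return sorted(merged.values(), key=score, reverse=True)


# Stub sources standing in for real API clients.
stubs = {
    "arxiv": lambda q: ["Dense Retrieval Survey", "Graph Neural Networks"],
    "openalex": lambda q: ["dense retrieval survey", "Agents for Paper Retrieval"],
}
ranked = search_all("paper retrieval agents", stubs)
```

Threads fit here because source queries are I/O-bound network calls; the scoring step is cheap enough to stay sequential.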

🎨 Web Interface

- Interactive UI: User-friendly search interface
- Real-time Results: Live search progress and results
- Export Options: JSON export for further analysis
- Search Tree Visualization: Visual representation of search process

📖 Citation

If you use the SPAR system, please cite our work:

@misc{shi2025sparscholarpaperretrieval,
      title={SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search},
      author={Xiaofeng Shi and Yuduo Li and Qian Kou and Longbin Yu and Jinxin Xie and Hua Zhou},
      year={2025},
      eprint={2507.15245},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2507.15245},
}

📄 License

This project is licensed under the MIT License.

🤝 Contributing

Issues and Pull Requests are welcome to help improve the SPAR system!

Troubleshooting

Common Issues

- API Quota Exceeded: Ensure you have sufficient API quota for LLM calls
- Slow Performance: Consider using local database acceleration
- Network Issues: Check network connectivity for external API calls
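
Quota and network errors are often transient, so retrying with exponential backoff can help. The sketch below is generic: `call_with_backoff` is a hypothetical helper, and it catches the broad `Exception` class because this README does not say which exception types the LLM client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures after base_delay * 2**attempt seconds."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so the delay can be replaced in tests. In real use, narrowing the `except` clause to the client's rate-limit and timeout errors would avoid retrying genuine bugs.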

Performance Tips

- Test on small datasets first to evaluate system performance
- Use local models when possible to reduce API costs
- Enable database caching for frequently accessed papers

Note: Ensure you have sufficient API quota for Large Language Model calls. It's recommended to test the system performance on small-scale data first.
