lishangli/PaperPulse

PaperPulse

Python 3.10+ | License: MIT

📜 Paper Monitoring, Analysis & Innovative IDEA Recommendation System

PaperPulse is an automated paper monitoring system. It collects papers from multiple sources, analyzes them with an LLM, detects trends, generates research ideas from what it finds, and syncs the results into your note-taking workflow.

✨ Features

| Feature | Description |
|---------|-------------|
| 🔍 Multi-source Collection | arXiv, Semantic Scholar, Papers with Code |
| 📥 Document Download | PDF + arXiv LaTeX source code |
| 📄 PDF Conversion | Convert PDFs to Markdown using MinerU |
| 🤖 LLM Analysis | Extract keywords, innovations, and limitations |
| 📈 Trend Detection | Identify trending keywords and emerging topics |
| 💡 IDEA Generation | Generate innovative research ideas from papers |
| 📝 Daily Reports | Markdown reports + Obsidian notes |
| ⏰ Scheduling | Configurable automated monitoring |
| 🔗 Research Integration | One-click launch of AutoResearchClaw experiments |

🚀 Quick Start

# Clone the repository
git clone https://github.com/lishangli/PaperPulse.git paperpulse
cd paperpulse

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

# Copy and configure
cp config.example.yaml config.yaml
# Edit config.yaml, then export your API key in the environment variable
# named by llm.api_key_env (OPENAI_API_KEY in the example config)

# Check environment
paperpulse doctor

# Run monitoring
paperpulse monitor

📋 CLI Commands

# Run full monitoring pipeline
paperpulse monitor

# Collect papers only
paperpulse collect --days 1 --limit 50

# Download documents (PDF and LaTeX)
paperpulse download 2301.00001

# Convert PDF to Markdown
paperpulse convert 2301.00001

# Analyze papers with LLM
paperpulse analyze 2301.00001

# Generate research ideas
paperpulse ideas --num 5

# View trending keywords
paperpulse trends

# Generate reports
paperpulse report --daily

# Sync to Obsidian vault
paperpulse obsidian

# Start research for an idea
paperpulse research idea-20260322-abc123

# Show database statistics
paperpulse db

# Environment check
paperpulse doctor

# Start daemon
paperpulse daemon

⚙️ Configuration

Data Sources

sources:
  arxiv:
    enabled: true
    categories: [cs.AI, cs.CL, cs.LG, cs.CV, cs.NE]
    keywords: []  # Optional filter keywords
    max_papers_per_day: 100

  semantic_scholar:
    enabled: true
    fields: ["Artificial Intelligence", "Machine Learning"]
    max_papers_per_day: 50

  papers_with_code:
    enabled: true
    areas: [natural-language-processing, computer-vision]
    max_papers_per_day: 30
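
As a rough illustration of how the per-source settings above might be consumed, the sketch below walks an already-parsed `sources` mapping and returns the enabled sources with their daily caps. The `enabled_sources` helper and the in-memory dict are assumptions made for this example; PaperPulse's own loader may structure this differently.

```python
from typing import Any

# Plain-dict mirror of the YAML above (what a YAML parser would return).
# papers_with_code is switched off here only to show the filtering.
SOURCES: dict[str, dict[str, Any]] = {
    "arxiv": {"enabled": True, "categories": ["cs.AI", "cs.CL"], "max_papers_per_day": 100},
    "semantic_scholar": {"enabled": True, "max_papers_per_day": 50},
    "papers_with_code": {"enabled": False, "max_papers_per_day": 30},
}


def enabled_sources(sources: dict[str, dict[str, Any]]) -> dict[str, int]:
    """Return {source_name: daily_cap} for every source with enabled: true."""
    return {
        name: int(opts.get("max_papers_per_day", 0))
        for name, opts in sources.items()
        if opts.get("enabled", False)
    }


print(enabled_sources(SOURCES))  # {'arxiv': 100, 'semantic_scholar': 50}
```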

arXiv Categories

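| Category | Field |
|----------|-------|
| cs.AI | Artificial Intelligence |
| cs.CL | Computation and Language (NLP) |
| cs.LG | Machine Learning |
| cs.CV | Computer Vision and Pattern Recognition |
| cs.NE | Neural and Evolutionary Computing |
| cs.RO | Robotics |
| stat.ML | Statistics - Machine Learning |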

Scheduler Configuration

scheduler:
  enabled: true
  daily_monitor: "0 8 * * *"     # Daily at 8:00 AM
  daily_report: "30 8 * * *"     # Daily at 8:30 AM
  weekly_summary: "0 9 * * 1"    # Monday at 9:00 AM
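
The schedule values above read as standard five-field cron expressions (minute, hour, day of month, month, day of week), which is what their comments suggest. The sketch below only splits an expression into named fields to make that layout explicit. `split_cron` is a hypothetical helper, and how PaperPulse itself parses these strings is not shown in this README.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expr: str) -> dict[str, str]:
    """Split a five-field cron expression into a {field_name: value} dict."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


SCHEDULE = {
    "daily_monitor": "0 8 * * *",
    "daily_report": "30 8 * * *",
    "weekly_summary": "0 9 * * 1",
}

for job, expr in SCHEDULE.items():
    print(job, split_cron(expr))
```

In `weekly_summary`, the final field `1` selects Monday, and `*` in a field means "every value".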

Obsidian Integration

obsidian:
  enabled: true
  vault_path: "/path/to/your/vault"  # Leave empty for default
  folders:
    papers: "Papers"
    ideas: "Ideas"
    daily: "Daily"
    latex: "LaTeX"

LLM Configuration

llm:
  provider: openai
  api_key_env: OPENAI_API_KEY
  base_url: https://api.openai.com/v1
  primary_model: gpt-4o
  fallback_model: gpt-4o-mini
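
Note that `api_key_env` names an environment variable rather than holding the key itself. The sketch below shows one way such a config could be used: look the key up in the environment, try `primary_model` first, and retry once with `fallback_model` if the call fails. Both helpers and the `call` callback are assumptions for illustration and are not PaperPulse's actual client code.

```python
import os
from typing import Callable, Mapping

LLM_CONFIG = {
    "provider": "openai",
    "api_key_env": "OPENAI_API_KEY",
    "base_url": "https://api.openai.com/v1",
    "primary_model": "gpt-4o",
    "fallback_model": "gpt-4o-mini",
}


def resolve_api_key(cfg: Mapping[str, str], env: Mapping[str, str] = os.environ) -> str:
    """Read the API key from the environment variable named in the config."""
    var = cfg["api_key_env"]
    key = env.get(var, "")
    if not key:
        raise RuntimeError(f"environment variable {var} is not set")
    return key


def complete_with_fallback(cfg: Mapping[str, str], call: Callable[[str], str]) -> str:
    """Run call(model) with the primary model; on any error, retry once with the fallback."""
    try:
        return call(cfg["primary_model"])
    except Exception:
        return call(cfg["fallback_model"])
```

Here `call` stands in for whatever function actually sends the request, so the fallback logic can be exercised without network access.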

📊 Output Example

Daily Report

# 📜 PaperPulse Daily Report - 2026-03-22

> 🤖 Discovered **47** new papers | Generated **8** innovative ideas

## 🔥 Trending Keywords

| Keyword | Papers | Trend |
|---------|--------|-------|
| Mixture of Experts | 12 | ↑ 156% |
| Long Context | 9 | ↑ 89% |
| Chain of Thought | 7 | ↑ 45% |

## 💡 Top Ideas Today

### #1 [High Value] Dynamic Expert Selection for MoE in Long Context

**Score**: 0.86 ⭐⭐⭐⭐

**Description**: Current MoE expert selection faces efficiency challenges in long documents...

🔗 AutoResearchClaw Integration

PaperPulse integrates with AutoResearchClaw to automatically run experiments for generated ideas:

# View generated ideas
paperpulse ideas

# Start research for an idea
paperpulse research idea-20260322-abc123

# AutoResearchClaw will be invoked to generate a research paper

📁 Project Structure

paperpulse/
├── src/paperpulse/
│   ├── cli.py              # CLI entry point
│   ├── config.py           # Configuration management
│   ├── collectors/         # Paper collection
│   │   ├── arxiv.py        # arXiv + LaTeX download
│   │   ├── semantic_scholar.py
│   │   └── papers_with_code.py
│   ├── downloader/         # Document download
│   │   └── pdf.py          # PDF + LaTeX downloader
│   ├── converter/          # PDF conversion
│   │   └── mineru.py       # MinerU integration
│   ├── storage/            # Data storage
│   │   ├── models.py       # Data models
│   │   └── database.py     # SQLite database
│   ├── analysis/           # LLM analysis
│   │   ├── llm_client.py   # OpenAI client
│   │   ├── paper_analyzer.py
│   │   └── trend_detector.py
│   ├── ideas/              # Idea generation
│   │   ├── generator.py
│   │   └── scorer.py
│   ├── output/             # Output generation
│   │   ├── markdown.py     # Report generator
│   │   └── obsidian.py     # Obsidian sync
│   └── integration/        # External integrations
│       ├── researchclaw.py # AutoResearchClaw
│       └── scheduler.py    # Task scheduler
├── data/                   # Local data storage
│   ├── papers.db           # SQLite database
│   ├── pdfs/               # PDF files
│   ├── latex/              # LaTeX sources
│   └── markdown/           # Converted markdown
├── reports/                # Generated reports
├── obsidian/               # Obsidian notes
├── config.example.yaml     # Example configuration
├── pyproject.toml          # Project metadata
└── README.md

📦 Dependencies

Core Dependencies

pip install -e .

This installs:

  • arxiv - arXiv API client
  • openai - OpenAI API client
  • httpx - HTTP client
  • pydantic - Data validation
  • click - CLI framework
  • rich - Terminal output
  • pyyaml - YAML parsing
  • feedparser - RSS parsing
  • schedule - Task scheduling

Optional: MinerU (PDF → Markdown)

For high-quality PDF to Markdown conversion:

pip install "mineru[all]"

MinerU is optimized for academic papers and supports:

  • Mathematical formulas
  • Tables
  • Figures
  • Multi-column layouts
  • Chinese text

🔧 Advanced Usage

Custom Collection Query

# Collect from specific arXiv categories
paperpulse collect --categories cs.AI,cs.LG --limit 100

# Collect papers from the last 7 days
paperpulse collect --days 7

Batch Processing

from paperpulse.config import load_config
from paperpulse.storage.database import Database
from paperpulse.collectors.arxiv import ArxivCollector
from paperpulse.analysis.llm_client import create_llm_client
from paperpulse.analysis.paper_analyzer import PaperAnalyzer

config = load_config()
db = Database(config.storage.database)

# Collect
collector = ArxivCollector(categories=["cs.CL", "cs.AI"])
papers = collector.collect()
db.insert_papers(papers)

# Analyze
llm = create_llm_client(config.llm)
analyzer = PaperAnalyzer(llm, db)
for paper in papers:
    analyzer.analyze_paper(paper)
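
In a long batch, a single failed LLM call would abort the loop above. A minimal sketch of a more tolerant loop follows; `analyze_all` is a hypothetical helper, not part of PaperPulse, and takes any callable (for example `analyzer.analyze_paper`) so it can be tried without the real pipeline.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def analyze_all(
    papers: Iterable[T], analyze: Callable[[T], object]
) -> tuple[list[T], list[tuple[T, Exception]]]:
    """Run analyze() on each paper, collecting failures instead of stopping."""
    done: list[T] = []
    failed: list[tuple[T, Exception]] = []
    for paper in papers:
        try:
            analyze(paper)
            done.append(paper)
        except Exception as exc:  # keep going; report the error afterwards
            failed.append((paper, exc))
    return done, failed
```

With the objects from the snippet above, this could be called as `done, failed = analyze_all(papers, analyzer.analyze_paper)`, then `failed` inspected or retried.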

Obsidian Templates

PaperPulse creates structured Obsidian notes with:

  • Paper metadata (arXiv ID, DOI, authors, date)
  • Abstract and key points
  • Extracted innovations and limitations
  • BibTeX citation
  • Links to PDF/LaTeX sources
  • Research command snippets
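
As an illustration of what such a note could look like, the sketch below renders a Markdown note with YAML front matter from a small data class. The `PaperNote` fields, the section headings, and the front-matter keys are assumptions; the actual template lives in PaperPulse's Obsidian output module and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PaperNote:
    arxiv_id: str
    title: str
    authors: list[str]
    published: str
    abstract: str
    innovations: list[str] = field(default_factory=list)
    limitations: list[str] = field(default_factory=list)


def render_note(p: PaperNote) -> str:
    """Render a paper as Markdown with YAML front matter."""
    lines = ["---", f'arxiv_id: "{p.arxiv_id}"', "authors:"]
    lines += [f"  - {name}" for name in p.authors]
    lines += [f"date: {p.published}", "---", "", f"# {p.title}", "", "## Abstract", p.abstract, ""]
    for heading, items in (("Innovations", p.innovations), ("Limitations", p.limitations)):
        lines.append(f"## {heading}")
        lines += [f"- {item}" for item in items]
        lines.append("")
    lines += ["## Sources", f"- PDF: https://arxiv.org/pdf/{p.arxiv_id}", ""]
    lines += ["## Research", f"`paperpulse analyze {p.arxiv_id}`"]
    return "\n".join(lines)


note = render_note(
    PaperNote(
        arxiv_id="2301.00001",
        title="Example Paper",
        authors=["A. Author", "B. Author"],
        published="2023-01-01",
        abstract="Placeholder abstract.",
        innovations=["first innovation"],
        limitations=["first limitation"],
    )
)
print(note)
```

The arXiv ID is quoted in the front matter because an unquoted value such as `2301.00001` would be read as a number by a YAML parser.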

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


Made with ❤️ by the PaperPulse Team
