Paper Monitoring, Analysis & Innovative IDEA Recommendation System
PaperPulse is an automated paper monitoring system that collects papers from multiple sources, analyzes them with an LLM, detects trends, generates innovative research ideas, and integrates with your note-taking workflow.
| Feature | Description |
|---|---|
| Multi-source Collection | arXiv, Semantic Scholar, Papers with Code |
| Document Download | PDF + arXiv LaTeX source code |
| PDF Conversion | Convert PDFs to Markdown using MinerU |
| LLM Analysis | Extract keywords, innovations, and limitations |
| Trend Detection | Identify trending keywords and emerging topics |
| IDEA Generation | Generate innovative research ideas from papers |
| Daily Reports | Markdown reports + Obsidian notes |
| Scheduling | Configurable automated monitoring |
| Research Integration | One-click launch of AutoResearchClaw experiments |
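
The Trend Detection row can be illustrated with a small sketch. It assumes each paper has already been reduced to a list of keywords and compares paper counts between two time windows. The function name `keyword_trends`, the `min_count` threshold, and the percent-change ranking are illustrative assumptions, not PaperPulse's actual `trend_detector.py` logic.

```python
from collections import Counter


def keyword_trends(current, previous, min_count=2):
    """Rank keywords by percent change in paper count between two windows.

    `current` and `previous` are lists of keyword lists, one list per paper.
    Hypothetical helper for illustration only.
    """
    # Count each keyword at most once per paper.
    cur = Counter(kw for paper in current for kw in set(paper))
    prev = Counter(kw for paper in previous for kw in set(paper))

    trends = []
    for kw, count in cur.items():
        if count < min_count:
            continue  # too rare in the current window to call a trend
        base = prev.get(kw, 0)
        # A keyword missing from the previous window has no finite percent
        # change, so it is marked as emerging with None.
        change = None if base == 0 else (count - base) / base * 100
        trends.append((kw, count, change))

    # Emerging keywords first, then the largest percent increase.
    trends.sort(key=lambda t: (t[2] is not None, -(t[2] or 0), -t[1]))
    return trends


if __name__ == "__main__":
    this_week = [["mixture of experts", "long context"], ["mixture of experts"],
                 ["mixture of experts"], ["state space models"], ["state space models"]]
    last_week = [["mixture of experts"], ["long context"]]
    for kw, count, change in keyword_trends(this_week, last_week):
        label = "new" if change is None else f"{change:+.0f}%"
        print(f"{kw}: {count} papers ({label})")
```

Treating keywords with no earlier occurrences as a separate "emerging" group avoids dividing by zero and keeps them from being buried under large percentage swings on small counts.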
# Clone the repository
git clone https://github.com/yourusername/paperpulse.git
cd paperpulse
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
# Copy and configure
cp config.example.yaml config.yaml
# Edit config.yaml and set your OPENAI_API_KEY
# Check environment
paperpulse doctor
# Run monitoring
paperpulse monitor

# Run full monitoring pipeline
paperpulse monitor
# Collect papers only
paperpulse collect --days 1 --limit 50
# Download documents (PDF and LaTeX)
paperpulse download 2301.00001
# Convert PDF to Markdown
paperpulse convert 2301.00001
# Analyze papers with LLM
paperpulse analyze 2301.00001
# Generate research ideas
paperpulse ideas --num 5
# View trending keywords
paperpulse trends
# Generate reports
paperpulse report --daily
# Sync to Obsidian vault
paperpulse obsidian
# Start research for an idea
paperpulse research idea-20260322-abc123
# Show database statistics
paperpulse db
# Environment check
paperpulse doctor
# Start daemon
paperpulse daemon

sources:
  arxiv:
    enabled: true
    categories: [cs.AI, cs.CL, cs.LG, cs.CV, cs.NE]
    keywords: []  # Optional filter keywords
    max_papers_per_day: 100
  semantic_scholar:
    enabled: true
    fields: ["Artificial Intelligence", "Machine Learning"]
    max_papers_per_day: 50
  papers_with_code:
    enabled: true
    areas: [natural-language-processing, computer-vision]
    max_papers_per_day: 30

| Category | Field |
|---|---|
| cs.AI | Artificial Intelligence |
| cs.CL | Computation and Language (NLP) |
| cs.LG | Machine Learning |
| cs.CV | Computer Vision |
| cs.NE | Neural and Evolutionary Computing |
| cs.RO | Robotics |
| stat.ML | Statistics - Machine Learning |
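
To show how category codes like these might turn into a request, here is a minimal sketch that builds an arXiv API query URL from a category list and optional keywords. The endpoint and the `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters belong to the public arXiv API. The function name `build_arxiv_query` and the way categories and keywords are combined are assumptions, and PaperPulse's own collector may construct its queries differently.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(categories, keywords=(), max_results=100):
    """Build an arXiv API URL for the newest papers in the given categories.

    Hypothetical helper: categories are OR-ed together, and keywords (if any)
    are OR-ed and then AND-ed with the category clause.
    """
    if not categories:
        raise ValueError("at least one category is required")

    query = "(" + " OR ".join(f"cat:{c}" for c in categories) + ")"
    if keywords:
        # all: searches every field; quotes keep multi-word phrases together.
        kw_clause = " OR ".join(f'all:"{k}"' for k in keywords)
        query = f"{query} AND ({kw_clause})"

    params = {
        "search_query": query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_arxiv_query(["cs.AI", "cs.CL"], keywords=["mixture of experts"])
    # Decode the URL again to show the query string the API will receive.
    print(parse_qs(urlparse(url).query)["search_query"][0])
```

Passing `max_results` the value of `max_papers_per_day` from the configuration would be one way to respect the per-source cap in a single request.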
scheduler:
  enabled: true
  daily_monitor: "0 8 * * *"    # Daily at 8:00 AM
  daily_report: "30 8 * * *"    # Daily at 8:30 AM
  weekly_summary: "0 9 * * 1"   # Monday at 9:00 AM

obsidian:
  enabled: true
  vault_path: "/path/to/your/vault"  # Leave empty for default
  folders:
    papers: "Papers"
    ideas: "Ideas"
    daily: "Daily"
    latex: "LaTeX"

llm:
  provider: openai
  api_key_env: OPENAI_API_KEY
  base_url: https://api.openai.com/v1
  primary_model: gpt-4o
  fallback_model: gpt-4o-mini

# PaperPulse Daily Report - 2026-03-22
> Discovered **47** new papers | Generated **8** innovative ideas
## Trending Keywords
| Keyword | Papers | Trend |
|---------|--------|-------|
| Mixture of Experts | 12 | ↑ 156% |
| Long Context | 9 | ↑ 89% |
| Chain of Thought | 7 | ↑ 45% |
## Top Ideas Today
### #1 [High Value] Dynamic Expert Selection for MoE in Long Context
**Score**: 0.86 ★★★★
**Description**: Current MoE expert selection faces efficiency challenges in long documents...

PaperPulse integrates with AutoResearchClaw to automatically run experiments for generated ideas:
# View generated ideas
paperpulse ideas
# Start research for an idea
paperpulse research idea-20260322-abc123
# AutoResearchClaw will be invoked to generate a research paper

paperpulse/
├── src/paperpulse/
│   ├── cli.py                  # CLI entry point
│   ├── config.py               # Configuration management
│   ├── collectors/             # Paper collection
│   │   ├── arxiv.py            # arXiv + LaTeX download
│   │   ├── semantic_scholar.py
│   │   └── papers_with_code.py
│   ├── downloader/             # Document download
│   │   └── pdf.py              # PDF + LaTeX downloader
│   ├── converter/              # PDF conversion
│   │   └── mineru.py           # MinerU integration
│   ├── storage/                # Data storage
│   │   ├── models.py           # Data models
│   │   └── database.py         # SQLite database
│   ├── analysis/               # LLM analysis
│   │   ├── llm_client.py       # OpenAI client
│   │   ├── paper_analyzer.py
│   │   └── trend_detector.py
│   ├── ideas/                  # Idea generation
│   │   ├── generator.py
│   │   └── scorer.py
│   ├── output/                 # Output generation
│   │   ├── markdown.py         # Report generator
│   │   └── obsidian.py         # Obsidian sync
│   └── integration/            # External integrations
│       ├── researchclaw.py     # AutoResearchClaw
│       └── scheduler.py        # Task scheduler
├── data/                       # Local data storage
│   ├── papers.db               # SQLite database
│   ├── pdfs/                   # PDF files
│   ├── latex/                  # LaTeX sources
│   └── markdown/               # Converted markdown
├── reports/                    # Generated reports
├── obsidian/                   # Obsidian notes
├── config.example.yaml         # Example configuration
├── pyproject.toml              # Project metadata
└── README.md
pip install -e .

This installs:
- `arxiv` - arXiv API client
- `openai` - OpenAI API client
- `httpx` - HTTP client
- `pydantic` - Data validation
- `click` - CLI framework
- `rich` - Terminal output
- `pyyaml` - YAML parsing
- `feedparser` - RSS parsing
- `schedule` - Task scheduling
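
If an install looks incomplete, a quick check along these lines can confirm which of the packages above are importable. This is a standalone sketch using only the standard library, not a description of what `paperpulse doctor` does internally. The helper name `missing_modules` is hypothetical, and the module list assumes each package's usual import name (`pyyaml` is imported as `yaml`).

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (assumed, not read from
# pyproject.toml). The pyyaml distribution is imported as "yaml".
REQUIRED_MODULES = [
    "arxiv", "openai", "httpx", "pydantic", "click",
    "rich", "yaml", "feedparser", "schedule",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

`find_spec` only locates a module without importing it, so the check stays fast and has no side effects.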
For high-quality PDF to Markdown conversion:
pip install mineru[all]

MinerU is optimized for academic papers and supports:
- Mathematical formulas
- Tables
- Figures
- Multi-column layouts
- Chinese text
# Collect from specific arXiv categories
paperpulse collect --categories cs.AI,cs.LG --limit 100
# Collect papers from the last 7 days
paperpulse collect --days 7

from paperpulse.config import load_config
from paperpulse.storage.database import Database
from paperpulse.collectors.arxiv import ArxivCollector
from paperpulse.analysis.llm_client import create_llm_client
from paperpulse.analysis.paper_analyzer import PaperAnalyzer
config = load_config()
db = Database(config.storage.database)
# Collect
collector = ArxivCollector(categories=["cs.CL", "cs.AI"])
papers = collector.collect()
db.insert_papers(papers)
# Analyze
llm = create_llm_client(config.llm)
analyzer = PaperAnalyzer(llm, db)
for paper in papers:
    analyzer.analyze_paper(paper)

PaperPulse creates structured Obsidian notes with:
- Paper metadata (arXiv ID, DOI, authors, date)
- Abstract and key points
- Extracted innovations and limitations
- BibTeX citation
- Links to PDF/LaTeX sources
- Research command snippets
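
A note with these parts could be rendered roughly as follows. This is a minimal sketch, assuming YAML front matter for the metadata and a `@misc` BibTeX entry. The `PaperNote` dataclass, its field names, and the section headings are hypothetical and may not match what PaperPulse's `obsidian.py` produces.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class PaperNote:
    """Hypothetical container for the fields listed above."""
    arxiv_id: str
    title: str
    authors: list
    published: date
    abstract: str
    innovations: list = field(default_factory=list)
    limitations: list = field(default_factory=list)


def bibtex_entry(p):
    """Build a simple @misc BibTeX entry for an arXiv paper."""
    key = "arxiv" + p.arxiv_id.replace(".", "")
    authors = " and ".join(p.authors)
    return (
        f"@misc{{{key},\n"
        f"  title = {{{p.title}}},\n"
        f"  author = {{{authors}}},\n"
        f"  year = {{{p.published.year}}},\n"
        f"  eprint = {{{p.arxiv_id}}},\n"
        f"  archivePrefix = {{arXiv}}\n"
        f"}}"
    )


def render_note(p):
    """Render the note as Markdown with YAML front matter."""
    author_list = ", ".join(f'"{a}"' for a in p.authors)
    lines = [
        "---",
        f'arxiv_id: "{p.arxiv_id}"',
        f"authors: [{author_list}]",
        f"date: {p.published.isoformat()}",
        "---",
        f"# {p.title}",
        "",
        "## Abstract",
        p.abstract,
        "",
        "## Innovations",
        *[f"- {item}" for item in p.innovations],
        "",
        "## Limitations",
        *[f"- {item}" for item in p.limitations],
        "",
        "## Links",
        f"- [PDF](https://arxiv.org/pdf/{p.arxiv_id})",
        f"- [Abstract page](https://arxiv.org/abs/{p.arxiv_id})",
        "",
        "## BibTeX",
        bibtex_entry(p),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = PaperNote("2301.00001", "Example Paper", ["A. Author", "B. Author"],
                     date(2023, 1, 1), "Placeholder abstract.",
                     ["Placeholder innovation"], ["Placeholder limitation"])
    print(render_note(demo))
```

Keeping the metadata in front matter lets Obsidian treat it as note properties, while the body stays readable as plain Markdown.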
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- arXiv for providing open access to papers
- Semantic Scholar for their API
- Papers with Code for linking papers to code
- MinerU for PDF parsing
- AutoResearchClaw for research automation
Made with ❤️ by the PaperPulse Team