damaoooo/PaperReviewer


Paper Polish System

An LLM-based workflow for iteratively improving academic papers using two AI agents (Author and Reviewer) in an adversarial/collaborative loop.

📚 Documentation Hub | Quick Start | API Setup | Verify API | Implementation Details


Overview

The Paper Polish System implements an "Actor-Critic" workflow that leverages two LLM agents to polish academic papers:

  1. Reviewer Agent: Acts as a strict academic peer reviewer, providing critical feedback and determining when the paper is "acceptable"
  2. Author Agent: Takes feedback and rewrites the paper to address reviewer concerns
  3. Workflow Loop: Alternates between the two agents until the paper is accepted or the maximum number of rounds (max_rounds) is reached

Key Features

  • ✅ Double-Blind Design: Author doesn't know the iteration number (maintains fairness)
  • ✅ LaTeX Preservation: Uses pylatexenc to ensure LaTeX syntax integrity
  • ✅ latexdiff Generation: Automatically generates visual diffs between iterations
  • ✅ Robust Error Handling: Handles API rate limits, parsing errors, and LaTeX corruption
  • ✅ Comprehensive Testing: 80+ unit tests with full mocking (no API calls during testing)
  • ✅ Flexible Configuration: JSON config files for API credentials and parameters
  • ✅ Logging Control: INFO/DEBUG logging shows all LLM calls for debugging
  • ✅ API Verification: Built-in command to validate Gemini API setup before running

Project Structure

paper_polish_system/
├── config.py                 # Configuration management
├── models.py                 # Gemini API abstraction
├── latex_utils.py            # LaTeX validation & safety
├── io_handler.py             # Input/output & latexdiff
├── workflow.py               # Main workflow controller
├── main.py                   # CLI entry point
├── agents/
│   ├── __init__.py
│   └── base_agent.py         # Author & Reviewer agents
├── tests/                    # 80+ comprehensive unit tests
│   ├── test_config.py
│   ├── test_models.py
│   ├── test_latex_utils.py
│   ├── test_agents.py
│   ├── test_io_handler.py
│   └── test_workflow.py
└── requirements.txt

Installation

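Prerequisites

  • Python 3.10+
  • A conda environment (the steps below use one named ml)
  • A Gemini API key (from https://aistudio.google.com/app/apikey)
  • latexdiff (optional, used for visual diffs)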

Setup

  1. Activate conda environment:

    conda activate ml
  2. Install dependencies:

    pip install -r requirements.txt
  3. Initialize configuration file:

    python main.py init

    This creates config.json with the following structure:

    {
      "api": {
        "api_key": "<YOUR_GEMINI_API_KEY_HERE>",
        "model": "gemini-1.5-pro",
        "temperature": 0.7,
        "max_tokens": 4096
      },
      "workflow": {
        "max_rounds": 5
      },
      "reviewer": {
        "strict_mode": true
      },
      "logging": {
        "level": "WARNING"
      }
    }
  4. Fill in your Gemini API key in config.json:

    "api_key": "AIzaSyD... (your actual key)"

📖 Documentation Guide

This project includes comprehensive documentation. Here's where to find what you need:

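Getting Started

  • QUICKSTART.md - Quick start and command-line options

API Setup & Verification

  • API_QUICK_REFERENCE.md - API configuration and model selection guide
  • API_VERIFICATION.md - API verification and troubleshooting

Technical Documentation

  • IMPLEMENTATION.md - Architecture, testing, and LaTeX safety details
  • PROJECT_COMPLETION.md - Completed features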

Bilingual Support

  • README_CN.md - Complete Chinese documentation (中文文档)

Usage

Quick Links

View Available Commands

python main.py --help

Output:

 Usage: main.py [OPTIONS] COMMAND [ARGS]...

 Paper Polish System - Iterative LLM-based Academic Paper Improvement

╭─ Commands ──────────────────────────────────────────────────────────────────╮
│ polish   Polish an academic paper using LLM agents.                         │
│ init     Initialize configuration file with sample values.                  │
│ verify   Verify Gemini API configuration and connectivity.                  │
╰─────────────────────────────────────────────────────────────────────────────╯

✅ Step 1: Verify Your API Setup

Before running the main workflow, validate your API configuration:

python main.py verify

This command:

  • ✓ Checks if the config file exists
  • ✓ Validates that an API key is configured
  • ✓ Tests the connection to Gemini API
  • ✓ Confirms the specified model is available
  • ✓ Provides helpful error messages if anything is wrong
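
The offline part of these checks (config file present, API key configured) can be sketched in a few lines. This is a minimal illustration, not the project's actual verify implementation: the function name check_config is hypothetical, and the placeholder string is taken from the sample config.json shown in the Setup section. It deliberately skips the network checks.

```python
import json
from pathlib import Path

# Placeholder value from the sample config.json (assumed to be what `init` writes).
PLACEHOLDER_KEY = "<YOUR_GEMINI_API_KEY_HERE>"


def check_config(path="config.json"):
    """Return a list of problems found without contacting the API."""
    config_path = Path(path)
    if not config_path.exists():
        return [f"config file not found: {config_path}"]
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"config file is not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["config file must contain a JSON object"]

    problems = []
    api = config.get("api") or {}
    key = api.get("api_key", "")
    if not key or key == PLACEHOLDER_KEY:
        problems.append("api.api_key is missing or still the placeholder")
    if not api.get("model"):
        problems.append("api.model is not set")
    return problems
```

An empty list means the file is ready for the online checks that python main.py verify performs.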

Success output:

✓ API key found (length: 39 chars)
✓ API connection successful

✨ All checks passed! Your API is ready to use.

Troubleshooting? See API_VERIFICATION.md for detailed help.

📝 Step 2: Polish Your Paper

python main.py polish paper.tex

This command:

  1. Loads your LaTeX paper
  2. Creates an output directory with timestamp
  3. Runs up to 5 iterations (configurable via workflow.max_rounds in config.json)
  4. Displays a beautiful progress bar with status
  5. Shows final results in a formatted table
  6. Saves all iterations, feedback, and diffs

📋 Command Options

For detailed command options, see QUICKSTART.md#command-line-options or run:

python main.py polish --help

📂 Output Structure

After running, you'll find:

output/
└── paper_<timestamp>/
    ├── iteration_01.tex      # Paper after iteration 1
    ├── iteration_02.tex      # Paper after iteration 2
    ├── feedback_01.txt       # Reviewer feedback for iteration 1
    ├── diff_01.tex           # latexdiff output (if available)
    ├── final_paper.tex       # Final polished paper
    ├── SUMMARY.md            # Human-readable summary
    ├── metadata.json         # Structured workflow data
    └── paper_polish.log      # Detailed logs
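
If you want to post-process a run, the naming scheme above makes it easy to pair each iteration with its feedback file. The helper below is a hypothetical sketch (list_iterations is not part of the project); it only assumes the iteration_NN.tex and feedback_NN.txt names shown in the tree.

```python
import re
from pathlib import Path

ITERATION_RE = re.compile(r"^iteration_(\d+)\.tex$")


def list_iterations(run_dir):
    """Return (round, paper_path, feedback_path_or_None) tuples in round order."""
    run_dir = Path(run_dir)
    rounds = []
    for path in run_dir.iterdir():
        match = ITERATION_RE.match(path.name)
        if not match:
            continue  # skip diffs, logs, summary, and other files
        number = match.group(1)
        feedback = run_dir / f"feedback_{number}.txt"
        rounds.append((int(number), path, feedback if feedback.exists() else None))
    return sorted(rounds, key=lambda item: item[0])
```

Usage: call list_iterations("output/paper_<timestamp>") with the actual run directory name.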

🔧 Configuration & API Setup

Updating LLM API and Model

The system currently supports the Google Gemini API. For a comprehensive API configuration guide, see API_QUICK_REFERENCE.md.

Quick Steps:

  1. Edit config.json and update the api section:

    {
      "api": {
        "api_key": "your-new-api-key-here",
        "provider": "gemini",
        "model": "gemini-1.5-pro",
        "temperature": 0.7,
        "max_tokens": 4096,
        "timeout_seconds": 120
      }
    }
  2. Get a Gemini API Key from https://aistudio.google.com/app/apikey

  3. Verify your configuration:

    python main.py verify

Available Gemini Models:

  • gemini-1.5-pro - Most powerful (recommended for academic papers)
  • gemini-1.5-flash - Faster and lower cost
  • gemini-2.0-pro - Latest model (if available)

📊 Model Selection Guide

For detailed model comparison and selection, see API_QUICK_REFERENCE.md#model-selection-guide.

gemini-1.5-pro (Recommended)

  • ✅ Most powerful model
  • ✅ Best paper understanding
  • ✅ Supports long context (a context window of 1M tokens or more)
  • ❌ Slightly higher cost
  • Best for: High-quality feedback on academic papers

gemini-1.5-flash

  • ✅ Faster response time
  • ✅ Lower cost
  • ✅ Still high quality
  • ❌ May miss some details
  • Best for: Quick iterations, initial drafts

✔️ Verify Configuration

Test your setup with:

python main.py verify

For detailed troubleshooting, see API_VERIFICATION.md.

🏗️ Architecture & Technical Details

Data Flow

Input Paper (LaTeX)
        ↓
[ITERATION LOOP]
    ├─→ Reviewer: Analyze & Feedback
    │   └─→ Return (feedback_text, is_acceptable)
    │
    ├─→ If NOT acceptable:
    │   └─→ Author: Rewrite with feedback
    │       └─→ Validate LaTeX preservation
    │       └─→ Save iteration & diff
    │
    └─→ Loop until: accepted OR max_rounds reached
        ↓
    Output: Final paper + metadata + diffs
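
The control flow in the diagram can be sketched as a plain loop. This is an illustrative sketch under stated assumptions, not the code in workflow.py: the function name polish and the callable signatures are hypothetical, chosen to match the (feedback_text, is_acceptable) return value shown above. LaTeX validation and file saving are omitted.

```python
from typing import Callable, List, Tuple

Reviewer = Callable[[str], Tuple[str, bool]]  # paper -> (feedback_text, is_acceptable)
Author = Callable[[str, str], str]            # (paper, feedback) -> revised paper


def polish(paper: str, review: Reviewer, rewrite: Author, max_rounds: int = 5):
    """Run the review/rewrite loop; return (final_paper, accepted, feedback_history)."""
    history: List[str] = []
    for _ in range(max_rounds):
        feedback, is_acceptable = review(paper)
        history.append(feedback)
        if is_acceptable:
            return paper, True, history
        # Not accepted yet: let the author revise using the reviewer's feedback.
        paper = rewrite(paper, feedback)
    return paper, False, history
```

Note one design choice in this sketch: when max_rounds is exhausted, the last rewrite is returned without a final review.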

📚 Module Breakdown

For comprehensive architectural details, see IMPLEMENTATION.md.

Key modules:

  • config.py - Configuration management and validation
  • models.py - LLM abstraction (Gemini API with rate limiting)
  • agents/base_agent.py - Author and Reviewer agents
  • latex_utils.py - LaTeX validation and structure analysis
  • io_handler.py - File I/O and diff generation
  • workflow.py - Main workflow orchestration
  • main.py - CLI entry point

✅ Testing

All modules have comprehensive unit tests with mocking (no real API calls):

# Run all tests
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_agents.py -v

# Run with coverage
python -m pytest tests/ --cov

Total Test Coverage: 80+ unit tests across all modules

For detailed test information, see IMPLEMENTATION.md#testing. The tests cover scenarios such as:

  • Workflow initialization
  • Paper acceptance scenarios
  • Max rounds exceeded
  • Error handling
  • Metadata tracking
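
To illustrate what "full mocking" means in practice, here is a minimal sketch of a test that never touches the network. Everything in it is hypothetical: review_paper, the model.generate method, and the "ACCEPT" verdict convention are stand-ins for illustration and are not taken from the project's agents or tests.

```python
from unittest.mock import Mock


def review_paper(model, paper):
    """Hypothetical reviewer helper: ask the model, then read a verdict prefix."""
    reply = model.generate(f"Review this paper:\n{paper}")
    is_acceptable = reply.strip().upper().startswith("ACCEPT")
    return reply, is_acceptable


def test_reviewer_accepts_without_calling_the_api():
    # The Mock stands in for the LLM client, so no real request is made.
    model = Mock()
    model.generate.return_value = "ACCEPT: no further changes needed."
    feedback, is_acceptable = review_paper(model, r"\section{Intro} Text.")
    assert is_acceptable
    assert feedback.startswith("ACCEPT")
    model.generate.assert_called_once()
```

Because the model is injected as an argument, the same helper works with the real client in production and with a Mock under pytest.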

⚙️ Configuration Reference

Complete configuration documentation is available in API_QUICK_REFERENCE.md.

config.json Structure

{
  "api": {
    "api_key": "your-gemini-key",
    "provider": "gemini",
    "model": "gemini-1.5-pro",
    "temperature": 0.7,
    "max_tokens": 4096,
    "timeout_seconds": 120
  },
  "workflow": {
    "max_rounds": 5,
    "timeout_per_call": 60
  },
  "reviewer": {
    "strict_mode": true,
    "check_items": [
      "grammatical_correctness",
      "logical_clarity",
      "academic_rigor",
      "citation_accuracy",
      "structure_coherence"
    ]
  },
  "author": {
    "preserve_formatting": true,
    "maintain_references": true
  },
  "latex": {
    "validation_enabled": true,
    "use_pylatexenc": true
  },
  "logging": {
    "level": "WARNING",
    "log_file": "paper_polish.log",
    "log_llm_calls": false
  },
  "output": {
    "save_iterations": true,
    "generate_latexdiff": true,
    "output_dir": "output"
  }
}
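
Since the config is nested, a partial config.json is easiest to handle by overlaying it on defaults. The sketch below shows one way to do that; it is an assumption about how such merging could work, not a description of config.py. merge_config is a hypothetical name, and DEFAULTS copies only a few values from the structure above.

```python
import copy

# A small subset of the values shown in the config.json structure above.
DEFAULTS = {
    "api": {"provider": "gemini", "model": "gemini-1.5-pro", "temperature": 0.7, "max_tokens": 4096},
    "workflow": {"max_rounds": 5},
    "logging": {"level": "WARNING"},
}


def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on a copy of `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace the default
    return merged
```

For example, merge_config(DEFAULTS, {"api": {"model": "gemini-1.5-flash"}}) switches the model while keeping the default temperature and max_tokens.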

🛡️ LaTeX Safety

The system ensures LaTeX integrity through multiple checks:

  1. Brace Balancing: Validates {...} nesting
  2. Environment Balancing: Checks \begin{...}\end{...} pairing
  3. Citation Preservation: Ensures citations aren't lost/added
  4. Section Preservation: Maintains document structure
  5. Math Block Detection: Identifies and protects equations

If LaTeX corruption is detected, the iteration is still saved and a warning is logged.

See IMPLEMENTATION.md#latex-safety for details.
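
The first three checks can be approximated with the standard library alone. The sketch below uses plain string scanning and regular expressions rather than pylatexenc, so it is a simplified stand-in for what latex_utils.py does, and all function names are hypothetical. It does not special-case comments or verbatim blocks.

```python
import re

BEGIN_END_RE = re.compile(r"\\(begin|end)\{([^}]*)\}")
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def braces_balanced(tex):
    """True if unescaped { and } nest correctly."""
    depth = 0
    i = 0
    while i < len(tex):
        ch = tex[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \\
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False  # a closing brace with nothing open
        i += 1
    return depth == 0


def environments_balanced(tex):
    r"""True if every \begin{name} is closed by a matching \end{name}, in order."""
    stack = []
    for kind, name in BEGIN_END_RE.findall(tex):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False
    return not stack


def citation_changes(before, after):
    """Return (lost, added) citation keys between two versions of a paper."""
    def keys(tex):
        found = set()
        for group in CITE_RE.findall(tex):
            found.update(k.strip() for k in group.split(",") if k.strip())
        return found

    old, new = keys(before), keys(after)
    return sorted(old - new), sorted(new - old)
```

A rewrite would pass this sketch's checks when both balance functions return True and citation_changes(original, rewritten) returns two empty lists.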

📦 Dependencies

Complete dependency list available in requirements.txt. Key packages:

  • google-generativeai: Gemini API client
  • pylatexenc: LaTeX parsing & validation
  • pydantic: Configuration validation
  • python-dotenv: Environment variable support
  • typer: Modern CLI framework
  • rich: Beautiful terminal formatting

🆘 Troubleshooting

Common Issues

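"API key not found" error:

  • Check that config.json exists and that api_key holds your real key rather than the placeholder, then run: python main.py verify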

"latexdiff not found" warning:

  • This is optional; system works without it
  • Install: sudo apt-get install texlive-extra-utils (Linux)
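
Treating latexdiff as optional usually comes down to checking for the executable before calling it. The following is a minimal sketch of that pattern, not the code in io_handler.py: write_latexdiff and its parameters are hypothetical. It relies only on latexdiff's standard behavior of taking the old and new files as arguments and printing the diff to standard output.

```python
import shutil
import subprocess
from pathlib import Path


def write_latexdiff(old_tex, new_tex, diff_tex, executable="latexdiff"):
    """Write a latexdiff of two files to diff_tex; return False if unavailable or failed."""
    if shutil.which(executable) is None:
        return False  # optional tool missing: skip the diff and carry on
    result = subprocess.run(
        [executable, str(old_tex), str(new_tex)],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    if result.returncode != 0:
        return False
    Path(diff_tex).write_text(result.stdout, encoding="utf-8")
    return True
```

A caller can log a warning when the function returns False and continue with the next iteration.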

"ModuleNotFoundError" for google.generativeai:

  • Run: pip install -r requirements.txt

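Test failures:

  • Re-run with verbose output to see which test fails: python -m pytest tests/ -v

For comprehensive troubleshooting, see API_VERIFICATION.md.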

📊 Performance

  • Per Iteration: ~30-60 seconds (API latency dependent)
  • API Calls: 1 reviewer + 1 author per iteration
  • Rate Limiting: 1 second minimum between calls
  • Token Budget: ~4000 tokens per response
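
A minimum gap between calls is simple to enforce with a monotonic clock. The class below is a sketch of the idea only; MinIntervalLimiter and worst_case_calls are hypothetical names, not the limiter in models.py. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class MinIntervalLimiter:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def worst_case_calls(max_rounds=5, calls_per_round=2):
    """Upper bound on API calls: one reviewer plus one author call per round."""
    return max_rounds * calls_per_round
```

With the default max_rounds of 5, the upper bound is 10 API calls. At the quoted 30-60 seconds per iteration, a full run would take roughly 2.5 to 5 minutes.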

🚀 Future Enhancements

  • Support for OpenAI API (GPT-4)
  • Support for Anthropic Claude
  • Section-by-section processing
  • Multi-reviewer consensus
  • Web UI
  • Integration with Overleaf
  • Batch processing

See PROJECT_COMPLETION.md for completed features.

📄 License

MIT License - See LICENSE file for details

🤝 Contributing

Contributions welcome! Please:

  1. Add tests for new features
  2. Ensure all tests pass: pytest tests/ -q
  3. Follow existing code style
  4. Update documentation

📚 Additional Resources


Built with: Python 3.10+ | Google Gemini API | pylatexenc | Typer | Rich

Latest Status: ✅ Complete and production-ready

Test Coverage: 80+ unit tests with full mocking
