From 8763b7c9a3c0d02bdc15cc3fe028e045fb8cfe16 Mon Sep 17 00:00:00 2001 From: 4ndrelim Date: Sat, 6 Dec 2025 23:09:31 +0800 Subject: [PATCH 1/3] Add XtraMCP notes to README --- README.md | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 88748444..d8a19f10 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,9 @@ License -**PaperDebugger** is an AI-powered academic writing assistant that helps researchers debug and improve their LaTeX papers with intelligent suggestions and seamless Overleaf integration. +**PaperDebugger** is an AI-powered academic writing assistant that helps researchers debug and improve their LaTeX papers with intelligent suggestions and seamless Overleaf integration. It is powered by a custom MCP-based orchestration engine that simulates the full academic workflow **Research → Critique → Revision**.
+This enables multi-step reasoning, reviewer-style critique, and structured revision passes beyond standard chat-based assistance. +
🚀 Install from Chrome Web Store📦 Download Latest Release @@ -39,7 +41,8 @@ - [1. Clone the Repository](#1-clone-the-repository) - [2. Start MongoDB](#2-start-mongodb) - [3. Environment Configuration](#3-environment-configuration) - - [4. Build and Run](#4-build-and-run) + - [4. Custom MCP Backend Orchestration](#4-custom-mcp-backend-orchestration-optional-for-local-dev) + - [5. Build and Run](#5-build-and-run) - [Frontend Extension Build](#frontend-extension-build) - [Chrome Extension Development](#chrome-extension-development) - [Installing the Development Extension](#installing-the-development-extension) @@ -53,6 +56,7 @@ PaperDebugger never modifies your project, it only reads and provides suggestion - **💬 Comment System**: Automatically generate and insert comments into your project - **📚 Prompt Library**: Custom prompt templates for different use cases - **🔒 Privacy First**: Your content stays secure - we only read, never modify +- **🧠 Multi-Agent Orchestration**: [XtraMCP](https://github.com/4ndrelim/academic-paper-mcp-server) support for literature-grounded research, AI-conference-style review, and domain-specific revision https://github.com/user-attachments/assets/6c20924d-1eb6-44d5-95b0-207bd08b718b @@ -154,7 +158,19 @@ cp .env.example .env # Edit the .env file based on your configuration ``` -#### 4. Build and Run +#### 4. Custom MCP Backend Orchestration [OPTIONAL FOR LOCAL DEV] +Our enhanced orchestration backend, [**XtraMCP**](https://github.com/4ndrelim/academic-paper-mcp-server), is currently closed-source while under active development.
+You can run PaperDebugger without it; all core features (chat, formatting, edits, comments) work normally. + +Connecting to XtraMCP unlocks: +- research-mode agents, +- structured reviewer-style critique, +- domain-specific revisions tailored to academic writing powered by [XtraGPT](https://huggingface.co/Xtra-Computing/XtraGPT-14B) models + +We plan to **open-source XtraMCP** once the API stabilizes for community use. + + +#### 5. Build and Run ```bash # Build the backend make build From cfad6ace83cbc6ac8aed8f732b6c1ef18ee6e108 Mon Sep 17 00:00:00 2001 From: 4ndrelim Date: Sat, 6 Dec 2025 23:14:24 +0800 Subject: [PATCH 2/3] Update env example to add XtraMCP host field --- .env.example | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.env.example b/.env.example index 6d153bf6..bca4d1c1 100644 --- a/.env.example +++ b/.env.example @@ -1,2 +1,3 @@ OPENAI_API_KEY=dummy-key -PD_MONGO_URI="mongodb://localhost:27017" \ No newline at end of file +PD_MONGO_URI="mongodb://localhost:27017" +XTRAMCP_URI="" # currently closed-source; Pending release upon stable version \ No newline at end of file From efeac72a3987c83e141d3f4a9d99fc6b72643063 Mon Sep 17 00:00:00 2001 From: 4ndrelim Date: Sat, 6 Dec 2025 23:47:57 +0800 Subject: [PATCH 3/3] Update demo docs for XtraMCP --- demo/xtramcp/readme.md | 333 ++++++++++++++++++++++------------------- 1 file changed, 179 insertions(+), 154 deletions(-) diff --git a/demo/xtramcp/readme.md b/demo/xtramcp/readme.md index ec572bed..f37d0afe 100644 --- a/demo/xtramcp/readme.md +++ b/demo/xtramcp/readme.md @@ -1,154 +1,179 @@ -# XtraMCP Server - Orchestration Prompts - -This directory contains MCP prompts that orchestrate complex workflows by guiding the AI on how to use multiple tools together effectively. - -## Available Prompts - -### 1. `analyze_paper_find_similar` -**Purpose**: Analyze existing research papers (PDF/LaTeX) and find similar work in the academic literature. 
- -**Use Cases**: -- Finding papers similar to your own research -- Identifying related work for a paper you're writing -- Comparing your approach with existing methods in the literature -- Building a collection of papers related to a specific source paper - -**Arguments**: -- `paper_path` (required): Path to PDF or LaTeX file to analyze -- `analysis_focus` (optional): Focus area - 'methodology', 'application domain', 'theoretical contributions', or 'all' (default: 'all') -- `comparison_type` (optional): Type of comparison - 'similar_methods', 'related_problems', 'same_domain', 'theoretical_connections' (default: 'related_problems') -- `venues` (optional): Conference venues to search (default: ICLR.cc, NeurIPS.cc, ICML.cc) -- `years` (optional): Years to search (default: last 3 years) -- `max_papers` (optional): Maximum papers to find (default: 12) - -**Example Usage**: -``` -paper_path: "./papers/my_research_paper.pdf" -analysis_focus: "methodology" -comparison_type: "similar_methods" -max_papers: 15 -``` - -### 2. `literature_review` -**Purpose**: Conduct comprehensive and systematic literature reviews with topic-based discovery. 
- -**Use Cases**: -- Systematic literature reviews for research proposals -- Comprehensive coverage of a research area -- Finding papers on a specific topic or research question -- Multi-faceted topic exploration with related areas -- Building reference collections for academic writing - -**Arguments**: -- `main_topic` (required): Main research topic, research question, or paper description to investigate -- `source_context` (optional): Context from existing work, abstracts, or specific research focus to guide keyword extraction -- `related_topics` (optional): Comma-separated list of related topics, subtopics, or alternative terms to explore -- `research_scope` (optional): 'focused' (10 papers, specific), 'standard' (15 papers, balanced), 'comprehensive' (25 papers, broad coverage) (default: 'standard') -- `venues` (optional): Conference venues to search (default: ICLR.cc, NeurIPS.cc, ICML.cc) -- `time_range` (optional): 'recent' (2 years), 'standard' (3 years), 'comprehensive' (5 years) (default: 'standard') - -**Example Usage**: -``` -main_topic: "multimodal machine learning for medical imaging" -related_topics: "vision-language models, medical AI, cross-modal attention" -research_scope: "comprehensive" -time_range: "comprehensive" -``` - -## Key Differences - -| Aspect | `analyze_paper_find_similar` | `literature_review` | -|--------|------------------------------|---------------------| -| **Input** | Existing paper file (PDF/LaTeX) | Research topic/question | -| **Approach** | Paper content analysis → keyword extraction | Topic analysis → keyword strategy | -| **Focus** | Finding work similar to specific paper | Comprehensive topic coverage | -| **Output** | Papers similar to source paper | Systematic literature collection | -| **Tools Used** | `search_papers_on_openreview` → `export_papers` | `search_papers_on_openreview` → `export_papers` | -| **Export Dir** | `./papers/openreview_exports/similar_papers/` | `./papers/openreview_exports/literature_review/` | 
-| **Search Strategy** | High precision (min_score 0.8) | Balanced coverage (min_score 0.75) | -| **Loop Prevention** | Allowed to run more than once but avoid loops, proceed with results | Allowed to run more than once but avoid loops, proceed with results | - -## Workflow Overview - -Both prompts follow a structured approach: - -### `analyze_paper_find_similar` Workflow: -1. **Source Paper Analysis**: Extract content from PDF/LaTeX file -2. **Keyword Extraction**: Identify key concepts based on analysis focus -3. **Strategic Search**: Use `search_papers_on_openreview` tool with extracted keywords -4. **Export Collection**: Use `export_papers` tool for organized download -5. **Similarity Report**: Analyze how found papers relate to source - -### `literature_review` Workflow: -1. **Topic Analysis**: Extract effective search terms from research topic -2. **Keyword Strategy**: Develop comprehensive search approach -3. **Systematic Search**: Use `search_papers_on_openreview` tool with strategic keywords -4. **Export Organization**: Use `export_papers` tool with systematic naming -5. 
**Research Synthesis**: Provide structured literature analysis - -## Default Configuration - -The prompts use these optimized defaults: - -| Parameter | `analyze_paper_find_similar` | `literature_review` | -|-----------|------------------------------|---------------------| -| **Venues** | ICLR.cc, NeurIPS.cc, ICML.cc | ICLR.cc, NeurIPS.cc, ICML.cc | -| **Search Fields** | title, abstract | title, abstract | -| **Match Mode** | threshold | threshold | -| **Match Threshold** | 0.6 | 0.5 | -| **Min Score** | 0.8 (high precision) | 0.75 (balanced) | -| **Max Papers** | 12 | 10-25 (scope dependent) | -| **Years** | Last 3 years | 2-5 years (time_range dependent) | -| **Search Strategy** | Allowed to run more than once but avoid loops | ONE Allowed to run more than once but avoid loops | - -## Output Structure - -Each workflow creates: - -- **JSON Files**: Structured metadata about found papers -- **PDF Downloads**: Full paper downloads for offline reading -- **Organized Exports**: Papers saved to specific subdirectories -- **Analysis Reports**: Key findings and research insights - -### File Organization: -``` -papers/openreview_exports/ -├── similar_papers/ # analyze_paper_find_similar outputs -│ └── [source_paper]_similar_[comparison_type].json -└── literature_review/ # literature_review outputs - └── [topic]_review_[scope].json -``` - -## Integration with Tools - -These prompts orchestrate the following MCP tools in a two-step workflow: - -1. **`search_papers_on_openreview`**: Find relevant papers based on keywords and venues, returning paper IDs -2. 
**`export_papers`**: Download PDFs and create organized JSON collections using the paper IDs from search results - -The prompts provide precise instructions on: -- Sequential tool execution (search first, then export) -- Paper ID extraction from search results -- Tool parameter configuration -- Error handling and validation -- Output organization and naming - -## Tips for Effective Use - -### For `analyze_paper_find_similar`: -1. **File Access**: Ensure the paper path is accessible and readable -2. **Analysis Focus**: Choose specific focus for more targeted results -3. **Comparison Type**: Select based on what aspect of similarity you want -4. **File Formats**: Works with both PDF and LaTeX source files - -### For `literature_review`: -1. **Topic Clarity**: Use precise, technical terminology in your main topic -2. **Scope Selection**: Match scope to your research needs (focused/standard/comprehensive) -3. **Related Topics**: Include synonyms and alternative terms for broader coverage -4. **Context Utilization**: Provide source context to guide keyword extraction - -### General Best Practices: -1. **Venue Selection**: Add domain-specific venues for specialized topics -2. **Time Range**: Adjust based on field evolution and research currency -3. **Quality Thresholds**: Higher min_score for more precise results -4. **Export Organization**: Use descriptive names for easy file management +# XtraMCP Server – Orchestration Prompts + +XtraMCP is a **custom MCP-based orchestration server** that powers PaperDebugger’s higher-level workflows: + +- 🧑‍🔬 **Researcher** – find and position your work within the literature +- 🧑‍⚖️ **Reviewer** – critique drafts like a top-tier ML reviewer +- ✍️ **Enhancer** – perform fine-grained, context-aware rewrites +- 🧾 **Conference Formatter** (WIP) – adapt drafts to conference templates (NeurIPS, ICLR, AAAI, etc.) + +This document describes the core tools exposed by XtraMCP and how they combine into these workflows. 
+ +> **Note:** XtraMCP is currently **closed-source** while the API and deployment story stabilize. +> PaperDebugger runs fully without it; connecting XtraMCP unlocks the advanced research/review pipelines described here. + +--- + +## Tool Overview + +| Tool Name | Role | Purpose | Primary Data Source | +|---------------------------|-----------|-----------------------------------------------------------------|-----------------------------| +| `search_relevant_papers` | Researcher | Fast semantic search over recent CS papers in a local vector DB, enhanced with a semantic re-ranker module | Local vector database | +| `deep_research` | Researcher | Multi-step literature synthesis & positioning of your draft | Local DB + retrieved papers | +| `online_search_papers` | Researcher | Online search over external academic corpora | OpenReview + arXiv | +| `review_paper` | Reviewer | Conference-style structured review of a draft | Your draft | +| `enhance_academic_writing`| Enhancer | Context-aware rewriting and polishing of selected text | Your draft + XtraGPT | +| `get_user_papers` | Misc | Fetch all papers, with their descriptions, published on OpenReview by a specific user identified by email | OpenReview (queried by user email) | + +--- + +## 1. `search_relevant_papers` + +**Purpose:** +Search for similar or relevant papers by keywords or extracted concepts against a **local database of academic papers**.
This tool uses semantic search with vector embeddings to find the most relevant results, enhanced with a re-ranker module to better capture nuance. It is fast, and it is the default and recommended tool for paper searches. + +**How it works:** + +- Recent CS papers (last few years) are **vectorized** into a local index. +- Queries (from your topic or draft) are embedded and matched via **similarity search**. +- Results are re-ranked by an **LLM-based re-ranker** for better semantic alignment. + +**Typical usage:** + +- “Find the 10 most relevant papers to this draft.” +- “Search for relevant works on diffusion models for imbalanced medical imaging.” + +--- + +## 2. `deep_research` + +**Purpose:** +Given a **research topic or draft paper**, perform multi-step literature exploration and synthesis. Summarize the findings of the retrieved papers, and provide insights on similarities and differences to assist in the research process. + +**How it works:** + +1. Uses `search_relevant_papers` (and optionally `online_search_papers`) to retrieve candidate works. +2. Summarizes key ideas, methods, and results from retrieved papers. +3. Performs **chain-of-thought style analysis** to: + - highlight similarities/differences vs your draft, + - surface missing baselines or evaluation settings, + - suggest how to position your contribution. + +**Typical usage:** + +- “deep_research to compare my draft to recent work on retrieval-augmented generation.” +- “For this topic, deep_research 5-10 relevant papers and explain where the open gaps are.” + +--- + +## 3. `online_search_papers` + +**Purpose:** +Expand beyond the local DB to search **online academic corpora** (OpenReview + arXiv). This tool is ideal for discovering recent or broader papers beyond those available in the local database. + +**How it works:** + +- Called when local search is **too sparse** (new topic) or you explicitly want the **latest** work. +- Queries both **OpenReview** and **arXiv** for up-to-date results. 
+- Results can then be fed into `deep_research` for synthesis. + +**Typical usage:** + +- “My topic is very new. Look online for the latest preprints from OpenReview/arXiv.” + +--- + +## 4. `review_paper` + +**Purpose:** +Analyze and review a draft against the standards of **top-tier ML conferences** (ICLR, ICML, NeurIPS). Identifies improvements and issues in structure, completeness, clarity, and argumentation, then provides prioritized, actionable suggestions. + +**How it works:** + +- **Pass A – Deterministic checks (fast, high-precision)** + - Required sections present (e.g., Abstract, Method, Experiments, Limitations/Broader Impact). + - Abstract contains problem, approach, core results, significance. + - Acronyms defined at first use; “TODO”, “FIXME”, “Figure ??” flags. + - Figures/tables referenced; equation references consistent; citation style uniform. + - Reproducibility signals: code/data availability, hyperparameters, seeds, compute, eval protocol. + +- **Pass B – Section-aware LLM critiques** + - Run per section with **venue-aware rubrics** (NeurIPS/ICML/ICLR style). + - Suggest *minimal, targeted edits* (what to add/remove/clarify). + - Focus on clarity, completeness, and logical flow. + +- **Pass C – Cross-checks (claims vs evidence)** + - Are “state-of-the-art” claims backed by numbers + baselines? + - Are method components properly ablated? + - Are there red flags for data leakage, HPO on test sets, or missing uncertainty reporting? + +- **Prioritization** + - Each issue is scored by severity (blocker/major/minor), impact, and confidence. + - Duplicates are merged and **top-N issues** are surfaced as “quick fixes” vs “substantial rewrites”. + +**Typical usage:** + +- “review_paper this draft like a NeurIPS reviewer and give me the top 10 issues to fix.” +- “review_paper on method clarity and experimental rigor.” + +--- + +## 5. `enhance_academic_writing` + +**Purpose:** +Suggest **context-aware academic writing enhancements** for selected text. 
+ +**How it works:** + +- Powered by **XtraGPT models** tuned for academic style and LaTeX-heavy text. +- Uses surrounding context (section, paper intent, venue) to: + - improve clarity and flow, + - reduce redundancy and filler, + - keep technical content intact, + - align tone with ML/AI papers. + +**Typical usage:** + +- “enhance_academic_writing this paragraph to be clearer and more concise, preserving all technical details.” +- “enhance_academic_writing the abstract to be suitable for NeurIPS.” + +## 6. `get_user_papers` + +**Purpose:** +Retrieve **all papers authored by a given user** (OpenReview), identified by email. +Useful for quickly assembling a researcher’s publication list or grounding context for comparison/positioning. + +**How it works:** +- Queries the paper database for matching author email(s). +- Returns structured metadata: title, authors, venue, year, abstract, and identifiers. +- Often used as a preprocessing step before `deep_research`. + +**Typical usage:** +- “get_user_papers for in summary mode.” +- “Retrieve all publications by this researcher and then compare my draft using deep_research.” + +## 7. Conference Formatter (WIP) + +Upcoming workflows will: + +- map your draft onto specific **conference templates** (NeurIPS, ICLR, AAAI, etc.), +- adjust sectioning, citation style, and boilerplate requirements, +- highlight formatting and policy mismatches (e.g., ethics, broader impact sections). + +--- + +## Putting It Together: Example Orchestrated Flows + +- **Researcher Flow** + 1. Use `search_relevant_papers` on your draft or topic. + 2. If results are thin or stale, fall back to `online_search_papers`. + 3. Call `deep_research` to synthesize and position your work. + +- **Reviewer Flow** + 1. Run `review_paper` on the full draft. + 2. For high-impact issues, call `enhance_academic_writing` on the relevant spans. + +- **Enhancer Flow** + 1. Select a paragraph or section in Overleaf. + 2. 
Call `enhance_academic_writing` with your preferences (e.g., “more formal”, “shorter”). + 3. Use the edit-diff tool to apply the changes.