
ynoda714/AnyResearch-matlab


AnyResearch


A MATLAB pipeline that automates research trend analysis, multi-institution comparison, and literature collection — just enter keywords.
Powered by OpenAlex and arXiv for worldwide access to scholarly metadata.

Design philosophy: Convert information published by scholarly databases into analyzed, decision-ready material.
From individual literature reviews to cross-institutional benchmarking, AnyResearch supports evidence-based research strategy decisions.

Use Cases

| Scenario | Question | What to use |
|---|---|---|
| Research theme selection | Which topics are growing fastest? Which fields surged in the last 5 years? | Layer 0: keyword search + Summary sheet |
| Competitive technology survey | What are competitors and rival institutions working on? | Layer 1: institution batch + batch_comparison |
| Grant proposal support | Show citation-trend evidence for originality claims | Layer 2: citation_velocity analysis |
| University IR / planning | Compare your institution's research output and citation impact against peers | Layers 1+2: batch × Analytics |
| Literature review | Comprehensively collect review articles in a specific field | Layer 0: filterType="review" |

Who It's For

  • Faculty — Quantify research trends; gather evidence for grant proposals
  • Graduate students — Systematize literature reviews; comprehensively collect prior work by keyword
  • IR offices / University administration — Compare your institution's research output against benchmarks
  • Industry engineers / IP departments — Survey technology trends and prior art using an existing MATLAB environment, without dedicated literature analysis tools

Four-Layer Architecture

AnyResearch is designed as four incremental layers. Layer 0 alone covers the primary use case.

| Layer | Additional requirements | What you get |
|---|---|---|
| Layer 0 (Core) | MATLAB + OpenAlex API Key (free) | Keyword search → Excel workbook (4 sheets: Overview / Detail / Summary / Config). Optional: fetch arXiv preprints in parallel (useArxiv=true) |
| Layer 1 (Batch) | + institutions.csv | Process multiple institutions at once → cross-institution comparison sheet (batch_comparison.xlsx) |
| Layer 2 (Analytics) | None (auto-integrated into Layers 0/1) | Citation velocity, topic growth rate, institution dominance score → added to Summary and batch_comparison |
| Layer 3 (PDF) | + Text Analytics Toolbox or Python | Auto-download OA PDFs → keyword evidence extraction |

Quick Start (Layer 0 only — minimal setup)

1. Get an OpenAlex API Key (free)

  1. Create an account at openalex.org (takes ~30 seconds)
  2. Copy your API Key from openalex.org/settings/api
  3. Copy config/settings.example.json to config/settings.json and paste your key into it:

```json
{
  "openalex": {
    "api_key": "YOUR_API_KEY_HERE"
  }
}
```
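A minimal sketch of how a script could read and sanity-check this file, written in Python for brevity (it is not the project's MATLAB loader). The helper name `load_api_key` and the placeholder check are assumptions; only the `openalex.api_key` layout comes from the example above.

```python
import json
from pathlib import Path

PLACEHOLDER = "YOUR_API_KEY_HERE"  # value shipped in the example file

def load_api_key(path="config/settings.json"):
    """Hypothetical helper: return openalex.api_key or fail with a clear message."""
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    key = str(settings.get("openalex", {}).get("api_key", "")).strip()
    if not key or key == PLACEHOLDER:
        raise ValueError(f"Set openalex.api_key in {path} "
                         "(copy config/settings.example.json first)")
    return key
```

Failing early on an empty or placeholder value avoids starting a run with an unset key.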

2. Run a keyword search

Open main_run_pipeline.m and set your query in Section 0:

```matlab
query      = "renewable energy forecasting";   % search keywords
fromDate   = "2023-01-01";
toDate     = "2025-12-31";
sortBy     = "cited_by_count:desc";            % or "publication_date:desc" / "relevance_score"
filterType = "";                               % "article", "review", "article,review", etc.
```

Search syntax:

  • AND: separate with spaces (e.g. "renewable energy forecasting")
  • OR: use | (e.g. "solar|wind energy")
  • Phrase: wrap in quotes (e.g. '"deep learning"')
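These rules can be read as AND-groups of OR-alternatives. The Python sketch below parses a query that way and renders it in OpenAlex-style boolean syntax (uppercase AND/OR, quoted phrases). It is an illustrative interpretation only: grouping `solar|wind energy` as `(solar OR wind) AND energy`, and the names `parse_query` / `to_boolean`, are assumptions rather than AnyResearch's actual parser.

```python
import re

# One term = a run of quoted phrases and/or non-space characters, e.g. solar|wind or "deep learning".
_TERM = re.compile(r'(?:"[^"]*"|[^\s"])+')

def parse_query(query):
    """Hypothetical parser: spaces separate AND-terms, "|" separates OR-alternatives."""
    groups = []
    for term in _TERM.findall(query):
        alts = [a.strip('"') for a in term.split("|")]
        alts = [a for a in alts if a]  # drop empty alternatives such as in "a||b"
        if alts:
            groups.append(alts)
    return groups

def to_boolean(query):
    """Render the parsed query as an assumed OpenAlex boolean search string."""
    parts = []
    for alts in parse_query(query):
        quoted = [f'"{a}"' if " " in a else a for a in alts]  # re-quote multi-word phrases
        parts.append(quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")")
    return " AND ".join(parts)
```

For example, `to_boolean("solar|wind energy")` yields `(solar OR wind) AND energy`.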

Run Section 0 (parameters), then Section 1 (execute), using Run Section (Ctrl+Enter).
Output is saved to a timestamped folder under result/runs/. No OpenAI API key or PDF setup is required.
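For orientation, the Section 0 parameters map naturally onto a request to the OpenAlex `/works` endpoint. The Python sketch below (hypothetical helper `build_works_url`) assumes this mapping: `query` → `search`, the date range → `from_publication_date` / `to_publication_date` filters, `sortBy` → `sort`, and `filterType` → a `type:` filter. It only builds the URL and sends nothing; the pipeline's real request, paging, and retry logic may differ.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"

def build_works_url(query, from_date, to_date, sort_by="", filter_type="",
                    api_key="", per_page=200):
    """Hypothetical mapping of Section 0 parameters onto an OpenAlex /works URL."""
    filters = [f"from_publication_date:{from_date}",
               f"to_publication_date:{to_date}"]
    if filter_type:
        # In OpenAlex, commas AND separate filters while "|" ORs values inside one filter.
        types = [t.strip() for t in filter_type.split(",") if t.strip()]
        filters.append("type:" + "|".join(types))
    params = {"search": query, "filter": ",".join(filters), "per-page": per_page}
    if sort_by:
        params["sort"] = sort_by
    if api_key:
        params["api_key"] = api_key  # assumed query-parameter name for the key
    return f"{OPENALEX_WORKS}?{urlencode(params)}"
```

The comma-to-`|` rewrite is the notable design choice: passing `"article,review"` as two separate `type:` filters would AND them and match nothing.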

3. Check outputs

```text
result/runs/<YYYYMMDD_HHMMSS>/
  ├─ search_results.xlsx    ← Main output (4-sheet Excel workbook)
  ├─ search_results.jsonl   ← All data (machine-readable, one record per line)
  ├─ search_results.csv     ← Tabular export (CSV)
  └─ run_meta.json          ← Search conditions and run metadata
```
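Because search_results.jsonl holds one JSON record per line, it is easy to post-process outside Excel. A Python sketch that tallies records by `source_dataset` (the field described in the arXiv section) and by publication year; the `publication_year` field name is an assumption borrowed from OpenAlex's schema and may differ in the pipeline's output.

```python
import json
from collections import Counter

def summarize_jsonl(path):
    """Count records per source_dataset and per publication year (assumed field names)."""
    by_source, by_year = Counter(), Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            by_source[record.get("source_dataset", "unknown")] += 1
            year = record.get("publication_year")
            if year is not None:
                by_year[int(year)] += 1
    return by_source, by_year
```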

Excel Output (4 sheets)

| Sheet | Contents |
|---|---|
| Overview | Title, DOI (hyperlinked), publication year, citation count, OA flag, journal name, abstract |
| Detail | All columns: authors, affiliations, PDF status, keyword evidence, AI summary, etc. |
| Summary | Year-by-year paper count, average citation count, citation velocity, and growth rate |
| Config | Search conditions, run timestamp, and API usage record |

Optional Features

Layer 1: Batch Mode (IR / multi-institution comparison)

Process multiple universities or organizations in one run and generate cross-institution comparisons.

Step 1: Prepare an institution list

```matlab
% Option A: Generate a candidate CSV from a list of institution names
prepare_institutions_csv(["Nagoya University", "Kyoto University", "Osaka University"], ...
    countryFilter="JP", maxCandidates=3)
% → Writes candidates to data/list/institutions_candidate.csv
% → Review them and save as institutions.csv

% Option B: Look up one institution at a time
lookup_institution_id("Nagoya University")
```

Review data/list/institutions_candidate.csv, keep the correct match for each institution, and save the result as data/list/institutions.csv with two columns: Account and openalex_institution_id.
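Before running a batch it can help to validate the edited file. The Python sketch below reads the two columns named above and normalizes each ID to OpenAlex's short form (institution IDs are an `I` followed by digits, sometimes written as a full openalex.org URL). The helper name and the BOM handling for Excel-saved CSVs are assumptions, not part of AnyResearch.

```python
import csv
import re

_INSTITUTION_ID = re.compile(r"^I\d+$")

def read_institutions(path="data/list/institutions.csv"):
    """Hypothetical validator: return [(Account, institution_id), ...]."""
    rows = []
    # utf-8-sig strips the byte-order mark Excel often writes at the start of a CSV.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            account = row["Account"].strip()
            raw = row["openalex_institution_id"].strip()
            inst_id = raw.rstrip("/").rsplit("/", 1)[-1].upper()  # accept full URLs
            if not _INSTITUTION_ID.match(inst_id):
                raise ValueError(f"{account!r}: {raw!r} is not an OpenAlex institution ID")
            rows.append((account, inst_id))
    return rows
```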

Step 2: Run batch

```matlab
main_run_batch   % processes every institution listed in institutions.csv
```

Results for each institution are saved under result/batch/<YYYYMMDD_HHMMSS>/, and a cross-institution comparison sheet (batch_comparison.xlsx) is generated automatically.
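A batch loop of this kind is usually written so that one failing institution does not abort the rest. The Python sketch below shows that pattern with a timestamped batch folder and an OpenAlex `authorships.institutions.id` works filter per institution; `run_batch`, the `run_one` callback, and the per-institution subfolder layout are hypothetical stand-ins for whatever main_run_batch does internally.

```python
from datetime import datetime
from pathlib import Path

def run_batch(institutions, run_one, root="result/batch", now=None):
    """Hypothetical batch driver: run_one(works_filter, out_dir) does the per-institution work."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    batch_dir = Path(root) / stamp
    status = {}
    for account, inst_id in institutions:
        works_filter = f"authorships.institutions.id:{inst_id}"  # OpenAlex works filter
        try:
            run_one(works_filter, batch_dir / account)
            status[account] = "ok"
        except Exception as exc:  # record the failure and continue with the next institution
            status[account] = f"failed: {exc}"
    return batch_dir, status
```

Returning a status map lets the comparison step skip institutions that failed (for example, after a rate-limit error) instead of losing the whole run.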

Layer 2: Analytics (auto-integrated, no extra setup)

Analytics metrics are automatically added to the Summary sheet and batch_comparison.xlsx with no additional configuration.

| Metric | Meaning | Example use |
|---|---|---|
| avg_citation_velocity | Average annual citation rate per paper | Identify research gaining attention |
| growth_rate_pct | Year-over-year growth rate of paper count (%) | Distinguish expanding fields from stagnant ones |
| institution_dominance | Composite score of paper share × citation share per institution | Compare competitor influence (batch runs) |

Note: These are simplified metrics based on OpenAlex indexed data. Values may be skewed by field, time range, or OA rate biases. Use them as a starting point for quantitative comparison; final judgments should be made by the user with appropriate context.
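The exact formulas are not spelled out here, so the Python sketch below uses one plausible reading of each metric, stated as assumptions: citation velocity as citations divided by years since publication (counting the publication year as year 1), growth rate as the change from the previous calendar year's paper count, and dominance as the plain product of paper share and citation share. AnyResearch's own definitions may weight or window these differently.

```python
from collections import Counter
from statistics import mean

def citation_velocity(cited_by_count, pub_year, ref_year):
    # Assumption: citations per year, counting the publication year as year 1.
    return cited_by_count / max(1, ref_year - pub_year + 1)

def avg_citation_velocity(papers, ref_year):
    return mean(citation_velocity(p["cited_by_count"], p["publication_year"], ref_year)
                for p in papers)

def growth_rate_pct(papers):
    # Assumption: compare each year with the previous calendar year; None when that count is 0.
    counts = Counter(p["publication_year"] for p in papers)
    years = sorted(counts)
    return {y: (None if counts[y - 1] == 0
                else (counts[y] - counts[y - 1]) / counts[y - 1] * 100)
            for y in years[1:]}

def institution_dominance(stats):
    # stats: {institution: (paper_count, citation_count)}; assumption: plain product of shares.
    total_papers = sum(p for p, _ in stats.values())
    total_cites = sum(c for _, c in stats.values())
    if total_papers == 0:
        return {}
    return {name: (p / total_papers) * (c / total_cites if total_cites else 0.0)
            for name, (p, c) in stats.items()}
```

Even in this simple form, the velocity term shows why the note above matters: a paper published this year has a denominator of 1, so a handful of early citations can rank it above older, more-cited work.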

arXiv Integration (Layer 0 option: useArxiv=true)

Fetch preprints from arXiv in parallel, capturing works not yet indexed by OpenAlex.

```matlab
% main_run_pipeline.m: add to Section 0
useArxiv = true;   % also fetch preprints from arXiv (default: false)
```

  • arXiv records appear in JSONL / Excel with source_dataset = "arxiv"
  • DOI duplicates with OpenAlex records are automatically removed
  • When filterType = "article" is set, arXiv "preprint" records are excluded
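The merge rules in this list can be sketched as follows (Python; `merge_arxiv` and `normalize_doi` are hypothetical names). DOIs are compared after lowercasing and stripping common resolver prefixes, since the same DOI often appears both as `10.1000/xyz` and as `https://doi.org/10.1000/XYZ`. Dropping arXiv records whose type is not in a non-empty filterType is an assumed generalization of the filterType = "article" rule, and arXiv records are assumed to carry type "preprint".

```python
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")

def normalize_doi(doi):
    """Lowercase a DOI and strip resolver prefixes; DOIs are case-insensitive."""
    d = (doi or "").strip().lower()
    for prefix in _DOI_PREFIXES:
        if d.startswith(prefix):
            return d[len(prefix):]
    return d

def merge_arxiv(openalex_records, arxiv_records, filter_type=""):
    """Hypothetical merge: keep OpenAlex records, append non-duplicate arXiv records."""
    wanted = {t.strip() for t in filter_type.split(",") if t.strip()}
    seen = {normalize_doi(r.get("doi")) for r in openalex_records} - {""}
    merged = list(openalex_records)
    for rec in arxiv_records:
        if wanted and rec.get("type", "preprint") not in wanted:
            continue  # e.g. filterType="article" excludes arXiv preprints
        doi = normalize_doi(rec.get("doi"))
        if doi and doi in seen:
            continue  # already present via OpenAlex (or an earlier arXiv record)
        if doi:
            seen.add(doi)
        merged.append({**rec, "source_dataset": "arxiv"})
    return merged
```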

Layer 3: PDF Extension (Text Analytics Toolbox or Python)

| Feature | Parameter | Description |
|---|---|---|
| PDF download & extraction | enablePdfDownload | Automatically download OA PDFs and extract their text |
| Keyword evidence | enableKeywordEvidence | Extract keyword occurrence snippets from PDF text |
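Keyword evidence amounts to finding each keyword in the extracted text and keeping some context around it. A Python sketch with hypothetical names and defaults (`width` characters of context, at most `max_hits` snippets per keyword); collapsing whitespace first matters for PDFs, where line breaks often split a phrase across lines.

```python
import re

def keyword_evidence(text, keywords, width=60, max_hits=3):
    """Hypothetical helper: up to max_hits context snippets per keyword (case-insensitive)."""
    flat = " ".join(text.split())  # collapse PDF line breaks and repeated spaces
    evidence = {}
    for kw in keywords:
        snippets = []
        for m in re.finditer(re.escape(kw), flat, flags=re.IGNORECASE):
            start, end = max(0, m.start() - width), min(len(flat), m.end() + width)
            snippets.append(flat[start:end])
            if len(snippets) == max_hits:
                break
        evidence[kw] = snippets
    return evidence
```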

PDF extraction uses a two-stage engine:

  • Engine 1 (primary): extractFileText() — Text Analytics Toolbox
  • Engine 2 (fallback): Python pdfminer — for environments without the Toolbox
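The two-stage engine is an ordered fallback: try each extractor in turn and accept the first one that returns non-empty text. The Python sketch below shows the pattern with pluggable callables. In AnyResearch the first stage is MATLAB's extractFileText(), which Python cannot call directly, so it appears here only as a generic engine; `pdfminer_engine` wraps pdfminer.six's `extract_text`, and the other names are hypothetical.

```python
def extract_text_with_fallback(path, engines):
    """Try (name, callable) engines in order; return (name, text) from the first success."""
    errors = []
    for name, engine in engines:
        try:
            text = engine(path)
        except Exception as exc:  # e.g. toolbox missing, unreadable PDF
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty text")  # treat blank output as a failure too
    raise RuntimeError("all extraction engines failed: " + "; ".join(errors))

def pdfminer_engine(path):
    # Imported lazily so the module loads even where pdfminer.six is not installed.
    from pdfminer.high_level import extract_text
    return extract_text(path)
```

Treating empty output as a failure means an engine that silently returns nothing (common with scanned PDFs) does not end the chain early.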

Why MATLAB?

  • Ready to use immediately — No additional setup if MATLAB is already installed. Also runs on MATLAB Online (Basic), so it works regardless of OS or machine.
  • Stays in your workflow — Complete literature surveys in the same environment as your existing analysis scripts and simulations.
  • Fewer environment issues — No venv, dependency packages, or version conflicts. Python is not required unless you use PDF processing (Layer 3).

Directory Structure

```text
main_run_pipeline.m         ← Single-search entry point
main_run_batch.m            ← Batch entry point
src/
  openalex/   API retrieval       adapters/   Data transformation
  export/     Excel output        pipeline/   Orchestration
  config/     Config loading      pdf/        PDF extraction (Layer 3)
  analytics/  Citation analytics  python/     Python sidecar (Layer 3)
  util/       Log helpers
config/       Configuration       data/list/  Input data (institution lists, etc.)
result/       Outputs (not tracked by Git)   test/smoke/  Smoke tests
docs/         Documentation
```

Requirements

| Item | Layer | Required/Optional |
|---|---|---|
| MATLAB R2025b or later | 0 | Required |
| OpenAlex API Key | 0 | Required (free) |
| institutions.csv | 1 | Optional (batch runs) |
| Text Analytics Toolbox | 3 | Optional (PDF text extraction) |
| Python 3.11 + venv | 3 | Optional (PDF fallback) |

Documentation

| File | Contents |
|---|---|
| docs/en/quickstart.md | Detailed setup, usage guide, FAQ, and troubleshooting |
| docs/release_notes_v0.1.0.md | v0.1.0 release notes |
| docs/sample/ | Output samples (Excel screenshots) |

Data Policy

  • OpenAlex: Data published under CC0 license. API Key is free.
  • Paper metadata: Publicly available information. Please follow your institution's policies regarding researcher personal data.
  • For full attribution details, see THIRD_PARTY_NOTICES.md.

Disclaimer

AnyResearch is provided "as is" without warranty of any kind. The authors make no guarantees regarding accuracy of retrieved metadata or uninterrupted access to the OpenAlex API. Use of collected paper metadata is subject to the policies of respective publishers and your institution.

Contributing

Bug reports and feature requests are welcome via Issues. See CONTRIBUTING.md for details.

License

MIT License — see LICENSE for details.
