BioEval — Computational Simulation Paper Evaluator

An AI-powered tool that scores computational and simulation research papers on data transparency, reproducibility, and code availability. Researchers paste a URL or upload a PDF; Claude analyses the paper across six dimensions and returns a structured report with findings, gaps, and recommendations.

📄 Published paper: BioEval Reproducibility Report — the system and full evaluation corpus (raw rubric v0.8.0 scores, including the information-theoretic dimension), archived with the source code on Zenodo: doi.org/10.5281/zenodo.20567720.

Scoring Rubric

Each paper is evaluated on six weighted dimensions (0–100):

Dimension	Weight	What it measures
Data Disclosure	20%	Datasets listed with repo links, accession IDs, version, access method
Dataset Resolvability	15%	Identifiers actually resolve and metadata matches the paper
Code Availability	15%	Code is public, versioned, archived, and documented
Traceability	20%	Every data-loading step maps back to a declared dataset
Simulation Clarity	20%	Parameters, distributions, and seeds are traceable to cited sources
Reproducibility Package	10%	Environment + workflow + test data + instructions + checksums

Baseline calibration: the rubric is deliberately strict, so a fixed +20 baseline adjustment is applied to each dimension (capped at 100) before weighting. Every score below is reported post-calibration, and each Overall is the weighted mean of the six calibrated dimensions.
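
The calibration and weighting described above can be sketched as follows. This is a minimal illustration, not the project's actual scoring code: the names WEIGHTS_PCT, calibrate, and overall are hypothetical, and rounding to one decimal place is an assumption inferred from the reported scores. Weights are kept as integer percentages so the weighted sum of integer scores stays exact.

```typescript
// Hypothetical names; the weights are the percentages from the rubric table.
const WEIGHTS_PCT = {
  dataDisclosure: 20,
  datasetResolvability: 15,
  codeAvailability: 15,
  traceability: 20,
  simulationClarity: 20,
  reproducibilityPackage: 10,
} as const;

type Dimension = keyof typeof WEIGHTS_PCT;
type Scores = Record<Dimension, number>;

// Apply the fixed +20 baseline adjustment, capped at 100.
function calibrate(raw: number): number {
  return Math.min(100, raw + 20);
}

// Weighted mean of already-calibrated scores, rounded to one decimal (assumed).
function overall(calibrated: Scores): number {
  let weightedSum = 0;
  for (const dim of Object.keys(WEIGHTS_PCT) as Dimension[]) {
    weightedSum += calibrated[dim] * WEIGHTS_PCT[dim];
  }
  // The weights sum to 100, so weightedSum / 100 is the weighted mean.
  return Math.round(weightedSum / 10) / 10;
}

// Calibrated scores for ant-simulation, copied from the Dimension Breakdown table.
const antSimulation: Scores = {
  dataDisclosure: 92,
  datasetResolvability: 98,
  codeAvailability: 83,
  traceability: 88,
  simulationClarity: 75,
  reproducibilityPackage: 72,
};

console.log(overall(antSimulation)); // 85.4
```

Because the table already reports calibrated values, overall takes them directly; calibrate would only be applied to raw rubric scores, which are not listed here.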

Insect Simulation Series

A focused evaluation of fourteen publicly available insect colony and swarm simulations — bees, ants, termites, and fireflies — scored June 2026. Code Availability and Reproducibility Package are scored against live GitHub repository signals (license, releases, dependency manifests, tests, CI, recency), so every dimension is continuous rather than snapped to rubric tiers. Full per-paper PDF reports available on request.

Rankings

Rank	Score	Repo	Type	Language	Stars
1	85.4	lrdcxdes/ant-simulation	🐜 Ant	Python	—
2	73.6	lax4mike/firefly	🔆 Firefly	JS	—
3	73.1	docxology/BeeStack	🐝 Bee	Python	—
4	72.2	fractastical/antstack	🐜 Ant	Python	—
5	71.6	Dougarasu/termite-multiagent-system	🪳 Termite	C#/Unity	—
6	68.1	fractastical/bee-swarm-sim	🐝 Bee	HTML/JS	—
7	64.3	bones-ai/rust-ants-colony-simulation	🐜 Ant	Rust	★208
8	54.5	Haghrah/ACO---Robot-Path-Planning	🐜 Ant	Python	★61
9	52.7	cfrBernard/ant-colony-optimization	🐜 Ant	JS/React	★34
10	51.6	piXelicidio/locas-ants	🐜 Ant	Lua/Love2D	★161
11	51.1	tulustul/ants-sandbox	🐜 Ant	TypeScript	★106
12	43.8	MeoMix/symbiants	🐜 Ant	Rust	★235
13	41.9	darwiiiish/swarm-abc	🐝 Bee	HTML/JS	—
14	37.8	matheuslosilva/Hardware-Accelerated-Ant-Colony...	🐜 Ant	C++/CUDA	★12

Dimension Breakdown

Repo	Data Disclosure	Dataset Resolvability	Code Availability	Traceability	Sim Clarity	Repro Pkg	Overall
ant-simulation (pygame)	92	98	83	88	75	72	85.4
firefly	82	78	78	75	68	52	73.6
BeeStack	72	58	88	75	78	62	73.1
antstack	72	68	92	62	66	82	72.2
termite-multiagent	82	78	62	75	72	48	71.6
bee-swarm-sim	72	68	75	62	58	82	68.1
rust-ants-colony	82	75	72	48	52	58	64.3
ACO-robot-path-planning	52	58	68	62	50	28	54.5
ant-colony-optimization	58	62	72	42	38	50	52.7
locas-ants	48	55	82	38	38	62	51.6
ants-sandbox	48	58	78	38	40	55	51.1
symbiants	38	42	78	30	32	58	43.8
swarm-abc	38	48	62	35	40	28	41.9
hw-accel-ant-colony	32	38	58	35	30	40	37.8

Key Findings per Paper

🐜 ant-simulation (pygame) — 85.4/100 · #1

Pygame stigmergy ant colony with emergent intelligence from simple pheromone rules. MIT license, requirements.txt with the three core dependencies (Pygame, NumPy, SciPy). The clear series leader once repository signals are read.
✅ Tops the series on Data Disclosure (92) and Dataset Resolvability (98): generative rules and config are well specified
✅ Explicit dependency manifest plus strong Traceability (88)
❌ Dependencies unpinned (no exact pygame==, numpy==, scipy== version specifiers)
❌ No release tag, commit hash, or Zenodo archive

🔆 firefly — 73.6/100 · #2

JavaScript firefly synchronization simulation. Live demo at mikelambert.me/firefly. Jumped from near-last to the top tier once repo signals were read properly.
✅ Solid Data Disclosure (82) and Dataset Resolvability (78): a simple, fully specified model
✅ Live deployed demo for interactive inspection
❌ Sparse README; no parameter, initialization, or seed documentation
❌ No release or archive → lowest Repro Package in the top three (52)

🐝 BeeStack — 73.1/100 · #3

Whole-colony honeybee simulation scaffold. Code archived on Zenodo with MD5 checksums, tagged release, MIT license, large passing test suite.
✅ Strong Code Availability (88) and the best Simulation Clarity in the series (78)
✅ Versioned release + checksums; CLI-driven artifact regeneration
❌ Primary DOI does not resolve in Crossref → Dataset Resolvability only 58
❌ Citation-to-parameter mapping partially opaque

🐜 antstack — 72.2/100 · #4

SHA-256 checksummed manifest system, comprehensive tests, CLI-driven artifact regeneration. External data sources (VFB, hemibrain) named in prose only without DOIs.
✅ Leads the series on Code Availability (92) and ties with bee-swarm-sim for the best Reproducibility Package (82)
✅ SHA-256 provenance per run; comprehensive test suite
❌ No resolvable dataset identifiers; data sources named in prose only
❌ Heuristic constants undocumented → Sim Clarity 66

🪳 termite-multiagent-system — 71.6/100 · #5

Unity/C# 3D termite colony. Agent behavior reduced to two IF-THEN rules, clearly documented in README. MIT license.
✅ Behavioral rules and environment structure well documented → Data Disclosure 82
❌ Weak Code Availability (62): no dependency manifest, release, or tests detected
❌ No datasets or accession numbers of any kind

🐝 bee-swarm-sim — 68.1/100 · #6

Client-side agent-based waggle dance simulation. Runs zero-dependency in a browser. Citation-backed mode references 9 peer-reviewed sources (von Frisch 1967, Seeley 1995, Couvillon 2019, Menzel 2023 and others). BeeStack trace replay and JSON export supported.
✅ Joint-strongest Reproducibility Package in the series (82, tied with antstack): self-contained with trace export
✅ Explicit citation-backed vs. heuristic mode distinction
❌ Only 4 commits, no tagged release, no Zenodo DOI
❌ Stochastic with no seed control → Sim Clarity 58

🐜 rust-ants-colony-simulation — 64.3/100 ★208 · #7

Ant colony simulation in Rust (Bevy engine). Clear repo structure; cargo run --release startup; KD-tree and query caching documented.
✅ Good Data Disclosure (82) and Dataset Resolvability (75)
✅ cargo run --release one-liner launch
❌ Low Traceability (48): algorithmic choices not mapped back to sources
❌ No versioned release, pinned commit, or Zenodo archive

🐜 ACO-robot-path-planning — 54.5/100 ★61 · #8

Python ACO for robot path planning, explicitly tied to a published reference (Liu et al., 2017, Soft Computing). GPL-3.0.
✅ Linked to a specific peer-reviewed paper
❌ Joint-lowest Reproducibility Package in the series (28, tied with swarm-abc): no environment or test files
❌ No data availability statement or accession numbers

🐜 ant-colony-optimization — 52.7/100 ★34 · #9

React + HTML5 canvas ACO visualizer. MIT license; sprite/tileset/map assets bundled in-repo.
✅ Bundled map/asset files give non-trivial Dataset Resolvability (62)
❌ Low Traceability (42) and Sim Clarity (38)
❌ Only 14 commits, no tagged release or DOI

🐜 locas-ants — 51.6/100 ★161 · #10

Lua/Love2D ant colony remake. 6 versioned releases; pre-built .love binary available; MIT license.
✅ Best Code Availability in the bottom tier (82): versioned releases + binary download
❌ Pheromone decay rates and ant rules undocumented → Traceability/Sim Clarity 38
❌ No biological citations or dataset references

🐜 ants-sandbox — 51.1/100 ★106 · #11

TypeScript/web ant colony. Live demo at ants-sandbox.vercel.app; MIT license; npm install + run documented.
✅ Decent Code Availability (78): standard, runnable npm repo
❌ Very low Traceability (38) and Sim Clarity (40)
❌ No datasets, citations, or parameter documentation

🐜 symbiants — 43.8/100 ★235 · #12

Rust/Bevy ant colony simulation game. Dual Apache-2.0/MIT license; devcontainer setup; native + WASM builds.
✅ Well-documented dev environment → Code Availability 78
❌ No biological data sources or parameter citations → Traceability 30, Sim Clarity 32
❌ Most-starred repo in the series, yet near the bottom on reproducibility

🐝 swarm-abc — 41.9/100 · #13

Artificial Bee Colony algorithm in HTML/JS. Public code; explanation page included.
✅ Explanation page documents algorithm intent
❌ No README, no license, no parameter config, no citations
❌ Reproducibility Package 28: not practically rerunnable

🐜 hw-accel-ant-colony — 37.8/100 ★12 · #14

C++/CUDA/OpenGL hardware-accelerated ant colony swarm. 38 commits, clear structure.
✅ GPU-accelerated implementation (CUDA + OpenGL)
❌ Weakest Data Disclosure (32) and Dataset Resolvability (38) in the series
❌ No benchmark scenarios, documented results, or test files

Cross-Series Observations

Popularity ≠ reproducibility. The two most-starred repos (symbiants ★235, rust-ants ★208) land at #12 and #7; the top scorer (pygame ant-simulation) has no stars listed at all.
Stigmergy done simply wins. The pygame ant-simulation is the clear #1 — explicit dependency manifest, documented rules, and the best-specified generative model in the series.
Code Availability actually discriminates. Scoring against live GitHub signals (license, releases, manifests, tests, CI, recency) spreads this dimension from 58 (bare repos like hw-accel) to 92 (antstack), instead of collapsing every public repo into one value.
No external datasets — but specification quality varies widely. These are rule-based sims, so none deposit accessioned data. Scoring resolvability of the synthetic-data definition instead spreads the field from 38 (hw-accel) to 98 (pygame ant): the gap is how completely rules, parameters, and configs are written down.
The fastest path to a higher score is: (1) pin dependency versions and add a deterministic seed, (2) add a parameters table mapping every constant to its source, (3) tag a release and archive to Zenodo for a DOI, (4) commit one example output file with a checksum.
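
Recommendation (1) hinges on seed control. As a rough sketch of what a deterministic seed means for a browser or Node simulation, the block below swaps Math.random() for mulberry32, a small and widely circulated 32-bit seedable generator. It is illustrative only: nothing here is taken from the evaluated repositories, and randomWalk is a hypothetical stand-in for a stochastic simulation step.

```typescript
// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical stand-in for a stochastic simulation: a 1-D random walk.
function randomWalk(seed: number, steps: number): number[] {
  const rand = mulberry32(seed);
  const positions: number[] = [];
  let x = 0;
  for (let i = 0; i < steps; i++) {
    x += rand() < 0.5 ? -1 : 1;
    positions.push(x);
  }
  return positions;
}

// Two runs with the same seed produce the same trajectory.
const runA = randomWalk(42, 100);
const runB = randomWalk(42, 100);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

Recording the seed alongside a committed example output is what makes a checksum of that output meaningful: anyone re-running with the same seed and pinned dependencies should be able to regenerate the same file.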

Features

Submit by URL or PDF upload — paste a journal link or drag-and-drop a PDF
Real PDF text extraction — uploaded and URL-linked PDFs are parsed to text (via unpdf) and scored on their actual contents; if a PDF yields too little readable text (e.g. a scanned image), the evaluation is flagged as an error instead of scoring an empty document
Multi-agent pipeline — four Claude agents extract evidence, resolve dataset accessions, score dimensions, and audit weak claims
Full report — findings, gaps, and prioritised recommendations per paper
Code analysis — paste simulation code to trace each segment back to the data sources and citations it depends on
Dashboard — aggregate stats across all evaluated papers with score distribution and dimension breakdown
PDF reports — generate and email structured PDF reports for any subset of evaluations
Hardened ingestion — SSRF-guarded URL fetching (blocks private/loopback/link-local targets and re-validates every redirect hop), upload size/MIME/magic-byte checks, capped/timed downloads, locked CORS, request body limits, and rate limiting
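
Two of the ingestion checks listed above can be sketched as pure functions. This is a simplified illustration under stated assumptions, not BioEval's implementation: isBlockedIPv4 and looksLikePdf are hypothetical names, only IPv4 literals are handled, and a real SSRF guard would also resolve hostnames via DNS, cover IPv6, and repeat the check on every redirect hop.

```typescript
type IPv4 = readonly [number, number, number, number];

// Parse a dotted-quad IPv4 literal; returns null for anything else.
function parseIPv4(host: string): IPv4 | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return null;
  const octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])] as const;
  return octets.every((n) => n <= 255) ? octets : null;
}

// True for private (RFC 1918), loopback, and link-local IPv4 targets.
function isBlockedIPv4(host: string): boolean {
  const ip = parseIPv4(host);
  if (ip === null) return false; // not an IPv4 literal; resolve via DNS before deciding
  const [a, b] = ip;
  return (
    a === 10 ||                          // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) ||          // 192.168.0.0/16
    a === 127 ||                         // 127.0.0.0/8 loopback
    (a === 169 && b === 254)             // 169.254.0.0/16 link-local
  );
}

// Magic-byte check: a PDF file starts with the ASCII bytes "%PDF-".
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

function looksLikePdf(bytes: Uint8Array): boolean {
  if (bytes.length < PDF_MAGIC.length) return false;
  return PDF_MAGIC.every((byte, i) => bytes[i] === byte);
}

console.log(isBlockedIPv4("169.254.169.254")); // true (link-local)
console.log(isBlockedIPv4("8.8.8.8"));         // false (public address)
console.log(looksLikePdf(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31]))); // true
```

Checking magic bytes in addition to the declared MIME type matters because the MIME type is supplied by the client and can be set to anything.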

Stack

Frontend: React + Vite + Tailwind + shadcn/ui
API: Express 5 + OpenAPI (contract-first, Orval codegen)
DB: PostgreSQL + Drizzle ORM
AI: Anthropic Claude via Replit AI Integrations
PDF extraction: unpdf (bundled pure-JS pdf.js — no native deps)
Email: Resend
Security: SSRF-guarded fetch, express-rate-limit, locked CORS, body-size limits
Build: pnpm workspaces, esbuild, Node.js 24, TypeScript 5.9

Running Locally

# Install dependencies
pnpm install

# Start API server (reads PORT from env)
pnpm --filter @workspace/api-server run dev

# Start frontend (reads PORT from env)
pnpm --filter @workspace/biopaper-eval run dev

# Push DB schema changes (dev only)
pnpm --filter @workspace/db run push

# Regenerate API hooks after spec changes
pnpm --filter @workspace/api-spec run codegen

# Generate and email a report (defaults to IDs 12, 35, 36, 37, 38, 39, 40, 41)
pnpm --filter @workspace/scripts run send-report

# Or pass specific evaluation IDs
pnpm --filter @workspace/scripts run send-report 12 36

Required environment variables:

DATABASE_URL — PostgreSQL connection string
AI_INTEGRATIONS_ANTHROPIC_BASE_URL — auto-provisioned by Replit
AI_INTEGRATIONS_ANTHROPIC_API_KEY — auto-provisioned by Replit
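
A fail-fast check for these variables at startup might look like the sketch below. The helper name missingEnv and the example connection string are hypothetical; the repository may validate its environment differently, or not at all.

```typescript
// The three variables listed above.
const REQUIRED_ENV = [
  "DATABASE_URL",
  "AI_INTEGRATIONS_ANTHROPIC_BASE_URL",
  "AI_INTEGRATIONS_ANTHROPIC_API_KEY",
] as const;

// Returns the names of required variables that are unset or empty.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((name) => !env[name]);
}

// Example with an in-memory object; a server would pass process.env instead.
const missing = missingEnv({ DATABASE_URL: "postgres://localhost/bioeval" });
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(", ")}`);
}
```

Taking the environment as a parameter rather than reading process.env directly keeps the helper easy to test.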

Repository Structure

artifacts/
  api-server/          # Express API + evaluation pipeline
  biopaper-eval/       # React frontend
lib/
  api-spec/            # OpenAPI contract (source of truth)
  db/                  # Drizzle schema + migrations
  integrations-anthropic-ai/  # Anthropic client
scripts/
  src/sendReport.ts    # PDF report generator + email sender
docs/
  dashboard.jpg        # Dashboard screenshot
