ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
agent science benchmark ai end-to-end evaluation discovery openai codex claude ai-agent ai4science llm ai-scientist claude-code clawdbot openclaw auto-research research-claw
Updated Mar 21, 2026 - Jupyter Notebook