🛰️ DiscoveryBench Integration
This issue tracks the integration of DiscoveryBench, a benchmark for evaluating multi-step scientific discovery tasks, into OpenHands. The integration will assess how well OpenHands handles complex, data-driven workflows and problem-solving.
📋 Tasks
1. Set up DiscoveryBench
2. Initialize Runtime
3. Run Evaluation and Extract Responses
4. Log and Manage Evaluation Outputs
5. Validate Integration
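As a rough illustration of tasks 3 and 4, here is a minimal sketch of an evaluation loop that extracts a response from each raw agent output and logs one JSON record per line. Everything in it is an assumption for illustration only: the `instance_id` and `query` fields, the `FINAL ANSWER:` marker, and the `agent` callable are hypothetical and do not reflect the actual DiscoveryBench schema or the OpenHands API.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def extract_response(raw: str, marker: str = "FINAL ANSWER:") -> str:
    """Return the text after the last marker, or the whole output if no marker is found.

    The marker is a hypothetical convention, not one defined by DiscoveryBench.
    """
    _, found, tail = raw.rpartition(marker)
    return tail.strip() if found else raw.strip()


def run_evaluation(
    instances: Iterable[dict],
    agent: Callable[[str], str],
    output_path: str,
) -> list[dict]:
    """Run each instance through the agent and append one JSON line per result."""
    records = []
    with Path(output_path).open("w", encoding="utf-8") as f:
        for inst in instances:
            # `agent` stands in for whatever actually drives the runtime.
            raw = agent(inst["query"])
            record = {
                "instance_id": inst["instance_id"],
                "response": extract_response(raw),
            }
            f.write(json.dumps(record) + "\n")
            records.append(record)
    return records
```

Writing one record per line keeps partial results readable if a run is interrupted, which may help when validating the integration on a small subset first.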