This organization contains the source code for the GSO benchmark, including:
- GSO, a benchmark for evaluating AI systems on real-world GitHub issues.
- Experiments, execution logs, trajectories, and results from evaluation runs on GSO.
- Scaffolds, examples of running popular agents on GSO via Harbor or a specific harness such as OpenHands.