AutoGen-TraceKit — Agentic Evaluation Toolkit

AutoGen-TraceKit is a research-oriented toolkit for evaluating agentic AI systems and autonomous problem solvers. The repository collects dataset examples, evaluation scripts, and tooling to run reproducible experiments, produce evaluation metrics, and generate visualizations for analysis.

Key features:

- Dataset loading and preprocessing for evaluation tasks
- Modular evaluator and solver components for running model-based experiments
- Utilities for reproducible experiments (seeded runs, configurable temperatures)
- Built-in analysis and visualization scripts for results and summaries
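
To illustrate what a seeded run with a configurable temperature can look like, here is a minimal sketch. The names `RunConfig` and `sample_indices` are hypothetical and are not taken from this repository, and the dataset size of 120 is only an illustrative value. The sketch shows one idea only: with a fixed seed, the same rows are selected on every run.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical run settings; not the repository's actual config."""
    seed: int = 0
    temperature: float = 0.0  # would be passed to the model call


def sample_indices(config, dataset_size, k):
    """Pick k distinct row indices with an RNG seeded from the config."""
    rng = random.Random(config.seed)  # private RNG, global state untouched
    return rng.sample(range(dataset_size), k)


if __name__ == "__main__":
    cfg = RunConfig(seed=42, temperature=0.2)
    first = sample_indices(cfg, dataset_size=120, k=5)
    second = sample_indices(cfg, dataset_size=120, k=5)
    print(first == second)  # same seed, same selection
```

Using a private `random.Random` instance rather than the module-level functions keeps the selection independent of any other code that draws random numbers.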

See the docs/ directory for the research proposal, methodology, and literature review that motivated this work.

Project Structure

├── README.md                    # This file
├── docs/
│   ├── research_proposal.md     # Research proposal
│   ├── literature_review.md     # Literature review and references
│   ├── methodology.md           # Detailed methodology
│   └── progress_reports/        # Weekly progress reports
├── data/                        # Datasets and data files
├── experiments/                 # Experiment scripts and configs
│   └── logs/
├── results/                     # Experimental results
├── src/                         # Source code
│   ├── evaluator/
│   ├── model/
│   ├── sanity-checks/
│   └── utils/
├── visualizations/
│   └── visualizations.py
├── .gitignore
├── .env
├── config.py
├── requirements.txt             # Project dependencies
└── run.py

Prerequisites

- Python 3.8 or higher
- pip (Python package installer)

Setup Instructions

Clone the Repository

git clone https://github.com/CheliM7/AutoGen-TraceKit.git

Create a Virtual Environment
```
python -m venv env
```
Activate the Virtual Environment
- On Windows:
```
.\env\Scripts\activate
```
- On macOS/Linux:
```
source env/bin/activate
```
Create a .env File

In the root directory, create a file named .env and add the following values:
```
GROQ_API_KEY=
MODEL_ID=
DATA_PATH=data/math_easy_int_120.jsonl
```
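
Before the first run, it can help to confirm that the three values are filled in. The following is a minimal standard-library sketch for that check. It does not describe how `config.py` reads the file, and the helper names `load_env_file` and `missing_settings` are hypothetical.

```python
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values, required=("GROQ_API_KEY", "MODEL_ID", "DATA_PATH")):
    """Return the required keys that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    missing = missing_settings(load_env_file())
    if missing:
        print("Missing values in .env:", ", ".join(missing))
    else:
        print("All required settings are present.")
```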
Install Dependencies
```
pip install -r requirements.txt
```
Run the Project

For initial testing, only the first five rows of the dataset are processed. Modify run.py to process the entire dataset as needed.
```
python run.py
```
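
How run.py limits the rows is not shown in this README. As a rough sketch of the idea, a row limit on a JSONL dataset can be written with `itertools.islice`, where passing `limit=None` reads the whole file. The function name `read_jsonl` and its `limit` parameter are assumptions made for illustration.

```python
import json
from itertools import islice


def read_jsonl(path, limit=5):
    """Yield parsed rows from a JSONL file; limit=None reads every row."""
    with open(path, encoding="utf-8") as handle:
        # Ignore blank lines so they do not count toward the limit.
        non_empty = (line for line in handle if line.strip())
        for line in islice(non_empty, limit):
            yield json.loads(line)


if __name__ == "__main__":
    # The path matches the DATA_PATH value given in the .env example.
    preview = list(read_jsonl("data/math_easy_int_120.jsonl"))
    print(len(preview), "rows loaded for a quick test")
```

To process the full dataset under this sketch, call `read_jsonl(path, limit=None)`.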

Generate Visualizations

python visualizations/visualizations.py
