
AutoGen-TraceKit — Agentic Evaluation Toolkit

AutoGen-TraceKit is a research-oriented toolkit for evaluating agentic AI systems and autonomous problem solvers. The repository collects dataset examples, evaluation scripts, and tooling to run reproducible experiments, produce evaluation metrics, and generate visualizations for analysis.

Key features:

  • Dataset loading and preprocessing for evaluation tasks
  • Modular evaluator and solver components for running model-based experiments
  • Utilities for reproducible experiments (seeded runs, configurable temperatures)
  • Built-in analysis and visualization scripts for results and summaries
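This README does not describe how the toolkit implements seeded runs. The sketch below illustrates the idea only and assumes nothing about the project's code. A hypothetical `seeded_sample` helper gives each run its own `random.Random(seed)` generator, so the same seed always draws the same subset of dataset rows:

```python
from __future__ import annotations

import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def seeded_sample(items: Sequence[T], k: int, seed: int) -> list[T]:
    """Hypothetical helper: draw k items reproducibly for a given seed."""
    # A private generator leaves the global `random` state untouched, so
    # other code that draws random numbers cannot shift this run's sample.
    rng = random.Random(seed)
    return rng.sample(list(items), k)
```

A dedicated `random.Random` instance, rather than the module-level `random.seed`, keeps one experiment's sampling independent of anything else in the process that uses randomness.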

See the docs/ directory for the research proposal, methodology, and literature review that motivated this work.

Project Structure

├── README.md                    # This file
├── docs/
│   ├── research_proposal.md     # Research proposal
│   ├── literature_review.md     # Literature review and references
│   ├── methodology.md           # Detailed methodology
│   └── progress_reports/        # Weekly progress reports
├── data/                        # Datasets and data files
├── experiments/                 # Experiment scripts and configs
│   └── logs/
├── results/                     # Experimental results
├── src/                         # Source code
│   ├── evaluator/
│   ├── model/
│   ├── sanity-checks/
│   └── utils/
├── visualizations/
│   └── visualizations.py
├── .gitignore
├── .env
├── config.py
├── requirements.txt             # Project dependencies
└── run.py

Prerequisites

  • Python 3.8 or higher
  • pip (Python package installer)
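You can run a short check to confirm that the interpreter meets this requirement. After activating the environment in step 3 of the setup, the same check also confirms that the virtual environment is active. The helper names `python_ok` and `in_virtualenv` are hypothetical and are not part of the repository:

```python
import sys
from typing import Sequence, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # minimum version from the Prerequisites list


def python_ok(version: Sequence[int] = sys.version_info,
              minimum: Tuple[int, int] = MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= minimum


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created environment."""
    # `python -m venv` points sys.prefix at the environment directory,
    # while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'too old'}")
print(f"Virtual environment active: {in_virtualenv()}")
```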

Setup Instructions

  1. Clone the Repository

    git clone https://github.com/CheliM7/AutoGen-TraceKit.git
  2. Create a Virtual Environment

    python -m venv env
  3. Activate the Virtual Environment

    • On Windows:

      .\env\Scripts\activate
    • On macOS/Linux:

      source env/bin/activate
  4. Create a .env File

    In the root directory, create a file named .env with the following values, filling in your Groq API key and the ID of the model to use:

    GROQ_API_KEY=
    MODEL_ID=
    DATA_PATH=data/math_easy_int_120.jsonl
  5. Install Dependencies

    pip install -r requirements.txt
  6. Run the Project

    For initial testing, run.py processes only the first five rows of the dataset. To process the entire dataset, modify run.py as needed.

    python run.py
  7. Generate Visualizations

    python visualizations/visualizations.py
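This README does not show how config.py and run.py read the .env settings or the dataset. The sketch below makes two assumptions: that .env holds plain KEY=VALUE lines, and that the dataset is JSON Lines (one JSON object per line, as the .jsonl extension suggests). Under those assumptions, the hypothetical helpers `parse_env`, `missing_keys`, and `load_rows` use only the standard library to read the settings and to load a limited number of rows, mirroring the five-row default described in step 6. The project itself may use a package such as python-dotenv instead; that is not confirmed here.

```python
from __future__ import annotations

import json
from itertools import islice
from pathlib import Path

# Keys listed in the .env step of the setup instructions.
REQUIRED_KEYS = ("GROQ_API_KEY", "MODEL_ID", "DATA_PATH")


def parse_env(path: str | Path = ".env") -> dict[str, str]:
    """Hypothetical helper: parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Empty values (e.g. "GROQ_API_KEY=") are kept as "" so they can be reported.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def load_rows(path: str | Path, limit: int | None = 5) -> list[dict]:
    """Hypothetical helper: read at most `limit` JSON objects (None reads all rows)."""
    with open(path, encoding="utf-8") as f:
        non_blank = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(non_blank, limit)]
```

With these helpers, `load_rows(parse_env()["DATA_PATH"], limit=None)` would read the whole dataset instead of the first five rows.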