Add diagnostic logging for debugging eval runs #9

@udid-aws

Problem

When skill-eval functional, trigger, or report produces unexpected results, there's no way to see what's happening internally:

  • Which Claude command was invoked
  • What working directory was used
  • What Claude returned (stdout/stderr)
  • How trigger signals were classified
  • What files were copied into temp workspaces

This makes it difficult to diagnose issues like incorrect AWS profiles, trigger detection gaps, or workspace misconfiguration.

Proposed Solution

Add global --debug and --debug-log FILE flags that emit DEBUG-level logs from the key modules (agent_runner, functional, trigger). The logs use Python's stdlib logging module, so this adds no external dependencies.
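
A minimal sketch of how the two flags might be wired up, to make the proposal concrete. Only the flag names and the use of stdlib logging come from this issue. The `skill_eval` logger namespace, the `build_parser`, `configure_logging`, and `run_command` helpers, and the log message format are all hypothetical and not taken from the existing code.

```python
import argparse
import logging
import subprocess
import sys

# Hypothetical logger namespace; the real package name may differ.
ROOT_LOGGER = "skill_eval"
log = logging.getLogger(ROOT_LOGGER + ".agent_runner")


def build_parser():
    """Global flags as proposed in this issue."""
    parser = argparse.ArgumentParser(prog="skill-eval")
    parser.add_argument("--debug", action="store_true",
                        help="emit DEBUG-level logs to stderr")
    parser.add_argument("--debug-log", metavar="FILE",
                        help="write DEBUG-level logs to FILE")
    return parser


def configure_logging(debug=False, debug_log=None):
    """Attach handlers to the package logger based on the two flags."""
    root = logging.getLogger(ROOT_LOGGER)
    root.handlers.clear()  # avoid duplicate handlers on repeated calls
    enabled = debug or bool(debug_log)
    root.setLevel(logging.DEBUG if enabled else logging.WARNING)
    fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    if debug:
        stream = logging.StreamHandler(sys.stderr)
        stream.setFormatter(fmt)
        root.addHandler(stream)
    if debug_log:
        file_handler = logging.FileHandler(debug_log, encoding="utf-8")
        file_handler.setFormatter(fmt)
        root.addHandler(file_handler)
    return root


def run_command(cmd, cwd):
    """Run a command and log the details listed under Problem."""
    log.debug("command: %s", cmd)
    log.debug("cwd: %s", cwd)
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    log.debug("exit code: %d", result.returncode)
    log.debug("stdout: %s", result.stdout)
    log.debug("stderr: %s", result.stderr)
    return result
```

With something like this in place, the entry point would call `configure_logging(args.debug, args.debug_log)` once after parsing arguments, and each module would log through its own child logger (for example `skill_eval.trigger`), so no handler setup is needed outside the entry point.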

Related

PR: #8
