Skip to content

feat: add --debug and --debug-log flags for diagnostic logging#8

Merged
melanie531 merged 1 commit into
aws-samples:mainfrom
udid-aws:feat/debug-logging
May 19, 2026
Merged

feat: add --debug and --debug-log flags for diagnostic logging#8
melanie531 merged 1 commit into
aws-samples:mainfrom
udid-aws:feat/debug-logging

Conversation

@udid-aws

@udid-aws udid-aws commented May 19, 2026

Copy link
Copy Markdown
Contributor

Problem

When debugging why skill-eval functional or skill-eval trigger produces unexpected results, there's no way to see what's happening internally — which Claude command was invoked, what workspace was used, what Claude returned, or how trigger signals were classified.

Solution

Adds two global CLI flags:

  • --debug — emits DEBUG logs to stderr (visible in terminal)
  • --debug-log FILE — writes DEBUG logs to a file (terminal stays quiet)
  • Both can be combined

What gets logged

  • agent_runner: Full Claude command, working directory, exit code, elapsed time, stdout/stderr
  • functional: Skill path, eval parameters, workspace paths, file copy operations
  • trigger: Eval parameters, each query, trigger signal classification, tool calls, text output preview

Design

  • Zero external dependencies (Python stdlib logging)
  • Logs go to stderr, not stdout — tool output (reports, JSON) stays pipeable
  • File uses overwrite mode (mode="w") — each run starts fresh
  • Without either flag, behavior is identical to before — no logs emitted, no files created, no output difference
  • Audit's existing --verbose / -v flag is completely unaffected (it controls finding visibility in audit reports, not diagnostic logging)

Usage

skill-eval --debug functional myskill              # DEBUG to terminal
skill-eval --debug-log debug.log trigger myskill   # DEBUG to file only
skill-eval --debug --debug-log out.log report .    # both
skill-eval audit myskill -v                        # audit verbose unchanged

Testing

  • All 645 tests pass (8 pre-existing failures in test_config.py unrelated to this change)
  • Manually verified --debug and --debug-log with skill-eval functional and skill-eval trigger against real skills
  • Verified skill-eval audit --verbose behavior is unchanged with and without the new flags

Fixes #9

Adds global --debug and --debug-log FILE flags to the CLI. When enabled,
emits DEBUG-level logs showing:
- Full Claude CLI commands and working directories
- Claude stdout/stderr and exit codes
- Workspace setup (file copies, temp dirs)
- Trigger signal classification and tool call details

Uses Python stdlib logging (zero external deps). Logs go to stderr
(--debug) or file (--debug-log) to keep stdout clean for tool output.
Without either flag, previous behavior is unchanged.
@melanie531 melanie531 merged commit 199847c into aws-samples:main May 19, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add diagnostic logging for debugging eval runs

2 participants