feat: add --debug and --debug-log flags for diagnostic logging by udid-aws · Pull Request #8 · aws-samples/sample-agent-skill-eval

udid-aws · 2026-05-19T04:45:44Z

Problem

When debugging why skill-eval functional or skill-eval trigger produces unexpected results, there's no way to see what's happening internally — which Claude command was invoked, what workspace was used, what Claude returned, or how trigger signals were classified.

Solution

Adds two global CLI flags:

--debug — emits DEBUG logs to stderr (visible in terminal)
--debug-log FILE — writes DEBUG logs to a file (terminal stays quiet)
Both can be combined

What gets logged

agent_runner: Full Claude command, working directory, exit code, elapsed time, stdout/stderr
functional: Skill path, eval parameters, workspace paths, file copy operations
trigger: Eval parameters, each query, trigger signal classification, tool calls, text output preview

Design

Zero external dependencies (Python stdlib logging)
Logs go to stderr, not stdout — tool output (reports, JSON) stays pipeable
File uses overwrite mode (mode="w") — each run starts fresh
Without either flag, behavior is identical to before — no logs emitted, no files created, no output difference
Audit's existing --verbose / -v flag is completely unaffected (it controls finding visibility in audit reports, not diagnostic logging)

Usage

skill-eval --debug functional myskill              # DEBUG to terminal
skill-eval --debug-log debug.log trigger myskill   # DEBUG to file only
skill-eval --debug --debug-log out.log report .    # both
skill-eval audit myskill -v                        # audit verbose unchanged

Testing

All 645 tests pass (8 pre-existing failures in test_config.py unrelated to this change)
Manually verified --debug and --debug-log with skill-eval functional and skill-eval trigger against real skills
Verified skill-eval audit --verbose behavior is unchanged with and without the new flags

Fixes #9

Adds global --debug and --debug-log FILE flags to the CLI. When enabled, emits DEBUG-level logs showing: - Full Claude CLI commands and working directories - Claude stdout/stderr and exit codes - Workspace setup (file copies, temp dirs) - Trigger signal classification and tool call details Uses Python stdlib logging (zero external deps). Logs go to stderr (--debug) or file (--debug-log) to keep stdout clean for tool output. Without either flag, previous behavior is unchanged.

udid-aws mentioned this pull request May 19, 2026

Add diagnostic logging for debugging eval runs #9

Closed

melanie531 merged commit 199847c into aws-samples:main May 19, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add --debug and --debug-log flags for diagnostic logging#8

feat: add --debug and --debug-log flags for diagnostic logging#8
melanie531 merged 1 commit into
aws-samples:mainfrom
udid-aws:feat/debug-logging

udid-aws commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

udid-aws commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

What gets logged

Design

Usage

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

udid-aws commented May 19, 2026 •

edited

Loading