This document covers two types of validation:
- Code Example Tests - Automated tests verifying example applications work
- Agent Integration Tests - Manual/automated validation that AI agents can use these skills effectively
Each example includes tests for signature verification. Run them individually:
Express examples:
cd skills/stripe-webhooks/examples/express
npm install
npm test
FastAPI examples:
cd skills/stripe-webhooks/examples/fastapi
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pytest test_webhook.py
Next.js examples:
cd skills/stripe-webhooks/examples/nextjs
npm install
npm test
Use the test runner script to run all examples:
# All skills with examples
./scripts/test-examples.sh
# Specific skill(s)
./scripts/test-examples.sh stripe-webhooks
./scripts/test-examples.sh stripe-webhooks github-webhooks
Tests run automatically on PR and push to main via GitHub Actions. See `.github/workflows/test-examples.yml`.
Validate that AI agents (Cursor, Claude, Copilot) can successfully use these skills to:
- Find relevant skills when asked about webhooks
- Read and follow skill instructions
- Generate working code that matches expected patterns
- Handle edge cases correctly (raw body, signature verification)
Setup:
- Create a new Express project: `npm init -y && npm install express`
- Install the skill: `npx skills add hookdeck/webhook-skills --skill stripe-webhooks`
Prompt:
"Add Stripe webhook handling to my Express app. I want to handle payment_intent.succeeded events."
Expected Behaviors:
- Agent reads `stripe-webhooks/SKILL.md`
- Agent references `references/verification.md` for signature details
- Generated code uses `express.raw({ type: 'application/json' })` middleware
- Generated code calls `stripe.webhooks.constructEvent()`
- Generated code handles the specific event type requested
- Generated code returns 200 on success, 400 on verification failure
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill discovery (read SKILL.md) | 1 | |
| Correct verification method | 2 | |
| Raw body handling | 2 | |
| Error handling (status codes) | 1 | |
| Code runs without errors | 2 | |
| Tests pass | 2 | |
| Total | 10 |
Setup:
- Create a Next.js project: `npx create-next-app@latest my-app --typescript`
- Install the skill: `npx skills add hookdeck/webhook-skills --skill shopify-webhooks`
Prompt:
"Add a Shopify webhook endpoint to handle orders/create events in my Next.js app."
Expected Behaviors:
- Agent reads `shopify-webhooks/SKILL.md`
- Agent creates a route handler at `app/webhooks/shopify/route.ts`
- Generated code reads raw body with `request.text()`
- Generated code verifies HMAC SHA-256 signature (base64)
- Generated code uses `crypto.timingSafeEqual()` for comparison
- Generated code handles the X-Shopify-Topic header
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill discovery | 1 | |
| Correct file location (App Router) | 1 | |
| Correct verification (base64 HMAC) | 2 | |
| Raw body handling | 2 | |
| Error handling | 1 | |
| Code runs | 2 | |
| Tests pass | 1 | |
| Total | 10 |
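As a reference point when scoring the verification and raw-body criteria in this scenario, a minimal sketch of a base64 HMAC SHA-256 check might look like the following. It relies only on Node's built-in `crypto` module. The function name `verifyShopifyHmac` and its parameters are illustrative assumptions, not names taken from the skill.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: returns true when `signature` (base64) equals the
// HMAC SHA-256 of the raw request body computed with the shared secret.
function verifyShopifyHmac(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signature, "base64");
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

In a route handler, the first argument would be the string returned by `request.text()` and the second the value of the `X-Shopify-Hmac-Sha256` header; parsing the body before this check would change the bytes being signed.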
Setup:
- Create a FastAPI project with main.py
- Install the skill: `npx skills add hookdeck/webhook-skills --skill github-webhooks`
Prompt:
"Add a GitHub webhook endpoint to my FastAPI app. I need to handle push and pull_request events."
Expected Behaviors:
- Agent reads `github-webhooks/SKILL.md`
- Generated code reads raw body with `await request.body()`
- Generated code verifies X-Hub-Signature-256 header
- Generated code uses hex-encoded HMAC SHA-256
- Generated code uses `hmac.compare_digest()` for timing-safe comparison
- Generated code handles multiple event types via X-GitHub-Event header
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill discovery | 1 | |
| Correct verification (hex SHA-256) | 2 | |
| Raw body handling | 2 | |
| Event type routing | 1 | |
| Error handling | 1 | |
| Code runs | 2 | |
| Tests pass | 1 | |
| Total | 10 |
Setup:
- Existing Express app with webhook endpoint
- Install the skill: `npx skills add hookdeck/webhook-skills --skill hookdeck-event-gateway-webhooks`
Prompt:
"I'm receiving webhooks through Hookdeck. Add signature verification for Hookdeck's signature."
Expected Behaviors:
- Agent reads `hookdeck-event-gateway-webhooks/SKILL.md`
- Agent references `references/verification.md`
- Generated code verifies `x-hookdeck-signature` header
- Generated code uses base64-encoded HMAC SHA-256
- Generated code logs or uses `x-hookdeck-event-id` for idempotency
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill discovery | 1 | |
| Correct header (x-hookdeck-signature) | 1 | |
| Correct verification (base64) | 2 | |
| Raw body handling | 2 | |
| Error handling | 1 | |
| Code runs | 2 | |
| Tests pass | 1 | |
| Total | 10 |
Setup:
- Existing webhook handler
- Install the skill: `npx skills add hookdeck/webhook-skills --skill webhook-handler-patterns`
Prompt:
"My webhook handler is processing duplicate events. How do I make it idempotent?"
Expected Behaviors:
- Agent reads `webhook-handler-patterns/SKILL.md`
- Agent references `references/idempotency.md`
- Generated code extracts event ID from payload
- Generated code checks for previously processed events
- Generated code stores processed event IDs
- Generated code returns success for duplicate events (doesn't reprocess)
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill discovery | 1 | |
| Event ID extraction | 2 | |
| Duplicate check logic | 2 | |
| Storage pattern | 2 | |
| Safe handling of duplicates | 2 | |
| Pattern correctness | 1 | |
| Total | 10 |
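The duplicate-handling behaviors this rubric checks for can be illustrated with a minimal in-memory sketch. The names `handleEvent` and `processedEventIds` are hypothetical, and the `Set` is an assumption standing in for whatever durable store a real handler would use.

```typescript
interface WebhookEvent {
  id: string;
  type: string;
}

type HandleResult = { status: 200; duplicate: boolean };

// In-memory record of processed event IDs (assumption: a real handler would
// use durable storage so the record survives restarts).
const processedEventIds = new Set<string>();

function handleEvent(event: WebhookEvent, processFn: (e: WebhookEvent) => void): HandleResult {
  // Duplicate delivery: acknowledge with success, but do not reprocess.
  if (processedEventIds.has(event.id)) {
    return { status: 200, duplicate: true };
  }
  processFn(event);
  // Record the ID only after processing succeeds, so a failed attempt can be retried.
  processedEventIds.add(event.id);
  return { status: 200, duplicate: false };
}
```

Recording the ID after processing favors retries over lost events; a handler whose side effects cannot safely be repeated may need to record the ID and do the work in a single transaction instead.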
Setup:
- Express app with failing webhook verification
- Install the patterns skill (`webhook-handler-patterns`)
Prompt:
"My Stripe webhook signature verification is failing. The webhook works in Postman but not in my Express app."
Expected Behaviors:
- Agent reads `webhook-handler-patterns/references/frameworks/express.md`
- Agent identifies the raw body parsing issue
- Agent explains `express.raw()` vs `express.json()` ordering
- Agent provides corrected middleware configuration
- Solution addresses the specific Express gotcha
Evaluation:
| Criterion | Points | Result |
|---|---|---|
| Skill/reference discovery | 2 | |
| Correct diagnosis (body parsing) | 3 | |
| Correct solution | 3 | |
| Clear explanation | 2 | |
| Total | 10 |
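The diagnosis this scenario looks for rests on one fact: a signature is computed over the exact bytes the provider sent, so a body that has been parsed and re-serialized no longer matches. The sketch below demonstrates that using Node's built-in `crypto` module. The secret and payload are made-up values, `sign` is a hypothetical helper, and plain hex HMAC SHA-256 stands in for the provider's actual signing scheme.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: hex HMAC SHA-256 of a payload.
function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload, "utf8").digest("hex");
}

const secret = "whsec_example"; // made-up value for illustration

// Raw body as sent on the wire, including the sender's own whitespace.
const rawBody = '{\n  "id": "evt_1",\n  "type": "payment_intent.succeeded"\n}';
const signatureFromProvider = sign(rawBody, secret);

// What a handler ends up with if express.json() parsed the body first and
// the code re-serializes req.body before verifying.
const reserialized = JSON.stringify(JSON.parse(rawBody));

// Plain === is enough for this demonstration; real verification should use
// a timing-safe comparison.
const rawMatches = sign(rawBody, secret) === signatureFromProvider;
const reserializedMatches = sign(reserialized, secret) === signatureFromProvider;

console.log({ rawMatches, reserializedMatches });
```

Here `rawMatches` is `true` and `reserializedMatches` is `false`, because re-serializing drops the original whitespace. This is why the webhook route needs the raw body parser applied before any JSON parser touches the request.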
1. Create fresh environment
   - New directory with minimal starter code
   - Install required skill(s)
2. Configure agent access
   - Ensure skill files are in agent's context (project-local or global)
   - For Cursor: Verify skills appear in @ mentions
3. Execute prompt
   - Use exact prompt from scenario
   - Record agent's response and actions
4. Evaluate results
   - Check each expected behavior
   - Score against rubric
   - Note any issues or deviations
5. Document findings
   - Record score
   - Note what worked/didn't
   - Capture any agent feedback or errors
## Test Run: [Scenario Name]
Date: YYYY-MM-DD
Agent: [Cursor/Claude/Copilot]
Tester: [Name]
### Environment
- Project type: [Express/Next.js/FastAPI]
- Skills installed: [list]
### Agent Actions
1. [What agent read/did first]
2. [Subsequent actions]
...
### Generated Code
[Paste generated code here]
### Behavior Checklist
- [x] Behavior 1
- [ ] Behavior 2 (note: what went wrong)
...
### Score
| Criterion | Points | Result |
|-----------|--------|--------|
| ... | ... | ... |
| **Total** | **10** | **X** |
### Notes
[Any observations, issues, or improvements needed]
After multiple test runs, aggregate:
- Skill Discovery Rate: % of runs where agent found and read relevant skill
- Pattern Compliance: % of expected code patterns present in output
- Functional Success: % of generated code that passes tests
- Time to Completion: Average time for agent to complete task
- Error Rate: % of runs with syntax errors or crashes
For skills to be considered effective:
- Skill discovery rate > 90%
- Pattern compliance > 80%
- Functional success > 70%
- Generated code should pass example tests when run
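The first three metrics and their thresholds can be computed mechanically from recorded test runs. The sketch below is one possible way to do that. The `TestRun` shape and the `summarize` function are hypothetical, and pooling pattern counts across all runs is an assumption about how "pattern compliance" is aggregated.

```typescript
// Hypothetical record of a single scored test run.
interface TestRun {
  skillDiscovered: boolean;
  patternsExpected: number;
  patternsFound: number;
  testsPassed: boolean;
}

interface RunSummary {
  skillDiscoveryRate: number; // percent
  patternCompliance: number;  // percent
  functionalSuccess: number;  // percent
  meetsCriteria: boolean;
}

function summarize(runs: TestRun[]): RunSummary {
  if (runs.length === 0) {
    throw new Error("at least one test run is required");
  }
  const pct = (n: number, total: number): number => (total === 0 ? 0 : (100 * n) / total);

  const skillDiscoveryRate = pct(runs.filter(r => r.skillDiscovered).length, runs.length);
  // Assumption: compliance is pooled over all expected patterns in all runs.
  const patternCompliance = pct(
    runs.reduce((sum, r) => sum + r.patternsFound, 0),
    runs.reduce((sum, r) => sum + r.patternsExpected, 0)
  );
  const functionalSuccess = pct(runs.filter(r => r.testsPassed).length, runs.length);

  // Thresholds from the success criteria: > 90%, > 80%, > 70%.
  const meetsCriteria =
    skillDiscoveryRate > 90 && patternCompliance > 80 && functionalSuccess > 70;

  return { skillDiscoveryRate, patternCompliance, functionalSuccess, meetsCriteria };
}
```

The comparisons are strict, so a discovery rate of exactly 90% does not meet the bar.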
The repository includes a script to automate agent test scenarios:
# List available providers and frameworks
./scripts/test-agent-scenario.sh --help
# Run a test scenario (dry-run to see what would happen)
./scripts/test-agent-scenario.sh stripe express --dry-run
# Run actual test (requires Claude CLI)
./scripts/test-agent-scenario.sh stripe express
# Test with other providers/frameworks
./scripts/test-agent-scenario.sh shopify nextjs
./scripts/test-agent-scenario.sh github fastapi
./scripts/test-agent-scenario.sh hookdeck-event-gateway express
The script:
- Creates a fresh project directory in `/tmp/webhook-skills-agent-test/`
- Initializes the project based on framework (Express, Next.js, or FastAPI)
- Installs the relevant skill via `npx skills add`
- Runs Claude CLI with a context-aware prompt
- Saves results to `test-results/` for manual evaluation
Test prompts and events are configured in providers.yaml at the repository root:
providers:
  - name: stripe
    displayName: Stripe
    # ... docs, notes ...
    testScenario:
      events:
        - payment_intent.succeeded
        - checkout.session.completed
      # Optional custom prompt (uses default if not specified)
      # prompt: "Custom prompt with {Provider}, {framework}, {events} placeholders"
To add a new test scenario, add the `testScenario` field to the provider in `providers.yaml`.
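How the placeholders in a custom prompt are filled is not shown here; one plausible reading is plain string substitution. The sketch below is an assumption about that behavior, not the script's actual implementation. `renderPrompt` and `PromptContext` are hypothetical names, `{Provider}` is assumed to map to `displayName`, and `{events}` is assumed to expand to a comma-separated list.

```typescript
// Hypothetical inputs for filling a custom prompt template.
interface PromptContext {
  displayName: string;
  framework: string;
  events: string[];
}

function renderPrompt(template: string, ctx: PromptContext): string {
  const values = new Map<string, string>([
    ["Provider", ctx.displayName],
    ["framework", ctx.framework],
    ["events", ctx.events.join(", ")],
  ]);
  // Replace each known {placeholder}; leave unknown ones untouched.
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => values.get(key) ?? match);
}
```

Leaving unknown placeholders in place makes a typo such as `{provider}` visible in the rendered prompt rather than silently producing an empty string.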
tests/agent/
├── scenarios/
│ ├── stripe-express-basic.json
│ ├── shopify-nextjs-basic.json
│ └── ...
├── templates/
│ ├── express-starter/
│ ├── nextjs-starter/
│ └── fastapi-starter/
├── evaluator.ts
└── runner.ts
{
  "id": "stripe-express-basic",
  "name": "Stripe Webhook Setup (Express)",
  "template": "express-starter",
  "skills": ["stripe-webhooks"],
  "prompt": "Add Stripe webhook handling to my Express app...",
  "expectations": {
    "files_read": [
      "stripe-webhooks/SKILL.md",
      "stripe-webhooks/references/verification.md"
    ],
    "files_created": [
      "src/webhooks/stripe.js"
    ],
    "patterns": [
      "stripe.webhooks.constructEvent",
      "express.raw",
      "stripe-signature"
    ],
    "must_not_contain": [
      "express.json().*webhooks"
    ]
  },
  "functional_test": "npm test"
}
interface EvaluationResult {
  skillDiscovery: boolean;
  patternsFound: string[];
  patternsMissing: string[];
  antiPatternsFound: string[];
  testsPass: boolean;
  score: number;
}
// Scenario, AgentOutput, runTests, and calculateScore are assumed to be defined elsewhere.
function evaluate(scenario: Scenario, agentOutput: AgentOutput): EvaluationResult {
  // Check files read
  const skillDiscovery = scenario.expectations.files_read.some(
    file => agentOutput.filesRead.includes(file)
  );
  // Check patterns in generated code
  const patternsFound = scenario.expectations.patterns.filter(
    pattern => agentOutput.generatedCode.includes(pattern)
  );
  const patternsMissing = scenario.expectations.patterns.filter(
    pattern => !patternsFound.includes(pattern)
  );
  // Check anti-patterns; must_not_contain entries are treated as regular expressions
  const antiPatternsFound = scenario.expectations.must_not_contain.filter(
    pattern => new RegExp(pattern).test(agentOutput.generatedCode)
  );
  // Run functional tests
  const testsPass = runTests(scenario.functional_test);
  // Calculate score
  const score = calculateScore(skillDiscovery, patternsFound, testsPass);
  return { skillDiscovery, patternsFound, patternsMissing, antiPatternsFound, testsPass, score };
}
This automation framework would enable continuous evaluation of skill effectiveness as the repository evolves.
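The `Scenario` and `AgentOutput` types used by `evaluate` are not defined in this document. Inferred from the scenario JSON and from the fields `evaluate` reads, they might look like the sketch below. The `isScenario` guard is a hypothetical addition for validating a parsed scenario file, and `AgentOutput` may well carry more fields than the two shown.

```typescript
// Shape inferred from the example scenario JSON.
interface Scenario {
  id: string;
  name: string;
  template: string;
  skills: string[];
  prompt: string;
  expectations: {
    files_read: string[];
    files_created: string[];
    patterns: string[];
    must_not_contain: string[];
  };
  functional_test: string;
}

// Only the fields that evaluate() reads; an assumption, not a full definition.
interface AgentOutput {
  filesRead: string[];
  generatedCode: string;
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every(item => typeof item === "string");

// Hypothetical runtime check for a parsed scenario file.
function isScenario(value: unknown): value is Scenario {
  if (typeof value !== "object" || value === null) return false;
  const s = value as Record<string, unknown>;
  const e = s["expectations"];
  if (typeof e !== "object" || e === null) return false;
  const exp = e as Record<string, unknown>;
  return (
    ["id", "name", "template", "prompt", "functional_test"].every(k => typeof s[k] === "string") &&
    isStringArray(s["skills"]) &&
    ["files_read", "files_created", "patterns", "must_not_contain"].every(k => isStringArray(exp[k]))
  );
}
```

Validating each scenario when it is loaded would surface a missing `must_not_contain` list as a clear error instead of a crash partway through evaluation.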