Testing Webhook Skills

This document covers two types of validation:

Code Example Tests - Automated tests verifying example applications work
Agent Integration Tests - Manual/automated validation that AI agents can use these skills effectively

Code Example Testing

Running Tests Locally

Each example includes tests for signature verification. Run them individually:

Express examples:

cd skills/stripe-webhooks/examples/express
npm install
npm test

FastAPI examples:

cd skills/stripe-webhooks/examples/fastapi
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pytest test_webhook.py

Next.js examples:

cd skills/stripe-webhooks/examples/nextjs
npm install
npm test

Running All Tests

Use the test runner script to run all examples:

# All skills with examples
./scripts/test-examples.sh

# Specific skill(s)
./scripts/test-examples.sh stripe-webhooks
./scripts/test-examples.sh stripe-webhooks github-webhooks

CI Pipeline

Tests run automatically on PR and push to main via GitHub Actions. See .github/workflows/test-examples.yml.

Agent Integration Testing

Purpose

Validate that AI agents (Cursor, Claude, Copilot) can successfully use these skills to:

Find relevant skills when asked about webhooks
Read and follow skill instructions
Generate working code that matches expected patterns
Handle edge cases correctly (raw body, signature verification)

Test Scenarios

Scenario 1: Stripe Webhook Setup (Express)

Setup:

Create a new Express project: npm init -y && npm install express
Install the skill: npx skills add hookdeck/webhook-skills --skill stripe-webhooks

Prompt:

"Add Stripe webhook handling to my Express app. I want to handle payment_intent.succeeded events."

Expected Behaviors:

Agent reads stripe-webhooks/SKILL.md
Agent references references/verification.md for signature details
Generated code uses express.raw({ type: 'application/json' }) middleware
Generated code calls stripe.webhooks.constructEvent()
Generated code handles the specific event type requested
Generated code returns 200 on success, 400 on verification failure

Evaluation:

Criterion	Points	Result
Skill discovery (read SKILL.md)	1
Correct verification method	2
Raw body handling	2
Error handling (status codes)	1
Code runs without errors	2
Tests pass	2
Total	10

Scenario 2: Shopify Webhook Setup (Next.js)

Setup:

Create a Next.js project: npx create-next-app@latest my-app --typescript
Install the skill: npx skills add hookdeck/webhook-skills --skill shopify-webhooks

Prompt:

"Add a Shopify webhook endpoint to handle orders/create events in my Next.js app."

Expected Behaviors:

Agent reads shopify-webhooks/SKILL.md
Agent creates a route handler at app/webhooks/shopify/route.ts
Generated code reads raw body with request.text()
Generated code verifies HMAC SHA-256 signature (base64)
Generated code uses crypto.timingSafeEqual() for comparison
Generated code handles the X-Shopify-Topic header

Evaluation:

Criterion	Points	Result
Skill discovery	1
Correct file location (App Router)	1
Correct verification (base64 HMAC)	2
Raw body handling	2
Error handling	1
Code runs	2
Tests pass	1
Total	10

Scenario 3: GitHub Webhook Setup (FastAPI)

Setup:

Create a FastAPI project with main.py
Install the skill: npx skills add hookdeck/webhook-skills --skill github-webhooks

Prompt:

"Add a GitHub webhook endpoint to my FastAPI app. I need to handle push and pull_request events."

Expected Behaviors:

Agent reads github-webhooks/SKILL.md
Generated code reads raw body with await request.body()
Generated code verifies X-Hub-Signature-256 header
Generated code uses hex-encoded HMAC SHA-256
Generated code uses hmac.compare_digest() for timing-safe comparison
Generated code handles multiple event types via X-GitHub-Event header

Evaluation:

Criterion	Points	Result
Skill discovery	1
Correct verification (hex SHA-256)	2
Raw body handling	2
Event type routing	1
Error handling	1
Code runs	2
Tests pass	1
Total	10

Scenario 4: Hookdeck Event Gateway Webhooks Setup

Setup:

Existing Express app with webhook endpoint
Install the skill: npx skills add hookdeck/webhook-skills --skill hookdeck-event-gateway-webhooks

Prompt:

"I'm receiving webhooks through Hookdeck. Add signature verification for Hookdeck's signature."

Expected Behaviors:

Agent reads hookdeck-event-gateway-webhooks/SKILL.md
Agent references references/verification.md
Generated code verifies x-hookdeck-signature header
Generated code uses base64-encoded HMAC SHA-256
Generated code logs or uses x-hookdeck-event-id for idempotency

Evaluation:

Criterion	Points	Result
Skill discovery	1
Correct header (x-hookdeck-signature)	1
Correct verification (base64)	2
Raw body handling	2
Error handling	1
Code runs	2
Tests pass	1
Total	10

Scenario 5: Idempotency Implementation

Setup:

Existing webhook handler
Install the skill: npx skills add hookdeck/webhook-skills --skill webhook-handler-patterns

Prompt:

"My webhook handler is processing duplicate events. How do I make it idempotent?"

Expected Behaviors:

Agent reads webhook-handler-patterns/SKILL.md
Agent references references/idempotency.md
Generated code extracts event ID from payload
Generated code checks for previously processed events
Generated code stores processed event IDs
Generated code returns success for duplicate events (doesn't reprocess)

Evaluation:

Criterion	Points	Result
Skill discovery	1
Event ID extraction	2
Duplicate check logic	2
Storage pattern	2
Safe handling of duplicates	2
Pattern correctness	1
Total	10

Scenario 6: Framework-Specific Debugging

Setup:

Express app with failing webhook verification
Install patterns skill

Prompt:

"My Stripe webhook signature verification is failing. The webhook works in Postman but not in my Express app."

Expected Behaviors:

Agent reads webhook-handler-patterns/references/frameworks/express.md
Agent identifies the raw body parsing issue
Agent explains express.raw() vs express.json() ordering
Agent provides corrected middleware configuration
Solution addresses the specific Express gotcha

Evaluation:

Criterion	Points	Result
Skill/reference discovery	2
Correct diagnosis (body parsing)	3
Correct solution	3
Clear explanation	2
Total	10

Running Manual Tests

Protocol

Create fresh environment
- New directory with minimal starter code
- Install required skill(s)
Configure agent access
- Ensure skill files are in agent's context (project-local or global)
- For Cursor: Verify skills appear in @ mentions
Execute prompt
- Use exact prompt from scenario
- Record agent's response and actions
Evaluate results
- Check each expected behavior
- Score against rubric
- Note any issues or deviations
Document findings
- Record score
- Note what worked/didn't
- Capture any agent feedback or errors

Recording Template

## Test Run: [Scenario Name]
Date: YYYY-MM-DD
Agent: [Cursor/Claude/Copilot]
Tester: [Name]

### Environment
- Project type: [Express/Next.js/FastAPI]
- Skills installed: [list]

### Agent Actions
1. [What agent read/did first]
2. [Subsequent actions]
...

### Generated Code
[Paste generated code here]

### Behavior Checklist
- [x] Behavior 1
- [ ] Behavior 2 (note: what went wrong)
...

### Score
| Criterion | Points | Result |
|-----------|--------|--------|
| ... | ... | ... |
| **Total** | **10** | **X** |

### Notes
[Any observations, issues, or improvements needed]

Metrics to Track

After multiple test runs, aggregate:

Skill Discovery Rate: % of runs where agent found and read relevant skill
Pattern Compliance: % of expected code patterns present in output
Functional Success: % of generated code that passes tests
Time to Completion: Average time for agent to complete task
Error Rate: % of runs with syntax errors or crashes

Success Criteria

For skills to be considered effective:

Skill discovery rate > 90%
Pattern compliance > 80%
Functional success > 70%
Generated code should pass example tests when run

Running Agent Tests with Scripts

The repository includes a script to automate agent test scenarios:

# List available providers and frameworks
./scripts/test-agent-scenario.sh --help

# Run a test scenario (dry-run to see what would happen)
./scripts/test-agent-scenario.sh stripe express --dry-run

# Run actual test (requires Claude CLI)
./scripts/test-agent-scenario.sh stripe express

# Test with other providers/frameworks
./scripts/test-agent-scenario.sh shopify nextjs
./scripts/test-agent-scenario.sh github fastapi
./scripts/test-agent-scenario.sh hookdeck-event-gateway express

How It Works

Creates a fresh project directory in /tmp/webhook-skills-agent-test/
Initializes the project based on framework (Express, Next.js, or FastAPI)
Installs the relevant skill via npx skills add
Runs Claude CLI with a context-aware prompt
Saves results to test-results/ for manual evaluation

Configuration

Test prompts and events are configured in providers.yaml at the repository root:

providers:
  - name: stripe
    displayName: Stripe
    # ... docs, notes ...
    testScenario:
      events:
        - payment_intent.succeeded
        - checkout.session.completed
      # Optional custom prompt (uses default if not specified)
      # prompt: "Custom prompt with {Provider}, {framework}, {events} placeholders"

To add a new test scenario, add the testScenario field to the provider in providers.yaml.

Automating Agent Tests (Future)

Architecture

tests/agent/
├── scenarios/
│   ├── stripe-express-basic.json
│   ├── shopify-nextjs-basic.json
│   └── ...
├── templates/
│   ├── express-starter/
│   ├── nextjs-starter/
│   └── fastapi-starter/
├── evaluator.ts
└── runner.ts

Scenario Format

{
  "id": "stripe-express-basic",
  "name": "Stripe Webhook Setup (Express)",
  "template": "express-starter",
  "skills": ["stripe-webhooks"],
  "prompt": "Add Stripe webhook handling to my Express app...",
  "expectations": {
    "files_read": [
      "stripe-webhooks/SKILL.md",
      "stripe-webhooks/references/verification.md"
    ],
    "files_created": [
      "src/webhooks/stripe.js"
    ],
    "patterns": [
      "stripe.webhooks.constructEvent",
      "express.raw",
      "stripe-signature"
    ],
    "must_not_contain": [
      "express.json().*webhooks"
    ]
  },
  "functional_test": "npm test"
}

Evaluation Logic

interface EvaluationResult {
  skillDiscovery: boolean;
  patternsFound: string[];
  patternsMissing: string[];
  antiPatternsFound: string[];
  testsPass: boolean;
  score: number;
}

function evaluate(scenario: Scenario, agentOutput: AgentOutput): EvaluationResult {
  // Check files read
  const skillDiscovery = scenario.expectations.files_read.some(
    file => agentOutput.filesRead.includes(file)
  );
  
  // Check patterns in generated code
  const patternsFound = scenario.expectations.patterns.filter(
    pattern => agentOutput.generatedCode.includes(pattern)
  );
  
  // Run functional tests
  const testsPass = runTests(scenario.functional_test);
  
  // Calculate score
  const score = calculateScore(skillDiscovery, patternsFound, testsPass);
  
  return { skillDiscovery, patternsFound, patternsMissing, antiPatternsFound, testsPass, score };
}

This automation framework would enable continuous evaluation of skill effectiveness as the repository evolves.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Testing Webhook Skills

Code Example Testing

Running Tests Locally

Running All Tests

CI Pipeline

Agent Integration Testing

Purpose

Test Scenarios

Scenario 1: Stripe Webhook Setup (Express)

Scenario 2: Shopify Webhook Setup (Next.js)

Scenario 3: GitHub Webhook Setup (FastAPI)

Scenario 4: Hookdeck Event Gateway Webhooks Setup

Scenario 5: Idempotency Implementation

Scenario 6: Framework-Specific Debugging

Running Manual Tests

Protocol

Recording Template

Metrics to Track

Success Criteria

Running Agent Tests with Scripts

How It Works

Configuration

Automating Agent Tests (Future)

Architecture

Scenario Format

Evaluation Logic

FilesExpand file tree

TESTING.md

Latest commit

History

TESTING.md

File metadata and controls

Testing Webhook Skills

Code Example Testing

Running Tests Locally

Running All Tests

CI Pipeline

Agent Integration Testing

Purpose

Test Scenarios

Scenario 1: Stripe Webhook Setup (Express)

Scenario 2: Shopify Webhook Setup (Next.js)

Scenario 3: GitHub Webhook Setup (FastAPI)

Scenario 4: Hookdeck Event Gateway Webhooks Setup

Scenario 5: Idempotency Implementation

Scenario 6: Framework-Specific Debugging

Running Manual Tests

Protocol

Recording Template

Metrics to Track

Success Criteria

Running Agent Tests with Scripts

How It Works

Configuration

Automating Agent Tests (Future)

Architecture

Scenario Format

Evaluation Logic