150 changes: 101 additions & 49 deletions build-mcpb/references/PATTERNS.md
@@ -566,77 +566,129 @@ class TestMCPTools:

**Key:** FastMCP wraps tool exceptions as `fastmcp.exceptions.ToolError`, not the original type. Always catch `ToolError` in tests.

**Integration tests (`tests-integration/`)** — Requires a real API key in `.env`. Run with `make test-integration`.

`tests-integration/conftest.py` — The template scaffolds this with an env var gate and client fixture. It loads `.env` automatically via `load_dotenv()`, so the contributor just needs to add their key to `.env` rather than exporting it.
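
For reference, the scaffolded gate is roughly this shape (a sketch only; the `Client` import path and the exact fixture signature come from the generated project, so names may differ):

```python
# tests-integration/conftest.py (sketch of the scaffolded version)
import os

import pytest
import pytest_asyncio
from dotenv import load_dotenv

from my_server.api_client import Client  # assumed import path for the generated client

load_dotenv()  # picks up <NAME>_API_KEY from .env


def pytest_configure(config):
    if not os.environ.get("<NAME>_API_KEY"):
        pytest.exit("ERROR: <NAME>_API_KEY required for integration tests.")


@pytest_asyncio.fixture
async def client():
    client = Client(api_key=os.environ["<NAME>_API_KEY"])
    yield client
    await client.close()
```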

`tests-integration/test_core_tools.py` — Derive tests from `api_client.py`. Do NOT leave this as a stub or TODO. Write one test class per logical group of methods.

**Pattern 1 — Read-only method:**
```python
class TestListProjects:
    @pytest.mark.asyncio
    async def test_list_projects(self, client):
        # Chain: get a workspace ID first, then list its projects
        workspaces = await client.list_workspaces()
        assert len(workspaces) > 0
        workspace_gid = workspaces[0]["gid"]

        result = await client.list_projects(workspace_gid)
        assert isinstance(result, dict)
        assert "data" in result
        print(f"Found {len(result['data'])} project(s)")
```

**Pattern 2 — Write method with cleanup:**
```python
class TestContactCRUD:
    @pytest.mark.asyncio
    async def test_contact_lifecycle(self, client):
        contact = None
        try:
            contact = await client.create_contact(
                email=f"test-{int(time.time())}@example.com",
                first_name="Integration",
                last_name="Test",
            )
            assert contact["id"]

            fetched = await client.get_contact(contact["id"])
            assert fetched["id"] == contact["id"]

            updated = await client.update_contact(
                contact["id"], first_name="Updated"
            )
            assert updated["id"] == contact["id"]
        finally:
            if contact:
                await client.delete_contact(contact["id"])
```

**Pattern 3 — Tier-gated method:**
```python
async def has_search_access(client, workspace_gid: str) -> bool:
    """Probe the search endpoint; False means the current plan doesn't include it."""
    try:
        await client.search_tasks(workspace_gid, text="test", limit=1)
        return True
    except APIError as e:
        if e.status in (400, 402, 403):
            return False
        raise


class TestSearchTasks:
    @pytest.mark.asyncio
    async def test_search_tasks(self, client):
        workspaces = await client.list_workspaces()
        workspace_gid = workspaces[0]["gid"]

        if not await has_search_access(client, workspace_gid):
            pytest.skip("Task search requires premium plan")

        result = await client.search_tasks(workspace_gid, text="test", limit=5)
        assert isinstance(result, list)
```

**Key rules:**
- One test per method or per logical CRUD lifecycle is enough. Don't over-test — you're checking "does the API call work?", not exhaustive coverage.
- Don't test error cases here — unit tests already cover those with mocks.
- Always clean up write operations in `finally` blocks. If no delete method exists, mark resources as completed/archived (see the sketch after this list).
- Use `print()` for visibility — integration tests often run interactively.
- Use timestamped names for created resources (e.g., `f"test-{int(time.time())}@example.com"`) to avoid collisions.
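
Where the API has no delete endpoint, the cleanup step might look like this instead (a hypothetical sketch; `create_task`, `update_task`, and the `completed` flag are illustrative names, not confirmed client methods):

```python
class TestCreateTask:
    @pytest.mark.asyncio
    async def test_create_task(self, client):
        task = None
        try:
            task = await client.create_task(name=f"test-{int(time.time())}")
            assert task["gid"]
        finally:
            if task:
                # No delete_task in api_client.py, so archive instead of deleting
                await client.update_task(task["gid"], completed=True)
```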

**LLM smoke tests (`tests-integration/test_skill_llm.py`)** — Requires `ANTHROPIC_API_KEY` in `.env`. Run with `make test-llm`.

Sends server context (instructions + skill + tools) to Claude Haiku, asserts correct tool selection. The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with real tests.
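
For orientation, the scaffolded `get_server_context()` is roughly this shape (keep the template's version as-is rather than rewriting it):

```python
async def get_server_context() -> dict:
    """Collect instructions, the skill resource, and tool schemas from the running server."""
    async with Client(mcp) as client:
        init = await client.initialize()
        resources = await client.list_resources()
        skill_text = ""
        for r in resources:
            if "skill://" in str(r.uri):
                contents = await client.read_resource(str(r.uri))
                skill_text = contents[0].text
        tools_list = await client.list_tools()
        tools = [
            {"name": t.name, "description": t.description, "input_schema": t.inputSchema}
            for t in tools_list
        ]
        return {"instructions": init.instructions, "skill": skill_text, "tools": tools}
```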

Write **3–5 tests**, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction:

```python
async def call_llm(prompt: str) -> list:
    """Send a prompt to Claude Haiku with full server context, return tool calls."""
    ctx = await get_server_context()
    client = get_anthropic_client()
    system = (
        f"You are an assistant.\n\n"
        f"## Server Instructions\n{ctx['instructions']}\n\n"
        f"## Skill Resource\n{ctx['skill']}"
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
        tools=[{"type": "custom", **t} for t in ctx["tools"]],
    )
    return [b for b in response.content if b.type == "tool_use"]


class TestSkillLLMInvocation:
    @pytest.mark.asyncio
    async def test_list_projects_selected(self):
        tool_calls = await call_llm("Show me all projects in workspace gid_123456")
        assert len(tool_calls) > 0, "LLM did not call any tool"
        assert tool_calls[0].name == "list_projects"

    @pytest.mark.asyncio
    async def test_create_task_selected(self):
        tool_calls = await call_llm("Create a task called Review Q3 report in workspace gid_123456")
        assert len(tool_calls) > 0, "LLM did not call any tool"
        assert tool_calls[0].name == "create_task"
```

**Key rule — include concrete values for required parameters:** If a tool requires parameters (IDs, coordinates, dates), include concrete values in the prompt — even fake ones. Without them, the LLM will correctly ask for clarification instead of calling the tool, and the test will fail with "LLM did not call any tool." Examples:
- "Show me projects in workspace gid_123456" (not "Show me my projects")
- "What's the weather at lat=51.5, lon=-0.13?" (not "What's the weather in London?")
- "Am I busy tomorrow afternoon?" (not "Am I busy Thursday?" — ambiguous date)

### Build & Test Commands (Python)

66 changes: 56 additions & 10 deletions build-mcpb/references/workflow/phase-3-implement-and-verify.md
@@ -49,36 +49,82 @@ See `references/PATTERNS.md` → "TypeScript Server Patterns" for complete code

Run all checks and fix any issues before proceeding.

### Unit tests and code quality

**Python:**
```bash
uv sync --dev
make check # format, lint, typecheck, unit tests
```

Mocked HTTP, FastMCP Client-based tool tests, skill resource tests. The template scaffolds these — fill them in for every tool. `make check` must pass before proceeding.
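
A unit test for a single tool might look roughly like this (a sketch only; the `mcp` import path, tool name, and patched client method are placeholders for whatever the template generates):

```python
# tests/test_tools.py (sketch)
from unittest.mock import AsyncMock, patch

import pytest
from fastmcp import Client
from fastmcp.exceptions import ToolError

from my_server.server import mcp  # placeholder import path


class TestListProjects:
    @pytest.mark.asyncio
    async def test_returns_projects(self):
        fake = {"data": [{"gid": "123", "name": "Demo"}]}
        # Patch the HTTP-backed client method so no network call is made
        with patch("my_server.server.api_client.list_projects", new=AsyncMock(return_value=fake)):
            async with Client(mcp) as client:
                result = await client.call_tool("list_projects", {"workspace_gid": "ws_1"})
        assert result is not None

    @pytest.mark.asyncio
    async def test_api_error_surfaces_as_tool_error(self):
        # FastMCP wraps exceptions raised inside tools as ToolError
        failing = AsyncMock(side_effect=RuntimeError("boom"))
        with patch("my_server.server.api_client.list_projects", new=failing):
            async with Client(mcp) as client:
                with pytest.raises(ToolError):
                    await client.call_tool("list_projects", {"workspace_gid": "ws_1"})
```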

See `references/PATTERNS.md` → "Test Patterns (Python)" for the FastMCP Client pattern, ToolError handling, and mock fixtures.

**TypeScript:**
```bash
make check
```
This runs: `format:check` → `lint` → `typecheck` → `test`

### Integration tests

**Python:**
```bash
make test-integration # real API calls (needs <NAME>_API_KEY in .env)
```

Real API calls against the live service. The template scaffolds `tests-integration/test_core_tools.py` as a stub — you must replace it with real tests.

**How to write them:** Open `api_client.py` and list every public method (skip `__init__`, `close`, `_request`, `_ensure_session`, and dunder methods). For each method, write a test:

- **Read methods** (list, get, search): Call with minimal valid parameters, assert the response has the expected shape (list, dict, or model with expected keys).
- **Write methods** (create, update, delete): Create a test resource, verify it, then clean it up in a `finally` block. If no delete method exists, mark the resource as completed/archived and leave a comment.
- **Chained methods:** Some methods need an ID from a prior call (e.g., `list_workspaces` returns a GID needed by `search_tasks`). Chain them — call the list method first, use the first result's ID.
- **Tier-gated methods:** If the API has premium endpoints that may not be available on the user's plan, write a `has_<feature>_access` helper that probes the endpoint and returns `False` on 400/402/403. Use `pytest.skip()` in the test if access is unavailable.

See `references/PATTERNS.md` → "Integration Test Patterns (Python)" for concrete examples.

**How to run them:**

1. Ask the contributor to add their API key to `.env` (e.g., `ASANA_API_KEY=xxx`). The `.env` file is already in `.gitignore` and `.mcpbignore`. The contributor was asked for this key at the start of the process — they should have it ready.
2. Run `make test-integration`.
3. All tests should pass or skip (for tier-gated features). Fix any failures before proceeding.
4. If the contributor says auth setup is too complex for now (e.g., OAuth flows, multi-step app configuration), proceed — the tests are written and ready to run later. Do not skip writing the tests.

### LLM smoke tests

**Python:**
```bash
make test-llm # needs <NAME>_API_KEY + ANTHROPIC_API_KEY in .env
```

Verify Claude Haiku selects the correct tool given the skill resource. Requires both the service API key and `ANTHROPIC_API_KEY`.

**How to write them:** The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with 3–5 real tests, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction across tests.

Each test sends a natural language prompt and asserts the LLM selected the expected tool. Include concrete values for any required parameters in the prompt (IDs, coordinates, dates) — without them, the LLM will ask for clarification instead of calling the tool.

See `references/PATTERNS.md` → "LLM smoke tests" for the `call_llm()` helper pattern and the concrete-identifiers rule.

**How to run them:**

1. Ask the contributor to add `ANTHROPIC_API_KEY` to `.env` alongside the service API key.
2. Run `make test-llm`.
3. All tests should pass. If a test fails because the LLM picked the wrong tool, adjust the prompt to be more specific before touching the SKILL.md.
4. If the contributor does not have an `ANTHROPIC_API_KEY`, proceed — the tests are written and ready to run later. Do not skip writing the tests.

## Gate

**Criteria:**
- [ ] All tool logic, models, and client code implemented
- [ ] Linting passes with no errors
- [ ] Type checking passes with no errors
- [ ] Unit tests pass (`make check`)
- [ ] Integration tests written with real assertions (not stubs or TODOs)
- [ ] Integration tests pass or skip (when an API key is available)
- [ ] LLM smoke tests written with real assertions (not stubs or TODOs)
- [ ] LLM smoke tests pass (when `ANTHROPIC_API_KEY` is available)

**If any criterion fails:** Fix the reported issues and re-run checks.
