150 changes: 101 additions & 49 deletions build-mcpb/references/PATTERNS.md
@@ -566,77 +566,129 @@ class TestMCPTools:

**Key:** FastMCP wraps tool exceptions as `fastmcp.exceptions.ToolError`, not the original type. Always catch `ToolError` in tests.

**Integration tests (`tests-integration/`)** — Requires a real API key in `.env`. Run with `make test-integration`.

`tests-integration/conftest.py` — The template scaffolds this with an env var gate and client fixture. It loads `.env` automatically via `load_dotenv()`, so the contributor just needs to add their key to `.env` rather than exporting it.
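
For reference, the scaffolded gate is roughly this shape (a sketch only; the `Client` import path and the exact fixture signature come from the generated project, so names may differ):

```python
# tests-integration/conftest.py (sketch of the scaffolded version)
import os

import pytest
import pytest_asyncio
from dotenv import load_dotenv

from my_server.api_client import Client  # assumed import path for the generated client

load_dotenv()  # picks up <NAME>_API_KEY from .env


def pytest_configure(config):
    if not os.environ.get("<NAME>_API_KEY"):
        pytest.exit("ERROR: <NAME>_API_KEY required for integration tests.")


@pytest_asyncio.fixture
async def client():
    client = Client(api_key=os.environ["<NAME>_API_KEY"])
    yield client
    await client.close()
```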

`tests-integration/test_core_tools.py` — Derive tests from `api_client.py`. Do NOT leave this as a stub or TODO. Write one test class per logical group of methods.

**Pattern 1 — Read-only method:**
```python
class TestListProjects:
    @pytest.mark.asyncio
    async def test_list_projects(self, client):
        # Chain: get a workspace ID first, then list its projects
        workspaces = await client.list_workspaces()
        assert len(workspaces) > 0
        workspace_gid = workspaces[0]["gid"]

        result = await client.list_projects(workspace_gid)
        assert isinstance(result, dict)
        assert "data" in result
        print(f"Found {len(result['data'])} project(s)")
```

**Pattern 2 — Write method with cleanup:**
```python
class TestContactCRUD:
    @pytest.mark.asyncio
    async def test_contact_lifecycle(self, client):
        contact = None
        try:
            contact = await client.create_contact(
                email=f"test-{int(time.time())}@example.com",
                first_name="Integration",
                last_name="Test",
            )
            assert contact["id"]

            fetched = await client.get_contact(contact["id"])
            assert fetched["id"] == contact["id"]

            updated = await client.update_contact(
                contact["id"], first_name="Updated"
            )
            assert updated["id"] == contact["id"]
        finally:
            if contact:
                await client.delete_contact(contact["id"])
```

**Pattern 3 — Tier-gated method:**
```python
async def has_search_access(client, workspace_gid: str) -> bool:
    """Probe the search endpoint; False means the current plan doesn't include it."""
    try:
        await client.search_tasks(workspace_gid, text="test", limit=1)
        return True
    except APIError as e:
        if e.status in (400, 402, 403):
            return False
        raise


class TestSearchTasks:
    @pytest.mark.asyncio
    async def test_search_tasks(self, client):
        workspaces = await client.list_workspaces()
        workspace_gid = workspaces[0]["gid"]

        if not await has_search_access(client, workspace_gid):
            pytest.skip("Task search requires premium plan")

        result = await client.search_tasks(workspace_gid, text="test", limit=5)
        assert isinstance(result, list)
```

**Key rules:**
- One test per method or per logical CRUD lifecycle is enough. Don't over-test — you're checking "does the API call work?", not exhaustive coverage.
- Don't test error cases here — unit tests already cover those with mocks.
- Always clean up write operations in `finally` blocks. If no delete method exists, mark resources as completed/archived (see the sketch after this list).
- Use `print()` for visibility — integration tests often run interactively.
- Use timestamped names for created resources (e.g., `f"test-{int(time.time())}@example.com"`) to avoid collisions.
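
Where the API has no delete endpoint, the cleanup step might look like this instead (a hypothetical sketch; `create_task`, `update_task`, and the `completed` flag are illustrative names, not confirmed client methods):

```python
class TestCreateTask:
    @pytest.mark.asyncio
    async def test_create_task(self, client):
        task = None
        try:
            task = await client.create_task(name=f"test-{int(time.time())}")
            assert task["gid"]
        finally:
            if task:
                # No delete_task in api_client.py, so archive instead of deleting
                await client.update_task(task["gid"], completed=True)
```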

**LLM smoke tests (`tests-integration/test_skill_llm.py`)** — Requires `ANTHROPIC_API_KEY` in `.env`. Run with `make test-llm`.

Sends server context (instructions + skill + tools) to Claude Haiku, asserts correct tool selection. The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with real tests.
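
For orientation, the scaffolded `get_server_context()` is roughly this shape (keep the template's version as-is rather than rewriting it):

```python
async def get_server_context() -> dict:
    """Collect instructions, the skill resource, and tool schemas from the running server."""
    async with Client(mcp) as client:
        init = await client.initialize()
        resources = await client.list_resources()
        skill_text = ""
        for r in resources:
            if "skill://" in str(r.uri):
                contents = await client.read_resource(str(r.uri))
                skill_text = contents[0].text
        tools_list = await client.list_tools()
        tools = [
            {"name": t.name, "description": t.description, "input_schema": t.inputSchema}
            for t in tools_list
        ]
        return {"instructions": init.instructions, "skill": skill_text, "tools": tools}
```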

Write **3–5 tests**, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction:

```python
async def call_llm(prompt: str) -> list:
    """Send a prompt to Claude Haiku with full server context, return tool calls."""
    ctx = await get_server_context()
    client = get_anthropic_client()
    system = (
        f"You are an assistant.\n\n"
        f"## Server Instructions\n{ctx['instructions']}\n\n"
        f"## Skill Resource\n{ctx['skill']}"
    )
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
        tools=[{"type": "custom", **t} for t in ctx["tools"]],
    )
    return [b for b in response.content if b.type == "tool_use"]


class TestSkillLLMInvocation:
    @pytest.mark.asyncio
    async def test_list_projects_selected(self):
        tool_calls = await call_llm("Show me all projects in workspace gid_123456")
        assert len(tool_calls) > 0, "LLM did not call any tool"
        assert tool_calls[0].name == "list_projects"

    @pytest.mark.asyncio
    async def test_create_task_selected(self):
        tool_calls = await call_llm("Create a task called Review Q3 report in workspace gid_123456")
        assert len(tool_calls) > 0, "LLM did not call any tool"
        assert tool_calls[0].name == "create_task"
```

**Key rule — include concrete values for required parameters:** If a tool requires parameters (IDs, coordinates, dates), include concrete values in the prompt — even fake ones. Without them, the LLM will correctly ask for clarification instead of calling the tool, and the test will fail with "LLM did not call any tool." Examples:
- "Show me projects in workspace gid_123456" (not "Show me my projects")
- "What's the weather at lat=51.5, lon=-0.13?" (not "What's the weather in London?")
- "Am I busy tomorrow afternoon?" (not "Am I busy Thursday?" — ambiguous date)

### Build & Test Commands (Python)

66 changes: 56 additions & 10 deletions build-mcpb/references/workflow/phase-3-implement-and-verify.md
@@ -49,36 +49,82 @@ See `references/PATTERNS.md` → "TypeScript Server Patterns" for complete code

Run all checks and fix any issues before proceeding.

### Unit tests and code quality

**Python:**
```bash
uv sync --dev
make check # format, lint, typecheck, unit tests
```

Mocked HTTP, FastMCP Client-based tool tests, skill resource tests. The template scaffolds these — fill them in for every tool. `make check` must pass before proceeding.
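
A unit test for a single tool might look roughly like this (a sketch only; the `mcp` import path, tool name, and patched client method are placeholders for whatever the template generates):

```python
# tests/test_tools.py (sketch)
from unittest.mock import AsyncMock, patch

import pytest
from fastmcp import Client
from fastmcp.exceptions import ToolError

from my_server.server import mcp  # placeholder import path


class TestListProjects:
    @pytest.mark.asyncio
    async def test_returns_projects(self):
        fake = {"data": [{"gid": "123", "name": "Demo"}]}
        # Patch the HTTP-backed client method so no network call is made
        with patch("my_server.server.api_client.list_projects", new=AsyncMock(return_value=fake)):
            async with Client(mcp) as client:
                result = await client.call_tool("list_projects", {"workspace_gid": "ws_1"})
        assert result is not None

    @pytest.mark.asyncio
    async def test_api_error_surfaces_as_tool_error(self):
        # FastMCP wraps exceptions raised inside tools as ToolError
        failing = AsyncMock(side_effect=RuntimeError("boom"))
        with patch("my_server.server.api_client.list_projects", new=failing):
            async with Client(mcp) as client:
                with pytest.raises(ToolError):
                    await client.call_tool("list_projects", {"workspace_gid": "ws_1"})
```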

See `references/PATTERNS.md` → "Test Patterns (Python)" for the FastMCP Client pattern, ToolError handling, and mock fixtures.

**TypeScript:**
```bash
make check
```
This runs: `format:check` → `lint` → `typecheck` → `test`

### Integration tests

**Python:**
```bash
make test-integration # real API calls (needs <NAME>_API_KEY in .env)
```

Real API calls against the live service. The template scaffolds `tests-integration/test_core_tools.py` as a stub — you must replace it with real tests.

**How to write them:** Open `api_client.py` and list every public method (skip `__init__`, `close`, `_request`, `_ensure_session`, and dunder methods). For each method, write a test:

- **Read methods** (list, get, search): Call with minimal valid parameters, assert the response has the expected shape (list, dict, or model with expected keys).
- **Write methods** (create, update, delete): Create a test resource, verify it, then clean it up in a `finally` block. If no delete method exists, mark the resource as completed/archived and leave a comment.
- **Chained methods:** Some methods need an ID from a prior call (e.g., `list_workspaces` returns a GID needed by `search_tasks`). Chain them — call the list method first, use the first result's ID.
- **Tier-gated methods:** If the API has premium endpoints that may not be available on the user's plan, write a `has_<feature>_access` helper that probes the endpoint and returns `False` on 400/402/403. Use `pytest.skip()` in the test if access is unavailable.

See `references/PATTERNS.md` → "Integration Test Patterns (Python)" for concrete examples.

**How to run them:**

1. Ask the contributor to add their API key to `.env` (e.g., `ASANA_API_KEY=xxx`). The `.env` file is already in `.gitignore` and `.mcpbignore`. The contributor was asked for this key at the start of the process — they should have it ready.
2. Run `make test-integration`.
3. All tests should pass or skip (for tier-gated features). Fix any failures before proceeding.
4. If the contributor says auth setup is too complex for now (e.g., OAuth flows, multi-step app configuration), proceed — the tests are written and ready to run later. Do not skip writing the tests.

### LLM smoke tests

**Python:**
```bash
make test-llm # needs <NAME>_API_KEY + ANTHROPIC_API_KEY in .env
```

Verify Claude Haiku selects the correct tool given the skill resource. Requires both the service API key and `ANTHROPIC_API_KEY`.

**How to write them:** The template scaffolds `get_server_context()` and `get_anthropic_client()` — leave those as-is. Replace the commented-out test stub with 3–5 real tests, one per key tool. Extract a `call_llm()` helper to avoid repeating the system prompt construction across tests.

Each test sends a natural language prompt and asserts the LLM selected the expected tool. Include concrete values for any required parameters in the prompt (IDs, coordinates, dates) — without them, the LLM will ask for clarification instead of calling the tool.

See `references/PATTERNS.md` → "LLM smoke tests" for the `call_llm()` helper pattern and the concrete-identifiers rule.

**How to run them:**

1. Ask the contributor to add `ANTHROPIC_API_KEY` to `.env` alongside the service API key.
2. Run `make test-llm`.
3. All tests should pass. If a test fails because the LLM picked the wrong tool, adjust the prompt to be more specific before touching the SKILL.md.
4. If the contributor does not have an `ANTHROPIC_API_KEY`, proceed — the tests are written and ready to run later. Do not skip writing the tests.

## Gate

**Criteria:**
- [ ] All tool logic, models, and client code implemented
- [ ] Linting passes with no errors
- [ ] Type checking passes with no errors
- [ ] Unit tests pass (`make check`)
- [ ] Integration tests written with real assertions (not stubs or TODOs)
- [ ] Integration tests pass or skip (when an API key is available)
- [ ] LLM smoke tests written with real assertions (not stubs or TODOs)
- [ ] LLM smoke tests pass (when `ANTHROPIC_API_KEY` is available)

**If any criterion fails:** Fix the reported issues and re-run checks.
