E2E Testing Plan
Rick Hightower edited this page Feb 1, 2026 · 1 revision
A true end-to-end integration testing system: it starts a real server, indexes real documents, and validates query results through the CLI, exactly as a user would interact with the system.
Implemented: December 2024

    doc-serve/
    └── e2e/
        ├── integration/
        │   ├── __init__.py
        │   ├── conftest.py              # Pytest fixtures (server lifecycle, CLI runner)
        │   ├── test_full_workflow.py    # Main E2E test scenarios
        │   └── test_error_handling.py   # Error case testing
        ├── fixtures/
        │   └── test_docs/
        │       └── coffee_brewing/      # Domain: Coffee brewing knowledge
        │           ├── espresso_basics.md
        │           ├── pour_over_guide.md
        │           ├── french_press_tips.md
        │           ├── water_temperature.md
        │           └── grind_sizes.md
        ├── scripts/
        │   ├── run_e2e.py               # Main orchestrator script
        │   ├── run_e2e.sh               # Shell wrapper for CI/CD
        │   └── wait_for_health.py       # Health polling utility
        └── config/
            └── e2e_config.py            # Configuration settings

The test documents cover coffee brewing methods with distinct, non-overlapping concepts that enable semantic search validation:
| File | Content |
|---|---|
| `espresso_basics.md` | Pressure (9 bars), temperature, extraction time, common drinks |
| `pour_over_guide.md` | Equipment, bloom technique, spiral pour patterns |
| `french_press_tips.md` | Immersion method, coarse grind, 4-minute steep time |
| `water_temperature.md` | Ideal range (195-205°F), method-specific temperatures |
| `grind_sizes.md` | Grind by method (fine for espresso, coarse for French press) |

| Query | Expected Terms | Purpose |
|---|---|---|
| "How do I make espresso?" | espresso, pressure | Basic retrieval |
| "What water temperature for coffee?" | temperature, fahrenheit | Cross-topic |
| "french press grind size" | coarse | Specific concept |
| "pour over technique bloom" | bloom | Technical term |
| "coffee brewing methods" | multiple sources | Cross-document |
| "9 bars pressure extraction" | pressure, espresso | Technical query |
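
The "Expected Terms" column suggests a simple check: every listed term should appear somewhere in the returned text. The sketch below shows one way to express that as a case-insensitive substring test. It is an illustration only, not the project's validation code: `QueryCase`, `missing_terms`, and the sample result text are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCase:
    """One row of the query table (hypothetical structure)."""
    query: str
    expected_terms: tuple[str, ...]


def missing_terms(result_text: str, case: QueryCase) -> list[str]:
    """Return the expected terms that do not appear in the result text."""
    haystack = result_text.lower()
    return [term for term in case.expected_terms if term.lower() not in haystack]


case = QueryCase("How do I make espresso?", ("espresso", "pressure"))
# Made-up result text standing in for real CLI output.
sample = "Espresso is brewed at around 9 bars of pressure."
print(missing_terms(sample, case))  # []
```

An empty list means the query passed; a non-empty list names exactly which terms were absent, which makes failures easier to read than a bare boolean.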
Run the Python orchestrator script:

    cd e2e
    python scripts/run_e2e.py --verbose

Run the pytest suite directly:

    cd e2e
    python -m pytest integration/ -v

Run the shell wrapper:

    cd e2e
    ./scripts/run_e2e.sh --verbose

Prerequisites:

- `OPENAI_API_KEY` - Required for embeddings. Pytest tests auto-load it from `agent-brain-server/.env`. Shell scripts require manual sourcing or export.
- Python 3.10+
- Poetry installed
- Server and CLI dependencies installed
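
A pre-flight check for the first two prerequisites might look like the sketch below. It is an assumption about how such a check could be written, not code from the repository: `prerequisite_problems` is a hypothetical helper, and it does not cover Poetry or the installed dependencies.

```python
import os
import sys
from collections.abc import Mapping


def prerequisite_problems(env: Mapping[str, str], version: tuple[int, int]) -> list[str]:
    """Return a human-readable list of unmet prerequisites (empty if none)."""
    problems = []
    # An empty string counts as "not set", since it cannot authenticate.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if version < (3, 10):
        problems.append("Python 3.10+ is required")
    return problems


# Check the current process; the output depends on your environment.
print(prerequisite_problems(os.environ, sys.version_info[:2]))
```

Passing the environment and version in as arguments, rather than reading them inside the function, keeps the check easy to test.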
The CLI's default similarity threshold is 0.7. However, semantic similarity between natural language queries and technical documents typically scores 0.4-0.6, so a query at the default threshold can return nothing. E2E tests use `--threshold 0.3` to ensure results are returned.
If queries return empty results, verify the threshold is set appropriately in:
- `e2e/scripts/run_e2e.py:242` (`run_query_test` method)
- `e2e/integration/conftest.py:85` (`CLIRunner.query` method)
- `e2e/config/e2e_config.py:30` (`DEFAULT_THRESHOLD` constant)
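
The effect of the threshold can be illustrated with a few made-up scores. This is a sketch, not the CLI's implementation: `filter_by_threshold` is a hypothetical function, the scores are illustrative rather than measured, and the inclusive `>=` comparison is an assumption.

```python
def filter_by_threshold(scores: list[float], threshold: float) -> list[float]:
    """Keep only the scores at or above the threshold (inclusive by assumption)."""
    return [score for score in scores if score >= threshold]


# Illustrative scores in the 0.4-0.6 range described above.
scores = [0.58, 0.47, 0.41]
print(filter_by_threshold(scores, 0.7))  # []
print(filter_by_threshold(scores, 0.3))  # [0.58, 0.47, 0.41]
```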
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | Tests failed |
| 2 | Setup failed (server didn't start, missing API key) |
| 3 | Indexing failed |
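
One way to keep these codes consistent across scripts is an `IntEnum`. The sketch below assumes a particular precedence (setup failures are reported before indexing failures, which are reported before test failures); `ExitCode` and `exit_code_for` are hypothetical names, not taken from `run_e2e.py`.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes from the table above."""
    ALL_PASSED = 0
    TESTS_FAILED = 1
    SETUP_FAILED = 2
    INDEXING_FAILED = 3


def exit_code_for(setup_ok: bool, indexing_ok: bool, failed_tests: int) -> ExitCode:
    """Map the outcome of a run to an exit code, earliest failing stage first."""
    if not setup_ok:
        return ExitCode.SETUP_FAILED
    if not indexing_ok:
        return ExitCode.INDEXING_FAILED
    if failed_tests > 0:
        return ExitCode.TESTS_FAILED
    return ExitCode.ALL_PASSED


print(int(exit_code_for(setup_ok=True, indexing_ok=True, failed_tests=0)))  # 0
print(int(exit_code_for(setup_ok=True, indexing_ok=False, failed_tests=0)))  # 3
```

Because `IntEnum` members are integers, a value such as `ExitCode.SETUP_FAILED` can be passed straight to `sys.exit`.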
- Start Server - Launch doc-serve in background
- Wait for Health - Poll `/health` until status is "healthy"
- Index Documents - Use CLI to index coffee brewing docs
- Wait for Indexing - Poll until `indexing_in_progress` is false
- Run Query Tests - Execute semantic queries via CLI
- Validate Results - Check for expected terms in results
- Cleanup - Reset index and stop server
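
Both waiting steps follow the same pattern: poll a condition until it holds or a timeout expires. A minimal sketch of that pattern follows. It is not the contents of `wait_for_health.py`: `wait_until` is a hypothetical helper, the timeout and interval values are arbitrary, and the condition is passed in as a callable so the example runs without a server.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real /health request: reports "healthy" on the third poll.
responses = iter(["starting", "starting", "healthy"])
print(wait_until(lambda: next(responses) == "healthy", timeout=5.0, interval=0.01))  # True
```

In a real run, the callable would issue the HTTP request and inspect the status field. Using `time.monotonic` rather than `time.time` keeps the timeout unaffected by system clock adjustments.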
The shell script (run_e2e.sh) is designed for CI/CD pipelines:

    # Example GitHub Actions step
    - name: Run E2E Tests
      env:
        OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      run: |
        cd e2e
        ./scripts/run_e2e.sh

The tests verify the following:

- Server reports healthy status
- Status includes version
- Status includes indexing info
- Documents are indexed (5+ docs)
- Chunks are created
- Indexed folders are tracked
- Espresso query returns espresso content
- Temperature query returns temp info
- Grind size query returns grind info
- Cross-document queries work
- Results include scores, sources, text
- top_k limits results
- Query timing is reported
- Total results are reported
- Empty queries handled
- Unrelated queries return low scores
- Invalid parameters rejected
- Connection errors handled gracefully
- Multiple sequential queries succeed
- Repeated queries are consistent