test: restructure test suite into four distinct execution tiers#128
test: restructure test suite into four distinct execution tiers#128visahak wants to merge 5 commits intoAgentToolkit:mainfrom
Conversation
Added psycopg[binary]>=3.1 and pgvector>=0.3 to dev dependency group to ensure all unit tests can run during development and CI. This fixes the test collection error for test_postgres_backend.py while keeping postgres support optional for end users (via the pgvector optional dependency group).
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 1 minutes and 2 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughReplaces the old Changes
Sequence Diagram(s)sequenceDiagram
participant Runner as Test Runner
participant Conftest as pytest conftest
participant Phoenix as Phoenix Server (local)
participant CLI as evolve CLI / subprocess
participant Agent as Agent subprocess
participant Verifier as Trace verifier subprocess
Note over Runner,Conftest: Test session start
Runner->>Conftest: parse flags (--run-e2e?)
Conftest->>Runner: adjust markexpr (remove "not e2e")
Runner->>Phoenix: request phoenix_server fixture
Phoenix->>Phoenix: probe http://localhost:6006/status
alt status reachable
Phoenix-->>Runner: yield base URL
else not reachable
Phoenix->>CLI: run "evolve sync phoenix ..." (subprocess)
CLI->>Phoenix: emit stdout lines (polled)
Phoenix->>CLI: poll until ready or timeout
CLI-->>Phoenix: yield base URL
end
Runner->>Agent: run agent (timeout 90s)
Agent-->>Runner: produce output
Runner->>Verifier: run trace verification (timeout 30s)
Verifier-->>Runner: success/fail
Runner->>Phoenix: teardown (wait 10s, kill if needed)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
docs/LOW_CODE_TRACING.md (1)
249-250: Add-vto the documented pytest commands for consistency with repo test-run conventions.This is a docs-quality consistency improvement for test output clarity.
Based on learnings: Run pytest verbosely with the `-v` flag by default when testing.Proposed docs tweak
-uv run pytest -m e2e --run-e2e -s +uv run pytest -v -m e2e --run-e2e -s -uv run pytest tests/e2e/test_e2e_pipeline.py -k smolagents -m e2e --run-e2e -s +uv run pytest -v tests/e2e/test_e2e_pipeline.py -k smolagents -m e2e --run-e2e -s -uv run pytest tests/e2e/test_e2e_pipeline.py -k openai_agents -m e2e --run-e2e -s +uv run pytest -v tests/e2e/test_e2e_pipeline.py -k openai_agents -m e2e --run-e2e -sAlso applies to: 258-262
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/LOW_CODE_TRACING.md` around lines 249 - 250, Update the documented pytest commands (e.g., the occurrences of "uv run pytest -m e2e --run-e2e -s" and other pytest command examples in the same section) to include the -v flag so they read "uv run pytest -v -m e2e --run-e2e -s" (and similarly add -v to the other pytest command examples mentioned around the block) to match repository test-run conventions and produce verbose test output.tests/e2e/test_e2e_pipeline.py (3)
21-22: Move imports to the top of the file.
urllib.requestandurllib.errorare imported after the configuration block. Per Python style conventions (and Ruff enforcement), all imports should be grouped at the top of the file.♻️ Suggested fix
Move these imports to join the other imports at the top:
import subprocess import time import re import os import datetime +import urllib.request +import urllib.error import pytest from evolve.config.phoenix import phoenix_settingsThen remove lines 21-22.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_e2e_pipeline.py` around lines 21 - 22, Move the late imports urllib.request and urllib.error to the module import block at the top of tests/e2e/test_e2e_pipeline.py so they join the other imports; remove the duplicate import statements from their current position (the lines currently importing urllib.request and urllib.error) and run the linter to ensure import grouping/order is correct (references: the urllib.request and urllib.error imports).
198-200:text=Trueanduniversal_newlines=Trueare redundant.These parameters are aliases for the same behavior. Having both is harmless but unnecessary.
🧹 Suggested fix
process = subprocess.Popen( sync_command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, # Merge stderr to monitor everything text=True, bufsize=1, - universal_newlines=True, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_e2e_pipeline.py` around lines 198 - 200, The call sets both text=True and universal_newlines=True which are redundant aliases; remove universal_newlines=True (keep text=True for clarity) in the subprocess invocation (the call that also sets bufsize=1) so only one of these flags remains.
11-13: Remove unusedTIMESTAMPvariable.
TIMESTAMPis defined here but never used—the test generatescurrent_timestamplocally at line 102. This is dead code that should be removed.🧹 Suggested fix
# Configuration PHOENIX_URL = phoenix_settings.url -# Use a session-scope timestamp or generate per test? -# Per-test ensures no collisions even if run in parallel (though these should satisfy sequential) -TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/test_e2e_pipeline.py` around lines 11 - 13, Remove the unused top-level TIMESTAMP variable: delete the TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") declaration so the module no longer contains dead code; tests already generate current_timestamp locally (see current_timestamp usage in the test function), so no other changes are required.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/LOW_CODE_TRACING.md`:
- Around line 203-205: Replace the documented CLI invocation "python -m
evolve.cli" with the entry-point command "evolve" wherever it appears (e.g., the
example command lines showing "uv run python -m evolve.cli sync phoenix ...");
the project exposes an "evolve" script in pyproject.toml and there is no
evolve.cli __main__.py, so update those occurrences (also the other instance
around lines 212–213) to use "uv run evolve" instead.
In `@README.md`:
- Around line 121-131: Update the "Unit Tests (Default)" section to accurately
reflect pytest defaults: either state that `uv run pytest` runs the default
selection (`-m "not llm and not e2e"` and therefore includes
`platform_integrations`), or add/replace the unit-only example with `uv run
pytest -m unit`; edit the heading/content mentioning "Unit Tests (Default)" and
the example command strings so readers know the difference between the default
`uv run pytest` behavior and the explicit unit-only `uv run pytest -m unit`.
In `@tests/e2e/test_e2e_pipeline.py`:
- Around line 24-31: The phoenix_server fixture currently yields
"http://localhost:6006" but tests read PHOENIX_URL from phoenix_settings.url,
causing a possible mismatch; update the fixture phoenix_server so tests actually
use its URL by either removing autouse=True and having tests accept
phoenix_server as a parameter, or keep autouse and set phoenix_settings.url (or
PHOENIX_URL) to the yielded URL inside phoenix_server before yield, or update
tests to accept phoenix_server and use that returned URL instead of
phoenix_settings.url; locate the phoenix_server fixture and phoenix_settings.url
/ PHOENIX_URL references to implement one of these fixes consistently.
---
Nitpick comments:
In `@docs/LOW_CODE_TRACING.md`:
- Around line 249-250: Update the documented pytest commands (e.g., the
occurrences of "uv run pytest -m e2e --run-e2e -s" and other pytest command
examples in the same section) to include the -v flag so they read "uv run pytest
-v -m e2e --run-e2e -s" (and similarly add -v to the other pytest command
examples mentioned around the block) to match repository test-run conventions
and produce verbose test output.
In `@tests/e2e/test_e2e_pipeline.py`:
- Around line 21-22: Move the late imports urllib.request and urllib.error to
the module import block at the top of tests/e2e/test_e2e_pipeline.py so they
join the other imports; remove the duplicate import statements from their
current position (the lines currently importing urllib.request and urllib.error)
and run the linter to ensure import grouping/order is correct (references: the
urllib.request and urllib.error imports).
- Around line 198-200: The call sets both text=True and universal_newlines=True
which are redundant aliases; remove universal_newlines=True (keep text=True for
clarity) in the subprocess invocation (the call that also sets bufsize=1) so
only one of these flags remains.
- Around line 11-13: Remove the unused top-level TIMESTAMP variable: delete the
TIMESTAMP = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") declaration so the
module no longer contains dead code; tests already generate current_timestamp
locally (see current_timestamp usage in the test function), so no other changes
are required.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1d353ec9-61a8-4fba-933d-d0573b8caf62
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
README.mddocs/LOW_CODE_TRACING.mdpyproject.tomltests/conftest.pytests/e2e/test_e2e_pipeline.pytests/unit/test_cli.pytests/unit/test_extract_trajectories.pytests/unit/test_mcp_server.pytests/unit/test_phoenix_sync.pytests/unit/test_tracing.py
💤 Files with no reviewable changes (1)
- tests/unit/test_cli.py
Background:
The existing test infrastructure suffered from "unknown mark" Pytest warnings, CLI timeout hangs during heavy testing, and a lack of explicit categorization between core logic vs. local infrastructure operations.
What Changed:
pyproject.toml(no more warnings):unit(246 tests) — Core logic + offline Phoenix testing.platform_integrations(18 tests) — Fast filesystem local installation tests.llm(62 tests) — Active remote LLM inference pipelines.e2e(12 tests) — Heavy infrastructure testing simulating full workflows.phoenix_serverfixture in the E2E suite to autonomously spin up and tear down a background observability server during tests.evolve syncCLI test "phantom hangs" by capturing and aggregating subprocess logs directly, ensuring error stack traces are printed correctly.README.mdanddocs/LOW_CODE_TRACING.mdto reflect the new architecture, removing the obsolete--run-phoenixexecution commands.Summary by CodeRabbit
Documentation
Chores