test: optimize example test discovery and execution speed #372
base: main
Conversation
- Add pytest markers to example files for proper categorization
- Wrap heavy examples with complex initialization to prevent collection hangs
- Fix critical skip logic bug in example conftest (import failure)
- Configure qualitative tests to be excluded by default
- Update docs

Tests now run significantly faster and reliably on constrained systems. Run the full suite with `pytest -m ""`.
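As context for the "wrap heavy examples" point above, a minimal sketch of the pattern; the task and model name below are purely illustrative and not taken from this repo:

```python
# Heavy initialization lives behind a main guard so that importing the module
# (e.g. during pytest collection) stays cheap; the model only loads when the
# example is actually executed.
def main() -> None:
    from transformers import pipeline  # heavy import deferred to runtime

    # Any small model works here; this name is only an example.
    generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")
    print(generator("Hello, world!"))


if __name__ == "__main__":
    main()
```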
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit: this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
The PR description has been updated. Please fill out the template for your PR to be reviewed.
- Modified 58 example files to conditionally import pytest
- Allows examples to run standalone without pytest dependency
- Maintains pytest marker functionality when run via pytest
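For illustration, a sketch of what that conditional import can look like; the marker names here are examples, not necessarily what each file actually uses:

```python
# Examples should still run as plain scripts, so pytest is optional.
try:
    import pytest

    # Markers only matter when the file is collected by pytest.
    pytestmark = [pytest.mark.qualitative]
except ImportError:
    pytest = None  # running standalone without pytest installed
```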
There are a few things to clarify from this PR.
- USDA website returns 403 Forbidden (likely rate limiting/user-agent filtering)
- External network dependency makes test flaky
- Marked as qualitative to exclude from default test runs
- Fix vision_openai_examples.py: simplified skip logic, added requirements docstring
- Fix langchain_messages.py: enhanced conftest to detect pytest.skip() in subprocess
- Fix test_vision_openai.py: added @pytest.mark.qualitative decorator
- Configure pytest to run test/ before docs/ for fail-fast behavior

All tests now pass/skip/deselect correctly with the default configuration.
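Roughly, the conftest-side skip detection could look like the sketch below; the real hook and string matching in `docs/examples/conftest.py` may well differ:

```python
# Sketch only: run an example in a subprocess and translate a self-skip
# (pytest.skip() raised inside the example) into a pytest skip, not a failure.
import subprocess
import sys

import pytest


def run_example(path) -> None:
    result = subprocess.run(
        [sys.executable, str(path)], capture_output=True, text=True
    )
    if result.returncode != 0:
        # pytest.skip() inside the example surfaces as a Skipped traceback on stderr.
        if "Skipped" in result.stderr:
            pytest.skip(f"{path} skipped itself; see the example for requirements")
        raise AssertionError(f"{path} failed:\n{result.stderr}")
```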
Here's an example output run after these fixes, just using the defaults. This was run on a 32GB MacBook M1 Max and takes around 4 minutes.
Here are the results from a run on my 32GB RAM / 4GB VRAM ThinkPad.
I don't think my issues are blocking, so I'm inclined to approve/merge, but will wait a bit in case there are other opinions.
-> A parsing error in the check. I'll fix. The URL is set in my environment.

We have a number of models used in our tests. I looked at this one for validation as I didn't have it, but actually we'd need to do something on every test to check the model exists. A docs/upfront check can easily get out of date unless we can do it programmatically. -> Suggest discussion, then an additional issue/PR if action is needed.

Multiple failures like this are down to insufficient resources. 4GB VRAM is low for AI dev (though we'd always want to drive down minimum requirements for usage). The error at least is clear. A possible mitigation would be more fine-grained markers to document memory usage, perhaps resulting in skipping all LLM tests? Another alternative might be to allow CPU only (not GPU). -> Similar: let's discuss and plan.
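If the fine-grained memory markers idea goes anywhere, one hedged way to express them is sketched below; the psutil dependency and the 48GB threshold are assumptions, not anything in this PR:

```python
# Sketch: skip RAM-hungry tests on machines below a chosen threshold.
import pytest

try:
    import psutil  # assumed available; not currently a project dependency

    TOTAL_GB = psutil.virtual_memory().total / 2**30
except ImportError:
    TOTAL_GB = 0.0

requires_heavy_ram = pytest.mark.skipif(
    TOTAL_GB < 48, reason="model needs roughly 14-16GB of RAM to load"
)


@requires_heavy_ram
def test_vision_model_example():
    ...
```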
- Remove f-string formatting that converts None to 'None' string
- Allows proper None handling by ibm_watsonx_ai library
- Fixes WMLClientError when WATSONX_URL env var not set
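The gist of the fix, as a before/after sketch (variable handling here is simplified from whatever the example actually does):

```python
import os

# Before: the f-string turns an unset env var into the literal string "None",
# which ibm_watsonx_ai then treats as a URL and fails with WMLClientError.
url_before = f"{os.environ.get('WATSONX_URL')}"  # "None" when the var is unset

# After: pass the raw value through so the library sees a real None and can
# report missing credentials (or the test can be skipped upstream).
url_after = os.environ.get("WATSONX_URL")  # None when the var is unset
```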
(force-pushed from c43a53b to 1540a3c)
The issue with watsonx should be fixed now.
- Add @pytest.mark.watsonx and @pytest.mark.requires_api_key markers
- Ensures test is skipped at collection time when WATSONX_* env vars missing
- Prevents misleading error messages about 'None' URLs
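A sketch of how the collection-time skip can be expressed; the exact env var names and marker registration in the repo may differ:

```python
import os

import pytest

# Skip before running anything if watsonx credentials are not configured.
missing_watsonx_env = not (
    os.environ.get("WATSONX_URL") and os.environ.get("WATSONX_APIKEY")
)


@pytest.mark.watsonx
@pytest.mark.requires_api_key
@pytest.mark.skipif(missing_watsonx_env, reason="WATSONX_* environment variables not set")
def test_watsonx_example():
    ...
```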
- Add @pytest.mark.requires_heavy_ram to vision_openai_examples.py
- Model requires ~14GB RAM when loaded (too much for 32GB systems)
- Updated docstring to document RAM requirement
I did not have the qwen model downloaded. Pulled it and am trying again, just to see what happens.
Ollama probably intelligently split the layers across GPU and CPU. I have marked the test as large-RAM, as when I ran it the system was struggling somewhat since the model was ~16GB of RAM by itself.
qwen2.5vl:7b? For me it's only 6GB (we're starting to approach building-cross-platform-container-images levels of fun here 😄).
```
@@ -1,3 +1,9 @@
try:
```
This came up in a discussion earlier today, but depending on the purpose of these examples we may not want to add a pytest import: for example, if the goal is just to have a minimal snippet that can be referenced in the docs, then we may not want it. But it might be something that warrants more discussion.
Echoing this, I don't think adding a block like this to every example is the best path. Is there a way to "mark" dirs in the config instead? Or have a config for the docs dir marking specific files?
We could ditch markers and do it programmatically. This might also allow the import of pytest to be removed, but it means another place to configure tests (i.e. code the markers when we add an example rather than being explicit).

However, I don't think we could fix the import issues, like the Hugging Face initialisation loading the model (or maybe we could, by refactoring the code), which is what necessitates the main condition.

But that's a common pattern anyway, so maybe it's OK.

Happy to look tomorrow, either based on merging this one and building on top, or not.
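For the record, a minimal sketch of the programmatic alternative being discussed: marking by directory in a conftest instead of editing every file. The path fragments and marker names here are made up for illustration:

```python
# conftest.py (sketch): attach markers based on where a test file lives,
# instead of adding pytest imports/markers to each example.
import pytest

PATH_MARKERS = {
    "docs/examples": "qualitative",  # illustrative mapping only
}


def pytest_collection_modifyitems(config, items):
    for item in items:
        path = str(item.fspath)
        for fragment, marker_name in PATH_MARKERS.items():
            if fragment in path:
                item.add_marker(getattr(pytest.mark, marker_name))
```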
```
uv run pytest

# Full test suite (includes qualitative tests)
uv run pytest -m ""
```
I'm not sure I agree with this. I think the default should always be to run everything, with a flag to limit, though I believe we should find a way for the flag to be simpler than `-m "not qualitative"`; also, `-m ""` is very unintuitive.
Separately, if #369 is merged first, then these doc changes "should" all be consolidated into that new doc.
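One possible shape for a friendlier switch than `-m ""` is sketched below; the `--qualitative` option name is invented here, not something in the PR:

```python
# conftest.py (sketch): qualitative tests are skipped unless --qualitative is passed.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--qualitative",
        action="store_true",
        default=False,
        help="also run tests marked as qualitative",
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--qualitative"):
        return
    skip_qualitative = pytest.mark.skip(reason="needs --qualitative to run")
    for item in items:
        if "qualitative" in item.keywords:
            item.add_marker(skip_qualitative)
```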
Type of PR

Misc

Description
This PR optimizes the test infrastructure to improve test execution speed, reliability, and developer experience. The changes address test collection hangs, improve skip handling, and establish proper test categorization.
Key Changes:
Example Test Discovery Optimization (e1701c1)
- Added pytest markers (`@pytest.mark.ollama`, `@pytest.mark.qualitative`, etc.) to 66 example files
- Fixed critical skip logic bug in `docs/examples/conftest.py` (import failure handling)
- Configured qualitative tests to be excluded by default (`pytest -m ""` for full suite)
- Updated docs: `AGENTS.md`, `README.md`, `docs/tutorial.md`, `test/MARKERS_GUIDE.md`

Standalone Example Execution (1cd9c7b)

- Modified 58 example files to conditionally import pytest, so examples run standalone without a pytest dependency
Qualitative Test Marking (c5e36ef)
- Marked `docs/examples/rag/mellea_pdf.py` as qualitative due to external PDF dependency

Test Failure Fixes (f03581e)
- `vision_openai_examples.py`: simplified skip logic, added requirements docstring
- `docs/examples/conftest.py`: detect pytest.skip() exceptions in subprocess stderr
- `test_vision_openai.py::test_image_block_in_chat`: added `@pytest.mark.qualitative` decorator
- Set `testpaths = ["test", "docs"]` in `pyproject.toml` for fail-fast behavior

Impact:
- Full test suite (including qualitative tests) still available via `pytest -m ""`

Testing