test: optimize example test discovery and execution speed #372
base: main
Conversation
- Add pytest markers to example files for proper categorization
- Wrap heavy examples with complex initialization to prevent collection hangs
- Fix critical skip logic bug in example conftest (import failure)
- Configure qualitative tests to be excluded by default
- Update docs

Tests now run significantly faster and reliably on constrained systems. Run the full suite with `pytest -m ""`.
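As context for the "wrap heavy examples" point above, a minimal sketch of the pattern; the task and model name below are purely illustrative and not taken from this repo:

```python
# Heavy initialization lives behind a main guard so that importing the module
# (e.g. during pytest collection) stays cheap; the model only loads when the
# example is actually executed.
def main() -> None:
    from transformers import pipeline  # heavy import deferred to runtime

    # Any small model works here; this name is only an example.
    generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")
    print(generator("Hello, world!"))


if __name__ == "__main__":
    main()
```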
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit: this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
The PR description has been updated. Please fill out the template for your PR to be reviewed.
- Modified 58 example files to conditionally import pytest
- Allows examples to run standalone without pytest dependency
- Maintains pytest marker functionality when run via pytest
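For illustration, a sketch of what that conditional import can look like; the marker names here are examples, not necessarily what each file actually uses:

```python
# Examples should still run as plain scripts, so pytest is optional.
try:
    import pytest

    # Markers only matter when the file is collected by pytest.
    pytestmark = [pytest.mark.qualitative]
except ImportError:
    pytest = None  # running standalone without pytest installed
```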
There are a few things to clarify from this PR.
- USDA website returns 403 Forbidden (likely rate limiting/user-agent filtering)
- External network dependency makes test flaky
- Marked as qualitative to exclude from default test runs
- Fix vision_openai_examples.py: simplified skip logic, added requirements docstring
- Fix langchain_messages.py: enhanced conftest to detect pytest.skip() in subprocess
- Fix test_vision_openai.py: added @pytest.mark.qualitative decorator
- Configure pytest to run test/ before docs/ for fail-fast behavior

All tests now pass/skip/deselect correctly with the default configuration.
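Roughly, the conftest-side skip detection could look like the sketch below; the real hook and string matching in `docs/examples/conftest.py` may well differ:

```python
# Sketch only: run an example in a subprocess and translate a self-skip
# (pytest.skip() raised inside the example) into a pytest skip, not a failure.
import subprocess
import sys

import pytest


def run_example(path) -> None:
    result = subprocess.run(
        [sys.executable, str(path)], capture_output=True, text=True
    )
    if result.returncode != 0:
        # pytest.skip() inside the example surfaces as a Skipped traceback on stderr.
        if "Skipped" in result.stderr:
            pytest.skip(f"{path} skipped itself; see the example for requirements")
        raise AssertionError(f"{path} failed:\n{result.stderr}")
```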
Here's an example output run after these fixes, just using the defaults. This was run on a 32GB MacBook M1 Max and takes around 4 minutes.
Here are the results from a run on my 32GB RAM / 4GB VRAM ThinkPad.
I don't think my issues are blocking, so I'm inclined to approve/merge, but will wait a bit in case there are other opinions.
-> A parsing error in the check. I'll fix. The URL is set in my environment.

We have a number of models used in our tests. I looked at this one for validation as I didn't have it, but actually we'd need to do something on every test to check the model exists. A docs/upfront check can easily get out of date unless we can do it programmatically. -> Suggest discussion, then an additional issue/PR if action is needed.

Multiple failures like this are down to insufficient resources. 4GB VRAM is low for AI dev (though we'd always want to drive down minimum requirements for usage). The error at least is clear. A possible mitigation would be more fine-grained markers to document memory usage, perhaps resulting in skipping all LLM tests? Another alternative might be to allow CPU only (not GPU). -> Similar: let's discuss and plan.
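If the fine-grained memory markers idea goes anywhere, one hedged way to express them is sketched below; the psutil dependency and the 48GB threshold are assumptions, not anything in this PR:

```python
# Sketch: skip RAM-hungry tests on machines below a chosen threshold.
import pytest

try:
    import psutil  # assumed available; not currently a project dependency

    TOTAL_GB = psutil.virtual_memory().total / 2**30
except ImportError:
    TOTAL_GB = 0.0

requires_heavy_ram = pytest.mark.skipif(
    TOTAL_GB < 48, reason="model needs roughly 14-16GB of RAM to load"
)


@requires_heavy_ram
def test_vision_model_example():
    ...
```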
- Remove f-string formatting that converts None to 'None' string
- Allows proper None handling by ibm_watsonx_ai library
- Fixes WMLClientError when WATSONX_URL env var not set
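The gist of the fix, as a before/after sketch (variable handling here is simplified from whatever the example actually does):

```python
import os

# Before: the f-string turns an unset env var into the literal string "None",
# which ibm_watsonx_ai then treats as a URL and fails with WMLClientError.
url_before = f"{os.environ.get('WATSONX_URL')}"  # "None" when the var is unset

# After: pass the raw value through so the library sees a real None and can
# report missing credentials (or the test can be skipped upstream).
url_after = os.environ.get("WATSONX_URL")  # None when the var is unset
```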
(force-pushed from c43a53b to 1540a3c)
The issue with watsonx should be fixed now.
- Add @pytest.mark.watsonx and @pytest.mark.requires_api_key markers
- Ensures test is skipped at collection time when WATSONX_* env vars missing
- Prevents misleading error messages about 'None' URLs
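A sketch of how the collection-time skip can be expressed; the exact env var names and marker registration in the repo may differ:

```python
import os

import pytest

# Skip before running anything if watsonx credentials are not configured.
missing_watsonx_env = not (
    os.environ.get("WATSONX_URL") and os.environ.get("WATSONX_APIKEY")
)


@pytest.mark.watsonx
@pytest.mark.requires_api_key
@pytest.mark.skipif(missing_watsonx_env, reason="WATSONX_* environment variables not set")
def test_watsonx_example():
    ...
```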
- Add @pytest.mark.requires_heavy_ram to vision_openai_examples.py
- Model requires ~14GB RAM when loaded (too much for 32GB systems)
- Updated docstring to document RAM requirement
I did not have the qwen model downloaded. Pulled it and am trying again, just to see what happens.
Ollama probably intelligently split the layers across GPU and CPU. I have marked the test as large-RAM, as when I ran it the system was struggling somewhat since the model was ~16GB of RAM by itself.
qwen2.5vl:7b? For me it's only 6GB (we're starting to approach building-cross-platform-container-images levels of fun here 😄).
```
@@ -1,3 +1,9 @@
try:
```
This came up in a discussion earlier today, but depending on the purpose of these examples we may not want to add a pytest import: for example, if the goal is just to have a minimal snippet that can be referenced in the docs, then we may not want it. But it might be something that warrants more discussion.
Echoing this, I don't think adding a block like this to every example is the best path. Is there a way to "mark" dirs in the config instead? Or have a config for the docs dir marking specific files?
We could ditch markers and do it programmatically. This might also allow the import of pytest to be removed, but it means another place to configure tests (i.e. code the markers when we add an example rather than being explicit).

However, I don't think we could fix the import issues, like the Hugging Face initialisation loading the model (or maybe we could, by refactoring the code), which is what necessitates the main condition.

But that's a common pattern anyway, so maybe it's OK.

Happy to look tomorrow, either based on merging this one and building on top, or not.
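For the record, a minimal sketch of the programmatic alternative being discussed: marking by directory in a conftest instead of editing every file. The path fragments and marker names here are made up for illustration:

```python
# conftest.py (sketch): attach markers based on where a test file lives,
# instead of adding pytest imports/markers to each example.
import pytest

PATH_MARKERS = {
    "docs/examples": "qualitative",  # illustrative mapping only
}


def pytest_collection_modifyitems(config, items):
    for item in items:
        path = str(item.fspath)
        for fragment, marker_name in PATH_MARKERS.items():
            if fragment in path:
                item.add_marker(getattr(pytest.mark, marker_name))
```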
```
uv run pytest

# Full test suite (includes qualitative tests)
uv run pytest -m ""
```
I'm not sure I agree with this. I think the default should always be to run everything, with a flag to limit, though I believe we should find a way for the flag to be simpler than `-m "not qualitative"`; also, `-m ""` is very unintuitive.
Separately, if #369 is merged first, then these doc changes "should" all be consolidated into that new doc.
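One possible shape for a friendlier switch than `-m ""` is sketched below; the `--qualitative` option name is invented here, not something in the PR:

```python
# conftest.py (sketch): qualitative tests are skipped unless --qualitative is passed.
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--qualitative",
        action="store_true",
        default=False,
        help="also run tests marked as qualitative",
    )


def pytest_collection_modifyitems(config, items):
    if config.getoption("--qualitative"):
        return
    skip_qualitative = pytest.mark.skip(reason="needs --qualitative to run")
    for item in items:
        if "qualitative" in item.keywords:
            item.add_marker(skip_qualitative)
```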
Type of PR

Misc

Description
This PR optimizes the test infrastructure to improve test execution speed, reliability, and developer experience. The changes address test collection hangs, improve skip handling, and establish proper test categorization.
Key Changes:
Example Test Discovery Optimization (e1701c1)
- Added pytest markers (`@pytest.mark.ollama`, `@pytest.mark.qualitative`, etc.) to 66 example files
- Fixed critical skip logic bug in `docs/examples/conftest.py` (import failure handling)
- Configured qualitative tests to be excluded by default (`pytest -m ""` for full suite)
- Updated docs: `AGENTS.md`, `README.md`, `docs/tutorial.md`, `test/MARKERS_GUIDE.md`

Standalone Example Execution (1cd9c7b)

- Modified 58 example files to conditionally import pytest, so examples run standalone without a pytest dependency
Qualitative Test Marking (c5e36ef)
- Marked `docs/examples/rag/mellea_pdf.py` as qualitative due to external PDF dependency

Test Failure Fixes (f03581e)
- `vision_openai_examples.py`: simplified skip logic, added requirements docstring
- `docs/examples/conftest.py`: detect pytest.skip() exceptions in subprocess stderr
- `test_vision_openai.py::test_image_block_in_chat`: added `@pytest.mark.qualitative` decorator
- Set `testpaths = ["test", "docs"]` in `pyproject.toml` for fail-fast behavior

Impact:
- Full test suite (including qualitative tests) still available via `pytest -m ""`

Testing