Speed up CI and test adjoint in parallel #438

Open
cpjordan wants to merge 9 commits into main from speed-up-CI-main

@cpjordan cpjordan commented Dec 28, 2025

Depends on #437. Closes #426.

Testing adjoint in (MPI) parallel:
For the MPI parallel adjoint tests, I have used the channel-optimisation example and copied the mesh across, because I couldn’t figure out how to add a subdomain to a RectangleMesh. We could move this into a channel-optimisation directory if that’s cleaner, or alternatively switch to a simplified headland inversion example with regions defined as Constant controls. Either way, this requires a duplicate test, because the current example testing setup gives us no way to select whether a test runs in serial or in parallel. If we updated the setup so that each adjoint example specifies whether it runs in serial or in parallel, that would avoid the duplication.
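
One way the per-example serial/parallel selection described above might look, as a minimal sketch: each example declares its process count next to its path, and the runner derives the launch command from that. The `AdjointExample` record, the `launch_command` helper, and the example paths are all hypothetical names for illustration, not the existing Thetis test setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjointExample:
    """Hypothetical record: one adjoint example and how to run it."""
    script: str
    nprocs: int = 1  # 1 means serial; >1 means launch under MPI


def launch_command(example: AdjointExample) -> list[str]:
    """Build the argv for one example from its declared process count."""
    cmd = ["python", example.script]
    if example.nprocs > 1:
        cmd = ["mpiexec", "-n", str(example.nprocs)] + cmd
    return cmd


# Placeholder paths, not the real repository layout.
EXAMPLES = [
    AdjointExample("examples/headland_inversion/inverse_problem.py"),
    AdjointExample("examples/channel_optimisation/channel.py", nprocs=2),
]

for example in EXAMPLES:
    print(" ".join(launch_command(example)))
```

With a single list like this, the same example would not need to be registered twice to cover both modes.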

I've noticed that every now and again the Thetis MPI parallel tests hang and can take hours to complete, so it might be worth also adding @pytest.mark.timeout(300) to all our parallel tests. I don't think this is caused by Thetis itself; I suspect it is due to MPI collectives or repeated pytest-xdist collection hanging.
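
A minimal sketch of applying that timeout to every parallel test from one place, rather than decorating each test by hand. It assumes the pytest-timeout plugin is installed and that parallel tests carry a marker named `parallel`; both the marker name and the 300 second value are assumptions to adjust for the actual suite.

```python
# conftest.py (sketch)
import pytest

PARALLEL_TIMEOUT_S = 300  # value suggested in the comment above


def pytest_collection_modifyitems(config, items):
    """Attach a timeout marker to every collected test marked `parallel`."""
    for item in items:
        # Assumption: parallel tests are tagged with a `parallel` marker.
        if item.get_closest_marker("parallel") is not None:
            item.add_marker(pytest.mark.timeout(PARALLEL_TIMEOUT_S))
```

Tests without the `parallel` marker are left untouched by this hook.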

Speeding up CI test suite:
It’s not possible to split matrix jobs on a single runner (as far as I can tell). I’ve implemented a matrix strategy that splits the jobs between runners, but we could revert to running everything on a single runner if preferred. Another option would be to merge the main and adjoint serial tests, although I prefer keeping them separate for clarity and cleaner outputs. If we stick with this approach, we need to update the list of tests that are required to pass before merging.

@cpjordan cpjordan marked this pull request as ready for review December 29, 2025 14:03
@connorjward connorjward left a comment

> also adding @pytest.mark.timeout(300) for all our parallel tests

I recommend setting --timeout 300 --timeout-method thread for this (example).

. venv-thetis/bin/activate
python -m pytest -n 8 --verbose --durations=0 thetis-repo/test_adjoint
if [ "${{ matrix.target }}" = "thetis" ]; then
mpiexec -n 2 python -m pytest --verbose --durations=0 --durations-min=60.0 \

You are losing a potential speedup of 4x here because there are 8 available cores on each runner. In Firedrake we use firedrake-run-split-tests to ensure utilisation (link).
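
The 4x figure follows from the numbers in this thread: with 8 cores per runner and 2 MPI ranks per test process, four such processes could run side by side. The sketch below only illustrates that arithmetic and a round-robin split of test files into groups; it is not how firedrake-run-split-tests is implemented, and the file names are placeholders.

```python
def concurrent_jobs(cores: int, ranks_per_job: int) -> int:
    """Number of MPI jobs that fit on a runner without oversubscribing."""
    return cores // ranks_per_job


def split_round_robin(tests: list[str], n_groups: int) -> list[list[str]]:
    """Deal test files into n_groups groups, one group per concurrent job."""
    return [tests[i::n_groups] for i in range(n_groups)]


n_jobs = concurrent_jobs(cores=8, ranks_per_job=2)
print(n_jobs)  # 4

# Placeholder file names for illustration only.
tests = [f"test_{i}.py" for i in range(6)]
for group in split_round_robin(tests, n_jobs):
    print(group)
```

Each group could then be handed to its own `mpiexec -n 2` invocation, so that all 8 cores are busy instead of 2.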

@cpjordan

Thanks for the comments @connorjward - I'll take a look in detail again soon and probably have some follow up questions!

Development

Successfully merging this pull request may close these issues.

Testing of inversion/optimisation/adjoint capability in parallel
