A/B tests with package sync + repeats #355
Conversation
A/B tests evidence: https://github.com/coiled/coiled-runtime/actions/runs/3091876943

This is ready for review
ian-r-rose left a comment
Most of these changes look reasonable to me, though I haven't gone through in detail. My main question is whether we should just further simplify the tests workflow matrix rather than pushing these lumpy include blocks around.
```yaml
jobs:
  runtime:
    name: Runtime - ${{ matrix.os }}, Python ${{ matrix.python-version }}, Runtime ${{ matrix.runtime-version }}
  tests:
```
Over in #279 I proposed doing something similar to this, but without a new category matrix item. What would you think about just running the tests as a single job, and letting xdist do the rest?
I'd say it's reasonable, and it would save some time. Some non-trivial engineering is needed, though - mind if I do it in a subsequent PR?
Sure, I don't think anything here makes that refactor harder.
```yaml
os: [ubuntu-latest]
python-version: ["3.9"]
runtime-version: ["upstream", "latest", "0.0.4", "0.1.0"]
category: [runtime, benchmarks, stability]
```
So maybe we just don't do this and all of the extra include logic.
Even if you run all tests together, you'll still need a wordy `include` block.
Instead of this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  category: [runtime, benchmarks, stability]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - category: stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
```
...it will look like this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  pytest_args: [tests]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - pytest_args: tests/stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
```

In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
Yeah, I was proposing just running the whole test suite on every matrix value. Perhaps that's overkill, though.
> In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
I'm a little concerned about the complexity of generating bespoke test matrices, and what it will mean for local testing. I was hoping to make the test matrix way simpler.
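As a rough illustration of the dynamic matrix generation being discussed, here is a minimal sketch. The config format and all names here (`build_matrix`, the `runtime-versions` and `include` keys) are hypothetical stand-ins; the real `discover_ab_environments.py` may look quite different.

```python
import json


def build_matrix(config):
    """Build a GitHub Actions strategy matrix (as a flat `include` list)
    from a hypothetical config dict standing in for ci/config.yaml."""
    entries = []
    # One entry per runtime version, with the common defaults
    for runtime in config["runtime-versions"]:
        entries.append({
            "os": "ubuntu-latest",
            "python-version": "3.9",
            "pytest_args": "tests",
            "runtime-version": runtime,
        })
    # Extra include-style entries, e.g. stability tests on Python 3.8
    entries.extend(config.get("include", []))
    return {"include": entries}


config = {
    "runtime-versions": ["upstream", "latest", "0.0.4", "0.1.0"],
    "include": [{
        "os": "ubuntu-latest",
        "python-version": "3.8",
        "pytest_args": "tests/stability",
        "runtime-version": "upstream",
    }],
}
# A workflow step would write this JSON to $GITHUB_OUTPUT and feed it
# to strategy.matrix via fromJson() in a downstream job.
print(json.dumps(build_matrix(config)))
```

The upside is that one script owns the matrix; the downside, as noted above, is that the matrix is no longer readable directly from the workflow file.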
ian-r-rose left a comment
I haven't gone through in great detail, but this looks good to me from a high level
- `repeat: N` setting, which causes every A/B test runtime to be rerun N times
- `test_null_hypothesis: true` setting, which creates a verbatim clone of AB_baseline

Out of scope:
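A sketch of how the two settings above might look in the A/B environments config; the file layout and exact key placement are assumptions based on this description, not the actual file.

```yaml
# Hypothetical AB_environments/config.yaml fragment
repeat: 3                   # rerun every A/B test runtime 3 times
test_null_hypothesis: true  # also run a verbatim clone of AB_baseline
```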