A/B tests with package sync + repeats #355
Conversation
A/B tests evidence: https://github.com/coiled/coiled-runtime/actions/runs/3091876943

This is ready for review
ian-r-rose left a comment
Most of these changes look reasonable to me, though I haven't gone through in detail. My main question is whether we should just further simplify the tests workflow matrix rather than pushing these lumpy include blocks around.
```yaml
jobs:
  runtime:
    name: Runtime - ${{ matrix.os }}, Python ${{ matrix.python-version }}, Runtime ${{ matrix.runtime-version }}
  tests:
```
Over in #279 I proposed doing something similar to this, but without a new category matrix item. What would you think about just running the tests as a single job, and letting xdist do the rest?
I'd say it's reasonable, and it would save some time. Some non-trivial engineering is needed, though - mind if I do it in a subsequent PR?
Sure, I don't think anything here makes that refactor harder.
```yaml
os: [ubuntu-latest]
python-version: ["3.9"]
runtime-version: ["upstream", "latest", "0.0.4", "0.1.0"]
category: [runtime, benchmarks, stability]
```
So maybe we just don't do this and all of the extra include logic.
Even if you run all tests together, you'll still need a wordy `include` block.
Instead of this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  category: [runtime, benchmarks, stability]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - category: stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
```
...it will look like this:
```yaml
matrix:
  os: [ubuntu-latest]
  python-version: ["3.9"]
  pytest_args: [tests]
  runtime-version: [upstream, latest, "0.0.4", "0.1.0"]
  include:
    # Run stability tests on Python 3.8
    - pytest_args: tests/stability
      python-version: "3.8"
      runtime-version: upstream
      os: ubuntu-latest
```

In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
Yeah, I was proposing just running the whole test suite on every matrix value. Perhaps that's overkill, though.
> In the next PR, I want to merge ab_tests.yaml into tests.yaml. In that PR I'll dynamically generate the whole matrix with `discover_ab_environments.py` (to be renamed); the matrix for non-A/B tests will be generated from parameters in `ci/config.yaml` (now `AB_environments/config.yaml`).
I'm a little concerned about the complexity of generating bespoke test matrices, and what it will mean for local testing. I was hoping to make the test matrix way simpler.
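As a rough illustration of the dynamic matrix generation being discussed, here is a minimal sketch. The config format and all names here (`build_matrix`, the `runtime-versions` and `include` keys) are hypothetical stand-ins; the real `discover_ab_environments.py` may look quite different.

```python
import json


def build_matrix(config):
    """Build a GitHub Actions strategy matrix (as a flat `include` list)
    from a hypothetical config dict standing in for ci/config.yaml."""
    entries = []
    # One entry per runtime version, with the common defaults
    for runtime in config["runtime-versions"]:
        entries.append({
            "os": "ubuntu-latest",
            "python-version": "3.9",
            "pytest_args": "tests",
            "runtime-version": runtime,
        })
    # Extra include-style entries, e.g. stability tests on Python 3.8
    entries.extend(config.get("include", []))
    return {"include": entries}


config = {
    "runtime-versions": ["upstream", "latest", "0.0.4", "0.1.0"],
    "include": [{
        "os": "ubuntu-latest",
        "python-version": "3.8",
        "pytest_args": "tests/stability",
        "runtime-version": "upstream",
    }],
}
# A workflow step would write this JSON to $GITHUB_OUTPUT and feed it
# to strategy.matrix via fromJson() in a downstream job.
print(json.dumps(build_matrix(config)))
```

The upside is that one script owns the matrix; the downside, as noted above, is that the matrix is no longer readable directly from the workflow file.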
ian-r-rose left a comment
I haven't gone through in great detail, but this looks good to me from a high level
- `repeat: N` setting, which causes every A/B test runtime to be rerun N times
- `test_null_hypothesis: true` setting, which creates a verbatim clone of AB_baseline

Out of scope:
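A sketch of how the two settings above might look in the A/B environments config; the file layout and exact key placement are assumptions based on this description, not the actual file.

```yaml
# Hypothetical AB_environments/config.yaml fragment
repeat: 3                   # rerun every A/B test runtime 3 times
test_null_hypothesis: true  # also run a verbatim clone of AB_baseline
```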