Ci speedup #35014

Draft
def- wants to merge 11 commits into MaterializeInc:main from def-:ci-speedup

Conversation

@def- (Contributor) commented Feb 13, 2026

No description provided.

@github-actions

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

def- and others added 9 commits February 13, 2026 22:46
docker_images() shells out to `docker images` on every call. It is
called once per DependencySet construction (~34 times during a full
mkpipeline run). Adding @cache avoids redundant subprocess calls.

Measured on dev machine:
  34 uncached calls: 0.987s (29.0ms each)
  1 uncached + 33 cached: 0.000s
  Savings: ~0.96s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
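The effect of `@cache` described above can be illustrated with a minimal sketch. The function below is a stand-in that counts calls instead of shelling out to `docker images`; the name `list_images` and the counter are hypothetical and not taken from the actual mzbuild code.

```python
from functools import cache

CALLS = 0  # counts how often the expensive body actually runs


@cache
def list_images() -> frozenset[str]:
    """Hypothetical stand-in for a function that shells out to `docker images`."""
    global CALLS
    CALLS += 1
    # The real function would run a subprocess here; this sketch returns a
    # fixed, hashable value so the example stays self-contained.
    return frozenset({"example/image:latest"})


# 34 calls, mirroring one call per DependencySet construction.
results = [list_images() for _ in range(34)]
```

One caveat with this approach: a cached result does not reflect images that are pulled or built later in the same process, unless the cache is reset with `list_images.cache_clear()`.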
trim_tests_pipeline loads each composition with munge_services=True to
discover image dependencies. This triggers expensive fingerprinting and
dependency resolution for every composition. Since we only need to know
which mzbuild images a composition references (not their fingerprints),
use munge_services=False and extract image names directly from the
service configs.

Measured on dev machine (36 compositions):
  munge_services=True:  5.82s
  munge_services=False: 2.95s
  Savings: 2.87s (2x speedup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
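Extracting image names directly from the service configs might look roughly like the sketch below. It assumes each unmunged service config is a dict that names its mzbuild image under an `"mzbuild"` key; the helper name `referenced_mzbuild_images` and the sample services are hypothetical.

```python
from typing import Any


def referenced_mzbuild_images(services: dict[str, dict[str, Any]]) -> set[str]:
    """Collect the mzbuild image names referenced by unmunged service configs."""
    images: set[str] = set()
    for config in services.values():
        name = config.get("mzbuild")
        if name is not None:
            images.add(name)
    return images


# Sample configs (assumed shape): two mzbuild services and one plain image.
services = {
    "materialized": {"mzbuild": "materialized"},
    "testdrive": {"mzbuild": "testdrive"},
    "postgres": {"image": "postgres:17"},  # not an mzbuild image, so skipped
}
images = referenced_mzbuild_images(services)
```

Only names are collected here; no fingerprints are computed, which is the work the commit avoids.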
list-workflows only needs to enumerate workflow function names from the
mzcompose.py module. It does not need resolved image specs or
fingerprints. Pass munge_services=False to skip the expensive
dependency resolution.

This is called once per CI step from the mzcompose plugin hook.

Measured on dev machine (cluster composition):
  munge_services=True:  2.454s
  munge_services=False: 0.075s
  Savings: 2.379s per invocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
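Enumerating workflow names needs nothing beyond the loaded module itself, as the sketch below suggests. It assumes workflows are module-level functions whose names start with `workflow_`; the helper `list_workflows` and the sample module are hypothetical.

```python
import inspect
import types


def list_workflows(module: types.ModuleType) -> list[str]:
    """Return workflow names, assuming a `workflow_` function-name prefix."""
    prefix = "workflow_"
    names = []
    for name, obj in vars(module).items():
        if name.startswith(prefix) and inspect.isfunction(obj):
            names.append(name[len(prefix):])
    return sorted(names)


# Build a throwaway module in memory instead of importing a real mzcompose.py.
module = types.ModuleType("mzcompose_example")
exec(
    "def workflow_default(c): pass\n"
    "def workflow_smoke_test(c): pass\n"
    "def helper(): pass\n",
    module.__dict__,
)
workflows = list_workflows(module)
```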
fetch_hashes resolves dependencies for both architectures sequentially,
each involving expensive file fingerprinting. Since the two arch builds
are completely independent, resolve them in parallel using
ThreadPoolExecutor.

Measured on dev machine:
  Sequential: 8.06s
  Parallel:   1.90s
  Savings:    6.16s (4.2x speedup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
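The parallel resolution could be sketched as follows. `resolve_for_arch` is a hypothetical stand-in for the per-architecture dependency resolution, and the architecture names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def resolve_for_arch(arch: str) -> dict[str, str]:
    """Hypothetical stand-in for per-architecture dependency resolution."""
    return {"materialized": f"fingerprint-{arch}"}


arches = ["x86_64", "aarch64"]
with ThreadPoolExecutor(max_workers=len(arches)) as executor:
    # executor.map preserves input order, so results line up with `arches`.
    hashes = dict(zip(arches, executor.map(resolve_for_arch, arches)))
```

Threads tend to help when the work is dominated by subprocess calls and file I/O, which release the GIL. Whether that fully explains the measured speedup for `fetch_hashes` is not verified here.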
The `mzcompose description` command only needs the module docstring,
which is available without expensive dependency resolution and
fingerprinting. Use munge_services=False to skip that work.

This is called once per CI step via the mzcompose buildkite plugin
command hook (line 100: `TEST_DESC="$(mzcompose description)"`).

Measured savings: ~2.3s per `mzcompose description` call
  munge_services=True:  2.55s (full load_composition)
  munge_services=False: 0.23s (full load_composition)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
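Reading a description from a module docstring requires only the loaded module, as this sketch illustrates. The helper `describe_module` and the sample module are hypothetical; the actual command goes through `load_composition`.

```python
import inspect
import types


def describe_module(module: types.ModuleType) -> str:
    """Return the module docstring with indentation cleaned up, or ""."""
    return inspect.getdoc(module) or ""


# A throwaway module standing in for a composition's mzcompose.py.
module = types.ModuleType(
    "mzcompose_example",
    doc="Exercise the example composition.\n\n    Further details go here.",
)
description = describe_module(module)
first_line = description.splitlines()[0]
```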
The `mzcompose describe` (aka `ls`/`list`) command only displays service
names, workflow names/docstrings, and the composition description. All of
this data is available without expensive dependency resolution and
fingerprinting.

This is primarily a local development speedup since describe is not
called in CI, but it makes `mzcompose ls` much more responsive.

Measured savings: ~2.5s per `mzcompose describe` call
  munge_services=True:  2.78s (full load_composition)
  munge_services=False: 0.25s (full load_composition)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When resolving image dependencies, each Rust crate's input files were
discovered via individual `git diff` + `git ls-files` subprocess calls.
With ~118 crates across the workspace, this meant ~236 subprocess calls
just for crate file enumeration.

Add Workspace.precompute_crate_inputs() which does a single pair of
git calls to discover all crate files at once, then partitions the
results by crate path in Python. This is called automatically at the
start of resolve_dependencies().

Measured savings for resolve_dependencies(all 41 images):
  Before: 4.80s
  After:  2.84s
  Savings: 1.96s (41%)

Measured savings for single composition (pg-cdc, munge_services=True):
  Before: 2.57s
  After:  0.78s
  Savings: 1.80s (70%)

This benefits every `mzcompose up` and `mzcompose run` call in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
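The partitioning step can be sketched in plain Python. The file list below stands in for the output of the single batched pair of git calls, and the helper name `partition_by_crate` is hypothetical. This sketch matches on whole path components, so `src/sql` does not accidentally claim files under `src/sql-parser`; whether the actual implementation does the same is an assumption.

```python
from pathlib import PurePosixPath


def partition_by_crate(
    files: list[str], crate_paths: list[str]
) -> dict[str, set[str]]:
    """Assign each file to every crate whose path is a component-wise prefix."""
    inputs: dict[str, set[str]] = {crate: set() for crate in crate_paths}
    for file in files:
        file_parts = PurePosixPath(file).parts
        for crate in crate_paths:
            crate_parts = PurePosixPath(crate).parts
            if file_parts[: len(crate_parts)] == crate_parts:
                inputs[crate].add(file)
    return inputs


# Stand-in for the combined output of one batched git invocation.
files = [
    "src/sql/Cargo.toml",
    "src/sql/src/lib.rs",
    "src/sql-parser/src/lib.rs",
    "README.md",
]
inputs = partition_by_crate(files, ["src/sql", "src/sql-parser"])
```

With ~118 crates this nested loop is far cheaper than spawning a subprocess per crate, though a sorted scan or prefix tree would scale better if the crate count grew substantially.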
When computing image fingerprints, each image's context files were
discovered via individual `git diff` + `git ls-files` subprocess calls.
With 41 images, this meant 82 subprocess calls just for image context
enumeration.

Add Repository._precompute_image_context_files() which does a single
pair of git calls to discover all image context files at once, then
partitions results by image path. This is called automatically at the
start of resolve_dependencies().

Combined with the crate input batching from the previous commit:

Measured savings for resolve_dependencies(all 41 images):
  Before (no batching):       4.17s
  After (both batched):       1.85s
  Savings: 2.32s (56%)

Measured savings for single composition (pg-cdc, munge_services=True):
  Before: 2.43s
  After:  0.64s
  Savings: 1.79s (74%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two changes eliminate the remaining ~82 git subprocess calls from the
fingerprinting path:

1. CargoPreImage.inputs() now resolves its hardcoded inputs eagerly:
   - 'ci/builder' directory is expanded to individual files via a single
     cached expand_globs call
   - '.cargo/config' is included only if it exists
   - Result is cached with @cache since it's the same for all images

2. ResolvedImage.fingerprint() skips the expand_globs verification
   pass when precomputed data is available, since all inputs are
   already individual file paths from git.

This eliminates all git subprocess calls from resolve_dependencies,
reducing the total from ~384 calls (baseline) to just 5 (2 for crate
batch + 2 for image batch + 1 for ci/builder).

Measured savings for resolve_dependencies(all 41 images):
  Before (no batching):  4.15s
  After (all batching):  0.40s
  Savings: 3.75s (90%)

Measured savings for single composition (pg-cdc, munge_services=True):
  Before: 2.23s
  After:  0.26s
  Savings: 1.97s (88%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
def- and others added 2 commits February 13, 2026 23:03
The crate and image context precomputation used str(path) directly for
both git pathspecs and the file partitioning step. This works when the
repository root is a relative path (Path(".")), but fails when it's an
absolute path (as it is when MZ_ROOT is set in CI via mzcompose).

The issue: git --relative outputs paths relative to cwd, but the
partition logic compared these relative paths against potentially
absolute image/crate paths, causing the startswith() check to fail.
This left _context_files_cache empty, triggering the "files are
unknown to git" assertion.

Fix: use path.relative_to(root) to normalize all paths before
constructing git specs and partitioning results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
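The failure mode and the fix can be reproduced with a small sketch. The paths and the helper `git_relative` are hypothetical; `PurePosixPath` is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def git_relative(path: PurePosixPath, root: PurePosixPath) -> str:
    """Normalize a path to the root-relative form that git prints."""
    return path.relative_to(root).as_posix()


root = PurePosixPath("/checkout/materialize")  # absolute, as when MZ_ROOT is set
crate = root / "src" / "sql"
listed = "src/sql/src/lib.rs"  # a path as git would report it, relative to root

# Comparing against the absolute path never matches, so the cache stays empty.
broken = listed.startswith(str(crate))

# Normalizing first makes the prefix check succeed.
fixed = listed.startswith(git_relative(crate, root) + "/")
```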
Pyright's type checker requires attributes to be declared on the class.
Declare _inputs_cache on Crate and _context_files_cache on Image as
Optional[set[str]] fields, initialized to None in __init__, and replace
hasattr() checks with `is not None` comparisons.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
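The pattern looks roughly like the sketch below. The `Crate` class here is a simplified stand-in rather than the real one, and the fallback in `inputs()` is hypothetical.

```python
from typing import Optional


class Crate:
    def __init__(self, name: str) -> None:
        self.name = name
        # Declared up front so the type checker knows the attribute exists;
        # None means "nothing precomputed yet".
        self._inputs_cache: Optional[set[str]] = None

    def inputs(self) -> set[str]:
        # `is not None` replaces the earlier hasattr() check.
        if self._inputs_cache is not None:
            return self._inputs_cache
        # Hypothetical fallback standing in for per-crate file discovery.
        return {f"{self.name}/Cargo.toml"}


crate = Crate("src/sql")
before = crate.inputs()
crate._inputs_cache = {"src/sql/Cargo.toml", "src/sql/src/lib.rs"}
after = crate.inputs()
```

Checking `is not None` rather than truthiness also keeps an empty precomputed set distinct from "not precomputed".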