Ci speedup by def- · Pull Request #35014 · MaterializeInc/materialize

def- · 2026-02-13T22:27:05Z

No description provided.

github-actions · 2026-02-13T22:27:14Z

Pre-merge checklist

The PR title is descriptive and will make sense in the git log.
This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

docker_images() shells out to `docker images` on every call. It is called once per DependencySet construction (~34 times during a full mkpipeline run). Adding @cache avoids the redundant subprocess calls.

Measured on dev machine:
- 34 uncached calls: 0.987s (29.0ms each)
- 1 uncached + 33 cached: 0.000s
- Savings: ~0.96s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

trim_tests_pipeline loads each composition with munge_services=True to discover its image dependencies. This triggers expensive fingerprinting and dependency resolution for every composition. Since we only need to know which mzbuild images a composition references (not their fingerprints), use munge_services=False and extract the image names directly from the service configs.

Measured on dev machine (36 compositions):
- munge_services=True: 5.82s
- munge_services=False: 2.95s
- Savings: 2.87s (2x speedup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
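Extracting image names without munging can be sketched like this. It assumes each unmunged service config is a plain dict that names its mzbuild image under an `mzbuild` key, while other services reference ordinary Docker images via `image`; that config shape is an assumption, not something this PR shows.

```python
from typing import Any


def referenced_mzbuild_images(services: dict[str, dict[str, Any]]) -> set[str]:
    """Collect the mzbuild image names referenced by a composition's services.

    No fingerprinting or dependency resolution happens here: only the raw
    (unmunged) service configs are read.
    """
    return {
        config["mzbuild"]
        for config in services.values()
        if "mzbuild" in config  # services using plain Docker images are skipped
    }


services = {
    "materialized": {"mzbuild": "materialized"},
    "testdrive": {"mzbuild": "testdrive", "environment": ["FOO=1"]},
    "postgres": {"image": "postgres:16"},
}
print(sorted(referenced_mzbuild_images(services)))  # ['materialized', 'testdrive']
```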

list-workflows only needs to enumerate workflow function names from the mzcompose.py module. It does not need resolved image specs or fingerprints, so pass munge_services=False to skip the expensive dependency resolution. This is called once per CI step from the mzcompose plugin hook.

Measured on dev machine (cluster composition):
- munge_services=True: 2.454s
- munge_services=False: 0.075s
- Savings: 2.379s per invocation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
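Enumerating workflows needs nothing beyond the imported module. A sketch, assuming workflows are module-level functions whose names start with `workflow_` (the real naming rules may differ). The module is assembled in memory so the example is self-contained.

```python
import inspect
import types


def list_workflows(module: types.ModuleType) -> list[str]:
    """Return workflow names defined in an mzcompose-style module.

    Only function names are inspected; no image specs or fingerprints are needed.
    """
    prefix = "workflow_"
    return sorted(
        name[len(prefix):]
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if name.startswith(prefix)
    )


def workflow_default(c: object) -> None: ...
def workflow_smoke(c: object) -> None: ...
def helper(c: object) -> None: ...


# Stand-in for an imported mzcompose.py module.
module = types.ModuleType("mzcompose", "Stand-in composition.")
for fn in (workflow_default, workflow_smoke, helper):
    setattr(module, fn.__name__, fn)

print(list_workflows(module))  # ['default', 'smoke']
```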

fetch_hashes resolves dependencies for both architectures sequentially, each involving expensive file fingerprinting. Since the two arch builds are completely independent, resolve them in parallel using ThreadPoolExecutor.

Measured on dev machine:
- Sequential: 8.06s
- Parallel: 1.90s
- Savings: 6.16s (4.2x speedup)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
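A sketch of the parallel per-architecture resolution. `resolve_for_arch` is a hypothetical stand-in for the per-arch dependency resolution and fingerprinting, and the arch names are plain strings here. Threads rather than processes are plausible for this workload because it is dominated by subprocess calls and file I/O, during which the GIL is typically released; that is an assumption about the workload, not a measurement.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ARCHS = ["x86_64", "aarch64"]  # illustrative arch names


def resolve_for_arch(arch: str) -> dict[str, str]:
    """Hypothetical stand-in: resolve dependencies and fingerprints for one arch."""
    time.sleep(0.1)  # simulate I/O-bound work (git/subprocess calls, file reads)
    return {"materialized": f"fingerprint-{arch}"}


def fetch_hashes() -> dict[str, dict[str, str]]:
    # The two architectures are independent, so resolve them concurrently.
    with ThreadPoolExecutor(max_workers=len(ARCHS)) as executor:
        return dict(zip(ARCHS, executor.map(resolve_for_arch, ARCHS)))


hashes = fetch_hashes()
print(hashes)
```

`executor.map` re-raises a worker's exception when the corresponding result is consumed, so a failure on one architecture still fails the whole call.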

The `mzcompose description` command only needs the module docstring, which is available without expensive dependency resolution and fingerprinting. Use munge_services=False to skip that work. This is called once per CI step via the mzcompose Buildkite plugin command hook (line 100: `TEST_DESC="$(mzcompose description)"`).

Measured savings: ~2.5s per `mzcompose description` call
- munge_services=True: 2.55s (full load_composition)
- munge_services=False: 0.23s (full load_composition)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The `mzcompose describe` (aka `ls`/`list`) command only displays service names, workflow names/docstrings, and the composition description. All of this data is available without expensive dependency resolution and fingerprinting. This is primarily a local development speedup, since describe is not called in CI, but it makes `mzcompose ls` much more responsive.

Measured savings: ~2.5s per `mzcompose describe` call
- munge_services=True: 2.78s (full load_composition)
- munge_services=False: 0.25s (full load_composition)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
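Everything `mzcompose describe` prints can be derived from names and docstrings alone. A sketch of assembling that output, assuming the composition exposes its description, service names, and workflow functions; the `describe` signature and output layout are hypothetical.

```python
import inspect
from typing import Callable


def describe(description: str, services: list[str],
             workflows: dict[str, Callable[..., object]]) -> str:
    """Render a describe-style summary from names and docstrings only."""
    lines = [description, "", "Services:"]
    lines += [f"    {name}" for name in sorted(services)]
    lines += ["", "Workflows:"]
    for name, fn in sorted(workflows.items()):
        doc = inspect.getdoc(fn)
        # Show only the first docstring line, if there is one.
        summary = doc.splitlines()[0] if doc else ""
        lines.append(f"    {name}: {summary}" if summary else f"    {name}")
    return "\n".join(lines)


def workflow_default(c: object) -> None:
    """Run all smoke tests.

    Longer explanation that describe would omit.
    """


def workflow_cleanup(c: object) -> None:
    pass


output = describe("Basic cluster tests.", ["materialized", "testdrive"],
                  {"default": workflow_default, "cleanup": workflow_cleanup})
print(output)
```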

When resolving image dependencies, each Rust crate's input files were discovered via individual `git diff` + `git ls-files` subprocess calls. With ~118 crates across the workspace, this meant ~236 subprocess calls just for crate file enumeration.

Add Workspace.precompute_crate_inputs(), which makes a single pair of git calls to discover all crate files at once, then partitions the results by crate path in Python. This is called automatically at the start of resolve_dependencies().

Measured savings for resolve_dependencies() (all 41 images):
- Before: 4.80s
- After: 2.84s
- Savings: 1.96s (41%)

Measured savings for a single composition (pg-cdc, munge_services=True):
- Before: 2.57s
- After: 0.78s
- Savings: 1.80s (70%)

This benefits every `mzcompose up` and `mzcompose run` call in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
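The batching idea can be sketched as one `git ls-files` invocation over every crate directory, followed by an in-Python partition that assigns each file to the innermost crate directory containing it. Both function names and the git flags are assumptions for illustration (the commit pairs `git diff` with `git ls-files`; only the listing half is shown), and only the partition step is exercised below.

```python
import subprocess
from pathlib import PurePosixPath


def list_files(pathspecs: list[str]) -> list[str]:
    """One git call for all crates (hypothetical helper; flags are illustrative)."""
    out = subprocess.run(
        ["git", "ls-files", "-z", "--cached", "--others", "--exclude-standard",
         "--", *pathspecs],
        check=True, capture_output=True, text=True,
    ).stdout
    return [f for f in out.split("\0") if f]


def partition_by_crate(files: list[str], crate_dirs: list[str]) -> dict[str, set[str]]:
    """Assign each repo-relative file to the innermost crate directory containing it."""
    crates = {PurePosixPath(d) for d in crate_dirs}
    result: dict[str, set[str]] = {str(c): set() for c in crates}
    for f in files:
        # Walk up the parent directories; the first match is the innermost crate.
        # Comparing whole path components keeps src/adapter-types out of src/adapter.
        for parent in PurePosixPath(f).parents:
            if parent in crates:
                result[str(parent)].add(f)
                break
    return result


files = [
    "src/adapter/Cargo.toml",
    "src/adapter/src/lib.rs",
    "src/adapter-types/src/lib.rs",
    "src/sql/src/plan.rs",
    "doc/user/README.md",  # not inside any crate
]
partitioned = partition_by_crate(files, ["src/adapter", "src/adapter-types", "src/sql"])
print(sorted(partitioned["src/adapter"]))
```

A plain `str.startswith("src/adapter")` check would also claim `src/adapter-types/src/lib.rs`, which is why this sketch compares path components instead.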

When computing image fingerprints, each image's context files were discovered via individual `git diff` + `git ls-files` subprocess calls. With 41 images, this meant 82 subprocess calls just for image context enumeration.

Add Repository._precompute_image_context_files(), which makes a single pair of git calls to discover all image context files at once, then partitions the results by image path. This is called automatically at the start of resolve_dependencies().

Combined with the crate input batching from the previous commit:

Measured savings for resolve_dependencies() (all 41 images):
- Before (no batching): 4.17s
- After (both batched): 1.85s
- Savings: 2.32s (56%)

Measured savings for a single composition (pg-cdc, munge_services=True):
- Before: 2.43s
- After: 0.64s
- Savings: 1.79s (74%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Two changes eliminate the remaining ~82 git subprocess calls from the fingerprinting path:

1. CargoPreImage.inputs() now resolves its hardcoded inputs eagerly:
   - the 'ci/builder' directory is expanded to individual files via a single cached expand_globs call
   - '.cargo/config' is included only if it exists
   - the result is cached with @cache, since it is the same for all images
2. ResolvedImage.fingerprint() skips the expand_globs verification pass when precomputed data is available, since all inputs are already individual file paths from git.

This eliminates nearly all git subprocess calls from resolve_dependencies, reducing the total from ~384 calls (baseline) to just 5 (2 for the crate batch + 2 for the image batch + 1 for ci/builder).

Measured savings for resolve_dependencies() (all 41 images):
- Before (no batching): 4.15s
- After (all batching): 0.40s
- Savings: 3.75s (90%)

Measured savings for a single composition (pg-cdc, munge_services=True):
- Before: 2.23s
- After: 0.26s
- Savings: 1.97s (88%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
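Resolving the fixed inputs once can be sketched as below, assuming the inputs are a directory to expand plus an optional config file. `cargo_pre_image_inputs` is hypothetical, and the directory walk via `Path.rglob` stands in for the git-based expand_globs call, so unlike git it would also pick up untracked files.

```python
from functools import cache
from pathlib import Path


@cache
def cargo_pre_image_inputs(root: Path) -> frozenset[str]:
    """Resolve the hardcoded inputs to individual repo-relative file paths, once.

    The result does not depend on the image, so it is memoized per root.
    """
    # Expand the ci/builder directory into individual files (stand-in for the
    # single cached expand_globs call).
    builder = root / "ci" / "builder"
    inputs = {
        p.relative_to(root).as_posix() for p in builder.rglob("*") if p.is_file()
    }
    # Include .cargo/config only if it exists.
    if (root / ".cargo" / "config").is_file():
        inputs.add(".cargo/config")
    return frozenset(inputs)
```

Because the function is cached, a `.cargo/config` created after the first call is not picked up until `cargo_pre_image_inputs.cache_clear()` is called.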

The crate and image context precomputation used str(path) directly both for the git pathspecs and for the file partitioning step. This works when the repository root is a relative path (Path(".")), but fails when it is an absolute path (as it is when MZ_ROOT is set in CI via mzcompose).

The cause: git --relative outputs paths relative to the cwd, but the partition logic compared these relative paths against potentially absolute image/crate paths, so the startswith() check failed. This left _context_files_cache empty, triggering the "files are unknown to git" assertion.

Fix: use path.relative_to(root) to normalize all paths before constructing git pathspecs and partitioning results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
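The failure and the fix can be reproduced in a small sketch, assuming `root` may be absolute, image/crate paths live under `root`, and git reports repo-relative paths. `to_pathspec` and `owns` are hypothetical helper names.

```python
from pathlib import PurePosixPath


def to_pathspec(path: PurePosixPath, root: PurePosixPath) -> str:
    """Normalize a (possibly absolute) image/crate path to a repo-relative pathspec."""
    return path.relative_to(root).as_posix()


def owns(git_file: str, path: PurePosixPath, root: PurePosixPath) -> bool:
    """Does a repo-relative file reported by git belong to this image/crate?"""
    return PurePosixPath(git_file).is_relative_to(path.relative_to(root))


root = PurePosixPath("/workdir/materialize")  # e.g. an absolute MZ_ROOT
image = root / "misc" / "images" / "postgres"
git_file = "misc/images/postgres/Dockerfile"  # what git reports

# Before: a relative git path never starts with an absolute image path.
print(git_file.startswith(str(image)))  # False
# After: both sides are repo-relative.
print(to_pathspec(image, root), owns(git_file, image, root))  # misc/images/postgres True
```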

Pyright's type checker requires attributes to be declared on the class. Declare _inputs_cache on Crate and _context_files_cache on Image as Optional[set[str]] fields, initialized to None in __init__, and replace the hasattr() checks with `is not None` comparisons.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
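The type-checker-friendly pattern looks roughly like this: declare the cache attribute with its Optional type on the class, initialize it in `__init__`, and test `is not None`. Only the `_context_files_cache` name mirrors the commit; the rest of the class (`context_files`, `_enumerate_with_git`) is hypothetical.

```python
from typing import Optional


class Image:
    # Declared on the class so type checkers know the attribute exists.
    _context_files_cache: Optional[set[str]]

    def __init__(self, path: str) -> None:
        self.path = path
        self._context_files_cache = None  # filled in by a batched precompute step

    def context_files(self) -> set[str]:
        # `is not None` (rather than truthiness) distinguishes "not precomputed"
        # from "precomputed but empty".
        if self._context_files_cache is not None:
            return self._context_files_cache
        return self._enumerate_with_git()

    def _enumerate_with_git(self) -> set[str]:
        # Hypothetical fallback: the per-image git calls that batching avoids.
        return {f"{self.path}/Dockerfile"}
```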

def- and others added 9 commits February 13, 2026 22:46

def- force-pushed the ci-speedup branch from cc1ada6 to d324913 on February 13, 2026 22:47

def- and others added 2 commits February 13, 2026 23:03

def- force-pushed the ci-speedup branch from d324913 to 710a3c9 on February 13, 2026 23:04

def- wants to merge 11 commits into MaterializeInc:main from def-:ci-speedup