Background
While implementing #1416, it became clear that the current infrastructure is limited in the ways described below.
Summary
Extract the compile-and-benchmark lifecycle out of BaseSearch into a
standalone ConfigPipeline. Search algorithms hand a batch of candidate
configs to the pipeline and get scored results back. The pipeline owns
how configs are compiled and benchmarked; the search algorithm owns
which configs to try.
Problem
BaseSearch is currently responsible for several distinct concerns:
- Search logic: candidate generation, population management, convergence
- Compile/benchmark infrastructure: compilation, subprocess management, and GPU benchmarking
- Accuracy validation: baseline computation, tolerance checking
Consequences:
- No standalone benchmarking. You cannot evaluate a fixed set of configs
without subclassing BaseSearch and replicating its setup ceremony.
- Search is coupled to compilation/benchmarking. Noise-mitigation
strategies (CPU pinning, SIGSTOP, etc.) require modifying BaseSearch,
even though search algorithms don't care about them.
- Extensibility is tricky. Distributed autotuning, alternative
benchmarking strategies, or reproducible A/B comparisons all require
working around the current structure.
Design
Interface
```python
class ConfigPipeline:
    """Compile and benchmark a batch of kernel configs."""

    def __init__(
        self,
        kernel: _AutotunableKernel,
        args: Sequence[object],
        settings: Settings,
        *,
        log: AutotuningLogger | None = None,
        accuracy_check_fn: Callable | None = None,
    ) -> None: ...

    def run(
        self,
        configs: list[Config],
        *,
        desc: str = "Benchmarking",
    ) -> list[BenchmarkResult]: ...
```
run() takes the full batch. The pipeline internally decides how to
schedule compilation and benchmarking based on settings — sequential,
overlapped, or future strategies. Callers just get list[BenchmarkResult]
back.
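As a concrete illustration, a minimal sequential strategy could look like the sketch below. Everything here beyond the run() contract is a hypothetical placeholder: the Config and BenchmarkResult stand-ins, the _compile/_benchmark helpers, and the wall-clock timing are simplifications, not the real implementation.

```python
import time
from dataclasses import dataclass


# Hypothetical stand-ins for the real Config / BenchmarkResult types.
@dataclass
class Config:
    params: dict


@dataclass
class BenchmarkResult:
    config: Config
    time_ms: float


class ConfigPipeline:
    """Sketch: compile and benchmark a batch of configs sequentially."""

    def __init__(self, kernel, args, settings):
        self.kernel = kernel      # _AutotunableKernel in the real code
        self.args = args
        self.settings = settings

    def run(self, configs, *, desc="Benchmarking"):
        # Sequential strategy: compile, then benchmark, one config at a time.
        # A real pipeline could instead overlap compilation with benchmarking.
        results = []
        for cfg in configs:
            fn = self._compile(cfg)
            results.append(BenchmarkResult(cfg, self._benchmark(fn)))
        return results

    def _compile(self, cfg):
        # Placeholder for real kernel compilation.
        return lambda: self.kernel(*self.args, **cfg.params)

    def _benchmark(self, fn, reps=5):
        # Wall-clock average over a few repetitions, in milliseconds.
        start = time.perf_counter()
        for _ in range(reps):
            fn()
        return (time.perf_counter() - start) / reps * 1e3
```

Because the scheduling decision lives entirely inside run(), swapping the sequential loop for an overlapped strategy would not change anything the caller sees.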
What this enables
Custom Pipelines: One search algorithm can support multiple different compilation and benchmarking strategies.
Standalone benchmarking:
```python
with ConfigPipeline(kernel, args, settings) as pipeline:
    results = pipeline.run(fixed_configs)
```
Strategy comparison (same configs, different evaluation):
```python
seq_results = ConfigPipeline(kernel, args, seq_settings).run(configs)
ovl_results = ConfigPipeline(kernel, args, ovl_settings).run(configs)
compare_rankings(seq_results, ovl_results)
```
Future distributed autotuning: A DistributedPipeline farms
compilation out to remote workers or benchmarks configs across multiple
GPUs. Search algorithms call pipeline.run(configs) unchanged.
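A rough sketch of such a pipeline, using local threads as stand-in workers. The evaluate callable (bundling compile + benchmark into one function) is a hypothetical simplification; a real DistributedPipeline would dispatch to remote build hosts or per-GPU worker processes while keeping the same run() contract:

```python
from concurrent.futures import ThreadPoolExecutor


class DistributedPipeline:
    """Hypothetical sketch: same run() contract, work farmed to workers."""

    def __init__(self, evaluate, max_workers=4):
        # evaluate(config) -> score; stands in for compile + benchmark.
        self.evaluate = evaluate
        self.max_workers = max_workers

    def run(self, configs):
        # Fan the batch out to workers; Executor.map preserves input order,
        # so callers can zip configs with scores deterministically.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            scores = list(pool.map(self.evaluate, configs))
        return list(zip(configs, scores))
```

From the search algorithm's perspective this is indistinguishable from the sequential pipeline: it hands over a batch and gets ordered results back.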
Pluggable noise mitigation: CPU pinning, SIGSTOP scheduling, and CUDA
graphs all become pipeline implementation details, invisible to search.
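One way to sketch "invisible to search": a wrapper pipeline that re-runs the inner pipeline's measurements and keeps the per-config median, exposing the same run() contract. The (config, time) result shape and the wrapper name are hypothetical simplifications:

```python
import statistics


class MedianOfRepeatsPipeline:
    """Hypothetical noise-mitigation wrapper: repeat the inner pipeline's
    measurements and keep the median timing per config. Search algorithms
    call run() as usual and never see the mitigation."""

    def __init__(self, inner, repeats=3):
        self.inner = inner      # any object with run(configs) -> [(cfg, time)]
        self.repeats = repeats

    def run(self, configs):
        samples = [self.inner.run(configs) for _ in range(self.repeats)]
        # samples[i][j] is the (config, time) pair for config j on repeat i.
        return [
            (cfg, statistics.median(s[j][1] for s in samples))
            for j, (cfg, _) in enumerate(samples[0])
        ]
```

CPU pinning or SIGSTOP scheduling would slot in the same way: as wrappers or settings-driven branches inside a pipeline, composed without touching any search algorithm.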
To do
- Create an abstraction layer for ConfigPipeline
- Refactor existing search algorithms to support a modular ConfigPipeline
- Make existing or custom ConfigPipelines easily exchangeable within search algorithms
- Create an interface that supports metrics collection and noise measurement