From afee96308394ed9884ed70aa08dda269e8182fa0 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Mon, 1 Jun 2026 16:04:40 -0700
Subject: [PATCH 01/13] refactor(scheduler): make bin packing a generic
 utility, decouple from optimizer types
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses Abhishek's feedback on #534 (Bin.java:26):
> Shall we keep bin packing generic as common utility instead of referencing
> internal models and operations? That should give more flexibility and we
> should be able to integrate well with the optimizer flow as well existing
> scheduler flow. I am planning to leverage as a common lib as used in
> PR #604.

This refactor pulls Bin / BinPacker out of the optimizer types and into a
fresh sub-package that mirrors the API shape Abhishek is introducing in
#599 as `jobs.util.binpack` (BinItem + Bin + FirstFitDecreasingBinPacker
with weight + sizeBytes + items caps). When #599 merges we should be able
to point the imports at his shared lib and delete this copy in one small
follow-up change.

What changed:
- New sub-package `services/optimizer/scheduler/.../binpack/`:
  - BinItem — concrete struct with fqtn, operationId, tableUuid,
    databaseName, tableName, weight, sizeBytes. Carries everything the
    batched Spark app needs to do work and report back, without importing
    any optimizer DTO type.
  - Bin — mutable accumulator with totalWeight + totalSizeBytes running
    totals; package-private mutators, public read-only view.
  - BinPacker — strategy interface: `List<Bin> pack(List<BinItem>)`.
  - FirstFitDecreasingBinPacker — three-cap FFD algorithm
    (maxWeightPerBin, maxSizeBytesPerBin, maxItemsPerBin; a cap of 0 or
    less disables that dimension). An item that exceeds a cap on its own
    is placed in a dedicated bin rather than dropped.
- SchedulerRunner now owns ALL optimizer-specific orchestration. It builds
  the BinItem list from (TableOperationDto, TableStatsDto) pairs and
  dispatches to the registered BinPacker. For each returned Bin it claims
  the rows via CAS (PENDING → SCHEDULING), narrows the BinItem list to the
  rows actually claimed, calls JobsServiceClient.launch, and marks the rows
  SCHEDULED with the jobId on success or reverts them to PENDING on
  failure. The old Bin.subset() / Bin.schedule() entry points are gone;
  the runner does that work directly, since those were the
  optimizer-specific bits we wanted out of Bin.
- SchedulerConfig wires one FirstFitDecreasingBinPacker per operation
  type (currently OFD) with configurable caps:
  optimizer.scheduler.ofd.max-weight-per-bin (default 1_000_000)
  optimizer.scheduler.ofd.max-size-bytes-per-bin (default 5 TiB)
  optimizer.scheduler.ofd.max-items-per-bin (default 50)
  Replaces the old single-cap `max-files-per-bin` property; corresponding
  env vars renamed (SCHEDULER_OFD_MAX_FILES_PER_BIN →
  SCHEDULER_OFD_MAX_WEIGHT_PER_BIN, plus two new ones).
- SchedulerApplication imports BinPacker from the new sub-package.
- Removed: old Bin, BinPacker, FileCountBinPacker, SchedulingCandidate,
  FileCountBinPackerTest.
- New FirstFitDecreasingBinPackerTest covers all three caps, FFD
  ordering, oversized-item placement, and zero-cap-disables-dimension.
- Updated SchedulerRunnerTest: stubs the mock BinPacker by routing
  through a real FirstFitDecreasingBinPacker with unbounded caps, so the
  runner's op → BinItem projection is exercised without going around
  Bin's package-private mutators.
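
For reviewers who want the packing rule in isolation, here is a minimal,
self-contained sketch of three-cap first-fit-decreasing as described in
the list above. It is an illustration under simplifying assumptions, not
the classes in this patch: `FfdSketch`, `Item`, and `SketchBin` are
hypothetical names, and the item carries only a label plus the two costs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative stand-in only; names and shapes are simplified assumptions.
public class FfdSketch {

  /** Hypothetical minimal item: a label plus the two packing costs. */
  static final class Item {
    final String name;
    final long weight;
    final long sizeBytes;

    Item(String name, long weight, long sizeBytes) {
      this.name = name;
      this.weight = weight;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Hypothetical minimal bin that tracks running totals. */
  static final class SketchBin {
    final List<Item> items = new ArrayList<>();
    long totalWeight;
    long totalSizeBytes;

    // A cap of 0 or less disables that dimension.
    boolean fits(Item it, long maxWeight, long maxSizeBytes, int maxItems) {
      if (maxItems > 0 && items.size() >= maxItems) return false;
      if (maxWeight > 0 && totalWeight + it.weight > maxWeight) return false;
      if (maxSizeBytes > 0 && totalSizeBytes + it.sizeBytes > maxSizeBytes) return false;
      return true;
    }

    void add(Item it) {
      items.add(it);
      totalWeight += it.weight;
      totalSizeBytes += it.sizeBytes;
    }
  }

  static List<SketchBin> pack(List<Item> input, long maxWeight, long maxSizeBytes, int maxItems) {
    // Sort a copy by descending weight, then place each item in the first bin that fits.
    List<Item> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Item it) -> it.weight).reversed());
    List<SketchBin> bins = new ArrayList<>();
    for (Item it : sorted) {
      SketchBin target = null;
      for (SketchBin b : bins) {
        if (b.fits(it, maxWeight, maxSizeBytes, maxItems)) {
          target = b;
          break;
        }
      }
      if (target == null) {
        // No existing bin fits: open a new one. An oversized item is added
        // here unconditionally, so it is never dropped.
        target = new SketchBin();
        bins.add(target);
      }
      target.add(it);
    }
    return bins;
  }

  public static void main(String[] args) {
    List<Item> items = Arrays.asList(
        new Item("db.small", 30, 0),
        new Item("db.huge", 150, 0),
        new Item("db.mid", 60, 0),
        new Item("db.tiny", 10, 0));
    // Weight cap 100; size and item-count caps disabled.
    for (SketchBin b : pack(items, 100, 0, 0)) {
      StringBuilder names = new StringBuilder();
      for (Item it : b.items) names.append(it.name).append(' ');
      System.out.println(b.totalWeight + " <- " + names.toString().trim());
    }
  }
}
```

With a weight cap of 100, the 150-weight item ends up alone in the first
bin and the remaining three items (60 + 30 + 10) share the second, which
lands exactly on the cap.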

Notes for the merge:
- Same-shape duplication with #599 is intentional and short-lived; the
  swap-out is an import rename plus deleting the four binpack/*.java
  files here.
- Existing FileCountBinPacker semantics carry over: file count is the
  weight dimension. sizeBytes is now populated from
  TableStatsDto.snapshot.tableSizeBytes when available, otherwise 0 (an
  item with sizeBytes 0 adds nothing to a bin's running byte total).
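
The null-handling behind that last note can be sketched as follows. This
is a hedged illustration, not the runner's code: `ProjectionSketch` and
its `Snapshot` class are hypothetical stand-ins whose field names follow
the text above, and the two costs are returned as a plain array for
brevity.

```java
import java.util.Optional;

// Hypothetical stand-ins for the stats snapshot; these are not the real DTOs.
public class ProjectionSketch {

  static final class Snapshot {
    final Long numCurrentFiles; // may be null when stats are incomplete
    final Long tableSizeBytes;  // may be null when stats are incomplete

    Snapshot(Long numCurrentFiles, Long tableSizeBytes) {
      this.numCurrentFiles = numCurrentFiles;
      this.tableSizeBytes = tableSizeBytes;
    }
  }

  /** Returns {weight, sizeBytes}; a missing snapshot or field maps to 0. */
  static long[] project(Snapshot snapshot) {
    long weight = Optional.ofNullable(snapshot).map(s -> s.numCurrentFiles).orElse(0L);
    long sizeBytes = Optional.ofNullable(snapshot).map(s -> s.tableSizeBytes).orElse(0L);
    return new long[] {weight, sizeBytes};
  }

  public static void main(String[] args) {
    long[] partial = project(new Snapshot(12L, null));
    System.out.println("weight=" + partial[0] + " sizeBytes=" + partial[1]);
  }
}
```

A zero in either position means the item simply does not count toward
that cap's running total.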

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scheduler/SchedulerApplication.java       |   1 +
 .../src/main/resources/application.properties |   5 +-
 .../openhouse/optimizer/scheduler/Bin.java    |  61 --------
 .../optimizer/scheduler/BinPacker.java        |  24 ----
 .../scheduler/FileCountBinPacker.java         |  84 -----------
 .../optimizer/scheduler/SchedulerRunner.java  | 104 ++++++++++----
 .../scheduler/SchedulingCandidate.java        |  19 ---
 .../optimizer/scheduler/binpack/Bin.java      |  53 +++++++
 .../optimizer/scheduler/binpack/BinItem.java  |  42 ++++++
 .../scheduler/binpack/BinPacker.java          |  17 +++
 .../binpack/FirstFitDecreasingBinPacker.java  |  70 ++++++++++
 .../scheduler/config/SchedulerConfig.java     |  27 +++-
 .../scheduler/FileCountBinPackerTest.java     | 104 --------------
 .../scheduler/SchedulerRunnerTest.java        |  39 +++---
 .../FirstFitDecreasingBinPackerTest.java      | 131 ++++++++++++++++++
 .../resources/application-test.properties     |   4 +-
 16 files changed, 442 insertions(+), 343 deletions(-)
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/Bin.java
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPacker.java
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPacker.java
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulingCandidate.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
 delete mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPackerTest.java
 create mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
diff --git a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
index d83db7524..e17ecd0fc 100644
--- a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
+++ b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
@@ -1,6 +1,7 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import java.util.Map;
 import lombok.extern.slf4j.Slf4j;
 import org.springframework.beans.factory.annotation.Autowired;
diff --git a/apps/optimizer/schedulerapp/src/main/resources/application.properties b/apps/optimizer/schedulerapp/src/main/resources/application.properties
index 5184cf1bc..abb4b8d88 100644
--- a/apps/optimizer/schedulerapp/src/main/resources/application.properties
+++ b/apps/optimizer/schedulerapp/src/main/resources/application.properties
@@ -6,6 +6,9 @@ spring.datasource.username=${OPTIMIZER_DB_USER:sa}
 spring.datasource.password=${OPTIMIZER_DB_PASSWORD:}
 spring.jpa.hibernate.ddl-auto=none
 optimizer.scheduler.jobs.base-uri=${JOBS_BASE_URI:http://localhost:8002}
-optimizer.scheduler.ofd.max-files-per-bin=${SCHEDULER_OFD_MAX_FILES_PER_BIN:1000000}
+# Per-bin caps for ORPHAN_FILES_DELETION. 0 disables the dimension; see FirstFitDecreasingBinPacker.
+optimizer.scheduler.ofd.max-weight-per-bin=${SCHEDULER_OFD_MAX_WEIGHT_PER_BIN:1000000}
+optimizer.scheduler.ofd.max-size-bytes-per-bin=${SCHEDULER_OFD_MAX_SIZE_BYTES_PER_BIN:5497558138880}
+optimizer.scheduler.ofd.max-items-per-bin=${SCHEDULER_OFD_MAX_ITEMS_PER_BIN:50}
 optimizer.scheduler.results-endpoint=${SCHEDULER_RESULTS_ENDPOINT:http://openhouse-optimizer:8080/v1/optimizer/operations}
 optimizer.scheduler.cluster-id=${SCHEDULER_CLUSTER_ID:LocalHadoopCluster}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/Bin.java
deleted file mode 100644
index 082a3bbd7..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/Bin.java
+++ /dev/null
@@ -1,61 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
-import java.util.Collection;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Optional;
-import java.util.Set;
-import java.util.stream.Collectors;
-import lombok.Getter;
-import lombok.RequiredArgsConstructor;
-
-/**
- * A set of operations the scheduler will submit together as a single Spark job. A bin owns its own
- * launch — callers ask it to schedule itself and react to the returned job id. The surrounding
- * status-update machinery (claim, mark-scheduled, revert-to-pending) lives in the scheduler because
- * it is shared across all bins regardless of operation type.
- */
-@RequiredArgsConstructor
-public class Bin {
-
-  @Getter private final OperationTypeDto operationType;
-  @Getter private final List<TableOperationDto> operations;
-
-  /** Operation UUIDs in this bin, parallel to {@link #getTableNames()}. */
-  public List<String> getOperationIds() {
-    return operations.stream().map(TableOperationDto::getId).collect(Collectors.toList());
-  }
-
-  /** Fully-qualified {@code database.table} identifiers for the operations in this bin. */
-  public List<String> getTableNames() {
-    return operations.stream()
-        .map(op -> op.getDatabaseName() + "." + op.getTableName())
-        .collect(Collectors.toList());
-  }
-
-  /**
-   * Return a new {@link Bin} containing only the operations whose IDs are in {@code keepIds}. Used
-   * by the scheduler to narrow the bin to the rows it actually claimed before launching the job.
-   */
-  public Bin subset(Collection<String> keepIds) {
-    Set<String> keep = new HashSet<>(keepIds);
-    List<TableOperationDto> filtered =
-        operations.stream().filter(op -> keep.contains(op.getId())).collect(Collectors.toList());
-    return new Bin(operationType, filtered);
-  }
-
-  /**
-   * Submit this bin as a single Spark job. Returns the job id on success, or empty on submission
-   * failure — the caller is responsible for the surrounding status updates.
-   */
-  public Optional<String> schedule(JobsServiceClient client, String resultsEndpoint) {
-    String jobName =
-        "batched-" + operationType.name().toLowerCase() + "-" + Instant.now().toEpochMilli();
-    return client.launch(
-        jobName, operationType.name(), getTableNames(), getOperationIds(), resultsEndpoint);
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPacker.java
deleted file mode 100644
index 509c37b75..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPacker.java
+++ /dev/null
@@ -1,24 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.List;
-
-/**
- * Strategy for packing a set of operations into bins for batched job submission. Implementations
- * encode the constraints of a particular packing dimension (file count, partition count, etc.);
- * binding to an operation type is the responsibility of the scheduler configuration, not the
- * strategy class.
- *
- * <p>{@link TableStatsDto} is the cost source at the interface boundary, carried alongside each
- * operation in a {@link SchedulingCandidate}. Implementations project the stats down to the minimal
- * data needed to make their packing decision (e.g. file count for OFD) and do not retain the full
- * stats payload in the returned bins.
- */
-public interface BinPacker {
-
-  /**
-   * Pack {@code pending} into one or more {@link Bin}s. Each returned bin is non-empty; the
-   * scheduler dispatches one Spark job per bin.
-   */
-  List<Bin> pack(List<SchedulingCandidate> pending);
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPacker.java
deleted file mode 100644
index b62e1bf9b..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPacker.java
+++ /dev/null
@@ -1,84 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.ArrayList;
-import java.util.Comparator;
-import java.util.List;
-import java.util.Map;
-import java.util.OptionalInt;
-import java.util.stream.Collectors;
-import java.util.stream.IntStream;
-import lombok.RequiredArgsConstructor;
-
-/**
- * Greedy first-fit-descending bin-packer keyed on per-table file count, projected from each
- * candidate's {@link TableStatsDto}.
- *
- * <p>Candidates are sorted by descending file count, then assigned to the first bin whose running
- * total stays at or below {@code maxFilesPerBin}. An operation larger than the limit gets its own
- * bin (oversized bins are allowed — we never drop an operation).
- */
-@RequiredArgsConstructor
-public class FileCountBinPacker implements BinPacker {
-
-  private final OperationTypeDto operationType;
-  private final long maxFilesPerBin;
-
-  @Override
-  public List<Bin> pack(List<SchedulingCandidate> pending) {
-    if (pending.isEmpty()) {
-      return List.of();
-    }
-
-    // Project once: each candidate's packing cost is just a long, keyed by operation id.
-    Map<String, Long> costByOperationId =
-        pending.stream()
-            .collect(Collectors.toMap(c -> c.getOperation().getId(), c -> cost(c.getStats())));
-
-    List<TableOperationDto> sorted =
-        pending.stream()
-            .map(SchedulingCandidate::getOperation)
-            .sorted(
-                Comparator.comparingLong(
-                        (TableOperationDto op) -> costByOperationId.get(op.getId()))
-                    .reversed())
-            .collect(Collectors.toList());
-
-    // First-fit-descending is inherently stateful — each placement depends on the running totals
-    // for bins assembled so far.
-    List<List<TableOperationDto>> binContents = new ArrayList<>();
-    List<Long> binTotals = new ArrayList<>();
-    sorted.forEach(
-        op -> {
-          long c = costByOperationId.get(op.getId());
-          OptionalInt placed =
-              IntStream.range(0, binContents.size())
-                  .filter(i -> binTotals.get(i) + c <= maxFilesPerBin || binTotals.get(i) == 0)
-                  .findFirst();
-          if (placed.isPresent()) {
-            int idx = placed.getAsInt();
-            binContents.get(idx).add(op);
-            binTotals.set(idx, binTotals.get(idx) + c);
-          } else {
-            List<TableOperationDto> newBin = new ArrayList<>();
-            newBin.add(op);
-            binContents.add(newBin);
-            binTotals.add(c);
-          }
-        });
-
-    return binContents.stream()
-        .map(ops -> new Bin(operationType, ops))
-        .collect(Collectors.toList());
-  }
-
-  private static long cost(TableStatsDto stats) {
-    if (stats == null || stats.getSnapshot() == null) {
-      return 0L;
-    }
-    Long n = stats.getSnapshot().getNumCurrentFiles();
-    return n != null ? n : 0L;
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 7b4f7594b..e8a5e3f6f 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -8,9 +8,13 @@
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.time.Instant;
 import java.util.Comparator;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -23,10 +27,16 @@
 import org.springframework.transaction.annotation.Transactional;
 
 /**
- * For one operation type per call, reads PENDING rows, looks up per-table stats, dispatches to the
- * registered {@link BinPacker}, and submits one Spark job per returned {@link Bin}. The {@link
- * com.linkedin.openhouse.optimizer.scheduler.SchedulerApplication}'s CommandLineRunner loops over
- * the registered packers and invokes {@code schedule(opType)} for each.
+ * For one operation type per call, reads PENDING rows, looks up per-table stats, projects each into
+ * a {@link BinItem}, dispatches to the registered {@link BinPacker}, and submits one Spark job per
+ * returned {@link Bin}. The {@link com.linkedin.openhouse.optimizer.scheduler.SchedulerApplication}
+ * 's CommandLineRunner loops over the registered packers and invokes {@code schedule(opType)} for
+ * each.
+ *
+ * <p>The runner owns all optimizer-specific orchestration — claim CAS, status transitions, and the
+ * actual {@link JobsServiceClient#launch} call. The bin packer is a pure utility over a flat list
+ * of {@link BinItem}s, deliberately decoupled from operation types and JPA rows so the same packer
+ * can be shared with the existing {@code JobsScheduler} flow.
  */
 @Slf4j
 @Component
@@ -110,8 +120,8 @@ public void schedule(
         statsRepo.findAllById(uuids).stream()
             .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
 
-    // Filter at the boundary so SchedulingCandidate.stats is guaranteed non-null. A table without
-    // a stats row gets skipped this cycle and reconsidered after stats land.
+    // Filter at the boundary so every BinItem is built from a known-non-null stats row. A table
+    // without a stats row gets skipped this cycle and reconsidered after stats land.
     List<TableOperationDto> withStats =
         pending.stream()
             .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
@@ -126,19 +136,45 @@ public void schedule(
       return;
     }
 
-    List<SchedulingCandidate> candidates =
+    List<BinItem> items =
         withStats.stream()
-            .map(op -> new SchedulingCandidate(op, statsByUuid.get(op.getTableUuid())))
+            .map(op -> toBinItem(op, statsByUuid.get(op.getTableUuid())))
             .collect(Collectors.toList());
 
-    List<Bin> bins = packer.pack(candidates);
+    List<Bin> bins = packer.pack(items);
     log.info(
-        "Packed {} PENDING {} operations into {} bins",
-        candidates.size(),
-        operationType,
-        bins.size());
+        "Packed {} PENDING {} operations into {} bins", items.size(), operationType, bins.size());
 
-    bins.forEach(this::submitBin);
+    bins.forEach(bin -> submitBin(operationType, bin));
+  }
+
+  /**
+   * Project an (operation, stats) pair into the packer's input row. Weight is current file count
+   * (the packing dimension OFD cares about); sizeBytes is the on-disk footprint when stats expose
+   * it, else 0.
+   */
+  private static BinItem toBinItem(TableOperationDto op, TableStatsDto stats) {
+    long weight = 0L;
+    long sizeBytes = 0L;
+    if (stats != null && stats.getSnapshot() != null) {
+      Long files = stats.getSnapshot().getNumCurrentFiles();
+      if (files != null) {
+        weight = files;
+      }
+      Long bytes = stats.getSnapshot().getTableSizeBytes();
+      if (bytes != null) {
+        sizeBytes = bytes;
+      }
+    }
+    return BinItem.builder()
+        .fqtn(op.getDatabaseName() + "." + op.getTableName())
+        .operationId(op.getId())
+        .tableUuid(op.getTableUuid())
+        .databaseName(op.getDatabaseName())
+        .tableName(op.getTableName())
+        .weight(weight)
+        .sizeBytes(sizeBytes)
+        .build();
   }
 
   /**
@@ -175,13 +211,18 @@ private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendi
         .collect(Collectors.toList());
   }
 
-  private void submitBin(Bin bin) {
-    List<String> ids = bin.getOperationIds();
+  /**
+   * Claim the bin, narrow to the rows actually claimed, launch the batched Spark job for the
+   * claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed.
+   */
+  private void submitBin(OperationTypeDto operationType, Bin bin) {
+    List<String> ids =
+        bin.items().stream().map(BinItem::getOperationId).collect(Collectors.toList());
 
-    // Claim the rows in one batched UPDATE: PENDING → SCHEDULING. The UPDATE's row count is just
-    // an aggregate — to know *which* rows we own, re-query for SCHEDULING rows tagged with our
-    // scheduledAt watermark. Anything not in that subset belongs to another instance or was
-    // canceled, and must not be submitted or marked SCHEDULED.
+    // Claim in one batched UPDATE: PENDING → SCHEDULING. Aggregate row count alone doesn't tell us
+    // *which* rows we own — re-query for SCHEDULING rows tagged with our scheduledAt watermark.
+    // Anything not in that subset belongs to another instance or was canceled, and must not be
+    // submitted or marked SCHEDULED.
     Instant claimedAt = Instant.now();
     operationsRepo.updateBatch(
         ids,
@@ -189,8 +230,7 @@ private void submitBin(Bin bin) {
         OperationStatus.SCHEDULING,
         Optional.of(claimedAt),
         Optional.empty());
-    // Unpaged: the result set is already bounded by ids.size() (the bin we just claimed); no
-    // need to cap it further.
+    // Unpaged: the result set is bounded by ids.size() (the bin we just claimed).
     List<String> claimedIds =
         operationsRepo
             .find(
@@ -216,8 +256,22 @@ private void submitBin(Bin bin) {
           ids.size());
     }
 
-    Bin claimedBin = bin.subset(claimedIds);
-    Optional<String> jobId = claimedBin.schedule(jobsClient, resultsEndpoint);
+    // Narrow the bin's items to the rows we actually own before extracting Spark-args.
+    Set<String> claimedSet = new HashSet<>(claimedIds);
+    List<BinItem> claimedItems =
+        bin.items().stream()
+            .filter(item -> claimedSet.contains(item.getOperationId()))
+            .collect(Collectors.toList());
+    List<String> tableNames =
+        claimedItems.stream().map(BinItem::getFqtn).collect(Collectors.toList());
+    List<String> operationIds =
+        claimedItems.stream().map(BinItem::getOperationId).collect(Collectors.toList());
+
+    String jobName =
+        "batched-" + operationType.name().toLowerCase() + "-" + claimedAt.toEpochMilli();
+    Optional<String> jobId =
+        jobsClient.launch(jobName, operationType.name(), tableNames, operationIds, resultsEndpoint);
+
     if (jobId.isPresent()) {
       int updated =
           operationsRepo.updateBatch(
@@ -229,7 +283,7 @@ private void submitBin(Bin bin) {
       log.info(
           "Submitted job {} for {} tables ({} rows marked SCHEDULED)",
           jobId.get(),
-          claimedBin.getOperations().size(),
+          claimedItems.size(),
           updated);
     } else {
       int reverted =
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulingCandidate.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulingCandidate.java
deleted file mode 100644
index b031ae6b7..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulingCandidate.java
+++ /dev/null
@@ -1,19 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import lombok.NonNull;
-import lombok.Value;
-
-/**
- * A pending operation paired with the stats the bin packer will use as its cost source. Built by
- * the scheduler at scheduling time and handed to the {@link BinPacker} as the unit of packing.
- *
- * <p>Both fields are non-null. The scheduler filters out operations whose tables have no stats row
- * before constructing candidates.
- */
-@Value
-public class SchedulingCandidate {
-  @NonNull TableOperationDto operation;
-  @NonNull TableStatsDto stats;
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
new file mode 100644
index 000000000..4b94ebb4b
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
@@ -0,0 +1,53 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import lombok.Getter;
+import lombok.ToString;
+
+/**
+ * Mutable accumulator used by a {@link BinPacker} while assembling a batch. Callers receiving a
+ * packed list of {@code Bin}s treat them as read-only — {@link #items()} returns an unmodifiable
+ * view, and the running totals are exposed only via getters.
+ *
+ * <p>Structurally identical to {@code jobs.util.binpack.Bin} introduced by PR&nbsp;#599; see the
+ * note on {@link BinItem} for the swap-out plan.
+ */
+@ToString
+public class Bin {
+  private final List<BinItem> items = new ArrayList<>();
+  @Getter private long totalWeight;
+  @Getter private long totalSizeBytes;
+
+  /**
+   * Returns true iff adding {@code item} would keep this bin at or below all three caps. A cap of
+   * {@code <= 0} disables that dimension.
+   */
+  boolean fits(BinItem item, long maxWeight, long maxSizeBytes, int maxItems) {
+    if (maxItems > 0 && items.size() >= maxItems) {
+      return false;
+    }
+    if (maxWeight > 0 && totalWeight + item.getWeight() > maxWeight) {
+      return false;
+    }
+    if (maxSizeBytes > 0 && totalSizeBytes + item.getSizeBytes() > maxSizeBytes) {
+      return false;
+    }
+    return true;
+  }
+
+  void add(BinItem item) {
+    items.add(item);
+    totalWeight += item.getWeight();
+    totalSizeBytes += item.getSizeBytes();
+  }
+
+  public List<BinItem> items() {
+    return Collections.unmodifiableList(items);
+  }
+
+  public int size() {
+    return items.size();
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
new file mode 100644
index 000000000..01d1d154d
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -0,0 +1,42 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import lombok.Builder;
+import lombok.Getter;
+import lombok.NonNull;
+import lombok.ToString;
+
+/**
+ * A single packable unit for a {@link BinPacker}. Carries enough identity for downstream consumers
+ * (the optimizer scheduler dispatching Spark, the existing JobsScheduler, an offline analyzer) to
+ * resolve the underlying table and report results without re-reading optimizer state.
+ *
+ * <p>{@link #weight} is the primary bin-packing dimension (for orphan files deletion: the number of
+ * current files in the table). {@link #sizeBytes} is a secondary capacity dimension so a packer can
+ * cap the on-disk footprint of a bin independently of file count.
+ *
+ * <p>This type is structurally identical to {@code jobs.util.binpack.BinItem} introduced by
+ * PR&nbsp;#599. When that PR merges, this class becomes a redundant copy and we should switch the
+ * scheduler to import the common one.
+ */
+@Getter
+@Builder
+@ToString
+public class BinItem {
+  /** Fully-qualified {@code database.table} identifier the batched Spark app will load. */
+  @NonNull private final String fqtn;
+
+  /** Optimizer operation id; the Spark app POSTs its outcome back keyed on this. */
+  @NonNull private final String operationId;
+
+  /** Stable table identity for stats lookup and history correlation. */
+  @NonNull private final String tableUuid;
+
+  @NonNull private final String databaseName;
+  @NonNull private final String tableName;
+
+  /** Primary packing cost — for OFD this is the table's current file count. */
+  private final long weight;
+
+  /** Secondary packing cost — on-disk size in bytes. {@code 0} when unknown. */
+  private final long sizeBytes;
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
new file mode 100644
index 000000000..d32193c9d
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -0,0 +1,17 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import java.util.List;
+
+/**
+ * Strategy interface for grouping a flat list of {@link BinItem}s into one or more {@link Bin}s.
+ * Implementations encode the per-bin caps (file count, byte size, item count, etc.) and the
+ * placement algorithm; callers iterate the returned bins and dispatch one batch per bin.
+ *
+ * <p>The interface does not reference any optimizer-specific types (operations, statuses,
+ * repositories). Adapter code in the scheduler maps its domain objects into {@code BinItem}s before
+ * calling and maps results back to operation ids after.
+ */
+public interface BinPacker {
+  /** Pack {@code items} into one or more bins. Each returned bin is non-empty. */
+  List<Bin> pack(List<BinItem> items);
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
new file mode 100644
index 000000000..04ae33c21
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
@@ -0,0 +1,70 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import java.util.ArrayList;
+import java.util.Comparator;
+import java.util.List;
+import java.util.stream.Collectors;
+import lombok.Builder;
+import lombok.extern.slf4j.Slf4j;
+
+/**
+ * First-fit-decreasing bin packer with three independent caps:
+ *
+ * <ul>
+ *   <li>{@code maxWeightPerBin} — total {@link BinItem#getWeight()} (for OFD: file count)
+ *   <li>{@code maxSizeBytesPerBin} — total on-disk size of all items in the bin
+ *   <li>{@code maxItemsPerBin} — number of items per bin
+ * </ul>
+ *
+ * <p>Pass {@code 0} or a negative value for any cap to disable that dimension.
+ *
+ * <p>An item that exceeds any single cap on its own is placed into a bin by itself rather than
+ * dropped — the scheduler never silently skips maintenance work for an oversized table.
+ *
+ * <p>Structurally mirrors {@code jobs.util.binpack.FirstFitDecreasingBinPacker} from PR&nbsp;#599.
+ */
+@Slf4j
+@Builder
+public class FirstFitDecreasingBinPacker implements BinPacker {
+
+  @Builder.Default private final long maxWeightPerBin = 1_000_000L;
+  @Builder.Default private final long maxSizeBytesPerBin = 5L * 1024L * 1024L * 1024L * 1024L;
+  @Builder.Default private final int maxItemsPerBin = 50;
+
+  @Override
+  public List<Bin> pack(List<BinItem> items) {
+    if (items == null || items.isEmpty()) {
+      return new ArrayList<>();
+    }
+
+    List<BinItem> sorted =
+        items.stream()
+            .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
+            .collect(Collectors.toList());
+
+    List<Bin> bins = new ArrayList<>();
+    for (BinItem item : sorted) {
+      Bin target = null;
+      for (Bin bin : bins) {
+        if (bin.fits(item, maxWeightPerBin, maxSizeBytesPerBin, maxItemsPerBin)) {
+          target = bin;
+          break;
+        }
+      }
+      if (target == null) {
+        target = new Bin();
+        bins.add(target);
+        if (!target.fits(item, maxWeightPerBin, maxSizeBytesPerBin, maxItemsPerBin)) {
+          log.warn(
+              "Item exceeds per-bin caps on its own; placing in dedicated bin: fqtn={} weight={} sizeBytes={}",
+              item.getFqtn(),
+              item.getWeight(),
+              item.getSizeBytes());
+        }
+      }
+      target.add(item);
+    }
+    log.info("Packed {} items into {} bins", items.size(), bins.size());
+    return bins;
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index 796e707f4..f39734f34 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,8 +1,8 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.BinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.FileCountBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.util.Map;
 import org.springframework.beans.factory.annotation.Value;
@@ -19,8 +19,17 @@ public class SchedulerConfig {
   @Value("${optimizer.scheduler.cluster-id}")
   private String clusterId;
 
-  @Value("${optimizer.scheduler.ofd.max-files-per-bin}")
-  private long ofdMaxFilesPerBin;
+  /** OFD bin packer: max files per bin (primary cost dimension). 0 disables. */
+  @Value("${optimizer.scheduler.ofd.max-weight-per-bin:1000000}")
+  private long ofdMaxWeightPerBin;
+
+  /** OFD bin packer: max on-disk size per bin in bytes. 0 disables. */
+  @Value("${optimizer.scheduler.ofd.max-size-bytes-per-bin:5497558138880}")
+  private long ofdMaxSizeBytesPerBin;
+
+  /** OFD bin packer: max tables per bin. 0 disables. */
+  @Value("${optimizer.scheduler.ofd.max-items-per-bin:50}")
+  private int ofdMaxItemsPerBin;
 
   @Bean
   public WebClient jobsWebClient() {
@@ -34,13 +43,17 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
 
   /**
    * Map of {@link OperationTypeDto} to the {@link BinPacker} strategy that handles it. Adding a new
-   * operation type means adding an entry here and configuring its packer; the strategy class itself
-   * stays generic.
+   * operation type means adding an entry here and configuring its packer caps; the packer itself
+   * stays generic over {@link com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem}.
    */
   @Bean
   public Map<OperationTypeDto, BinPacker> binPackers() {
     return Map.of(
         OperationTypeDto.ORPHAN_FILES_DELETION,
-        new FileCountBinPacker(OperationTypeDto.ORPHAN_FILES_DELETION, ofdMaxFilesPerBin));
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(ofdMaxWeightPerBin)
+            .maxSizeBytesPerBin(ofdMaxSizeBytesPerBin)
+            .maxItemsPerBin(ofdMaxItemsPerBin)
+            .build());
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPackerTest.java
deleted file mode 100644
index dc3b96b5c..000000000
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/FileCountBinPackerTest.java
+++ /dev/null
@@ -1,104 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import static org.assertj.core.api.Assertions.assertThat;
-
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.List;
-import java.util.UUID;
-import java.util.stream.Collectors;
-import org.junit.jupiter.api.Test;
-
-class FileCountBinPackerTest {
-
-  private static final long MAX = 1_000_000L;
-  private final FileCountBinPacker packer =
-      new FileCountBinPacker(OperationTypeDto.ORPHAN_FILES_DELETION, MAX);
-
-  private static TableOperationDto op(String uuid) {
-    return TableOperationDto.builder()
-        .id(UUID.randomUUID().toString())
-        .tableUuid(uuid)
-        .databaseName("db")
-        .tableName("tbl_" + uuid)
-        .operationType(OperationTypeDto.ORPHAN_FILES_DELETION)
-        .build();
-  }
-
-  private static TableStatsDto stats(Long fileCount) {
-    return TableStatsDto.builder()
-        .snapshot(TableStatsDto.SnapshotMetrics.builder().numCurrentFiles(fileCount).build())
-        .build();
-  }
-
-  private static SchedulingCandidate candidate(String uuid, Long fileCount) {
-    return new SchedulingCandidate(op(uuid), stats(fileCount));
-  }
-
-  @Test
-  void emptyInput_returnsEmptyBins() {
-    assertThat(packer.pack(List.of())).isEmpty();
-  }
-
-  @Test
-  void singleTable_oneBin() {
-    SchedulingCandidate c = candidate("uuid-1", 100L);
-    List<Bin> bins = packer.pack(List.of(c));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getOperations()).containsExactly(c.getOperation());
-  }
-
-  @Test
-  void tablesUnderLimit_oneBin() {
-    List<Bin> bins =
-        packer.pack(
-            List.of(candidate("a", 300_000L), candidate("b", 300_000L), candidate("c", 300_000L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getOperations()).hasSize(3);
-  }
-
-  @Test
-  void tablesOverLimit_twoBins() {
-    List<Bin> bins =
-        packer.pack(
-            List.of(candidate("a", 600_000L), candidate("b", 600_000L), candidate("c", 400_000L)));
-    assertThat(bins).hasSize(2);
-    assertThat(bins.get(0).getOperations()).hasSize(2); // 600k + 400k
-    assertThat(bins.get(1).getOperations()).hasSize(1); // 600k alone
-  }
-
-  @Test
-  void largeTableAlone_exceedsLimitSingleBin() {
-    SchedulingCandidate big = candidate("big", 5_000_000L);
-    List<Bin> bins = packer.pack(List.of(big));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getOperations()).containsExactly(big.getOperation());
-  }
-
-  @Test
-  void nullFileCount_treatedAsZero() {
-    List<Bin> bins = packer.pack(List.of(candidate("x", null), candidate("y", null)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getOperations()).hasSize(2);
-  }
-
-  @Test
-  void sortedDescending_largestFirst() {
-    SchedulingCandidate small = candidate("small", 100L);
-    SchedulingCandidate large = candidate("large", 900_000L);
-    List<Bin> bins = packer.pack(List.of(small, large));
-    assertThat(bins).hasSize(1);
-    List<String> ordered =
-        bins.get(0).getOperations().stream()
-            .map(TableOperationDto::getTableUuid)
-            .collect(Collectors.toList());
-    assertThat(ordered).containsExactly("large", "small");
-  }
-
-  @Test
-  void binCarriesOperationType() {
-    List<Bin> bins = packer.pack(List.of(candidate("u", 1L)));
-    assertThat(bins.get(0).getOperationType()).isEqualTo(OperationTypeDto.ORPHAN_FILES_DELETION);
-  }
-}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index aa4abce8f..4835273c6 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -17,13 +17,15 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.time.Instant;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.UUID;
-import java.util.stream.Collectors;
 import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.extension.ExtendWith;
@@ -84,17 +86,20 @@ private void stubFindClaimed(List<TableOperationsRow> rows) {
         .thenReturn(rows);
   }
 
-  /** Stubs the bin packer to return one bin containing every candidate. */
-  private void stubOneBinForAllCandidates() {
+  /**
+   * Stubs the bin packer to put every input item into a single bin, by routing through a real FFD
+   * packer with unbounded caps. Lets the test exercise the runner's projection (op → BinItem)
+   * without bypassing Bin's package-private mutators.
+   */
+  private void stubOneBinForAllItems() {
+    FirstFitDecreasingBinPacker realPacker =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(0L)
+            .maxSizeBytesPerBin(0L)
+            .maxItemsPerBin(0)
+            .build();
     when(binPacker.pack(anyList()))
-        .thenAnswer(
-            inv ->
-                List.of(
-                    new Bin(
-                        OFD,
-                        inv.<List<SchedulingCandidate>>getArgument(0).stream()
-                            .map(SchedulingCandidate::getOperation)
-                            .collect(Collectors.toList()))));
+        .thenAnswer(inv -> realPacker.pack(inv.<List<BinItem>>getArgument(0)));
   }
 
   private TableOperationsRow pendingRow(String uuid, String db, String table) {
@@ -152,7 +157,7 @@ void schedule_singleBin_claimsAndMarksScheduled() {
 
     stubFindPending(List.of(row));
     when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100_000L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(1);
@@ -189,7 +194,7 @@ void schedule_jobLaunchFails_marksPendingForRetry() {
 
     stubFindPending(List.of(row));
     when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(1);
@@ -221,7 +226,7 @@ void schedule_rowsAlreadyClaimed_skipsSubmit() {
 
     stubFindPending(List.of(row));
     when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(0);
@@ -247,7 +252,7 @@ void schedule_cancelsDuplicatePendingPerCycle() {
     stubFindPending(List.of(row1, row2));
     when(operationsRepo.cancel(anyList())).thenReturn(1);
     when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(1);
@@ -281,7 +286,7 @@ void schedule_partialClaim_launchesAndMarksOnlyClaimedSubset() {
     stubFindPending(List.of(rowA, rowB));
     when(statsRepo.findAllById(any()))
         .thenReturn(List.of(statsRow(uuidA, 100L), statsRow(uuidB, 100L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(1);
@@ -325,7 +330,7 @@ void schedule_opsWithoutStats_skipped() {
 
     stubFindPending(List.of(withStatsRow, missingRow));
     when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(withStats, 50L)));
-    stubOneBinForAllCandidates();
+    stubOneBinForAllItems();
     when(operationsRepo.updateBatch(
             anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
         .thenReturn(1);
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
new file mode 100644
index 000000000..1c18eb63d
--- /dev/null
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
@@ -0,0 +1,131 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import java.util.List;
+import java.util.stream.Collectors;
+import org.junit.jupiter.api.Test;
+
+class FirstFitDecreasingBinPackerTest {
+
+  private static BinItem item(String id, long weight) {
+    return item(id, weight, 0L);
+  }
+
+  private static BinItem item(String id, long weight, long sizeBytes) {
+    return BinItem.builder()
+        .fqtn("db.tbl_" + id)
+        .operationId("op-" + id)
+        .tableUuid("uuid-" + id)
+        .databaseName("db")
+        .tableName("tbl_" + id)
+        .weight(weight)
+        .sizeBytes(sizeBytes)
+        .build();
+  }
+
+  @Test
+  void emptyInput_returnsEmptyBins() {
+    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
+    assertThat(packer.pack(List.of())).isEmpty();
+  }
+
+  @Test
+  void singleItem_oneBin() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).size()).isEqualTo(1);
+  }
+
+  @Test
+  void underWeightLimit_oneBin() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).size()).isEqualTo(3);
+    assertThat(bins.get(0).getTotalWeight()).isEqualTo(900_000L);
+  }
+
+  @Test
+  void overWeightLimit_twoBins() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
+    assertThat(bins).hasSize(2);
+    // FFD: largest first, place 600k → bin0; next 600k doesn't fit bin0, → bin1; 400k fits bin0.
+    assertThat(bins.get(0).getTotalWeight()).isEqualTo(1_000_000L);
+    assertThat(bins.get(1).getTotalWeight()).isEqualTo(600_000L);
+  }
+
+  @Test
+  void itemLargerThanCap_getsOwnBin() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).build();
+    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).size()).isEqualTo(1);
+  }
+
+  @Test
+  void sortedDescending_largestFirst() {
+    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
+    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
+    assertThat(bins).hasSize(1);
+    List<String> uuids =
+        bins.get(0).items().stream().map(BinItem::getTableUuid).collect(Collectors.toList());
+    assertThat(uuids).containsExactly("uuid-large", "uuid-small");
+  }
+
+  @Test
+  void sizeBytesCap_splitsBins() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(0L) // disable
+            .maxSizeBytesPerBin(1_000L)
+            .maxItemsPerBin(0)
+            .build();
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 0L, 600L), item("b", 0L, 500L), item("c", 0L, 400L)));
+    assertThat(bins).hasSize(2);
+    assertThat(bins.get(0).getTotalSizeBytes()).isEqualTo(1_000L); // 600 + 400
+    assertThat(bins.get(1).getTotalSizeBytes()).isEqualTo(500L);
+  }
+
+  @Test
+  void maxItemsCap_splitsBins() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(0L)
+            .maxSizeBytesPerBin(0L)
+            .maxItemsPerBin(2)
+            .build();
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
+    assertThat(bins).hasSize(2);
+    assertThat(bins.get(0).size()).isEqualTo(2);
+    assertThat(bins.get(1).size()).isEqualTo(2);
+  }
+
+  @Test
+  void zeroCap_disablesDimension() {
+    // All caps zero → everything in one bin regardless of weight/size.
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(0L)
+            .maxSizeBytesPerBin(0L)
+            .maxItemsPerBin(0)
+            .build();
+    List<Bin> bins =
+        packer.pack(
+            List.of(
+                item("a", Long.MAX_VALUE / 4, Long.MAX_VALUE / 4),
+                item("b", Long.MAX_VALUE / 4, Long.MAX_VALUE / 4)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).size()).isEqualTo(2);
+  }
+}
diff --git a/services/optimizer/scheduler/src/test/resources/application-test.properties b/services/optimizer/scheduler/src/test/resources/application-test.properties
index b0609fa34..db4e3136c 100644
--- a/services/optimizer/scheduler/src/test/resources/application-test.properties
+++ b/services/optimizer/scheduler/src/test/resources/application-test.properties
@@ -5,6 +5,8 @@ spring.jpa.hibernate.ddl-auto=none
 spring.sql.init.mode=always
 spring.sql.init.schema-locations=classpath:db/optimizer-schema.sql
 optimizer.scheduler.jobs.base-uri=http://localhost:9999
-optimizer.scheduler.ofd.max-files-per-bin=1000000
+optimizer.scheduler.ofd.max-weight-per-bin=1000000
+optimizer.scheduler.ofd.max-size-bytes-per-bin=5497558138880
+optimizer.scheduler.ofd.max-items-per-bin=50
 optimizer.scheduler.results-endpoint=http://localhost:8080/v1/optimizer/operations
 optimizer.scheduler.cluster-id=test-cluster

From 3191164ca979e46128f1012e5aa3a8c3a76f2db2 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Mon, 1 Jun 2026 16:46:19 -0700
Subject: [PATCH 02/13] refactor(scheduler): BinItem as interface + generic
 packer, OfdBinItem self-weights, functional FFD
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Addresses the four review comments on the prior bin-refactor commit
through the seven changes below:

1. BinItem becomes an interface (`long getWeight()`). Each operation type
   ships its own impl that encodes its own cost model. The packer never
   imports operation DTOs; the impls do.

2. Bin/BinPacker/FirstFitDecreasingBinPacker are generic over T extends
   BinItem. Heterogeneous packers register in a
   `Map<OperationTypeDto, BinPacker<? extends BinItem>>` and the
   scheduler narrows the wildcard with one cast at the per-op-type
   dispatch boundary. `T` stays consistent at compile time end-to-end
   through the packer pipeline.

3. New `operations/ofd/OfdBinItem` (package parallel to scheduler) holds
   only what the dispatch needs: fqtn, operationId, weight. The
   weighting logic — file count from `TableStatsDto.snapshot.
   numCurrentFiles` — lives in a private static `currentFileCount` on
   the impl, fed by a static factory `OfdBinItem.from(op, stats)` so
   callers do
   `withStats.stream().map(op -> OfdBinItem.from(op, statsByUuid.
   get(op.getTableUuid())))`.

4. FirstFitDecreasingBinPacker.pack() is one stream pipeline:
   `items.stream().sorted(...).collect(ArrayList::new, this::placeItem,
   List::addAll)`. The inner first-fit search is
   `bins.stream().filter(b -> b.fits(...)).findFirst().
   ifPresentOrElse(...)`. No imperative for-loops; the fold maintains
   the running list of bins as its accumulator. Compiler enforces
   T-consistency across the pipeline.

5. Dropped `maxSizeBytesPerBin` entirely. OFD cost is per-file (list +
   manifest joins + delete calls); bytes don't add information. A 10 GB
   table with 100k files is more expensive to OFD than a 1 TB table
   with 2k files. Bin/Packer now carry just `maxWeightPerBin` +
   `maxItemsPerBin`. Other op types encode their own dimension in
   `getWeight()`; the packer needn't know.

6. OFD config keys back to human-readable per-op vocabulary:
   `optimizer.scheduler.ofd.max-files-per-bin` (file count) +
   `optimizer.scheduler.ofd.max-tables-per-bin` (table count). Env vars
   `SCHEDULER_OFD_MAX_FILES_PER_BIN` + `SCHEDULER_OFD_MAX_TABLES_PER_BIN`.
   SchedulerConfig translates these into the packer's
   `maxWeightPerBin` + `maxItemsPerBin`.

7. Refactored SchedulerRunner:
   - `Map<OperationTypeDto, BinPacker<? extends BinItem>>` registration
   - Switch by operation type narrows to BinPacker<OfdBinItem> with one
     suppressed unchecked cast (safe by registration invariant; a code
     comment flags factoring this into an OperationScheduler<T> handler
     once a second op type lands)
   - `scheduleOfd(...)` builds `OfdBinItem` via the factory and dispatches
   - `submitOfdBin(Bin<OfdBinItem>)` claims, narrows to claimed-only via
     OfdBinItem.getOperationId, launches, marks SCHEDULED/PENDING — same
     orchestration as before, but typed `Bin<OfdBinItem>` end-to-end
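
For reviewers, the fold described in item 4 can be sketched roughly as
below. This is a minimal illustration, not the code in this patch:
`Weighted`, `SketchBin`, and `SketchFfdPacker` are hypothetical
stand-ins for BinItem, Bin, and FirstFitDecreasingBinPacker, and the
`fits` semantics (a cap <= 0 disables that dimension) are assumed from
the packer's Javadoc.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the one-method BinItem contract. */
interface Weighted {
  long getWeight();
}

/** Hypothetical minimal bin; a cap <= 0 is assumed to disable that dimension. */
final class SketchBin<T extends Weighted> {
  private final List<T> items = new ArrayList<>();
  private long totalWeight;

  boolean fits(T item, long maxWeight, int maxItems) {
    boolean weightOk = maxWeight <= 0 || totalWeight + item.getWeight() <= maxWeight;
    boolean countOk = maxItems <= 0 || items.size() < maxItems;
    return weightOk && countOk;
  }

  void add(T item) {
    items.add(item);
    totalWeight += item.getWeight();
  }

  long getTotalWeight() {
    return totalWeight;
  }

  int size() {
    return items.size();
  }
}

/** Hypothetical two-cap FFD packer written as a sequential stream fold. */
final class SketchFfdPacker<T extends Weighted> {
  private final long maxWeightPerBin;
  private final int maxItemsPerBin;

  SketchFfdPacker(long maxWeightPerBin, int maxItemsPerBin) {
    this.maxWeightPerBin = maxWeightPerBin;
    this.maxItemsPerBin = maxItemsPerBin;
  }

  List<SketchBin<T>> pack(List<T> items) {
    if (items == null) {
      return new ArrayList<>();
    }
    // Sort heaviest first, then fold: the accumulator is the running list of bins.
    return items.stream()
        .sorted(Comparator.comparingLong(Weighted::getWeight).reversed())
        .collect(ArrayList::new, this::placeItem, List::addAll);
  }

  private void placeItem(List<SketchBin<T>> bins, T item) {
    bins.stream()
        .filter(b -> b.fits(item, maxWeightPerBin, maxItemsPerBin))
        .findFirst()
        .ifPresentOrElse(
            b -> b.add(item),
            () -> {
              // No existing bin fits: open a new one. An oversized item
              // still lands here, alone, rather than being dropped.
              SketchBin<T> fresh = new SketchBin<>();
              fresh.add(item);
              bins.add(fresh);
            });
  }
}
```

Note that the `List::addAll` combiner is only invoked for parallel
streams, where it would concatenate partial results without
re-checking caps, so a fold of this shape is meaningful only on a
sequential stream.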

Tests:
- FirstFitDecreasingBinPackerTest uses a local `TestItem implements
  BinItem` (no optimizer-domain imports in the binpack test).
  Byte-cap test removed; max-items, max-weight, FFD order, oversized,
  and zero-cap-disables-dimension all preserved.
- SchedulerRunnerTest mocks `BinPacker<OfdBinItem>` and stubs through
  a real FFD packer with unbounded caps so the runner's projection
  (op + stats → OfdBinItem) is exercised without bypassing Bin's
  package-private mutators.
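
The typed `BinPacker<OfdBinItem>` that the runner test mocks comes out
of a wildcard-typed registry via one narrowing cast. A rough sketch of
that boundary, using hypothetical names (`PackItem`, `GenericPacker`,
`OpKind`, `OfdPackItem`, `PackerRegistry`) rather than the real
classes, and assuming the registry only ever stores an OFD-typed
packer under the OFD key:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for BinItem. */
interface PackItem {
  long getWeight();
}

/** Hypothetical stand-in for BinPacker<T>; bins are plain lists here. */
interface GenericPacker<T extends PackItem> {
  List<List<T>> pack(List<T> items);
}

/** Hypothetical stand-in for OperationTypeDto. */
enum OpKind {
  ORPHAN_FILES_DELETION
}

/** Hypothetical OFD item: identity fields plus a file-count weight. */
final class OfdPackItem implements PackItem {
  private final String fqtn;
  private final String operationId;
  private final long weight;

  OfdPackItem(String fqtn, String operationId, long weight) {
    this.fqtn = fqtn;
    this.operationId = operationId;
    this.weight = weight;
  }

  String getFqtn() {
    return fqtn;
  }

  String getOperationId() {
    return operationId;
  }

  @Override
  public long getWeight() {
    return weight;
  }
}

/** Heterogeneous registry; the wildcard is narrowed once per op type. */
final class PackerRegistry {
  private final Map<OpKind, GenericPacker<? extends PackItem>> packers;

  PackerRegistry(Map<OpKind, GenericPacker<? extends PackItem>> packers) {
    this.packers = packers;
  }

  // Safe only by the registration invariant: whoever builds the map must
  // register a GenericPacker<OfdPackItem> under ORPHAN_FILES_DELETION.
  @SuppressWarnings("unchecked")
  GenericPacker<OfdPackItem> ofdPacker() {
    return (GenericPacker<OfdPackItem>) packers.get(OpKind.ORPHAN_FILES_DELETION);
  }
}
```

The compiler cannot verify the cast, so the invariant lives in the
registration code; everything downstream of `ofdPacker()` is then
typed without further casts.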

Divergence from #599: Abhishek's `jobs.util.binpack.BinItem` is a
concrete struct with optimizer-aware identity fields baked in. Ours is
a contract (`long getWeight()`) with per-op impls. The "swap to his
lib by import rename" plan from the prior commit no longer applies.
Instead, this PR proposes the interface-based shape as the common lib,
and #599 would
rebase to adopt it (or at minimum offer an interface alongside his
concrete struct). Discussed in PR #626 thread.
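
To make the contract-versus-struct distinction concrete, here is a
small sketch of what "per-op impls" could look like. All names
(`CostItem`, `FileCountItem`, `ByteCostItem`, `ContractSketch`) are
hypothetical; `ByteCostItem` in particular is an invented second op
type, not something this PR or #599 contains.

```java
import java.util.List;

/** Hypothetical restatement of the one-method contract. */
interface CostItem {
  long getWeight();
}

/** OFD-style item: cost is the current file count; a missing count weighs 0. */
final class FileCountItem implements CostItem {
  private final String fqtn;
  private final long fileCount;

  private FileCountItem(String fqtn, long fileCount) {
    this.fqtn = fqtn;
    this.fileCount = fileCount;
  }

  static FileCountItem of(String fqtn, Long numCurrentFiles) {
    return new FileCountItem(fqtn, numCurrentFiles != null ? numCurrentFiles : 0L);
  }

  String getFqtn() {
    return fqtn;
  }

  @Override
  public long getWeight() {
    return fileCount;
  }
}

/** Invented second op type whose cost is byte-driven (weight in whole GiB). */
final class ByteCostItem implements CostItem {
  private static final long GIB = 1024L * 1024L * 1024L;
  private final long sizeBytes;

  ByteCostItem(long sizeBytes) {
    this.sizeBytes = sizeBytes;
  }

  @Override
  public long getWeight() {
    return sizeBytes / GIB;
  }
}

final class ContractSketch {
  /** Packer-side code sees only getWeight(), never the cost model behind it. */
  static long totalWeight(List<? extends CostItem> items) {
    return items.stream().mapToLong(CostItem::getWeight).sum();
  }
}
```

Under the file-count model, the 10 GB table with 100k files from
item 5 weighs 100,000 and the 1 TB table with 2k files weighs 2,000,
which is the ordering the OFD cost argument calls for.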

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scheduler/SchedulerApplication.java       |   6 +-
 .../src/main/resources/application.properties |   8 +-
 .../optimizer/operations/ofd/OfdBinItem.java  |  59 +++++++++
 .../optimizer/scheduler/SchedulerRunner.java  | 114 ++++++++----------
 .../optimizer/scheduler/binpack/Bin.java      |  24 ++--
 .../optimizer/scheduler/binpack/BinItem.java  |  45 ++-----
 .../scheduler/binpack/BinPacker.java          |  15 +--
 .../binpack/FirstFitDecreasingBinPacker.java  |  70 +++++------
 .../scheduler/config/SchedulerConfig.java     |  36 +++---
 .../scheduler/SchedulerRunnerTest.java        |  20 ++-
 .../FirstFitDecreasingBinPackerTest.java      |  98 ++++++---------
 .../resources/application-test.properties     |   5 +-
 12 files changed, 248 insertions(+), 252 deletions(-)
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java

diff --git a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
index e17ecd0fc..8bda62779 100644
--- a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
+++ b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
@@ -1,6 +1,7 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import java.util.Map;
 import lombok.extern.slf4j.Slf4j;
@@ -27,11 +28,12 @@
 public class SchedulerApplication implements CommandLineRunner, ExitCodeGenerator {
 
   private final SchedulerRunner runner;
-  private final Map<OperationTypeDto, BinPacker> binPackers;
+  private final Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers;
   private int exitCode = 0;
 
   @Autowired
-  public SchedulerApplication(SchedulerRunner runner, Map<OperationTypeDto, BinPacker> binPackers) {
+  public SchedulerApplication(
+      SchedulerRunner runner, Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers) {
     this.runner = runner;
     this.binPackers = binPackers;
   }
diff --git a/apps/optimizer/schedulerapp/src/main/resources/application.properties b/apps/optimizer/schedulerapp/src/main/resources/application.properties
index abb4b8d88..b43a66459 100644
--- a/apps/optimizer/schedulerapp/src/main/resources/application.properties
+++ b/apps/optimizer/schedulerapp/src/main/resources/application.properties
@@ -6,9 +6,9 @@ spring.datasource.username=${OPTIMIZER_DB_USER:sa}
 spring.datasource.password=${OPTIMIZER_DB_PASSWORD:}
 spring.jpa.hibernate.ddl-auto=none
 optimizer.scheduler.jobs.base-uri=${JOBS_BASE_URI:http://localhost:8002}
-# Per-bin caps for ORPHAN_FILES_DELETION. 0 disables the dimension; see FirstFitDecreasingBinPacker.
-optimizer.scheduler.ofd.max-weight-per-bin=${SCHEDULER_OFD_MAX_WEIGHT_PER_BIN:1000000}
-optimizer.scheduler.ofd.max-size-bytes-per-bin=${SCHEDULER_OFD_MAX_SIZE_BYTES_PER_BIN:5497558138880}
-optimizer.scheduler.ofd.max-items-per-bin=${SCHEDULER_OFD_MAX_ITEMS_PER_BIN:50}
+# Per-bin caps for ORPHAN_FILES_DELETION; 0 disables a dimension. File count is the OFD cost
+# driver — per-file list, manifest joins, and delete calls dominate, independent of file size.
+optimizer.scheduler.ofd.max-files-per-bin=${SCHEDULER_OFD_MAX_FILES_PER_BIN:1000000}
+optimizer.scheduler.ofd.max-tables-per-bin=${SCHEDULER_OFD_MAX_TABLES_PER_BIN:50}
 optimizer.scheduler.results-endpoint=${SCHEDULER_RESULTS_ENDPOINT:http://openhouse-optimizer:8080/v1/optimizer/operations}
 optimizer.scheduler.cluster-id=${SCHEDULER_CLUSTER_ID:LocalHadoopCluster}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
new file mode 100644
index 000000000..a449d0c67
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
@@ -0,0 +1,59 @@
+package com.linkedin.openhouse.optimizer.operations.ofd;
+
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
+import lombok.NonNull;
+import lombok.ToString;
+
+/**
+ * OFD-specific {@link BinItem}: carries only what the downstream Spark dispatch needs (table fqtn,
+ * operation id) plus the weight the packer uses (current file count). Self-weights from a paired
+ * {@link TableOperationDto} and {@link TableStatsDto} via {@link #from(TableOperationDto,
+ * TableStatsDto)} so the projection logic lives here rather than in the scheduler.
+ *
+ * <p>The weighting choice — file count, not bytes — reflects what makes OFD expensive: per-file
+ * listing, manifest joins, and delete calls scale with file count. A 10 GB table with 100k files is
+ * more expensive to OFD than a 1 TB table with 2k files.
+ */
+@AllArgsConstructor
+@Getter
+@ToString
+public class OfdBinItem implements BinItem {
+
+  /** Fully-qualified {@code database.table} identifier passed as {@code --tableNames}. */
+  @NonNull private final String fqtn;
+
+  /**
+   * Optimizer operation id passed as {@code --operationIds}; the Spark app POSTs back keyed on it.
+   */
+  @NonNull private final String operationId;
+
+  /** Current file count for this table; the FFD packer's cost dimension. */
+  private final long weight;
+
+  /**
+   * Project a pending operation + its stats row into a packable item. Callers do {@code
+   * pendingOps.stream().map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))} — the
+   * weighting decision lives entirely in this class.
+   */
+  public static OfdBinItem from(TableOperationDto op, TableStatsDto stats) {
+    return new OfdBinItem(
+        op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
+  }
+
+  private static long currentFileCount(TableStatsDto stats) {
+    if (stats == null || stats.getSnapshot() == null) {
+      return 0L;
+    }
+    Long files = stats.getSnapshot().getNumCurrentFiles();
+    return files != null ? files : 0L;
+  }
+
+  @Override
+  public long getWeight() {
+    return weight;
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index e8a5e3f6f..8d99ca26d 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -6,6 +6,7 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
@@ -28,15 +29,15 @@
 
 /**
  * For one operation type per call, reads PENDING rows, looks up per-table stats, projects each into
- * a {@link BinItem}, dispatches to the registered {@link BinPacker}, and submits one Spark job per
- * returned {@link Bin}. The {@link com.linkedin.openhouse.optimizer.scheduler.SchedulerApplication}
- * 's CommandLineRunner loops over the registered packers and invokes {@code schedule(opType)} for
- * each.
+ * the op-type's {@link BinItem} impl, dispatches to the registered {@link BinPacker}, and submits
+ * one Spark job per returned {@link Bin}. The {@link SchedulerApplication}'s CommandLineRunner
+ * loops over the registered packers and invokes {@code schedule(opType)} for each.
  *
  * <p>The runner owns all optimizer-specific orchestration — claim CAS, status transitions, and the
- * actual {@link JobsServiceClient#launch} call. The bin packer is a pure utility over a flat list
- * of {@link BinItem}s, deliberately decoupled from operation types and JPA rows so the same packer
- * can be shared with the existing {@code JobsScheduler} flow.
+ * actual {@link JobsServiceClient#launch} call. Per-op-type projection (build the right {@link
+ * BinItem} impl from an op + stats) and dispatch live in op-specific sub-methods; today there is
+ * only OFD, and the per-op switch is a TODO to factor into an {@code OperationScheduler<T>} handler
+ * once a second op type lands.
  */
 @Slf4j
 @Component
@@ -44,14 +45,14 @@ public class SchedulerRunner {
   private final TableOperationsRepository operationsRepo;
   private final TableStatsRepository statsRepo;
   private final JobsServiceClient jobsClient;
-  private final Map<OperationTypeDto, BinPacker> binPackers;
+  private final Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers;
   private final String resultsEndpoint;
 
   public SchedulerRunner(
       TableOperationsRepository operationsRepo,
       TableStatsRepository statsRepo,
       JobsServiceClient jobsClient,
-      Map<OperationTypeDto, BinPacker> binPackers,
+      Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers,
       @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
     this.operationsRepo = operationsRepo;
     this.statsRepo = statsRepo;
@@ -74,7 +75,7 @@ public void schedule(OperationTypeDto operationType) {
   public void schedule(
       OperationTypeDto operationType, Optional<String> databaseName, Optional<String> tableName) {
 
-    BinPacker packer = binPackers.get(operationType);
+    BinPacker<? extends BinItem> packer = binPackers.get(operationType);
     if (packer == null) {
       throw new IllegalStateException(
           "No BinPacker registered for operation type " + operationType);
@@ -111,17 +112,17 @@ public void schedule(
         survivors.stream().map(TableOperationDto::fromRow).collect(Collectors.toList());
 
     // Tradeoff: we fetch fresh table_stats per scheduling cycle (one batched query) rather than
-    // denormalizing the relevant fields onto TableOperationDto. The denormalized alternative would
-    // remove the per-cycle lookup but widen the TableOperationDto row and serve staler data; the
-    // current shape favors smaller operations + freshness over fewer queries.
+    // denormalizing the relevant fields onto TableOperationDto. The denormalized alternative
+    // would remove the per-cycle lookup but widen the TableOperationDto row and serve staler
+    // data; the current shape favors smaller operations + freshness over fewer queries.
     Set<String> uuids =
         pending.stream().map(TableOperationDto::getTableUuid).collect(Collectors.toSet());
     Map<String, TableStatsDto> statsByUuid =
         statsRepo.findAllById(uuids).stream()
             .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
 
-    // Filter at the boundary so every BinItem is built from a known-non-null stats row. A table
-    // without a stats row gets skipped this cycle and reconsidered after stats land.
+    // Filter at the boundary so every projection is built from a known-non-null stats row. A
+    // table without a stats row gets skipped this cycle and reconsidered after stats land.
     List<TableOperationDto> withStats =
         pending.stream()
             .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
@@ -136,45 +137,36 @@ public void schedule(
       return;
     }
 
-    List<BinItem> items =
-        withStats.stream()
-            .map(op -> toBinItem(op, statsByUuid.get(op.getTableUuid())))
-            .collect(Collectors.toList());
+    // TODO: when a second op type lands, factor each branch into an OperationScheduler<T extends
+    // BinItem> handler (own projection + own submit). Today's switch is the one place we narrow
+    // the wildcard packer to a concrete BinItem impl; the cast is safe by SchedulerConfig's
+    // registration invariant (the packer for ORPHAN_FILES_DELETION is built as a
+    // BinPacker<OfdBinItem>).
+    switch (operationType) {
+      case ORPHAN_FILES_DELETION:
+        @SuppressWarnings("unchecked")
+        BinPacker<OfdBinItem> ofdPacker = (BinPacker<OfdBinItem>) packer;
+        scheduleOfd(ofdPacker, withStats, statsByUuid);
+        return;
+      default:
+        throw new IllegalStateException(
+            "No scheduling handler for operation type " + operationType);
+    }
+  }
 
-    List<Bin> bins = packer.pack(items);
-    log.info(
-        "Packed {} PENDING {} operations into {} bins", items.size(), operationType, bins.size());
+  private void scheduleOfd(
+      BinPacker<OfdBinItem> packer,
+      List<TableOperationDto> withStats,
+      Map<String, TableStatsDto> statsByUuid) {
 
-    bins.forEach(bin -> submitBin(operationType, bin));
-  }
+    List<OfdBinItem> items =
+        withStats.stream()
+            .map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
+            .collect(Collectors.toList());
+    List<Bin<OfdBinItem>> bins = packer.pack(items);
+    log.info("Packed {} PENDING OFD operations into {} bins", items.size(), bins.size());
 
-  /**
-   * Project an (operation, stats) pair into the packer's input row. Weight is current file count
-   * (the packing dimension OFD cares about); sizeBytes is the on-disk footprint when stats expose
-   * it, else 0.
-   */
-  private static BinItem toBinItem(TableOperationDto op, TableStatsDto stats) {
-    long weight = 0L;
-    long sizeBytes = 0L;
-    if (stats != null && stats.getSnapshot() != null) {
-      Long files = stats.getSnapshot().getNumCurrentFiles();
-      if (files != null) {
-        weight = files;
-      }
-      Long bytes = stats.getSnapshot().getTableSizeBytes();
-      if (bytes != null) {
-        sizeBytes = bytes;
-      }
-    }
-    return BinItem.builder()
-        .fqtn(op.getDatabaseName() + "." + op.getTableName())
-        .operationId(op.getId())
-        .tableUuid(op.getTableUuid())
-        .databaseName(op.getDatabaseName())
-        .tableName(op.getTableName())
-        .weight(weight)
-        .sizeBytes(sizeBytes)
-        .build();
+    bins.forEach(this::submitOfdBin);
   }
 
   /**
@@ -212,12 +204,12 @@ private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendi
   }
 
   /**
-   * Claim the bin, narrow to the rows actually claimed, launch the batched Spark job for the
-   * claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed.
+   * Claim a bin of OFD work, narrow to the rows actually claimed, launch the batched Spark job for
+   * the claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed.
    */
-  private void submitBin(OperationTypeDto operationType, Bin bin) {
+  private void submitOfdBin(Bin<OfdBinItem> bin) {
     List<String> ids =
-        bin.items().stream().map(BinItem::getOperationId).collect(Collectors.toList());
+        bin.items().stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
 
     // Claim in one batched UPDATE: PENDING → SCHEDULING. Aggregate row count alone doesn't tell us
     // *which* rows we own — re-query for SCHEDULING rows tagged with our scheduledAt watermark.
@@ -258,19 +250,19 @@ private void submitBin(OperationTypeDto operationType, Bin bin) {
 
     // Narrow the bin's items to the rows we actually own before extracting Spark-args.
     Set<String> claimedSet = new HashSet<>(claimedIds);
-    List<BinItem> claimedItems =
+    List<OfdBinItem> claimedItems =
         bin.items().stream()
             .filter(item -> claimedSet.contains(item.getOperationId()))
             .collect(Collectors.toList());
     List<String> tableNames =
-        claimedItems.stream().map(BinItem::getFqtn).collect(Collectors.toList());
+        claimedItems.stream().map(OfdBinItem::getFqtn).collect(Collectors.toList());
     List<String> operationIds =
-        claimedItems.stream().map(BinItem::getOperationId).collect(Collectors.toList());
+        claimedItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
 
-    String jobName =
-        "batched-" + operationType.name().toLowerCase() + "-" + claimedAt.toEpochMilli();
+    String operationTypeName = OperationTypeDto.ORPHAN_FILES_DELETION.name();
+    String jobName = "batched-" + operationTypeName.toLowerCase() + "-" + claimedAt.toEpochMilli();
     Optional<String> jobId =
-        jobsClient.launch(jobName, operationType.name(), tableNames, operationIds, resultsEndpoint);
+        jobsClient.launch(jobName, operationTypeName, tableNames, operationIds, resultsEndpoint);
 
     if (jobId.isPresent()) {
       int updated =
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
index 4b94ebb4b..584d8cc09 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
@@ -9,41 +9,35 @@
 /**
  * Mutable accumulator used by a {@link BinPacker} while assembling a batch. Callers receiving a
  * packed list of {@code Bin}s treat them as read-only — {@link #items()} returns an unmodifiable
- * view, and the running totals are exposed only via getters.
+ * view and the running total is exposed only via the getter.
  *
- * <p>Structurally identical to {@code jobs.util.binpack.Bin} introduced by PR&nbsp;#599; see the
- * note on {@link BinItem} for the swap-out plan.
+ * @param <T> concrete {@link BinItem} implementation carried by this bin
  */
 @ToString
-public class Bin {
-  private final List<BinItem> items = new ArrayList<>();
+public class Bin<T extends BinItem> {
+  private final List<T> items = new ArrayList<>();
   @Getter private long totalWeight;
-  @Getter private long totalSizeBytes;
 
   /**
-   * Returns true iff adding {@code item} would keep this bin at or below all three caps. A cap of
-   * {@code <= 0} disables that dimension.
+   * Returns true iff adding {@code item} keeps the bin at or below both caps. A cap of {@code <= 0}
+   * disables that dimension.
    */
-  boolean fits(BinItem item, long maxWeight, long maxSizeBytes, int maxItems) {
+  boolean fits(T item, long maxWeight, int maxItems) {
     if (maxItems > 0 && items.size() >= maxItems) {
       return false;
     }
     if (maxWeight > 0 && totalWeight + item.getWeight() > maxWeight) {
       return false;
     }
-    if (maxSizeBytes > 0 && totalSizeBytes + item.getSizeBytes() > maxSizeBytes) {
-      return false;
-    }
     return true;
   }
 
-  void add(BinItem item) {
+  void add(T item) {
     items.add(item);
     totalWeight += item.getWeight();
-    totalSizeBytes += item.getSizeBytes();
   }
 
-  public List<BinItem> items() {
+  public List<T> items() {
     return Collections.unmodifiableList(items);
   }
 
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
index 01d1d154d..72f4de278 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -1,42 +1,13 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import lombok.Builder;
-import lombok.Getter;
-import lombok.NonNull;
-import lombok.ToString;
-
 /**
- * A single packable unit for a {@link BinPacker}. Carries enough identity for downstream consumers
- * (the optimizer scheduler dispatching Spark, the existing JobsScheduler, an offline analyzer) to
- * resolve the underlying table and report results without re-reading optimizer state.
- *
- * <p>{@link #weight} is the primary bin-packing dimension (for orphan files deletion: the number of
- * current files in the table). {@link #sizeBytes} is a secondary capacity dimension so a packer can
- * cap the on-disk footprint of a bin independently of file count.
- *
- * <p>This type is structurally identical to {@code jobs.util.binpack.BinItem} introduced by
- * PR&nbsp;#599. When that PR merges, this class becomes a redundant copy and we should switch the
- * scheduler to import the common one.
+ * Smallest contract a {@link BinPacker} needs from each unit it packs: a single non-negative
+ * weight. Implementations are operation-specific (see {@code
+ * com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem}) and encode their own cost model in
+ * {@link #getWeight()}. They also carry whatever identity the downstream dispatcher needs (table
+ * name, operation id, etc.); those getters live on the impl, not on this interface, so the packer
+ * stays a pure utility.
  */
-@Getter
-@Builder
-@ToString
-public class BinItem {
-  /** Fully-qualified {@code database.table} identifier the batched Spark app will load. */
-  @NonNull private final String fqtn;
-
-  /** Optimizer operation id; the Spark app POSTs its outcome back keyed on this. */
-  @NonNull private final String operationId;
-
-  /** Stable table identity for stats lookup and history correlation. */
-  @NonNull private final String tableUuid;
-
-  @NonNull private final String databaseName;
-  @NonNull private final String tableName;
-
-  /** Primary packing cost — for OFD this is the table's current file count. */
-  private final long weight;
-
-  /** Secondary packing cost — on-disk size in bytes. {@code 0} when unknown. */
-  private final long sizeBytes;
+public interface BinItem {
+  long getWeight();
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index d32193c9d..41b910385 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -4,14 +4,15 @@
 
 /**
  * Strategy interface for grouping a flat list of {@link BinItem}s into one or more {@link Bin}s.
- * Implementations encode the per-bin caps (file count, byte size, item count, etc.) and the
- * placement algorithm; callers iterate the returned bins and dispatch one batch per bin.
+ * Implementations encode the per-bin caps (weight, items, etc.) and the placement algorithm;
+ * callers iterate the returned bins and dispatch one batch per bin.
  *
- * <p>The interface does not reference any optimizer-specific types (operations, statuses,
- * repositories). Adapter code in the scheduler maps its domain objects into {@code BinItem}s before
- * calling and maps results back to operation ids after.
+ * <p>Parametric on the {@link BinItem} impl so the packer, bins, and items are all type-consistent
+ * end-to-end: the dispatch site receives {@code List<Bin<T>>} and never downcasts items.
+ *
+ * @param <T> concrete {@link BinItem} implementation packed by this packer
  */
-public interface BinPacker {
+public interface BinPacker<T extends BinItem> {
   /** Pack {@code items} into one or more bins. Each returned bin is non-empty. */
-  List<Bin> pack(List<BinItem> items);
+  List<Bin<T>> pack(List<T> items);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
index 04ae33c21..4325bae96 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
@@ -3,68 +3,68 @@
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;
-import java.util.stream.Collectors;
 import lombok.Builder;
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * First-fit-decreasing bin packer with three independent caps:
+ * Generic first-fit-decreasing bin packer with two independent caps:
  *
  * <ul>
- *   <li>{@code maxWeightPerBin} — total {@link BinItem#getWeight()} (for OFD: file count)
- *   <li>{@code maxSizeBytesPerBin} — total on-disk size of all items in the bin
+ *   <li>{@code maxWeightPerBin} — total {@link BinItem#getWeight()} per bin
  *   <li>{@code maxItemsPerBin} — number of items per bin
  * </ul>
  *
  * <p>Pass {@code 0} or a negative value for any cap to disable that dimension.
  *
- * <p>An item that exceeds any single cap on its own is placed into a bin by itself rather than
+ * <p>An item that exceeds the weight cap on its own is placed into a bin by itself rather than
  * dropped — the scheduler never silently skips maintenance work for an oversized table.
  *
- * <p>Structurally mirrors {@code jobs.util.binpack.FirstFitDecreasingBinPacker} from PR&nbsp;#599.
+ * <p>The pack body is a single stream pipeline: sort decreasing by weight, then fold each item into
+ * the running list of bins via {@code Stream.collect(Supplier, BiConsumer, BiConsumer)}. The
+ * stream must stay sequential: the {@code List::addAll} combiner only concatenates partial bin
+ * lists without re-packing them, so a parallel stream would lose first-fit placement.
+ *
+ * @param <T> concrete {@link BinItem} implementation packed by this packer
  */
 @Slf4j
 @Builder
-public class FirstFitDecreasingBinPacker implements BinPacker {
+public class FirstFitDecreasingBinPacker<T extends BinItem> implements BinPacker<T> {
 
   @Builder.Default private final long maxWeightPerBin = 1_000_000L;
-  @Builder.Default private final long maxSizeBytesPerBin = 5L * 1024L * 1024L * 1024L * 1024L;
   @Builder.Default private final int maxItemsPerBin = 50;
 
   @Override
-  public List<Bin> pack(List<BinItem> items) {
+  public List<Bin<T>> pack(List<T> items) {
     if (items == null || items.isEmpty()) {
       return new ArrayList<>();
     }
-
-    List<BinItem> sorted =
+    List<Bin<T>> bins =
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
-            .collect(Collectors.toList());
-
-    List<Bin> bins = new ArrayList<>();
-    for (BinItem item : sorted) {
-      Bin target = null;
-      for (Bin bin : bins) {
-        if (bin.fits(item, maxWeightPerBin, maxSizeBytesPerBin, maxItemsPerBin)) {
-          target = bin;
-          break;
-        }
-      }
-      if (target == null) {
-        target = new Bin();
-        bins.add(target);
-        if (!target.fits(item, maxWeightPerBin, maxSizeBytesPerBin, maxItemsPerBin)) {
-          log.warn(
-              "Item exceeds per-bin caps on its own; placing in dedicated bin: fqtn={} weight={} sizeBytes={}",
-              item.getFqtn(),
-              item.getWeight(),
-              item.getSizeBytes());
-        }
-      }
-      target.add(item);
-    }
+            .collect(ArrayList::new, this::placeItem, List::addAll);
     log.info("Packed {} items into {} bins", items.size(), bins.size());
     return bins;
   }
+
+  /**
+   * Place {@code item} into the first bin that can hold it; if none, open a fresh bin. Mutates
+   * {@code bins} — used as the accumulator step of the {@code pack} fold.
+   */
+  private void placeItem(List<Bin<T>> bins, T item) {
+    bins.stream()
+        .filter(b -> b.fits(item, maxWeightPerBin, maxItemsPerBin))
+        .findFirst()
+        .ifPresentOrElse(
+            b -> b.add(item),
+            () -> {
+              Bin<T> fresh = new Bin<>();
+              if (!fresh.fits(item, maxWeightPerBin, maxItemsPerBin)) {
+                log.warn(
+                    "Item exceeds the weight cap on its own; placing in dedicated bin: weight={}",
+                    item.getWeight());
+              }
+              fresh.add(item);
+              bins.add(fresh);
+            });
+  }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index f39734f34..9dcc632b8 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,6 +1,8 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
@@ -19,17 +21,13 @@ public class SchedulerConfig {
   @Value("${optimizer.scheduler.cluster-id}")
   private String clusterId;
 
-  /** OFD bin packer: max files per bin (primary cost dimension). 0 disables. */
-  @Value("${optimizer.scheduler.ofd.max-weight-per-bin:1000000}")
-  private long ofdMaxWeightPerBin;
+  /** Max total current file count across all tables in one batched OFD Spark job. 0 disables. */
+  @Value("${optimizer.scheduler.ofd.max-files-per-bin:1000000}")
+  private long ofdMaxFilesPerBin;
 
-  /** OFD bin packer: max on-disk size per bin in bytes. 0 disables. */
-  @Value("${optimizer.scheduler.ofd.max-size-bytes-per-bin:5497558138880}")
-  private long ofdMaxSizeBytesPerBin;
-
-  /** OFD bin packer: max tables per bin. 0 disables. */
-  @Value("${optimizer.scheduler.ofd.max-items-per-bin:50}")
-  private int ofdMaxItemsPerBin;
+  /** Max number of tables per batched OFD Spark job. 0 disables. */
+  @Value("${optimizer.scheduler.ofd.max-tables-per-bin:50}")
+  private int ofdMaxTablesPerBin;
 
   @Bean
   public WebClient jobsWebClient() {
@@ -42,18 +40,20 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
   }
 
   /**
-   * Map of {@link OperationTypeDto} to the {@link BinPacker} strategy that handles it. Adding a new
-   * operation type means adding an entry here and configuring its packer caps; the packer itself
-   * stays generic over {@link com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem}.
+   * Map of {@link OperationTypeDto} to the {@link BinPacker} strategy that handles it. The packer
+   * is parametric on the op type's concrete {@link BinItem} impl; the map value uses a wildcard
+   * because heterogeneous parametric values aren't expressible directly. {@link
+   * com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner} narrows back to the concrete type
+   * at dispatch. Adding a new operation type means adding an entry here, an impl of {@link
+   * BinItem}, and a {@code scheduleXxx} branch in the runner.
    */
   @Bean
-  public Map<OperationTypeDto, BinPacker> binPackers() {
+  public Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers() {
     return Map.of(
         OperationTypeDto.ORPHAN_FILES_DELETION,
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(ofdMaxWeightPerBin)
-            .maxSizeBytesPerBin(ofdMaxSizeBytesPerBin)
-            .maxItemsPerBin(ofdMaxItemsPerBin)
+        FirstFitDecreasingBinPacker.<OfdBinItem>builder()
+            .maxWeightPerBin(ofdMaxFilesPerBin)
+            .maxItemsPerBin(ofdMaxTablesPerBin)
             .build());
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 4835273c6..de1ffefe7 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -15,6 +15,7 @@
 import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
 import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
@@ -45,15 +46,14 @@ class SchedulerRunnerTest {
   @Mock private TableOperationsRepository operationsRepo;
   @Mock private TableStatsRepository statsRepo;
   @Mock private JobsServiceClient jobsClient;
-  @Mock private BinPacker binPacker;
+  @Mock private BinPacker<OfdBinItem> binPacker;
 
   private SchedulerRunner runner;
 
   @BeforeEach
   void setUp() {
-    runner =
-        new SchedulerRunner(
-            operationsRepo, statsRepo, jobsClient, Map.of(OFD, binPacker), RESULTS_ENDPOINT);
+    Map<OperationTypeDto, BinPacker<? extends BinItem>> packers = Map.of(OFD, binPacker);
+    runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, packers, RESULTS_ENDPOINT);
   }
 
   // ---- Stubbing helpers ----
@@ -87,19 +87,17 @@ private void stubFindClaimed(List<TableOperationsRow> rows) {
   }
 
   /**
-   * Stubs the bin packer to put every input item into a single bin, by routing through a real FFD
-   * packer with unbounded caps. Lets the test exercise the runner's projection (op → BinItem)
-   * without bypassing Bin's package-private mutators.
+   * Stubs the mock packer by routing through a real FFD packer with unbounded caps, so the runner's
+   * op→OfdBinItem projection is exercised without bypassing Bin's package-private mutators.
    */
   private void stubOneBinForAllItems() {
-    FirstFitDecreasingBinPacker realPacker =
-        FirstFitDecreasingBinPacker.builder()
+    FirstFitDecreasingBinPacker<OfdBinItem> realPacker =
+        FirstFitDecreasingBinPacker.<OfdBinItem>builder()
             .maxWeightPerBin(0L)
-            .maxSizeBytesPerBin(0L)
             .maxItemsPerBin(0)
             .build();
     when(binPacker.pack(anyList()))
-        .thenAnswer(inv -> realPacker.pack(inv.<List<BinItem>>getArgument(0)));
+        .thenAnswer(inv -> realPacker.pack(inv.<List<OfdBinItem>>getArgument(0)));
   }
 
   private TableOperationsRow pendingRow(String uuid, String db, String table) {
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
index 1c18eb63d..3bef7195b 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
@@ -4,46 +4,44 @@
 
 import java.util.List;
 import java.util.stream.Collectors;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
 import org.junit.jupiter.api.Test;
 
 class FirstFitDecreasingBinPackerTest {
 
-  private static BinItem item(String id, long weight) {
-    return item(id, weight, 0L);
+  @AllArgsConstructor
+  @Getter
+  static class TestItem implements BinItem {
+    private final String id;
+    private final long weight;
   }
 
-  private static BinItem item(String id, long weight, long sizeBytes) {
-    return BinItem.builder()
-        .fqtn("db.tbl_" + id)
-        .operationId("op-" + id)
-        .tableUuid("uuid-" + id)
-        .databaseName("db")
-        .tableName("tbl_" + id)
-        .weight(weight)
-        .sizeBytes(sizeBytes)
-        .build();
+  private static TestItem item(String id, long weight) {
+    return new TestItem(id, weight);
   }
 
   @Test
   void emptyInput_returnsEmptyBins() {
-    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().build();
     assertThat(packer.pack(List.of())).isEmpty();
   }
 
   @Test
   void singleItem_oneBin() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin<TestItem>> bins = packer.pack(List.of(item("a", 100L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(1);
   }
 
   @Test
   void underWeightLimit_oneBin() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins =
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin<TestItem>> bins =
         packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(3);
@@ -52,59 +50,45 @@ void underWeightLimit_oneBin() {
 
   @Test
   void overWeightLimit_twoBins() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins =
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin<TestItem>> bins =
         packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
     assertThat(bins).hasSize(2);
-    // FFD: largest first, place 600k → bin0; next 600k doesn't fit bin0, → bin1; 400k fits bin0.
+    // FFD: sort desc → 600k, 600k, 400k. Place 600k → bin0; next 600k doesn't fit bin0 → bin1;
+    // 400k fits bin0 (total 1_000_000).
     assertThat(bins.get(0).getTotalWeight()).isEqualTo(1_000_000L);
     assertThat(bins.get(1).getTotalWeight()).isEqualTo(600_000L);
   }
 
   @Test
   void itemLargerThanCap_getsOwnBin() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).build();
-    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000L).build();
+    List<Bin<TestItem>> bins = packer.pack(List.of(item("big", 5_000L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(1);
   }
 
   @Test
   void sortedDescending_largestFirst() {
-    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
-    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder().build();
+    List<Bin<TestItem>> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
     assertThat(bins).hasSize(1);
-    List<String> uuids =
-        bins.get(0).items().stream().map(BinItem::getTableUuid).collect(Collectors.toList());
-    assertThat(uuids).containsExactly("uuid-large", "uuid-small");
-  }
-
-  @Test
-  void sizeBytesCap_splitsBins() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(0L) // disable
-            .maxSizeBytesPerBin(1_000L)
-            .maxItemsPerBin(0)
-            .build();
-    List<Bin> bins =
-        packer.pack(List.of(item("a", 0L, 600L), item("b", 0L, 500L), item("c", 0L, 400L)));
-    assertThat(bins).hasSize(2);
-    assertThat(bins.get(0).getTotalSizeBytes()).isEqualTo(1_000L); // 600 + 400
-    assertThat(bins.get(1).getTotalSizeBytes()).isEqualTo(500L);
+    List<String> ids =
+        bins.get(0).items().stream().map(TestItem::getId).collect(Collectors.toList());
+    assertThat(ids).containsExactly("large", "small");
   }
 
   @Test
   void maxItemsCap_splitsBins() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder()
             .maxWeightPerBin(0L)
-            .maxSizeBytesPerBin(0L)
             .maxItemsPerBin(2)
             .build();
-    List<Bin> bins =
+    List<Bin<TestItem>> bins =
         packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
     assertThat(bins).hasSize(2);
     assertThat(bins.get(0).size()).isEqualTo(2);
@@ -113,18 +97,14 @@ void maxItemsCap_splitsBins() {
 
   @Test
   void zeroCap_disablesDimension() {
-    // All caps zero → everything in one bin regardless of weight/size.
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
+    // All caps zero → everything in one bin regardless of weight.
+    FirstFitDecreasingBinPacker<TestItem> packer =
+        FirstFitDecreasingBinPacker.<TestItem>builder()
             .maxWeightPerBin(0L)
-            .maxSizeBytesPerBin(0L)
             .maxItemsPerBin(0)
             .build();
-    List<Bin> bins =
-        packer.pack(
-            List.of(
-                item("a", Long.MAX_VALUE / 4, Long.MAX_VALUE / 4),
-                item("b", Long.MAX_VALUE / 4, Long.MAX_VALUE / 4)));
+    List<Bin<TestItem>> bins =
+        packer.pack(List.of(item("a", Long.MAX_VALUE / 4), item("b", Long.MAX_VALUE / 4)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(2);
   }
diff --git a/services/optimizer/scheduler/src/test/resources/application-test.properties b/services/optimizer/scheduler/src/test/resources/application-test.properties
index db4e3136c..57354728e 100644
--- a/services/optimizer/scheduler/src/test/resources/application-test.properties
+++ b/services/optimizer/scheduler/src/test/resources/application-test.properties
@@ -5,8 +5,7 @@ spring.jpa.hibernate.ddl-auto=none
 spring.sql.init.mode=always
 spring.sql.init.schema-locations=classpath:db/optimizer-schema.sql
 optimizer.scheduler.jobs.base-uri=http://localhost:9999
-optimizer.scheduler.ofd.max-weight-per-bin=1000000
-optimizer.scheduler.ofd.max-size-bytes-per-bin=5497558138880
-optimizer.scheduler.ofd.max-items-per-bin=50
+optimizer.scheduler.ofd.max-files-per-bin=1000000
+optimizer.scheduler.ofd.max-tables-per-bin=50
 optimizer.scheduler.results-endpoint=http://localhost:8080/v1/optimizer/operations
 optimizer.scheduler.cluster-id=test-cluster

From 1c68d109cde30da7966691febe36fc8bd502683b Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 07:32:19 -0700
Subject: [PATCH 03/13] refactor(scheduler): drop type parameters;
 Bin/BinPacker operate on BinItem directly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follow-up to the prior commit: drop <T extends BinItem> from Bin,
BinPacker, FirstFitDecreasingBinPacker. The interface alone is enough
for the polymorphism we wanted — per-op-type impls (OfdBinItem and
future siblings) implement BinItem; the packer never knows the concrete
type. The dispatcher narrows once at access time.

What changed:

- Bin → non-generic; items() returns List<BinItem>; fits/add take BinItem.
- BinPacker → non-generic interface, `List<Bin> pack(List<? extends BinItem>)`.
  The wildcard on the input is the standard PECS shape so callers can
  pass `List<OfdBinItem>` directly without fighting invariance at the
  call site.
- FirstFitDecreasingBinPacker → non-generic, otherwise unchanged
  (functional stream pack body preserved).
- SchedulerConfig → `Map<OperationTypeDto, BinPacker>` (no wildcard);
  `FirstFitDecreasingBinPacker.builder()` no longer needs a type witness.
- SchedulerApplication → `Map<OperationTypeDto, BinPacker>` likewise.
- SchedulerRunner → drop generics on the map and on submitOfdBin. The
  per-op-type switch no longer needs a `BinPacker<OfdBinItem>` cast; the
  downcast to OfdBinItem happens once at the top of submitOfdBin via
  `bin.items().stream().map(OfdBinItem.class::cast).collect(toList())`,
  then everything downstream uses OfdBinItem directly. The safety
  invariant is unchanged (the OFD packer only ever receives OfdBinItem
  instances, which scheduleOfd builds); it is now enforced by a checked
  runtime cast instead of a suppressed unchecked cast at the generic
  boundary.
- Tests updated; FirstFitDecreasingBinPackerTest's TestItem stays the
  same shape but the packer/bins are typed plainly.
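
A minimal sketch of the invariance point behind the wildcard on
`pack`, assuming hypothetical stand-ins `Item` (for BinItem) and
`OfdItem` (for OfdBinItem); neither is the real scheduler type, and the
two `total*` helpers exist only to contrast the parameter shapes:

```java
import java.util.ArrayList;
import java.util.List;

public class WildcardSketch {
  /** Hypothetical stand-in for BinItem. */
  interface Item {
    long getWeight();
  }

  /** Hypothetical stand-in for OfdBinItem. */
  static final class OfdItem implements Item {
    private final long weight;

    OfdItem(long weight) {
      this.weight = weight;
    }

    @Override
    public long getWeight() {
      return weight;
    }
  }

  /** Invariant parameter: accepts List<Item> only, not List<OfdItem>. */
  static long totalInvariant(List<Item> items) {
    return items.stream().mapToLong(Item::getWeight).sum();
  }

  /** Wildcard parameter: accepts a list of Item or of any subtype. */
  static long totalWildcard(List<? extends Item> items) {
    return items.stream().mapToLong(Item::getWeight).sum();
  }

  public static void main(String[] args) {
    List<OfdItem> typed = List.of(new OfdItem(10L), new OfdItem(20L));

    // totalInvariant(typed); // does not compile: List<OfdItem> is not a List<Item>
    System.out.println(totalWildcard(typed)); // prints 30

    // The invariant form needs an explicit widening copy first.
    List<Item> widened = new ArrayList<Item>(typed);
    System.out.println(totalInvariant(widened)); // prints 30
  }
}
```

The wildcard form only reads items out of the list, which is the
producer side of PECS, so nothing is lost by accepting subtypes.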

Why: discussed in PR #626. The type parameter on Bin/BinPacker bought
compile-time `T`-consistency through the packer pipeline, but at the
cost of `Map<OperationTypeDto, BinPacker<? extends BinItem>>` and an
`@SuppressWarnings("unchecked")` cast at the switch boundary. With one
op type today, the cleaner shape is no generics on the type and one
explicit cast at the access site — the cast's locality makes the
invariant obvious. If a future op type adds its own subtle access
pattern, we can revisit per-handler abstraction (OperationScheduler).
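
To illustrate why a single explicit cast keeps the invariant local,
here is a hedged sketch using hypothetical stand-ins (`Item`, `OfdItem`,
`OtherItem`), not the real scheduler classes. `Class::cast` checks each
element when the stream runs, so a bin holding the wrong item type
fails at the narrowing step rather than somewhere downstream:

```java
import java.util.List;
import java.util.stream.Collectors;

public class CastLocalitySketch {
  /** Hypothetical stand-in for BinItem. */
  interface Item {
    long getWeight();
  }

  /** Hypothetical stand-in for OfdBinItem. */
  static final class OfdItem implements Item {
    private final long weight;

    OfdItem(long weight) {
      this.weight = weight;
    }

    @Override
    public long getWeight() {
      return weight;
    }
  }

  /** Hypothetical second item type that must never reach the OFD path. */
  static final class OtherItem implements Item {
    @Override
    public long getWeight() {
      return 1L;
    }
  }

  /** Narrow once on entry; throws ClassCastException on a foreign item. */
  static List<OfdItem> narrow(List<Item> items) {
    return items.stream().map(OfdItem.class::cast).collect(Collectors.toList());
  }

  /** Returns true when narrowing rejects the list. */
  static boolean narrowFails(List<Item> items) {
    try {
      narrow(items);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<Item> ok = List.of(new OfdItem(1L));
    List<Item> mixed = List.of(new OfdItem(1L), new OtherItem());
    System.out.println(narrowFails(ok)); // prints false
    System.out.println(narrowFails(mixed)); // prints true
  }
}
```

By contrast, an unchecked generic cast is erased at compile time and is
not verified where it is written; a violation would surface later, at
the first use of an element as the wrong type.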

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scheduler/SchedulerApplication.java       |  6 +-
 .../optimizer/scheduler/SchedulerRunner.java  | 40 +++++++-------
 .../optimizer/scheduler/binpack/Bin.java      | 13 +++--
 .../scheduler/binpack/BinPacker.java          | 15 +++--
 .../binpack/FirstFitDecreasingBinPacker.java  | 23 ++++----
 .../scheduler/config/SchedulerConfig.java     | 13 ++---
 .../scheduler/SchedulerRunnerTest.java        | 16 ++----
 .../FirstFitDecreasingBinPackerTest.java      | 55 +++++++++----------
 8 files changed, 81 insertions(+), 100 deletions(-)

diff --git a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
index 8bda62779..e17ecd0fc 100644
--- a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
+++ b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
@@ -1,7 +1,6 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import java.util.Map;
 import lombok.extern.slf4j.Slf4j;
@@ -28,12 +27,11 @@
 public class SchedulerApplication implements CommandLineRunner, ExitCodeGenerator {
 
   private final SchedulerRunner runner;
-  private final Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers;
+  private final Map<OperationTypeDto, BinPacker> binPackers;
   private int exitCode = 0;
 
   @Autowired
-  public SchedulerApplication(
-      SchedulerRunner runner, Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers) {
+  public SchedulerApplication(SchedulerRunner runner, Map<OperationTypeDto, BinPacker> binPackers) {
     this.runner = runner;
     this.binPackers = binPackers;
   }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 8d99ca26d..5dd2fdda6 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -36,7 +36,7 @@
  * <p>The runner owns all optimizer-specific orchestration — claim CAS, status transitions, and the
  * actual {@link JobsServiceClient#launch} call. Per-op-type projection (build the right {@link
  * BinItem} impl from an op + stats) and dispatch live in op-specific sub-methods; today there is
- * only OFD, and the per-op switch is a TODO to factor into an {@code OperationScheduler<T>} handler
+ * only OFD, and the per-op switch is a TODO to factor into an {@code OperationScheduler} handler
  * once a second op type lands.
  */
 @Slf4j
@@ -45,14 +45,14 @@ public class SchedulerRunner {
   private final TableOperationsRepository operationsRepo;
   private final TableStatsRepository statsRepo;
   private final JobsServiceClient jobsClient;
-  private final Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers;
+  private final Map<OperationTypeDto, BinPacker> binPackers;
   private final String resultsEndpoint;
 
   public SchedulerRunner(
       TableOperationsRepository operationsRepo,
       TableStatsRepository statsRepo,
       JobsServiceClient jobsClient,
-      Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers,
+      Map<OperationTypeDto, BinPacker> binPackers,
       @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
     this.operationsRepo = operationsRepo;
     this.statsRepo = statsRepo;
@@ -75,7 +75,7 @@ public void schedule(OperationTypeDto operationType) {
   public void schedule(
       OperationTypeDto operationType, Optional<String> databaseName, Optional<String> tableName) {
 
-    BinPacker<? extends BinItem> packer = binPackers.get(operationType);
+    BinPacker packer = binPackers.get(operationType);
     if (packer == null) {
       throw new IllegalStateException(
           "No BinPacker registered for operation type " + operationType);
@@ -137,16 +137,13 @@ public void schedule(
       return;
     }
 
-    // TODO: when a second op type lands, factor each branch into an OperationScheduler<T extends
-    // BinItem> handler (own projection + own submit). Today's switch is the one place we narrow
-    // the wildcard packer to a concrete BinItem impl; the cast is safe by SchedulerConfig's
-    // registration invariant (the packer for ORPHAN_FILES_DELETION is built as a
-    // BinPacker<OfdBinItem>).
+    // TODO: when a second op type lands, factor each branch into an OperationScheduler handler
+    // (own projection + own submit). Today's switch is the only place that knows the concrete
+    // BinItem impl per op type; the downcasts inside submitOfdBin are safe by SchedulerConfig's
+    // registration invariant (the packer for ORPHAN_FILES_DELETION is fed OfdBinItem instances).
     switch (operationType) {
       case ORPHAN_FILES_DELETION:
-        @SuppressWarnings("unchecked")
-        BinPacker<OfdBinItem> ofdPacker = (BinPacker<OfdBinItem>) packer;
-        scheduleOfd(ofdPacker, withStats, statsByUuid);
+        scheduleOfd(packer, withStats, statsByUuid);
         return;
       default:
         throw new IllegalStateException(
@@ -155,15 +152,13 @@ public void schedule(
   }
 
   private void scheduleOfd(
-      BinPacker<OfdBinItem> packer,
-      List<TableOperationDto> withStats,
-      Map<String, TableStatsDto> statsByUuid) {
+      BinPacker packer, List<TableOperationDto> withStats, Map<String, TableStatsDto> statsByUuid) {
 
     List<OfdBinItem> items =
         withStats.stream()
             .map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
             .collect(Collectors.toList());
-    List<Bin<OfdBinItem>> bins = packer.pack(items);
+    List<Bin> bins = packer.pack(items);
     log.info("Packed {} PENDING OFD operations into {} bins", items.size(), bins.size());
 
     bins.forEach(this::submitOfdBin);
@@ -205,11 +200,16 @@ private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendi
 
   /**
    * Claim a bin of OFD work, narrow to the rows actually claimed, launch the batched Spark job for
-   * the claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed.
+   * the claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed. Items in
+   * the bin are typed as {@link BinItem}; we narrow once to {@link OfdBinItem} on entry since this
+   * method runs only on bins produced by the OFD packer (see {@link #schedule(OperationTypeDto,
+   * Optional, Optional)}).
    */
-  private void submitOfdBin(Bin<OfdBinItem> bin) {
+  private void submitOfdBin(Bin bin) {
+    List<OfdBinItem> ofdItems =
+        bin.items().stream().map(OfdBinItem.class::cast).collect(Collectors.toList());
     List<String> ids =
-        bin.items().stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
+        ofdItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
 
     // Claim in one batched UPDATE: PENDING → SCHEDULING. Aggregate row count alone doesn't tell us
     // *which* rows we own — re-query for SCHEDULING rows tagged with our scheduledAt watermark.
@@ -251,7 +251,7 @@ private void submitOfdBin(Bin<OfdBinItem> bin) {
     // Narrow the bin's items to the rows we actually own before extracting Spark-args.
     Set<String> claimedSet = new HashSet<>(claimedIds);
     List<OfdBinItem> claimedItems =
-        bin.items().stream()
+        ofdItems.stream()
             .filter(item -> claimedSet.contains(item.getOperationId()))
             .collect(Collectors.toList());
     List<String> tableNames =
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
index 584d8cc09..5ee0dbbe3 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
@@ -11,18 +11,19 @@
  * packed list of {@code Bin}s treat them as read-only — {@link #items()} returns an unmodifiable
  * view and the running total is exposed only via the getter.
  *
- * @param <T> concrete {@link BinItem} implementation carried by this bin
+ * <p>Items are typed at the interface level only ({@link BinItem}). Callers that need the concrete
+ * impl downcast at the access site; the per-op-type dispatcher owns that contract.
  */
 @ToString
-public class Bin<T extends BinItem> {
-  private final List<T> items = new ArrayList<>();
+public class Bin {
+  private final List<BinItem> items = new ArrayList<>();
   @Getter private long totalWeight;
 
   /**
    * Returns true iff adding {@code item} keeps the bin at or below both caps. A cap of {@code <= 0}
    * disables that dimension.
    */
-  boolean fits(T item, long maxWeight, int maxItems) {
+  boolean fits(BinItem item, long maxWeight, int maxItems) {
     if (maxItems > 0 && items.size() >= maxItems) {
       return false;
     }
@@ -32,12 +33,12 @@ boolean fits(T item, long maxWeight, int maxItems) {
     return true;
   }
 
-  void add(T item) {
+  void add(BinItem item) {
     items.add(item);
     totalWeight += item.getWeight();
   }
 
-  public List<T> items() {
+  public List<BinItem> items() {
     return Collections.unmodifiableList(items);
   }
 
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index 41b910385..15faffc0a 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -4,15 +4,14 @@
 
 /**
  * Strategy interface for grouping a flat list of {@link BinItem}s into one or more {@link Bin}s.
- * Implementations encode the per-bin caps (weight, items, etc.) and the placement algorithm;
- * callers iterate the returned bins and dispatch one batch per bin.
+ * Implementations encode the per-bin caps and the placement algorithm; callers iterate the returned
+ * bins and dispatch one batch per bin.
  *
- * <p>Parametric on the {@link BinItem} impl so the packer, bins, and items are all type-consistent
- * end-to-end — the dispatch site receives {@code List<Bin<T>>} and never has to downcast.
- *
- * @param <T> concrete {@link BinItem} implementation packed by this packer
+ * <p>The input parameter uses {@code ? extends BinItem} so callers can pass a typed list of a
+ * concrete impl (e.g. {@code List<OfdBinItem>}) without fighting Java's invariance. The packer sees
+ * the items only as {@link BinItem}s; the per-op-type dispatcher downcasts at access time.
  */
-public interface BinPacker<T extends BinItem> {
+public interface BinPacker {
   /** Pack {@code items} into one or more bins. Each returned bin is non-empty. */
-  List<Bin<T>> pack(List<T> items);
+  List<Bin> pack(List<? extends BinItem> items);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
index 4325bae96..f583dff78 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
@@ -7,38 +7,35 @@
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * Generic first-fit-decreasing bin packer with two independent caps:
+ * First-fit-decreasing bin packer with two independent caps:
  *
  * <ul>
  *   <li>{@code maxWeightPerBin} — total {@link BinItem#getWeight()} per bin
  *   <li>{@code maxItemsPerBin} — number of items per bin
  * </ul>
  *
- * <p>Pass {@code 0} or a negative value for any cap to disable that dimension.
+ * <p>Pass {@code 0} or a negative value for either cap to disable that dimension.
  *
  * <p>An item that exceeds the weight cap on its own is placed into a bin by itself rather than
  * dropped — the scheduler never silently skips maintenance work for an oversized table.
  *
- * <p>The pack body is a single stream pipeline: sort decreasing by weight, then fold each item into
- * the running list of bins. The fold uses {@code Stream.collect(Supplier, BiConsumer, BiConsumer)}
- * — the standard idiom for an FFD-style stateful collect — so the placement is expressed once, in
- * functional form, with the compiler enforcing {@code T}-consistency across the pipeline.
- *
- * @param <T> concrete {@link BinItem} implementation packed by this packer
+ * <p>The pack body is one stream pipeline: sort decreasing by weight, then fold each item into the
+ * running list of bins via {@code Stream.collect(Supplier, BiConsumer, BiConsumer)} — the idiomatic
+ * shape for an FFD-style stateful collect.
  */
 @Slf4j
 @Builder
-public class FirstFitDecreasingBinPacker<T extends BinItem> implements BinPacker<T> {
+public class FirstFitDecreasingBinPacker implements BinPacker {
 
   @Builder.Default private final long maxWeightPerBin = 1_000_000L;
   @Builder.Default private final int maxItemsPerBin = 50;
 
   @Override
-  public List<Bin<T>> pack(List<T> items) {
+  public List<Bin> pack(List<? extends BinItem> items) {
     if (items == null || items.isEmpty()) {
       return new ArrayList<>();
     }
-    List<Bin<T>> bins =
+    List<Bin> bins =
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
             .collect(ArrayList::new, this::placeItem, List::addAll);
@@ -50,14 +47,14 @@ public List<Bin<T>> pack(List<T> items) {
    * Place {@code item} into the first bin that can hold it; if none, open a fresh bin. Mutates
    * {@code bins} — used as the accumulator step of the {@code pack} fold.
    */
-  private void placeItem(List<Bin<T>> bins, T item) {
+  private void placeItem(List<Bin> bins, BinItem item) {
     bins.stream()
         .filter(b -> b.fits(item, maxWeightPerBin, maxItemsPerBin))
         .findFirst()
         .ifPresentOrElse(
             b -> b.add(item),
             () -> {
-              Bin<T> fresh = new Bin<>();
+              Bin fresh = new Bin();
               if (!fresh.fits(item, maxWeightPerBin, maxItemsPerBin)) {
                 log.warn(
                     "Item exceeds per-bin caps on its own; placing in dedicated bin: weight={}",
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index 9dcc632b8..5bb63eee0 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,8 +1,6 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
@@ -41,17 +39,16 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
 
   /**
    * Map of {@link OperationTypeDto} to the {@link BinPacker} strategy that handles it. The packer
-   * is parametric on the op type's concrete {@link BinItem} impl; the map value uses a wildcard
-   * because heterogeneous parametric values aren't expressible directly. {@link
-   * com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner} narrows back to the concrete type
-   * at dispatch. Adding a new operation type means adding an entry here, an impl of {@link
+   * is non-generic and operates on {@code BinItem} at the interface level; per-op-type dispatchers
+   * in {@link com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner} narrow to their concrete
+   * impl at access time. Adding a new operation type means adding an entry here, an impl of {@code
    * BinItem}, and a {@code scheduleXxx} branch in the runner.
    */
   @Bean
-  public Map<OperationTypeDto, BinPacker<? extends BinItem>> binPackers() {
+  public Map<OperationTypeDto, BinPacker> binPackers() {
     return Map.of(
         OperationTypeDto.ORPHAN_FILES_DELETION,
-        FirstFitDecreasingBinPacker.<OfdBinItem>builder()
+        FirstFitDecreasingBinPacker.builder()
             .maxWeightPerBin(ofdMaxFilesPerBin)
             .maxItemsPerBin(ofdMaxTablesPerBin)
             .build());
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index de1ffefe7..827ce01d2 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -15,10 +15,8 @@
 import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
 import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
@@ -46,13 +44,13 @@ class SchedulerRunnerTest {
   @Mock private TableOperationsRepository operationsRepo;
   @Mock private TableStatsRepository statsRepo;
   @Mock private JobsServiceClient jobsClient;
-  @Mock private BinPacker<OfdBinItem> binPacker;
+  @Mock private BinPacker binPacker;
 
   private SchedulerRunner runner;
 
   @BeforeEach
   void setUp() {
-    Map<OperationTypeDto, BinPacker<? extends BinItem>> packers = Map.of(OFD, binPacker);
+    Map<OperationTypeDto, BinPacker> packers = Map.of(OFD, binPacker);
     runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, packers, RESULTS_ENDPOINT);
   }
 
@@ -91,13 +89,9 @@ private void stubFindClaimed(List<TableOperationsRow> rows) {
    * op→OfdBinItem projection is exercised without bypassing Bin's package-private mutators.
    */
   private void stubOneBinForAllItems() {
-    FirstFitDecreasingBinPacker<OfdBinItem> realPacker =
-        FirstFitDecreasingBinPacker.<OfdBinItem>builder()
-            .maxWeightPerBin(0L)
-            .maxItemsPerBin(0)
-            .build();
-    when(binPacker.pack(anyList()))
-        .thenAnswer(inv -> realPacker.pack(inv.<List<OfdBinItem>>getArgument(0)));
+    FirstFitDecreasingBinPacker realPacker =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(0).build();
+    when(binPacker.pack(anyList())).thenAnswer(inv -> realPacker.pack(inv.getArgument(0)));
   }
 
   private TableOperationsRow pendingRow(String uuid, String db, String table) {
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
index 3bef7195b..ab4dac078 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
@@ -23,25 +23,24 @@ private static TestItem item(String id, long weight) {
 
   @Test
   void emptyInput_returnsEmptyBins() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().build();
+    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
     assertThat(packer.pack(List.of())).isEmpty();
   }
 
   @Test
   void singleItem_oneBin() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin<TestItem>> bins = packer.pack(List.of(item("a", 100L)));
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(1);
   }
 
   @Test
   void underWeightLimit_oneBin() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin<TestItem>> bins =
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins =
         packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(3);
@@ -50,9 +49,9 @@ void underWeightLimit_oneBin() {
 
   @Test
   void overWeightLimit_twoBins() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin<TestItem>> bins =
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
+    List<Bin> bins =
         packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
     assertThat(bins).hasSize(2);
     // FFD: sort desc → 600, 600, 400. Place 600 → bin0; next 600 doesn't fit bin0, → bin1;
@@ -63,32 +62,31 @@ void overWeightLimit_twoBins() {
 
   @Test
   void itemLargerThanCap_getsOwnBin() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().maxWeightPerBin(1_000L).build();
-    List<Bin<TestItem>> bins = packer.pack(List.of(item("big", 5_000L)));
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).build();
+    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(1);
   }
 
   @Test
   void sortedDescending_largestFirst() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder().build();
-    List<Bin<TestItem>> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
+    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
+    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
     assertThat(bins).hasSize(1);
     List<String> ids =
-        bins.get(0).items().stream().map(TestItem::getId).collect(Collectors.toList());
+        bins.get(0).items().stream()
+            .map(TestItem.class::cast)
+            .map(TestItem::getId)
+            .collect(Collectors.toList());
     assertThat(ids).containsExactly("large", "small");
   }
 
   @Test
   void maxItemsCap_splitsBins() {
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder()
-            .maxWeightPerBin(0L)
-            .maxItemsPerBin(2)
-            .build();
-    List<Bin<TestItem>> bins =
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(2).build();
+    List<Bin> bins =
         packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
     assertThat(bins).hasSize(2);
     assertThat(bins.get(0).size()).isEqualTo(2);
@@ -98,12 +96,9 @@ void maxItemsCap_splitsBins() {
   @Test
   void zeroCap_disablesDimension() {
     // All caps zero → everything in one bin regardless of weight.
-    FirstFitDecreasingBinPacker<TestItem> packer =
-        FirstFitDecreasingBinPacker.<TestItem>builder()
-            .maxWeightPerBin(0L)
-            .maxItemsPerBin(0)
-            .build();
-    List<Bin<TestItem>> bins =
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(0).build();
+    List<Bin> bins =
         packer.pack(List.of(item("a", Long.MAX_VALUE / 4), item("b", Long.MAX_VALUE / 4)));
     assertThat(bins).hasSize(1);
     assertThat(bins.get(0).size()).isEqualTo(2);

From dbe90f9b0b08ba883233c83ec92e9d639bac1b39 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 07:39:59 -0700
Subject: [PATCH 04/13] refactor(scheduler): drop FFD packer defaults +
 remaining wildcard
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two follow-ups from #626 review, plus the call-site changes they force:

- Drop `@Builder.Default` on `FirstFitDecreasingBinPacker.maxWeightPerBin`
  (was 1_000_000L) and `.maxItemsPerBin` (was 50). "Weight" has no
  domain meaning inside the packer — picking a constant there was an
  arbitrary knob with no story attached. Callers (SchedulerConfig)
  supply the cap with units and a justification visible at the config
  site. If a caller doesn't set a field, the primitive default (0)
  carries the "disabled" sentinel meaning, so leaving a cap unset stays
  expressible.

- Change `BinPacker.pack(List<? extends BinItem>)` to
  `pack(List<BinItem>)`. The wildcard was a half-measure left from the
  prior pass — type parameters are gone but the variance marker on the
  method signature still leaked the generics shape. Callers widen at the
  call site via a stream type witness: `.<BinItem>map(op -> OfdBinItem
  .from(...))`. Docstring updated to drop the now-invalid "pass a typed
  list of a concrete impl" note.

- SchedulerRunner.scheduleOfd builds `List<BinItem>` directly via the
  type witness. Comment names the invariance reason so a future reader
  doesn't undo it.

- SchedulerRunnerTest's stubOneBinForAllItems answer cast updated to
  `inv.<List<BinItem>>getArgument(0)` to match the new signature.
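
For reviewers less familiar with the invariance point, a minimal
sketch of the call-site widening. `Item` and `FileCountItem` are
hypothetical stand-ins for `BinItem` and `OfdBinItem`, not the real
types:

```java
import java.util.List;
import java.util.stream.Collectors;

class TypeWitnessSketch {
  // Hypothetical stand-in for BinItem; the real interface may carry more.
  interface Item {
    long weight();
  }

  // Hypothetical stand-in for OfdBinItem.
  static final class FileCountItem implements Item {
    private final long files;

    FileCountItem(long files) {
      this.files = files;
    }

    @Override
    public long weight() {
      return files;
    }
  }

  // Invariant parameter, mirroring the shape of pack(List<BinItem>).
  // A variable typed List<FileCountItem> cannot be passed here:
  // List<FileCountItem> is not a subtype of List<Item>.
  static long totalWeight(List<Item> items) {
    return items.stream().mapToLong(Item::weight).sum();
  }

  static List<Item> widen(List<Long> fileCounts) {
    // The <Item> witness pins the stream element type to the interface,
    // so the collected list is a List<Item>.
    return fileCounts.stream()
        .<Item>map(FileCountItem::new)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(totalWeight(widen(List.of(100L, 900L))));
  }
}
```

Target typing on `collect` may perform the same widening when the
result is assigned straight to a `List<Item>`; the witness just makes
the intent explicit at the call site.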

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../optimizer/scheduler/SchedulerRunner.java         |  7 +++++--
 .../optimizer/scheduler/binpack/BinPacker.java       |  7 +++----
 .../binpack/FirstFitDecreasingBinPacker.java         | 12 ++++++++----
 .../optimizer/scheduler/SchedulerRunnerTest.java     |  4 +++-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 5dd2fdda6..bd6568a8f 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -154,9 +154,12 @@ public void schedule(
   private void scheduleOfd(
       BinPacker packer, List<TableOperationDto> withStats, Map<String, TableStatsDto> statsByUuid) {
 
-    List<OfdBinItem> items =
+    // Type witness on .map widens the stream element to BinItem so the collect yields
+    // List<BinItem> for the packer — Java's invariance forbids passing List<OfdBinItem>
+    // straight in.
+    List<BinItem> items =
         withStats.stream()
-            .map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
+            .<BinItem>map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
             .collect(Collectors.toList());
     List<Bin> bins = packer.pack(items);
     log.info("Packed {} PENDING OFD operations into {} bins", items.size(), bins.size());
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index 15faffc0a..e7aa6381f 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -7,11 +7,10 @@
  * Implementations encode the per-bin caps and the placement algorithm; callers iterate the returned
  * bins and dispatch one batch per bin.
  *
- * <p>The input parameter uses {@code ? extends BinItem} so callers can pass a typed list of a
- * concrete impl (e.g. {@code List<OfdBinItem>}) without fighting Java's invariance. The packer sees
- * the items only as {@link BinItem}s; the per-op-type dispatcher downcasts at access time.
+ * <p>The packer sees items only as {@link BinItem}; per-op-type dispatchers narrow to their
+ * concrete impl at access time.
  */
 public interface BinPacker {
   /** Pack {@code items} into one or more bins. Each returned bin is non-empty. */
-  List<Bin> pack(List<? extends BinItem> items);
+  List<Bin> pack(List<BinItem> items);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
index f583dff78..c1e88eed6 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
@@ -14,7 +14,11 @@
  *   <li>{@code maxItemsPerBin} — number of items per bin
  * </ul>
  *
- * <p>Pass {@code 0} or a negative value for either cap to disable that dimension.
+ * <p>Both caps are explicit on construction. Neither has a default — "weight" has no domain meaning
+ * at this layer, so picking a constant here would be an arbitrary knob; callers (e.g. {@link
+ * com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}) supply the per-op-type cap
+ * with the unit attached and a justification at the config site. Pass {@code 0} or a negative value
+ * for either cap to disable that dimension.
  *
  * <p>An item that exceeds the weight cap on its own is placed into a bin by itself rather than
  * dropped — the scheduler never silently skips maintenance work for an oversized table.
@@ -27,11 +31,11 @@
 @Builder
 public class FirstFitDecreasingBinPacker implements BinPacker {
 
-  @Builder.Default private final long maxWeightPerBin = 1_000_000L;
-  @Builder.Default private final int maxItemsPerBin = 50;
+  private final long maxWeightPerBin;
+  private final int maxItemsPerBin;
 
   @Override
-  public List<Bin> pack(List<? extends BinItem> items) {
+  public List<Bin> pack(List<BinItem> items) {
     if (items == null || items.isEmpty()) {
       return new ArrayList<>();
     }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 827ce01d2..3d2c23b31 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -17,6 +17,7 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
@@ -91,7 +92,8 @@ private void stubFindClaimed(List<TableOperationsRow> rows) {
   private void stubOneBinForAllItems() {
     FirstFitDecreasingBinPacker realPacker =
         FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(0).build();
-    when(binPacker.pack(anyList())).thenAnswer(inv -> realPacker.pack(inv.getArgument(0)));
+    when(binPacker.pack(anyList()))
+        .thenAnswer(inv -> realPacker.pack(inv.<List<BinItem>>getArgument(0)));
   }
 
   private TableOperationsRow pendingRow(String uuid, String db, String table) {

From 330533b9b6b6b9519639256699e03d5469a20a09 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 08:20:35 -0700
Subject: [PATCH 05/13] refactor(scheduler): bins schedule themselves;
 SchedulerRunner has zero op references
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A larger structural change from the #626 review. The runner is now only
a registration map; OFD lives entirely under operations/ofd/ and is
exposed as a BinPacker bean. Adding a future operation type takes one
new @Component; the scheduler package stays untouched.

What changed:

- binpack/Bin → interface with `void schedule()`. Bins own their own
  scheduling (claim CAS, narrow to claimed, launch, mark
  SCHEDULED/PENDING).
- binpack/BinPacker → orchestration interface: `getOperationType()` plus
  `prepare(db, tableName) → List<Bin>`. Per-op-type orchestrator.
- binpack/FirstFitDecreasingBinPacker → standalone algorithm class (not
  a BinPacker in this vocabulary). Returns flat groupings
  `List<List<BinItem>>` that the per-op-type packer wraps into its own
  Bin impl. Internal `PackingBin` helper holds items + totalWeight
  during the fold so the public `Bin` interface stays minimal.
- scheduler/SchedulerRunner → rewritten as a thin dispatcher.
  Constructor takes `List<BinPacker>` via Spring injection; builds an
  immutable `Map<OperationTypeDto, BinPacker>` with `Map.copyOf`.
  `schedule(type)` calls `bp.prepare(...).forEach(Bin::schedule)`.
  Imports only `binpack.BinPacker`, `binpack.Bin`, and the model enum —
  no operations.* anywhere. Grep-able invariant.
- scheduler/config/SchedulerConfig → stripped to the shared infra
  (`WebClient`, `JobsServiceClient`, cluster id). No more OFD @Value
  fields, no more binPackers @Bean — those moved to OfdBinPacker's ctor.
- operations/ofd/OfdBinPacker → new @Component implementing BinPacker.
  Holds @Value-bound caps, an FFD instance, repos, jobs client, results
  endpoint. `prepare(...)` does load PENDING (OFD-filtered) → dedup →
  stats lookup → project to OfdBinItem → FFD pack → wrap each grouping
  in a new OfdBin.
- operations/ofd/OfdBin → new class implementing Bin. Holds the bin's
  OfdBinItems plus refs to repo/jobs client/endpoint. `@Transactional
  schedule()` does the claim CAS, partial-claim narrow, launch, mark
  SCHEDULED/PENDING. The OFD-specific job name and arg shape live here.
- operations/ofd/OfdBinItem → `currentFileCount` wraps nulls in
  Optional locally (`Optional.ofNullable(stats).map(...).map(...)
  .orElse(0L)`). DTO getters stay nullable; converting them to return
  Optional is deferred to a follow-up PR per the null-is-a-code-smell
  lesson.
- schedulerapp/SchedulerApplication → injects SchedulerRunner only;
  loops via `runner.getRegisteredOperationTypes()`.
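
The dispatcher shape described above can be sketched roughly as
follows. This is an illustration, not the committed code: `OpType`
stands in for `OperationTypeDto`, the second enum value is invented,
and the exception type thrown for an unregistered type is an
assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

class DispatcherSketch {
  // Stand-in for OperationTypeDto; SNAPSHOT_EXPIRATION is hypothetical.
  enum OpType { ORPHAN_FILES_DELETION, SNAPSHOT_EXPIRATION }

  interface Bin {
    void schedule();
  }

  interface BinPacker {
    OpType getOperationType();

    List<Bin> prepare(Optional<String> databaseName, Optional<String> tableName);
  }

  static final class Runner {
    private final Map<OpType, BinPacker> packers;

    // In the real module the list would arrive via Spring injection.
    Runner(List<BinPacker> registered) {
      // toMap throws IllegalStateException if two packers claim one type.
      Map<OpType, BinPacker> byType =
          registered.stream()
              .collect(Collectors.toMap(BinPacker::getOperationType, Function.identity()));
      this.packers = Map.copyOf(byType);
    }

    Set<OpType> getRegisteredOperationTypes() {
      return packers.keySet();
    }

    void schedule(OpType type, Optional<String> db, Optional<String> table) {
      BinPacker bp = packers.get(type);
      if (bp == null) {
        // Assumption: the real runner may throw a different exception.
        throw new IllegalArgumentException("No BinPacker registered for " + type);
      }
      bp.prepare(db, table).forEach(Bin::schedule);
    }
  }

  // Registers one packer that returns two bins and records the order
  // in which the runner schedules them.
  static List<String> demo() {
    List<String> log = new ArrayList<>();
    Bin first = () -> log.add("bin-1");
    Bin second = () -> log.add("bin-2");
    BinPacker ofd =
        new BinPacker() {
          @Override
          public OpType getOperationType() {
            return OpType.ORPHAN_FILES_DELETION;
          }

          @Override
          public List<Bin> prepare(Optional<String> db, Optional<String> table) {
            return List.of(first, second);
          }
        };
    Runner runner = new Runner(List.of(ofd));
    runner.schedule(OpType.ORPHAN_FILES_DELETION, Optional.empty(), Optional.empty());
    return log;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The runner never names a concrete operation; everything op-specific
sits behind the two interfaces.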

Tests reorganized along the same lines:
- FirstFitDecreasingBinPackerTest asserts groupings (`List<List<BinItem>>`)
  instead of bins. The zero-cap-disables test is gone: caps must be
  positive now. The maxItemsCap test uses a real positive weight cap.
- SchedulerRunnerTest slimmed to three dispatcher tests (unknown type
  throws, delegates to the right packer and schedules each returned bin,
  passes scope args through). BinPacker and Bin are mocked; the tests
  verify only the delegation invariant.
- OfdBinPackerTest new: covers load-PENDING + dedup + stats-filter +
  groupings-to-OfdBins. Direct-constructor (no Spring).
- OfdBinTest new: covers OfdBin.schedule()'s claim + narrow + launch +
  mark paths (success, launch fails reverts to PENDING, rows already
  claimed skips, partial claim launches only the claimed subset).
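
The four OfdBinTest paths map onto a claim, narrow, launch, mark
sequence. A rough in-memory sketch, under assumptions: `Store` is a
hypothetical stand-in for the operations table, and `transition`
returns the moved ids directly, whereas OfdBin issues a batched UPDATE
and then re-queries by its scheduledAt watermark.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class ClaimSketch {
  enum Status { PENDING, SCHEDULING, SCHEDULED }

  // Hypothetical in-memory operations table: operation id -> status.
  static final class Store {
    final Map<String, Status> rows = new HashMap<>();

    // Compare-and-set per row: an id moves to `to` only if its current
    // status is `from`. Returns the ids that actually moved.
    List<String> transition(List<String> ids, Status from, Status to) {
      List<String> moved = new ArrayList<>();
      for (String id : ids) {
        if (rows.replace(id, from, to)) {
          moved.add(id);
        }
      }
      return moved;
    }
  }

  // `launcher` stands in for the jobs client: it receives the claimed
  // ids and returns a job id, or empty when submission fails.
  static Optional<String> schedule(
      Store store, List<String> ids, Function<List<String>, Optional<String>> launcher) {
    List<String> claimed = store.transition(ids, Status.PENDING, Status.SCHEDULING);
    if (claimed.isEmpty()) {
      return Optional.empty(); // another scheduler owns every row
    }
    Optional<String> jobId = launcher.apply(claimed);
    // Success marks SCHEDULED; failure reverts to PENDING for a retry.
    Status next = jobId.isPresent() ? Status.SCHEDULED : Status.PENDING;
    store.transition(claimed, Status.SCHEDULING, next);
    return jobId;
  }

  public static void main(String[] args) {
    Store store = new Store();
    store.rows.put("a", Status.PENDING);
    store.rows.put("b", Status.SCHEDULING); // already claimed elsewhere
    schedule(store, List.of("a", "b"), claimed -> Optional.of("job-1"));
    System.out.println(store.rows);
  }
}
```

Only rows this instance moved out of PENDING are launched or reverted,
which is what keeps two scheduler instances from double-submitting.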

Doc cleanup pass per "describe what code does, not what it doesn't do"
lesson: dropped the "no defaults" rationale and "0 disables a dimension"
sentinel from FFD's javadoc; Bin.fits (now a private PackingBin method
inside FFD) lost its `> 0` guards. Class doc reads positively now.
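
For orientation, a two-cap first-fit-decreasing fold with an unguarded
`fits` might look like the sketch below. It assumes non-negative
weights and positive caps; `Item`, the static `pack` signature, and
the cap validation are illustrative and not copied from the class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FfdSketch {
  // Hypothetical item type; the real packer works on BinItem.
  static final class Item {
    final String id;
    final long weight;

    Item(String id, long weight) {
      this.id = id;
      this.weight = weight;
    }
  }

  // Accumulator used only during the fold.
  private static final class PackingBin {
    final List<Item> items = new ArrayList<>();
    long totalWeight;

    // No "cap > 0" guards: both caps are assumed positive.
    boolean fits(Item item, long maxWeight, int maxItems) {
      return items.size() < maxItems && totalWeight + item.weight <= maxWeight;
    }

    void add(Item item) {
      items.add(item);
      totalWeight += item.weight;
    }
  }

  static List<List<Item>> pack(List<Item> input, long maxWeight, int maxItems) {
    if (maxWeight <= 0 || maxItems <= 0) {
      throw new IllegalArgumentException("caps must be positive");
    }
    // Decreasing: heaviest items are placed first.
    List<Item> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparingLong((Item i) -> i.weight).reversed());

    List<PackingBin> bins = new ArrayList<>();
    for (Item item : sorted) {
      // First fit: take the earliest bin with room on both caps.
      PackingBin target = null;
      for (PackingBin bin : bins) {
        if (bin.fits(item, maxWeight, maxItems)) {
          target = bin;
          break;
        }
      }
      if (target == null) {
        // Nothing fits, so open a new bin. An item heavier than the
        // weight cap also lands here and ends up alone in its bin.
        target = new PackingBin();
        bins.add(target);
      }
      target.add(item);
    }

    List<List<Item>> groupings = new ArrayList<>();
    for (PackingBin bin : bins) {
      groupings.add(bin.items);
    }
    return groupings;
  }

  public static void main(String[] args) {
    List<Item> items = List.of(new Item("small", 100L), new Item("large", 900L));
    System.out.println(pack(items, 1_000L, 10).size());
  }
}
```

Returning plain groupings keeps the algorithm free of any scheduling
concern; the per-op-type packer decides what a bin does.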

Global side: created ~/.claude/code-lessons.md with the null/Optional
lesson, the comment-style lesson, the no-abstract-knobs lesson, the
generics-vs-interface lesson, and the scheduler-pluggability principle.
Linked from ~/.claude/CLAUDE.md so future sessions surface them
proactively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../scheduler/SchedulerApplication.java       |  17 +-
 .../optimizer/operations/ofd/OfdBin.java      | 124 +++++++
 .../optimizer/operations/ofd/OfdBinItem.java  |  26 +-
 .../operations/ofd/OfdBinPacker.java          | 171 +++++++++
 .../optimizer/scheduler/SchedulerRunner.java  | 286 +--------------
 .../optimizer/scheduler/binpack/Bin.java      |  47 +--
 .../scheduler/binpack/BinPacker.java          |  16 +-
 .../binpack/FirstFitDecreasingBinPacker.java  |  62 ++--
 .../scheduler/config/SchedulerConfig.java     |  34 +-
 .../operations/ofd/OfdBinPackerTest.java      | 173 +++++++++
 .../optimizer/operations/ofd/OfdBinTest.java  | 172 +++++++++
 .../scheduler/SchedulerRunnerTest.java        | 346 ++----------------
 .../FirstFitDecreasingBinPackerTest.java      | 101 ++---
 13 files changed, 798 insertions(+), 777 deletions(-)
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
 create mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
 create mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java

diff --git a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
index e17ecd0fc..b1f06e5d3 100644
--- a/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
+++ b/apps/optimizer/schedulerapp/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerApplication.java
@@ -1,8 +1,5 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import java.util.Map;
 import lombok.extern.slf4j.Slf4j;
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.boot.CommandLineRunner;
@@ -27,13 +24,11 @@
 public class SchedulerApplication implements CommandLineRunner, ExitCodeGenerator {
 
   private final SchedulerRunner runner;
-  private final Map<OperationTypeDto, BinPacker> binPackers;
   private int exitCode = 0;
 
   @Autowired
-  public SchedulerApplication(SchedulerRunner runner, Map<OperationTypeDto, BinPacker> binPackers) {
+  public SchedulerApplication(SchedulerRunner runner) {
     this.runner = runner;
-    this.binPackers = binPackers;
   }
 
   public static void main(String[] args) {
@@ -41,15 +36,15 @@ public static void main(String[] args) {
   }
 
   /**
-   * Runs the scheduler once per registered {@link BinPacker} per process invocation. Each call is
-   * scoped to one operation type. Any thrown exception is logged and surfaces as a non-zero exit
-   * code via {@link #getExitCode()} after the context is shut down cleanly.
+   * Runs the scheduler once per registered operation type per process invocation. Any thrown
+   * exception is logged and surfaces as a non-zero exit code via {@link #getExitCode()} after the
+   * context is shut down cleanly.
    */
   @Override
   public void run(String... args) {
     try {
-      log.info("Scheduler starting; operation types: {}", binPackers.keySet());
-      binPackers.keySet().forEach(runner::schedule);
+      log.info("Scheduler starting; operation types: {}", runner.getRegisteredOperationTypes());
+      runner.getRegisteredOperationTypes().forEach(runner::schedule);
       log.info("Scheduler completed successfully");
     } catch (Exception e) {
       log.error("Scheduler failed", e);
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
new file mode 100644
index 000000000..6afe6ead5
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
@@ -0,0 +1,124 @@
+package com.linkedin.openhouse.optimizer.operations.ofd;
+
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.time.Instant;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+import java.util.stream.Collectors;
+import lombok.extern.slf4j.Slf4j;
+import org.springframework.data.domain.Pageable;
+import org.springframework.transaction.annotation.Transactional;
+
+/**
+ * A single OFD batch: a group of operations that will be submitted together as one batched
+ * orphan-files-deletion Spark job. Claims its operations via CAS, narrows to the rows it actually
+ * owns, launches the Spark job, and marks SCHEDULED or reverts to PENDING based on launch outcome.
+ */
+@Slf4j
+public class OfdBin implements Bin {
+  private final List<OfdBinItem> items;
+  private final TableOperationsRepository operationsRepo;
+  private final JobsServiceClient jobsClient;
+  private final String resultsEndpoint;
+
+  public OfdBin(
+      List<OfdBinItem> items,
+      TableOperationsRepository operationsRepo,
+      JobsServiceClient jobsClient,
+      String resultsEndpoint) {
+    this.items = items;
+    this.operationsRepo = operationsRepo;
+    this.jobsClient = jobsClient;
+    this.resultsEndpoint = resultsEndpoint;
+  }
+
+  @Override
+  @Transactional
+  public void schedule() {
+    List<String> ids = items.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
+
+    // Claim in one batched UPDATE: PENDING → SCHEDULING. The aggregate row count alone doesn't
+    // tell us *which* rows we own; re-query for SCHEDULING rows tagged with our scheduledAt
+    // watermark to get that exact set.
+    Instant claimedAt = Instant.now();
+    operationsRepo.updateBatch(
+        ids,
+        OperationStatus.PENDING,
+        OperationStatus.SCHEDULING,
+        Optional.of(claimedAt),
+        Optional.empty());
+    List<String> claimedIds =
+        operationsRepo
+            .find(
+                Optional.empty(),
+                Optional.of(OperationStatus.SCHEDULING),
+                Optional.empty(),
+                Optional.empty(),
+                Optional.empty(),
+                Optional.of(claimedAt),
+                Optional.of(ids),
+                Pageable.unpaged())
+            .stream()
+            .map(TableOperationsRow::getId)
+            .collect(Collectors.toList());
+    if (claimedIds.isEmpty()) {
+      log.info("All rows in bin already claimed by another scheduler instance; skipping");
+      return;
+    }
+    if (claimedIds.size() < ids.size()) {
+      log.info(
+          "Partial claim: {} of {} ops in bin claimed; launching job for claimed subset only",
+          claimedIds.size(),
+          ids.size());
+    }
+
+    Set<String> claimedSet = new HashSet<>(claimedIds);
+    List<OfdBinItem> claimedItems =
+        items.stream()
+            .filter(item -> claimedSet.contains(item.getOperationId()))
+            .collect(Collectors.toList());
+    List<String> tableNames =
+        claimedItems.stream().map(OfdBinItem::getFqtn).collect(Collectors.toList());
+    List<String> operationIds =
+        claimedItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
+
+    String opTypeName = OperationTypeDto.ORPHAN_FILES_DELETION.name();
+    String jobName = "batched-" + opTypeName.toLowerCase() + "-" + claimedAt.toEpochMilli();
+    Optional<String> jobId =
+        jobsClient.launch(jobName, opTypeName, tableNames, operationIds, resultsEndpoint);
+
+    if (jobId.isPresent()) {
+      int updated =
+          operationsRepo.updateBatch(
+              claimedIds,
+              OperationStatus.SCHEDULING,
+              OperationStatus.SCHEDULED,
+              Optional.empty(),
+              Optional.of(jobId.get()));
+      log.info(
+          "Submitted job {} for {} tables ({} rows marked SCHEDULED)",
+          jobId.get(),
+          claimedItems.size(),
+          updated);
+    } else {
+      int reverted =
+          operationsRepo.updateBatch(
+              claimedIds,
+              OperationStatus.SCHEDULING,
+              OperationStatus.PENDING,
+              Optional.empty(),
+              Optional.empty());
+      log.warn(
+          "Job submission failed; reverted {} claimed rows back to PENDING for retry on the next"
+              + " pass",
+          reverted);
+    }
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
index a449d0c67..c145405e7 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
@@ -3,18 +3,19 @@
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import java.util.Optional;
 import lombok.AllArgsConstructor;
 import lombok.Getter;
 import lombok.NonNull;
 import lombok.ToString;
 
 /**
- * OFD-specific {@link BinItem}: carries only what the downstream Spark dispatch needs (table fqtn,
- * operation id) plus the weight the packer uses (current file count). Self-weights from a paired
+ * OFD-specific {@link BinItem}: carries the table fqtn and operation id the downstream Spark
+ * dispatch needs, plus the weight (current file count) the packer uses. Self-weights from a paired
  * {@link TableOperationDto} and {@link TableStatsDto} via {@link #from(TableOperationDto,
- * TableStatsDto)} so the projection logic lives here rather than in the scheduler.
+ * TableStatsDto)}.
  *
- * <p>The weighting choice — file count, not bytes — reflects what makes OFD expensive: per-file
+ * <p>Weighting choice — file count, not bytes — reflects what makes OFD expensive: per-file
  * listing, manifest joins, and delete calls scale with file count. A 10 GB table with 100k files is
  * more expensive to OFD than a 1 TB table with 2k files.
  */
@@ -35,21 +36,20 @@ public class OfdBinItem implements BinItem {
   private final long weight;
 
   /**
-   * Project a pending operation + its stats row into a packable item. Callers do {@code
-   * pendingOps.stream().map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))} — the
-   * weighting decision lives entirely in this class.
+   * Project a pending operation + its stats row into a packable item. Weighting lives entirely in
+   * this class — callers do {@code pendingOps.stream().map(op -> OfdBinItem.from(op,
+   * statsByUuid.get(op.getTableUuid())))}.
    */
-  public static OfdBinItem from(TableOperationDto op, TableStatsDto stats) {
+  public static OfdBinItem from(@NonNull TableOperationDto op, TableStatsDto stats) {
     return new OfdBinItem(
         op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
   }
 
   private static long currentFileCount(TableStatsDto stats) {
-    if (stats == null || stats.getSnapshot() == null) {
-      return 0L;
-    }
-    Long files = stats.getSnapshot().getNumCurrentFiles();
-    return files != null ? files : 0L;
+    return Optional.ofNullable(stats)
+        .map(TableStatsDto::getSnapshot)
+        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
+        .orElse(0L);
   }
 
   @Override
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
new file mode 100644
index 000000000..e538bf133
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
@@ -0,0 +1,171 @@
+package com.linkedin.openhouse.optimizer.operations.ofd;
+
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.db.TableStatsRow;
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+import java.util.stream.Collectors;
+import lombok.extern.slf4j.Slf4j;
+import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.data.domain.Pageable;
+import org.springframework.stereotype.Component;
+
+/**
+ * Per-cycle OFD orchestrator. Loads PENDING OFD operations, cancels duplicate rows per cycle,
+ * joins each to its stats row, projects into {@link OfdBinItem}, asks {@link
+ * FirstFitDecreasingBinPacker} to group them, and returns each grouping wrapped in an {@link
+ * OfdBin} that knows how to schedule itself.
+ */
+@Slf4j
+@Component
+public class OfdBinPacker implements BinPacker {
+
+  private final FirstFitDecreasingBinPacker ffd;
+  private final TableOperationsRepository operationsRepo;
+  private final TableStatsRepository statsRepo;
+  private final JobsServiceClient jobsClient;
+  private final String resultsEndpoint;
+
+  @Autowired
+  public OfdBinPacker(
+      @Value("${optimizer.scheduler.ofd.max-files-per-bin}") long maxFilesPerBin,
+      @Value("${optimizer.scheduler.ofd.max-tables-per-bin}") int maxTablesPerBin,
+      TableOperationsRepository operationsRepo,
+      TableStatsRepository statsRepo,
+      JobsServiceClient jobsClient,
+      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
+    this.ffd =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(maxFilesPerBin)
+            .maxItemsPerBin(maxTablesPerBin)
+            .build();
+    this.operationsRepo = operationsRepo;
+    this.statsRepo = statsRepo;
+    this.jobsClient = jobsClient;
+    this.resultsEndpoint = resultsEndpoint;
+  }
+
+  @Override
+  public OperationTypeDto getOperationType() {
+    return OperationTypeDto.ORPHAN_FILES_DELETION;
+  }
+
+  @Override
+  public List<Bin> prepare(Optional<String> databaseName, Optional<String> tableName) {
+    // Unpaged: a single-page truncation would silently drop work past page 0 (next cycle would
+    // re-load the same first page in MySQL row order, leaving the tail unscheduled until the
+    // ordering shifts). Correctness here requires the full PENDING set in one cycle; the working
+    // set is bounded by count(PENDING for OFD).
+    List<TableOperationsRow> pendingRows =
+        operationsRepo.find(
+            Optional.of(OperationTypeDto.ORPHAN_FILES_DELETION.toDb()),
+            Optional.of(OperationStatus.PENDING),
+            Optional.empty(),
+            databaseName,
+            tableName,
+            Optional.empty(),
+            Optional.empty(),
+            Pageable.unpaged());
+    if (pendingRows.isEmpty()) {
+      log.info("No PENDING OFD operations; nothing to prepare");
+      return List.of();
+    }
+
+    // Deduplicate before claiming: if multiple PENDING rows exist for the same tableUuid, keep
+    // the oldest (lex-tiebreak on id) and cancel the rest. Per-cycle, not per-bin.
+    List<TableOperationsRow> survivors = cancelDuplicates(pendingRows);
+    if (survivors.isEmpty()) {
+      return List.of();
+    }
+
+    List<TableOperationDto> pending =
+        survivors.stream().map(TableOperationDto::fromRow).collect(Collectors.toList());
+
+    // Fetch fresh stats this cycle (one batched query) rather than denormalizing onto
+    // TableOperationDto. Smaller op rows, fresher cost data.
+    Set<String> uuids =
+        pending.stream().map(TableOperationDto::getTableUuid).collect(Collectors.toSet());
+    Map<String, TableStatsDto> statsByUuid =
+        statsRepo.findAllById(uuids).stream()
+            .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
+
+    // Filter at the boundary so every projection is built from a known-non-null stats row. A
+    // table without a stats row gets skipped this cycle and reconsidered after stats land.
+    List<TableOperationDto> withStats =
+        pending.stream()
+            .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
+            .collect(Collectors.toList());
+    if (withStats.size() < pending.size()) {
+      log.warn(
+          "Skipped {} OFD operations with no table_stats row", pending.size() - withStats.size());
+    }
+    if (withStats.isEmpty()) {
+      return List.of();
+    }
+
+    List<BinItem> items =
+        withStats.stream()
+            .<BinItem>map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
+            .collect(Collectors.toList());
+
+    List<List<BinItem>> groupings = ffd.pack(items);
+    log.info("Prepared {} PENDING OFD operations into {} bins", items.size(), groupings.size());
+
+    return groupings.stream().map(this::toOfdBin).collect(Collectors.toList());
+  }
+
+  private Bin toOfdBin(List<BinItem> grouping) {
+    List<OfdBinItem> ofdItems =
+        grouping.stream().map(OfdBinItem.class::cast).collect(Collectors.toList());
+    return new OfdBin(ofdItems, operationsRepo, jobsClient, resultsEndpoint);
+  }
+
+  /**
+   * Group {@code pendingRows} by {@code tableUuid}; for any group with more than one row, cancel
+   * all but the oldest (lex-tiebreak on id). Returns the survivors in input order. Deterministic.
+   */
+  private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendingRows) {
+    Map<String, List<TableOperationsRow>> byTableUuid =
+        pendingRows.stream().collect(Collectors.groupingBy(TableOperationsRow::getTableUuid));
+
+    List<String> duplicateIds =
+        byTableUuid.values().stream()
+            .filter(rows -> rows.size() > 1)
+            .flatMap(
+                rows ->
+                    rows.stream()
+                        .sorted(
+                            Comparator.comparing(TableOperationsRow::getCreatedAt)
+                                .thenComparing(TableOperationsRow::getId))
+                        .skip(1))
+            .map(TableOperationsRow::getId)
+            .collect(Collectors.toList());
+
+    if (duplicateIds.isEmpty()) {
+      return pendingRows;
+    }
+
+    int cancelled = operationsRepo.cancel(duplicateIds);
+    log.warn("Cancelled {} duplicate PENDING rows", cancelled);
+
+    Set<String> cancelledIds = Set.copyOf(duplicateIds);
+    return pendingRows.stream()
+        .filter(r -> !cancelledIds.contains(r.getId()))
+        .collect(Collectors.toList());
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index bd6568a8f..441ff577e 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -1,297 +1,51 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
-import java.util.Comparator;
-import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.Set;
+import java.util.function.Function;
 import java.util.stream.Collectors;
 import lombok.extern.slf4j.Slf4j;
-import org.springframework.beans.factory.annotation.Value;
-import org.springframework.data.domain.Pageable;
+import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.stereotype.Component;
-import org.springframework.transaction.annotation.Transactional;
 
 /**
- * For one operation type per call, reads PENDING rows, looks up per-table stats, projects each into
- * the op-type's {@link BinItem} impl, dispatches to the registered {@link BinPacker}, and submits
- * one Spark job per returned {@link Bin}. The {@link SchedulerApplication}'s CommandLineRunner
- * loops over the registered packers and invokes {@code schedule(opType)} for each.
- *
- * <p>The runner owns all optimizer-specific orchestration — claim CAS, status transitions, and the
- * actual {@link JobsServiceClient#launch} call. Per-op-type projection (build the right {@link
- * BinItem} impl from an op + stats) and dispatch live in op-specific sub-methods; today there is
- * only OFD, and the per-op switch is a TODO to factor into an {@code OperationScheduler} handler
- * once a second op type lands.
+ * Looks up the {@link BinPacker} registered for an operation type, asks it to prepare the bins for
+ * this cycle, and lets each bin schedule itself. The runner holds an immutable {@code
+ * OperationTypeDto -> BinPacker} map built at construction from the Spring-injected packers and
+ * keyed by {@link BinPacker#getOperationType()}; it knows no operations beyond that map.
  */
 @Slf4j
 @Component
 public class SchedulerRunner {
-  private final TableOperationsRepository operationsRepo;
-  private final TableStatsRepository statsRepo;
-  private final JobsServiceClient jobsClient;
   private final Map<OperationTypeDto, BinPacker> binPackers;
-  private final String resultsEndpoint;
 
-  public SchedulerRunner(
-      TableOperationsRepository operationsRepo,
-      TableStatsRepository statsRepo,
-      JobsServiceClient jobsClient,
-      Map<OperationTypeDto, BinPacker> binPackers,
-      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
-    this.operationsRepo = operationsRepo;
-    this.statsRepo = statsRepo;
-    this.jobsClient = jobsClient;
-    this.binPackers = binPackers;
-    this.resultsEndpoint = resultsEndpoint;
+  @Autowired
+  public SchedulerRunner(List<BinPacker> binPackers) {
+    this.binPackers =
+        Map.copyOf(
+            binPackers.stream()
+                .collect(Collectors.toMap(BinPacker::getOperationType, Function.identity())));
   }
 
-  /** Schedule all PENDING operations of the given type across all databases. */
-  @Transactional
-  public void schedule(OperationTypeDto operationType) {
-    schedule(operationType, Optional.empty(), Optional.empty());
+  public void schedule(OperationTypeDto type) {
+    schedule(type, Optional.empty(), Optional.empty());
   }
 
-  /**
-   * Schedule PENDING operations for {@code operationType}, optionally scoped to a single database
-   * or table name.
-   */
-  @Transactional
   public void schedule(
-      OperationTypeDto operationType, Optional<String> databaseName, Optional<String> tableName) {
-
-    BinPacker packer = binPackers.get(operationType);
+      OperationTypeDto type, Optional<String> databaseName, Optional<String> tableName) {
+    BinPacker packer = binPackers.get(type);
     if (packer == null) {
-      throw new IllegalStateException(
-          "No BinPacker registered for operation type " + operationType);
-    }
-
-    // Unpaged: a single-page truncation would silently drop work past page 0 (next cycle would
-    // re-load the same first page in MySQL row order, leaving the tail unscheduled until the
-    // ordering shifts). Correctness here requires the full PENDING set in one cycle; the working
-    // set is bounded by count(PENDING for this op type).
-    List<TableOperationsRow> pendingRows =
-        operationsRepo.find(
-            Optional.of(operationType.toDb()),
-            Optional.of(OperationStatus.PENDING),
-            Optional.empty(),
-            databaseName,
-            tableName,
-            Optional.empty(),
-            Optional.empty(),
-            Pageable.unpaged());
-    if (pendingRows.isEmpty()) {
-      log.info("No PENDING operations of type {}; nothing to schedule", operationType);
-      return;
-    }
-
-    // Deduplicate before claiming: if multiple PENDING rows exist for the same tableUuid, keep
-    // the oldest (lex-tiebreak on id) and cancel the rest. Per-cycle, not per-bin — running this
-    // inside the bin loop nuked rows belonging to other bins of the same cycle.
-    List<TableOperationsRow> survivors = cancelDuplicates(pendingRows);
-    if (survivors.isEmpty()) {
-      return;
-    }
-
-    List<TableOperationDto> pending =
-        survivors.stream().map(TableOperationDto::fromRow).collect(Collectors.toList());
-
-    // Tradeoff: we fetch fresh table_stats per scheduling cycle (one batched query) rather than
-    // denormalizing the relevant fields onto TableOperationDto. The denormalized alternative
-    // would remove the per-cycle lookup but widen the TableOperationDto row and serve staler
-    // data; the current shape favors smaller operations + freshness over fewer queries.
-    Set<String> uuids =
-        pending.stream().map(TableOperationDto::getTableUuid).collect(Collectors.toSet());
-    Map<String, TableStatsDto> statsByUuid =
-        statsRepo.findAllById(uuids).stream()
-            .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
-
-    // Filter at the boundary so every projection is built from a known-non-null stats row. A
-    // table without a stats row gets skipped this cycle and reconsidered after stats land.
-    List<TableOperationDto> withStats =
-        pending.stream()
-            .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
-            .collect(Collectors.toList());
-    if (withStats.size() < pending.size()) {
-      log.warn(
-          "Skipped {} {} operations with no table_stats row",
-          pending.size() - withStats.size(),
-          operationType);
-    }
-    if (withStats.isEmpty()) {
-      return;
-    }
-
-    // TODO: when a second op type lands, factor each branch into an OperationScheduler handler
-    // (own projection + own submit). Today's switch is the only place that knows the concrete
-    // BinItem impl per op type; the downcasts inside submitOfdBin are safe by SchedulerConfig's
-    // registration invariant (the packer for ORPHAN_FILES_DELETION is fed OfdBinItem instances).
-    switch (operationType) {
-      case ORPHAN_FILES_DELETION:
-        scheduleOfd(packer, withStats, statsByUuid);
-        return;
-      default:
-        throw new IllegalStateException(
-            "No scheduling handler for operation type " + operationType);
-    }
-  }
-
-  private void scheduleOfd(
-      BinPacker packer, List<TableOperationDto> withStats, Map<String, TableStatsDto> statsByUuid) {
-
-    // Type witness on .map widens the stream element to BinItem so the collect yields
-    // List<BinItem> for the packer — Java's invariance forbids passing List<OfdBinItem>
-    // straight in.
-    List<BinItem> items =
-        withStats.stream()
-            .<BinItem>map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
-            .collect(Collectors.toList());
-    List<Bin> bins = packer.pack(items);
-    log.info("Packed {} PENDING OFD operations into {} bins", items.size(), bins.size());
-
-    bins.forEach(this::submitOfdBin);
-  }
-
-  /**
-   * Group {@code pendingRows} by {@code tableUuid}; for any group with more than one row, cancel
-   * all but the oldest (lex-tiebreak on id). Returns the survivors in input order. Deterministic.
-   */
-  private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendingRows) {
-    Map<String, List<TableOperationsRow>> byTableUuid =
-        pendingRows.stream().collect(Collectors.groupingBy(TableOperationsRow::getTableUuid));
-
-    List<String> duplicateIds =
-        byTableUuid.values().stream()
-            .filter(rows -> rows.size() > 1)
-            .flatMap(
-                rows ->
-                    rows.stream()
-                        .sorted(
-                            Comparator.comparing(TableOperationsRow::getCreatedAt)
-                                .thenComparing(TableOperationsRow::getId))
-                        .skip(1))
-            .map(TableOperationsRow::getId)
-            .collect(Collectors.toList());
-
-    if (duplicateIds.isEmpty()) {
-      return pendingRows;
+      throw new IllegalStateException("No BinPacker registered for operation type " + type);
     }
-
-    int cancelled = operationsRepo.cancel(duplicateIds);
-    log.warn("Cancelled {} duplicate PENDING rows", cancelled);
-
-    Set<String> cancelledIds = Set.copyOf(duplicateIds);
-    return pendingRows.stream()
-        .filter(r -> !cancelledIds.contains(r.getId()))
-        .collect(Collectors.toList());
+    packer.prepare(databaseName, tableName).forEach(Bin::schedule);
   }
 
-  /**
-   * Claim a bin of OFD work, narrow to the rows actually claimed, launch the batched Spark job for
-   * the claimed subset, and mark them SCHEDULED — or revert to PENDING if launch failed. Items in
-   * the bin are typed as {@link BinItem}; we narrow once to {@link OfdBinItem} on entry since this
-   * method runs only on bins produced by the OFD packer (see {@link #schedule(OperationTypeDto,
-   * Optional, Optional)}).
-   */
-  private void submitOfdBin(Bin bin) {
-    List<OfdBinItem> ofdItems =
-        bin.items().stream().map(OfdBinItem.class::cast).collect(Collectors.toList());
-    List<String> ids =
-        ofdItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
-
-    // Claim in one batched UPDATE: PENDING → SCHEDULING. Aggregate row count alone doesn't tell us
-    // *which* rows we own — re-query for SCHEDULING rows tagged with our scheduledAt watermark.
-    // Anything not in that subset belongs to another instance or was canceled, and must not be
-    // submitted or marked SCHEDULED.
-    Instant claimedAt = Instant.now();
-    operationsRepo.updateBatch(
-        ids,
-        OperationStatus.PENDING,
-        OperationStatus.SCHEDULING,
-        Optional.of(claimedAt),
-        Optional.empty());
-    // Unpaged: the result set is bounded by ids.size() (the bin we just claimed).
-    List<String> claimedIds =
-        operationsRepo
-            .find(
-                Optional.empty(),
-                Optional.of(OperationStatus.SCHEDULING),
-                Optional.empty(),
-                Optional.empty(),
-                Optional.empty(),
-                Optional.of(claimedAt),
-                Optional.of(ids),
-                Pageable.unpaged())
-            .stream()
-            .map(TableOperationsRow::getId)
-            .collect(Collectors.toList());
-    if (claimedIds.isEmpty()) {
-      log.info("All rows in bin already claimed by another scheduler instance; skipping");
-      return;
-    }
-    if (claimedIds.size() < ids.size()) {
-      log.info(
-          "Partial claim: {} of {} ops in bin claimed; launching job for claimed subset only",
-          claimedIds.size(),
-          ids.size());
-    }
-
-    // Narrow the bin's items to the rows we actually own before extracting Spark-args.
-    Set<String> claimedSet = new HashSet<>(claimedIds);
-    List<OfdBinItem> claimedItems =
-        ofdItems.stream()
-            .filter(item -> claimedSet.contains(item.getOperationId()))
-            .collect(Collectors.toList());
-    List<String> tableNames =
-        claimedItems.stream().map(OfdBinItem::getFqtn).collect(Collectors.toList());
-    List<String> operationIds =
-        claimedItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
-
-    String operationTypeName = OperationTypeDto.ORPHAN_FILES_DELETION.name();
-    String jobName = "batched-" + operationTypeName.toLowerCase() + "-" + claimedAt.toEpochMilli();
-    Optional<String> jobId =
-        jobsClient.launch(jobName, operationTypeName, tableNames, operationIds, resultsEndpoint);
-
-    if (jobId.isPresent()) {
-      int updated =
-          operationsRepo.updateBatch(
-              claimedIds,
-              OperationStatus.SCHEDULING,
-              OperationStatus.SCHEDULED,
-              Optional.empty(),
-              Optional.of(jobId.get()));
-      log.info(
-          "Submitted job {} for {} tables ({} rows marked SCHEDULED)",
-          jobId.get(),
-          claimedItems.size(),
-          updated);
-    } else {
-      int reverted =
-          operationsRepo.updateBatch(
-              claimedIds,
-              OperationStatus.SCHEDULING,
-              OperationStatus.PENDING,
-              Optional.empty(),
-              Optional.empty());
-      log.warn(
-          "Job submission failed; reverted {} claimed rows back to PENDING for retry on the next"
-              + " pass",
-          reverted);
-    }
+  public Set<OperationTypeDto> getRegisteredOperationTypes() {
+    return binPackers.keySet();
   }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
index 5ee0dbbe3..e3dad4410 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
@@ -1,48 +1,9 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-import lombok.Getter;
-import lombok.ToString;
-
 /**
- * Mutable accumulator used by a {@link BinPacker} while assembling a batch. Callers receiving a
- * packed list of {@code Bin}s treat them as read-only — {@link #items()} returns an unmodifiable
- * view and the running total is exposed only via the getter.
- *
- * <p>Items are typed at the interface level only ({@link BinItem}). Callers that need the concrete
- * impl downcast at the access site; the per-op-type dispatcher owns that contract.
+ * A schedulable unit produced by a {@link BinPacker}. Each bin owns the work for a single Spark job
+ * — claiming the operations it covers, launching, and recording the outcome.
  */
-@ToString
-public class Bin {
-  private final List<BinItem> items = new ArrayList<>();
-  @Getter private long totalWeight;
-
-  /**
-   * Returns true iff adding {@code item} keeps the bin at or below both caps. A cap of {@code <= 0}
-   * disables that dimension.
-   */
-  boolean fits(BinItem item, long maxWeight, int maxItems) {
-    if (maxItems > 0 && items.size() >= maxItems) {
-      return false;
-    }
-    if (maxWeight > 0 && totalWeight + item.getWeight() > maxWeight) {
-      return false;
-    }
-    return true;
-  }
-
-  void add(BinItem item) {
-    items.add(item);
-    totalWeight += item.getWeight();
-  }
-
-  public List<BinItem> items() {
-    return Collections.unmodifiableList(items);
-  }
-
-  public int size() {
-    return items.size();
-  }
+public interface Bin {
+  void schedule();
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index e7aa6381f..56ba78f06 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -1,16 +1,16 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import java.util.List;
+import java.util.Optional;
 
 /**
- * Strategy interface for grouping a flat list of {@link BinItem}s into one or more {@link Bin}s.
- * Implementations encode the per-bin caps and the placement algorithm; callers iterate the returned
- * bins and dispatch one batch per bin.
- *
- * <p>The packer sees items only as {@link BinItem}; per-op-type dispatchers narrow to their
- * concrete impl at access time.
+ * Per-operation-type orchestrator the scheduler dispatches to. The packer loads its PENDING work,
+ * groups it into batches, and returns a {@link Bin} for each batch. The scheduler then asks each
+ * bin to {@link Bin#schedule() schedule} itself.
  */
 public interface BinPacker {
-  /** Pack {@code items} into one or more bins. Each returned bin is non-empty. */
-  List<Bin> pack(List<BinItem> items);
+  OperationTypeDto getOperationType();
+
+  List<Bin> prepare(Optional<String> databaseName, Optional<String> tableName);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
index c1e88eed6..7a6b9275e 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
@@ -7,65 +7,57 @@
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * First-fit-decreasing bin packer with two independent caps:
+ * First-fit-decreasing packing algorithm. Sorts items by weight descending and places each into the
+ * first group whose running totals stay at or below {@code maxWeightPerBin} and {@code
+ * maxItemsPerBin}. An item that exceeds the weight cap on its own goes into a group by itself.
  *
- * <ul>
- *   <li>{@code maxWeightPerBin} — total {@link BinItem#getWeight()} per bin
- *   <li>{@code maxItemsPerBin} — number of items per bin
- * </ul>
- *
- * <p>Both caps are explicit on construction. Neither has a default — "weight" has no domain meaning
- * at this layer, so picking a constant here would be an arbitrary knob; callers (e.g. {@link
- * com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}) supply the per-op- type cap
- * with the unit attached and a justification at the config site. Pass {@code 0} or a negative value
- * for either cap to disable that dimension.
- *
- * <p>An item that exceeds the weight cap on its own is placed into a bin by itself rather than
- * dropped — the scheduler never silently skips maintenance work for an oversized table.
- *
- * <p>The pack body is one stream pipeline: sort decreasing by weight, then fold each item into the
- * running list of bins via {@code Stream.collect(Supplier, BiConsumer, BiConsumer)} — the idiomatic
- * shape for an FFD-style stateful collect.
+ * <p>Returns flat groupings ({@code List<List<BinItem>>}). Callers wrap each grouping into the
+ * {@link Bin} implementation they need for their operation type.
  */
 @Slf4j
 @Builder
-public class FirstFitDecreasingBinPacker implements BinPacker {
+public class FirstFitDecreasingBinPacker {
 
   private final long maxWeightPerBin;
   private final int maxItemsPerBin;
 
-  @Override
-  public List<Bin> pack(List<BinItem> items) {
+  public List<List<BinItem>> pack(List<BinItem> items) {
     if (items == null || items.isEmpty()) {
       return new ArrayList<>();
     }
-    List<Bin> bins =
+    List<PackingBin> bins =
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
             .collect(ArrayList::new, this::placeItem, List::addAll);
-    log.info("Packed {} items into {} bins", items.size(), bins.size());
-    return bins;
+    log.info("Packed {} items into {} groupings", items.size(), bins.size());
+    return bins.stream().map(b -> b.items).collect(java.util.stream.Collectors.toList());
   }
 
-  /**
-   * Place {@code item} into the first bin that can hold it; if none, open a fresh bin. Mutates
-   * {@code bins} — used as the accumulator step of the {@code pack} fold.
-   */
-  private void placeItem(List<Bin> bins, BinItem item) {
+  private void placeItem(List<PackingBin> bins, BinItem item) {
     bins.stream()
         .filter(b -> b.fits(item, maxWeightPerBin, maxItemsPerBin))
         .findFirst()
         .ifPresentOrElse(
             b -> b.add(item),
             () -> {
-              Bin fresh = new Bin();
-              if (!fresh.fits(item, maxWeightPerBin, maxItemsPerBin)) {
-                log.warn(
-                    "Item exceeds per-bin caps on its own; placing in dedicated bin: weight={}",
-                    item.getWeight());
-              }
+              PackingBin fresh = new PackingBin();
               fresh.add(item);
               bins.add(fresh);
             });
   }
+
+  /** Private per-bin running totals used during the fold. A cap of 0 or less is disabled. */
+  private static class PackingBin {
+    final List<BinItem> items = new ArrayList<>();
+    long totalWeight;
+    boolean fits(BinItem item, long maxWeight, int maxItems) {
+      return (maxItems <= 0 || items.size() < maxItems)
+          && (maxWeight <= 0 || totalWeight + item.getWeight() <= maxWeight);
+    }
+
+    void add(BinItem item) {
+      items.add(item);
+      totalWeight += item.getWeight();
+    }
+  }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index 5bb63eee0..be2f97cf7 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,15 +1,16 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.util.Map;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.context.annotation.Bean;
 import org.springframework.context.annotation.Configuration;
 import org.springframework.web.reactive.function.client.WebClient;
 
+/**
+ * Cross-cutting wiring shared across operation types: the jobs-service HTTP client and its cluster
+ * id. Per-operation configuration (caps, projection logic, launch args) lives with the operation's
+ * own {@link com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker} implementation.
+ */
 @Configuration
 public class SchedulerConfig {
 
@@ -19,14 +20,6 @@ public class SchedulerConfig {
   @Value("${optimizer.scheduler.cluster-id}")
   private String clusterId;
 
-  /** Max table-current-file-count summed across one batched OFD Spark job. 0 disables. */
-  @Value("${optimizer.scheduler.ofd.max-files-per-bin:1000000}")
-  private long ofdMaxFilesPerBin;
-
-  /** Max number of tables per batched OFD Spark job. 0 disables. */
-  @Value("${optimizer.scheduler.ofd.max-tables-per-bin:50}")
-  private int ofdMaxTablesPerBin;
-
   @Bean
   public WebClient jobsWebClient() {
     return WebClient.builder().baseUrl(jobsBaseUri).build();
@@ -36,21 +29,4 @@ public WebClient jobsWebClient() {
   public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
     return new JobsServiceClient(jobsWebClient, clusterId);
   }
-
-  /**
-   * Map of {@link OperationTypeDto} to the {@link BinPacker} strategy that handles it. The packer
-   * is non-generic and operates on {@code BinItem} at the interface level; per-op-type dispatchers
-   * in {@link com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner} narrow to their concrete
-   * impl at access time. Adding a new operation type means adding an entry here, an impl of {@code
-   * BinItem}, and a {@code scheduleXxx} branch in the runner.
-   */
-  @Bean
-  public Map<OperationTypeDto, BinPacker> binPackers() {
-    return Map.of(
-        OperationTypeDto.ORPHAN_FILES_DELETION,
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(ofdMaxFilesPerBin)
-            .maxItemsPerBin(ofdMaxTablesPerBin)
-            .build());
-  }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
new file mode 100644
index 000000000..4d5d1bba8
--- /dev/null
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
@@ -0,0 +1,173 @@
+package com.linkedin.openhouse.optimizer.operations.ofd;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.db.TableStatsRow;
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.time.Instant;
+import java.util.List;
+import java.util.Optional;
+import java.util.UUID;
+import org.junit.jupiter.api.BeforeEach;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.ArgumentCaptor;
+import org.mockito.Mock;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+@ExtendWith(MockitoExtension.class)
+class OfdBinPackerTest {
+
+  private static final com.linkedin.openhouse.optimizer.db.OperationType OFD_DB =
+      com.linkedin.openhouse.optimizer.db.OperationType.ORPHAN_FILES_DELETION;
+  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
+  private static final long MAX_FILES_PER_BIN = 1_000_000L;
+  private static final int MAX_TABLES_PER_BIN = 50;
+
+  @Mock private TableOperationsRepository operationsRepo;
+  @Mock private TableStatsRepository statsRepo;
+  @Mock private JobsServiceClient jobsClient;
+
+  private OfdBinPacker packer;
+
+  @BeforeEach
+  void setUp() {
+    packer =
+        new OfdBinPacker(
+            MAX_FILES_PER_BIN,
+            MAX_TABLES_PER_BIN,
+            operationsRepo,
+            statsRepo,
+            jobsClient,
+            RESULTS_ENDPOINT);
+  }
+
+  // ---- Helpers ----
+
+  private void stubFindPending(List<TableOperationsRow> rows) {
+    when(operationsRepo.find(
+            eq(Optional.of(OFD_DB)),
+            eq(Optional.of(OperationStatus.PENDING)),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            any()))
+        .thenReturn(rows);
+  }
+
+  private TableOperationsRow pendingRow(String uuid, String db, String table) {
+    return TableOperationsRow.builder()
+        .id(UUID.randomUUID().toString())
+        .tableUuid(uuid)
+        .databaseName(db)
+        .tableName(table)
+        .operationType(OFD_DB)
+        .status(OperationStatus.PENDING)
+        .createdAt(Instant.now())
+        .build();
+  }
+
+  private TableStatsRow statsRow(String uuid, long numCurrentFiles) {
+    return TableStatsRow.builder()
+        .tableUuid(uuid)
+        .snapshot(SnapshotMetrics.builder().numCurrentFiles(numCurrentFiles).build())
+        .build();
+  }
+
+  // ---- Tests ----
+
+  @Test
+  void prepare_noPending_returnsEmpty() {
+    stubFindPending(List.of());
+
+    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
+
+    assertThat(bins).isEmpty();
+    verify(statsRepo, never()).findAllById(any());
+  }
+
+  @Test
+  void prepare_allOpsWithoutStats_returnsEmpty() {
+    TableOperationsRow row = pendingRow(UUID.randomUUID().toString(), "db1", "tbl1");
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of());
+
+    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
+
+    assertThat(bins).isEmpty();
+  }
+
+  @Test
+  void prepare_singleOpWithStats_returnsOneBin() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+
+    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
+
+    assertThat(bins).hasSize(1);
+  }
+
+  @Test
+  void prepare_cancelsDuplicatePendingPerCycle() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row1 = pendingRow(uuid, "db1", "tbl1");
+    TableOperationsRow row2 = pendingRow(uuid, "db1", "tbl1");
+    stubFindPending(List.of(row1, row2));
+    when(operationsRepo.cancel(anyList())).thenReturn(1);
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+
+    packer.prepare(Optional.empty(), Optional.empty());
+
+    ArgumentCaptor<List<String>> cancelled = ArgumentCaptor.forClass(List.class);
+    verify(operationsRepo).cancel(cancelled.capture());
+    assertThat(cancelled.getValue()).hasSize(1);
+  }
+
+  @Test
+  void prepare_skipsOpsWithoutStats_includesOnlyThoseWithStats() {
+    String withStats = UUID.randomUUID().toString();
+    String missing = UUID.randomUUID().toString();
+    TableOperationsRow withStatsRow = pendingRow(withStats, "db1", "tblA");
+    TableOperationsRow missingRow = pendingRow(missing, "db1", "tblB");
+    stubFindPending(List.of(withStatsRow, missingRow));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(withStats, 50L)));
+
+    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
+
+    assertThat(bins).hasSize(1);
+  }
+
+  @Test
+  void prepare_packerReturnsBinsThatAreOfdBins() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+
+    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
+
+    assertThat(bins).allMatch(b -> b instanceof OfdBin);
+  }
+
+  @Test
+  void getOperationType_returnsOrphanFilesDeletion() {
+    assertThat(packer.getOperationType()).isEqualTo(OperationTypeDto.ORPHAN_FILES_DELETION);
+  }
+}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java
new file mode 100644
index 000000000..ac1700f1e
--- /dev/null
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java
@@ -0,0 +1,172 @@
+package com.linkedin.openhouse.optimizer.operations.ofd;
+
+import static org.assertj.core.api.Assertions.assertThat;
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.anyString;
+import static org.mockito.ArgumentMatchers.eq;
+import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.OperationType;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.time.Instant;
+import java.util.List;
+import java.util.Optional;
+import java.util.UUID;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.ArgumentCaptor;
+import org.mockito.Mock;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+@ExtendWith(MockitoExtension.class)
+class OfdBinTest {
+
+  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
+
+  @Mock private TableOperationsRepository operationsRepo;
+  @Mock private JobsServiceClient jobsClient;
+
+  private static OfdBinItem item(String fqtn) {
+    return new OfdBinItem(fqtn, UUID.randomUUID().toString(), 100L);
+  }
+
+  private void stubFindClaimed(List<TableOperationsRow> rows) {
+    when(operationsRepo.find(
+            eq(Optional.empty()),
+            eq(Optional.of(OperationStatus.SCHEDULING)),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            any(),
+            any(),
+            any()))
+        .thenReturn(rows);
+  }
+
+  private TableOperationsRow schedulingRow(String opId) {
+    return TableOperationsRow.builder()
+        .id(opId)
+        .tableUuid(UUID.randomUUID().toString())
+        .databaseName("db")
+        .tableName("tbl")
+        .operationType(OperationType.ORPHAN_FILES_DELETION)
+        .status(OperationStatus.SCHEDULING)
+        .createdAt(Instant.now())
+        .build();
+  }
+
+  @Test
+  void schedule_singleBin_claimsAndMarksScheduled() {
+    OfdBinItem one = item("db1.tbl1");
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    stubFindClaimed(List.of(schedulingRow(one.getOperationId())));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-123"));
+
+    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
+
+    verify(operationsRepo)
+        .updateBatch(
+            eq(List.of(one.getOperationId())),
+            eq(OperationStatus.SCHEDULING),
+            eq(OperationStatus.SCHEDULED),
+            eq(Optional.empty()),
+            eq(Optional.of("job-123")));
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
+
+    ArgumentCaptor<List<String>> tableNames = ArgumentCaptor.forClass(List.class);
+    verify(jobsClient)
+        .launch(
+            anyString(), eq("ORPHAN_FILES_DELETION"), tableNames.capture(), anyList(), anyString());
+    assertThat(tableNames.getValue()).containsExactly("db1.tbl1");
+  }
+
+  @Test
+  void schedule_jobLaunchFails_revertsToPending() {
+    OfdBinItem one = item("db1.tbl1");
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    stubFindClaimed(List.of(schedulingRow(one.getOperationId())));
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.empty());
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any()))
+        .thenReturn(1);
+
+    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
+
+    verify(operationsRepo)
+        .updateBatch(
+            eq(List.of(one.getOperationId())),
+            eq(OperationStatus.SCHEDULING),
+            eq(OperationStatus.PENDING),
+            eq(Optional.empty()),
+            eq(Optional.empty()));
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
+  }
+
+  @Test
+  void schedule_rowsAlreadyClaimed_skipsSubmit() {
+    OfdBinItem one = item("db1.tbl1");
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(0);
+    stubFindClaimed(List.of());
+
+    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
+
+    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
+  }
+
+  @Test
+  void schedule_partialClaim_launchesOnlyClaimedSubset() {
+    OfdBinItem a = item("db1.tblA");
+    OfdBinItem b = item("db1.tblB");
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    // Only A actually claimed.
+    stubFindClaimed(List.of(schedulingRow(a.getOperationId())));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-partial"));
+
+    new OfdBin(List.of(a, b), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
+
+    ArgumentCaptor<List<String>> launchedTableNames = ArgumentCaptor.forClass(List.class);
+    ArgumentCaptor<List<String>> launchedOpIds = ArgumentCaptor.forClass(List.class);
+    verify(jobsClient)
+        .launch(
+            anyString(),
+            anyString(),
+            launchedTableNames.capture(),
+            launchedOpIds.capture(),
+            anyString());
+    assertThat(launchedTableNames.getValue()).containsExactly("db1.tblA");
+    assertThat(launchedOpIds.getValue()).containsExactly(a.getOperationId());
+  }
+}
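
Reviewer note: the four OfdBinTest cases above pin down a claim, launch, then mark-or-revert sequence. The sketch below restates that sequence against an in-memory map. It is a minimal illustration under stated assumptions, not the OfdBin implementation: SketchStatus, SketchStore, ClaimSketch and the launcher function are hypothetical names, and the real code's separate updateBatch and find calls are collapsed into one compare-and-set helper that returns the ids it moved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical status enum mirroring the three states the tests exercise. */
enum SketchStatus { PENDING, SCHEDULING, SCHEDULED }

/** Hypothetical in-memory stand-in for the operations repository. */
final class SketchStore {
  final Map<String, SketchStatus> status = new HashMap<>();

  /** Compare-and-set: moves only ids currently in {@code from}; returns the ids that moved. */
  List<String> transition(List<String> ids, SketchStatus from, SketchStatus to) {
    List<String> moved = new ArrayList<>();
    for (String id : ids) {
      if (status.get(id) == from) {
        status.put(id, to);
        moved.add(id);
      }
    }
    return moved;
  }
}

final class ClaimSketch {
  /** Claims PENDING rows, launches one job for them, then marks SCHEDULED or reverts to PENDING. */
  static Optional<String> schedule(
      List<String> opIds, SketchStore store, Function<List<String>, Optional<String>> launcher) {
    List<String> claimed = store.transition(opIds, SketchStatus.PENDING, SketchStatus.SCHEDULING);
    if (claimed.isEmpty()) {
      return Optional.empty(); // every row is owned elsewhere, so nothing is launched
    }
    Optional<String> jobId = launcher.apply(claimed); // only the claimed subset is submitted
    SketchStatus next = jobId.isPresent() ? SketchStatus.SCHEDULED : SketchStatus.PENDING;
    store.transition(claimed, SketchStatus.SCHEDULING, next);
    return jobId;
  }

  public static void main(String[] args) {
    SketchStore store = new SketchStore();
    store.status.put("op-a", SketchStatus.PENDING);
    store.status.put("op-b", SketchStatus.SCHEDULING); // already claimed by another instance
    Optional<String> job = schedule(List.of("op-a", "op-b"), store, ids -> Optional.of("job-1"));
    System.out.println(job + " " + store.status);
  }
}
```

Reverting only the ids this call claimed is what keeps a failed launch from releasing rows that another scheduler instance holds in SCHEDULING.
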
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 3d2c23b31..d42fb976e 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -1,363 +1,59 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
-import static org.assertj.core.api.Assertions.assertThat;
 import static org.assertj.core.api.Assertions.assertThatThrownBy;
 import static org.mockito.ArgumentMatchers.any;
-import static org.mockito.ArgumentMatchers.anyList;
-import static org.mockito.ArgumentMatchers.anyString;
 import static org.mockito.ArgumentMatchers.eq;
-import static org.mockito.Mockito.never;
+import static org.mockito.Mockito.times;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
 import java.util.List;
-import java.util.Map;
 import java.util.Optional;
-import java.util.UUID;
-import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.extension.ExtendWith;
-import org.mockito.ArgumentCaptor;
 import org.mockito.Mock;
 import org.mockito.junit.jupiter.MockitoExtension;
 
 @ExtendWith(MockitoExtension.class)
 class SchedulerRunnerTest {
 
-  private static final OperationTypeDto OFD = OperationTypeDto.ORPHAN_FILES_DELETION;
-  private static final com.linkedin.openhouse.optimizer.db.OperationType OFD_DB =
-      com.linkedin.openhouse.optimizer.db.OperationType.ORPHAN_FILES_DELETION;
-  private static final String OFD_STR = OFD.name();
-  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
-
-  @Mock private TableOperationsRepository operationsRepo;
-  @Mock private TableStatsRepository statsRepo;
-  @Mock private JobsServiceClient jobsClient;
-  @Mock private BinPacker binPacker;
-
-  private SchedulerRunner runner;
-
-  @BeforeEach
-  void setUp() {
-    Map<OperationTypeDto, BinPacker> packers = Map.of(OFD, binPacker);
-    runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, packers, RESULTS_ENDPOINT);
-  }
-
-  // ---- Stubbing helpers ----
-
-  /** Stubs the initial "find PENDING" call. */
-  private void stubFindPending(List<TableOperationsRow> rows) {
-    when(operationsRepo.find(
-            eq(Optional.of(OFD_DB)),
-            eq(Optional.of(OperationStatus.PENDING)),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            any()))
-        .thenReturn(rows);
-  }
-
-  /** Stubs the post-claim "find SCHEDULING" call. */
-  private void stubFindClaimed(List<TableOperationsRow> rows) {
-    when(operationsRepo.find(
-            eq(Optional.empty()),
-            eq(Optional.of(OperationStatus.SCHEDULING)),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            any(),
-            any(),
-            any()))
-        .thenReturn(rows);
-  }
-
-  /**
-   * Stubs the mock packer by routing through a real FFD packer with unbounded caps, so the runner's
-   * op→OfdBinItem projection is exercised without bypassing Bin's package-private mutators.
-   */
-  private void stubOneBinForAllItems() {
-    FirstFitDecreasingBinPacker realPacker =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(0).build();
-    when(binPacker.pack(anyList()))
-        .thenAnswer(inv -> realPacker.pack(inv.<List<BinItem>>getArgument(0)));
-  }
-
-  private TableOperationsRow pendingRow(String uuid, String db, String table) {
-    return TableOperationsRow.builder()
-        .id(UUID.randomUUID().toString())
-        .tableUuid(uuid)
-        .databaseName(db)
-        .tableName(table)
-        .operationType(OFD_DB)
-        .status(OperationStatus.PENDING)
-        .createdAt(Instant.now())
-        .build();
-  }
-
-  private TableOperationsRow schedulingRow(TableOperationsRow source) {
-    return source.toBuilder().status(OperationStatus.SCHEDULING).build();
-  }
-
-  private TableStatsRow statsRow(String uuid, long numCurrentFiles) {
-    return TableStatsRow.builder()
-        .tableUuid(uuid)
-        .snapshot(SnapshotMetrics.builder().numCurrentFiles(numCurrentFiles).build())
-        .build();
-  }
-
-  // ---- Tests ----
-
-  @Test
-  void schedule_noPendingOps_noJobSubmitted() {
-    stubFindPending(List.of());
-
-    runner.schedule(OFD);
-
-    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
-    verify(binPacker, never()).pack(anyList());
-  }
+  @Mock private BinPacker packer;
+  @Mock private Bin bin1;
+  @Mock private Bin bin2;
 
   @Test
   void schedule_unknownOperationType_throws() {
-    SchedulerRunner emptyRunner =
-        new SchedulerRunner(operationsRepo, statsRepo, jobsClient, Map.of(), RESULTS_ENDPOINT);
+    SchedulerRunner runner = new SchedulerRunner(List.of());
 
-    assertThatThrownBy(() -> emptyRunner.schedule(OFD))
+    assertThatThrownBy(() -> runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION))
         .isInstanceOf(IllegalStateException.class)
         .hasMessageContaining("No BinPacker registered");
-
-    verify(operationsRepo, never()).find(any(), any(), any(), any(), any(), any(), any(), any());
-    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
-  }
-
-  @Test
-  void schedule_singleBin_claimsAndMarksScheduled() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
-
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100_000L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    stubFindClaimed(List.of(schedulingRow(row)));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-123"));
-
-    runner.schedule(OFD);
-
-    verify(operationsRepo)
-        .updateBatch(
-            eq(List.of(row.getId())),
-            eq(OperationStatus.SCHEDULING),
-            eq(OperationStatus.SCHEDULED),
-            eq(Optional.empty()),
-            eq(Optional.of("job-123")));
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
-
-    ArgumentCaptor<List<String>> tableNames = ArgumentCaptor.forClass(List.class);
-    verify(jobsClient)
-        .launch(anyString(), eq(OFD_STR), tableNames.capture(), anyList(), anyString());
-    assertThat(tableNames.getValue()).containsExactly("db1.tbl1");
-  }
-
-  @Test
-  void schedule_jobLaunchFails_marksPendingForRetry() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
-
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    stubFindClaimed(List.of(schedulingRow(row)));
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.empty());
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any()))
-        .thenReturn(1);
-
-    runner.schedule(OFD);
-
-    verify(operationsRepo)
-        .updateBatch(
-            eq(List.of(row.getId())),
-            eq(OperationStatus.SCHEDULING),
-            eq(OperationStatus.PENDING),
-            eq(Optional.empty()),
-            eq(Optional.empty()));
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
   }
 
   @Test
-  void schedule_rowsAlreadyClaimed_skipsSubmit() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
-
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(0);
-    stubFindClaimed(List.of());
-
-    runner.schedule(OFD);
-
-    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
-  }
-
-  @Test
-  void schedule_cancelsDuplicatePendingPerCycle() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row1 = pendingRow(uuid, "db1", "tbl1");
-    TableOperationsRow row2 = pendingRow(uuid, "db1", "tbl1");
-
-    stubFindPending(List.of(row1, row2));
-    when(operationsRepo.cancel(anyList())).thenReturn(1);
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    // After dedup, only row1 (oldest by createdAt then id) survives.
-    TableOperationsRow survivor = row1.getCreatedAt().isBefore(row2.getCreatedAt()) ? row1 : row2;
-    if (row1.getCreatedAt().equals(row2.getCreatedAt())) {
-      survivor = row1.getId().compareTo(row2.getId()) <= 0 ? row1 : row2;
-    }
-    stubFindClaimed(List.of(schedulingRow(survivor)));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-dedup"));
+  void schedule_delegatesToPackerAndSchedulesEachBin() {
+    when(packer.getOperationType()).thenReturn(OperationTypeDto.ORPHAN_FILES_DELETION);
+    when(packer.prepare(any(), any())).thenReturn(List.of(bin1, bin2));
 
-    runner.schedule(OFD);
+    SchedulerRunner runner = new SchedulerRunner(List.of(packer));
+    runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION);
 
-    // Exactly one ID was cancelled (the duplicate).
-    ArgumentCaptor<List<String>> cancelled = ArgumentCaptor.forClass(List.class);
-    verify(operationsRepo).cancel(cancelled.capture());
-    assertThat(cancelled.getValue()).hasSize(1);
+    verify(packer).prepare(eq(Optional.empty()), eq(Optional.empty()));
+    verify(bin1, times(1)).schedule();
+    verify(bin2, times(1)).schedule();
   }
 
   @Test
-  void schedule_partialClaim_launchesAndMarksOnlyClaimedSubset() {
-    String uuidA = UUID.randomUUID().toString();
-    String uuidB = UUID.randomUUID().toString();
-    TableOperationsRow rowA = pendingRow(uuidA, "db1", "tblA");
-    TableOperationsRow rowB = pendingRow(uuidB, "db1", "tblB");
-
-    stubFindPending(List.of(rowA, rowB));
-    when(statsRepo.findAllById(any()))
-        .thenReturn(List.of(statsRow(uuidA, 100L), statsRow(uuidB, 100L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    // Only A actually claimed (B owned by another instance).
-    stubFindClaimed(List.of(schedulingRow(rowA)));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-partial"));
-
-    runner.schedule(OFD);
-
-    ArgumentCaptor<List<String>> launchedTableNames = ArgumentCaptor.forClass(List.class);
-    ArgumentCaptor<List<String>> launchedOpIds = ArgumentCaptor.forClass(List.class);
-    verify(jobsClient)
-        .launch(
-            anyString(),
-            anyString(),
-            launchedTableNames.capture(),
-            launchedOpIds.capture(),
-            anyString());
-    assertThat(launchedTableNames.getValue()).containsExactly("db1.tblA");
-    assertThat(launchedOpIds.getValue()).containsExactly(rowA.getId());
-
-    verify(operationsRepo)
-        .updateBatch(
-            eq(List.of(rowA.getId())),
-            eq(OperationStatus.SCHEDULING),
-            eq(OperationStatus.SCHEDULED),
-            eq(Optional.empty()),
-            eq(Optional.of("job-partial")));
-  }
-
-  @Test
-  void schedule_opsWithoutStats_skipped() {
-    String withStats = UUID.randomUUID().toString();
-    String missing = UUID.randomUUID().toString();
-    TableOperationsRow withStatsRow = pendingRow(withStats, "db1", "tblA");
-    TableOperationsRow missingRow = pendingRow(missing, "db1", "tblB");
-
-    stubFindPending(List.of(withStatsRow, missingRow));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(withStats, 50L)));
-    stubOneBinForAllItems();
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    stubFindClaimed(List.of(schedulingRow(withStatsRow)));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-skip"));
-
-    runner.schedule(OFD);
-
-    ArgumentCaptor<List<String>> ids = ArgumentCaptor.forClass(List.class);
-    verify(operationsRepo)
-        .updateBatch(
-            ids.capture(),
-            eq(OperationStatus.PENDING),
-            eq(OperationStatus.SCHEDULING),
-            any(),
-            any());
-    assertThat(ids.getValue()).containsExactly(withStatsRow.getId());
-  }
-
-  @Test
-  void schedule_allOpsWithoutStats_noJobSubmitted() {
-    TableOperationsRow row = pendingRow(UUID.randomUUID().toString(), "db1", "tbl1");
-
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of());
+  void schedule_passesScopeArgsThrough() {
+    when(packer.getOperationType()).thenReturn(OperationTypeDto.ORPHAN_FILES_DELETION);
+    when(packer.prepare(any(), any())).thenReturn(List.of());
 
-    runner.schedule(OFD);
+    SchedulerRunner runner = new SchedulerRunner(List.of(packer));
+    runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION, Optional.of("db1"), Optional.of("t1"));
 
-    verify(binPacker, never()).pack(anyList());
-    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
+    verify(packer).prepare(eq(Optional.of("db1")), eq(Optional.of("t1")));
   }
 }
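
Reviewer note: after this change SchedulerRunnerTest only checks delegation: look up the packer registered for the operation type, pass the two scope arguments through to prepare, and call schedule on each returned bin. A minimal sketch of that shape follows. SketchBin, SketchPacker and RunnerSketch are hypothetical stand-ins, and the operation type is a plain String here rather than OperationTypeDto; only the "No BinPacker registered" message and the IllegalStateException type are taken from the test above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a bin that knows how to schedule itself. */
interface SketchBin {
  void schedule();
}

/** Hypothetical stand-in for a packer keyed by operation type. */
interface SketchPacker {
  String operationType();

  List<SketchBin> prepare(Optional<String> database, Optional<String> table);
}

final class RunnerSketch {
  private final Map<String, SketchPacker> packersByType = new HashMap<>();

  RunnerSketch(List<SketchPacker> packers) {
    for (SketchPacker packer : packers) {
      packersByType.put(packer.operationType(), packer);
    }
  }

  /** Delegates to the registered packer, then schedules every bin it returns. */
  void schedule(String operationType, Optional<String> database, Optional<String> table) {
    SketchPacker packer = packersByType.get(operationType);
    if (packer == null) {
      throw new IllegalStateException("No BinPacker registered for " + operationType);
    }
    for (SketchBin bin : packer.prepare(database, table)) {
      bin.schedule();
    }
  }

  /** Convenience overload with no database or table scope. */
  void schedule(String operationType) {
    schedule(operationType, Optional.empty(), Optional.empty());
  }
}
```

With the runner reduced to this, repository and job-client behavior is tested at the packer and bin level, which is why the runner test shrinks from 363 lines to 59.
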
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
index ab4dac078..e2efa2ce3 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
@@ -22,60 +22,78 @@ private static TestItem item(String id, long weight) {
   }
 
   @Test
-  void emptyInput_returnsEmptyBins() {
-    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
+  void emptyInput_returnsEmptyGroupings() {
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(100L).maxItemsPerBin(10).build();
     assertThat(packer.pack(List.of())).isEmpty();
   }
 
   @Test
-  void singleItem_oneBin() {
+  void singleItem_oneGrouping() {
     FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).size()).isEqualTo(1);
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(1_000_000L)
+            .maxItemsPerBin(10)
+            .build();
+    List<List<BinItem>> groupings = packer.pack(List.of(item("a", 100L)));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
-  void underWeightLimit_oneBin() {
+  void underWeightLimit_oneGrouping() {
     FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(1_000_000L)
+            .maxItemsPerBin(10)
+            .build();
+    List<List<BinItem>> groupings =
         packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).size()).isEqualTo(3);
-    assertThat(bins.get(0).getTotalWeight()).isEqualTo(900_000L);
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(3);
+    long total = groupings.get(0).stream().mapToLong(BinItem::getWeight).sum();
+    assertThat(total).isEqualTo(900_000L);
   }
 
   @Test
-  void overWeightLimit_twoBins() {
+  void overWeightLimit_twoGroupings() {
     FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).build();
-    List<Bin> bins =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(1_000_000L)
+            .maxItemsPerBin(10)
+            .build();
+    List<List<BinItem>> groupings =
         packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
-    assertThat(bins).hasSize(2);
-    // FFD: sort desc → 600, 600, 400. Place 600 → bin0; next 600 doesn't fit bin0, → bin1;
-    // 400 fits bin0 (total 1_000_000).
-    assertThat(bins.get(0).getTotalWeight()).isEqualTo(1_000_000L);
-    assertThat(bins.get(1).getTotalWeight()).isEqualTo(600_000L);
+    assertThat(groupings).hasSize(2);
+    // FFD: sort desc → 600, 600, 400. Place 600 → group0; next 600 doesn't fit group0 → group1;
+    // 400 fits group0 (total 1_000_000).
+    long g0Total = groupings.get(0).stream().mapToLong(BinItem::getWeight).sum();
+    long g1Total = groupings.get(1).stream().mapToLong(BinItem::getWeight).sum();
+    assertThat(g0Total).isEqualTo(1_000_000L);
+    assertThat(g1Total).isEqualTo(600_000L);
   }
 
   @Test
-  void itemLargerThanCap_getsOwnBin() {
+  void itemLargerThanCap_getsOwnGrouping() {
     FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).build();
-    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).size()).isEqualTo(1);
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).maxItemsPerBin(10).build();
+    List<List<BinItem>> groupings = packer.pack(List.of(item("big", 5_000L)));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
   void sortedDescending_largestFirst() {
-    FirstFitDecreasingBinPacker packer = FirstFitDecreasingBinPacker.builder().build();
-    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
-    assertThat(bins).hasSize(1);
+    FirstFitDecreasingBinPacker packer =
+        FirstFitDecreasingBinPacker.builder()
+            .maxWeightPerBin(2_000_000L)
+            .maxItemsPerBin(10)
+            .build();
+    List<List<BinItem>> groupings =
+        packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
+    assertThat(groupings).hasSize(1);
     List<String> ids =
-        bins.get(0).items().stream()
+        groupings.get(0).stream()
             .map(TestItem.class::cast)
             .map(TestItem::getId)
             .collect(Collectors.toList());
@@ -83,24 +101,13 @@ void sortedDescending_largestFirst() {
   }
 
   @Test
-  void maxItemsCap_splitsBins() {
+  void maxItemsCap_splitsGroupings() {
     FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(2).build();
-    List<Bin> bins =
+        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).maxItemsPerBin(2).build();
+    List<List<BinItem>> groupings =
         packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
-    assertThat(bins).hasSize(2);
-    assertThat(bins.get(0).size()).isEqualTo(2);
-    assertThat(bins.get(1).size()).isEqualTo(2);
-  }
-
-  @Test
-  void zeroCap_disablesDimension() {
-    // All caps zero → everything in one bin regardless of weight.
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(0L).maxItemsPerBin(0).build();
-    List<Bin> bins =
-        packer.pack(List.of(item("a", Long.MAX_VALUE / 4), item("b", Long.MAX_VALUE / 4)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).size()).isEqualTo(2);
+    assertThat(groupings).hasSize(2);
+    assertThat(groupings.get(0)).hasSize(2);
+    assertThat(groupings.get(1)).hasSize(2);
   }
 }

From 843a57cfa4398811666c712359150624bd8dd882 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 10:25:13 -0700
Subject: [PATCH 06/13] refactor(scheduler): stateless packers, IO in
 scheduler, registration tuple
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The full restructuring from the PR #626 review. Operations layer
deleted; OFD's only footprint is one @Bean in SchedulerConfig.

binpack/ (op-agnostic):
- Bin: pure data class — operationType + items. The scheduler reads
  from a bin to schedule it; the bin does no IO.
- BinItem: interface — getWeight + getFullyQualifiedTableName +
  getOperationId + withOpAndStats(op, stats). Implementations
  self-weight from a (pending operation, stats) pair via withOpAndStats
  on a seat instance (no-arg constructor).
- BinPacker: interface — getOperationType + pack(List<BinItem>) →
  List<Bin>. Stateless, no IO.
- FirstFitBinPacker: pure FFD algorithm with weight + item-count caps.
  Constructor takes only immutable configuration (operationType,
  maxWeightPerBin, maxItemsPerBin); pack() is a pure function. Holds
  no BinItem instances and no repos.
- TotalFilesBinItem: BinItem that weights by current file count. Knows
  nothing about which operation type uses it — usable by any
  per-table-fanout job whose Spark cost scales with file count (OFD,
  stats collection, etc).
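
For reviewers who want the packing rule spelled out, here is a minimal
sketch of first-fit-decreasing with a weight cap and an item-count cap.
It is not the FirstFitBinPacker source: FfdSketch, Item, and the plain
List<List<Item>> return type are hypothetical stand-ins, and only the
behavior described above is assumed (sort descending, first bin that
fits both caps, oversized item gets a bin of its own).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative first-fit-decreasing sketch; names here are hypothetical. */
final class FfdSketch {

  /** Hypothetical item: an id plus a weight (for example, a file count). */
  static final class Item {
    final String id;
    final long weight;

    Item(String id, long weight) {
      this.id = id;
      this.weight = weight;
    }
  }

  static List<List<Item>> pack(List<Item> items, long maxWeightPerBin, int maxItemsPerBin) {
    if (maxWeightPerBin <= 0 || maxItemsPerBin <= 0) {
      throw new IllegalArgumentException("caps must be positive");
    }
    // "Decreasing": consider the heaviest items first.
    List<Item> sorted = new ArrayList<>(items);
    sorted.sort((a, b) -> Long.compare(b.weight, a.weight));

    List<List<Item>> bins = new ArrayList<>();
    List<Long> binWeights = new ArrayList<>();
    for (Item item : sorted) {
      // "First fit": take the first open bin where both caps still hold.
      int target = -1;
      for (int i = 0; i < bins.size(); i++) {
        boolean fitsWeight = binWeights.get(i) + item.weight <= maxWeightPerBin;
        boolean fitsCount = bins.get(i).size() < maxItemsPerBin;
        if (fitsWeight && fitsCount) {
          target = i;
          break;
        }
      }
      if (target < 0) {
        // Nothing fits: open a new bin. An item heavier than the weight cap
        // ends up here alone instead of being dropped.
        bins.add(new ArrayList<>());
        binWeights.add(0L);
        target = bins.size() - 1;
      }
      bins.get(target).add(item);
      binWeights.set(target, binWeights.get(target) + item.weight);
    }
    return bins;
  }
}
```

The sketch ignores long overflow when summing weights; a production
packer would likely want to guard against it.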

scheduler/:
- BinPackerRegistration: tuple bundling (operationType, packer,
  prototype). The one place each operation's full identity is composed.
- SchedulerRunner: rewritten as the generic dispatcher that owns all
  IO. Constructor injects List<BinPackerRegistration> via Spring and
  indexes it by operation type into an immutable map (Map.copyOf).
  schedule() reads PENDING rows, dedups, fetches stats, projects via
  the registration's prototype, packs via the registration's packer,
  then calls scheduleBin(bin) for each result. scheduleBin() is generic:
  claim CAS (PENDING → SCHEDULING with watermark), re-query for
  claimed rows, narrow to claimed items, launch a batched Spark job
  (jobName = "batched-<optype>-<ts>", with claimed tableNames +
  operationIds + opType + resultsEndpoint), mark SCHEDULED on
  success or revert to PENDING on launch failure. The runner imports
  only binpack.* and the model enum — no operations.* anywhere.
- SchedulerConfig: cross-cutting beans (WebClient, JobsServiceClient)
  plus one @Bean per operation type. ofdRegistration() wires a
  FirstFitBinPacker(ORPHAN_FILES_DELETION, maxFiles, maxTables) with a
  new TotalFilesBinItem() prototype. This is the only file in the
  scheduler module that references OFD by name.
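
The claim/launch/mark lifecycle in scheduleBin() can be sketched
against an in-memory table. This is an illustration under stated
assumptions, not the SchedulerRunner code: ClaimSketch, Row, the
in-memory map, and the launcher function are hypothetical stand-ins
for the repository and jobs client. The point is the compare-and-set
shape: a status only moves if the row is still in the expected state,
and the watermark identifies which rows this instance actually owns.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Illustrative claim-CAS sketch; every name here is hypothetical. */
final class ClaimSketch {
  enum Status { PENDING, SCHEDULING, SCHEDULED }

  static final class Row {
    Status status = Status.PENDING;
    Instant scheduledAt; // claim watermark, null until claimed
    String jobId;        // null until a job is launched
  }

  /** Stand-in for the operations table, keyed by operation id. */
  final Map<String, Row> rows = new HashMap<>();

  /** Compare-and-set: only rows currently in {@code from} move to {@code to}. */
  int updateBatch(Collection<String> ids, Status from, Status to, Instant watermark, String jobId) {
    int updated = 0;
    for (String id : ids) {
      Row row = rows.get(id);
      if (row != null && row.status == from) {
        row.status = to;
        if (watermark != null) row.scheduledAt = watermark;
        if (jobId != null) row.jobId = jobId;
        updated++;
      }
    }
    return updated;
  }

  /** Returns the ids this call actually claimed (possibly a subset of {@code ids}). */
  List<String> scheduleBin(
      List<String> ids, Instant claimedAt, Function<List<String>, Optional<String>> launcher) {
    // 1. Claim: PENDING -> SCHEDULING, tagged with our watermark.
    updateBatch(ids, Status.PENDING, Status.SCHEDULING, claimedAt, null);

    // 2. Re-query: the update count alone does not say which rows we own.
    List<String> claimed = new ArrayList<>();
    for (String id : ids) {
      Row row = rows.get(id);
      if (row != null && row.status == Status.SCHEDULING && claimedAt.equals(row.scheduledAt)) {
        claimed.add(id);
      }
    }
    if (claimed.isEmpty()) {
      return claimed; // everything was claimed by someone else
    }

    // 3. Launch for the claimed subset only, then mark or revert.
    Optional<String> jobId = launcher.apply(claimed);
    if (jobId.isPresent()) {
      updateBatch(claimed, Status.SCHEDULING, Status.SCHEDULED, null, jobId.get());
    } else {
      updateBatch(claimed, Status.SCHEDULING, Status.PENDING, null, null);
    }
    return claimed;
  }
}
```

Whether a revert should also clear the watermark is left open in the
sketch; it keeps the old value.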

operations/ofd/ deleted entirely. OfdBin, OfdBinItem, OfdBinPacker —
gone. The behavior they encoded now lives generically across Bin,
TotalFilesBinItem, FirstFitBinPacker, and the @Bean wiring.

Other small polish in line with the review's lessons:
- BinItem.getFullyQualifiedTableName() spelled out, no FQTN
  abbreviation.
- FirstFitBinPacker caps are required positive (no @Builder.Default,
  no "0 disables" semantic, no arbitrary 1_000_000 constant).
- No <? extends Foo> wildcards anywhere — invariance handled with
  concrete types in the binpack interface (List<BinItem>) and an
  immutable Map<OperationTypeDto, BinPackerRegistration> for the
  registry.
- BinItem.withOpAndStats returns a new instance; no mutable state on
  the seat prototype.
- OFD's null-chain (stats → snapshot → numCurrentFiles) wrapped in an
  Optional chain inside TotalFilesBinItem.currentFileCount; DTO Optional
  conversion deferred to a follow-up PR per the null-is-a-code-smell
  lesson.
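
A rough sketch of the prototype projection and the Optional chain from
the last two bullets, for illustration only. Op, Stats, Snapshot, and
FileCountItem are hypothetical stand-ins for TableOperationDto,
TableStatsDto, and TotalFilesBinItem; the real classes may differ. The
fallback weight of 0 when stats, snapshot, or file count is null is
carried over from the deleted OfdBinItem.currentFileCount and is
assumed, not confirmed, to be unchanged in the new class.

```java
import java.util.Optional;

/** Illustrative only: hypothetical stand-ins, not the real DTOs or BinItem. */
final class PrototypeSketch {

  static final class Op {
    final String databaseName;
    final String tableName;
    final String id;

    Op(String databaseName, String tableName, String id) {
      this.databaseName = databaseName;
      this.tableName = tableName;
      this.id = id;
    }
  }

  static final class Snapshot {
    final Long numCurrentFiles; // nullable

    Snapshot(Long numCurrentFiles) {
      this.numCurrentFiles = numCurrentFiles;
    }
  }

  static final class Stats {
    final Snapshot snapshot; // nullable

    Stats(Snapshot snapshot) {
      this.snapshot = snapshot;
    }
  }

  static final class FileCountItem {
    final String fullyQualifiedTableName;
    final String operationId;
    final long weight;

    /** No-arg constructor: builds the empty prototype held by the registration. */
    FileCountItem() {
      this(null, null, 0L);
    }

    private FileCountItem(String fullyQualifiedTableName, String operationId, long weight) {
      this.fullyQualifiedTableName = fullyQualifiedTableName;
      this.operationId = operationId;
      this.weight = weight;
    }

    /** Returns a new populated item; the prototype itself is never mutated. */
    FileCountItem withOpAndStats(Op op, Stats stats) {
      return new FileCountItem(
          op.databaseName + "." + op.tableName, op.id, currentFileCount(stats));
    }

    /** Any null link in stats -> snapshot -> numCurrentFiles collapses to 0. */
    static long currentFileCount(Stats stats) {
      return Optional.ofNullable(stats)
          .map(s -> s.snapshot)
          .map(s -> s.numCurrentFiles)
          .orElse(0L);
    }
  }
}
```

Because every field is final and withOpAndStats allocates a fresh
instance, one prototype can be shared across scheduling cycles without
carrying state between them.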

Tests rewritten:
- FirstFitBinPackerTest: pure algorithm tests with a local TestItem
  implementing BinItem; covers empty, single, under/over weight cap,
  oversized-on-its-own, FFD-decreasing order, max-items cap, and that
  produced bins carry the configured operation type. No optimizer-domain
  imports.
- TotalFilesBinItemTest: covers withOpAndStats projection of fqtn +
  operationId + weight, and the Optional chain on null stats / null
  snapshot / null file count. Asserts seat prototype state is not
  shared with the populated copy.
- SchedulerRunnerTest: full pipeline tests with mocked repos and jobs
  client, real FirstFitBinPacker + TotalFilesBinItem registration.
  Covers unknown-type-throws, no-pending-ops, ops-without-stats,
  single-bin claim+launch+mark, launch-fails-reverts, already-claimed
  skip, dedup-per-cycle, partial-claim launches only claimed subset,
  and ops-without-stats skipped from the projection.
- OfdBinPackerTest, OfdBinTest, FirstFitDecreasingBinPackerTest
  deleted.

SchedulerApplication keeps its prior shape — injects SchedulerRunner,
loops runner.getRegisteredOperationTypes().forEach(runner::schedule).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../optimizer/operations/ofd/OfdBin.java      | 124 -------
 .../optimizer/operations/ofd/OfdBinItem.java  |  59 ----
 .../operations/ofd/OfdBinPacker.java          | 171 ---------
 .../scheduler/BinPackerRegistration.java      |  24 ++
 .../optimizer/scheduler/SchedulerRunner.java  | 251 +++++++++++++-
 .../optimizer/scheduler/binpack/Bin.java      |  19 +-
 .../optimizer/scheduler/binpack/BinItem.java  |  22 +-
 .../scheduler/binpack/BinPacker.java          |   8 +-
 ...gBinPacker.java => FirstFitBinPacker.java} |  32 +-
 .../scheduler/binpack/TotalFilesBinItem.java  |  45 +++
 .../scheduler/config/SchedulerConfig.java     |  27 +-
 .../operations/ofd/OfdBinPackerTest.java      | 173 ---------
 .../optimizer/operations/ofd/OfdBinTest.java  | 172 ---------
 .../scheduler/SchedulerRunnerTest.java        | 328 ++++++++++++++++--
 .../binpack/FirstFitBinPackerTest.java        | 119 +++++++
 .../FirstFitDecreasingBinPackerTest.java      | 113 ------
 .../binpack/TotalFilesBinItemTest.java        |  70 ++++
 17 files changed, 879 insertions(+), 878 deletions(-)
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
 rename services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/{FirstFitDecreasingBinPacker.java => FirstFitBinPacker.java} (56%)
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
 delete mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
 delete mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java
 create mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
 delete mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
 create mode 100644 services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
deleted file mode 100644
index 6afe6ead5..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBin.java
+++ /dev/null
@@ -1,124 +0,0 @@
-package com.linkedin.openhouse.optimizer.operations.ofd;
-
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Optional;
-import java.util.Set;
-import java.util.stream.Collectors;
-import lombok.extern.slf4j.Slf4j;
-import org.springframework.data.domain.Pageable;
-import org.springframework.transaction.annotation.Transactional;
-
-/**
- * A single OFD batch: a group of operations that will be submitted together as one batched
- * orphan-files-deletion Spark job. Claims its operations via CAS, narrows to the rows it actually
- * owns, launches the Spark job, and marks SCHEDULED or reverts to PENDING based on launch outcome.
- */
-@Slf4j
-public class OfdBin implements Bin {
-  private final List<OfdBinItem> items;
-  private final TableOperationsRepository operationsRepo;
-  private final JobsServiceClient jobsClient;
-  private final String resultsEndpoint;
-
-  public OfdBin(
-      List<OfdBinItem> items,
-      TableOperationsRepository operationsRepo,
-      JobsServiceClient jobsClient,
-      String resultsEndpoint) {
-    this.items = items;
-    this.operationsRepo = operationsRepo;
-    this.jobsClient = jobsClient;
-    this.resultsEndpoint = resultsEndpoint;
-  }
-
-  @Override
-  @Transactional
-  public void schedule() {
-    List<String> ids = items.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
-
-    // Claim in one batched UPDATE: PENDING → SCHEDULING. The aggregate row count alone doesn't
-    // tell us *which* rows we own; re-query for SCHEDULING rows tagged with our scheduledAt
-    // watermark to get that exact set.
-    Instant claimedAt = Instant.now();
-    operationsRepo.updateBatch(
-        ids,
-        OperationStatus.PENDING,
-        OperationStatus.SCHEDULING,
-        Optional.of(claimedAt),
-        Optional.empty());
-    List<String> claimedIds =
-        operationsRepo
-            .find(
-                Optional.empty(),
-                Optional.of(OperationStatus.SCHEDULING),
-                Optional.empty(),
-                Optional.empty(),
-                Optional.empty(),
-                Optional.of(claimedAt),
-                Optional.of(ids),
-                Pageable.unpaged())
-            .stream()
-            .map(TableOperationsRow::getId)
-            .collect(Collectors.toList());
-    if (claimedIds.isEmpty()) {
-      log.info("All rows in bin already claimed by another scheduler instance; skipping");
-      return;
-    }
-    if (claimedIds.size() < ids.size()) {
-      log.info(
-          "Partial claim: {} of {} ops in bin claimed; launching job for claimed subset only",
-          claimedIds.size(),
-          ids.size());
-    }
-
-    Set<String> claimedSet = new HashSet<>(claimedIds);
-    List<OfdBinItem> claimedItems =
-        items.stream()
-            .filter(item -> claimedSet.contains(item.getOperationId()))
-            .collect(Collectors.toList());
-    List<String> tableNames =
-        claimedItems.stream().map(OfdBinItem::getFqtn).collect(Collectors.toList());
-    List<String> operationIds =
-        claimedItems.stream().map(OfdBinItem::getOperationId).collect(Collectors.toList());
-
-    String opTypeName = OperationTypeDto.ORPHAN_FILES_DELETION.name();
-    String jobName = "batched-" + opTypeName.toLowerCase() + "-" + claimedAt.toEpochMilli();
-    Optional<String> jobId =
-        jobsClient.launch(jobName, opTypeName, tableNames, operationIds, resultsEndpoint);
-
-    if (jobId.isPresent()) {
-      int updated =
-          operationsRepo.updateBatch(
-              claimedIds,
-              OperationStatus.SCHEDULING,
-              OperationStatus.SCHEDULED,
-              Optional.empty(),
-              Optional.of(jobId.get()));
-      log.info(
-          "Submitted job {} for {} tables ({} rows marked SCHEDULED)",
-          jobId.get(),
-          claimedItems.size(),
-          updated);
-    } else {
-      int reverted =
-          operationsRepo.updateBatch(
-              claimedIds,
-              OperationStatus.SCHEDULING,
-              OperationStatus.PENDING,
-              Optional.empty(),
-              Optional.empty());
-      log.warn(
-          "Job submission failed; reverted {} claimed rows back to PENDING for retry on the next"
-              + " pass",
-          reverted);
-    }
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
deleted file mode 100644
index c145405e7..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinItem.java
+++ /dev/null
@@ -1,59 +0,0 @@
-package com.linkedin.openhouse.optimizer.operations.ofd;
-
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
-import java.util.Optional;
-import lombok.AllArgsConstructor;
-import lombok.Getter;
-import lombok.NonNull;
-import lombok.ToString;
-
-/**
- * OFD-specific {@link BinItem}: carries the table fqtn and operation id the downstream Spark
- * dispatch needs, plus the weight (current file count) the packer uses. Self-weights from a paired
- * {@link TableOperationDto} and {@link TableStatsDto} via {@link #from(TableOperationDto,
- * TableStatsDto)}.
- *
- * <p>Weighting choice — file count, not bytes — reflects what makes OFD expensive: per-file
- * listing, manifest joins, and delete calls scale with file count. A 10 GB table with 100k files is
- * more expensive to OFD than a 1 TB table with 2k files.
- */
-@AllArgsConstructor
-@Getter
-@ToString
-public class OfdBinItem implements BinItem {
-
-  /** Fully-qualified {@code database.table} identifier passed as {@code --tableNames}. */
-  @NonNull private final String fqtn;
-
-  /**
-   * Optimizer operation id passed as {@code --operationIds}; the Spark app POSTs back keyed on it.
-   */
-  @NonNull private final String operationId;
-
-  /** Current file count for this table; the FFD packer's cost dimension. */
-  private final long weight;
-
-  /**
-   * Project a pending operation + its stats row into a packable item. Weighting lives entirely in
-   * this class — callers do {@code pendingOps.stream().map(op -> OfdBinItem.from(op,
-   * statsByUuid.get(op.getTableUuid())))}.
-   */
-  public static OfdBinItem from(@NonNull TableOperationDto op, TableStatsDto stats) {
-    return new OfdBinItem(
-        op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
-  }
-
-  private static long currentFileCount(TableStatsDto stats) {
-    return Optional.ofNullable(stats)
-        .map(TableStatsDto::getSnapshot)
-        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
-        .orElse(0L);
-  }
-
-  @Override
-  public long getWeight() {
-    return weight;
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
deleted file mode 100644
index e538bf133..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPacker.java
+++ /dev/null
@@ -1,171 +0,0 @@
-package com.linkedin.openhouse.optimizer.operations.ofd;
-
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.db.TableStatsRow;
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitDecreasingBinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.util.Comparator;
-import java.util.List;
-import java.util.Map;
-import java.util.Optional;
-import java.util.Set;
-import java.util.stream.Collectors;
-import lombok.extern.slf4j.Slf4j;
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.beans.factory.annotation.Value;
-import org.springframework.data.domain.Pageable;
-import org.springframework.stereotype.Component;
-
-/**
- * Per-cycle OFD orchestrator. Loads PENDING OFD operations, deduplicates duplicates per cycle,
- * joins each to its stats row, projects into {@link OfdBinItem}, asks {@link
- * FirstFitDecreasingBinPacker} to group them, and returns each grouping wrapped in an {@link
- * OfdBin} that knows how to schedule itself.
- */
-@Slf4j
-@Component
-public class OfdBinPacker implements BinPacker {
-
-  private final FirstFitDecreasingBinPacker ffd;
-  private final TableOperationsRepository operationsRepo;
-  private final TableStatsRepository statsRepo;
-  private final JobsServiceClient jobsClient;
-  private final String resultsEndpoint;
-
-  @Autowired
-  public OfdBinPacker(
-      @Value("${optimizer.scheduler.ofd.max-files-per-bin}") long maxFilesPerBin,
-      @Value("${optimizer.scheduler.ofd.max-tables-per-bin}") int maxTablesPerBin,
-      TableOperationsRepository operationsRepo,
-      TableStatsRepository statsRepo,
-      JobsServiceClient jobsClient,
-      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
-    this.ffd =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(maxFilesPerBin)
-            .maxItemsPerBin(maxTablesPerBin)
-            .build();
-    this.operationsRepo = operationsRepo;
-    this.statsRepo = statsRepo;
-    this.jobsClient = jobsClient;
-    this.resultsEndpoint = resultsEndpoint;
-  }
-
-  @Override
-  public OperationTypeDto getOperationType() {
-    return OperationTypeDto.ORPHAN_FILES_DELETION;
-  }
-
-  @Override
-  public List<Bin> prepare(Optional<String> databaseName, Optional<String> tableName) {
-    // Unpaged: a single-page truncation would silently drop work past page 0 (next cycle would
-    // re-load the same first page in MySQL row order, leaving the tail unscheduled until the
-    // ordering shifts). Correctness here requires the full PENDING set in one cycle; the working
-    // set is bounded by count(PENDING for OFD).
-    List<TableOperationsRow> pendingRows =
-        operationsRepo.find(
-            Optional.of(OperationTypeDto.ORPHAN_FILES_DELETION.toDb()),
-            Optional.of(OperationStatus.PENDING),
-            Optional.empty(),
-            databaseName,
-            tableName,
-            Optional.empty(),
-            Optional.empty(),
-            Pageable.unpaged());
-    if (pendingRows.isEmpty()) {
-      log.info("No PENDING OFD operations; nothing to prepare");
-      return List.of();
-    }
-
-    // Deduplicate before claiming: if multiple PENDING rows exist for the same tableUuid, keep
-    // the oldest (lex-tiebreak on id) and cancel the rest. Per-cycle, not per-bin.
-    List<TableOperationsRow> survivors = cancelDuplicates(pendingRows);
-    if (survivors.isEmpty()) {
-      return List.of();
-    }
-
-    List<TableOperationDto> pending =
-        survivors.stream().map(TableOperationDto::fromRow).collect(Collectors.toList());
-
-    // Fetch fresh stats this cycle (one batched query) rather than denormalizing onto
-    // TableOperationDto. Smaller op rows, fresher cost data.
-    Set<String> uuids =
-        pending.stream().map(TableOperationDto::getTableUuid).collect(Collectors.toSet());
-    Map<String, TableStatsDto> statsByUuid =
-        statsRepo.findAllById(uuids).stream()
-            .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
-
-    // Filter at the boundary so every projection is built from a known-non-null stats row. A
-    // table without a stats row gets skipped this cycle and reconsidered after stats land.
-    List<TableOperationDto> withStats =
-        pending.stream()
-            .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
-            .collect(Collectors.toList());
-    if (withStats.size() < pending.size()) {
-      log.warn(
-          "Skipped {} OFD operations with no table_stats row", pending.size() - withStats.size());
-    }
-    if (withStats.isEmpty()) {
-      return List.of();
-    }
-
-    List<BinItem> items =
-        withStats.stream()
-            .<BinItem>map(op -> OfdBinItem.from(op, statsByUuid.get(op.getTableUuid())))
-            .collect(Collectors.toList());
-
-    List<List<BinItem>> groupings = ffd.pack(items);
-    log.info("Prepared {} PENDING OFD operations into {} bins", items.size(), groupings.size());
-
-    return groupings.stream().map(this::toOfdBin).collect(Collectors.toList());
-  }
-
-  private Bin toOfdBin(List<BinItem> grouping) {
-    List<OfdBinItem> ofdItems =
-        grouping.stream().map(OfdBinItem.class::cast).collect(Collectors.toList());
-    return new OfdBin(ofdItems, operationsRepo, jobsClient, resultsEndpoint);
-  }
-
-  /**
-   * Group {@code pendingRows} by {@code tableUuid}; for any group with more than one row, cancel
-   * all but the oldest (lex-tiebreak on id). Returns the survivors in input order. Deterministic.
-   */
-  private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendingRows) {
-    Map<String, List<TableOperationsRow>> byTableUuid =
-        pendingRows.stream().collect(Collectors.groupingBy(TableOperationsRow::getTableUuid));
-
-    List<String> duplicateIds =
-        byTableUuid.values().stream()
-            .filter(rows -> rows.size() > 1)
-            .flatMap(
-                rows ->
-                    rows.stream()
-                        .sorted(
-                            Comparator.comparing(TableOperationsRow::getCreatedAt)
-                                .thenComparing(TableOperationsRow::getId))
-                        .skip(1))
-            .map(TableOperationsRow::getId)
-            .collect(Collectors.toList());
-
-    if (duplicateIds.isEmpty()) {
-      return pendingRows;
-    }
-
-    int cancelled = operationsRepo.cancel(duplicateIds);
-    log.warn("Cancelled {} duplicate PENDING rows", cancelled);
-
-    Set<String> cancelledIds = Set.copyOf(duplicateIds);
-    return pendingRows.stream()
-        .filter(r -> !cancelledIds.contains(r.getId()))
-        .collect(Collectors.toList());
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
new file mode 100644
index 000000000..752e04b51
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
@@ -0,0 +1,24 @@
+package com.linkedin.openhouse.optimizer.scheduler;
+
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
+
+/**
+ * Registration tuple for one operation type. Bundles the bucketing strategy with the {@link
+ * BinItem} prototype the scheduler uses to project pending operations and their stats into packable
+ * items.
+ *
+ * <p>Spring bean assembled by {@link
+ * com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}; {@link SchedulerRunner}
+ * injects all registrations and indexes them by {@link #getOperationType()}.
+ */
+@AllArgsConstructor
+@Getter
+public class BinPackerRegistration {
+  private final OperationTypeDto operationType;
+  private final BinPacker packer;
+  private final BinItem prototype;
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 441ff577e..2e1f544e7 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -1,8 +1,20 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.time.Instant;
+import java.util.Comparator;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -11,25 +23,59 @@
 import java.util.stream.Collectors;
 import lombok.extern.slf4j.Slf4j;
 import org.springframework.beans.factory.annotation.Autowired;
+import org.springframework.beans.factory.annotation.Value;
+import org.springframework.data.domain.Pageable;
 import org.springframework.stereotype.Component;
+import org.springframework.transaction.annotation.Transactional;
 
 /**
- * Looks up the {@link BinPacker} registered for an operation type, asks it to prepare the bins for
- * this cycle, and lets each bin schedule itself. The runner holds an immutable {@code
- * OperationTypeDto -> BinPacker} map populated at construction by Spring injection; it doesn't know
- * which operations exist beyond what's in that map.
+ * Generic scheduler. For each operation type registered via {@link BinPackerRegistration}:
+ *
+ * <ol>
+ *   <li>Reads PENDING rows from MySQL.
+ *   <li>Deduplicates duplicate PENDING rows for the same {@code tableUuid}.
+ *   <li>Loads the stats row for every survivor.
+ *   <li>Projects each (operation, stats) pair into a {@link BinItem} via the registration's
+ *       prototype.
+ *   <li>Hands the items to the {@link BinPacker} to get bins.
+ *   <li>Schedules each bin (claim CAS, narrow to claimed, launch, record).
+ * </ol>
+ *
+ * <p>The runner is operation-agnostic. All IO and the claim/launch/mark lifecycle live here. The
+ * only per-operation knowledge in the module is the {@link BinPackerRegistration} bean wired in
+ * {@link com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}.
  */
 @Slf4j
 @Component
 public class SchedulerRunner {
-  private final Map<OperationTypeDto, BinPacker> binPackers;
+
+  private final TableOperationsRepository operationsRepo;
+  private final TableStatsRepository statsRepo;
+  private final JobsServiceClient jobsClient;
+  private final String resultsEndpoint;
+  private final Map<OperationTypeDto, BinPackerRegistration> registry;
 
   @Autowired
-  public SchedulerRunner(List<BinPacker> binPackers) {
-    this.binPackers =
+  public SchedulerRunner(
+      TableOperationsRepository operationsRepo,
+      TableStatsRepository statsRepo,
+      JobsServiceClient jobsClient,
+      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint,
+      List<BinPackerRegistration> registrations) {
+    this.operationsRepo = operationsRepo;
+    this.statsRepo = statsRepo;
+    this.jobsClient = jobsClient;
+    this.resultsEndpoint = resultsEndpoint;
+    this.registry =
         Map.copyOf(
-            binPackers.stream()
-                .collect(Collectors.toMap(BinPacker::getOperationType, Function.identity())));
+            registrations.stream()
+                .collect(
+                    Collectors.toMap(
+                        BinPackerRegistration::getOperationType, Function.identity())));
+  }
+
+  public Set<OperationTypeDto> getRegisteredOperationTypes() {
+    return registry.keySet();
   }
 
   public void schedule(OperationTypeDto type) {
@@ -38,14 +84,191 @@ public void schedule(OperationTypeDto type) {
 
   public void schedule(
       OperationTypeDto type, Optional<String> databaseName, Optional<String> tableName) {
-    BinPacker packer = binPackers.get(type);
-    if (packer == null) {
+    BinPackerRegistration reg = registry.get(type);
+    if (reg == null) {
       throw new IllegalStateException("No BinPacker registered for operation type " + type);
     }
-    packer.prepare(databaseName, tableName).forEach(Bin::schedule);
+
+    List<TableOperationDto> pending = loadAndDedupPending(type, databaseName, tableName);
+    if (pending.isEmpty()) {
+      return;
+    }
+    Map<String, TableStatsDto> statsByUuid = loadStatsByUuid(pending);
+
+    List<BinItem> items = projectToItems(pending, statsByUuid, reg.getPrototype(), type);
+    if (items.isEmpty()) {
+      return;
+    }
+
+    List<Bin> bins = reg.getPacker().pack(items);
+    log.info("Packed {} PENDING {} operations into {} bins", items.size(), type, bins.size());
+
+    bins.forEach(this::scheduleBin);
   }
 
-  public Set<OperationTypeDto> getRegisteredOperationTypes() {
-    return binPackers.keySet();
+  private List<TableOperationDto> loadAndDedupPending(
+      OperationTypeDto type, Optional<String> databaseName, Optional<String> tableName) {
+    // Unpaged: correctness requires the full PENDING set in one cycle; the working set is bounded
+    // by count(PENDING for this op type). Single-page truncation would silently drop work past
+    // page 0.
+    List<TableOperationsRow> pendingRows =
+        operationsRepo.find(
+            Optional.of(type.toDb()),
+            Optional.of(OperationStatus.PENDING),
+            Optional.empty(),
+            databaseName,
+            tableName,
+            Optional.empty(),
+            Optional.empty(),
+            Pageable.unpaged());
+    if (pendingRows.isEmpty()) {
+      log.info("No PENDING operations of type {}; nothing to schedule", type);
+      return List.of();
+    }
+    List<TableOperationsRow> survivors = cancelDuplicates(pendingRows);
+    return survivors.stream().map(TableOperationDto::fromRow).collect(Collectors.toList());
+  }
+
+  /**
+   * Group {@code pendingRows} by {@code tableUuid}; for any group with more than one row, cancel
+   * all but the oldest (lex-tiebreak on id). Returns survivors in input order. Deterministic.
+   */
+  private List<TableOperationsRow> cancelDuplicates(List<TableOperationsRow> pendingRows) {
+    Map<String, List<TableOperationsRow>> byTableUuid =
+        pendingRows.stream().collect(Collectors.groupingBy(TableOperationsRow::getTableUuid));
+
+    List<String> duplicateIds =
+        byTableUuid.values().stream()
+            .filter(rows -> rows.size() > 1)
+            .flatMap(
+                rows ->
+                    rows.stream()
+                        .sorted(
+                            Comparator.comparing(TableOperationsRow::getCreatedAt)
+                                .thenComparing(TableOperationsRow::getId))
+                        .skip(1))
+            .map(TableOperationsRow::getId)
+            .collect(Collectors.toList());
+
+    if (duplicateIds.isEmpty()) {
+      return pendingRows;
+    }
+
+    int cancelled = operationsRepo.cancel(duplicateIds);
+    log.warn("Cancelled {} duplicate PENDING rows", cancelled);
+
+    Set<String> cancelledIds = Set.copyOf(duplicateIds);
+    return pendingRows.stream()
+        .filter(r -> !cancelledIds.contains(r.getId()))
+        .collect(Collectors.toList());
+  }
+
+  private Map<String, TableStatsDto> loadStatsByUuid(List<TableOperationDto> ops) {
+    Set<String> uuids =
+        ops.stream().map(TableOperationDto::getTableUuid).collect(Collectors.toSet());
+    return statsRepo.findAllById(uuids).stream()
+        .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
+  }
+
+  private List<BinItem> projectToItems(
+      List<TableOperationDto> pending,
+      Map<String, TableStatsDto> statsByUuid,
+      BinItem prototype,
+      OperationTypeDto type) {
+    List<BinItem> items =
+        pending.stream()
+            .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
+            .map(op -> prototype.withOpAndStats(op, statsByUuid.get(op.getTableUuid())))
+            .collect(Collectors.toList());
+    int skipped = pending.size() - items.size();
+    if (skipped > 0) {
+      log.warn("Skipped {} {} operations with no table_stats row", skipped, type);
+    }
+    return items;
+  }
+
+  /**
+   * Claim the bin's operations, narrow to the rows actually owned, launch one batched Spark job for
+   * the claimed subset, and mark SCHEDULED — or revert to PENDING if launch failed.
+   */
+  @Transactional
+  void scheduleBin(Bin bin) {
+    List<BinItem> items = bin.getItems();
+    OperationTypeDto type = bin.getOperationType();
+    List<String> ids = items.stream().map(BinItem::getOperationId).collect(Collectors.toList());
+
+    Instant claimedAt = Instant.now();
+    operationsRepo.updateBatch(
+        ids,
+        OperationStatus.PENDING,
+        OperationStatus.SCHEDULING,
+        Optional.of(claimedAt),
+        Optional.empty());
+    List<String> claimedIds =
+        operationsRepo
+            .find(
+                Optional.empty(),
+                Optional.of(OperationStatus.SCHEDULING),
+                Optional.empty(),
+                Optional.empty(),
+                Optional.empty(),
+                Optional.of(claimedAt),
+                Optional.of(ids),
+                Pageable.unpaged())
+            .stream()
+            .map(TableOperationsRow::getId)
+            .collect(Collectors.toList());
+    if (claimedIds.isEmpty()) {
+      log.info("All rows in bin already claimed by another scheduler instance; skipping");
+      return;
+    }
+    if (claimedIds.size() < ids.size()) {
+      log.info(
+          "Partial claim: {} of {} ops in bin claimed; launching job for claimed subset only",
+          claimedIds.size(),
+          ids.size());
+    }
+
+    Set<String> claimedSet = new HashSet<>(claimedIds);
+    List<BinItem> claimedItems =
+        items.stream()
+            .filter(item -> claimedSet.contains(item.getOperationId()))
+            .collect(Collectors.toList());
+    List<String> tableNames =
+        claimedItems.stream().map(BinItem::getFullyQualifiedTableName).collect(Collectors.toList());
+    List<String> operationIds =
+        claimedItems.stream().map(BinItem::getOperationId).collect(Collectors.toList());
+
+    String jobTypeName = type.name();
+    String jobName = "batched-" + jobTypeName.toLowerCase() + "-" + claimedAt.toEpochMilli();
+    Optional<String> jobId =
+        jobsClient.launch(jobName, jobTypeName, tableNames, operationIds, resultsEndpoint);
+
+    if (jobId.isPresent()) {
+      int updated =
+          operationsRepo.updateBatch(
+              claimedIds,
+              OperationStatus.SCHEDULING,
+              OperationStatus.SCHEDULED,
+              Optional.empty(),
+              Optional.of(jobId.get()));
+      log.info(
+          "Submitted job {} for {} tables ({} rows marked SCHEDULED)",
+          jobId.get(),
+          claimedItems.size(),
+          updated);
+    } else {
+      int reverted =
+          operationsRepo.updateBatch(
+              claimedIds,
+              OperationStatus.SCHEDULING,
+              OperationStatus.PENDING,
+              Optional.empty(),
+              Optional.empty());
+      log.warn(
+          "Job submission failed; reverted {} claimed rows back to PENDING for retry on the next"
+              + " pass",
+          reverted);
+    }
   }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
index e3dad4410..7105aae23 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/Bin.java
@@ -1,9 +1,20 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import java.util.List;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
+import lombok.ToString;
+
 /**
- * A schedulable unit produced by a {@link BinPacker}. Each bin owns the work for a single Spark job
- * — claiming the operations it covers, launching, and recording the outcome.
+ * One scheduling unit: the operation type the bin will run as, and the items the scheduler will
+ * claim, narrow to the claimed subset, and launch as a single Spark job. Pure data: the scheduler
+ * reads from a bin to do the work; the bin does no IO itself.
  */
-public interface Bin {
-  void schedule();
+@AllArgsConstructor
+@Getter
+@ToString
+public class Bin {
+  private final OperationTypeDto operationType;
+  private final List<BinItem> items;
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
index 72f4de278..e71531d8f 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -1,13 +1,23 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+
 /**
- * Smallest contract a {@link BinPacker} needs from each unit it packs: a single non-negative
- * weight. Implementations are operation-specific (see {@code
- * com.linkedin.openhouse.optimizer.operations.ofd.OfdBinItem}) and encode their own cost model in
- * {@link #getWeight()}. They also carry whatever identity the downstream dispatcher needs (table
- * name, operation id, etc.); those getters live on the impl, not on this interface, so the packer
- * stays a pure utility.
+ * One packable unit. Exposes the weight a packer keys on, plus the identity the scheduler reads
+ * when it launches a Spark job (fully-qualified table name, operation id).
+ *
+ * <p>{@link #withOpAndStats(TableOperationDto, TableStatsDto)} returns a new populated instance
+ * from a (pending operation, current stats) pair. Implementations have a no-arg constructor that
+ * makes a "seat" prototype suitable for calling {@code withOpAndStats(...)} on; getters on a seat
+ * are not meaningful.
  */
 public interface BinItem {
   long getWeight();
+
+  String getFullyQualifiedTableName();
+
+  String getOperationId();
+
+  BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index 56ba78f06..87ac0eb1b 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -2,15 +2,13 @@
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import java.util.List;
-import java.util.Optional;
 
 /**
- * Per-operation-type orchestrator the scheduler dispatches to. The packer loads its PENDING work,
- * groups it into batches, and returns a {@link Bin} for each batch. The scheduler then asks each
- * bin to {@link Bin#schedule() schedule} itself.
+ * A stateless bucketing strategy. Given a flat list of {@link BinItem}s, returns one {@link Bin}
+ * per batch the scheduler should submit. Implementations do no IO and hold no mutable state.
  */
 public interface BinPacker {
   OperationTypeDto getOperationType();
 
-  List<Bin> prepare(Optional<String> databaseName, Optional<String> tableName);
+  List<Bin> pack(List<BinItem> items);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
similarity index 56%
rename from services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
rename to services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
index 7a6b9275e..1466bc321 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
@@ -1,27 +1,31 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;
-import lombok.Builder;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * First-fit-decreasing packing algorithm. Sorts items by weight descending and places each into the
- * first group whose running totals stay at or below {@code maxWeightPerBin} and {@code
- * maxItemsPerBin}. An item that exceeds the weight cap on its own goes into a group by itself.
+ * First-fit-decreasing packing. Sorts items by weight descending, then places each into the first
+ * bin whose totals stay at or below {@code maxWeightPerBin} and {@code maxItemsPerBin}. An item
+ * whose weight alone exceeds {@code maxWeightPerBin} goes into a bin by itself.
  *
- * <p>Returns flat groupings ({@code List<List<BinItem>>}). Callers wrap each grouping into the
- * {@link Bin} implementation they need for their operation type.
+ * <p>Stateless: the constructor takes only immutable configuration; {@link #pack(List)} is a pure
+ * function over its argument.
  */
 @Slf4j
-@Builder
-public class FirstFitDecreasingBinPacker {
+@AllArgsConstructor
+public class FirstFitBinPacker implements BinPacker {
 
+  @Getter private final OperationTypeDto operationType;
   private final long maxWeightPerBin;
   private final int maxItemsPerBin;
 
-  public List<List<BinItem>> pack(List<BinItem> items) {
+  @Override
+  public List<Bin> pack(List<BinItem> items) {
     if (items == null || items.isEmpty()) {
       return new ArrayList<>();
     }
@@ -29,8 +33,12 @@ public List<List<BinItem>> pack(List<BinItem> items) {
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
             .collect(ArrayList::new, this::placeItem, List::addAll);
-    log.info("Packed {} items into {} groupings", items.size(), bins.size());
-    return bins.stream().map(b -> b.items).collect(java.util.stream.Collectors.toList());
+    log.info("Packed {} items into {} bins", items.size(), bins.size());
+    List<Bin> result = new ArrayList<>(bins.size());
+    for (PackingBin pb : bins) {
+      result.add(new Bin(operationType, pb.items));
+    }
+    return result;
   }
 
   private void placeItem(List<PackingBin> bins, BinItem item) {
@@ -46,7 +54,7 @@ private void placeItem(List<PackingBin> bins, BinItem item) {
             });
   }
 
-  /** Per-bin running-totals helper used during the fold. Hidden from callers. */
+  /** Running-totals helper used during the fold. */
   private static class PackingBin {
     final List<BinItem> items = new ArrayList<>();
     long totalWeight;
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
new file mode 100644
index 000000000..92f56c8db
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
@@ -0,0 +1,45 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.Optional;
+import lombok.Getter;
+import lombok.ToString;
+
+/**
+ * {@link BinItem} that weights by the table's current file count. Suitable for any operation whose
+ * Spark cost scales with file count — orphan files deletion, stats collection, etc. The
+ * implementation knows nothing about which operation type is using it.
+ */
+@Getter
+@ToString
+public class TotalFilesBinItem implements BinItem {
+
+  private final String fullyQualifiedTableName;
+  private final String operationId;
+  private final long weight;
+
+  /** Seat constructor: call {@link #withOpAndStats} to get a populated instance. */
+  public TotalFilesBinItem() {
+    this("", "", 0L);
+  }
+
+  private TotalFilesBinItem(String fullyQualifiedTableName, String operationId, long weight) {
+    this.fullyQualifiedTableName = fullyQualifiedTableName;
+    this.operationId = operationId;
+    this.weight = weight;
+  }
+
+  @Override
+  public BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats) {
+    return new TotalFilesBinItem(
+        op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
+  }
+
+  private static long currentFileCount(TableStatsDto stats) {
+    return Optional.ofNullable(stats)
+        .map(TableStatsDto::getSnapshot)
+        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
+        .orElse(0L);
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index be2f97cf7..f2699527c 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,5 +1,9 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.scheduler.BinPackerRegistration;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.context.annotation.Bean;
@@ -7,9 +11,10 @@
 import org.springframework.web.reactive.function.client.WebClient;
 
 /**
- * Cross-cutting wiring shared across operation types: the jobs-service HTTP client and its cluster
- * id. Per-operation configuration (caps, projection logic, launch args) lives with the operation's
- * own {@link com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker} implementation.
+ * Cross-cutting wiring (jobs-service client) plus the per-operation-type {@link
+ * BinPackerRegistration} beans. The registration is the one place each operation's identity (type,
+ * packing strategy, item prototype) is composed; the scheduler itself never names an operation type
+ * beyond the keys in its registry.
  */
 @Configuration
 public class SchedulerConfig {
@@ -29,4 +34,20 @@ public WebClient jobsWebClient() {
   public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
     return new JobsServiceClient(jobsWebClient, clusterId);
   }
+
+  /**
+   * Orphan files deletion: a {@link FirstFitBinPacker} over {@link TotalFilesBinItem}. Cost scales
+   * with file count: per-file listing, manifest joins, and delete calls dominate regardless of
+   * file size.
+   */
+  @Bean
+  public BinPackerRegistration ofdRegistration(
+      @Value("${optimizer.scheduler.ofd.max-files-per-bin}") long maxFilesPerBin,
+      @Value("${optimizer.scheduler.ofd.max-tables-per-bin}") int maxTablesPerBin) {
+    return new BinPackerRegistration(
+        OperationTypeDto.ORPHAN_FILES_DELETION,
+        new FirstFitBinPacker(
+            OperationTypeDto.ORPHAN_FILES_DELETION, maxFilesPerBin, maxTablesPerBin),
+        new TotalFilesBinItem());
+  }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
deleted file mode 100644
index 4d5d1bba8..000000000
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinPackerTest.java
+++ /dev/null
@@ -1,173 +0,0 @@
-package com.linkedin.openhouse.optimizer.operations.ofd;
-
-import static org.assertj.core.api.Assertions.assertThat;
-import static org.mockito.ArgumentMatchers.any;
-import static org.mockito.ArgumentMatchers.anyList;
-import static org.mockito.ArgumentMatchers.eq;
-import static org.mockito.Mockito.never;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.when;
-
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.db.TableStatsRow;
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
-import java.util.List;
-import java.util.Optional;
-import java.util.UUID;
-import org.junit.jupiter.api.BeforeEach;
-import org.junit.jupiter.api.Test;
-import org.junit.jupiter.api.extension.ExtendWith;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Mock;
-import org.mockito.junit.jupiter.MockitoExtension;
-
-@ExtendWith(MockitoExtension.class)
-class OfdBinPackerTest {
-
-  private static final com.linkedin.openhouse.optimizer.db.OperationType OFD_DB =
-      com.linkedin.openhouse.optimizer.db.OperationType.ORPHAN_FILES_DELETION;
-  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
-  private static final long MAX_FILES_PER_BIN = 1_000_000L;
-  private static final int MAX_TABLES_PER_BIN = 50;
-
-  @Mock private TableOperationsRepository operationsRepo;
-  @Mock private TableStatsRepository statsRepo;
-  @Mock private JobsServiceClient jobsClient;
-
-  private OfdBinPacker packer;
-
-  @BeforeEach
-  void setUp() {
-    packer =
-        new OfdBinPacker(
-            MAX_FILES_PER_BIN,
-            MAX_TABLES_PER_BIN,
-            operationsRepo,
-            statsRepo,
-            jobsClient,
-            RESULTS_ENDPOINT);
-  }
-
-  // ---- Helpers ----
-
-  private void stubFindPending(List<TableOperationsRow> rows) {
-    when(operationsRepo.find(
-            eq(Optional.of(OFD_DB)),
-            eq(Optional.of(OperationStatus.PENDING)),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            any()))
-        .thenReturn(rows);
-  }
-
-  private TableOperationsRow pendingRow(String uuid, String db, String table) {
-    return TableOperationsRow.builder()
-        .id(UUID.randomUUID().toString())
-        .tableUuid(uuid)
-        .databaseName(db)
-        .tableName(table)
-        .operationType(OFD_DB)
-        .status(OperationStatus.PENDING)
-        .createdAt(Instant.now())
-        .build();
-  }
-
-  private TableStatsRow statsRow(String uuid, long numCurrentFiles) {
-    return TableStatsRow.builder()
-        .tableUuid(uuid)
-        .snapshot(SnapshotMetrics.builder().numCurrentFiles(numCurrentFiles).build())
-        .build();
-  }
-
-  // ---- Tests ----
-
-  @Test
-  void prepare_noPending_returnsEmpty() {
-    stubFindPending(List.of());
-
-    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
-
-    assertThat(bins).isEmpty();
-    verify(statsRepo, never()).findAllById(any());
-  }
-
-  @Test
-  void prepare_allOpsWithoutStats_returnsEmpty() {
-    TableOperationsRow row = pendingRow(UUID.randomUUID().toString(), "db1", "tbl1");
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of());
-
-    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
-
-    assertThat(bins).isEmpty();
-  }
-
-  @Test
-  void prepare_singleOpWithStats_returnsOneBin() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-
-    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
-
-    assertThat(bins).hasSize(1);
-  }
-
-  @Test
-  void prepare_cancelsDuplicatePendingPerCycle() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row1 = pendingRow(uuid, "db1", "tbl1");
-    TableOperationsRow row2 = pendingRow(uuid, "db1", "tbl1");
-    stubFindPending(List.of(row1, row2));
-    when(operationsRepo.cancel(anyList())).thenReturn(1);
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-
-    packer.prepare(Optional.empty(), Optional.empty());
-
-    ArgumentCaptor<List<String>> cancelled = ArgumentCaptor.forClass(List.class);
-    verify(operationsRepo).cancel(cancelled.capture());
-    assertThat(cancelled.getValue()).hasSize(1);
-  }
-
-  @Test
-  void prepare_skipsOpsWithoutStats_includesOnlyThoseWithStats() {
-    String withStats = UUID.randomUUID().toString();
-    String missing = UUID.randomUUID().toString();
-    TableOperationsRow withStatsRow = pendingRow(withStats, "db1", "tblA");
-    TableOperationsRow missingRow = pendingRow(missing, "db1", "tblB");
-    stubFindPending(List.of(withStatsRow, missingRow));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(withStats, 50L)));
-
-    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
-
-    assertThat(bins).hasSize(1);
-  }
-
-  @Test
-  void prepare_packerReturnsBinsThatAreOfdBins() {
-    String uuid = UUID.randomUUID().toString();
-    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
-    stubFindPending(List.of(row));
-    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
-
-    List<Bin> bins = packer.prepare(Optional.empty(), Optional.empty());
-
-    assertThat(bins).allMatch(b -> b instanceof OfdBin);
-  }
-
-  @Test
-  void getOperationType_returnsOrphanFilesDeletion() {
-    assertThat(packer.getOperationType()).isEqualTo(OperationTypeDto.ORPHAN_FILES_DELETION);
-  }
-}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java
deleted file mode 100644
index ac1700f1e..000000000
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/operations/ofd/OfdBinTest.java
+++ /dev/null
@@ -1,172 +0,0 @@
-package com.linkedin.openhouse.optimizer.operations.ofd;
-
-import static org.assertj.core.api.Assertions.assertThat;
-import static org.mockito.ArgumentMatchers.any;
-import static org.mockito.ArgumentMatchers.anyList;
-import static org.mockito.ArgumentMatchers.anyString;
-import static org.mockito.ArgumentMatchers.eq;
-import static org.mockito.Mockito.never;
-import static org.mockito.Mockito.verify;
-import static org.mockito.Mockito.when;
-
-import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.OperationType;
-import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
-import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import java.time.Instant;
-import java.util.List;
-import java.util.Optional;
-import java.util.UUID;
-import org.junit.jupiter.api.Test;
-import org.junit.jupiter.api.extension.ExtendWith;
-import org.mockito.ArgumentCaptor;
-import org.mockito.Mock;
-import org.mockito.junit.jupiter.MockitoExtension;
-
-@ExtendWith(MockitoExtension.class)
-class OfdBinTest {
-
-  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
-
-  @Mock private TableOperationsRepository operationsRepo;
-  @Mock private JobsServiceClient jobsClient;
-
-  private static OfdBinItem item(String fqtn) {
-    return new OfdBinItem(fqtn, UUID.randomUUID().toString(), 100L);
-  }
-
-  private void stubFindClaimed(List<TableOperationsRow> rows) {
-    when(operationsRepo.find(
-            eq(Optional.empty()),
-            eq(Optional.of(OperationStatus.SCHEDULING)),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            eq(Optional.empty()),
-            any(),
-            any(),
-            any()))
-        .thenReturn(rows);
-  }
-
-  private TableOperationsRow schedulingRow(String opId) {
-    return TableOperationsRow.builder()
-        .id(opId)
-        .tableUuid(UUID.randomUUID().toString())
-        .databaseName("db")
-        .tableName("tbl")
-        .operationType(OperationType.ORPHAN_FILES_DELETION)
-        .status(OperationStatus.SCHEDULING)
-        .createdAt(Instant.now())
-        .build();
-  }
-
-  @Test
-  void schedule_singleBin_claimsAndMarksScheduled() {
-    OfdBinItem one = item("db1.tbl1");
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    stubFindClaimed(List.of(schedulingRow(one.getOperationId())));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-123"));
-
-    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
-
-    verify(operationsRepo)
-        .updateBatch(
-            eq(List.of(one.getOperationId())),
-            eq(OperationStatus.SCHEDULING),
-            eq(OperationStatus.SCHEDULED),
-            eq(Optional.empty()),
-            eq(Optional.of("job-123")));
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
-
-    ArgumentCaptor<List<String>> tableNames = ArgumentCaptor.forClass(List.class);
-    verify(jobsClient)
-        .launch(
-            anyString(), eq("ORPHAN_FILES_DELETION"), tableNames.capture(), anyList(), anyString());
-    assertThat(tableNames.getValue()).containsExactly("db1.tbl1");
-  }
-
-  @Test
-  void schedule_jobLaunchFails_revertsToPending() {
-    OfdBinItem one = item("db1.tbl1");
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    stubFindClaimed(List.of(schedulingRow(one.getOperationId())));
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.empty());
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any()))
-        .thenReturn(1);
-
-    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
-
-    verify(operationsRepo)
-        .updateBatch(
-            eq(List.of(one.getOperationId())),
-            eq(OperationStatus.SCHEDULING),
-            eq(OperationStatus.PENDING),
-            eq(Optional.empty()),
-            eq(Optional.empty()));
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
-  }
-
-  @Test
-  void schedule_rowsAlreadyClaimed_skipsSubmit() {
-    OfdBinItem one = item("db1.tbl1");
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(0);
-    stubFindClaimed(List.of());
-
-    new OfdBin(List.of(one), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
-
-    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
-    verify(operationsRepo, never())
-        .updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
-  }
-
-  @Test
-  void schedule_partialClaim_launchesOnlyClaimedSubset() {
-    OfdBinItem a = item("db1.tblA");
-    OfdBinItem b = item("db1.tblB");
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
-        .thenReturn(1);
-    // Only A actually claimed.
-    stubFindClaimed(List.of(schedulingRow(a.getOperationId())));
-    when(operationsRepo.updateBatch(
-            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
-        .thenReturn(1);
-    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
-        .thenReturn(Optional.of("job-partial"));
-
-    new OfdBin(List.of(a, b), operationsRepo, jobsClient, RESULTS_ENDPOINT).schedule();
-
-    ArgumentCaptor<List<String>> launchedTableNames = ArgumentCaptor.forClass(List.class);
-    ArgumentCaptor<List<String>> launchedOpIds = ArgumentCaptor.forClass(List.class);
-    verify(jobsClient)
-        .launch(
-            anyString(),
-            anyString(),
-            launchedTableNames.capture(),
-            launchedOpIds.capture(),
-            anyString());
-    assertThat(launchedTableNames.getValue()).containsExactly("db1.tblA");
-    assertThat(launchedOpIds.getValue()).containsExactly(a.getOperationId());
-  }
-}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index d42fb976e..35ad08871 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -1,59 +1,343 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
+import static org.assertj.core.api.Assertions.assertThat;
 import static org.assertj.core.api.Assertions.assertThatThrownBy;
 import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.ArgumentMatchers.anyList;
+import static org.mockito.ArgumentMatchers.anyString;
 import static org.mockito.ArgumentMatchers.eq;
-import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.never;
 import static org.mockito.Mockito.verify;
 import static org.mockito.Mockito.when;
 
+import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
+import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
+import com.linkedin.openhouse.optimizer.db.TableStatsRow;
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.Bin;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
+import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import java.time.Instant;
 import java.util.List;
 import java.util.Optional;
+import java.util.UUID;
+import org.junit.jupiter.api.BeforeEach;
 import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.ArgumentCaptor;
 import org.mockito.Mock;
 import org.mockito.junit.jupiter.MockitoExtension;
 
 @ExtendWith(MockitoExtension.class)
 class SchedulerRunnerTest {
 
-  @Mock private BinPacker packer;
-  @Mock private Bin bin1;
-  @Mock private Bin bin2;
+  private static final OperationTypeDto OFD = OperationTypeDto.ORPHAN_FILES_DELETION;
+  private static final com.linkedin.openhouse.optimizer.db.OperationType OFD_DB =
+      com.linkedin.openhouse.optimizer.db.OperationType.ORPHAN_FILES_DELETION;
+  private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
+
+  @Mock private TableOperationsRepository operationsRepo;
+  @Mock private TableStatsRepository statsRepo;
+  @Mock private JobsServiceClient jobsClient;
+
+  private SchedulerRunner runner;
+
+  @BeforeEach
+  void setUp() {
+    // A real packer + real prototype — the runner exercises the full pipeline against actual
+    // bucketing + projection logic, while the IO is mocked.
+    BinPackerRegistration ofdReg =
+        new BinPackerRegistration(
+            OFD, new FirstFitBinPacker(OFD, 1_000_000L, 50), new TotalFilesBinItem());
+    runner =
+        new SchedulerRunner(
+            operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT, List.of(ofdReg));
+  }
+
+  // ---- Stubbing helpers ----
+
+  private void stubFindPending(List<TableOperationsRow> rows) {
+    when(operationsRepo.find(
+            eq(Optional.of(OFD_DB)),
+            eq(Optional.of(OperationStatus.PENDING)),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            any()))
+        .thenReturn(rows);
+  }
+
+  private void stubFindClaimed(List<TableOperationsRow> rows) {
+    when(operationsRepo.find(
+            eq(Optional.empty()),
+            eq(Optional.of(OperationStatus.SCHEDULING)),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            eq(Optional.empty()),
+            any(),
+            any(),
+            any()))
+        .thenReturn(rows);
+  }
+
+  private TableOperationsRow pendingRow(String uuid, String db, String table) {
+    return TableOperationsRow.builder()
+        .id(UUID.randomUUID().toString())
+        .tableUuid(uuid)
+        .databaseName(db)
+        .tableName(table)
+        .operationType(OFD_DB)
+        .status(OperationStatus.PENDING)
+        .createdAt(Instant.now())
+        .build();
+  }
+
+  private TableOperationsRow schedulingRow(TableOperationsRow source) {
+    return source.toBuilder().status(OperationStatus.SCHEDULING).build();
+  }
+
+  private TableStatsRow statsRow(String uuid, long numCurrentFiles) {
+    return TableStatsRow.builder()
+        .tableUuid(uuid)
+        .snapshot(SnapshotMetrics.builder().numCurrentFiles(numCurrentFiles).build())
+        .build();
+  }
+
+  // ---- Tests ----
 
   @Test
   void schedule_unknownOperationType_throws() {
-    SchedulerRunner runner = new SchedulerRunner(List.of());
+    SchedulerRunner empty =
+        new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT, List.of());
 
-    assertThatThrownBy(() -> runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION))
+    assertThatThrownBy(() -> empty.schedule(OFD))
         .isInstanceOf(IllegalStateException.class)
         .hasMessageContaining("No BinPacker registered");
   }
 
   @Test
-  void schedule_delegatesToPackerAndSchedulesEachBin() {
-    when(packer.getOperationType()).thenReturn(OperationTypeDto.ORPHAN_FILES_DELETION);
-    when(packer.prepare(any(), any())).thenReturn(List.of(bin1, bin2));
+  void getRegisteredOperationTypes_returnsRegisteredSet() {
+    assertThat(runner.getRegisteredOperationTypes()).containsExactly(OFD);
+  }
+
+  @Test
+  void schedule_noPendingOps_noJobSubmitted() {
+    stubFindPending(List.of());
+
+    runner.schedule(OFD);
+
+    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
+  }
+
+  @Test
+  void schedule_allOpsWithoutStats_noJobSubmitted() {
+    TableOperationsRow row = pendingRow(UUID.randomUUID().toString(), "db1", "tbl1");
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of());
 
-    SchedulerRunner runner = new SchedulerRunner(List.of(packer));
-    runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION);
+    runner.schedule(OFD);
 
-    verify(packer).prepare(eq(Optional.empty()), eq(Optional.empty()));
-    verify(bin1, times(1)).schedule();
-    verify(bin2, times(1)).schedule();
+    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
   }
 
   @Test
-  void schedule_passesScopeArgsThrough() {
-    when(packer.getOperationType()).thenReturn(OperationTypeDto.ORPHAN_FILES_DELETION);
-    when(packer.prepare(any(), any())).thenReturn(List.of());
+  void schedule_singleBin_claimsAndMarksScheduled() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
+
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100_000L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    stubFindClaimed(List.of(schedulingRow(row)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-123"));
+
+    runner.schedule(OFD);
+
+    verify(operationsRepo)
+        .updateBatch(
+            eq(List.of(row.getId())),
+            eq(OperationStatus.SCHEDULING),
+            eq(OperationStatus.SCHEDULED),
+            eq(Optional.empty()),
+            eq(Optional.of("job-123")));
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
+
+    ArgumentCaptor<List<String>> tableNames = ArgumentCaptor.forClass(List.class);
+    verify(jobsClient)
+        .launch(anyString(), eq(OFD.name()), tableNames.capture(), anyList(), anyString());
+    assertThat(tableNames.getValue()).containsExactly("db1.tbl1");
+  }
+
+  @Test
+  void schedule_jobLaunchFails_marksPendingForRetry() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
+
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    stubFindClaimed(List.of(schedulingRow(row)));
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.empty());
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any()))
+        .thenReturn(1);
+
+    runner.schedule(OFD);
+
+    verify(operationsRepo)
+        .updateBatch(
+            eq(List.of(row.getId())),
+            eq(OperationStatus.SCHEDULING),
+            eq(OperationStatus.PENDING),
+            eq(Optional.empty()),
+            eq(Optional.empty()));
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
+  }
+
+  @Test
+  void schedule_rowsAlreadyClaimed_skipsSubmit() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row = pendingRow(uuid, "db1", "tbl1");
+
+    stubFindPending(List.of(row));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(0);
+    stubFindClaimed(List.of());
+
+    runner.schedule(OFD);
+
+    verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any());
+    verify(operationsRepo, never())
+        .updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any());
+  }
+
+  @Test
+  void schedule_cancelsDuplicatePendingPerCycle() {
+    String uuid = UUID.randomUUID().toString();
+    TableOperationsRow row1 = pendingRow(uuid, "db1", "tbl1");
+    TableOperationsRow row2 = pendingRow(uuid, "db1", "tbl1");
+
+    stubFindPending(List.of(row1, row2));
+    when(operationsRepo.cancel(anyList())).thenReturn(1);
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(uuid, 100L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    TableOperationsRow survivor = row1.getCreatedAt().isBefore(row2.getCreatedAt()) ? row1 : row2;
+    if (row1.getCreatedAt().equals(row2.getCreatedAt())) {
+      survivor = row1.getId().compareTo(row2.getId()) <= 0 ? row1 : row2;
+    }
+    stubFindClaimed(List.of(schedulingRow(survivor)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-dedup"));
+
+    runner.schedule(OFD);
+
+    ArgumentCaptor<List<String>> cancelled = ArgumentCaptor.forClass(List.class);
+    verify(operationsRepo).cancel(cancelled.capture());
+    assertThat(cancelled.getValue()).hasSize(1);
+  }
+
+  @Test
+  void schedule_partialClaim_launchesAndMarksOnlyClaimedSubset() {
+    String uuidA = UUID.randomUUID().toString();
+    String uuidB = UUID.randomUUID().toString();
+    TableOperationsRow rowA = pendingRow(uuidA, "db1", "tblA");
+    TableOperationsRow rowB = pendingRow(uuidB, "db1", "tblB");
+
+    stubFindPending(List.of(rowA, rowB));
+    when(statsRepo.findAllById(any()))
+        .thenReturn(List.of(statsRow(uuidA, 100L), statsRow(uuidB, 100L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    // Only A actually claimed.
+    stubFindClaimed(List.of(schedulingRow(rowA)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-partial"));
+
+    runner.schedule(OFD);
+
+    ArgumentCaptor<List<String>> launchedTableNames = ArgumentCaptor.forClass(List.class);
+    ArgumentCaptor<List<String>> launchedOpIds = ArgumentCaptor.forClass(List.class);
+    verify(jobsClient)
+        .launch(
+            anyString(),
+            anyString(),
+            launchedTableNames.capture(),
+            launchedOpIds.capture(),
+            anyString());
+    assertThat(launchedTableNames.getValue()).containsExactly("db1.tblA");
+    assertThat(launchedOpIds.getValue()).containsExactly(rowA.getId());
+
+    verify(operationsRepo)
+        .updateBatch(
+            eq(List.of(rowA.getId())),
+            eq(OperationStatus.SCHEDULING),
+            eq(OperationStatus.SCHEDULED),
+            eq(Optional.empty()),
+            eq(Optional.of("job-partial")));
+  }
+
+  @Test
+  void schedule_opsWithoutStats_skipped() {
+    String withStats = UUID.randomUUID().toString();
+    String missing = UUID.randomUUID().toString();
+    TableOperationsRow withStatsRow = pendingRow(withStats, "db1", "tblA");
+    TableOperationsRow missingRow = pendingRow(missing, "db1", "tblB");
+
+    stubFindPending(List.of(withStatsRow, missingRow));
+    when(statsRepo.findAllById(any())).thenReturn(List.of(statsRow(withStats, 50L)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.PENDING), eq(OperationStatus.SCHEDULING), any(), any()))
+        .thenReturn(1);
+    stubFindClaimed(List.of(schedulingRow(withStatsRow)));
+    when(operationsRepo.updateBatch(
+            anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.SCHEDULED), any(), any()))
+        .thenReturn(1);
+    when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
+        .thenReturn(Optional.of("job-skip"));
 
-    SchedulerRunner runner = new SchedulerRunner(List.of(packer));
-    runner.schedule(OperationTypeDto.ORPHAN_FILES_DELETION, Optional.of("db1"), Optional.of("t1"));
+    runner.schedule(OFD);
 
-    verify(packer).prepare(eq(Optional.of("db1")), eq(Optional.of("t1")));
+    ArgumentCaptor<List<String>> ids = ArgumentCaptor.forClass(List.class);
+    verify(operationsRepo)
+        .updateBatch(
+            ids.capture(),
+            eq(OperationStatus.PENDING),
+            eq(OperationStatus.SCHEDULING),
+            any(),
+            any());
+    assertThat(ids.getValue()).containsExactly(withStatsRow.getId());
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
new file mode 100644
index 000000000..ad4aa313c
--- /dev/null
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
@@ -0,0 +1,119 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.List;
+import java.util.stream.Collectors;
+import lombok.AllArgsConstructor;
+import lombok.Getter;
+import org.junit.jupiter.api.Test;
+
+class FirstFitBinPackerTest {
+
+  private static final OperationTypeDto TYPE = OperationTypeDto.ORPHAN_FILES_DELETION;
+
+  @AllArgsConstructor
+  @Getter
+  static class TestItem implements BinItem {
+    private final String id;
+    private final long weight;
+
+    @Override
+    public String getFullyQualifiedTableName() {
+      return "db.tbl_" + id;
+    }
+
+    @Override
+    public String getOperationId() {
+      return "op-" + id;
+    }
+
+    @Override
+    public BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats) {
+      throw new UnsupportedOperationException("test items are not used as prototypes");
+    }
+  }
+
+  private static TestItem item(String id, long weight) {
+    return new TestItem(id, weight);
+  }
+
+  @Test
+  void emptyInput_returnsEmptyBins() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 100L, 10);
+    assertThat(packer.pack(List.of())).isEmpty();
+  }
+
+  @Test
+  void singleItem_oneBin() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
+    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).getItems()).hasSize(1);
+    assertThat(bins.get(0).getOperationType()).isEqualTo(TYPE);
+  }
+
+  @Test
+  void underWeightLimit_oneBin() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).getItems()).hasSize(3);
+  }
+
+  @Test
+  void overWeightLimit_twoBins() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
+    assertThat(bins).hasSize(2);
+    // FFD: sort desc → 600, 600, 400. Place 600 → bin0; next 600 doesn't fit bin0 → bin1; 400
+    // fits bin0 (total 1_000_000).
+    long b0 = bins.get(0).getItems().stream().mapToLong(BinItem::getWeight).sum();
+    long b1 = bins.get(1).getItems().stream().mapToLong(BinItem::getWeight).sum();
+    assertThat(b0).isEqualTo(1_000_000L);
+    assertThat(b1).isEqualTo(600_000L);
+  }
+
+  @Test
+  void itemLargerThanCap_getsOwnBin() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000L, 10);
+    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
+    assertThat(bins).hasSize(1);
+    assertThat(bins.get(0).getItems()).hasSize(1);
+  }
+
+  @Test
+  void sortedDescending_largestFirst() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 2_000_000L, 10);
+    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
+    assertThat(bins).hasSize(1);
+    List<String> ids =
+        bins.get(0).getItems().stream()
+            .map(TestItem.class::cast)
+            .map(TestItem::getId)
+            .collect(Collectors.toList());
+    assertThat(ids).containsExactly("large", "small");
+  }
+
+  @Test
+  void maxItemsCap_splitsBins() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 2);
+    List<Bin> bins =
+        packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
+    assertThat(bins).hasSize(2);
+    assertThat(bins.get(0).getItems()).hasSize(2);
+    assertThat(bins.get(1).getItems()).hasSize(2);
+  }
+
+  @Test
+  void binsCarryConfiguredOperationType() {
+    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 100L, 10);
+    List<Bin> bins = packer.pack(List.of(item("a", 1L)));
+    assertThat(bins.get(0).getOperationType()).isEqualTo(TYPE);
+  }
+}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
deleted file mode 100644
index e2efa2ce3..000000000
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitDecreasingBinPackerTest.java
+++ /dev/null
@@ -1,113 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler.binpack;
-
-import static org.assertj.core.api.Assertions.assertThat;
-
-import java.util.List;
-import java.util.stream.Collectors;
-import lombok.AllArgsConstructor;
-import lombok.Getter;
-import org.junit.jupiter.api.Test;
-
-class FirstFitDecreasingBinPackerTest {
-
-  @AllArgsConstructor
-  @Getter
-  static class TestItem implements BinItem {
-    private final String id;
-    private final long weight;
-  }
-
-  private static TestItem item(String id, long weight) {
-    return new TestItem(id, weight);
-  }
-
-  @Test
-  void emptyInput_returnsEmptyGroupings() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(100L).maxItemsPerBin(10).build();
-    assertThat(packer.pack(List.of())).isEmpty();
-  }
-
-  @Test
-  void singleItem_oneGrouping() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(1_000_000L)
-            .maxItemsPerBin(10)
-            .build();
-    List<List<BinItem>> groupings = packer.pack(List.of(item("a", 100L)));
-    assertThat(groupings).hasSize(1);
-    assertThat(groupings.get(0)).hasSize(1);
-  }
-
-  @Test
-  void underWeightLimit_oneGrouping() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(1_000_000L)
-            .maxItemsPerBin(10)
-            .build();
-    List<List<BinItem>> groupings =
-        packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
-    assertThat(groupings).hasSize(1);
-    assertThat(groupings.get(0)).hasSize(3);
-    long total = groupings.get(0).stream().mapToLong(BinItem::getWeight).sum();
-    assertThat(total).isEqualTo(900_000L);
-  }
-
-  @Test
-  void overWeightLimit_twoGroupings() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(1_000_000L)
-            .maxItemsPerBin(10)
-            .build();
-    List<List<BinItem>> groupings =
-        packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
-    assertThat(groupings).hasSize(2);
-    // FFD: sort desc → 600, 600, 400. Place 600 → group0; next 600 doesn't fit group0 → group1;
-    // 400 fits group0 (total 1_000_000).
-    long g0Total = groupings.get(0).stream().mapToLong(BinItem::getWeight).sum();
-    long g1Total = groupings.get(1).stream().mapToLong(BinItem::getWeight).sum();
-    assertThat(g0Total).isEqualTo(1_000_000L);
-    assertThat(g1Total).isEqualTo(600_000L);
-  }
-
-  @Test
-  void itemLargerThanCap_getsOwnGrouping() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000L).maxItemsPerBin(10).build();
-    List<List<BinItem>> groupings = packer.pack(List.of(item("big", 5_000L)));
-    assertThat(groupings).hasSize(1);
-    assertThat(groupings.get(0)).hasSize(1);
-  }
-
-  @Test
-  void sortedDescending_largestFirst() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder()
-            .maxWeightPerBin(2_000_000L)
-            .maxItemsPerBin(10)
-            .build();
-    List<List<BinItem>> groupings =
-        packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
-    assertThat(groupings).hasSize(1);
-    List<String> ids =
-        groupings.get(0).stream()
-            .map(TestItem.class::cast)
-            .map(TestItem::getId)
-            .collect(Collectors.toList());
-    assertThat(ids).containsExactly("large", "small");
-  }
-
-  @Test
-  void maxItemsCap_splitsGroupings() {
-    FirstFitDecreasingBinPacker packer =
-        FirstFitDecreasingBinPacker.builder().maxWeightPerBin(1_000_000L).maxItemsPerBin(2).build();
-    List<List<BinItem>> groupings =
-        packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
-    assertThat(groupings).hasSize(2);
-    assertThat(groupings.get(0)).hasSize(2);
-    assertThat(groupings.get(1)).hasSize(2);
-  }
-}
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
new file mode 100644
index 000000000..3d1cb802c
--- /dev/null
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
@@ -0,0 +1,70 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.UUID;
+import org.junit.jupiter.api.Test;
+
+class TotalFilesBinItemTest {
+
+  private static TableOperationDto op() {
+    return TableOperationDto.builder()
+        .id(UUID.randomUUID().toString())
+        .tableUuid(UUID.randomUUID().toString())
+        .databaseName("db1")
+        .tableName("tbl1")
+        .operationType(OperationTypeDto.ORPHAN_FILES_DELETION)
+        .build();
+  }
+
+  private static TableStatsDto statsWithFiles(Long fileCount) {
+    return TableStatsDto.builder()
+        .snapshot(TableStatsDto.SnapshotMetrics.builder().numCurrentFiles(fileCount).build())
+        .build();
+  }
+
+  @Test
+  void withOpAndStats_buildsFullyQualifiedNameAndOperationId() {
+    TableOperationDto op = op();
+    BinItem item = new TotalFilesBinItem().withOpAndStats(op, statsWithFiles(42L));
+
+    assertThat(item.getFullyQualifiedTableName()).isEqualTo("db1.tbl1");
+    assertThat(item.getOperationId()).isEqualTo(op.getId());
+  }
+
+  @Test
+  void withOpAndStats_weightIsCurrentFileCount() {
+    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), statsWithFiles(123_456L));
+    assertThat(item.getWeight()).isEqualTo(123_456L);
+  }
+
+  @Test
+  void withOpAndStats_nullStats_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), null);
+    assertThat(item.getWeight()).isEqualTo(0L);
+  }
+
+  @Test
+  void withOpAndStats_nullSnapshot_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), TableStatsDto.builder().build());
+    assertThat(item.getWeight()).isEqualTo(0L);
+  }
+
+  @Test
+  void withOpAndStats_nullFileCount_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), statsWithFiles(null));
+    assertThat(item.getWeight()).isEqualTo(0L);
+  }
+
+  @Test
+  void seatPrototype_doesNotShareStateWithPopulated() {
+    TotalFilesBinItem seat = new TotalFilesBinItem();
+    BinItem populated = seat.withOpAndStats(op(), statsWithFiles(7L));
+
+    assertThat(seat.getWeight()).isEqualTo(0L);
+    assertThat(populated.getWeight()).isEqualTo(7L);
+  }
+}

From 067d38398c8bd2c3f4741c3a159ba6451bb75755 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 10:36:10 -0700
Subject: [PATCH 07/13] style(scheduler): functional pipeline + Optional lookup
 per code-style.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix the two recurring style violations flagged on the prior commit:

- FirstFitBinPacker.pack(): drop the `items == null` guard (the caller
  contract already requires a non-null list) and the trailing
  PackingBin → Bin for-loop. The wrap-into-bins step is now
  `packingBins.stream().map(pb -> new Bin(operationType,
  pb.items)).collect(Collectors.toList())`. The `isEmpty()` early
  return goes too: the stream pipeline handles empty input cleanly, and
  for empty input the log line just reports 0 items packed into 0 bins.
- SchedulerRunner.schedule(): replace
  `if (reg == null) throw new IllegalStateException(...)` with
  `Optional.ofNullable(registry.get(type)).orElseThrow(() -> ...)`. The
  null comes from `Map.get` (a stdlib boundary), so it is wrapped in an
  Optional at the lookup site and handled there.
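
The two patterns above can be sketched in isolation as follows. This is a
minimal illustration, not the module's code: `SketchBin`, `StyleSketch`,
`lookup` and `wrap` are hypothetical names, and the registry is reduced
to a `Map<String, String>`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for the module's Bin; only the shape matters here.
final class SketchBin {
  final String operationType;
  final List<Long> weights;

  SketchBin(String operationType, List<Long> weights) {
    this.operationType = operationType;
    this.weights = weights;
  }
}

final class StyleSketch {
  // Map.get returns null for a missing key, so wrap it at the lookup site and fail fast.
  static String lookup(Map<String, String> registry, String type) {
    return Optional.ofNullable(registry.get(type))
        .orElseThrow(
            () -> new IllegalStateException("No BinPacker registered for operation type " + type));
  }

  // Empty input flows through the pipeline and yields an empty list; no isEmpty() guard needed.
  static List<SketchBin> wrap(String operationType, List<List<Long>> packingBins) {
    return packingBins.stream()
        .map(weights -> new SketchBin(operationType, weights))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, String> registry = Collections.singletonMap("OFD", "packer");
    System.out.println(lookup(registry, "OFD"));
    System.out.println(wrap("OFD", Collections.emptyList()).size());
  }
}
```

A null `packingBins` would still throw in `wrap`; the sketch relies on
the same non-null caller contract as the bullet above.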

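No behavior change for non-null input; a null `items` list now throws
NullPointerException instead of returning an empty list. Tests unchanged.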

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../optimizer/scheduler/SchedulerRunner.java     | 10 ++++++----
 .../scheduler/binpack/FirstFitBinPacker.java     | 16 ++++++----------
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 2e1f544e7..123709d79 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -84,10 +84,12 @@ public void schedule(OperationTypeDto type) {
 
   public void schedule(
       OperationTypeDto type, Optional<String> databaseName, Optional<String> tableName) {
-    BinPackerRegistration reg = registry.get(type);
-    if (reg == null) {
-      throw new IllegalStateException("No BinPacker registered for operation type " + type);
-    }
+    BinPackerRegistration reg =
+        Optional.ofNullable(registry.get(type))
+            .orElseThrow(
+                () ->
+                    new IllegalStateException(
+                        "No BinPacker registered for operation type " + type));
 
     List<TableOperationDto> pending = loadAndDedupPending(type, databaseName, tableName);
     if (pending.isEmpty()) {
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
index 1466bc321..b782e7527 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
@@ -4,6 +4,7 @@
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;
+import java.util.stream.Collectors;
 import lombok.AllArgsConstructor;
 import lombok.Getter;
 import lombok.extern.slf4j.Slf4j;
@@ -26,19 +27,14 @@ public class FirstFitBinPacker implements BinPacker {
 
   @Override
   public List<Bin> pack(List<BinItem> items) {
-    if (items == null || items.isEmpty()) {
-      return new ArrayList<>();
-    }
-    List<PackingBin> bins =
+    List<PackingBin> packingBins =
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
             .collect(ArrayList::new, this::placeItem, List::addAll);
-    log.info("Packed {} items into {} bins", items.size(), bins.size());
-    List<Bin> result = new ArrayList<>(bins.size());
-    for (PackingBin pb : bins) {
-      result.add(new Bin(operationType, pb.items));
-    }
-    return result;
+    log.info("Packed {} items into {} bins", items.size(), packingBins.size());
+    return packingBins.stream()
+        .map(pb -> new Bin(operationType, pb.items))
+        .collect(Collectors.toList());
   }
 
   private void placeItem(List<PackingBin> bins, BinItem item) {

From 2498200c27e600d9cbb4e8e02249fc94240cfada Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 11:07:07 -0700
Subject: [PATCH 08/13] refactor(scheduler): registerOperation(type, packer);
 generic FirstFitBinPacker<T>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Delete BinPackerRegistration; replace with SchedulerRunner.registerOperation(type, packer)
- BinPacker.pack(ops, statsByUuid) → List<List<BinItem>>; op type lives only in Bin
- FirstFitBinPacker<T extends BinItem> abstract; subclass owns T create(op, stats)
- TotalFilesFirstFitBinPacker extends FirstFitBinPacker<TotalFilesBinItem>
- TotalFilesBinItem immutable (drop seat ctor + withOpAndStats); BinItem is pure data
- SchedulerConfig registers via @PostConstruct on autowired runner
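
Illustration only, not part of this patch: a minimal sketch of the
pack(ops, statsByUuid) flow listed above. It assumes simplified stand-in
types (a hypothetical Item with an id and a weight, and a plain
Map<String, Long> in place of TableStatsDto) and assumes the cap check is
"running total plus item weight stays at or below the cap", which is how
the FirstFitBinPacker Javadoc reads. The real placeItem may differ in
detail.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class FfdSketch {

  /** Hypothetical stand-in for a BinItem: an operation id plus the weight the packer keys on. */
  static final class Item {
    final String id;
    final long weight;

    Item(String id, long weight) {
      this.id = id;
      this.weight = weight;
    }

    @Override
    public String toString() {
      return id;
    }
  }

  /**
   * Projects each operation id to an Item (dropping ids with no stats entry), sorts by weight
   * descending, then places each item into the first grouping that stays within both caps.
   */
  static List<List<Item>> pack(
      List<String> operationIds,
      Map<String, Long> weightById,
      long maxWeightPerBin,
      int maxItemsPerBin) {
    // Projection step: operations without a stats entry are dropped.
    List<Item> items = new ArrayList<>();
    for (String id : operationIds) {
      Long weight = weightById.get(id);
      if (weight != null) {
        items.add(new Item(id, weight));
      }
    }
    // "Decreasing" step: heaviest first; List.sort is stable, so ties keep input order.
    items.sort(Comparator.comparingLong((Item i) -> i.weight).reversed());

    List<List<Item>> groupings = new ArrayList<>();
    List<Long> totals = new ArrayList<>(); // running weight per grouping, same index as groupings
    for (Item item : items) {
      int target = -1;
      for (int g = 0; g < groupings.size(); g++) {
        boolean weightFits = totals.get(g) + item.weight <= maxWeightPerBin;
        boolean countFits = groupings.get(g).size() < maxItemsPerBin;
        if (weightFits && countFits) {
          target = g;
          break;
        }
      }
      if (target < 0) {
        // Nothing fits, including the case where the item alone exceeds the weight cap:
        // open a new grouping rather than dropping the item.
        groupings.add(new ArrayList<>());
        totals.add(0L);
        target = groupings.size() - 1;
      }
      groupings.get(target).add(item);
      totals.set(target, totals.get(target) + item.weight);
    }
    return groupings;
  }

  public static void main(String[] args) {
    List<List<Item>> groupings =
        pack(
            List.of("a", "b", "c", "d"),
            Map.of("a", 600_000L, "b", 600_000L, "c", 400_000L), // "d" has no stats
            1_000_000L,
            10);
    System.out.println(groupings); // prints [[a, c], [b]]
  }
}
```

In the sketch the caller would wrap each returned grouping with the
operation type, which is what SchedulerRunner does with new Bin(type,
grouping) in this patch.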

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../scheduler/BinPackerRegistration.java      |  24 ---
 .../optimizer/scheduler/SchedulerRunner.java  |  74 ++++-----
 .../optimizer/scheduler/binpack/BinItem.java  |  14 +-
 .../scheduler/binpack/BinPacker.java          |  16 +-
 .../scheduler/binpack/FirstFitBinPacker.java  |  54 +++++--
 .../scheduler/binpack/TotalFilesBinItem.java  |  37 +----
 .../binpack/TotalFilesFirstFitBinPacker.java  |  33 +++++
 .../scheduler/config/SchedulerConfig.java     |  40 ++---
 .../scheduler/SchedulerRunnerTest.java        |  17 +--
 .../binpack/FirstFitBinPackerTest.java        | 140 +++++++++++-------
 .../binpack/TotalFilesBinItemTest.java        |  62 ++++----
 11 files changed, 265 insertions(+), 246 deletions(-)
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
 create mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
deleted file mode 100644
index 752e04b51..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/BinPackerRegistration.java
+++ /dev/null
@@ -1,24 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler;
-
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinItem;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.BinPacker;
-import lombok.AllArgsConstructor;
-import lombok.Getter;
-
-/**
- * Registration tuple for one operation type. Bundles the bucketing strategy with the {@link
- * BinItem} prototype the scheduler uses to project pending operations and their stats into packable
- * items.
- *
- * <p>Spring bean assembled by {@link
- * com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}; {@link SchedulerRunner}
- * injects all registrations and indexes them by {@link #getOperationType()}.
- */
-@AllArgsConstructor
-@Getter
-public class BinPackerRegistration {
-  private final OperationTypeDto operationType;
-  private final BinPacker packer;
-  private final BinItem prototype;
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 123709d79..32e0d7ce7 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -19,7 +19,7 @@
 import java.util.Map;
 import java.util.Optional;
 import java.util.Set;
-import java.util.function.Function;
+import java.util.concurrent.ConcurrentHashMap;
 import java.util.stream.Collectors;
 import lombok.extern.slf4j.Slf4j;
 import org.springframework.beans.factory.annotation.Autowired;
@@ -29,21 +29,21 @@
 import org.springframework.transaction.annotation.Transactional;
 
 /**
- * Generic scheduler. For each operation type registered via {@link BinPackerRegistration}:
+ * Generic scheduler. Operation types are registered at startup via {@link #registerOperation}; for
+ * each registered type the runner:
  *
  * <ol>
  *   <li>Reads PENDING rows from MySQL.
  *   <li>Deduplicates duplicate PENDING rows for the same {@code tableUuid}.
  *   <li>Loads the stats row for every survivor.
- *   <li>Projects each (operation, stats) pair into a {@link BinItem} via the registration's
- *       prototype.
- *   <li>Hands the items to the {@link BinPacker} to get bins.
- *   <li>Schedules each bin (claim CAS, narrow to claimed, launch, record).
+ *   <li>Hands the (operations, stats) pair to the {@link BinPacker} and receives one grouping per
+ *       batch.
+ *   <li>Wraps each grouping into a {@link Bin} tagged with the operation type and schedules it
+ *       (claim CAS, narrow to claimed, launch, record).
  * </ol>
  *
- * <p>The runner is operation-agnostic. All IO and the claim/launch/mark lifecycle live here. The
- * only per-operation knowledge in the module is the {@link BinPackerRegistration} bean wired in
- * {@link com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}.
+ * <p>The runner is operation-agnostic. All IO and the claim/launch/mark lifecycle live here; the
+ * only per-operation knowledge in the module is the {@link BinPacker} the caller registers.
  */
 @Slf4j
 @Component
@@ -53,29 +53,34 @@ public class SchedulerRunner {
   private final TableStatsRepository statsRepo;
   private final JobsServiceClient jobsClient;
   private final String resultsEndpoint;
-  private final Map<OperationTypeDto, BinPackerRegistration> registry;
+  private final Map<OperationTypeDto, BinPacker> registry = new ConcurrentHashMap<>();
 
   @Autowired
   public SchedulerRunner(
       TableOperationsRepository operationsRepo,
       TableStatsRepository statsRepo,
       JobsServiceClient jobsClient,
-      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint,
-      List<BinPackerRegistration> registrations) {
+      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
     this.operationsRepo = operationsRepo;
     this.statsRepo = statsRepo;
     this.jobsClient = jobsClient;
     this.resultsEndpoint = resultsEndpoint;
-    this.registry =
-        Map.copyOf(
-            registrations.stream()
-                .collect(
-                    Collectors.toMap(
-                        BinPackerRegistration::getOperationType, Function.identity())));
+  }
+
+  /**
+   * Register a {@link BinPacker} for an operation type. Idempotent on identical re-registration;
+   * conflicting registrations replace the prior entry. Called once per operation type at startup.
+   */
+  public void registerOperation(OperationTypeDto operationType, BinPacker packer) {
+    registry.put(operationType, packer);
+    log.info(
+        "Registered BinPacker {} for operation type {}",
+        packer.getClass().getSimpleName(),
+        operationType);
   }
 
   public Set<OperationTypeDto> getRegisteredOperationTypes() {
-    return registry.keySet();
+    return Set.copyOf(registry.keySet());
   }
 
   public void schedule(OperationTypeDto type) {
@@ -84,7 +89,7 @@ public void schedule(OperationTypeDto type) {
 
   public void schedule(
       OperationTypeDto type, Optional<String> databaseName, Optional<String> tableName) {
-    BinPackerRegistration reg =
+    BinPacker packer =
         Optional.ofNullable(registry.get(type))
             .orElseThrow(
                 () ->
@@ -97,13 +102,11 @@ public void schedule(
     }
     Map<String, TableStatsDto> statsByUuid = loadStatsByUuid(pending);
 
-    List<BinItem> items = projectToItems(pending, statsByUuid, reg.getPrototype(), type);
-    if (items.isEmpty()) {
-      return;
-    }
-
-    List<Bin> bins = reg.getPacker().pack(items);
-    log.info("Packed {} PENDING {} operations into {} bins", items.size(), type, bins.size());
+    List<Bin> bins =
+        packer.pack(pending, statsByUuid).stream()
+            .map(grouping -> new Bin(type, grouping))
+            .collect(Collectors.toList());
+    log.info("Packed {} PENDING {} operations into {} bins", pending.size(), type, bins.size());
 
     bins.forEach(this::scheduleBin);
   }
@@ -172,23 +175,6 @@ private Map<String, TableStatsDto> loadStatsByUuid(List<TableOperationDto> ops)
         .collect(Collectors.toMap(TableStatsRow::getTableUuid, TableStatsDto::fromRow));
   }
 
-  private List<BinItem> projectToItems(
-      List<TableOperationDto> pending,
-      Map<String, TableStatsDto> statsByUuid,
-      BinItem prototype,
-      OperationTypeDto type) {
-    List<BinItem> items =
-        pending.stream()
-            .filter(op -> statsByUuid.containsKey(op.getTableUuid()))
-            .map(op -> prototype.withOpAndStats(op, statsByUuid.get(op.getTableUuid())))
-            .collect(Collectors.toList());
-    int skipped = pending.size() - items.size();
-    if (skipped > 0) {
-      log.warn("Skipped {} {} operations with no table_stats row", skipped, type);
-    }
-    return items;
-  }
-
   /**
    * Claim the bin's operations, narrow to the rows actually owned, launch one batched Spark job for
    * the claimed subset, and mark SCHEDULED — or revert to PENDING if launch failed.
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
index e71531d8f..25c9ee68a 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -1,16 +1,10 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-
 /**
  * One packable unit. Exposes the weight a packer keys on, plus the identity the scheduler reads
- * when it launches a Spark job (fully-qualified table name, operation id).
- *
- * <p>{@link #withOpAndStats(TableOperationDto, TableStatsDto)} returns a new populated instance
- * from a (pending operation, current stats) pair. Implementations have a no-arg constructor that
- * makes a "seat" prototype suitable for calling {@code withOpAndStats(...)} on; getters on a seat
- * are not meaningful.
+ * when it launches a Spark job (fully-qualified table name, operation id). Implementations are
+ * immutable data — projection from {@code (operation, stats)} to a concrete {@link BinItem} subtype
+ * is the bin packer's responsibility.
  */
 public interface BinItem {
   long getWeight();
@@ -18,6 +12,4 @@ public interface BinItem {
   String getFullyQualifiedTableName();
 
   String getOperationId();
-
-  BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
index 87ac0eb1b..aed87f762 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinPacker.java
@@ -1,14 +1,18 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
 import java.util.List;
+import java.util.Map;
 
 /**
- * A stateless bucketing strategy. Given a flat list of {@link BinItem}s, returns one {@link Bin}
- * per batch the scheduler should submit. Implementations do no IO and hold no mutable state.
+ * Per-operation-type strategy. Given a flat list of operations and the corresponding stats, returns
+ * one grouping per batch the scheduler should submit. The scheduler wraps each grouping into a
+ * {@link Bin} with the registered operation type. Implementations do no IO and hold no mutable
+ * state; the projection from {@code (op, stats)} to {@link BinItem} and the bucketing strategy both
+ * live in the implementation.
  */
 public interface BinPacker {
-  OperationTypeDto getOperationType();
-
-  List<Bin> pack(List<BinItem> items);
+  List<List<BinItem>> pack(
+      List<TableOperationDto> operations, Map<String, TableStatsDto> statsByTableUuid);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
index b782e7527..f5d1d0c69 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
@@ -1,40 +1,62 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
 import java.util.ArrayList;
 import java.util.Comparator;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
-import lombok.AllArgsConstructor;
-import lombok.Getter;
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * First-fit-decreasing packing. Sorts items by weight descending, then places each into the first
- * group whose totals stay at or below {@code maxWeightPerBin} and {@code maxItemsPerBin}. An item
- * whose weight exceeds the cap on its own goes into a group by itself.
+ * First-fit-decreasing packing, abstract over the concrete {@link BinItem} subtype {@code T}. The
+ * subclass tells the packer how to construct {@code T} from a {@code (operation, stats)} pair via
+ * {@link #create}; the base class handles the bucketing: sort by weight descending, then place each
+ * item into the first group whose totals stay at or below {@code maxWeightPerBin} and {@code
+ * maxItemsPerBin}. An item whose weight exceeds the cap on its own goes into a group by itself.
+ * Operations whose {@code tableUuid} has no entry in the stats map are dropped.
  *
- * <p>Stateless: the constructor takes only immutable configuration; {@link #pack(List)} is a pure
- * function over its argument.
+ * <p>Stateless: the constructor takes only immutable cap configuration; {@link #pack} is a pure
+ * function over its arguments. The packer is operation-agnostic — the scheduler wraps each grouping
+ * into a {@link Bin} with the registered operation type.
  */
 @Slf4j
-@AllArgsConstructor
-public class FirstFitBinPacker implements BinPacker {
+public abstract class FirstFitBinPacker<T extends BinItem> implements BinPacker {
 
-  @Getter private final OperationTypeDto operationType;
   private final long maxWeightPerBin;
   private final int maxItemsPerBin;
 
+  protected FirstFitBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
+    this.maxWeightPerBin = maxWeightPerBin;
+    this.maxItemsPerBin = maxItemsPerBin;
+  }
+
+  /**
+   * Construct one {@code T} for a single operation. Called by {@link #pack} for every operation
+   * whose stats are available; implementations encode the projection from {@code (op, stats)} to
+   * the concrete {@link BinItem} subtype.
+   */
+  protected abstract T create(TableOperationDto operation, TableStatsDto stats);
+
   @Override
-  public List<Bin> pack(List<BinItem> items) {
+  public final List<List<BinItem>> pack(
+      List<TableOperationDto> operations, Map<String, TableStatsDto> statsByTableUuid) {
+    List<BinItem> items =
+        operations.stream()
+            .filter(op -> statsByTableUuid.containsKey(op.getTableUuid()))
+            .map(op -> (BinItem) create(op, statsByTableUuid.get(op.getTableUuid())))
+            .collect(Collectors.toList());
     List<PackingBin> packingBins =
         items.stream()
             .sorted(Comparator.comparingLong(BinItem::getWeight).reversed())
             .collect(ArrayList::new, this::placeItem, List::addAll);
-    log.info("Packed {} items into {} bins", items.size(), packingBins.size());
-    return packingBins.stream()
-        .map(pb -> new Bin(operationType, pb.items))
-        .collect(Collectors.toList());
+    log.info(
+        "Packed {} operations ({} items after projection) into {} groupings",
+        operations.size(),
+        items.size(),
+        packingBins.size());
+    return packingBins.stream().map(pb -> pb.items).collect(Collectors.toList());
   }
 
   private void placeItem(List<PackingBin> bins, BinItem item) {
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
index 92f56c8db..16c9a15d5 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
@@ -1,45 +1,20 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.Optional;
+import lombok.AllArgsConstructor;
 import lombok.Getter;
 import lombok.ToString;
 
 /**
- * {@link BinItem} that weights by the table's current file count. Suitable for any operation whose
- * Spark cost scales with file count — orphan files deletion, stats collection, etc. The
- * implementation knows nothing about which operation type is using it.
+ * {@link BinItem} that weights by the table's current file count. Immutable data; constructed by
+ * the {@link TotalFilesFirstFitBinPacker}. The implementation knows nothing about which operation
+ * type the surrounding packer was registered against — it just carries the fields the scheduler
+ * needs to launch the job.
  */
+@AllArgsConstructor
 @Getter
 @ToString
 public class TotalFilesBinItem implements BinItem {
-
   private final String fullyQualifiedTableName;
   private final String operationId;
   private final long weight;
-
-  /** Seat constructor: call {@link #withOpAndStats} to get a populated instance. */
-  public TotalFilesBinItem() {
-    this("", "", 0L);
-  }
-
-  private TotalFilesBinItem(String fullyQualifiedTableName, String operationId, long weight) {
-    this.fullyQualifiedTableName = fullyQualifiedTableName;
-    this.operationId = operationId;
-    this.weight = weight;
-  }
-
-  @Override
-  public BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats) {
-    return new TotalFilesBinItem(
-        op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
-  }
-
-  private static long currentFileCount(TableStatsDto stats) {
-    return Optional.ofNullable(stats)
-        .map(TableStatsDto::getSnapshot)
-        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
-        .orElse(0L);
-  }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java
new file mode 100644
index 000000000..89d43c5ac
--- /dev/null
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java
@@ -0,0 +1,33 @@
+package com.linkedin.openhouse.optimizer.scheduler.binpack;
+
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.Optional;
+
+/**
+ * First-fit-decreasing packing keyed on the table's current file count. Suitable for any operation
+ * whose Spark cost scales with file count — orphan files deletion, stats collection, etc. The
+ * packer knows nothing about which operation type it was registered against; the choice of weight
+ * (current file count) is the only operation-shape assumption it encodes.
+ */
+public class TotalFilesFirstFitBinPacker extends FirstFitBinPacker<TotalFilesBinItem> {
+
+  public TotalFilesFirstFitBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
+    super(maxWeightPerBin, maxItemsPerBin);
+  }
+
+  @Override
+  protected TotalFilesBinItem create(TableOperationDto operation, TableStatsDto stats) {
+    return new TotalFilesBinItem(
+        operation.getDatabaseName() + "." + operation.getTableName(),
+        operation.getId(),
+        currentFileCount(stats));
+  }
+
+  private static long currentFileCount(TableStatsDto stats) {
+    return Optional.ofNullable(stats)
+        .map(TableStatsDto::getSnapshot)
+        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
+        .orElse(0L);
+  }
+}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index f2699527c..2c75dacf6 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,20 +1,21 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
-import com.linkedin.openhouse.optimizer.scheduler.BinPackerRegistration;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
+import com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesFirstFitBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
+import javax.annotation.PostConstruct;
+import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.context.annotation.Bean;
 import org.springframework.context.annotation.Configuration;
 import org.springframework.web.reactive.function.client.WebClient;
 
 /**
- * Cross-cutting wiring (jobs-service client) plus the per-operation-type {@link
- * BinPackerRegistration} beans. The registration is the one place each operation's identity (type,
- * packing strategy, item prototype) is composed; the scheduler itself never names an operation type
- * beyond the keys in its registry.
+ * Cross-cutting wiring (jobs-service client) plus the per-operation-type registrations on the
+ * {@link SchedulerRunner}. The {@link #registerOperations()} method is the one place each
+ * operation's identity (type and packing strategy) is composed; the scheduler itself never names
+ * an operation type beyond the keys in its registry.
  */
 @Configuration
 public class SchedulerConfig {
@@ -25,6 +26,14 @@ public class SchedulerConfig {
   @Value("${optimizer.scheduler.cluster-id}")
   private String clusterId;
 
+  @Value("${optimizer.scheduler.ofd.max-files-per-bin}")
+  private long ofdMaxFilesPerBin;
+
+  @Value("${optimizer.scheduler.ofd.max-tables-per-bin}")
+  private int ofdMaxTablesPerBin;
+
+  @Autowired private SchedulerRunner schedulerRunner;
+
   @Bean
   public WebClient jobsWebClient() {
     return WebClient.builder().baseUrl(jobsBaseUri).build();
@@ -36,18 +45,13 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
   }
 
   /**
-   * Orphan files deletion: a {@link FirstFitBinPacker} over {@link TotalFilesBinItem}. Cost scales
-   * with file count — per-file list, manifest joins, and delete calls dominate independent of file
-   * size.
+   * Orphan files deletion: a {@link TotalFilesFirstFitBinPacker}. Cost scales with file count —
+   * per-file list, manifest joins, and delete calls dominate independent of file size.
    */
-  @Bean
-  public BinPackerRegistration ofdRegistration(
-      @Value("${optimizer.scheduler.ofd.max-files-per-bin}") long maxFilesPerBin,
-      @Value("${optimizer.scheduler.ofd.max-tables-per-bin}") int maxTablesPerBin) {
-    return new BinPackerRegistration(
+  @PostConstruct
+  public void registerOperations() {
+    schedulerRunner.registerOperation(
         OperationTypeDto.ORPHAN_FILES_DELETION,
-        new FirstFitBinPacker(
-            OperationTypeDto.ORPHAN_FILES_DELETION, maxFilesPerBin, maxTablesPerBin),
-        new TotalFilesBinItem());
+        new TotalFilesFirstFitBinPacker(ofdMaxFilesPerBin, ofdMaxTablesPerBin));
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 35ad08871..ffaedfaa4 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -17,8 +17,7 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesFirstFitBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.time.Instant;
 import java.util.List;
@@ -47,14 +46,10 @@ class SchedulerRunnerTest {
 
   @BeforeEach
   void setUp() {
-    // A real packer + real prototype — the runner exercises the full pipeline against actual
-    // bucketing + projection logic, while the IO is mocked.
-    BinPackerRegistration ofdReg =
-        new BinPackerRegistration(
-            OFD, new FirstFitBinPacker(OFD, 1_000_000L, 50), new TotalFilesBinItem());
-    runner =
-        new SchedulerRunner(
-            operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT, List.of(ofdReg));
+    // A real packer — the runner exercises the full pipeline against actual bucketing and the
+    // packer's projection logic, while the IO is mocked.
+    runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT);
+    runner.registerOperation(OFD, new TotalFilesFirstFitBinPacker(1_000_000L, 50));
   }
 
   // ---- Stubbing helpers ----
@@ -113,7 +108,7 @@ private TableStatsRow statsRow(String uuid, long numCurrentFiles) {
   @Test
   void schedule_unknownOperationType_throws() {
     SchedulerRunner empty =
-        new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT, List.of());
+        new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT);
 
     assertThatThrownBy(() -> empty.schedule(OFD))
         .isInstanceOf(IllegalStateException.class)
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
index ad4aa313c..7b28748fd 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
@@ -2,118 +2,144 @@
 
 import static org.assertj.core.api.Assertions.assertThat;
 
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
 import java.util.List;
+import java.util.Map;
 import java.util.stream.Collectors;
 import lombok.AllArgsConstructor;
 import lombok.Getter;
 import org.junit.jupiter.api.Test;
 
+/**
+ * Tests the {@link FirstFitBinPacker} bucketing logic in isolation via a test-only subclass that
+ * projects to {@link TestItem}s with caller-controlled weights. Per-subtype projection logic (e.g.
+ * {@link TotalFilesFirstFitBinPacker}) is covered by its own test.
+ */
 class FirstFitBinPackerTest {
 
-  private static final OperationTypeDto TYPE = OperationTypeDto.ORPHAN_FILES_DELETION;
-
   @AllArgsConstructor
   @Getter
   static class TestItem implements BinItem {
-    private final String id;
+    private final String operationId;
     private final long weight;
 
     @Override
     public String getFullyQualifiedTableName() {
-      return "db.tbl_" + id;
+      return "db.tbl_" + operationId;
     }
+  }
 
-    @Override
-    public String getOperationId() {
-      return "op-" + id;
+  /** Reads the weight from a single-entry tableProperties map keyed by {@code "weight"}. */
+  static class TestBinPacker extends FirstFitBinPacker<TestItem> {
+    TestBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
+      super(maxWeightPerBin, maxItemsPerBin);
     }
 
     @Override
-    public BinItem withOpAndStats(TableOperationDto op, TableStatsDto stats) {
-      throw new UnsupportedOperationException("test items are not used as prototypes");
+    protected TestItem create(TableOperationDto operation, TableStatsDto stats) {
+      long weight = Long.parseLong(stats.getTableProperties().get("weight"));
+      return new TestItem(operation.getId(), weight);
     }
   }
 
-  private static TestItem item(String id, long weight) {
-    return new TestItem(id, weight);
+  private static TableOperationDto op(String id) {
+    return TableOperationDto.builder().id(id).tableUuid(id).build();
+  }
+
+  private static TableStatsDto statsWithWeight(String uuid, long weight) {
+    return TableStatsDto.builder()
+        .tableUuid(uuid)
+        .tableProperties(Map.of("weight", Long.toString(weight)))
+        .build();
+  }
+
+  private static List<TableOperationDto> opsList(String... ids) {
+    return java.util.Arrays.stream(ids).map(FirstFitBinPackerTest::op).collect(Collectors.toList());
+  }
+
+  private static Map<String, TableStatsDto> statsMap(Object... uuidWeightPairs) {
+    Map<String, TableStatsDto> map = new java.util.HashMap<>();
+    for (int i = 0; i < uuidWeightPairs.length; i += 2) {
+      String uuid = (String) uuidWeightPairs[i];
+      long weight = (long) uuidWeightPairs[i + 1];
+      map.put(uuid, statsWithWeight(uuid, weight));
+    }
+    return map;
   }
 
   @Test
-  void emptyInput_returnsEmptyBins() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 100L, 10);
-    assertThat(packer.pack(List.of())).isEmpty();
+  void emptyInput_returnsEmptyGroupings() {
+    TestBinPacker packer = new TestBinPacker(100L, 10);
+    assertThat(packer.pack(List.of(), Map.of())).isEmpty();
   }
 
   @Test
-  void singleItem_oneBin() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
-    List<Bin> bins = packer.pack(List.of(item("a", 100L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getItems()).hasSize(1);
-    assertThat(bins.get(0).getOperationType()).isEqualTo(TYPE);
+  void singleItem_oneGrouping() {
+    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
+    List<List<BinItem>> groupings = packer.pack(opsList("a"), statsMap("a", 100L));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
-  void underWeightLimit_oneBin() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
-    List<Bin> bins =
-        packer.pack(List.of(item("a", 300_000L), item("b", 300_000L), item("c", 300_000L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getItems()).hasSize(3);
+  void underWeightLimit_oneGrouping() {
+    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
+    List<List<BinItem>> groupings =
+        packer.pack(opsList("a", "b", "c"), statsMap("a", 300_000L, "b", 300_000L, "c", 300_000L));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(3);
   }
 
   @Test
-  void overWeightLimit_twoBins() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 10);
-    List<Bin> bins =
-        packer.pack(List.of(item("a", 600_000L), item("b", 600_000L), item("c", 400_000L)));
-    assertThat(bins).hasSize(2);
+  void overWeightLimit_twoGroupings() {
+    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
+    List<List<BinItem>> groupings =
+        packer.pack(opsList("a", "b", "c"), statsMap("a", 600_000L, "b", 600_000L, "c", 400_000L));
+    assertThat(groupings).hasSize(2);
     // FFD: sort desc → 600, 600, 400. Place 600 → bin0; next 600 doesn't fit bin0, → bin1; 400
     // fits bin0 (total 1_000_000).
-    long b0 = bins.get(0).getItems().stream().mapToLong(BinItem::getWeight).sum();
-    long b1 = bins.get(1).getItems().stream().mapToLong(BinItem::getWeight).sum();
+    long b0 = groupings.get(0).stream().mapToLong(BinItem::getWeight).sum();
+    long b1 = groupings.get(1).stream().mapToLong(BinItem::getWeight).sum();
     assertThat(b0).isEqualTo(1_000_000L);
     assertThat(b1).isEqualTo(600_000L);
   }
 
   @Test
-  void itemLargerThanCap_getsOwnBin() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000L, 10);
-    List<Bin> bins = packer.pack(List.of(item("big", 5_000L)));
-    assertThat(bins).hasSize(1);
-    assertThat(bins.get(0).getItems()).hasSize(1);
+  void itemLargerThanCap_getsOwnGrouping() {
+    TestBinPacker packer = new TestBinPacker(1_000L, 10);
+    List<List<BinItem>> groupings = packer.pack(opsList("big"), statsMap("big", 5_000L));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
   void sortedDescending_largestFirst() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 2_000_000L, 10);
-    List<Bin> bins = packer.pack(List.of(item("small", 100L), item("large", 900_000L)));
-    assertThat(bins).hasSize(1);
+    TestBinPacker packer = new TestBinPacker(2_000_000L, 10);
+    List<List<BinItem>> groupings =
+        packer.pack(opsList("small", "large"), statsMap("small", 100L, "large", 900_000L));
+    assertThat(groupings).hasSize(1);
     List<String> ids =
-        bins.get(0).getItems().stream()
-            .map(TestItem.class::cast)
-            .map(TestItem::getId)
-            .collect(Collectors.toList());
+        groupings.get(0).stream().map(BinItem::getOperationId).collect(Collectors.toList());
     assertThat(ids).containsExactly("large", "small");
   }
 
   @Test
-  void maxItemsCap_splitsBins() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 1_000_000L, 2);
-    List<Bin> bins =
-        packer.pack(List.of(item("a", 1L), item("b", 1L), item("c", 1L), item("d", 1L)));
-    assertThat(bins).hasSize(2);
-    assertThat(bins.get(0).getItems()).hasSize(2);
-    assertThat(bins.get(1).getItems()).hasSize(2);
+  void maxItemsCap_splitsGroupings() {
+    TestBinPacker packer = new TestBinPacker(1_000_000L, 2);
+    List<List<BinItem>> groupings =
+        packer.pack(opsList("a", "b", "c", "d"), statsMap("a", 1L, "b", 1L, "c", 1L, "d", 1L));
+    assertThat(groupings).hasSize(2);
+    assertThat(groupings.get(0)).hasSize(2);
+    assertThat(groupings.get(1)).hasSize(2);
   }
 
   @Test
-  void binsCarryConfiguredOperationType() {
-    FirstFitBinPacker packer = new FirstFitBinPacker(TYPE, 100L, 10);
-    List<Bin> bins = packer.pack(List.of(item("a", 1L)));
-    assertThat(bins.get(0).getOperationType()).isEqualTo(TYPE);
+  void operationsWithoutStats_dropped() {
+    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
+    List<List<BinItem>> groupings = packer.pack(opsList("a", "missing"), statsMap("a", 100L));
+    assertThat(groupings).hasSize(1);
+    assertThat(groupings.get(0)).hasSize(1);
+    assertThat(groupings.get(0).get(0).getOperationId()).isEqualTo("a");
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
index 3d1cb802c..c7171f0a8 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
@@ -5,9 +5,16 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.List;
+import java.util.Map;
 import java.util.UUID;
 import org.junit.jupiter.api.Test;
 
+/**
+ * Covers the projection that {@link TotalFilesFirstFitBinPacker} applies when constructing {@link
+ * TotalFilesBinItem}s — fully-qualified name, operation id, and weight derived from the snapshot's
+ * current file count, with the null-safety chain that handles missing snapshot fields.
+ */
 class TotalFilesBinItemTest {
 
   private static TableOperationDto op() {
@@ -20,51 +27,50 @@ private static TableOperationDto op() {
         .build();
   }
 
-  private static TableStatsDto statsWithFiles(Long fileCount) {
+  private static TableStatsDto statsWithFiles(String uuid, Long fileCount) {
     return TableStatsDto.builder()
+        .tableUuid(uuid)
         .snapshot(TableStatsDto.SnapshotMetrics.builder().numCurrentFiles(fileCount).build())
         .build();
   }
 
-  @Test
-  void withOpAndStats_buildsFullyQualifiedNameAndOperationId() {
-    TableOperationDto op = op();
-    BinItem item = new TotalFilesBinItem().withOpAndStats(op, statsWithFiles(42L));
-
-    assertThat(item.getFullyQualifiedTableName()).isEqualTo("db1.tbl1");
-    assertThat(item.getOperationId()).isEqualTo(op.getId());
+  private static List<BinItem> pack(TableOperationDto op, TableStatsDto stats) {
+    TotalFilesFirstFitBinPacker packer =
+        new TotalFilesFirstFitBinPacker(Long.MAX_VALUE, Integer.MAX_VALUE);
+    List<List<BinItem>> groupings = packer.pack(List.of(op), Map.of(op.getTableUuid(), stats));
+    assertThat(groupings).hasSize(1);
+    return groupings.get(0);
   }
 
   @Test
-  void withOpAndStats_weightIsCurrentFileCount() {
-    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), statsWithFiles(123_456L));
-    assertThat(item.getWeight()).isEqualTo(123_456L);
-  }
+  void projectionBuildsFullyQualifiedNameAndOperationId() {
+    TableOperationDto op = op();
+    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), 42L));
 
-  @Test
-  void withOpAndStats_nullStats_weightIsZero() {
-    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), null);
-    assertThat(item.getWeight()).isEqualTo(0L);
+    assertThat(items).hasSize(1);
+    assertThat(items.get(0).getFullyQualifiedTableName()).isEqualTo("db1.tbl1");
+    assertThat(items.get(0).getOperationId()).isEqualTo(op.getId());
   }
 
   @Test
-  void withOpAndStats_nullSnapshot_weightIsZero() {
-    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), TableStatsDto.builder().build());
-    assertThat(item.getWeight()).isEqualTo(0L);
+  void weightIsCurrentFileCount() {
+    TableOperationDto op = op();
+    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), 123_456L));
+    assertThat(items.get(0).getWeight()).isEqualTo(123_456L);
   }
 
   @Test
-  void withOpAndStats_nullFileCount_weightIsZero() {
-    BinItem item = new TotalFilesBinItem().withOpAndStats(op(), statsWithFiles(null));
-    assertThat(item.getWeight()).isEqualTo(0L);
+  void nullSnapshotFields_weightIsZero() {
+    TableOperationDto op = op();
+    TableStatsDto emptySnapshot = TableStatsDto.builder().tableUuid(op.getTableUuid()).build();
+    List<BinItem> items = pack(op, emptySnapshot);
+    assertThat(items.get(0).getWeight()).isEqualTo(0L);
   }
 
   @Test
-  void seatPrototype_doesNotShareStateWithPopulated() {
-    TotalFilesBinItem seat = new TotalFilesBinItem();
-    BinItem populated = seat.withOpAndStats(op(), statsWithFiles(7L));
-
-    assertThat(seat.getWeight()).isEqualTo(0L);
-    assertThat(populated.getWeight()).isEqualTo(7L);
+  void nullFileCount_weightIsZero() {
+    TableOperationDto op = op();
+    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), null));
+    assertThat(items.get(0).getWeight()).isEqualTo(0L);
   }
 }

From 396f70e9f35955b6bba53a99da0ce416b815cacd Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 11:33:05 -0700
Subject: [PATCH 09/13] refactor(scheduler): FirstFitBinPacker<T> concrete +
 Supplier<T> seat factory

Drops the abstract create(op, stats) method and the named subclass
(TotalFilesFirstFitBinPacker) per PR feedback. The packer now takes a
Supplier<T> (typically MyItem::new); pack invokes it once per operation
to get a seat, then calls fromOpAndStats(op, stats) on the seat to
project the pair into a populated item.

- BinItem regains fromOpAndStats(op, stats); contract documents the seat pattern
- FirstFitBinPacker<T extends BinItem> is now concrete; takes Supplier<T> + caps; pack is no longer final
- TotalFilesBinItem regains its no-arg seat ctor; all-args ctor is now private; fromOpAndStats returns a populated copy
- Delete TotalFilesFirstFitBinPacker
- SchedulerConfig registers new FirstFitBinPacker<>(TotalFilesBinItem::new, max, maxItems)
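
For reviewers, the seat-then-populate flow described above can be sketched
roughly as follows. This is a minimal illustration, not the code in this
patch: `SeatPatternSketch`, `Op`, `Item`, `FileCountItem`, and `project` are
hypothetical stand-ins for the real DTOs, BinItem, TotalFilesBinItem, and the
projection step inside pack.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SeatPatternSketch {

  /** Hypothetical stand-in for the (operation, stats) pair; not the real DTOs. */
  static final class Op {
    final String id;
    final long fileCount;

    Op(String id, long fileCount) {
      this.id = id;
      this.fileCount = fileCount;
    }
  }

  /** Same shape as the BinItem contract: getters plus a projection called on a seat. */
  interface Item {
    String getOperationId();

    long getWeight();

    Item fromOp(Op op);
  }

  static final class FileCountItem implements Item {
    private final String operationId;
    private final long weight;

    /** Seat constructor: placeholder values only; getters on a seat are not meaningful. */
    FileCountItem() {
      this("", 0L);
    }

    private FileCountItem(String operationId, long weight) {
      this.operationId = operationId;
      this.weight = weight;
    }

    @Override
    public String getOperationId() {
      return operationId;
    }

    @Override
    public long getWeight() {
      return weight;
    }

    /** Returns a new populated instance; the seat itself is left untouched. */
    @Override
    public Item fromOp(Op op) {
      return new FileCountItem(op.id, op.fileCount);
    }
  }

  /** Asks the factory for a fresh seat per operation, then projects through it. */
  static <T extends Item> List<Item> project(Supplier<T> seatFactory, List<Op> ops) {
    List<Item> items = new ArrayList<>();
    for (Op op : ops) {
      items.add(seatFactory.get().fromOp(op));
    }
    return items;
  }

  public static void main(String[] args) {
    List<Item> items =
        project(FileCountItem::new, List.of(new Op("op-1", 42L), new Op("op-2", 7L)));
    for (Item item : items) {
      System.out.println(item.getOperationId() + " weight=" + item.getWeight());
    }
  }
}
```

The trade-off in this shape is that every item type needs a constructor that
yields a placeholder instance, in exchange for the packer staying a single
concrete class with no per-item subclass.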

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../optimizer/scheduler/binpack/BinItem.java  | 15 +++-
 .../scheduler/binpack/FirstFitBinPacker.java  | 40 +++++------
 .../scheduler/binpack/TotalFilesBinItem.java  | 41 +++++++++--
 .../binpack/TotalFilesFirstFitBinPacker.java  | 33 ---------
 .../scheduler/config/SchedulerConfig.java     | 10 +--
 .../scheduler/SchedulerRunnerTest.java        |  5 +-
 .../binpack/FirstFitBinPackerTest.java        | 68 ++++++++++---------
 .../binpack/TotalFilesBinItemTest.java        | 62 ++++++++---------
 8 files changed, 140 insertions(+), 134 deletions(-)
 delete mode 100644 services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
index 25c9ee68a..b4016e386 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -1,10 +1,17 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+
 /**
  * One packable unit. Exposes the weight a packer keys on, plus the identity the scheduler reads
- * when it launches a Spark job (fully-qualified table name, operation id). Implementations are
- * immutable data — projection from {@code (operation, stats)} to a concrete {@link BinItem} subtype
- * is the bin packer's responsibility.
+ * when it launches a Spark job (fully-qualified table name, operation id).
+ *
+ * <p>Implementations have a public no-arg "seat" constructor — instantiated transiently inside
+ * {@link FirstFitBinPacker#pack} via a {@code Supplier<T extends BinItem>} (typically a {@code
+ * MyItem::new} method reference) — on which {@link #fromOpAndStats} is called to return the
+ * populated item. Getters on a seat are not meaningful; the seat exists for the lifetime of a
+ * single projection call.
  */
 public interface BinItem {
   long getWeight();
@@ -12,4 +19,6 @@ public interface BinItem {
   String getFullyQualifiedTableName();
 
   String getOperationId();
+
+  BinItem fromOpAndStats(TableOperationDto op, TableStatsDto stats);
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
index f5d1d0c69..b541f4c00 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
@@ -6,46 +6,46 @@
 import java.util.Comparator;
 import java.util.List;
 import java.util.Map;
+import java.util.function.Supplier;
 import java.util.stream.Collectors;
 import lombok.extern.slf4j.Slf4j;
 
 /**
- * First-fit-decreasing packing, abstract over the concrete {@link BinItem} subtype {@code T}. The
- * subclass tells the packer how to construct {@code T} from a {@code (operation, stats)} pair via
- * {@link #create}; the base class handles the bucketing: sort by weight descending, then place each
- * item into the first group whose totals stay at or below {@code maxWeightPerBin} and {@code
- * maxItemsPerBin}. An item whose weight exceeds the cap on its own goes into a group by itself.
- * Operations whose {@code tableUuid} has no entry in the stats map are dropped.
+ * First-fit-decreasing packing, generic over the concrete {@link BinItem} subtype {@code T}.
+ * Construction takes a {@code Supplier<T>} — typically a {@code MyItem::new} method reference —
+ * which the packer invokes per operation to get a seat, then calls {@link BinItem#fromOpAndStats}
+ * on the seat to project the (operation, stats) pair into a populated item.
  *
- * <p>Stateless: the constructor takes only immutable cap configuration; {@link #pack} is a pure
- * function over its arguments. The packer is operation-agnostic — the scheduler wraps each grouping
- * into a {@link Bin} with the registered operation type.
+ * <p>Sorts items by weight descending, then places each into the first group whose totals stay at
+ * or below {@code maxWeightPerBin} and {@code maxItemsPerBin}. An item whose weight exceeds the cap
+ * on its own goes into a group by itself. Operations whose {@code tableUuid} has no entry in {@code
+ * statsByTableUuid} are dropped.
+ *
+ * <p>Stateless: the constructor takes only the seat factory and the cap configuration; {@link
+ * #pack} is a pure function over its arguments. The packer is operation-agnostic — the scheduler
+ * wraps each grouping into a {@link Bin} with the registered operation type.
  */
 @Slf4j
-public abstract class FirstFitBinPacker<T extends BinItem> implements BinPacker {
+public class FirstFitBinPacker<T extends BinItem> implements BinPacker {
 
+  private final Supplier<T> seatFactory;
   private final long maxWeightPerBin;
   private final int maxItemsPerBin;
 
-  protected FirstFitBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
+  public FirstFitBinPacker(Supplier<T> seatFactory, long maxWeightPerBin, int maxItemsPerBin) {
+    this.seatFactory = seatFactory;
     this.maxWeightPerBin = maxWeightPerBin;
     this.maxItemsPerBin = maxItemsPerBin;
   }
 
-  /**
-   * Construct one {@code T} for a single operation. Called by {@link #pack} for every operation
-   * whose stats are available; implementations encode the projection from {@code (op, stats)} to
-   * the concrete {@link BinItem} subtype.
-   */
-  protected abstract T create(TableOperationDto operation, TableStatsDto stats);
-
   @Override
-  public final List<List<BinItem>> pack(
+  public List<List<BinItem>> pack(
       List<TableOperationDto> operations, Map<String, TableStatsDto> statsByTableUuid) {
     List<BinItem> items =
         operations.stream()
             .filter(op -> statsByTableUuid.containsKey(op.getTableUuid()))
-            .map(op -> (BinItem) create(op, statsByTableUuid.get(op.getTableUuid())))
+            .map(
+                op -> seatFactory.get().fromOpAndStats(op, statsByTableUuid.get(op.getTableUuid())))
             .collect(Collectors.toList());
     List<PackingBin> packingBins =
         items.stream()
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
index 16c9a15d5..06334e21a 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
@@ -1,20 +1,49 @@
 package com.linkedin.openhouse.optimizer.scheduler.binpack;
 
-import lombok.AllArgsConstructor;
+import com.linkedin.openhouse.optimizer.model.TableOperationDto;
+import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.Optional;
 import lombok.Getter;
 import lombok.ToString;
 
 /**
- * {@link BinItem} that weights by the table's current file count. Immutable data; constructed by
- * the {@link TotalFilesFirstFitBinPacker}. The implementation knows nothing about which operation
- * type the surrounding packer was registered against — it just carries the fields the scheduler
- * needs to launch the job.
+ * {@link BinItem} that weights by the table's current file count. Suitable for any operation whose
+ * Spark cost scales with file count — orphan files deletion, stats collection, etc. The
+ * implementation knows nothing about which operation type it is wired up to.
+ *
+ * <p>Construction: callers pass {@code TotalFilesBinItem::new} as the {@code Supplier<T>} to {@link
+ * FirstFitBinPacker}; the packer calls the supplier per operation to get a seat, then {@link
+ * #fromOpAndStats} on the seat to get a populated copy.
  */
-@AllArgsConstructor
 @Getter
 @ToString
 public class TotalFilesBinItem implements BinItem {
+
   private final String fullyQualifiedTableName;
   private final String operationId;
   private final long weight;
+
+  /** Seat constructor: call {@link #fromOpAndStats} on the result to get a populated instance. */
+  public TotalFilesBinItem() {
+    this("", "", 0L);
+  }
+
+  private TotalFilesBinItem(String fullyQualifiedTableName, String operationId, long weight) {
+    this.fullyQualifiedTableName = fullyQualifiedTableName;
+    this.operationId = operationId;
+    this.weight = weight;
+  }
+
+  @Override
+  public BinItem fromOpAndStats(TableOperationDto op, TableStatsDto stats) {
+    return new TotalFilesBinItem(
+        op.getDatabaseName() + "." + op.getTableName(), op.getId(), currentFileCount(stats));
+  }
+
+  private static long currentFileCount(TableStatsDto stats) {
+    return Optional.ofNullable(stats)
+        .map(TableStatsDto::getSnapshot)
+        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
+        .orElse(0L);
+  }
 }
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java
deleted file mode 100644
index 89d43c5ac..000000000
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesFirstFitBinPacker.java
+++ /dev/null
@@ -1,33 +0,0 @@
-package com.linkedin.openhouse.optimizer.scheduler.binpack;
-
-import com.linkedin.openhouse.optimizer.model.TableOperationDto;
-import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.Optional;
-
-/**
- * First-fit-decreasing packing keyed on the table's current file count. Suitable for any operation
- * whose Spark cost scales with file count — orphan files deletion, stats collection, etc. The
- * packer knows nothing about which operation type it was registered against; the choice of weight
- * (current file count) is the only operation-shape assumption it encodes.
- */
-public class TotalFilesFirstFitBinPacker extends FirstFitBinPacker<TotalFilesBinItem> {
-
-  public TotalFilesFirstFitBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
-    super(maxWeightPerBin, maxItemsPerBin);
-  }
-
-  @Override
-  protected TotalFilesBinItem create(TableOperationDto operation, TableStatsDto stats) {
-    return new TotalFilesBinItem(
-        operation.getDatabaseName() + "." + operation.getTableName(),
-        operation.getId(),
-        currentFileCount(stats));
-  }
-
-  private static long currentFileCount(TableStatsDto stats) {
-    return Optional.ofNullable(stats)
-        .map(TableStatsDto::getSnapshot)
-        .map(TableStatsDto.SnapshotMetrics::getNumCurrentFiles)
-        .orElse(0L);
-  }
-}
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index 2c75dacf6..a06dfbe5d 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -2,7 +2,8 @@
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesFirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import javax.annotation.PostConstruct;
 import org.springframework.beans.factory.annotation.Autowired;
@@ -45,13 +46,14 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
   }
 
   /**
-   * Orphan files deletion: a {@link TotalFilesFirstFitBinPacker}. Cost scales with file count —
-   * per-file list, manifest joins, and delete calls dominate independent of file size.
+   * Orphan files deletion: a {@link FirstFitBinPacker} over {@link TotalFilesBinItem}. Cost scales
+   * with file count — per-file list, manifest joins, and delete calls dominate independent of file
+   * size.
    */
   @PostConstruct
   public void registerOperations() {
     schedulerRunner.registerOperation(
         OperationTypeDto.ORPHAN_FILES_DELETION,
-        new TotalFilesFirstFitBinPacker(ofdMaxFilesPerBin, ofdMaxTablesPerBin));
+        new FirstFitBinPacker<>(TotalFilesBinItem::new, ofdMaxFilesPerBin, ofdMaxTablesPerBin));
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index ffaedfaa4..82a358014 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -17,7 +17,8 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
-import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesFirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
+import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.time.Instant;
 import java.util.List;
@@ -49,7 +50,7 @@ void setUp() {
     // A real packer — the runner exercises the full pipeline against actual bucketing and the
     // packer's projection logic, while the IO is mocked.
     runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT);
-    runner.registerOperation(OFD, new TotalFilesFirstFitBinPacker(1_000_000L, 50));
+    runner.registerOperation(OFD, new FirstFitBinPacker<>(TotalFilesBinItem::new, 1_000_000L, 50));
   }
 
   // ---- Stubbing helpers ----
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
index 7b28748fd..b6644ece9 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
@@ -4,42 +4,44 @@
 
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
+import java.util.HashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;
-import lombok.AllArgsConstructor;
 import lombok.Getter;
 import org.junit.jupiter.api.Test;
 
 /**
- * Tests the {@link FirstFitBinPacker} bucketing logic in isolation via a test-only subclass that
- * projects to {@link TestItem}s with caller-controlled weights. Per-subtype projection logic (e.g.
- * {@link TotalFilesFirstFitBinPacker}) is covered by its own test.
+ * Tests the {@link FirstFitBinPacker} bucketing logic in isolation via a {@link TestItem} whose
+ * weight comes from a {@code "weight"} entry in {@code tableProperties}. The seat-then-populate
+ * pattern is exercised end-to-end through the public {@code pack} entry point. Projection logic for
+ * production BinItems (e.g. {@link TotalFilesBinItem}) is covered by their own tests.
  */
 class FirstFitBinPackerTest {
 
-  @AllArgsConstructor
   @Getter
   static class TestItem implements BinItem {
     private final String operationId;
     private final long weight;
 
+    public TestItem() {
+      this("", 0L);
+    }
+
+    private TestItem(String operationId, long weight) {
+      this.operationId = operationId;
+      this.weight = weight;
+    }
+
     @Override
     public String getFullyQualifiedTableName() {
       return "db.tbl_" + operationId;
     }
-  }
-
-  /** Reads the weight from a single-entry tableProperties map keyed by {@code "weight"}. */
-  static class TestBinPacker extends FirstFitBinPacker<TestItem> {
-    TestBinPacker(long maxWeightPerBin, int maxItemsPerBin) {
-      super(maxWeightPerBin, maxItemsPerBin);
-    }
 
     @Override
-    protected TestItem create(TableOperationDto operation, TableStatsDto stats) {
-      long weight = Long.parseLong(stats.getTableProperties().get("weight"));
-      return new TestItem(operation.getId(), weight);
+    public BinItem fromOpAndStats(TableOperationDto op, TableStatsDto stats) {
+      long w = Long.parseLong(stats.getTableProperties().get("weight"));
+      return new TestItem(op.getId(), w);
     }
   }
 
@@ -59,7 +61,7 @@ private static List<TableOperationDto> opsList(String... ids) {
   }
 
   private static Map<String, TableStatsDto> statsMap(Object... uuidWeightPairs) {
-    Map<String, TableStatsDto> map = new java.util.HashMap<>();
+    Map<String, TableStatsDto> map = new HashMap<>();
     for (int i = 0; i < uuidWeightPairs.length; i += 2) {
       String uuid = (String) uuidWeightPairs[i];
       long weight = (long) uuidWeightPairs[i + 1];
@@ -68,34 +70,36 @@ private static Map<String, TableStatsDto> statsMap(Object... uuidWeightPairs) {
     return map;
   }
 
+  private static FirstFitBinPacker<TestItem> packer(long maxWeight, int maxItems) {
+    return new FirstFitBinPacker<>(TestItem::new, maxWeight, maxItems);
+  }
+
   @Test
   void emptyInput_returnsEmptyGroupings() {
-    TestBinPacker packer = new TestBinPacker(100L, 10);
-    assertThat(packer.pack(List.of(), Map.of())).isEmpty();
+    assertThat(packer(100L, 10).pack(List.of(), Map.of())).isEmpty();
   }
 
   @Test
   void singleItem_oneGrouping() {
-    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
-    List<List<BinItem>> groupings = packer.pack(opsList("a"), statsMap("a", 100L));
+    List<List<BinItem>> groupings = packer(1_000_000L, 10).pack(opsList("a"), statsMap("a", 100L));
     assertThat(groupings).hasSize(1);
     assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
   void underWeightLimit_oneGrouping() {
-    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
     List<List<BinItem>> groupings =
-        packer.pack(opsList("a", "b", "c"), statsMap("a", 300_000L, "b", 300_000L, "c", 300_000L));
+        packer(1_000_000L, 10)
+            .pack(opsList("a", "b", "c"), statsMap("a", 300_000L, "b", 300_000L, "c", 300_000L));
     assertThat(groupings).hasSize(1);
     assertThat(groupings.get(0)).hasSize(3);
   }
 
   @Test
   void overWeightLimit_twoGroupings() {
-    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
     List<List<BinItem>> groupings =
-        packer.pack(opsList("a", "b", "c"), statsMap("a", 600_000L, "b", 600_000L, "c", 400_000L));
+        packer(1_000_000L, 10)
+            .pack(opsList("a", "b", "c"), statsMap("a", 600_000L, "b", 600_000L, "c", 400_000L));
     assertThat(groupings).hasSize(2);
     // FFD: sort desc → 600, 600, 400. Place 600 → bin0; next 600 doesn't fit bin0, → bin1; 400
     // fits bin0 (total 1_000_000).
@@ -107,17 +111,17 @@ void overWeightLimit_twoGroupings() {
 
   @Test
   void itemLargerThanCap_getsOwnGrouping() {
-    TestBinPacker packer = new TestBinPacker(1_000L, 10);
-    List<List<BinItem>> groupings = packer.pack(opsList("big"), statsMap("big", 5_000L));
+    List<List<BinItem>> groupings =
+        packer(1_000L, 10).pack(opsList("big"), statsMap("big", 5_000L));
     assertThat(groupings).hasSize(1);
     assertThat(groupings.get(0)).hasSize(1);
   }
 
   @Test
   void sortedDescending_largestFirst() {
-    TestBinPacker packer = new TestBinPacker(2_000_000L, 10);
     List<List<BinItem>> groupings =
-        packer.pack(opsList("small", "large"), statsMap("small", 100L, "large", 900_000L));
+        packer(2_000_000L, 10)
+            .pack(opsList("small", "large"), statsMap("small", 100L, "large", 900_000L));
     assertThat(groupings).hasSize(1);
     List<String> ids =
         groupings.get(0).stream().map(BinItem::getOperationId).collect(Collectors.toList());
@@ -126,9 +130,9 @@ void sortedDescending_largestFirst() {
 
   @Test
   void maxItemsCap_splitsGroupings() {
-    TestBinPacker packer = new TestBinPacker(1_000_000L, 2);
     List<List<BinItem>> groupings =
-        packer.pack(opsList("a", "b", "c", "d"), statsMap("a", 1L, "b", 1L, "c", 1L, "d", 1L));
+        packer(1_000_000L, 2)
+            .pack(opsList("a", "b", "c", "d"), statsMap("a", 1L, "b", 1L, "c", 1L, "d", 1L));
     assertThat(groupings).hasSize(2);
     assertThat(groupings.get(0)).hasSize(2);
     assertThat(groupings.get(1)).hasSize(2);
@@ -136,8 +140,8 @@ void maxItemsCap_splitsGroupings() {
 
   @Test
   void operationsWithoutStats_dropped() {
-    TestBinPacker packer = new TestBinPacker(1_000_000L, 10);
-    List<List<BinItem>> groupings = packer.pack(opsList("a", "missing"), statsMap("a", 100L));
+    List<List<BinItem>> groupings =
+        packer(1_000_000L, 10).pack(opsList("a", "missing"), statsMap("a", 100L));
     assertThat(groupings).hasSize(1);
     assertThat(groupings.get(0)).hasSize(1);
     assertThat(groupings.get(0).get(0).getOperationId()).isEqualTo("a");
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
index c7171f0a8..81adbde22 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
@@ -5,16 +5,9 @@
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.model.TableOperationDto;
 import com.linkedin.openhouse.optimizer.model.TableStatsDto;
-import java.util.List;
-import java.util.Map;
 import java.util.UUID;
 import org.junit.jupiter.api.Test;
 
-/**
- * Covers the projection that {@link TotalFilesFirstFitBinPacker} applies when constructing {@link
- * TotalFilesBinItem}s — fully-qualified name, operation id, and weight derived from the snapshot's
- * current file count, with the null-safety chain that handles missing snapshot fields.
- */
 class TotalFilesBinItemTest {
 
   private static TableOperationDto op() {
@@ -27,50 +20,51 @@ private static TableOperationDto op() {
         .build();
   }
 
-  private static TableStatsDto statsWithFiles(String uuid, Long fileCount) {
+  private static TableStatsDto statsWithFiles(Long fileCount) {
     return TableStatsDto.builder()
-        .tableUuid(uuid)
         .snapshot(TableStatsDto.SnapshotMetrics.builder().numCurrentFiles(fileCount).build())
         .build();
   }
 
-  private static List<BinItem> pack(TableOperationDto op, TableStatsDto stats) {
-    TotalFilesFirstFitBinPacker packer =
-        new TotalFilesFirstFitBinPacker(Long.MAX_VALUE, Integer.MAX_VALUE);
-    List<List<BinItem>> groupings = packer.pack(List.of(op), Map.of(op.getTableUuid(), stats));
-    assertThat(groupings).hasSize(1);
-    return groupings.get(0);
+  @Test
+  void fromOpAndStats_buildsFullyQualifiedNameAndOperationId() {
+    TableOperationDto op = op();
+    BinItem item = new TotalFilesBinItem().fromOpAndStats(op, statsWithFiles(42L));
+
+    assertThat(item.getFullyQualifiedTableName()).isEqualTo("db1.tbl1");
+    assertThat(item.getOperationId()).isEqualTo(op.getId());
   }
 
   @Test
-  void projectionBuildsFullyQualifiedNameAndOperationId() {
-    TableOperationDto op = op();
-    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), 42L));
+  void fromOpAndStats_weightIsCurrentFileCount() {
+    BinItem item = new TotalFilesBinItem().fromOpAndStats(op(), statsWithFiles(123_456L));
+    assertThat(item.getWeight()).isEqualTo(123_456L);
+  }
 
-    assertThat(items).hasSize(1);
-    assertThat(items.get(0).getFullyQualifiedTableName()).isEqualTo("db1.tbl1");
-    assertThat(items.get(0).getOperationId()).isEqualTo(op.getId());
+  @Test
+  void fromOpAndStats_nullStats_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().fromOpAndStats(op(), null);
+    assertThat(item.getWeight()).isEqualTo(0L);
   }
 
   @Test
-  void weightIsCurrentFileCount() {
-    TableOperationDto op = op();
-    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), 123_456L));
-    assertThat(items.get(0).getWeight()).isEqualTo(123_456L);
+  void fromOpAndStats_nullSnapshot_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().fromOpAndStats(op(), TableStatsDto.builder().build());
+    assertThat(item.getWeight()).isEqualTo(0L);
   }
 
   @Test
-  void nullSnapshotFields_weightIsZero() {
-    TableOperationDto op = op();
-    TableStatsDto emptySnapshot = TableStatsDto.builder().tableUuid(op.getTableUuid()).build();
-    List<BinItem> items = pack(op, emptySnapshot);
-    assertThat(items.get(0).getWeight()).isEqualTo(0L);
+  void fromOpAndStats_nullFileCount_weightIsZero() {
+    BinItem item = new TotalFilesBinItem().fromOpAndStats(op(), statsWithFiles(null));
+    assertThat(item.getWeight()).isEqualTo(0L);
   }
 
   @Test
-  void nullFileCount_weightIsZero() {
-    TableOperationDto op = op();
-    List<BinItem> items = pack(op, statsWithFiles(op.getTableUuid(), null));
-    assertThat(items.get(0).getWeight()).isEqualTo(0L);
+  void seat_doesNotShareStateWithPopulated() {
+    TotalFilesBinItem seat = new TotalFilesBinItem();
+    BinItem populated = seat.fromOpAndStats(op(), statsWithFiles(7L));
+
+    assertThat(seat.getWeight()).isEqualTo(0L);
+    assertThat(populated.getWeight()).isEqualTo(7L);
   }
 }

From a75976b9738a22d258174fdb03e2912d9a688da2 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 13:14:51 -0700
Subject: [PATCH 10/13] style(scheduler): drop made-up 'seat' jargon; rename to
 binItemSupplier

'seat' is not an established Java idiom, so drop it. Rename the field to
binItemSupplier and rewrite the surrounding javadoc and the test name to
describe the empty, no-arg-constructed instance plainly.
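
For illustration, the supplier-then-fromOpAndStats pattern can be
sketched in isolation as below. This is a minimal sketch, not the code
in this patch: SupplierSketch, Item, FileCountItem and fromWeight are
hypothetical stand-ins (the real fromOpAndStats takes an operation and
its stats), and only the shape of the call is meant to carry over.

```java
import java.util.function.Supplier;

// Hypothetical, trimmed-down model of the pattern; none of these names exist in the module.
final class SupplierSketch {

  /** Stand-in for BinItem: fromWeight plays the role of fromOpAndStats. */
  interface Item {
    Item fromWeight(long weight);

    long getWeight();
  }

  /** Stand-in for TotalFilesBinItem: immutable, with a public-style no-arg constructor. */
  static final class FileCountItem implements Item {
    private final long weight;

    // Empty instance: exists only so that fromWeight can be called on it.
    FileCountItem() {
      this(0L);
    }

    private FileCountItem(long weight) {
      this.weight = weight;
    }

    @Override
    public Item fromWeight(long weight) {
      // Returns a fresh populated copy; the empty receiver is never mutated.
      return new FileCountItem(weight);
    }

    @Override
    public long getWeight() {
      return weight;
    }
  }

  /** What the packer does per operation: get an empty instance, then project onto it. */
  static <T extends Item> Item project(Supplier<T> binItemSupplier, long weight) {
    return binItemSupplier.get().fromWeight(weight);
  }
}
```

A caller would pass a constructor reference, for example
SupplierSketch.project(SupplierSketch.FileCountItem::new, 7L), which is
the same shape as handing TotalFilesBinItem::new to FirstFitBinPacker.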

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../optimizer/scheduler/binpack/BinItem.java    |  6 +++---
 .../scheduler/binpack/FirstFitBinPacker.java    | 17 ++++++++++-------
 .../scheduler/binpack/TotalFilesBinItem.java    |  6 +++---
 .../binpack/FirstFitBinPackerTest.java          |  7 ++++---
 .../binpack/TotalFilesBinItemTest.java          |  8 ++++----
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
index b4016e386..4dc9be00e 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/BinItem.java
@@ -7,10 +7,10 @@
  * One packable unit. Exposes the weight a packer keys on, plus the identity the scheduler reads
  * when it launches a Spark job (fully-qualified table name, operation id).
  *
- * <p>Implementations have a public no-arg "seat" constructor — instantiated transiently inside
- * {@link FirstFitBinPacker#pack} via a {@code Supplier<T extends BinItem>} (typically a {@code
+ * <p>Implementations have a public no-arg constructor — instantiated transiently inside {@link
+ * FirstFitBinPacker#pack} via a {@code Supplier<T extends BinItem>} (typically a {@code
  * MyItem::new} method reference) — on which {@link #fromOpAndStats} is called to return the
- * populated item. Getters on a seat are not meaningful; the seat exists for the lifetime of a
+ * populated item. Getters on the empty instance are not meaningful; it exists for the lifetime of a
  * single projection call.
  */
 public interface BinItem {
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
index b541f4c00..be94158f8 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPacker.java
@@ -13,27 +13,27 @@
 /**
  * First-fit-decreasing packing, generic over the concrete {@link BinItem} subtype {@code T}.
  * Construction takes a {@code Supplier<T>} — typically a {@code MyItem::new} method reference —
- * which the packer invokes per operation to get a seat, then calls {@link BinItem#fromOpAndStats}
- * on the seat to project the (operation, stats) pair into a populated item.
+ * which the packer invokes per operation to get an empty instance, then calls {@link
+ * BinItem#fromOpAndStats} on it to project the (operation, stats) pair into a populated item.
  *
  * <p>Sorts items by weight descending, then places each into the first group whose totals stay at
  * or below {@code maxWeightPerBin} and {@code maxItemsPerBin}. An item whose weight exceeds the cap
  * on its own goes into a group by itself. Operations whose {@code tableUuid} has no entry in {@code
  * statsByTableUuid} are dropped.
  *
- * <p>Stateless: the constructor takes only the seat factory and the cap configuration; {@link
+ * <p>Stateless: the constructor takes only the BinItem supplier and the cap configuration; {@link
  * #pack} is a pure function over its arguments. The packer is operation-agnostic — the scheduler
  * wraps each grouping into a {@link Bin} with the registered operation type.
  */
 @Slf4j
 public class FirstFitBinPacker<T extends BinItem> implements BinPacker {
 
-  private final Supplier<T> seatFactory;
+  private final Supplier<T> binItemSupplier;
   private final long maxWeightPerBin;
   private final int maxItemsPerBin;
 
-  public FirstFitBinPacker(Supplier<T> seatFactory, long maxWeightPerBin, int maxItemsPerBin) {
-    this.seatFactory = seatFactory;
+  public FirstFitBinPacker(Supplier<T> binItemSupplier, long maxWeightPerBin, int maxItemsPerBin) {
+    this.binItemSupplier = binItemSupplier;
     this.maxWeightPerBin = maxWeightPerBin;
     this.maxItemsPerBin = maxItemsPerBin;
   }
@@ -45,7 +45,10 @@ public List<List<BinItem>> pack(
         operations.stream()
             .filter(op -> statsByTableUuid.containsKey(op.getTableUuid()))
             .map(
-                op -> seatFactory.get().fromOpAndStats(op, statsByTableUuid.get(op.getTableUuid())))
+                op ->
+                    binItemSupplier
+                        .get()
+                        .fromOpAndStats(op, statsByTableUuid.get(op.getTableUuid())))
             .collect(Collectors.toList());
     List<PackingBin> packingBins =
         items.stream()
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
index 06334e21a..d9bdf135f 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItem.java
@@ -12,8 +12,8 @@
  * implementation knows nothing about which operation type it is wired up to.
  *
  * <p>Construction: callers pass {@code TotalFilesBinItem::new} as the {@code Supplier<T>} to {@link
- * FirstFitBinPacker}; the packer calls the supplier per operation to get a seat, then {@link
- * #fromOpAndStats} on the seat to get a populated copy.
+ * FirstFitBinPacker}; the packer calls the supplier per operation to get an empty instance, then
+ * {@link #fromOpAndStats} on it to get a populated copy.
  */
 @Getter
 @ToString
@@ -23,7 +23,7 @@ public class TotalFilesBinItem implements BinItem {
   private final String operationId;
   private final long weight;
 
-  /** Seat constructor: call {@link #fromOpAndStats} on the result to get a populated instance. */
+  /** Empty constructor: call {@link #fromOpAndStats} on the result to get a populated instance. */
   public TotalFilesBinItem() {
     this("", "", 0L);
   }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
index b6644ece9..fb77d3963 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/FirstFitBinPackerTest.java
@@ -13,9 +13,10 @@
 
 /**
  * Tests the {@link FirstFitBinPacker} bucketing logic in isolation via a {@link TestItem} whose
- * weight comes from a {@code "weight"} entry in {@code tableProperties}. The seat-then-populate
- * pattern is exercised end-to-end through the public {@code pack} entry point. Projection logic for
- * production BinItems (e.g. {@link TotalFilesBinItem}) is covered by their own tests.
+ * weight comes from a {@code "weight"} entry in {@code tableProperties}. The supplier-then-{@code
+ * fromOpAndStats} pattern is exercised end-to-end through the public {@code pack} entry point.
+ * Projection logic for production BinItems (e.g. {@link TotalFilesBinItem}) is covered by their own
+ * tests.
  */
 class FirstFitBinPackerTest {
 
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
index 81adbde22..bdbab91d6 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/binpack/TotalFilesBinItemTest.java
@@ -60,11 +60,11 @@ void fromOpAndStats_nullFileCount_weightIsZero() {
   }
 
   @Test
-  void seat_doesNotShareStateWithPopulated() {
-    TotalFilesBinItem seat = new TotalFilesBinItem();
-    BinItem populated = seat.fromOpAndStats(op(), statsWithFiles(7L));
+  void emptyInstance_doesNotShareStateWithPopulated() {
+    TotalFilesBinItem empty = new TotalFilesBinItem();
+    BinItem populated = empty.fromOpAndStats(op(), statsWithFiles(7L));
 
-    assertThat(seat.getWeight()).isEqualTo(0L);
+    assertThat(empty.getWeight()).isEqualTo(0L);
     assertThat(populated.getWeight()).isEqualTo(7L);
   }
 }

From 602eecb0392b4c154ef4bb66987857ae940d683c Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 13:24:47 -0700
Subject: [PATCH 11/13] refactor(scheduler): immutable registry;
 registerOperation returns a new runner
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Drops ConcurrentHashMap per PR feedback. The registry is now a final
field holding an immutable map (Map.of() initially, Map.copyOf after
each registration); registerOperation(type, packer) returns a new
SchedulerRunner with the additional entry and leaves the receiver
unchanged. SchedulerRunner loses @Component; SchedulerConfig produces it
via @Bean and chains the registration so the bean Spring publishes is
the fully-registered runner.
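
The copy-on-register shape can be sketched without any of the
collaborators. This is a minimal sketch under stated assumptions:
RegistrySketch is hypothetical, and plain strings stand in for
OperationTypeDto and BinPacker; the repositories and jobs client are
left out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for SchedulerRunner that keeps only the registry.
final class RegistrySketch {

  // Immutable after construction: either Map.of() or a Map.copyOf snapshot.
  private final Map<String, String> registry;

  RegistrySketch() {
    this(Map.of());
  }

  private RegistrySketch(Map<String, String> registry) {
    this.registry = registry;
  }

  /** Returns a new instance with the extra entry; the receiver is left untouched. */
  RegistrySketch registerOperation(String type, String packer) {
    Map<String, String> next = new HashMap<>(registry);
    next.put(type, packer); // replaces any prior entry for the same type
    return new RegistrySketch(Map.copyOf(next));
  }

  /** Safe to return directly: the key set of an immutable map is itself unmodifiable. */
  Set<String> getRegisteredOperationTypes() {
    return registry.keySet();
  }
}
```

One consequence of this design is that a caller who ignores the return
value of registerOperation silently loses the registration, which is
why the @Bean method and the test setUp both chain the call and keep
the result.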

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../optimizer/scheduler/SchedulerRunner.java  | 46 +++++++++++--------
 .../scheduler/config/SchedulerConfig.java     | 36 +++++++--------
 .../scheduler/SchedulerRunnerTest.java        |  6 ++-
 3 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
index 32e0d7ce7..e10853f7a 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunner.java
@@ -14,23 +14,23 @@
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
 import java.time.Instant;
 import java.util.Comparator;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.Set;
-import java.util.concurrent.ConcurrentHashMap;
 import java.util.stream.Collectors;
 import lombok.extern.slf4j.Slf4j;
-import org.springframework.beans.factory.annotation.Autowired;
-import org.springframework.beans.factory.annotation.Value;
 import org.springframework.data.domain.Pageable;
-import org.springframework.stereotype.Component;
 import org.springframework.transaction.annotation.Transactional;
 
 /**
- * Generic scheduler. Operation types are registered at startup via {@link #registerOperation}; for
- * each registered type the runner:
+ * Generic scheduler. Operation types are registered at construction via {@link #registerOperation},
+ * which returns a new instance with the additional entry — the registry is immutable, so the bean
+ * Spring publishes is the fully-registered runner produced in {@link
+ * com.linkedin.openhouse.optimizer.scheduler.config.SchedulerConfig}. For each registered type the
+ * runner:
  *
  * <ol>
  *   <li>Reads PENDING rows from MySQL.
@@ -46,41 +46,49 @@
  * only per-operation knowledge in the module is the {@link BinPacker} the caller registers.
  */
 @Slf4j
-@Component
 public class SchedulerRunner {
 
   private final TableOperationsRepository operationsRepo;
   private final TableStatsRepository statsRepo;
   private final JobsServiceClient jobsClient;
   private final String resultsEndpoint;
-  private final Map<OperationTypeDto, BinPacker> registry = new ConcurrentHashMap<>();
+  private final Map<OperationTypeDto, BinPacker> registry;
 
-  @Autowired
   public SchedulerRunner(
       TableOperationsRepository operationsRepo,
       TableStatsRepository statsRepo,
       JobsServiceClient jobsClient,
-      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint) {
+      String resultsEndpoint) {
+    this(operationsRepo, statsRepo, jobsClient, resultsEndpoint, Map.of());
+  }
+
+  private SchedulerRunner(
+      TableOperationsRepository operationsRepo,
+      TableStatsRepository statsRepo,
+      JobsServiceClient jobsClient,
+      String resultsEndpoint,
+      Map<OperationTypeDto, BinPacker> registry) {
     this.operationsRepo = operationsRepo;
     this.statsRepo = statsRepo;
     this.jobsClient = jobsClient;
     this.resultsEndpoint = resultsEndpoint;
+    this.registry = registry;
   }
 
   /**
-   * Register a {@link BinPacker} for an operation type. Idempotent on identical re-registration;
-   * conflicting registrations replace the prior entry. Called once per operation type at startup.
+   * Return a new {@link SchedulerRunner} whose registry is this one's plus {@code (type, packer)}.
+   * If {@code type} was already registered, the new entry replaces the prior one. Pure: the
+   * receiver is unchanged.
    */
-  public void registerOperation(OperationTypeDto operationType, BinPacker packer) {
-    registry.put(operationType, packer);
-    log.info(
-        "Registered BinPacker {} for operation type {}",
-        packer.getClass().getSimpleName(),
-        operationType);
+  public SchedulerRunner registerOperation(OperationTypeDto type, BinPacker packer) {
+    HashMap<OperationTypeDto, BinPacker> next = new HashMap<>(registry);
+    next.put(type, packer);
+    return new SchedulerRunner(
+        operationsRepo, statsRepo, jobsClient, resultsEndpoint, Map.copyOf(next));
   }
 
   public Set<OperationTypeDto> getRegisteredOperationTypes() {
-    return Set.copyOf(registry.keySet());
+    return registry.keySet();
   }
 
   public void schedule(OperationTypeDto type) {
diff --git a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
index a06dfbe5d..124860943 100644
--- a/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
+++ b/services/optimizer/scheduler/src/main/java/com/linkedin/openhouse/optimizer/scheduler/config/SchedulerConfig.java
@@ -1,22 +1,21 @@
 package com.linkedin.openhouse.optimizer.scheduler.config;
 
 import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
+import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
+import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.SchedulerRunner;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.TotalFilesBinItem;
 import com.linkedin.openhouse.optimizer.scheduler.client.JobsServiceClient;
-import javax.annotation.PostConstruct;
-import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.beans.factory.annotation.Value;
 import org.springframework.context.annotation.Bean;
 import org.springframework.context.annotation.Configuration;
 import org.springframework.web.reactive.function.client.WebClient;
 
 /**
- * Cross-cutting wiring (jobs-service client) plus the per-operation-type registrations on the
- * {@link SchedulerRunner}. The {@link #registerOperations()} method is the one place each
- * operation's identity (type, packing strategy, item prototype) is composed; the scheduler itself
- * never names an operation type beyond the keys in its registry.
+ * Cross-cutting wiring (jobs-service client) plus the {@link SchedulerRunner} bean. Each operation
+ * type's identity (type, packing strategy, item supplier) is composed in {@link #schedulerRunner};
+ * the runner itself never names an operation type beyond the keys in its registry.
  */
 @Configuration
 public class SchedulerConfig {
@@ -27,14 +26,6 @@ public class SchedulerConfig {
   @Value("${optimizer.scheduler.cluster-id}")
   private String clusterId;
 
-  @Value("${optimizer.scheduler.ofd.max-files-per-bin}")
-  private long ofdMaxFilesPerBin;
-
-  @Value("${optimizer.scheduler.ofd.max-tables-per-bin}")
-  private int ofdMaxTablesPerBin;
-
-  @Autowired private SchedulerRunner schedulerRunner;
-
   @Bean
   public WebClient jobsWebClient() {
     return WebClient.builder().baseUrl(jobsBaseUri).build();
@@ -50,10 +41,17 @@ public JobsServiceClient jobsServiceClient(WebClient jobsWebClient) {
    * with file count — per-file list, manifest joins, and delete calls dominate independent of file
    * size.
    */
-  @PostConstruct
-  public void registerOperations() {
-    schedulerRunner.registerOperation(
-        OperationTypeDto.ORPHAN_FILES_DELETION,
-        new FirstFitBinPacker<>(TotalFilesBinItem::new, ofdMaxFilesPerBin, ofdMaxTablesPerBin));
+  @Bean
+  public SchedulerRunner schedulerRunner(
+      TableOperationsRepository operationsRepo,
+      TableStatsRepository statsRepo,
+      JobsServiceClient jobsClient,
+      @Value("${optimizer.scheduler.results-endpoint}") String resultsEndpoint,
+      @Value("${optimizer.scheduler.ofd.max-files-per-bin}") long ofdMaxFilesPerBin,
+      @Value("${optimizer.scheduler.ofd.max-tables-per-bin}") int ofdMaxTablesPerBin) {
+    return new SchedulerRunner(operationsRepo, statsRepo, jobsClient, resultsEndpoint)
+        .registerOperation(
+            OperationTypeDto.ORPHAN_FILES_DELETION,
+            new FirstFitBinPacker<>(TotalFilesBinItem::new, ofdMaxFilesPerBin, ofdMaxTablesPerBin));
   }
 }
diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 82a358014..68eef6081 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -49,8 +49,10 @@ class SchedulerRunnerTest {
   void setUp() {
     // A real packer — the runner exercises the full pipeline against actual bucketing and the
     // packer's projection logic, while the IO is mocked.
-    runner = new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT);
-    runner.registerOperation(OFD, new FirstFitBinPacker<>(TotalFilesBinItem::new, 1_000_000L, 50));
+    runner =
+        new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT)
+            .registerOperation(
+                OFD, new FirstFitBinPacker<>(TotalFilesBinItem::new, 1_000_000L, 50));
   }
 
   // ---- Stubbing helpers ----

From 5fcc34e1ce49c91344f602510f194f654aeb3eb2 Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 13:27:26 -0700
Subject: [PATCH 12/13] style(scheduler-test): drop OFD/OFD_DB abbreviation
 aliases

Aliasing enum values to abbreviated constants is the worst of both
worlds: the name is abbreviated *and* it adds an unnecessary
indirection. Static-import OperationTypeDto.ORPHAN_FILES_DELETION and
use OperationType.ORPHAN_FILES_DELETION for the DB enum directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../scheduler/SchedulerRunnerTest.java        | 40 ++++++++++---------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index 68eef6081..cb58fe307 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -1,5 +1,6 @@
 package com.linkedin.openhouse.optimizer.scheduler;
 
+import static com.linkedin.openhouse.optimizer.model.OperationTypeDto.ORPHAN_FILES_DELETION;
 import static org.assertj.core.api.Assertions.assertThat;
 import static org.assertj.core.api.Assertions.assertThatThrownBy;
 import static org.mockito.ArgumentMatchers.any;
@@ -11,10 +12,10 @@
 import static org.mockito.Mockito.when;
 
 import com.linkedin.openhouse.optimizer.db.OperationStatus;
+import com.linkedin.openhouse.optimizer.db.OperationType;
 import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
 import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
 import com.linkedin.openhouse.optimizer.db.TableStatsRow;
-import com.linkedin.openhouse.optimizer.model.OperationTypeDto;
 import com.linkedin.openhouse.optimizer.repository.TableOperationsRepository;
 import com.linkedin.openhouse.optimizer.repository.TableStatsRepository;
 import com.linkedin.openhouse.optimizer.scheduler.binpack.FirstFitBinPacker;
@@ -34,9 +35,6 @@
 @ExtendWith(MockitoExtension.class)
 class SchedulerRunnerTest {
 
-  private static final OperationTypeDto OFD = OperationTypeDto.ORPHAN_FILES_DELETION;
-  private static final com.linkedin.openhouse.optimizer.db.OperationType OFD_DB =
-      com.linkedin.openhouse.optimizer.db.OperationType.ORPHAN_FILES_DELETION;
   private static final String RESULTS_ENDPOINT = "http://localhost:8080/v1/optimizer/operations";
 
   @Mock private TableOperationsRepository operationsRepo;
@@ -52,14 +50,15 @@ void setUp() {
     runner =
         new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT)
             .registerOperation(
-                OFD, new FirstFitBinPacker<>(TotalFilesBinItem::new, 1_000_000L, 50));
+                ORPHAN_FILES_DELETION,
+                new FirstFitBinPacker<>(TotalFilesBinItem::new, 1_000_000L, 50));
   }
 
   // ---- Stubbing helpers ----
 
   private void stubFindPending(List<TableOperationsRow> rows) {
     when(operationsRepo.find(
-            eq(Optional.of(OFD_DB)),
+            eq(Optional.of(OperationType.ORPHAN_FILES_DELETION)),
             eq(Optional.of(OperationStatus.PENDING)),
             eq(Optional.empty()),
             eq(Optional.empty()),
@@ -89,7 +88,7 @@ private TableOperationsRow pendingRow(String uuid, String db, String table) {
         .tableUuid(uuid)
         .databaseName(db)
         .tableName(table)
-        .operationType(OFD_DB)
+        .operationType(OperationType.ORPHAN_FILES_DELETION)
         .status(OperationStatus.PENDING)
         .createdAt(Instant.now())
         .build();
@@ -113,21 +112,21 @@ void schedule_unknownOperationType_throws() {
     SchedulerRunner empty =
         new SchedulerRunner(operationsRepo, statsRepo, jobsClient, RESULTS_ENDPOINT);
 
-    assertThatThrownBy(() -> empty.schedule(OFD))
+    assertThatThrownBy(() -> empty.schedule(ORPHAN_FILES_DELETION))
         .isInstanceOf(IllegalStateException.class)
         .hasMessageContaining("No BinPacker registered");
   }
 
   @Test
   void getRegisteredOperationTypes_returnsRegisteredSet() {
-    assertThat(runner.getRegisteredOperationTypes()).containsExactly(OFD);
+    assertThat(runner.getRegisteredOperationTypes()).containsExactly(ORPHAN_FILES_DELETION);
   }
 
   @Test
   void schedule_noPendingOps_noJobSubmitted() {
     stubFindPending(List.of());
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
   }
@@ -138,7 +137,7 @@ void schedule_allOpsWithoutStats_noJobSubmitted() {
     stubFindPending(List.of(row));
     when(statsRepo.findAllById(any())).thenReturn(List.of());
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
   }
@@ -160,7 +159,7 @@ void schedule_singleBin_claimsAndMarksScheduled() {
     when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
         .thenReturn(Optional.of("job-123"));
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     verify(operationsRepo)
         .updateBatch(
@@ -175,7 +174,12 @@ void schedule_singleBin_claimsAndMarksScheduled() {
 
     ArgumentCaptor<List<String>> tableNames = ArgumentCaptor.forClass(List.class);
     verify(jobsClient)
-        .launch(anyString(), eq(OFD.name()), tableNames.capture(), anyList(), anyString());
+        .launch(
+            anyString(),
+            eq(ORPHAN_FILES_DELETION.name()),
+            tableNames.capture(),
+            anyList(),
+            anyString());
     assertThat(tableNames.getValue()).containsExactly("db1.tbl1");
   }
 
@@ -196,7 +200,7 @@ void schedule_jobLaunchFails_marksPendingForRetry() {
             anyList(), eq(OperationStatus.SCHEDULING), eq(OperationStatus.PENDING), any(), any()))
         .thenReturn(1);
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     verify(operationsRepo)
         .updateBatch(
@@ -222,7 +226,7 @@ void schedule_rowsAlreadyClaimed_skipsSubmit() {
         .thenReturn(0);
     stubFindClaimed(List.of());
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     verify(jobsClient, never()).launch(anyString(), anyString(), anyList(), anyList(), anyString());
     verify(operationsRepo, never())
@@ -256,7 +260,7 @@ void schedule_cancelsDuplicatePendingPerCycle() {
     when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
         .thenReturn(Optional.of("job-dedup"));
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     ArgumentCaptor<List<String>> cancelled = ArgumentCaptor.forClass(List.class);
     verify(operationsRepo).cancel(cancelled.capture());
@@ -284,7 +288,7 @@ void schedule_partialClaim_launchesAndMarksOnlyClaimedSubset() {
     when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
         .thenReturn(Optional.of("job-partial"));
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     ArgumentCaptor<List<String>> launchedTableNames = ArgumentCaptor.forClass(List.class);
     ArgumentCaptor<List<String>> launchedOpIds = ArgumentCaptor.forClass(List.class);
@@ -326,7 +330,7 @@ void schedule_opsWithoutStats_skipped() {
     when(jobsClient.launch(anyString(), anyString(), anyList(), anyList(), anyString()))
         .thenReturn(Optional.of("job-skip"));
 
-    runner.schedule(OFD);
+    runner.schedule(ORPHAN_FILES_DELETION);
 
     ArgumentCaptor<List<String>> ids = ArgumentCaptor.forClass(List.class);
     verify(operationsRepo)

From 38064293df03760bfe131e356efeb6862e20ce2d Mon Sep 17 00:00:00 2001
From: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Date: Tue, 2 Jun 2026 13:30:20 -0700
Subject: [PATCH 13/13] style(scheduler-test): drive op-type from the model;
 .toDb() at matcher sites

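The DB OperationType enum is an internal mapping, so the test should
reference only OperationTypeDto and call .toDb() wherever the DB enum is
required: in the repo matcher (stubFindPending) and in the row builder
(pendingRow). This also drops the now-unused OperationType import.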

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .../openhouse/optimizer/scheduler/SchedulerRunnerTest.java   | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
index cb58fe307..dcd7ec975 100644
--- a/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
+++ b/services/optimizer/scheduler/src/test/java/com/linkedin/openhouse/optimizer/scheduler/SchedulerRunnerTest.java
@@ -12,7 +12,6 @@
 import static org.mockito.Mockito.when;
 
 import com.linkedin.openhouse.optimizer.db.OperationStatus;
-import com.linkedin.openhouse.optimizer.db.OperationType;
 import com.linkedin.openhouse.optimizer.db.SnapshotMetrics;
 import com.linkedin.openhouse.optimizer.db.TableOperationsRow;
 import com.linkedin.openhouse.optimizer.db.TableStatsRow;
@@ -58,7 +57,7 @@ void setUp() {
 
   private void stubFindPending(List<TableOperationsRow> rows) {
     when(operationsRepo.find(
-            eq(Optional.of(OperationType.ORPHAN_FILES_DELETION)),
+            eq(Optional.of(ORPHAN_FILES_DELETION.toDb())),
             eq(Optional.of(OperationStatus.PENDING)),
             eq(Optional.empty()),
             eq(Optional.empty()),
@@ -88,7 +87,7 @@ private TableOperationsRow pendingRow(String uuid, String db, String table) {
         .tableUuid(uuid)
         .databaseName(db)
         .tableName(table)
-        .operationType(OperationType.ORPHAN_FILES_DELETION)
+        .operationType(ORPHAN_FILES_DELETION.toDb())
         .status(OperationStatus.PENDING)
         .createdAt(Instant.now())
         .build();