Skip to content

Add a minimal micro-benchmark suite to validate filtering hot-path optimizations #690

@sentomk

Description

@sentomk

While reading the scan-planning filtering path, I found a small optimization in the metrics evaluators. Using it as a concrete example to raise a broader question about how to validate this kind of change.

Proposed change

The metrics evaluators run per data file. Each predicate currently calls expr->reference() repeatedly, and reference() returns a shared_ptr via shared_from_this() — an atomic refcount bump every time. The StrictMetricsEvaluator macro even discards a dynamic_cast result only to re-fetch the same reference:

- #define RETURN_IF_NOT_REFERENCE(expr)                                         \
-   if (auto ref = dynamic_cast<BoundReference*>(expr.get()); ref == nullptr) { \
-     return kRowsMightNotMatch;                                                \
-   }
+ #define BIND_REFERENCE_OR_RETURN(ref, expr)                            \
+   const auto* ref = dynamic_cast<const BoundReference*>((expr).get()); \
+   if (ref == nullptr) {                                                \
+     return kRowsMightNotMatch;                                         \
+   }

    Result<bool> IsNull(const std::shared_ptr<Bound>& expr) override {
-     RETURN_IF_NOT_REFERENCE(expr);
-     int32_t id = expr->reference()->field().field_id();
+     BIND_REFERENCE_OR_RETURN(ref, expr);
+     int32_t id = ref->field().field_id();
      ...

Reusing the cast result drops the repeated virtual reference() calls (and their atomic ops) across every predicate, with no behavior change.

Expected benefit

The win is on the CPU-bound filtering step, evaluated in isolation. Scan planning as a whole is IO-bound, so on an e2e scan this kind of change is almost certainly unmeasurable — which is exactly why it needs to be measured on the filtering step alone.

Which raises the question: do we need a benchmark suite?

This is exactly the kind of change that's hard to justify without one. The repo has no benchmark infrastructure today, only the gtest suite. A minimal benchmark on the filtering path would let us measure such changes on the CPU-bound step alone, rather than guessing or claiming a win against IO-dominated planning.

So before going further:

  1. How should we add it — Google Benchmark fetched the same way googletest already is, behind an off-by-default CMake option?
  2. Where should it live — a top-level benchmark/, or co-located under src/iceberg/**/?

I'm happy to put up a draft PR for a minimal suite + the filtering benchmark above once there's agreement on direction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions