Add a minimal micro-benchmark suite to validate filtering hot-path optimizations

While reading the scan-planning filtering path, I found a small optimization in the metrics evaluators. Using it as a concrete example to raise a broader question about how to validate this kind of change.

## Proposed change

The metrics evaluators run per data file. Each predicate currently calls `expr->reference()` repeatedly, and `reference()` returns a `shared_ptr` via `shared_from_this()` — an atomic refcount bump every time. The `StrictMetricsEvaluator` macro even discards a `dynamic_cast` result only to re-fetch the same reference:

```diff
- #define RETURN_IF_NOT_REFERENCE(expr)                                         \
-   if (auto ref = dynamic_cast<BoundReference*>(expr.get()); ref == nullptr) { \
-     return kRowsMightNotMatch;                                                \
-   }
+ #define BIND_REFERENCE_OR_RETURN(ref, expr)                            \
+   const auto* ref = dynamic_cast<const BoundReference*>((expr).get()); \
+   if (ref == nullptr) {                                                \
+     return kRowsMightNotMatch;                                         \
+   }

    Result<bool> IsNull(const std::shared_ptr<Bound>& expr) override {
-     RETURN_IF_NOT_REFERENCE(expr);
-     int32_t id = expr->reference()->field().field_id();
+     BIND_REFERENCE_OR_RETURN(ref, expr);
+     int32_t id = ref->field().field_id();
      ...
```

Reusing the cast result drops the repeated virtual `reference()` calls (and their atomic ops) across every predicate, with no behavior change.

## Expected benefit

The win is on the CPU-bound filtering step, evaluated in isolation. Scan planning as a whole is IO-bound, so on an e2e scan this kind of change is almost certainly unmeasurable — which is exactly why it needs to be measured on the filtering step alone.

## Which raises the question: do we need a benchmark suite?

This is exactly the kind of change that's hard to justify without one. The repo has no benchmark infrastructure today, only the gtest suite. A minimal benchmark on the filtering path would let us measure such changes on the CPU-bound step alone, rather than guessing or claiming a win against IO-dominated planning.

So before going further:

1. **How** should we add it — Google Benchmark fetched the same way googletest already is, behind an off-by-default CMake option?
2. **Where** should it live — a top-level `benchmark/`, or co-located under `src/iceberg/**/`?

I'm happy to put up a draft PR for a minimal suite + the filtering benchmark above once there's agreement on direction.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a minimal micro-benchmark suite to validate filtering hot-path optimizations #690

Proposed change

Expected benefit

Which raises the question: do we need a benchmark suite?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Add a minimal micro-benchmark suite to validate filtering hot-path optimizations #690

Description

Proposed change

Expected benefit

Which raises the question: do we need a benchmark suite?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions