While reading the scan-planning filtering path, I found a small optimization in the metrics evaluators. Using it as a concrete example to raise a broader question about how to validate this kind of change.
Proposed change
The metrics evaluators run per data file. Each predicate currently calls expr->reference() repeatedly, and reference() returns a shared_ptr via shared_from_this() — an atomic refcount bump every time. The StrictMetricsEvaluator macro even discards a dynamic_cast result only to re-fetch the same reference:
- #define RETURN_IF_NOT_REFERENCE(expr) \
- if (auto ref = dynamic_cast<BoundReference*>(expr.get()); ref == nullptr) { \
- return kRowsMightNotMatch; \
- }
+ #define BIND_REFERENCE_OR_RETURN(ref, expr) \
+ const auto* ref = dynamic_cast<const BoundReference*>((expr).get()); \
+ if (ref == nullptr) { \
+ return kRowsMightNotMatch; \
+ }
Result<bool> IsNull(const std::shared_ptr<Bound>& expr) override {
- RETURN_IF_NOT_REFERENCE(expr);
- int32_t id = expr->reference()->field().field_id();
+ BIND_REFERENCE_OR_RETURN(ref, expr);
+ int32_t id = ref->field().field_id();
...
Reusing the cast result drops the repeated virtual reference() calls (and their atomic ops) across every predicate, with no behavior change.
Expected benefit
The win is on the CPU-bound filtering step, evaluated in isolation. Scan planning as a whole is IO-bound, so on an e2e scan this kind of change is almost certainly unmeasurable — which is exactly why it needs to be measured on the filtering step alone.
Which raises the question: do we need a benchmark suite?
This is exactly the kind of change that's hard to justify without one. The repo has no benchmark infrastructure today, only the gtest suite. A minimal benchmark on the filtering path would let us measure such changes on the CPU-bound step alone, rather than guessing or claiming a win against IO-dominated planning.
So before going further:
- How should we add it — Google Benchmark fetched the same way googletest already is, behind an off-by-default CMake option?
- Where should it live — a top-level
benchmark/, or co-located under src/iceberg/**/?
I'm happy to put up a draft PR for a minimal suite + the filtering benchmark above once there's agreement on direction.
While reading the scan-planning filtering path, I found a small optimization in the metrics evaluators. Using it as a concrete example to raise a broader question about how to validate this kind of change.
Proposed change
The metrics evaluators run per data file. Each predicate currently calls
expr->reference()repeatedly, andreference()returns ashared_ptrviashared_from_this()— an atomic refcount bump every time. TheStrictMetricsEvaluatormacro even discards adynamic_castresult only to re-fetch the same reference:Reusing the cast result drops the repeated virtual
reference()calls (and their atomic ops) across every predicate, with no behavior change.Expected benefit
The win is on the CPU-bound filtering step, evaluated in isolation. Scan planning as a whole is IO-bound, so on an e2e scan this kind of change is almost certainly unmeasurable — which is exactly why it needs to be measured on the filtering step alone.
Which raises the question: do we need a benchmark suite?
This is exactly the kind of change that's hard to justify without one. The repo has no benchmark infrastructure today, only the gtest suite. A minimal benchmark on the filtering path would let us measure such changes on the CPU-bound step alone, rather than guessing or claiming a win against IO-dominated planning.
So before going further:
benchmark/, or co-located undersrc/iceberg/**/?I'm happy to put up a draft PR for a minimal suite + the filtering benchmark above once there's agreement on direction.