feat: Add spark-compat mode to integrate datafusion-spark features automatically #1416

Merged
milenkovicm merged 2 commits into apache:main from mattcuento:spark-compatibility-mode on Feb 11, 2026

Conversation

@mattcuento
Contributor

@mattcuento commented Jan 27, 2026

feat: Add spark-compat mode to integrate datafusion-spark features automatically

Which issue does this PR close?

Closes #1397.

Rationale for this change

This PR exposes datafusion-spark functions as a 'bundled' feature in Ballista, which simplifies augmenting Ballista with Spark features for quicker and simpler experimentation and adoption.

Documentation has been added in the user guide to describe how to enable these functions in builds.

What changes are included in this PR?

  • Introduce a spark-compat feature to register all Spark scalar/aggregate/window/table functions by default
  • Create a helper, datafusion_x_functions, to register default (and, if applicable, Spark) functions in the SessionState for scheduler/client usage
  • Extend most usages of the session state builder to register functions through the helper
  • Register Spark functions, if applicable, in the executor via BallistaFunctionRegistry::default()

I've rendered the docs locally to make sure everything looks right.
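The feature-gated registration described above can be illustrated with a self-contained sketch. This is not the actual Ballista/datafusion-spark API; `ToyRegistry` and the function names are stand-ins that only demonstrate the cfg-gating pattern the helper relies on:

```rust
use std::collections::HashMap;

// Toy stand-in for Ballista's function registry; the real change
// touches SessionState and BallistaFunctionRegistry, but this
// sketch only shows how a compile-time feature gates registration.
#[derive(Default)]
pub struct ToyRegistry {
    pub udfs: HashMap<String, &'static str>,
}

impl ToyRegistry {
    pub fn register(&mut self, name: &str, kind: &'static str) {
        self.udfs.insert(name.to_string(), kind);
    }
}

// Always register the DataFusion defaults; compile in the
// Spark-compatible set only when `spark-compat` is enabled.
pub fn register_functions(registry: &mut ToyRegistry) {
    registry.register("length", "scalar"); // stand-in default function
    #[cfg(feature = "spark-compat")]
    registry.register("sha2", "scalar"); // stand-in Spark-compatible function
}

fn main() {
    let mut registry = ToyRegistry::default();
    register_functions(&mut registry);
    // Without the feature flag, only the default set is present.
    assert!(registry.udfs.contains_key("length"));
    println!("registered {} functions", registry.udfs.len());
}
```

Because the Spark branch is behind `#[cfg(feature = ...)]`, builds without the flag carry no datafusion-spark code at all, which is the point of making it opt-in.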

Are there any user-facing changes?

  • New feature flag spark-compat to automatically register Spark-compatible scalar, aggregate, and window functions from datafusion-spark
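As a sketch of how a downstream build might opt in (the feature name follows the PR; the dependency line and version are illustrative, and the feature must be forwarded by whichever Ballista crate you depend on, as the examples crate does with `ballista-core/spark-compat`):

```toml
# Hypothetical downstream Cargo.toml
[dependencies]
ballista = { version = "*", features = ["spark-compat"] }
```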

@mattcuento
Contributor Author

I'm not certain that we have a well-defined place to document compile-time features, just docs for runtime configurations. I'd like to follow this up with a markdown file to describe our features

@milenkovicm
Contributor

> I'm not certain that we have a well-defined place to document compile-time features, just docs for runtime configurations. I'd like to follow this up with a markdown file to describe our features

Maybe a section in the README file?

@mattcuento force-pushed the spark-compatibility-mode branch from b1cef8d to caa8f6e on January 28, 2026 17:04
Comment thread ballista/core/src/extension.rs Outdated
@mattcuento force-pushed the spark-compatibility-mode branch from caa8f6e to 92793e6 on January 28, 2026 17:14
@mattcuento
Contributor Author

> I'm not certain that we have a well-defined place to document compile-time features, just docs for runtime configurations. I'd like to follow this up with a markdown file to describe our features

> maybe section in the readme file?

Threw up a separate PR: https://github.com/apache/datafusion-ballista/pull/1418/changes

@mattcuento marked this pull request as ready for review on January 28, 2026 17:46
Contributor

@milenkovicm left a comment


thanks @mattcuento, great addition

I'm not sure we have to add table functions to the registry. Also, if there are table functions to be supported, we may need to create an encoder for them, so I'd suggest ignoring table functions for now and taking them as a follow-up, if you agree.

Comment thread ballista/core/src/registry.rs Outdated
Comment thread ballista/executor/src/executor_server.rs Outdated
@andygrove
Member

> I'm not certain that we have a well-defined place to document compile-time features, just docs for runtime configurations. I'd like to follow this up with a markdown file to describe our features

> maybe section in the readme file?

Adding Spark compatibility is a pretty major feature(!), so I think there should be a documentation update as part of this PR. I would be careful to explain it as "using Spark-compatible expressions" where available, rather than full compatibility.

@mattcuento
Contributor Author

@milenkovicm

> also, if there are table functions to be supported, we may need to create an encoder for them, so I'd suggest ignoring table functions for now and taking them as a follow-up, if you agree

Yup agreed, happy to take the easy wins for scalar/window/aggregate for now and can see what we can do with table functions as a follow up!

@andygrove

> Adding Spark compatibility is a pretty major feature(!), so I think there should be a documentation update as part of this PR. I would be careful to explain it as "using Spark-compatible expressions" where available, rather than full compatibility.

Good point, I can write something up as a part of this PR 👍

@mattcuento marked this pull request as draft on February 4, 2026 03:43
@github-actions bot added the documentation label (Improvements or additions to documentation) on Feb 4, 2026
@mattcuento force-pushed the spark-compatibility-mode branch 4 times, most recently from 93bb6a5 to 8df2329 on February 4, 2026 16:27
@mattcuento marked this pull request as ready for review on February 4, 2026 16:32
@milenkovicm
Contributor

Sorry for the delay @mattcuento, I'm behind on reviews (and everything else :) ). I had a quick look and it looks OK to me. Will try to do a proper review tomorrow or over the weekend.

maybe @andygrove has some more ideas what should be addressed

@mattcuento
Contributor Author

No worries! Sounds good to me. If there are any PRs I can help review as a new contributor to lighten the load a bit, let me know!

Contributor

@milenkovicm left a comment


thanks @mattcuento
just one comment to address; otherwise it's good in my opinion

Comment thread examples/Cargo.toml Outdated

[features]
default = ["substrait", "standalone"]
spark-compat = ["ballista-core/spark-compat"]
Contributor


we need to add

[[example]]
name = "remote-spark-functions"
required-features = ["spark-compat"]

at the bottom of the file, so it triggers a build with spark-compat

please add the same for the substrait example and remove it from the default features
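For the substrait case, the analogous gating might look like this (the example name remote-substrait is a guess; the actual name is whatever the examples crate uses):

```toml
[[example]]
name = "remote-substrait"   # hypothetical example name
required-features = ["substrait"]
```

Note that required-features means Cargo skips the example in default builds, so it has to be run with the feature enabled, e.g. cargo run --example remote-spark-functions --features spark-compat.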

Contributor Author


ah good call, added for both, did substrait in a separate commit. Ran both locally. Thanks!

@mattcuento force-pushed the spark-compatibility-mode branch from 8df2329 to 86606bd on February 11, 2026 02:34
@milenkovicm
Contributor

Thanks @mattcuento
If we want to improve this further we can open new PRs

@milenkovicm merged commit bca52c3 into apache:main on Feb 11, 2026
16 checks passed

Labels

documentation Improvements or additions to documentation


Development

Successfully merging this pull request may close these issues.

Add Spark compatibility mode using datafusion-spark expressions

3 participants