feat: implement cast from whole numbers to binary format and bool to decimal #3083
coderfender wants to merge 9 commits into apache:main
Conversation
@parthchandra, @andygrove, these are the changes to support the int to binary cast and bool to decimal cast operations. I also had to change the base test to use the DSL, since Int -> Binary casts are only supported on the DSL side of Spark and would raise an encoding exception via Spark SQL. Also, Int -> Binary casts do not support Try or ANSI eval modes in Spark, and I have followed the same convention here.
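A minimal sketch of that DSL-vs-SQL distinction, assuming a DataFrame `df` with an integer column `a` registered as a view `t` (names are illustrative, not from the PR):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DataTypes

// DSL-side cast: Spark supports integral -> binary here (LEGACY mode only)
val viaDsl = df.select(col("a").cast(DataTypes.BinaryType))

// SQL-side cast: per the comment above, this route raises an encoding
// exception, which is why the base test was switched to the DSL
// spark.sql("SELECT CAST(a AS BINARY) FROM t")
```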
Fixed format issue with
Codecov Report
❌ Patch coverage is

```
@@            Coverage Diff             @@
##              main    #3083      +/-  ##
============================================
+ Coverage    56.12%   59.98%    +3.85%
- Complexity     976     1464      +488
============================================
  Files          119      175       +56
  Lines        11743    16171     +4428
  Branches      2251     2685      +434
============================================
+ Hits          6591     9700     +3109
- Misses        4012     5114     +1102
- Partials      1140     1357      +217
```

View full report in Codecov by Sentry.
andygrove left a comment
Thanks for adding int→binary and boolean→decimal cast support!
Issue: Dead code
The function canCastToBinary (lines 354-358 in CometCast.scala) is defined but never called anywhere. This appears to be unused code that should be removed.
```scala
private def canCastToBinary(fromType: DataType): SupportLevel = fromType match {
  case DataTypes.ByteType | DataTypes.ShortType | DataTypes.IntegerType | DataTypes.LongType =>
    Compatible()
  case _ => Unsupported(Some(s"Cast from BinaryType to $fromType is not supported"))
}
```

The actual logic for int→binary casts is correctly handled in the canCastFromByte, canCastFromShort, canCastFromInt, and canCastFromLong functions.
Otherwise, the implementation looks good:
- Correctly limits int→binary to LEGACY mode only (matching Spark)
- Boolean→Decimal implementation handles precision/scale correctly (see the sketch after this list)
- Tests use column data from parquet (not just literals)
- Compatibility docs updated
- ANSI and Try modes correctly disabled for int→binary tests
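Expanding on the boolean→decimal bullet: Spark maps true → 1 and false → 0, then rescales to the target precision/scale; a quick illustration (values are assumptions, not from the PR's tests):

```scala
// Boolean -> decimal in Spark SQL (illustrative):
//   CAST(true  AS DECIMAL(10,2)) == 1.00
//   CAST(false AS DECIMAL(10,2)) == 0.00
spark.sql("SELECT CAST(true AS DECIMAL(10,2)) AS t, CAST(false AS DECIMAL(10,2)) AS f").show()
```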
This review was generated with AI assistance.
There is a test failure:
Thank you @andygrove, I am working on the test failures (I believe there is a bug on the DSL side vs using Spark SQL).
Force-pushed 910fe4f to 69d8290
There is a mismatch in the error message thrown by Spark vs Comet for a cast test from StringType to DecimalType(2,2). I will raise a new PR to fix that error message. Steps to reproduce:
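A hypothetical repro of this kind of mismatch, with ANSI mode enabled so the cast raises an error rather than returning null (column name, inputs, and config are assumptions):

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType
import spark.implicits._

// DECIMAL(2,2) has no integer digits, so any value >= 1 overflows;
// compare the error text Spark emits with what Comet emits
spark.conf.set("spark.sql.ansi.enabled", "true")
val df = Seq("1.23", "0.45").toDF("str")
df.select(col("str").cast(DecimalType(2, 2))).collect()
```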
cc: @andygrove, @parthchandra
I also found that the tests are not running on truly deterministic inputs. For example, this test fails if I disable or change the inputs of a test that runs before it:
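One way to pin this down, sketched here as an assumption rather than the suite's actual fix, is to derive test inputs from a fixed seed so that disabling or reordering earlier tests cannot change what later tests see:

```scala
import scala.util.Random

// A fixed seed makes every run (and every test ordering) generate
// identical inputs, so failures reproduce deterministically
val rng = new Random(42)
val inputs = Seq.fill(1000)(rng.nextInt())
```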
Related PR to fix the decimal cast test: #3248
Force-pushed 8ccd20a to fb8814f
Force-pushed d1f1512 to 05e7e50
@andygrove, @parthchandra, I rebased the feature branch with main and the CometCastSuite finishes successfully on my local machine. Please review and kick off CI whenever you get a chance.
```scala
if (testTry) {
  data.createOrReplaceTempView("t")
  // try_cast() should always return null for invalid inputs
  // not using spark DSL since it `try_cast` is only available from Spark 4x
```
Not clear to me what this comment means. Do you mean that this should be run only for Spark 4.0? Maybe this should be in a different set of tests that assumes 4.0 is available?
Sure. The try_cast DSL method is only available in Spark 4, and my new cast implementation doesn't support that eval mode anyway.
Spark added a DSL method to support try_cast starting in v4.0.0. To keep the build working, I kept try_cast as a SQL statement. Also, these changes to support Int/Long -> Binary don't support ANSI or Try eval modes, so I wanted to keep the changes minimal.
So what does this do in Spark 3.5? The query in the test has a try_cast in it.
It seems try_cast was introduced in Spark 3.2: https://issues.apache.org/jira/browse/SPARK-34881
True, that is the SQL version of it. The DSL version was added in v4.0.0. I initially tried to convert all SQL statements to DSL, but held back on converting try_cast given that the DSL form is only available starting v4 and would cause compilation issues on other Spark 3.x versions.
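A sketch of the trade-off being described, assuming a view `t` with a column `a` (names are illustrative):

```scala
// SQL form: try_cast has been in the SQL grammar since Spark 3.2,
// so this compiles and runs on the Spark 3.x versions Comet supports
val viaSql = spark.sql("SELECT try_cast(a AS STRING) FROM t")

// DSL form: Column.try_cast only exists from Spark 4.0.0 onward,
// so using it here would break the 3.x builds
// val viaDsl = df.select(col("a").try_cast("string"))
```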
Oh I get it now.
nit: can you align the // ?
parthchandra left a comment
lgtm. (minor comment)
Thank you for the approval @parthchandra
@andygrove, @parthchandra, could you please review and potentially merge the branch whenever you get a chance?
Fixed comment indentation issues
Force-pushed bc83230 to b49b178
Rebased with main and fixed merge conflicts

Which issue does this PR close?
(try_cast is only supported starting Spark 4.x). Relevant Spark code:
https://github.com/apache/spark/blob/f89fbd49c45045b44259341c6c73e30ee9c46dd0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L682
(Note that only LEGACY mode routes the code here; Try and ANSI eval modes are not supported.)
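As an illustration of the LEGACY behavior that branch implements, Spark renders an integral value as its big-endian two's-complement bytes; a minimal equivalent sketch (not Comet's or Spark's actual code):

```scala
import java.nio.ByteBuffer

// ByteBuffer defaults to big-endian, matching Spark's int -> binary output
def intToBinary(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()

intToBinary(1)  // Array(0, 0, 0, 1)
intToBinary(-1) // Array(-1, -1, -1, -1), i.e. 0xFF 0xFF 0xFF 0xFF
```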
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?