The DataFrame API is still missing many methods that need to be implemented. This is the umbrella issue for tracking them.
Most of them should be relatively mechanical to add, but some require additional work.
| Method | Support |
|---|---|
| agg | ✅ |
| alias | ✅ |
| approxQuantile | ✅ |
| cache | ✅ |
| checkpoint | ❌ |
| coalesce | ✅ |
| colRegex | ✅ (OfDFWithRegex) |
| collect | ✅ |
| columns | ✅ |
| corr | ✅ |
| count | ✅ |
| cov | ✅ |
| createGlobalTempView | ✅ |
| createOrReplaceGlobalTempView | ✅ |
| createOrReplaceTempView | ✅ |
| createTempView | ✅ |
| crossJoin | ✅ |
| crosstab | ✅ |
| cube | ✅ |
| describe | ✅ |
| distinct | ✅ |
| drop | ✅ |
| dropDuplicates | ✅ |
| dropDuplicatesWithinWatermark | ❌ |
| dropna | ✅ |
| dtypes | ❌ |
| exceptAll | ✅ |
| executionInfo | ❌ |
| explain | ✅ |
| fillna | ✅ |
| filter | ✅ |
| first | ✅ |
| foreach | ➖ (needs native UDFs) |
| foreachPartition | ➖ (needs native UDFs) |
| freqItems | ✅ |
| groupBy | ✅ (some restrictions apply) |
| head | ✅ |
| hint | ❌ |
| inputFiles | ❌ |
| intersect | ✅ |
| intersectAll | ✅ |
| isEmpty | ✅ |
| isLocal | ❌ |
| isStreaming | ❌ |
| is_cached | ❌ |
| join | ✅ |
| limit | ✅ |
| localCheckpoint | ❌ |
| mapInArrow | ➖ (needs native UDFs) |
| mapInPandas | ➖ (needs native UDFs) |
| melt | ✅ |
| mergeInto | ❌ |
| na | ✅ |
| observe | ❌ |
| offset | ✅ |
| orderBy | ✅ |
| pandas_api | ➖ |
| persist | ✅ |
| printSchema | ❌ |
| randomSplit | ✅ |
| rdd | ➖ |
| registerTempTable | ❌ |
| repartition | ✅ |
| repartitionByRange | ✅ |
| replace | ✅ |
| rollup | ✅ |
| sameSemantics | ✅ |
| sample | ✅ |
| sampleBy | ❌ |
| schema | ✅ |
| select | ✅ |
| selectExpr | ✅ |
| semanticHash | ✅ |
| show | ✅ |
| sort | ✅ |
| sortWithinPartitions | ✅ |
| sparkSession | ❌ |
| stat | ✅ |
| storageLevel | ✅ |
| subtract | ✅ |
| summary | ✅ |
| tail | ✅ |
| take | ✅ |
| to | ❌ |
| toArrow | ✅ |
| toDF | ❌ |
| toJSON | ❌ |
| toLocalIterator | ❌ |
| toPandas | ➖ |
| transform | ❌ |
| union | ✅ |
| unionAll | ✅ |
| unionByName | ✅ |
| unpersist | ✅ |
| unpivot | ✅ |
| where | ✅ |
| withColumn | ✅ |
| withColumnRenamed | ✅ |
| withColumns | ✅ |
| withColumnsRenamed | ✅ |
| withMetadata | ✅ |
| withWatermark | ✅ |
| write | ✅ |
| writeStream | ❌ |
| writeTo | ❌ |