#698 Fix the backfill logic for late jobs. #701
Conversation
Walkthrough
The PR modifies backfill scheduling logic in …

Changes
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@pramen/pom.xml`:
- Line 114: The pom currently sets <spark.version> but uses
${maven.scala.version} as the Iceberg version. Add an explicit <iceberg.version>
property (e.g., 1.6.1) and update the Iceberg dependency to reference
${iceberg.version} instead of ${maven.scala.version}; also ensure the spark-avro
dependency version matches <spark.version> (3.5.8) exactly. A sketch of the
suggested shape follows.
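A minimal sketch of what the suggested pom.xml change could look like, assuming the usual <properties> and <dependencies> blocks; the Scala suffix _2.12 in the artifact IDs is an illustrative assumption and should match the build's Scala binary version:

```xml
<properties>
  <!-- Existing property; spark-avro below must resolve to this exact version -->
  <spark.version>3.5.8</spark.version>
  <!-- New explicit property so Iceberg no longer reuses ${maven.scala.version} -->
  <iceberg.version>1.6.1</iceberg.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.iceberg</groupId>
    <artifactId>iceberg-spark-runtime-3.5_2.12</artifactId>
    <version>${iceberg.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```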
In `@pramen/project/Versions.scala`:
- Line 22: The project pins Spark 3.5.8 for Scala 2.13
(defaultSparkVersionForScala213), but the Kafka clients version in use
(kafkaClientsVersion) is 3.9.0, which mismatches Spark 3.5.x's tested dependency
(3.4.1). Either update the kafkaClientsVersion constant to "3.4.1" to match
Spark 3.5 compatibility or, if you must keep "3.9.0", add a clear comment next
to kafkaClientsVersion and run and record validation tests for SASL/SSL and
exactly-once streaming scenarios to prove compatibility before merging. Both
options are sketched below.
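A sketch of both options in Versions.scala; the object wrapper and comment wording are illustrative, and only the constant names come from the file under review:

```scala
object Versions {
  val defaultSparkVersionForScala211 = "2.4.8"
  val defaultSparkVersionForScala212 = "3.3.4"
  val defaultSparkVersionForScala213 = "3.5.8"

  // Option A: pin to the version Spark 3.5.x is built and tested against.
  val kafkaClientsVersion = "3.4.1"

  // Option B: keep 3.9.0, but document the deviation and record validation
  // results for SASL/SSL and exactly-once streaming before merging.
  // val kafkaClientsVersion = "3.9.0" // newer than Spark 3.5.x's pinned 3.4.1
}
```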
pramen/project/Versions.scala
Outdated
```diff
  val defaultSparkVersionForScala211 = "2.4.8"
  val defaultSparkVersionForScala212 = "3.3.4"
- val defaultSparkVersionForScala213 = "3.4.4"
+ val defaultSparkVersionForScala213 = "3.5.8"
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, examine the Versions.scala file to understand the full context
find . -name "Versions.scala" -type f | head -5
```

Repository: AbsaOSS/pramen
Length of output: 89
🏁 Script executed:
```shell
# Search for the file and read its contents
if [ -f "pramen/project/Versions.scala" ]; then
  cat -n pramen/project/Versions.scala
fi
```

Repository: AbsaOSS/pramen
Length of output: 7133
🏁 Script executed:
```shell
# Look for dependency mappings in build files
fd -e "build.sbt" -e "build.scala" -e "Dependencies.scala" | head -10
```

Repository: AbsaOSS/pramen
Length of output: 40
🌐 Web query:
Spark 3.5.8 release Scala 2.13 support official
💡 Result:
Yes—Apache Spark 3.5.8 officially supports Scala 2.13.
- The official Apache download artifacts for Spark 3.5.8 include a pre-built binary explicitly labeled spark-3.5.8-bin-hadoop3-scala2.13.tgz, published on downloads.apache.org. [2]
- The official Spark downloads page states that Spark 3.2+ provides an additional pre-built distribution with Scala 2.13 (in addition to the default Scala 2.12 builds). [3]
- Spark’s official build documentation for the 3.5 line states Spark requires Scala 2.12/2.13. [4]
- Spark 3.5.8 is an official Spark 3.5 maintenance release (released Jan 15, 2026). [1]
Sources: Spark release notes/news and Apache download listings. [1][2][3][4]
🌐 Web query:
Delta Lake 3.0.0 Spark 3.5 compatibility requirements
💡 Result:
Delta Lake 3.0.0 (Delta Spark 3.0.0) compatibility requirements for Apache Spark 3.5:
- Spark version: Delta Lake 3.0.x is compatible with Apache Spark 3.5.x (i.e., you must run Spark 3.5 to use Delta 3.0.x). [1][2]
- Artifact / module: Use the renamed Maven artifact delta-spark (it was previously delta-core). [3]
- Scala binary version: Pick the artifact that matches your Spark build’s Scala version: delta-spark_2.12 or delta-spark_2.13. [3]
- Practical implication: If your platform/runtime doesn’t offer Spark 3.5 yet, you generally can’t use Delta Lake 3.0.0 there (you’d need an older Delta line matching your Spark version). [1][2]
Sources: [1] [2] [3] [4]
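In an sbt build like this project's, the Scala-suffix selection described above is usually handled with %%, which appends the Scala binary version automatically. A one-line sketch (the Delta version simply mirrors the mapping above):

```scala
// build.sbt sketch: %% resolves the artifact to delta-spark_2.12 or
// delta-spark_2.13 depending on scalaVersion, matching the Spark build.
libraryDependencies += "io.delta" %% "delta-spark" % "3.0.0"
```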
🌐 Web query:
Apache Iceberg 1.6.1 Spark 3.5 compatibility
💡 Result:
Apache Iceberg 1.6.1 is compatible with Apache Spark 3.5.
- Iceberg publishes a dedicated Spark 3.5 runtime artifact for 1.6.1 (Scala 2.12): org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1. [1]
- Spark 3.5 support was added in the Iceberg 1.5.x line (and is therefore included in 1.6.1). [2]

Typical Spark 3.5 usage:

```shell
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1
```

Sources: Maven Central artifact listing [1], Iceberg release notes indicating Spark 3.5 support [2].
[1] https://central.sonatype.com/artifact/org.apache.iceberg/iceberg-spark-runtime-3.5_2.12/1.6.1
[2] https://iceberg.apache.org/releases/
🌐 Web query:
Kafka clients 3.9.0 Spark 3.5 compatibility
💡 Result:
- What Spark 3.5 ships/tests with: Spark’s Kafka integration artifacts in the Spark 3.5.x line depend on org.apache.kafka:kafka-clients:3.4.1 (e.g., Spark 3.5.0 spark-streaming-kafka-0-10 lists kafka-clients 3.4.1). [1]
- So is kafka-clients:3.9.0 “Spark 3.5 compatible”? It’s not the version Spark 3.5 is built/tested against, so it’s not “officially compatible” in the sense of being Spark’s pinned dependency. Practically, many users do run Spark with a newer Kafka client by overriding dependencies, but you should treat it as “may work, verify in your environment.” [1][2]
- Broker-side compatibility note: Kafka 3.9.0 (released Nov 6, 2024) is in the Kafka 3.x line. [3] Kafka’s compatibility docs indicate Kafka 3.x clients are fully compatible (wire/protocol-wise) even with Kafka 4.0 brokers, which is a good sign for broker interoperability, but it doesn’t guarantee Spark’s connector has been tested with that client jar. [4]
If you plan to use kafka-clients:3.9.0 with Spark 3.5: make sure you align all Kafka jars on the classpath (avoid mixing Kafka versions transitively) and run an end-to-end streaming test (SASL/SSL, exactly-once if used, headers, etc.). [2]
Making Spark 3.5.8 the default for Scala 2.13 is confirmed safe; the kafka-clients version, however, needs alignment.
Spark 3.5.8 officially supports Scala 2.13 (released Jan 2026), and the Delta Lake 3.0.0 and Iceberg 1.6.1 mappings are correct. However, the Kafka clients version at line 116 (kafka-clients:3.9.0) does not align with Spark 3.5's official dependency: Spark 3.5.x is built and tested against kafka-clients:3.4.1, not 3.9.0. While 3.9.0 may work in practice, this creates a version mismatch that should be verified in your Spark streaming scenarios (SASL/SSL, exactly-once semantics). Consider either pinning kafka-clients to 3.4.1 for Spark 3.5 compatibility or explicitly validating 3.9.0 in your environment; a smoke-test sketch follows.
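If 3.9.0 is kept, a validation run could start from something like the sketch below: a minimal Spark Structured Streaming round trip through Kafka. This is not part of the PR; the broker address, topic names, checkpoint path, and the SASL_SSL setting are placeholder assumptions that would need to mirror the real environment (including JAAS configuration, which is omitted here).

```scala
import org.apache.spark.sql.SparkSession

object KafkaClientsSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-clients-smoke-test")
      .getOrCreate()

    // Read from a placeholder test topic with the overridden kafka-clients jar
    // on the classpath; security settings must mirror the real environment.
    val source = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")   // placeholder
      .option("subscribe", "smoke-test-source")           // placeholder
      .option("kafka.security.protocol", "SASL_SSL")      // placeholder
      .load()

    // Round-trip the records back to Kafka to exercise both the consumer
    // and producer paths end to end.
    val query = source
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")            // placeholder
      .option("topic", "smoke-test-sink")                          // placeholder
      .option("checkpointLocation", "/tmp/kafka-smoke-checkpoint") // placeholder
      .start()

    query.awaitTermination()
  }
}
```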
Force-pushed from 37071f9 to 4c0a563 (Compare)
Closes #698
Summary by CodeRabbit
New Features
Dependencies