
Validate cached batch input seek bounds #15082

Open

fallintoplace wants to merge 1 commit into NVIDIA:main from fallintoplace:fix-byte-array-input-seek-bounds


fallintoplace commented Jun 14, 2026

No issue filed.

Description

Tighten the seek validation in the cached batch Parquet input stream so that it accepts only positions within the backing byte array, i.e. [0, buff.length]. Out-of-range positions now fail with a clear error before the call reaches ByteBuffer.position.
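
A minimal sketch of the tightened check follows. It assumes a hypothetical standalone class, `ByteArraySeekSketch`, in place of the anonymous `SeekableInputStream` inside `ByteArrayInputFile`, and it models only position tracking (the real stream also implements Parquet's read methods). The exception type follows the review flowchart; the message text is an assumption.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for the anonymous SeekableInputStream in
// ByteArrayInputFile; only the position-tracking part is modeled.
final class ByteArraySeekSketch(buff: Array[Byte]) {
  private val byteBuffer: ByteBuffer = ByteBuffer.wrap(buff)

  def getPos: Long = byteBuffer.position().toLong

  def seek(newPos: Long): Unit = {
    // Valid positions are [0, buff.length]; buff.length itself is the EOF position.
    // buff.length (an Int) is widened to Long for the comparison.
    if (newPos < 0 || newPos > buff.length) {
      throw new IllegalStateException(
        s"seek position $newPos is out of range [0, ${buff.length}]")
    }
    // newPos is now within [0, buff.length], so toInt cannot truncate.
    byteBuffer.position(newPos.toInt)
  }
}
```

With an 8-byte array, `seek(8)` succeeds and leaves `getPos` at 8, while `seek(-1)` and `seek(9)` throw before `ByteBuffer.position` is ever called.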

Checklists

Documentation

  • Updated for new or modified user-facing features or behaviors
  • No user-facing change

Testing

  • Added or modified tests to cover new code paths
  • Covered by existing tests
    (Please provide the names of the existing tests in the PR description.)
  • Not required

Performance

  • Tests ran and results are added in the PR description
  • Issue filed with a link in the PR description
  • Not required

greptile-apps bot (Contributor) commented Jun 14, 2026

Greptile Summary

This PR tightens input validation in the seek method of the anonymous SeekableInputStream inside ByteArrayInputFile. The old guard (newPos > Int.MaxValue || newPos < Int.MinValue) was too loose: negative values between Int.MinValue and -1, and values between buff.length + 1 and Int.MaxValue, could slip through and reach ByteBuffer.position(), which would throw a cryptic IllegalArgumentException.

  • The lower bound is corrected from < Int.MinValue to < 0, properly rejecting any negative seek position.
  • The upper bound is tightened from > Int.MaxValue to > buff.length, ensuring only positions within the actual backing array (inclusive of the EOF position at buff.length) are accepted.
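
The two guard expressions can be compared directly against `ByteBuffer.position`. In this sketch the guard bodies are taken from the summary above. The `SeekGuards` object and its method names are hypothetical, and `bufferAccepts` simply records whether `ByteBuffer.position` succeeds on a freshly wrapped array.

```scala
import java.nio.ByteBuffer
import scala.util.Try

// The old and new guards as predicates: true means "rejected before
// the position ever reaches ByteBuffer.position".
object SeekGuards {
  def oldRejects(newPos: Long): Boolean =
    newPos > Int.MaxValue || newPos < Int.MinValue

  def newRejects(newPos: Long, length: Int): Boolean =
    newPos < 0 || newPos > length

  // Whether ByteBuffer.position itself accepts the position; it throws
  // IllegalArgumentException for negatives and for values above the limit.
  def bufferAccepts(buff: Array[Byte], newPos: Long): Boolean =
    Try(ByteBuffer.wrap(buff).position(newPos.toInt)).isSuccess
}
```

For a 16-byte buffer, positions -1 and 17 both pass the old guard but make `ByteBuffer.position` throw `IllegalArgumentException`. The new guard rejects both while still accepting 16, the EOF position.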

Confidence Score: 5/5

The change is a targeted defensive fix to a single guard expression and introduces no new logic paths or regressions.

The old bounds check accepted any Long value that fit in a signed 32-bit integer, even a negative one or one beyond the buffer's length. The new check restricts valid positions to [0, buff.length]. The comparison is safe because Scala widens the Int buff.length to Long before comparing. Seeking to exactly buff.length (the EOF position) remains permitted, which is required by the Parquet footer-reading protocol. No other code paths are touched.
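
The widening claim, and the reason the range check must run before `toInt`, can be checked in isolation. Because `buff.length` is at most `Int.MaxValue`, the new upper bound also covers the old `> Int.MaxValue` check, so `newPos.toInt` still cannot truncate. This is a minimal sketch. `WideningSketch` and its members are hypothetical, and a fixed length of 16 stands in for `buff.length`.

```scala
// Hypothetical helpers illustrating Int-to-Long widening and Long-to-Int truncation.
object WideningSketch {
  val length: Int = 16 // stands in for buff.length

  // The Int operand is widened to Long, so the comparison is exact
  // even for positions far outside the Int range.
  def exceedsLength(newPos: Long): Boolean = newPos > length

  // Converting a Long to Int keeps only the low 32 bits, which is why
  // the bound has to be checked before this conversion happens.
  def truncated(newPos: Long): Int = newPos.toInt
}
```

For example, 4294967312 (2^32 + 16) truncates to 16, which would look like a valid position in a 16-byte buffer if the bound were checked only after the conversion.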

No files require special attention.

Important Files Changed

Filename: sql-plugin/src/main/scala/com/nvidia/spark/rapids/parquet/ParquetCachedBatchSerializer.scala
Overview: Single-line guard fix in ByteArrayInputFile.seek. The bounds are tightened from the Int range to [0, buff.length], which is the valid range for ByteBuffer.position().

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["seek(newPos: Long)"] --> B{newPos < 0?}
    B -- yes --> E["throw IllegalStateException"]
    B -- no --> C{newPos > buff.length?}
    C -- yes --> E
    C -- no --> D["byteBuffer.position(newPos.toInt)"]

    style E fill:#f66,color:#fff
    style D fill:#6a6,color:#fff

Reviews (1): Last reviewed commit: "Validate cached batch input seek bounds"

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
fallintoplace force-pushed the fix-byte-array-input-seek-bounds branch from 94c35ec to 0f39ff8 on June 14, 2026 at 10:56.