
fix(parquet): bss encoding and tests on big endian systems #663

Open
daniel-adam-tfs wants to merge 2 commits into apache:main from daniel-adam-tfs:bugfix/fix-bss-and-tests-on-big-endian



@daniel-adam-tfs (Contributor)

Rationale for this change

To ensure the Arrow and Parquet Go libraries work correctly on big-endian architectures.

What changes are included in this PR?

- Added endianness-aware BYTE_STREAM_SPLIT decoding in the parquet/encoding package.
- Fixed tests in the parquet package to handle byte order correctly on big-endian systems.
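For context, BYTE_STREAM_SPLIT writes byte k of every value into stream k, and Parquet's plain representation of numeric values is little-endian. The sketch below is a standalone illustration, not the code in this PR (encodeBSSFloat32 and decodeBSSFloat32 are hypothetical names): it round-trips float32 values through explicit binary.LittleEndian conversions, so the result does not depend on the host's byte order.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeBSSFloat32 scatters the little-endian bytes of each value into four
// streams laid out back to back: stream k holds byte k of every value.
func encodeBSSFloat32(values []float32) []byte {
	n := len(values)
	out := make([]byte, 4*n)
	var tmp [4]byte
	for i, v := range values {
		binary.LittleEndian.PutUint32(tmp[:], math.Float32bits(v))
		for k := 0; k < 4; k++ {
			out[k*n+i] = tmp[k]
		}
	}
	return out
}

// decodeBSSFloat32 gathers byte k of value i from stream k and reassembles the
// value with binary.LittleEndian, so it behaves the same on any host.
func decodeBSSFloat32(data []byte) []float32 {
	n := len(data) / 4
	out := make([]float32, n)
	var tmp [4]byte
	for i := 0; i < n; i++ {
		for k := 0; k < 4; k++ {
			tmp[k] = data[k*n+i]
		}
		out[i] = math.Float32frombits(binary.LittleEndian.Uint32(tmp[:]))
	}
	return out
}

func main() {
	in := []float32{1.5, -2.25, 3}
	fmt.Println(decodeBSSFloat32(encodeBSSFloat32(in))) // prints [1.5 -2.25 3]
}
```

A decoder that instead copies the gathered bytes straight into the memory of a []float32 only works on little-endian hosts, which is the class of bug this PR targets.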

Are these changes tested?

Yes, all affected unit tests now pass on both little-endian and big-endian machines. The changes specifically address some of the previously failing tests on big-endian systems.

Are there any user-facing changes?

No user-facing API changes. The changes are internal and ensure correct behavior on supported architectures.

- Add platform-specific decodeByteStreamSplitBatchWidth{4,8}InByteOrder
  for little-endian and s390x big-endian architectures.
- Update ByteStreamSplitDecoder to use new endianness-aware decoding
  functions for correct behavior on all platforms.
…systems

Fix TestPageIndexRoundTripSuite and TestEncoding tests on big-endian systems
@daniel-adam-tfs changed the title from "[Parquet] fix bss and tests on big endian" to "[Parquet] Fix bss encoding and tests on big endian systems" (Feb 13, 2026)
@daniel-adam-tfs marked this pull request as ready for review on February 13, 2026, 18:21
Comment on lines +676 to +682
```go
if endian.IsBigEndian {
	buf := bytes.NewBuffer(nil)
	if err := binary.Write(buf, binary.LittleEndian, val); err != nil {
		panic(err)
	}
	return buf.Bytes()
}
```
Member


There is a ToLE function in internal/utils/ that might be better for this. For example:

```go
val = utils.ToLE(val)
return unsafe.Slice((*byte)(unsafe.Pointer(&val)), unsafe.Sizeof(val))
```

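This would work because on little-endian systems ToLE(val) is simply defined as return val, while on big-endian systems it converts val to little-endian byte order, so the byte view of val always yields little-endian bytes.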

@zeroshade (Member) left a comment


Just one nitpick; otherwise this looks good!

@zeroshade changed the title from "[Parquet] Fix bss encoding and tests on big endian systems" to "fix(parquet): bss encoding and tests on big endian systems" (Feb 14, 2026)


2 participants