
[DRAFT] CNDB-17829: Add a property to configure the Digest file checksum type #2420

Draft
cbornet wants to merge 2 commits into main from custom-digest

cbornet commented May 13, 2026

What is the issue

https://github.com/riptano/cndb/issues/17829

What does this PR fix and why was it fixed

This PR adds a property to configure the type of checksum written to the Digest file that accompanies each SSTable.
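As a rough illustration of what such a property could look like, here is a minimal sketch that selects the algorithm from a system property. The property name cassandra.digest_checksum_type and the DigestChecksumType enum are hypothetical (the PR's actual names are not shown in this thread); the sketch only uses JDK classes: java.util.zip.CRC32, java.util.zip.CRC32C (Java 9+), and MessageDigest for MD5.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.zip.Checksum;

// Hypothetical enum: the algorithms a Digest file could be written with.
enum DigestChecksumType
{
    CRC32, CRC32C, MD5;

    // Hypothetical property name, for illustration only.
    static final String PROPERTY = "cassandra.digest_checksum_type";

    // Resolves the configured type (case-insensitive), defaulting to CRC32 when unset.
    static DigestChecksumType fromProperty()
    {
        String value = System.getProperty(PROPERTY, "CRC32");
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // Computes the checksum of the whole buffer, rendered as lowercase hex.
    String digest(byte[] data)
    {
        switch (this)
        {
            case CRC32:
                return crcHex(new java.util.zip.CRC32(), data);
            case CRC32C:
                return crcHex(new java.util.zip.CRC32C(), data); // JDK 9+
            case MD5:
                return md5Hex(data);
            default:
                throw new AssertionError(this);
        }
    }

    private static String crcHex(Checksum checksum, byte[] data)
    {
        checksum.update(data, 0, data.length);
        // 32-bit CRCs are returned as a non-negative long; pad to 8 hex digits.
        return String.format("%08x", checksum.getValue());
    }

    private static String md5Hex(byte[] data)
    {
        try
        {
            byte[] hash = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash)
                sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Under this sketch, starting the JVM with -Dcassandra.digest_checksum_type=crc32c makes fromProperty() return CRC32C, and an unknown value fails fast with the IllegalArgumentException thrown by Enum.valueOf.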

github-actions bot commented May 13, 2026

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR and ticket in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the IBM copyright header instead of the Apache License one (no DataStax copyright any longer)

cbornet marked this pull request as draft May 14, 2026 08:28
cbornet requested a review from jasonstack May 15, 2026 11:38
cbornet (Author) commented May 15, 2026

Benchmark results on my Mac (Apple M4, ARM):

     [java] Benchmark                             (bufferSize)  Mode  Cnt     Score    Error  Units
     [java] ChecksumBench.benchCrc32                        31  avgt    5    14.301 ±  0.886  ns/op
     [java] ChecksumBench.benchCrc32                       131  avgt    5     7.982 ±  0.157  ns/op
     [java] ChecksumBench.benchCrc32                       517  avgt    5    40.852 ±  0.727  ns/op
     [java] ChecksumBench.benchCrc32                      2041  avgt    5   186.812 ±  0.598  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic             31  avgt    5   105.388 ±  1.265  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic            131  avgt    5   111.941 ±  2.779  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic            517  avgt    5   139.011 ± 27.659  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic           2041  avgt    5   270.267 ±  2.881  ns/op
     [java] ChecksumBench.benchCrc32c                       31  avgt    5     5.255 ±  0.063  ns/op
     [java] ChecksumBench.benchCrc32c                      131  avgt    5     7.219 ±  0.008  ns/op
     [java] ChecksumBench.benchCrc32c                      517  avgt    5    39.127 ±  0.265  ns/op
     [java] ChecksumBench.benchCrc32c                     2041  avgt    5   185.948 ±  1.050  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic            31  avgt    5    10.591 ±  0.060  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic           131  avgt    5    49.973 ±  0.528  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic           517  avgt    5   242.580 ±  4.295  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic          2041  avgt    5  1007.311 ± 25.412  ns/op
     [java] ChecksumBench.benchCrc64nvme                    31  avgt    5    15.581 ±  0.241  ns/op
     [java] ChecksumBench.benchCrc64nvme                   131  avgt    5    67.127 ±  0.197  ns/op
     [java] ChecksumBench.benchCrc64nvme                   517  avgt    5   276.600 ±  1.052  ns/op
     [java] ChecksumBench.benchCrc64nvme                  2041  avgt    5  1131.556 ±  5.560  ns/op
     [java] ChecksumBench.benchHasherCrc32c                 31  avgt    5     9.700 ±  0.448  ns/op
     [java] ChecksumBench.benchHasherCrc32c                131  avgt    5     9.798 ±  0.322  ns/op
     [java] ChecksumBench.benchHasherCrc32c                517  avgt    5    44.551 ±  1.036  ns/op
     [java] ChecksumBench.benchHasherCrc32c               2041  avgt    5   186.158 ±  1.174  ns/op
     [java] ChecksumBench.benchMd5                          31  avgt    5   420.794 ± 15.014  ns/op
     [java] ChecksumBench.benchMd5                         131  avgt    5   499.015 ± 57.456  ns/op
     [java] ChecksumBench.benchMd5                         517  avgt    5  1232.294 ± 16.627  ns/op
     [java] ChecksumBench.benchMd5                        2041  avgt    5  4284.304 ± 87.874  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c               31  avgt    5    19.677 ±  0.341  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c              131  avgt    5    50.339 ±  0.040  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c              517  avgt    5   207.623 ±  2.047  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c             2041  avgt    5   835.910 ±  1.604  ns/op

CRC32C performs similarly to CRC32.
CRC64 is much slower than CRC32 (expected, since it is pure Java for now), but it is still roughly 4 times faster than MD5 on the larger buffers, and far faster on small ones.
I will probably need to run these benchmarks on writer nodes as well.
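For reference, a pure-Java CRC-64/NVME can be as simple as the byte-at-a-time, table-driven sketch below. It is not necessarily the code benchmarked above (that code is not shown in this thread); the class name PureJavaCrc64Nvme is hypothetical, and the sketch assumes the standard CRC-64/NVME parameters: reflected input and output, polynomial 0xAD93D23594C93659, initial value and final XOR all ones. Implementing java.util.zip.Checksum lets it be used wherever CRC32 or CRC32C is.

```java
import java.util.zip.Checksum;

// Hypothetical pure-Java CRC-64/NVME: reflected, poly 0xAD93D23594C93659,
// init and final XOR all ones, one 256-entry table lookup per input byte.
final class PureJavaCrc64Nvme implements Checksum
{
    // Reflected (LSB-first) form of the normal polynomial.
    private static final long POLY_REFLECTED = Long.reverse(0xAD93D23594C93659L);
    private static final long[] TABLE = new long[256];

    static
    {
        for (int i = 0; i < 256; i++)
        {
            long c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY_REFLECTED : c >>> 1;
            TABLE[i] = c;
        }
    }

    private long crc = ~0L; // initial register value (all ones)

    @Override
    public void update(int b)
    {
        crc = TABLE[(int) ((crc ^ b) & 0xff)] ^ (crc >>> 8);
    }

    @Override
    public void update(byte[] b, int off, int len)
    {
        long c = crc;
        for (int i = off; i < off + len; i++)
            c = TABLE[(int) ((c ^ b[i]) & 0xff)] ^ (c >>> 8);
        crc = c;
    }

    @Override
    public long getValue()
    {
        return ~crc; // final XOR with all ones
    }

    @Override
    public void reset()
    {
        crc = ~0L;
    }

    // Bit-by-bit reference version, only useful to cross-check the table.
    static long bitwise(byte[] data)
    {
        long c = ~0L;
        for (byte b : data)
        {
            c ^= b & 0xff;
            for (int k = 0; k < 8; k++)
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY_REFLECTED : c >>> 1;
        }
        return ~c;
    }
}
```

With one table lookup per byte, the cost grows linearly with the buffer size; slicing-by-N variants process several bytes per iteration at the cost of larger tables.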

cbornet marked this pull request as ready for review May 16, 2026 07:44
cbornet changed the title from "Add a property to configure the Digest file checksum type" to "[DRAFT] Add a property to configure the Digest file checksum type" May 16, 2026
cbornet (Author) commented May 16, 2026

Benchmark results on an AWS ARM64 writer node:

     [java] Benchmark                             (bufferSize)  Mode  Cnt     Score   Error  Units
     [java] ChecksumBench.benchCrc32                        31  avgt    5    18.052 ± 0.114  ns/op
     [java] ChecksumBench.benchCrc32                       131  avgt    5    21.445 ± 0.057  ns/op
     [java] ChecksumBench.benchCrc32                       517  avgt    5    40.042 ± 0.043  ns/op
     [java] ChecksumBench.benchCrc32                      2041  avgt    5   114.856 ± 0.112  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic             31  avgt    5   109.567 ± 0.276  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic            131  avgt    5   218.522 ± 0.286  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic            517  avgt    5   638.897 ± 0.980  ns/op
     [java] ChecksumBench.benchCrc32NoIntrinsic           2041  avgt    5  2318.262 ± 3.431  ns/op
     [java] ChecksumBench.benchCrc32c                       31  avgt    5    16.389 ± 0.031  ns/op
     [java] ChecksumBench.benchCrc32c                      131  avgt    5    14.950 ± 0.041  ns/op
     [java] ChecksumBench.benchCrc32c                      517  avgt    5    33.171 ± 0.025  ns/op
     [java] ChecksumBench.benchCrc32c                     2041  avgt    5   110.186 ± 0.980  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic            31  avgt    5    51.784 ± 0.058  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic           131  avgt    5   132.762 ± 0.207  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic           517  avgt    5   484.529 ± 0.420  ns/op
     [java] ChecksumBench.benchCrc32cNoIntrinsic          2041  avgt    5  1880.599 ± 2.159  ns/op
     [java] ChecksumBench.benchCrc64nvme                    31  avgt    5    68.933 ± 0.071  ns/op
     [java] ChecksumBench.benchCrc64nvme                   131  avgt    5   190.444 ± 1.119  ns/op
     [java] ChecksumBench.benchCrc64nvme                   517  avgt    5   719.888 ± 8.299  ns/op
     [java] ChecksumBench.benchCrc64nvme                  2041  avgt    5  2552.411 ± 2.641  ns/op
     [java] ChecksumBench.benchHasherCrc32c                 31  avgt    5    17.588 ± 0.015  ns/op
     [java] ChecksumBench.benchHasherCrc32c                131  avgt    5    18.119 ± 0.073  ns/op
     [java] ChecksumBench.benchHasherCrc32c                517  avgt    5    36.739 ± 0.046  ns/op
     [java] ChecksumBench.benchHasherCrc32c               2041  avgt    5   113.842 ± 1.146  ns/op
     [java] ChecksumBench.benchMd5                          31  avgt    5   290.657 ± 0.451  ns/op
     [java] ChecksumBench.benchMd5                         131  avgt    5   533.569 ± 1.364  ns/op
     [java] ChecksumBench.benchMd5                         517  avgt    5  1264.806 ± 5.480  ns/op
     [java] ChecksumBench.benchMd5                        2041  avgt    5  4226.920 ± 4.749  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c               31  avgt    5    65.178 ± 0.070  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c              131  avgt    5   195.936 ± 0.202  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c              517  avgt    5   723.599 ± 0.801  ns/op
     [java] ChecksumBench.benchPureJavaCrc32c             2041  avgt    5  2786.000 ± 4.216  ns/op

cbornet (Author) commented May 16, 2026

Results are much better with the AWS CRT CRC64 implementation, software.amazon.awssdk.crt.checksums.CRC64NVME, which uses SIMD instructions (ARM64 writer node):

     [java] ChecksumBench.benchCrc64nvme                    31  avgt    5   125.925 ±   0.337  ns/op
     [java] ChecksumBench.benchCrc64nvme                   131  avgt    5   136.030 ±   0.480  ns/op
     [java] ChecksumBench.benchCrc64nvme                   517  avgt    5   163.652 ±   0.680  ns/op
     [java] ChecksumBench.benchCrc64nvme                  2041  avgt    5   276.807 ±   0.307  ns/op

This requires adding a dependency on software.amazon.awssdk.crt:aws-crt, a thin JNI wrapper around the AWS Common Runtime (CRT). Is it acceptable to introduce this dependency in CC (Converged Cassandra)?

cbornet marked this pull request as draft May 16, 2026 10:26
jasonstack commented

Should we make this a follow-up, with its own ticket description? This change may impact upgrades and backward compatibility.
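One way the compatibility concern could be addressed, sketched here as an assumption rather than something this PR does: encode the checksum type in the Digest component name (in the style of the existing Digest.crc32 component), so that a reader chooses the algorithm from the file it finds on disk rather than from the current configuration. The DigestComponentNames class, its Type enum, and the naming scheme are all hypothetical.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical: derive the Digest component name from the checksum type, and
// recover the type from an existing file name, so SSTables written before a
// configuration change are still verified with the algorithm they used.
final class DigestComponentNames
{
    enum Type { CRC32, CRC32C, CRC64NVME, MD5 }

    private static final String PREFIX = "Digest.";

    static String componentName(Type type)
    {
        return PREFIX + type.name().toLowerCase(Locale.ROOT);
    }

    // Returns empty for non-Digest files and for unknown checksum suffixes.
    static Optional<Type> typeOf(String fileName)
    {
        int idx = fileName.lastIndexOf(PREFIX);
        if (idx < 0)
            return Optional.empty();
        String suffix = fileName.substring(idx + PREFIX.length()).toUpperCase(Locale.ROOT);
        for (Type t : Type.values())
            if (t.name().equals(suffix))
                return Optional.of(t);
        return Optional.empty();
    }
}
```

The write path would then use the configured type, while the read path only trusts the type recovered from the file name; an unknown suffix (for example from a newer node after a downgrade) surfaces as an empty result that the caller must handle explicitly.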

cbornet changed the title from "[DRAFT] Add a property to configure the Digest file checksum type" to "[DRAFT] CNDB-17829: Add a property to configure the Digest file checksum type" May 18, 2026
cbornet (Author) commented May 18, 2026

cbornet (Author) commented May 19, 2026

Added code to detect whether the AWS CRT is on the classpath, so that it can be an optional dependency.
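A minimal sketch of how such detection could work, assuming (not confirmed in this thread) that software.amazon.awssdk.crt.checksums.CRC64NVME has a public no-arg constructor and implements java.util.zip.Checksum. The Crc64Provider class is hypothetical. The CRT class is looked up by name via reflection, so the code compiles and runs without aws-crt; one instance is created up front so that a failure during class initialization (which may involve loading native code) also triggers the fallback.

```java
import java.lang.reflect.Constructor;
import java.util.function.Supplier;
import java.util.zip.Checksum;

// Hypothetical helper: prefer an optional Checksum implementation (e.g. the
// AWS CRT CRC64NVME class) when it is on the classpath, otherwise a fallback.
final class Crc64Provider
{
    static final String AWS_CRT_CRC64 = "software.amazon.awssdk.crt.checksums.CRC64NVME";

    // Looks the class up by name without initializing it; null if it is absent.
    static Class<?> load(String className)
    {
        try
        {
            return Class.forName(className, false, Crc64Provider.class.getClassLoader());
        }
        catch (ClassNotFoundException | LinkageError e)
        {
            return null;
        }
    }

    static boolean isAvailable(String className)
    {
        return load(className) != null;
    }

    static Supplier<Checksum> select(String className, Supplier<Checksum> fallback)
    {
        Class<?> cls = load(className);
        if (cls == null)
            return fallback;
        try
        {
            // Assumption: public no-arg constructor, implements java.util.zip.Checksum.
            Constructor<?> ctor = cls.getDeclaredConstructor();
            // Probe once: runs static initialization and checks the type, so a
            // failure here selects the fallback instead of surfacing at write time.
            Checksum.class.cast(ctor.newInstance());
            return () -> newInstance(ctor);
        }
        catch (ReflectiveOperationException | LinkageError | ClassCastException e)
        {
            return fallback;
        }
    }

    private static Checksum newInstance(Constructor<?> ctor)
    {
        try
        {
            return (Checksum) ctor.newInstance();
        }
        catch (ReflectiveOperationException e)
        {
            throw new IllegalStateException("Cannot instantiate " + ctor.getDeclaringClass().getName(), e);
        }
    }
}
```

A caller would do something like Crc64Provider.select(Crc64Provider.AWS_CRT_CRC64, fallback), where fallback is a Supplier for whatever pure-Java CRC64 implementation is available.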
Benchmark results (Mac M4):

     [java] Benchmark                             (bufferSize)  Mode  Cnt     Score    Error  Units
     [java] ChecksumBench.benchCrc32                        31  avgt    5     6.611 ±  1.655  ns/op
     [java] ChecksumBench.benchCrc32                       131  avgt    5    14.254 ±  3.276  ns/op
     [java] ChecksumBench.benchCrc32                       517  avgt    5    43.672 ±  7.017  ns/op
     [java] ChecksumBench.benchCrc32                      2041  avgt    5   186.366 ±  1.477  ns/op
     [java] ChecksumBench.benchCrc32c                       31  avgt    5     5.297 ±  0.082  ns/op
     [java] ChecksumBench.benchCrc32c                      131  avgt    5     7.263 ±  0.030  ns/op
     [java] ChecksumBench.benchCrc32c                      517  avgt    5    39.300 ±  1.000  ns/op
     [java] ChecksumBench.benchCrc32c                     2041  avgt    5   184.744 ±  1.846  ns/op
     [java] ChecksumBench.benchAwsCrtCrc64nvme              31  avgt    5   123.501 ±  4.157  ns/op
     [java] ChecksumBench.benchAwsCrtCrc64nvme             131  avgt    5   123.885 ±  2.046  ns/op
     [java] ChecksumBench.benchAwsCrtCrc64nvme             517  avgt    5   130.024 ±  1.432  ns/op
     [java] ChecksumBench.benchAwsCrtCrc64nvme            2041  avgt    5   157.424 ±  3.419  ns/op
     [java] ChecksumBench.benchMd5                          31  avgt    5   426.443 ± 50.290  ns/op
     [java] ChecksumBench.benchMd5                         131  avgt    5   462.382 ± 27.261  ns/op
     [java] ChecksumBench.benchMd5                         517  avgt    5  1225.999 ± 10.471  ns/op
     [java] ChecksumBench.benchMd5                        2041  avgt    5  4301.948 ± 31.118  ns/op
     [java] ChecksumBench.benchPureJavaCrc64nvme            31  avgt    5    15.620 ±  0.051  ns/op
     [java] ChecksumBench.benchPureJavaCrc64nvme           131  avgt    5    67.215 ±  0.787  ns/op
     [java] ChecksumBench.benchPureJavaCrc64nvme           517  avgt    5   277.075 ±  2.052  ns/op
     [java] ChecksumBench.benchPureJavaCrc64nvme          2041  avgt    5  1139.187 ± 24.006  ns/op
