Fix vulnerability: uninitalized tensor heap memory leak via integer overflow in BalanceTrajectory#1097
Fix vulnerability: uninitalized tensor heap memory leak via integer overflow in BalanceTrajectory#1097mhucka wants to merge 3 commits into
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request implements integer overflow prevention and input validation across TensorFlow Quantum's noisy simulation operations. It restricts the maximum number of samples to 10,000,000, updates several loop counters and qubit sizes to standard integers, and uses 64-bit integer casting for intermediate calculations. Unit tests are also added to verify these limits and overflow protections. The reviewer feedback highlights an issue in tfq_noisy_expectation.cc where threads in ComputeSmall calculate but do not use n_rand, causing each thread to redundantly reserve the entire undivided amount of random samples.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| auto local_gen = random_gen.ReserveSamples128( | ||
| static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1); |
There was a problem hiding this comment.
In ComputeSmall, each thread running DoWork should only reserve its portion of the random samples (n_rand) from the shared random_gen. Currently, n_rand is calculated but unused, and instead each thread reserves the entire undivided amount (static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1). This results in reserving num_threads times more samples than actually needed, wasting the random number generator's state space and causing unnecessary overhead.
auto local_gen = random_gen.ReserveSamples128(n_rand);There was a problem hiding this comment.
Code Review
This pull request introduces input validation and overflow prevention for the number of samples across several TensorFlow Quantum noisy operators. Specifically, it limits the maximum number of samples to 10,000,000, adds explicit casts to int64_t during arithmetic operations to prevent integer overflow, and updates several loop index types from 64-bit integers to standard int. Corresponding unit tests are also added to verify these constraints. The review feedback identifies an inefficiency in tfq_noisy_expectation.cc where each thread in ComputeSmall reserves the total combined random samples instead of the per-thread required samples (n_rand), which causes unnecessary overhead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| int64_t n_rand = static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1; | ||
| n_rand = (n_rand + num_threads) / num_threads; | ||
| auto local_gen = | ||
| random_gen.ReserveSamples128(ncircuits.size() * max_n_shots + 1); | ||
| auto local_gen = random_gen.ReserveSamples128( | ||
| static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1); |
There was a problem hiding this comment.
In ComputeSmall, each thread in the parallel loop reserves random samples from the global generator. However, instead of reserving the per-thread required samples (n_rand), each thread is reserving the total combined samples for all threads (static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1). This leads to excessive skipping of the Philox random number generator state and unnecessary overhead. Using n_rand directly (as is correctly done in tfq_noisy_sampled_expectation.cc) resolves this inefficiency.
int64_t n_rand = static_cast<int64_t>(ncircuits.size()) * max_n_shots + 1;
n_rand = (n_rand + num_threads) / num_threads;
auto local_gen = random_gen.ReserveSamples128(n_rand);
MichaelBroughton
left a comment
There was a problem hiding this comment.
We can probably shorten this up a fair bit and remove some unrelated changes that didn't get new tests. Fine to merge once this is done.
|
|
||
| uint64_t max_num_qubits = 0; | ||
| for (const uint64_t num : num_qubits) { | ||
| int max_num_qubits = 0; |
There was a problem hiding this comment.
Can we avoid downgrading these ?
| DCHECK_EQ(4, context->num_inputs()); | ||
| const int num_inputs = context->num_inputs(); | ||
| OP_REQUIRES(context, num_inputs == 4, | ||
| tensorflow::errors::InvalidArgument(absl::StrCat( |
There was a problem hiding this comment.
This seems unrelated to fixing the overflow problem, can we move this to a different PR ?
| int num_samples = 0; | ||
| OP_REQUIRES_OK(context, GetIndividualSample(context, &num_samples)); | ||
| OP_REQUIRES(context, num_samples >= 0 && num_samples <= 10000000, | ||
| tensorflow::errors::InvalidArgument( |
There was a problem hiding this comment.
This seems like the main piece of code that actually solves our problems, can we scope this PR down to just putting in these checks and testing them across all ops that require a num_samples and call into BalanceTrajectory.
Internal security scans reported a security vulnerability (CWE-200/CWE-908) in
tensorflow_quantum/core/ops/noise/tfq_noisy_samples.cc. InTfqNoisySamplesOp, a massivenum_samplesvalue nearINT_MAXwill causeBalanceTrajectoryto overflow its signed integer calculation for chunk sizes. This results in negative or zero chunk sizes, causing worker threads to bypass execution. Because TensorFlow'sallocate_outputprovides uninitialized heap blocks, the kernel returns a tensor filled with residual process memory.Changes in this PR:
In
util_balance_trajectory.cc, updated chunk calculations to useint64_tarithmetic, preventing overflows whennum_samples(orrep_limits) is close toINT_MAX.Modified
tfq_noisy_samples.cc,tfq_noisy_expectation.cc, andtfq_noisy_sampled_expectation.ccto compute thread allocations (p_reps) usingint64_tdivision and cast the Philox random sample reservation requests toint64_tvariables.Add checks to enforce a maximum capacity limit of 10,000,000 in input tensor parsing functions
GetNumSamplesandGetIndividualSamplewithinparse_context.cc. This should eliminate memory depletion/DoS attempts with massive sizes.Added tests.