Description
Issue
The condition that triggers ABORT when the DUT detects SDCs in consecutive iterations currently requires both:
- SDC in iter k && SDC in iter k+1
- AND num_err[k] == num_err[k+1] (that is, the same number of errors logged in both consecutive iterations)
This condition is too strict:
- Even with the same stuck bits, iterations with different inputs will likely produce different numbers of errors (incorrect elements)
- Even if there are no stuck bits, consecutive iterations with errors mean that the flux is too high, and the computed cross sections are likely wrong
Either way, I believe that the check should ONLY be about consecutive iterations.
Proposed Solution 1
Change the ABORT condition in lines #282 - #290 of log_helper.cpp to remove the check regarding number of errors (lines #283 - #285)
- from
if (kernel_errors == last_iter_errors && last_iter_with_errors + 1 == iteration_number && double_error_kill) {
- to
if (last_iter_with_errors + 1 == iteration_number && double_error_kill) {
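To make the difference concrete, here is a minimal sketch that puts the two conditions side by side. The field names follow the identifiers quoted above, but the `IterationState` struct and both function names are hypothetical and not part of log_helper.cpp. It also assumes the check is only evaluated when the current iteration has logged errors.

```cpp
#include <cstddef>

// Hypothetical stand-in for the state tracked by log_helper.
// Only the field names come from the issue; the struct is an assumption.
struct IterationState {
    std::size_t kernel_errors;          // errors logged in the current iteration
    std::size_t last_iter_errors;       // errors logged in the last iteration that had any
    std::size_t last_iter_with_errors;  // index of that earlier iteration
    std::size_t iteration_number;       // index of the current iteration
    bool double_error_kill;             // whether the ABORT feature is enabled
};

// Current condition: consecutive iterations AND equal error counts.
bool abort_current(const IterationState& s) {
    return s.kernel_errors == s.last_iter_errors &&
           s.last_iter_with_errors + 1 == s.iteration_number &&
           s.double_error_kill;
}

// Proposed Solution 1: consecutive iterations only.
bool abort_proposed(const IterationState& s) {
    return s.last_iter_with_errors + 1 == s.iteration_number &&
           s.double_error_kill;
}
```

With errors in iterations 10 and 11 but different counts (say 3 and 5), `abort_current` returns false while `abort_proposed` returns true, which is the case this issue is about.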
Proposed Solution 2
Similar to Proposed Solution 1, change the ABORT condition in lines #282 - #290 of log_helper.cpp to remove the check regarding number of errors (lines #283 - #285).
However, also change the condition for consecutive iterations (i, i+1) to a generic configurable range (i, i+j), where j is a parameter of namespace log_helper:
- from
if (kernel_errors == last_iter_errors && last_iter_with_errors + 1 == iteration_number && double_error_kill) {
- to
if (last_iter_with_errors + consecutive_iteration_range >= iteration_number && double_error_kill) {
This requires further changes, such as adding the variable to the namespace (size_t consecutive_iteration_range) and adding functions to set/get its value.
The default value should be 1 so as not to change the behaviour of scripts that do not set this value.
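A rough sketch of what these additions could look like follows. The namespace `log_helper_sketch`, the setter/getter names and `should_abort` are all hypothetical, chosen only for illustration; only `consecutive_iteration_range`, `last_iter_with_errors`, `iteration_number` and `double_error_kill` come from the text above. It assumes the check runs only when the current iteration has errors and that `last_iter_with_errors` refers to a strictly earlier iteration.

```cpp
#include <cstddef>

// Hypothetical namespace so the sketch does not clash with the real log_helper.
namespace log_helper_sketch {

// Default of 1 keeps the existing "strictly consecutive" behaviour.
static std::size_t consecutive_iteration_range = 1;

// Assumed setter/getter names; the issue only says such functions are needed.
void set_consecutive_iteration_range(std::size_t range) {
    consecutive_iteration_range = range;
}

std::size_t get_consecutive_iteration_range() {
    return consecutive_iteration_range;
}

// Proposed Solution 2: abort when the previous erroneous iteration is
// at most consecutive_iteration_range iterations before the current one.
bool should_abort(std::size_t last_iter_with_errors,
                  std::size_t iteration_number,
                  bool double_error_kill) {
    return last_iter_with_errors + consecutive_iteration_range >= iteration_number &&
           double_error_kill;
}

}  // namespace log_helper_sketch
```

With the default range of 1, errors in iterations 10 and 12 do not trigger ABORT; after `set_consecutive_iteration_range(3)`, errors in iterations 10 and 13 do, while errors in iterations 10 and 14 still do not.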