[crypto] Harden RSA key import CRT checks with random mask #226
mkannwischer wants to merge 2 commits into
bn.wsrr w20, URND
bn.rshi w20, w31, w20 >> 1
bn.addi w20, w20, 1
Is a 256-bit (or rather 255-bit) random mask sufficient here, or do we need a full-width mask?
If we need a full-width mask, then we would need different masks mod p, mod q, and mod p-1.
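For reference, the mask sampling in the quoted three instructions can be modeled as below. This is a minimal sketch, assuming `w31` holds zero (so the `bn.rshi` is a plain right shift by one) and that `URND` returns a 256-bit value; the function name `sample_mask` is hypothetical.

```python
import secrets

WLEN = 256  # width of one wide register, in bits


def sample_mask(urnd: int) -> int:
    """Model of the mask sampling for a single 256-bit URND read."""
    assert 0 <= urnd < (1 << WLEN)
    w20 = urnd       # bn.wsrr w20, URND
    w20 = w20 >> 1   # bn.rshi w20, w31, w20 >> 1  (assumes w31 == 0)
    w20 = w20 + 1    # bn.addi w20, w20, 1
    return w20


# Example draw using the OS random source in place of URND.
r = sample_mask(secrets.randbits(WLEN))
```

Under these assumptions the result lies in [1, 2^255], so it is never zero and always fits in one 256-bit limb, which is why the question of a wider mask arises.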
When Jade discussed the core issue in #170 with me, it sounded like much of the FI potential came from all but the lowest limb of the check value being zero in the "check pass" case, since faulting a word to zero can be easier than faulting to an arbitrary desired value. As such, I think it could make sense to expand the mask to additional limbs.
Would it maybe work to have one fixed, half-modulus sized mask for each of p, q, and p-1? The mod routine as invoked should handle reducing the intermediate products as long as the limb counts remain the same, and even though the resulting check values won't be quite uniformly distributed, this should be okay for just mitigating FI attacks (cc @jadephilipoom to make sure this isn't entirely off base).
If I'm missing something and that doesn't work, we could also consider e.g. a mask 1/4th of the modulus size, just to ensure the check value has several non-zero limbs while remaining reduced modulo {p, q, p-1}?
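To make the limb argument concrete, here is a rough sketch comparing how many limbs of the check value are non-zero for a single-limb mask and for a half-modulus-sized mask. The modulus `M` is a hypothetical 4-limb stand-in for p - 1 (it is not derived from a real key), and both mask values are fixed examples rather than random draws.

```python
LIMB_BITS = 256
E = 65537
M = 2**1024 - 2      # hypothetical 4-limb stand-in for p - 1
D = pow(E, -1, M)    # valid "d_p" for this stand-in modulus (Python 3.8+)


def to_limbs(x: int, n: int = 4) -> list:
    """Split x into n little-endian 256-bit limbs."""
    limb_mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & limb_mask for i in range(n)]


def check_value(r: int) -> int:
    """Masked check value r * e * d mod m; equals r for a valid d and r < m."""
    return (r * E * D) % M


def nonzero_limbs(x: int) -> int:
    return sum(1 for limb in to_limbs(x) if limb != 0)


narrow_mask = (1 << 254) | 1              # fits in one limb
half_mask = (1 << 511) | (1 << 300) | 1   # spans two of the four limbs

print(nonzero_limbs(check_value(narrow_mask)))  # 1
print(nonzero_limbs(check_value(half_mask)))    # 2
```

With the single-limb mask the upper three limbs of a passing check value are still zero; the wider mask leaves fewer all-zero limbs for a fault to target.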
I think it's basically a performance/security tradeoff. A larger mask would be better for FI protection but need more time and memory to multiply. Having a multi-bit value is better than a single-bit value, and a multi-limb value is better than a single-limb value. I think the delta between single-bit and multi-bit is basically that with single-bit, an attacker could chain a fault that zeroes the target registers and a single-bit fault to flip the 1 bit in order to get the checks to pass, regardless of what the value actually was. With a multi-bit value that's already a lot harder.
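The chained fault described above can be illustrated with a toy model. This is only a sketch: the fault functions are hypothetical idealizations (a perfect zeroing fault and a perfect single-bit flip), and the mask is a fixed example value.

```python
def zero_fault(_value: int) -> int:
    """Idealized fault that clears the target register."""
    return 0


def bit_flip_fault(value: int, bit: int) -> int:
    """Idealized fault that flips exactly one bit."""
    return value ^ (1 << bit)


def passes_unmasked(check_value: int) -> bool:
    return check_value == 1


def passes_masked(check_value: int, r: int) -> bool:
    return check_value == r


# Check value produced by some invalid key component (arbitrary example).
garbage = 0xDEADBEEF12345678

# Zero the register, then flip bit 0: the result is 1 regardless of `garbage`.
forged = bit_flip_fault(zero_fault(garbage), 0)

r = 0x5A5A5A5A5A5A5A5A              # example multi-bit mask
flips_needed = bin(r).count("1")     # single-bit flips needed to reach r from 0

print(passes_unmasked(forged))   # True
print(passes_masked(forged, r))  # False
```

In this model the unmasked check falls to two faults, while reaching the masked target from zero takes one flip per set bit of `r`, and the attacker would also need to know `r`.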
One way to improve the FI defense without extending the mask would be to check on ACC itself that the high limbs are 0 after the modular reduction; this would shorten the attack window and minimize the attack surface for an attacker to hide nonzero high limbs, compared to checking that they are zero on Ibex after reading them back.
Multiply a nonzero random mask into the d_p, d_q, and i_q validity checks. Instead of computing e * d_p mod (p-1) and comparing to 1, we now compute r * e * d_p mod (p-1) and the C side compares against r. This avoids the multi-limb value 1 as an intermediate or comparison target, hardening the check against fault injection.
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
The existing tests only checked rejection of an invalid d_p. Add test cases for invalid d_q and i_q as well, each constructed by flipping an arbitrary single bit in the valid test vector.
Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
a4bf20a to c4175a4
pqcfox left a comment
Looks excellent! Added a note re: the masking, let me know if my thoughts there seem reasonable--thanks!
As proposed in #170, this PR
multiplies a nonzero random mask into the d_p, d_q, and i_q
validity checks. Instead of computing e * d_p mod (p-1) and
comparing to 1, we now compute r * e * d_p mod (p-1) and
the C side compares against r.
This avoids the multi-limb value 1 as an intermediate or
comparison target, hardening the check against fault injection.
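The masked comparison can be sketched with toy numbers as follows. The parameters are hypothetical and far too small for real use, the helper names are illustrative, and the form of the i_q check (r * q * i_q mod p) is an assumption, since only the d_p formula is spelled out here. Because the toy moduli are tiny, the mask is drawn below each modulus; in the PR the mask is much shorter than the modulus, so that holds automatically.

```python
import secrets

# Toy key material (hypothetical, for illustration only).
p, q, e = 61, 53, 17
d = pow(e, -1, (p - 1) * (q - 1))      # Python 3.8+ modular inverse
d_p, d_q = d % (p - 1), d % (q - 1)
i_q = pow(q, -1, p)


def masked_value(r: int, a: int, b: int, m: int) -> int:
    """Accelerator side: compute r * a * b mod m."""
    return (r * a * b) % m


def masked_check(a: int, b: int, m: int) -> bool:
    """Host side: draw a nonzero mask below m and compare the result to it."""
    r = 1 + secrets.randbelow(m - 1)   # r in [1, m - 1]
    return masked_value(r, a, b, m) == r


ok_d_p = masked_check(e, d_p, p - 1)   # e * d_p = 1 mod (p - 1)
ok_d_q = masked_check(e, d_q, q - 1)   # e * d_q = 1 mod (q - 1)
ok_i_q = masked_check(q, i_q, p)       # q * i_q = 1 mod p (assumed form)
```

For a valid key each product reduces to r itself, so the expected value is a fresh random target per import rather than the constant 1.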
I have also added negative tests covering the failure cases for invalid d_q and i_q.
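The commit adding these tests describes building each invalid value by flipping a single bit in a valid one. A toy-sized sketch of that construction is below; the parameters, the bit index, and the fixed mask are all hypothetical, and the i_q check form (r * q * i_q mod p) is an assumption. The mask is chosen coprime to both toy moduli so that the masked check rejects exactly when the unmasked one would.

```python
from math import gcd

# Toy parameters (hypothetical, far too small for real use).
P, Q, E = 61, 53, 17
D_Q = pow(E, -1, Q - 1)   # valid d_q for the toy key
I_Q = pow(Q, -1, P)       # valid i_q for the toy key

R = 5                     # fixed example mask, coprime to Q - 1 and P
assert gcd(R, Q - 1) == 1 and gcd(R, P) == 1


def masked_check(r: int, a: int, b: int, m: int) -> bool:
    """Pass iff r * a * b mod m equals the mask r."""
    return (r * a * b) % m == r


def flip_bit(x: int, k: int) -> int:
    """Return x with bit k inverted."""
    return x ^ (1 << k)


bad_d_q = flip_bit(D_Q, 3)
bad_i_q = flip_bit(I_Q, 3)

valid_pass = masked_check(R, E, D_Q, Q - 1) and masked_check(R, Q, I_Q, P)
bad_d_q_rejected = not masked_check(R, E, bad_d_q, Q - 1)
bad_i_q_rejected = not masked_check(R, Q, bad_i_q, P)
```

The real tests run against full-size test vectors; this only shows why a one-bit change in d_q or i_q breaks the corresponding congruence.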