Add security improvements #6

Open
rcmstark wants to merge 8 commits into master from refactor/security

Conversation

@rcmstark
Member

@rcmstark rcmstark commented Apr 10, 2026

Summary

  • Port all security fixes from Python reference implementation
  • Security: hedged RFC 6979 nonces (§3.6 extra entropy: same key+message produces different signatures while preserving protection against RNG failure), Low-S normalization, public key on-curve validation, hash truncation, branch-balanced Montgomery ladder, curve-specific doubling shortcuts (A=0 for secp256k1, A=-3 for prime256v1), Tonelli-Shanks modular square root, extended Euclidean modular inverse, fromJacobian infinity guard
  • Performance: mixed affine+Jacobian addition fast path, bit-by-bit NAF generator multiplication backed by a precomputed affine [G, 2G, 4G, ..., 2^n*G] table (zero doublings during signing), Shamir's trick with Joint Sparse Form, GLV endomorphism for secp256k1 (splits each 256-bit scalar into two ~128-bit halves for a 4-scalar simultaneous multi-exponentiation during verification)
  • 74 tests across 10 separate test files matching Python structure
  • Benchmark script added
  • README updated with a security section, benchmark numbers, and a performance write-up
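
To illustrate the Low-S normalization item above, here is a minimal sketch in Python (the language of the reference implementation this PR ports from, not the Elixir code in the PR). The function name `normalize_low_s` is hypothetical; the only hard fact used is the secp256k1 group order. The sketch assumes the usual rule that `(r, n - s)` verifies whenever `(r, s)` does, so keeping `s` in the lower half of the range gives one canonical encoding per signature.

```python
# secp256k1 group order (public curve constant)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def normalize_low_s(r, s, n=N):
    """Return (r, s') with s' <= n // 2.

    Hypothetical helper: if s is in the upper half of [1, n-1],
    replace it with n - s, which is the other valid s for the same r.
    """
    if not (1 <= r < n and 1 <= s < n):
        raise ValueError("r and s must be in [1, n-1]")
    if s > n // 2:
        s = n - s
    return r, s
```

Applying the function twice gives the same result as applying it once, which is what makes the low-S form canonical.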

Test plan

  • All 74 tests passing (mix test)
  • Benchmark: sign 0.2ms, verify 0.6ms
  • Security audit: all 9 checks pass

@rcmstark rcmstark force-pushed the refactor/security branch from 6a87007 to 6aadd8d Compare April 19, 2026 15:20
rcmstark added 8 commits May 5, 2026 04:22
- Replace the modular inverse based on Fermat's little theorem with the
  extended Euclidean algorithm, 2-3x faster for 256-bit operands
- Fixed-base windowed scalar multiplication (2^4-ary method) with
  precomputed generator table, cuts sign time substantially
- Skip A*pz^4 term in jacobian_double for secp256k1 (A=0); use
  3*(px-pz^2)*(px+pz^2) shortcut for prime256v1 (A=-3)
- Cache curve.nBitLength to avoid recomputing per call

Benchmark (100 rounds):
  sign:   1.2ms -> 0.6ms
  verify: 0.9ms -> 0.9ms
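
The doubling shortcuts in this commit can be sketched as follows, in Python rather than the PR's Elixir. This is an illustration under assumptions: the names `slope_term` and `jacobian_double` and the `(0, 0, 0)` encoding of the point at infinity are hypothetical, and the formulas are the standard Jacobian doubling formulas, where the only curve-dependent term is `M = 3*px^2 + A*pz^4`.

```python
P_K1 = 2**256 - 2**32 - 977                    # secp256k1 field prime (A = 0)
P_R1 = 2**256 - 2**224 + 2**192 + 2**96 - 1    # prime256v1 field prime (A = -3)


def slope_term(px, pz, a, p):
    """M = 3*px^2 + a*pz^4 (mod p), with the two curve-specific shortcuts."""
    a %= p
    if a == 0:
        # secp256k1: the a*pz^4 term vanishes entirely.
        return 3 * px * px % p
    if a == p - 3:
        # prime256v1: 3*px^2 - 3*pz^4 = 3*(px - pz^2)*(px + pz^2).
        z2 = pz * pz % p
        return 3 * (px - z2) * (px + z2) % p
    # Generic curve: no shortcut available.
    return (3 * px * px + a * pow(pz, 4, p)) % p


def jacobian_double(point, a, p):
    """Double a Jacobian point (px, py, pz), where x = px/pz^2 and y = py/pz^3."""
    px, py, pz = point
    if py == 0 or pz == 0:
        return (0, 0, 0)  # assumed encoding of the point at infinity
    ysq = py * py % p
    s = 4 * px * ysq % p
    m = slope_term(px, pz, a, p)
    nx = (m * m - 2 * s) % p
    ny = (m * (s - nx) - 8 * ysq * ysq) % p
    nz = 2 * py * pz % p
    return (nx, ny, nz)
```

For A = 0 the shortcut drops one fourth power and one multiplication; for A = -3 it replaces a fourth power with a single squaring.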

When qz == 1 (affine input), skip computing qz*qz and
simplify U1, S1, and nz, saving four field multiplications
per add. This is the hot-path optimization used by
multiplyGenerator, which feeds only affine operands.
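
A minimal Python sketch of the mixed-addition fast path described above (not the PR's Elixir code). The function names are hypothetical, the formulas are the standard Jacobian addition formulas using the U1/S1/nz names from the commit message, and the doubling and point-at-infinity cases are deliberately left out to keep the example short. Inputs are assumed to be already reduced modulo the field prime.

```python
P = 2**256 - 2**32 - 977  # secp256k1 field prime, used here as the example modulus


def jacobian_add(p, q, prime=P):
    """Add two distinct Jacobian points; takes a shortcut when q is affine (qz == 1)."""
    px, py, pz = p
    qx, qy, qz = q
    pz2 = pz * pz % prime
    u2 = qx * pz2 % prime
    s2 = qy * pz2 * pz % prime
    if qz == 1:
        # Fast path: qz^2 and qz^3 are 1, so U1, S1 and the z product are free.
        u1, s1, zz = px, py, pz
    else:
        qz2 = qz * qz % prime
        u1 = px * qz2 % prime
        s1 = py * qz2 * qz % prime
        zz = pz * qz % prime
    if u1 == u2:
        # Same x-coordinate: needs doubling or yields infinity (omitted in this sketch).
        raise ValueError("equal or opposite points are not handled here")
    h = (u2 - u1) % prime
    r = (s2 - s1) % prime
    h2 = h * h % prime
    h3 = h * h2 % prime
    u1h2 = u1 * h2 % prime
    nx = (r * r - h3 - 2 * u1h2) % prime
    ny = (r * (u1h2 - nx) - s1 * h3) % prime
    nz = h * zz % prime
    return (nx, ny, nz)


def to_affine(point, prime=P):
    """Convert (X, Y, Z) to affine (X/Z^2, Y/Z^3); requires Z != 0."""
    x, y, z = point
    z_inv = pow(z, -1, prime)
    return (x * z_inv * z_inv % prime, y * pow(z_inv, 3, prime) % prime)
```

Both branches compute the same point; the fast path only removes work that would multiply by 1.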

Swap the 4-bit window table for an affine [G, 2G, 4G, ..., 2^nBitLength*G]
table plus a bit-by-bit width-2 NAF loop. Every non-zero NAF digit
triggers one mixed add and zero doublings, so the ~256 doublings of the
windowed method disappear and only ~86 adds remain for 256-bit scalars
(NAF digits are non-zero about one time in three). The table is still
cached in :persistent_term, keyed by curve name.
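
The table-plus-NAF idea above can be sketched in Python as follows. This is a simplified illustration, not the PR's Elixir code: it uses plain affine additions (one modular inversion each) where the commit describes a Jacobian accumulator with mixed adds, it builds the table at import time instead of caching it in :persistent_term, and all function names are hypothetical. The curve constants are the public secp256k1 parameters.

```python
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def affine_add(p, q):
    """Add affine points on y^2 = x^3 + 7 over GF(P); None is the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (px, py), (qx, qy) = p, q
    if px == qx:
        if (py + qy) % P == 0:
            return None
        m = 3 * px * px * pow(2 * py % P, -1, P) % P
    else:
        m = (qy - py) * pow((qx - px) % P, -1, P) % P
    x = (m * m - px - qx) % P
    return (x, (m * (px - x) - py) % P)


def naf_digits(k):
    """Width-2 NAF digits of k >= 0, least significant first, each in {-1, 0, 1}."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def build_table(g, nbits):
    """[G, 2G, 4G, ..., 2^nbits * G]: all doublings are paid once, up front."""
    table = [g]
    for _ in range(nbits):
        table.append(affine_add(table[-1], table[-1]))
    return table


TABLE = build_table(G, N.bit_length())


def multiply_generator(k):
    """k*G using only additions of table entries (no doublings at call time)."""
    acc = None
    for i, d in enumerate(naf_digits(k % N)):
        if d == 1:
            acc = affine_add(acc, TABLE[i])
        elif d == -1:
            x, y = TABLE[i]
            acc = affine_add(acc, (x, P - y))  # subtracting = adding the negated point
    return acc
```

The table needs the extra 2^nBitLength*G entry because the NAF of a scalar can be one digit longer than its binary form.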

Replace raw-binary Shamir's trick for n1*p1 + n2*p2 with the Joint
Sparse Form (JSF). JSF picks signed digits in {-1, 0, 1} so that on
average ~l/2 digit pairs are non-zero, versus ~3l/4 for raw binary,
cutting the expected number of adds in the simultaneous double-and-add
loop by roughly a third. Used only with public scalars (verification).
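
A short Python sketch of how a Joint Sparse Form can be computed, following the textbook formulation of Solinas's algorithm rather than the PR's Elixir code. The names `jsf` and `joint_weight` are hypothetical. Each digit pair with at least one non-zero entry costs one point addition in the simultaneous loop, so the joint weight is the number to minimize.

```python
def jsf(k1, k2):
    """Joint Sparse Form of non-negative (k1, k2).

    Returns two digit lists (least significant first) with digits in
    {-1, 0, 1}, such that sum(d * 2**i) reconstructs each scalar.
    """
    d1 = d2 = 0          # carries
    u1, u2 = [], []
    while k1 + d1 > 0 or k2 + d2 > 0:
        l1, l2 = k1 + d1, k2 + d2
        pair = []
        for l, other in ((l1, l2), (l2, l1)):
            if l % 2 == 0:
                u = 0
            else:
                u = 2 - (l % 4)  # signed residue mod 4: +1 or -1
                # Flip the sign when that lets the other scalar get a zero next.
                if l % 8 in (3, 5) and other % 4 == 2:
                    u = -u
            pair.append(u)
        a, b = pair
        u1.append(a)
        u2.append(b)
        # Update carries so that (l - u) / 2 == (k >> 1) + new carry.
        if 2 * d1 == 1 + a:
            d1 = 1 - d1
        if 2 * d2 == 1 + b:
            d2 = 1 - d2
        k1 >>= 1
        k2 >>= 1
    return u1, u2


def joint_weight(u1, u2):
    """Number of positions where at least one digit is non-zero (= adds needed)."""
    return sum(1 for a, b in zip(u1, u2) if a or b)
```

In a verification loop, each position would trigger one doubling plus, when the pair is non-zero, one addition of a precomputed combination such as p1 + p2 or p1 - p2.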

Split each 256-bit scalar k into two ~128-bit scalars (k1, k2) with
k = k1 + k2*lambda (mod N) via Babai rounding against the reduced
basis, then run a 4-scalar simultaneous multi-exponentiation over
(p1, phi(p1), p2, phi(p2)) with a 16-entry table of subset sums.
Halves the loop length versus the plain Shamir path. GLV constants
live on the curve struct under :glvParams; curves without an efficient
endomorphism (prime256v1) transparently fall back to Shamir+JSF.
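
The scalar split described above can be sketched in Python as follows (not the PR's Elixir code). To avoid quoting constants from memory, this sketch derives a cube root of unity modulo N and a short lattice basis at run time with a truncated extended Euclidean algorithm; the PR instead stores precomputed constants under :glvParams. All function names are hypothetical. One caveat: the root found here may be either of the two non-trivial cube roots of unity, whereas a real implementation must use the lambda that matches the curve's endomorphism phi. The decomposition identity holds for either root.

```python
import math

# secp256k1 group order (public curve constant)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def cube_root_of_unity(n):
    """A non-trivial cube root of unity modulo a prime n with n = 1 (mod 3)."""
    if (n - 1) % 3 != 0:
        raise ValueError("no non-trivial cube root of unity modulo n")
    g = 2
    while True:
        lam = pow(g, (n - 1) // 3, n)
        if lam != 1:
            return lam
        g += 1


def short_basis(n, lam):
    """Two short vectors (a, b) with a + b*lam = 0 (mod n)."""
    limit = math.isqrt(n)
    r0, t0, r1, t1 = n, 0, lam, 1  # invariant: r_i = t_i * lam (mod n)
    while r1 > limit:              # stop at the first remainder below sqrt(n)
        q = r0 // r1
        r0, t0, r1, t1 = r1, t1, r0 - q * r1, t0 - q * t1
    q = r0 // r1
    r2, t2 = r0 - q * r1, t0 - q * t1
    v1 = (r1, -t1)
    v2 = min((r0, -t0), (r2, -t2), key=lambda v: v[0] ** 2 + v[1] ** 2)
    return v1, v2


def div_round(a, b):
    """Nearest integer to a / b, using integer arithmetic only."""
    if b < 0:
        a, b = -a, -b
    return (2 * a + b) // (2 * b)


def decompose(k, basis):
    """Babai round-off: return (k1, k2), possibly negative, with k1 + k2*lam = k (mod n)."""
    (a1, b1), (a2, b2) = basis
    det = a1 * b2 - a2 * b1
    c1 = div_round(b2 * k, det)
    c2 = div_round(-b1 * k, det)
    return k - c1 * a1 - c2 * a2, -c1 * b1 - c2 * b2


LAM = cube_root_of_unity(N)
BASIS = short_basis(N, LAM)
```

Because k1 and k2 can come out negative, a multi-exponentiation built on this split would negate the corresponding point and use the absolute value of the scalar.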
@rcmstark rcmstark force-pushed the refactor/security branch from 6aadd8d to 69d3ff2 Compare May 5, 2026 07:23