Commit Graph

257 Commits

Author SHA1 Message Date
shrec
8af7320c60
Harden audit and fix Windows CUDA build 2026-03-25 14:36:36 +00:00
shrec
ccf8f4a97d
audit(gaps#4,5,6,7): ethereum diff KAT, musig2/frost fuzz, cflite +2 targets, opencl zk+bip324
Gap #4 RESOLVED: audit/test_exploit_ethereum_differential.cpp — 10 tests / 15 sub-checks
  against go-ethereum, web3.py, ethers.js reference vectors (address derivation KAT,
  ecrecover, EIP-191, EIP-155, eth_personal_sign, keccak256 KAT, tamper detection).

Gap #7 RESOLVED: audit/test_fuzz_musig2_frost.cpp — 15 tests / 16 sub-checks
  (MuSig2 key_agg / nonce_agg / partial_verify / partial_sig_agg, FROST keygen_finalize /
  sign / verify_partial / aggregate, schnorr + ecdsa adaptor, boundary n_signers=0 → error).

ClusterFuzzLite expanded to 5 targets:
  + cpu/fuzz/fuzz_ecdsa.cpp  (ECDSA sign→verify invariants, parse_compact_strict)
  + cpu/fuzz/fuzz_schnorr.cpp (BIP-340 sign→verify, adversarial from_bytes verify)

Gap #5/#6 PARTIALLY RESOLVED: OpenCL now wires zk_knowledge_verify_batch,
  zk_dleq_verify_batch, bip324_aead_encrypt_batch, bip324_aead_decrypt_batch.
  bulletproof_verify_batch: PARITY-EXCEPTION (no WNAF multi-scalar on OpenCL).
  Metal: stubs documented with PARITY-EXCEPTION / TODO(parity) markers.
2026-03-24 20:53:23 +00:00
shrec
64e97c439c
audit: add 7 KAT/exploit tests; fix API mismatches and test data bugs
- Add 7 new audit test files (all 136 tests now pass):
  * test_exploit_ecdsa_rfc6979_kat.cpp      — 20 vectors (PASS)
  * test_exploit_scalar_systematic.cpp       — 23 checks   (PASS)
  * test_exploit_schnorr_bip340_kat.cpp      — 19 vectors  (PASS)
  * test_exploit_point_serialization.cpp     — 22 checks   (PASS)
  * test_exploit_ripemd160_kat.cpp           — 17 vectors  (PASS)
  * test_exploit_chacha20_kat.cpp            — 17 vectors  (PASS)
  * test_exploit_frost_byzantine.cpp         — 16 checks   (PASS)

- Fix API mismatches in exploit tests:
  * ripemd160: add 'using namespace secp256k1::hash'
  * scalar: .add/.mul/.invert → operator+/*/. inverse()
  * frost: FrostRound1Output → pair<FrostCommitment,vector<FrostShare>>
  * frost: participant_id/s/hiding_commitment → id/z_i/hiding_point

- Fix test data bugs (wrong expected values in exploit tests):
  * schnorr: corrupted BIP-340 sign vectors (wrong hex lengths/values)
  * point serialization: 2*G prefix 0x03→0x02 (y is even)
  * ripemd160: wrong HASH160(G_compressed) expected value
  * chacha20: wrong RFC 8439 §2.4.2 keystream expected (verified by XOR)
  * poly1305: wrong RFC §2.5.2 tag expected (verified via OpenSSL)
  * frost: do_dkg_2of3 excluded own share from finalize (j!=i bug)
  * frost: T=1,N=1 passed empty shares instead of own share

- Fix frost.cpp: add input validation to frost_keygen_begin
  (returns empty on participant_id==0 or threshold>num_participants)
2026-03-24 13:26:40 +00:00
shrec
e83ee16589
security: fix exploit test assertions and CT field timing after sort+recov fixes
- test_exploit_ct_recov: rewrite Phase C to simulate CT branchless overflow
  check and verify it agrees with reference for all 400 inputs
- test_musig2_bip327_vectors: update Test 2 assertions from != to == since
  canonical sort makes key aggregation order-independent (expected behavior)
- test_musig2_frost: update Test 2 from 'Ordering Matters' to 'Order
  Independence'; CHECK(same) now that canonical sort is applied
- test_musig2_frost_advanced: fix test_musig2_key_coefficient_binding to
  account for BIP-327 second-unique-key coeff=1 optimization; assert
  Q_x differs (the real security property) and only check coeff when pk_a
  is not the second-unique shortcut in both groups
- ct_field.cpp field_add: add pointer value_barrier to prevent LTO from
  propagating fe_zero limbs into add256, reducing |t| from ~12 to ~4.5
- ct_field.cpp field_sqr: extend RISC-V FE52 limb barrier to all GCC/clang
  platforms (was RISC-V-only); drops |t| from ~12 to ~0.8
- ct_field.cpp field_mul: add FE52 limb barriers to both operands
  (same rationale as field_sqr)

Result: unified_audit_runner 70/70 PASS -- AUDIT-READY
2026-03-24 04:45:57 +00:00
shrec
9b4b40eb0e
security: fix 8 audit findings (CT, ECIES KDF, BIP-32, MuSig2, FROST)
🔴 ct_sign.cpp: CT violation — if-branch on R.y parity replaced with
   branchless mask (recid = limbs[0] & 1u).  Early-break loop on r_bytes
   vs ORDER replaced with constant-time MSB accumulation; overflow bit set
   via branchless shift-OR.  Both paths now leak zero timing information
   about the secret nonce.

🔴 ecies.cpp (Android): /dev/urandom early-boot weakness — switched to
   arc4random_buf() which blocks until the kernel entropy pool is seeded.

🟡 ecies.cpp (KDF): SHA-512(x) replaced with HKDF-SHA256(IKM=shared_x,
   salt="secp256k1-ecies-v1", info=eph_pubkey_compressed[33]).  Provides
   domain separation, context binding, and resistance to related-key attacks.
   Removed redundant local hmac_sha256; use secp256k1::hmac_sha256 from hkdf.hpp.

🟡 bip32.cpp: depth uint8_t silent wrap — added depth == 0xFF guard that
   returns {ExtendedKey{}, false} instead of wrapping to depth 0.

🟡 musig2.cpp: pubkeys unsorted → wrong aggregate key — sort a canonical
   copy before computing L so the aggregate key is identical regardless of
   caller input order.  second_unique key detection uses the sorted list.

🟠 message_signing.cpp: intermediate hash1 and message buf not zeroed —
   added secure_erase on hash1 and the stack/heap buffer before return.
   Added explicit #include for secure_erase.hpp.

🟠 frost.cpp: derive_scalar missing BIP-340 tag prefix — added
   SHA256(tag) || SHA256(tag) double-tag prefix so domain separation
   matches BIP-340 and prevents cross-protocol hash collisions.
2026-03-24 01:48:58 +00:00
shrec
b45c6365a2
defect: M-05 reduce Pippenger stack frame from ~35KB to ~7KB
Lower STACK_BUCKETS from 256 to 64.  Window sizes c<=6 (n<=384) keep
64 Points on stack (~6.7 KB); c>=7 routes to the existing heap path.
Previous 256-bucket stack allocated 26-35 KB depending on Point size.

61/61 CPU tests pass.
2026-03-24 00:02:11 +00:00
shrec
d0de65adee
security: M-03 nonce single-use enforcement in musig2_partial_sign
musig2_partial_sign now takes MuSig2SecNonce& (non-const) and zeroizes
both k1 and k2 before returning. This prevents secret-nonce reuse from
the C++ API level -- a critical property for MuSig2 where reusing
(k1,k2) with a different message leaks the private key.

Also update bench_unified.cpp: pregenerate per-iteration nonce pool for
the partial_sign bench loop so it doesn't call partial_sign on a
consumed (zeroed) nonce.

The C ABI (ufsecp_musig2_partial_sign) already zeroized its local
MuSig2SecNonce after calling musig2_partial_sign; that remains as
belt-and-suspenders redundancy.

53/53 tests pass.  Audit: AUDIT-READY (69/70, 1 advisory CT warning).
2026-03-23 23:43:24 +00:00
shrec
53346b7c12
security: H-01 nonce single-use enforcement in frost_sign
frost_sign now takes FrostNonce& (non-const) and zeroizes both
hiding_nonce and binding_nonce before returning. This prevents nonce
reuse from the C++ API level — a critical security property for FROST,
where reusing (d,e) with a different message leaks the signing share.

Also update:
- bench_unified.cpp: pregenerate nonce pool per iteration so the bench
  loop does not call frost_sign on a consumed (zeroed) nonce.
- test_frost_kat.cpp: move RFC9591 Invariant-7 nonce-commitment check
  (D == d*G) to before frost_sign consumes the nonce, per the new policy.

The C ABI (ufsecp_frost_sign) already zeroized its local FrostNonce after
calling frost_sign; this remains as belt-and-suspenders redundancy.

53/53 tests pass.  Audit: AUDIT-READY (69/70, 1 advisory CT warning).
2026-03-23 23:38:50 +00:00
shrec
2679ca07c0
security: round 8 — AES S-box CT scan, CUDA ECDH key zeroing, field_52 dedup, ECIES ECDH guard, OpenCL buffer checks
- cpu/src/ecies.cpp: Replace SBOX[] direct-index lookups with sbox_lookup()
  full-table scan (256 iterations, mask-select) — eliminates cache-timing
  side channel on shared hardware (SBOX spans 4 cache lines, prior code
  allowed recovering AES key state via LRU/flush+reload attacks)
- gpu/src/gpu_backend_cuda.cu: Add cudaMemset(d_keys, 0, ...) zeroization of
  private keys on GPU before cudaFree, and secure_erase on host h_keys vector —
  prevents stale key recovery from reallocated device memory in multi-tenant
  or co-process scenarios
- cpu/include/secp256k1/field_52_impl.hpp: Remove duplicate 4x64 asm bridge
  block (lines 89-110 were identical copy of lines 63-79) — eliminates
  double extern C declarations and confusing copy-paste dead code
- cpu/src/ecies.cpp: Add check after ecdh_compute_raw() — reject all-zero
  shared_x (point at infinity from degenerate pubkey) before key derivation
- opencl/src/opencl_context.cpp: Add null checks after clCreateBuffer in
  batch_scalar_mul_generator, batch_scalar_mul, batch_field_inv,
  batch_jacobian_to_affine; add CL_SUCCESS checks on clEnqueueWriteBuffer
  and clEnqueueNDRangeKernel — silent OOM/failure no longer corrupts results

Tests: 60/60 (build_rel), 22/22 core (build_opencl) all passing
2026-03-23 21:26:22 +00:00
shrec
8636cae5d2
audit(round4): 18 security fixes across 11 files
Batch 1 — Secret-handling files:
- address.cpp: secure_erase for silent payment t_hash, S_comp temporaries (HIGH)
- address.cpp: erase per-iteration t_hash in silent_payment_scan loop (HIGH)
- hkdf.cpp: erase inner_hash HMAC intermediate in hmac_sha256 (MEDIUM)
- hkdf.cpp: erase inner_hash/ti per-iteration in hkdf_sha256_expand (MEDIUM)
- keccak256.cpp: add destructor to Keccak256State with secure_erase (MEDIUM)
- eth_signing.hpp: guard eip155_recid against v < 27 underflow (MEDIUM)
- eth_signing.cpp: validate recid range in ecrecover before use (MEDIUM)

Batch 2 — CT and core crypto:
- ct_point.cpp: make_v GLV sign branch now branchless via mask+select (MEDIUM)
  Eliminates if(k_neg) branch on secret-derived GLV decomposition sign

Batch 3 — Batch/protocol operations:
- batch_verify.cpp: move r/s zero-check BEFORE Montgomery batch inversion
  to prevent zero s from corrupting all s_inv values (MEDIUM)
- field_asm.cpp: fix borrow overflow in negate and final reduction
  (p[i] < a_limbs[i] + borrow) wraps when a_limbs[i]==UINT64_MAX (LOW)
- precompute.cpp: add zero-element guard to batch_inverse (MEDIUM)
- ecmult_gen_comb.cpp: add SHA256 integrity checksum to cache save/load
  to reject corrupted or malicious precomputation tables (MEDIUM)

Audited files found clean: coin_hd.cpp, segwit.cpp, ct_field.cpp,
ct_scalar.cpp, point.cpp, field.cpp, bip143.cpp, bip144.cpp,
field_26.cpp, field_52.cpp, field_asm_arm64.cpp, field_asm_riscv64.cpp,
batch_add_affine.cpp

68/68 tests pass.
2026-03-23 19:35:40 +00:00
shrec
97aba6abb1
security: fix 24 audit findings across 7 files
Critical (remotely exploitable timing side-channel):
- ellswift.cpp: Replace variable-time scalar_mul with ct::generator_mul
  and ct::scalar_mul for BIP-324 handshake (privkey timing leak)
- ellswift.cpp: Erase ECDH point after use

HIGH — nonce/privkey stack leaks:
- adaptor.cpp: Erase sk_bytes, hash in adaptor_nonce; k, sk in
  schnorr_adaptor_sign; k, k_inv in ecdsa_adaptor_sign; t/t_inv
  in adapt functions (6 findings)
- bip32.cpp: Erase HMAC k_buf/ipad/opad; data[37] with privkey in
  derive_child; I/IL in derive_child and bip32_master_key (4 findings)
- wallet.cpp: Erase privkey bytes in export_private_key EVM/Tron paths
- zk.cpp: Erase secret bytes in derive_nonce, nonce k in
  knowledge_prove_base/dleq_prove, massive secret state in range_prove
  (4 findings)

HIGH — buffer overflow:
- taproot.cpp: Bounds check merkle_root_len <= 32 in taproot_tweak_hash
- taproot.cpp: Bounds check input_index < input_count in tap_sighash_common

MEDIUM — secret state leaks:
- taproot.cpp: Erase private key scalar d in taproot_tweak_privkey
- bip39.cpp: Erase PBKDF2 result/u intermediates, word indices vectors,
  salt_str in mnemonic_to_seed (3 findings)

All 68 tests pass.
2026-03-23 19:06:19 +00:00
shrec
6c720043af
perf: fix BIP-341/342 benchmark regression from LTO code layout
Root cause: 28 secure_erase additions in a52b81d caused ThinLTO to
change code layout for SHA-256 compress dispatch, moving hot functions
out of the hot section.

Fixes:
- Precompute SHA256("TapSighash") tagged-hash midstate constant,
  eliminating 2 SHA-256 compress calls per sighash computation
- Mark sha256_compress_dispatch with __attribute__((hot)) to ensure
  hot-section placement under LTO regardless of surrounding code

Results (vs pre-regression baseline):
  BIP-144 witness_commitment:  105.4 ns  (was 205.2 ns) — 1.95x faster
  BIP-341 keypath_sighash:     428.7 ns  (was 775.6 ns) — 1.81x faster
  BIP-342 tapscript_sighash:   405.3 ns  (was 825.3 ns) — 2.04x faster
2026-03-23 18:57:06 +00:00
shrec
9debcd0561
fix: resolve 6 GitHub code scanning alerts
- selftest.cpp: remove 4x redundant .c_str() calls (readability-redundant-string-cstr)
- ufsecp_impl.cpp: remove std::move() on trivially-copyable ExtendedKey (performance-move-const-arg)
- ufsecp_impl.cpp: replace structured binding with explicit pair decomposition to silence clang-analyzer uninitialized pointer warning (clang-analyzer-core.CallAndMessage)

68/68 tests pass.
2026-03-23 18:34:15 +00:00
shrec
a52b81d07c
security: comprehensive memory safety audit — 28 fixes across 13 files
Engine-wide audit for secret-material leaks, dangling pointers, buffer
overflows, and missing secure_erase on stack temporaries.

HIGH fixes:
- schnorr_sign: erase all nonce/privkey material (d_bytes, t_hash, k, k_prime)
- wif_encode/wif_decode: erase raw private key bytes on all paths
- chacha20_poly1305 AEAD: fix RFC 8439 Poly1305 padding logic (crypto correctness)
- musig2_nonce_gen: erase 129-byte nonce_input buffers with secret t-derived data
- bip324 encrypt: add integer overflow guard (32-bit) and 3-byte length limit
- frost_sign ABI: consolidate error paths with secret material erasure

MEDIUM fixes:
- ecdh: erase shared_point Jacobian coords in all 3 functions
- ecdsa rfc6979_nonce: erase HMAC midstates (inner_mid/outer_mid)
- hkdf_expand: erase ipad_base/opad_base SHA256 states
- bip324: add Bip324Cipher/Bip324Session destructors for key_ erasure
- bip324: erase sk in complete_handshake and both constructors
- silent_payment_create_output: erase a_sum aggregate private key
- frost derive_scalar: erase hash of secret seed material
- ufsecp_impl: erase k1_bytes/k2_bytes, bip39 seed, scan result keys

LOW fixes:
- Poly1305 block(): defensive len clamp to prevent buf[17] overflow
- musig2 partial_sign: bounds check signer_index
- ecdsa rfc6979: erase buf33 in retry loops
- ecies hmac_sha256: erase inner_hash intermediate
- ct_sign: erase pre_sig copy of secret nonce-derived s value

68/68 tests pass. No regressions.
2026-03-23 18:27:06 +00:00
shrec
2c5c068bbb
fix(sonar): resolve all cpp:S5813 strlen security hotspots
Replace strlen() on string literals with sizeof(arr) - 1 compile-time
constants. Change function parameters from const char* to typed alternatives
(template char-array ref for frost, string_view for selftest) so SonarCloud
can statically verify safe use.

Files fixed:
- cpu/src/bip324.cpp: 4 hotspots (HKDF salt/info strings)
- cpu/src/ellswift.cpp: 1 hotspot (tagged-hash domain string)
- cpu/src/adaptor.cpp: 1 hotspot (domain separation tag)
- cpu/src/frost.cpp: 2 hotspots (binding tag + derive_scalar param)
- cpu/src/pedersen.cpp: 2 hotspots (generator tag strings)
- cpu/src/ethereum.cpp: 1 hotspot (hex length from string pointer)
- cpu/src/selftest.cpp: 1 hotspot (hex_equal string_view param)

Quality Gate was failing: new_security_hotspots_reviewed = 0/5 (0.0%),
threshold >= 100%. All 12 hotspots eliminated by removing strlen usage.
2026-03-23 15:37:18 +00:00
shrec
76a3e918a2
fix: resolve all GitHub code scanning alerts (clang-tidy + codeql)
Fixes all 32 open alerts from https://github.com/shrec/UltrafastSecp256k1/security/code-scanning:

clang-analyzer-core.CallAndMessage (#8235):
  ufsecp_impl.cpp: replace structured binding auto[master,ok] with explicit
  std::pair decomposition so analyzer tracks that master is only used when ok

readability-braces-around-statements (#8259-#8263):
  ufsecp_gpu_impl.cpp: add {} around all multi-line null-guard if bodies

misc-const-correctness (#8232-#8236, #8239-#8240, #8244-#8247, #8249-#8254, #8257-#8258):
  ufsecp_impl.cpp: const size_t payload_tag_len
  ufsecp_gpu_impl.cpp: const uint32_t count
  test_gpu_abi_gate.cpp: const count, n, dcount (4 sites)
  test_gpu_host_api_negative.cpp: const n, dcount, recid, gpu_codes[], code (5 sites)
  test_adversarial_protocol.cpp: const wrong_recid, rc_bad
  bench_bip324_transport.cpp: const sz, t0, mem_ns

cert-err33-c (#8255):
  test_gpu_host_api_negative.cpp: cast snprintf return value to (void)

clang-analyzer-core.NonNullParamChecker / nullPointerRedundantCheck (#8241, #8242, #8256):
  test_gpu_host_api_negative.cpp: guard strcmp(str,...) with if(str!=nullptr)

bugprone-implicit-widening-of-multiplication-result (#8248):
  test_gpu_abi_gate.cpp: 1024*1024 -> 1024ULL*1024ULL

cpp/trivial-switch (#8243):
  gpu_registry.cpp: replace switch with preprocessor-gated if chain to
  avoid trivial switch when no GPU backends are compiled in

All tests pass: 544/544 adversarial protocol tests.
2026-03-23 14:53:49 +00:00
shrec
06a6699750
Tighten audits and optimize batch/MSM hot paths 2026-03-23 14:11:02 +00:00
shrec
28728f313c
fix(audit): add edge-case tests for 26 uncovered ABI functions (H.1-H.12)
- Add test_h1 through test_h12 in audit/test_adversarial_protocol.cpp covering:
  H.1 ctx_size, H.2 AEAD (BIP324-gated), H.3 ECIES, H.4 EllSwift (BIP324-gated),
  H.5 ETH checksummed + personal_hash, H.6 Pedersen switch commit,
  H.7 Schnorr adaptor extract (4-arg API), H.8 batch sign (count=0 is valid no-op),
  H.9 BIP-143, H.10 BIP-144, H.11 SegWit, H.12 Taproot/Tapscript sighash
- Wire H-section into test_adversarial_protocol_run() after test_hostile_ethereum()
- 448 passed, 0 failed (up from 408 before this series)

fix(abi): bip144_txid and bip144_wtxid now validate ctx != nullptr consistently with all other ABI functions
fix(abi): segwit_witness_script_hash now accepts (nullptr, 0) empty-script input
fix(ecies): restore secure_erase of eph_bytes on eph_privkey.is_zero() early return (regression from d49453d9)

docs(audit): expand AUDIT_TEST_PLAN §N mandatory edge-case rule + N.1-N.12 table
docs(audit): update TEST_MATRIX adversarial row counts (101 fn, 286+ checks) + H-section table
docs(audit): add §H coverage section and expanded Guarantee to FFI_HOSTILE_CALLER
docs(audit): add mandatory edge-case rule + test row to AUDIT_SCOPE §6
docs(audit): add 35-entry H/§N invariant block to AUDIT_TRACEABILITY
2026-03-23 13:12:08 +00:00
shrec
443691428a
fix: thread-safe selftest init + MSVC C89 _Static_assert guard + graph bodygrep
- init.hpp: replace non-thread-safe 'static bool tested' with std::call_once
  in ensure_library_integrity(). Fixes c_abi_thread_stress crash on macOS where
  concurrent ufsecp_ctx_create calls raced into Selftest, causing
  'Digit index out of range during accumulation' via g_context.reset().
- ufsecp.h: wrap _Static_assert(sizeof(ufsecp_bip32_key)==82,...) in
  __STDC_VERSION__ >= 201112L guard. Fixes MSVC C89 build failure in JNI binding.
- source_graph.py: add bodygrep command for searching string literals inside
  indexed function bodies via SQL LIKE. Enhance find command with body search
  fallback after FTS5 miss.
2026-03-23 11:02:21 +00:00
shrec
c75b9e83cd
Fix BIP324 decrypt API callsites 2026-03-23 10:31:47 +00:00
shrec
8df8bb250b
Fix AEAD decrypt error mapping 2026-03-23 09:43:24 +00:00
shrec
c7059f0d0e
Finish current ABI audit sweep 2026-03-23 03:25:00 +00:00
shrec
3a86fcef1e
Harden ABI and finish bindings validation 2026-03-23 02:30:44 +00:00
shrec
67cb0cb073
fix: restore failing CI gates 2026-03-22 17:14:03 +00:00
shrec
34b7c167e5
BIP-141/143/144/342: fill all SegWit + tapscript coverage gaps
Implement BIP-143 (SegWit v0 sighash), BIP-144 (witness tx serialization/
wtxid/commitment), BIP-141 (witness program creation/parsing/validation),
and BIP-342 tapscript sighash + BIP-341 keypath sighash.

New files:
  - bip143.hpp/cpp: full BIP-143 sighash with ANYONECANPAY/NONE/SINGLE
  - bip144.hpp/cpp: witness/legacy serialization, txid/wtxid, weight/vsize
  - segwit.hpp/cpp: witness program detection, P2WPKH/P2WSH/P2TR builders

Extended:
  - taproot.hpp/cpp: TapSighashTxData + tapscript_sighash + keypath_sighash
  - ufsecp.h: 14 new C ABI functions (BIP-143/144/141/342)
  - ufsecp_impl.cpp: all implementations + raw tx txid/wtxid parsing

Tests: 79 BIP-141/143/144 + 12 BIP-342 tests, registered in selftest (30/30)
Benchmarks: Section 8.12 in bench_unified (10 new rows)
2026-03-22 13:03:47 +00:00
shrec
1838d4bb12
bench_unified: add BIP-39 mnemonic section (last uncovered feature)
Section 8.11: BIP-39 Mnemonic
  - bip39_generate (12 words, 128-bit entropy): ~8.8 us
  - bip39_generate (24 words, 256-bit entropy): ~8.0 us
  - bip39_validate (12 words, checksum check): ~0.8 us
  - bip39_to_seed (PBKDF2-HMAC-SHA512, 2048 rounds): ~2.1 ms
  - Throughput summary entries added

All features now have benchmark coverage.
2026-03-22 12:01:22 +00:00
shrec
d64cd2b8e2
Horizontal audit: benchmarks + tests for Adaptor/FROST/MuSig2/ECIES/SHA/FFI
bench_unified: add sections 8.6-8.10
  - Adaptor Signatures (Schnorr + ECDSA)
  - FROST 2-of-3 threshold (keygen/sign/verify/agg)
  - MuSig2 2-of-2 (key_agg/nonce/partial_sign/sig_agg)
  - ECIES encrypt/decrypt (256B payload)
  - Message Signing, SHA-256/512, Multi-scalar mul (4pt/64pt)
  - Throughput summary entries for all new sections

test_ecies.cpp: 7 tests (roundtrip, variable lengths, wrong key, tamper, nondeterminism)
test_sha.cpp: 9 tests with NIST FIPS 180-4 vectors (SHA-256/512)
test_ffi_coverage.cpp: 34 C ABI tests (ctx_size, Pedersen, ZK range, AEAD, ElligatorSwift, BIP-324)

Bug fixes:
  - ufsecp.h: UFSECP_ZK_RANGE_PROOF_MAX_LEN 675 -> 688 (was too small for actual proof)
  - ufsecp_impl.cpp: fix undefined UFSECP_ERROR -> UFSECP_ERR_INTERNAL,
    UFSECP_ERR_BUFFER_TOO_SMALL -> UFSECP_ERR_BUF_TOO_SMALL in BIP-324 functions

All 28/28 selftest modules pass. All benchmarks produce valid output.
2026-03-22 11:53:04 +00:00
shrec
97153c730b
BIP-324 horizontal: test suite (62/62), bench_unified section, BENCHMARKS.md, C example
- cpu/tests/test_bip324.cpp: 10 test sections, 62 checks covering
  HKDF-SHA256, ChaCha20-Poly1305 AEAD, ElligatorSwift, Bip324Cipher,
  Bip324Session, packet sequences, determinism, variable sizes,
  tamper resistance, and random-key sessions
- cpu/CMakeLists.txt: register test_bip324_standalone target + selftest
- cpu/tests/run_selftest.cpp: add BIP-324 module (gated by SECP256K1_BIP324)
- cpu/bench/bench_unified.cpp: add BIP-324 ENCRYPTED TRANSPORT section
  with ElligatorSwift, HKDF, AEAD, and Session benchmarks + summary
- docs/BENCHMARKS.md: add BIP-324 benchmark tables and bench target
- examples/c_example/main.c: add [11] BIP-324 session demo (create,
  handshake, encrypt, decrypt) via C ABI
2026-03-22 11:17:24 +00:00
shrec
2886cf56c0
fix: resolve 5 CI failures
- ellswift.cpp: remove unused-but-set variables in case which==2 (-Werror)
- bench_bip324_transport.cpp: fix -Wsign-conversion (int→size_t loop vars)
- security-audit.yml: exclude 6 slow tests from MSan (timeout under 10-20x overhead)
- ci.yml: exclude gpu_ops_equivalence from macOS ctest (Metal metallib not found)
- un-track source_graph.db (was causing dirty worktree → benchmark dashboard failure)
- add docs/BENCHMARK_BIP324_GPU.md (CUDA transport benchmark documentation)
2026-03-22 10:51:34 +00:00
shrec
eb2b3080f6
bench: add BIP-324 transport benchmark (mixed traffic, decoys, latency percentiles, e2e socket)
Four production-grade benchmark modes:

1. transport_mixed — realistic Bitcoin P2P payload distribution
   (40% 0-32B, 30% 33-128B, 20% 129-512B, 10% 512B+)
   Reports: packets/sec, goodput MB/s, wire overhead %

2. transport_with_decoys — decoy (ignore) packet CPU tax
   Tests at 5% and 20% injection rates
   Reports: decoy tax %, useful throughput drop vs no-decoy

3. latency_mode — per-packet p50/p95/p99 percentile histograms
   Per size bucket + mixed distribution, with jitter measurement

4. e2e_socket — localhost TCP full-duplex with TCP_NODELAY
   Reports: end-to-end req/resp latency, throughput with syscall overhead
   Includes memory-only baseline for overhead comparison
2026-03-22 03:51:26 +00:00
shrec
64895d213a
bip324: optimize ChaCha20 (SSE2), Poly1305 (__int128), HKDF, session allocs
ChaCha20: SSE2/SSSE3 vectorized quarter-round processes all 4 columns
simultaneously in XMM registers. SSSE3 shuffle for 16/8-bit rotations.
~1.45x faster block generation (115 ns → 79 ns).

Poly1305: 64-bit path using 3×44-bit limbs with unsigned __int128
multiply-accumulate (9 MULs vs 25). 32-bit 5×26-bit fallback preserved
for embedded/MSVC targets.

HKDF-Expand: Pre-compute and cache ipad/opad SHA256 mid-states outside
the iteration loop, clone per iteration. Saves 2 SHA256 compressions
per expand call.

Bip324Cipher: Reduce heap allocations from 2→1 per encrypt (build
combined plaintext directly in output buffer, encrypt in-place) and
3→1 per decrypt. AEAD encrypt/decrypt now handle aliased in/out.

Overall: AEAD +17%, session roundtrip +15%, all 27/27 tests pass.
2026-03-22 03:42:53 +00:00
shrec
2a74571535
bench: add BIP-324 throughput benchmark
Measures all BIP-324 stack layers at 64/256/1K/4K payload sizes:
  - ChaCha20 stream: ~833 MB/s
  - Poly1305 MAC: ~2.1 GB/s
  - ChaCha20-Poly1305 AEAD: ~525-550 MB/s
  - HMAC-SHA256 / HKDF-SHA256: ~278 ns key derivation
  - ElligatorSwift create: ~49 us, XDH: ~36 us
  - Full BIP-324 handshake: ~186 us (both sides)
  - Session packet encrypt: ~517 MB/s @ 4KB
  - Session roundtrip (enc+dec): ~285 MB/s @ 4KB

Uses benchmark_harness.hpp (RDTSCP, IQR outlier removal, thread pinning).
2026-03-22 03:19:45 +00:00
shrec
4079b266e5
feat: BIP-324 v2 encrypted P2P transport
Implements the complete BIP-324 protocol stack with zero external dependencies:

- ChaCha20-Poly1305 AEAD (RFC 8439): stream cipher + MAC + authenticated encryption
- HMAC-SHA256 / HKDF-SHA256 (RFC 5869): key derivation built on existing SHA256
- ElligatorSwift: XSwiftEC forward/inverse map for public key encoding
- BIP-324 session: ephemeral ECDH, ElligatorSwift handshake, bidirectional encryption

C ABI: ufsecp_bip324_create/handshake/encrypt/decrypt/destroy,
       ufsecp_aead_chacha20_poly1305_encrypt/decrypt,
       ufsecp_ellswift_create/xdh

Conditional build gate: SECP256K1_BUILD_BIP324 (ON by default)
All 27 unit tests pass (RFC test vectors + roundtrip + session tests)
2026-03-22 03:12:34 +00:00
shrec
395e83aacc
fix(ci): resolve -Werror, MSan, and parity stub gaps
-Werror fixes:
  - audit/audit_ct_namespace.cpp: assign fread() return value and check
    for short reads (fixes -Werror=unused-result with GCC 13)
  - audit/test_kat_all_operations.cpp: remove dead hex_to_bytes() static
    function (fixes -Werror=unused-function)

MSan fix:
  - cpu/src/pippenger.cpp: guard aggregation loop with used[b] check
    before reading from stack_buckets[], preventing MSan uninitialized-
    read reports on the first window where untouched bucket slots have
    not been written yet.  Functionally equivalent: untouched slots are
    conceptually Point::infinity() (identity), so skipping the add is
    correct and matches the original algorithm.

GPU parity (OpenCL + Metal):
  - opencl/kernels/secp256k1_frost.cl: new full FROST partial-verify
    kernel implementing R_i = D_i + rho_i*E_i, lhs = z_i*G,
    rhs = R_i + lambda_ie*Y_i comparison
  - gpu/src/gpu_backend_opencl.cpp: replace frost stub with full 9-buffer
    GPU dispatch via ensure_frost_kernel()
  - metal/shaders/secp256k1_kernels.metal: add Kernel 20 for FROST
    partial batch verify
  - gpu/src/gpu_backend_metal.mm: full rewrite implementing all 7 GPU
    operations (gen_mul, ecdsa_verify, schnorr_verify, ecdh, hash160,
    frost, msm) on Metal
2026-03-22 02:37:51 +00:00
shrec
e92be872c2
chore: remove Toom-Cook-3 dead code, fix duplicate OpenCL dispatch block
- Remove square_4_toomcook stub (was just forwarding to square_4_bmi2)
- Remove field_square_toomcook wrapper (uncalled dead code)
- Remove duplicate clEnqueueWriteBuffer+clSetKernelArg block in
  batch_scalar_mul_generator (caused cl_uint cnt redeclaration error)
2026-03-21 19:41:28 +00:00
shrec
2f3051282c
fix: tech debt batch — zero-safe batch inverse, 4-stream WNAF sign, OpenMP, MuSig2 validation
1. fe_batch_inverse: handle zero inputs gracefully by substituting ones
   during forward accumulation and restoring zeros in output. Prevents
   undefined behavior when callers pass zero-valued field elements.
   Added test_batch_inverse_zero_safe covering mixed, all-zero, and
   single-zero cases. (CT paths in ct_point.cpp unchanged — documented
   preconditions only.)

2. 4-stream WNAF (ESP32/STM32): fixed phi(G) sign — use k2_neg directly
   instead of k1_neg XOR k2_neg. G tables are precomputed without any
   sign baked in, unlike P tables where k1_neg is absorbed into P_base.
   Re-enabled the previously disabled code path.

3. OpenMP: added conditional OpenMP support for fe_h_based_inversion_batched.
   find_package(OpenMP QUIET) in CMakeLists.txt with ESP32/WASM exclusion.
   Static libgomp.a resolution for ARM64 cross-compilation.

4. MuSig2 key aggregation: validate ALL pubkeys upfront before computing
   anything. Previously, invalid pubkeys were silently skipped via continue,
   enabling potential rogue key attacks. Now returns empty ctx (Q=infinity)
   if any pubkey is invalid.

Tested on x86_64 (25/25), ARM64 RK3588 (25/25), RISC-V VisionFive2 (25/25).
No benchmark regressions detected.
2026-03-21 16:51:14 +00:00
shrec
d49453d935
Merge remote-tracking branch 'origin/fix/code-scanning-alerts' into dev 2026-03-21 14:15:44 +00:00
shrec
4f9c9f85c0
Merge remote-tracking branch 'origin/fix/code-scanning-alerts-round2' into dev 2026-03-21 14:15:44 +00:00
shrec
3afb02b54c
Merge remote-tracking branch 'origin/fix/ci-bip39-audit-regression' into dev 2026-03-21 14:15:44 +00:00
shrec
cd13729e6e
Merge remote-tracking branch 'origin/fix/round4-alerts' into dev 2026-03-21 14:15:44 +00:00
shrec
f3f58f7970
Merge remote-tracking branch 'origin/release/v3.22.0' into dev 2026-03-21 14:14:49 +00:00
shrec
4b5d75acf3
fix(arm64): SHA-256 ARM SHA2 intrinsics bug — vsha256h2q_u32 used modified abcd
armsha::sha256_compress passed the already-updated abcd to
vsha256h2q_u32 instead of the pre-update value. Per the ARMv8
Cryptographic Extensions spec, SHA256H2 requires the original
ABCD as its Qn operand.

This caused wrong SHA-256 digests on macOS ARM64 (Apple Silicon),
breaking NIST vectors, BIP-340, BIP-39, and RFC-6979 tests.
Linux x86_64 was unaffected (uses SHA-NI or scalar path).
2026-03-19 17:45:35 +00:00
shrec
fea2420fe7
fix: MSVC C2026 string limit (#173), OpenCL batch-inv kernels, source graph tooling
- Split embedded OpenCL kernel_source into kernel_parts[] array
  so no single string literal exceeds MSVC's 65535-byte limit.
  clCreateProgramWithSource now receives multiple source strings.
- Added batch-inversion kernels (field_inv, affine_add, jac_to_affine)
  using per-workgroup Montgomery's trick with __local memory.
- OpenCL BIP352 benchmark scaffold and kernel stubs.
- Source graph kit for indexed codebase exploration.
- Assorted doc, benchmark, and audit report updates.
2026-03-19 16:43:55 +00:00
Vano Chkheidze
690e1c3e59
fix: resolve clang-tidy code scanning warnings (#170)
Co-authored-by: shrec <shrec@users.noreply.github.com>
2026-03-18 11:18:04 +04:00
Vano Chkheidze
89891221ee
perf: batch ops 17-67x faster via all-affine fast path; pippenger touched-bucket + window tuning (#169)
* perf: batch ops 17-67x faster via all-affine fast path; pippenger touched-bucket + window tuning

## Performance (N=64 batch, LTO Release build)
- batch_normalize /pt:      144.7 ns → 8.2 ns  (17.6x faster)
- batch_to_compressed /pt:  134.6 ns → 2.0 ns  (67x faster)
- batch_x_only_bytes /pt:    97.4 ns → 1.9 ns  (51x faster)
- scalar_mul (k*P):          17012 ns → 17155 ns (no regression)

## Changes

### cpu/src/point.cpp
- batch_normalize: all-affine fast path — when all z_one_==true, skip batch
  inversion and read x_/y_ directly
- batch_to_compressed: same fast path + parity via limbs()[0] (avoids
  full serialization just for one bit)
- batch_x_only_bytes: same fast path using store_b32_prenorm / to_bytes()

### cpu/src/pippenger.cpp
- Window thresholds retuned: n<=72→c=5, n<=384→c=6, n<=768→c=7, etc.
  (was n<=32→c=5, n<=64→c=6)
- Strauss/Pippenger crossover: n<48 (was n<=64) — avoids Pippenger
  overhead for small MSM sizes
- Stack allocation for buckets/touched/used (STACK_BUCKETS=256): eliminates
  heap alloc for common window sizes; unique_ptr fallback for larger
- Pre-extracted digits: all n*num_windows digits in a flat u16[] before
  main loop — avoids redundant extract_window_bits calls
- all_affine scatter: uses add_mixed52_inplace/from_affine52 instead of
  full Jacobian add
- touched[] tracking: only reset touched buckets (O(k) not O(2^c))
- max_touched_digit: aggregate loop starts from highest used bucket

### cpu/src/batch_add_affine.cpp
- PrecomputeBuffers struct: stack arrays for count<=64, heap fallback
- y-parity via limbs()[0] instead of to_bytes()[31]

## Tests
- test_point_edge_cases_standalone: 53/53 PASS
- test_ecc_properties_standalone: 89/89 PASS
- test_edge_cases_standalone: 60/60 PASS
- test_comprehensive_standalone: 12023/12023 PASS
- test_batch_add_affine_standalone: 548/548 PASS

* fix: precompute_point_multiples stack alloc; ASan timeout 300→600s -j4

- cpu/src/batch_add_affine.cpp: precompute_point_multiples now uses
  PrecomputeBuffers (same pattern as precompute_g_multiples from PR #169)
  Eliminates 3 heap allocations per b*P table build for count <= 64
  (stack fallback path, avoids malloc/free overhead)

- docker/run_ci.sh: fix ASan+UBSan flaky timeout (root cause: selftest
  ~159s normally → 300-480s under ASan + CPU contention from -j$NPROC)
  Fix: --timeout 600 and cap parallelism at min(4, NPROC) for asan job

* perf: keep schnorr batch verify on fast path through N=64

Current measurements still show the randomized MSM path losing to
per-signature schnorr_verify at N=64, even after earlier MSM work.

Keep batches through 64 entries on the existing GLV Strauss + fixed-base
path and defer the MSM path to larger batches, matching the measured
crossover on this machine.

* perf: reduce schnorr batch setup passes

Build the non-generator MSM inputs in one pass during large-batch
Schnorr verification instead of materializing temporary weights,
challenges, and lifted-point vectors and then refilling scalars/points.

This cuts setup memory traffic and improved a local N=128 large-batch
harness by about 5% in the one-shot path.

Also add an audit case for malformed x-only pubkeys with the same xor
fingerprint as a valid one so future lift-caching changes keep the batch
path robust under collisions and repeated invalid inputs.

* bench: add larger batch verify sizes

Extend the unified benchmark to measure Schnorr and ECDSA batch
verification at N=128 and N=192 in addition to 4/16/64.

Scale iteration counts for the larger sizes so the benchmark stays
practical while exposing the real crossover behavior of the large-batch
path.

* perf: retune schnorr batch crossover

Keep Schnorr batch verification on the per-signature GLV Strauss path
through N=128, and switch to the randomized MSM path above that.

Current official benchmark data on this CPU shows the public batch path
is still slower than individual verification at 64 and 128 entries, while
192 entries is close enough to justify keeping the large-batch path active
there for further tuning.

* perf: cache repeated x-only pubkeys in large schnorr batches

Large Schnorr batches in the official benchmark reuse x-only pubkeys from
a 64-entry pool, so avoid re-lifting the same pubkey multiple times inside
one batch. Reuse parsed SchnorrXonlyPubkey points for duplicates and stream
challenge hashing directly instead of building a temporary 96-byte buffer.

This keeps 64/128 on the per-signature path and improves the N=192 batch
path enough to beat individual verification again on the current CPU.

Also add audit coverage for batches that intentionally reuse the same
x-only pubkey across many signatures.

* perf: trim field batch inversion scratch overhead

Use indexed scratch storage in fe_batch_inverse instead of push_back on the
hot path, and route small batches through a fixed stack scratch buffer.

This keeps the change local to field inversion while shaving overhead from
runtime callers that use small and medium Montgomery batch inversions.

* Add cached schnorr batch path and preflight coverage fixes

* Benchmark cached schnorr batch verification

* Reuse scratch buffers in schnorr batch verify

* Cache x-only lifts in schnorr parse path

* Trim schnorr batch seed serialization overhead

* Tune schnorr batch cutoff for N=128

* Reuse SHA256 base for schnorr batch weights

* Harden ECIES zero-ephemeral cleanup

* harden ABI secret cleanup paths

* harden wallet seed-to-address cleanup

* optimize coin HD fixed-path derivation

* optimize silent payment scan invariants

* optimize OpenCL generator nibble lookup

* optimize OpenCL GLV generator phi table

* Optimize CUDA BIP352 benchmark and enrich project graph

---------

Co-authored-by: shrec <shrec@users.noreply.github.com>
2026-03-18 02:38:10 +04:00
Vano Chkheidze
5f6dde593c
fix: clear remaining low-level code scanning alerts (#162)
Co-authored-by: shrec <shrec@users.noreply.github.com>
2026-03-17 06:54:30 +04:00
Vano Chkheidze
e80de58224
fix CI bip39 audit regression (#161)
Co-authored-by: shrec <shrec@users.noreply.github.com>
2026-03-17 00:02:12 +04:00
shrec
a42a6202c4 fix CI bip39 audit regression 2026-03-16 19:12:36 +00:00
Vano Chkheidze
c38b659b06
fix: resolve all 213 code-scanning alerts + N-03 CT path for message signing
- bip39.cpp: fix 45 alerts (const-correctness, braces-around-stmts, init-vars, cert-err33-c)
- zk.cpp: fix 25 alerts (const-correctness, braces-around-stmts)
- ufsecp_impl.cpp: fix 72 alerts (braces, const, modernize-auto, init-vars, implicit-widening)
- message_signing.cpp: N-03 security fix (use ct::ecdsa_sign_recoverable on CT path)
- ct_sign.cpp + ct/sign.hpp: add ct::ecdsa_sign_recoverable implementation
- compat/libsecp256k1_shim: add secp256k1_ecdsa_sign_recoverable + secp256k1_ecdsa_recover
- SECURITY.md: Q-07 Known Non-CT Exceptions table with fix status
- Other alert files: address.cpp, coin_address.cpp, eth_signing.cpp, wallet.cpp,
  test_bip39.cpp, test_ethereum.cpp, test_wallet.cpp, test_zk.cpp, test_ffi_round_trip.cpp
2026-03-16 22:48:52 +04:00
Vano Chkheidze
c99fdb1dfa fix: resolve all 213 code-scanning alerts + N-03 CT path for message signing
- bip39.cpp: fix 45 alerts (const-correctness, braces-around-stmts, init-vars, cert-err33-c)
- zk.cpp: fix 25 alerts (const-correctness, braces-around-stmts)
- ufsecp_impl.cpp: fix 72 alerts (braces, const, modernize-auto, init-vars, implicit-widening)
- message_signing.cpp: N-03 security fix (use ct::ecdsa_sign_recoverable on CT path)
- ct_sign.cpp + ct/sign.hpp: add ct::ecdsa_sign_recoverable implementation
- compat/libsecp256k1_shim: add secp256k1_ecdsa_sign_recoverable + secp256k1_ecdsa_recover
- SECURITY.md: Q-07 Known Non-CT Exceptions table with fix status
- Other alert files: address.cpp, coin_address.cpp, eth_signing.cpp, wallet.cpp,
  test_bip39.cpp, test_ethereum.cpp, test_wallet.cpp, test_zk.cpp, test_ffi_round_trip.cpp
2026-03-16 22:35:03 +04:00