opencl+metal: wire bulletproof_verify_batch — close last parity gap

- opencl/kernels/secp256k1_zk.cl: remove #if 0 guard (881 lines of bulletproof
  code re-enabled); fix range_verify_full_impl address-space qualifiers:
  bp_G / bp_H now __global const AffinePoint*; per-iteration private copy
  (AffinePoint g_pt = bp_G[i]) before passing to scalar_mul_impl.

- gpu/src/gpu_backend_opencl.cpp: replace Unsupported stub with real dispatch
  via range_proof_poly_batch kernel; add bp_poly_batch_ member + cleanup;
  update ensure_zk_kernels() to register the new kernel; parse 324-byte
  wire format (4x65-byte uncompressed + 2x32-byte scalars) into
  RangeProofPolyOCL GPU layout.

- gpu/src/gpu_backend_metal.mm: replace Unsupported stub with real dispatch
  via range_proof_poly_batch kernel; build RangeProofPolyMetal (320B) from
  324-byte wire format using be32_to_metal_fe / be32_to_metal_scalar helpers.

- docs/BACKEND_ASSURANCE_MATRIX.md: bulletproof row stub->Y for OpenCL+Metal;
  parity tracking now shows zero remaining stubs.

- CHANGELOG.md: document bulletproof parity closure.

All three backends (CUDA, OpenCL, Metal) now implement bulletproof_verify_batch.
Zero Unsupported stubs remain in the GPU backend surface.

2026-03-24 21:59:56 +00:00


Changelog

All notable changes to UltrafastSecp256k1 are documented here.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Security / Audit

Ethereum differential KAT (audit/test_exploit_ethereum_differential.cpp) — 10 tests, 15 sub-checks against go-ethereum, web3.py, and ethers.js reference vectors: address derivation (go-ethereum testKey KAT), privkey=1 canonical address, ecrecover with go-ethereum test message hash, EIP-191 hash vs web3.py, sign+ecrecover roundtrip, EIP-155 v encoding, eth_personal_sign roundtrip, tamper detection, keccak256("abc") KAT, anti-collision. Closes assurance gap #4.
MuSig2/FROST/adaptor parser robustness fuzz (audit/test_fuzz_musig2_frost.cpp) — 15 tests, 16 sub-checks: musig2 key_agg/nonce_agg/partial_verify/partial_sig_agg with random inputs (5000/3000/2000 rounds each), FROST keygen_finalize/sign/verify_partial/aggregate random inputs, schnorr+ecdsa adaptor random inputs, boundary test (n_signers=0 → must error). Closes assurance gap #7.
ClusterFuzzLite expanded to 5 targets: added cpu/fuzz/fuzz_ecdsa.cpp (ECDSA sign→verify invariant, wrong-msg false-positive check, parse_compact_strict robustness) and cpu/fuzz/fuzz_schnorr.cpp (BIP-340 sign→verify, adversarial from_bytes verify, wrong-msg check).

GPU Backend

Bulletproof parity (OpenCL + Metal): resolved the last remaining PARITY-EXCEPTION. OpenCL: removed #if 0 guard in secp256k1_zk.cl; fixed range_verify_full_impl address-space qualifiers (__global const AffinePoint* for bp_G/bp_H, with per-iteration private copy before scalar_mul_impl); wired bulletproof_verify_batch host dispatch via range_proof_poly_batch kernel (matches CUDA poly-check behavior). Metal: wired bulletproof_verify_batch host dispatch via range_proof_poly_batch kernel; host converts 324-byte proof wire format to RangeProofPolyMetal GPU structs. Full CUDA ↔ OpenCL ↔ Metal parity — zero Unsupported stubs remaining.
OpenCL parity: wired zk_knowledge_verify_batch, zk_dleq_verify_batch, bip324_aead_encrypt_batch, bip324_aead_decrypt_batch in gpu_backend_opencl.cpp — 4 new kernels matching CUDA surface.
Metal parity: wired zk_knowledge_verify_batch, zk_dleq_verify_batch, bip324_aead_encrypt_batch, bip324_aead_decrypt_batch in gpu_backend_metal.mm -- all four Metal kernels were already present in secp256k1_kernels.metal; the dispatch code is now connected. Also fixed the zk_knowledge_verify_batch Metal kernel, which incorrectly treated the pubkey buffer as a scalar (computing scalar×G); it now uses lift_x to recover the full point from the x-coordinate.
CUDA 13 compatibility: replaced deprecated cudaDeviceProp::clockRate / ::memoryClockRate fields (removed in CUDA 13) with cudaDeviceGetAttribute(cudaDevAttrClockRate/MemoryClockRate) under #if CUDART_VERSION >= 13000 guard. Backward-compatible with CUDA 12. Reported by @craigraw compiling with CUDA 13 on RTX 5080.
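The 324-byte proof wire format from the bulletproof parity entry (4x65-byte uncompressed points plus 2x32-byte scalars) can be sketched on the host side as below. This is a minimal illustration, not the project's dispatch code: the struct name, field order, and function name are hypothetical, and the sketch only strips the SEC1 0x04 prefix and copies big-endian bytes. It does not perform the limb conversion that helpers such as be32_to_metal_fe do.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sizes taken from the changelog entry; everything else here is illustrative.
constexpr std::size_t kPointWire  = 65;  // 0x04 || X (32 bytes) || Y (32 bytes)
constexpr std::size_t kScalarWire = 32;
constexpr std::size_t kNumPoints  = 4;
constexpr std::size_t kNumScalars = 2;
constexpr std::size_t kProofWire  = kNumPoints * kPointWire + kNumScalars * kScalarWire;
static_assert(kProofWire == 324, "wire size must match the documented format");

// Hypothetical host-side layout: prefix bytes dropped, coordinates kept big-endian.
struct ProofPolySketch {
    std::uint8_t points[kNumPoints][64];    // X || Y for each point
    std::uint8_t scalars[kNumScalars][32];
};
static_assert(sizeof(ProofPolySketch) == 320, "4*64 + 2*32 bytes");

// Returns false if the length is wrong or any point lacks the 0x04 prefix.
inline bool parse_proof_wire(const std::uint8_t* wire, std::size_t len, ProofPolySketch& out) {
    if (wire == nullptr || len != kProofWire) return false;
    for (std::size_t i = 0; i < kNumPoints; ++i) {
        const std::uint8_t* p = wire + i * kPointWire;
        if (p[0] != 0x04) return false;          // not an uncompressed point
        std::memcpy(out.points[i], p + 1, 64);   // drop the prefix byte
    }
    const std::uint8_t* s = wire + kNumPoints * kPointWire;
    for (std::size_t i = 0; i < kNumScalars; ++i) {
        std::memcpy(out.scalars[i], s + i * kScalarWire, kScalarWire);
    }
    return true;
}
```

Dropping one prefix byte per point is what accounts for the 4-byte difference between the 324-byte wire format and a 320-byte device struct.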

Full ABI audit coverage: 155 ufsecp_* + 23 ufsecp_gpu_* functions, 70-module unified runner (AUDIT-READY), GPU C ABI null-guard path integration.

Added

ZK adversarial exploit test (test_exploit_zk_adversarial.cpp) — 14 tests covering malformed/forged ZK proof inputs: garbage bytes, all-zero proof, scalar overflow (s ≥ n), truncated data, identity pubkey, identity generator, degenerate G==H DLEQ, wrong commitment for range proof, DLEQ overflow e, exhaustive 64-byte-flip sensitivity. Closes audit coverage gap #2.
Pedersen adversarial exploit test (test_exploit_pedersen_adversarial.cpp) — 12 tests covering switch commitment security and adversarial balance attacks: switch roundtrip, zero-blind equivalence, switch binding, zero-commit→identity, negation cancellation, imbalanced verify_sum (theft detection), blind_sum subtraction, switch-as-normal rejection, double-spend detection, generator J independence. Closes audit coverage gap #3.
Cross-library differential test (test_cross_libsecp256k1) wired into audit label set (audit;exploit;differential;libsecp) so it participates in the unified audit runner when SECP256K1_BUILD_CROSS_TESTS=ON.

Fixed

Stable binding validation closure -- aligned the shared validate_bindings.sh flow and wrapper smoke suites across C#, Java, Swift, Python, Go, Rust, Node.js, PHP, Ruby, Dart, and the default React Native contract lane. Fixed wrapper/API drift, zero-length FFI buffer edge cases, Dart NativeFinalizer usage, and local Dart smoke-runner execution issues uncovered during the pass.
CUDA jacobian_add_mixed_unchecked infinity flag — missing r->infinity = false assignment in the normal (non-infinity-input) code path caused generator table entries table[3..15] built by build_generator_table to carry uninitialized infinity flags. Scalars with many consecutive high nibbles (e.g. n-1, all-0xF pattern) heavily hit table[15] and produced wrong public keys. All 52/52 CUDA signing tests now pass.

Changed

Binding docs and packaging notes -- synchronized canonical binding docs, examples, package names, and validation framing to the verified state: Dart is documented as ultrafast_secp256k1, React Native as react-native-ultrafast-secp256k1, and the central bindings matrix now reflects smoke-validated coverage instead of stale compile-only/optional wording.
CUDA signing paths — scalar_mul_generator_const → scalar_mul_generator_w8 across all signing kernels (ecdsa.cuh, schnorr.cuh, bip32.cuh, pedersen.cuh, zk.cuh). w=8 uses 32 windows of 8-bit lookups instead of 64 windows of 4-bit lookups (w=4):
- ECDSA Sign: 220.9 → 198.3 ns/op (−10.2%, beats OpenCL 211.3 ns)
- Schnorr Sign: equivalent speedup via the same generator multiplication hotspot
- scalar_mul_generator_const (w=4) retained for audit/benchmark comparisons.
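To make the window counts concrete, the sketch below splits a 256-bit scalar into fixed-width digits: 64 digits at w=4 and 32 digits at w=8. It only illustrates the digit decomposition; the function name is hypothetical, and the table layout and point additions of scalar_mul_generator_w8 are not shown.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split a 256-bit scalar (32 big-endian bytes) into fixed-width window digits,
// least-significant window first. Only widths 4 and 8 are handled in this sketch.
inline std::vector<std::uint8_t> window_digits(const std::array<std::uint8_t, 32>& be, unsigned w) {
    std::vector<std::uint8_t> digits;
    if (w != 4 && w != 8) return digits;  // unsupported width: return empty
    digits.reserve(256 / w);
    for (std::size_t i = 0; i < 32; ++i) {
        const std::uint8_t byte = be[31 - i];  // walk from the least-significant byte
        if (w == 8) {
            digits.push_back(byte);            // one 8-bit digit per byte
        } else {
            digits.push_back(static_cast<std::uint8_t>(byte & 0x0F));  // low nibble
            digits.push_back(static_cast<std::uint8_t>(byte >> 4));    // high nibble
        }
    }
    return digits;
}
```

Each digit selects one precomputed table entry, so halving the number of windows halves the number of lookups and additions, at the cost of tables with up to 2^8 rather than 2^4 entries per window.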

Added

BIP-39 Mnemonic Seed Phrases (bip39.hpp/bip39.cpp) -- entropy-to-mnemonic conversion (12/15/18/21/24 words), mnemonic validation (word count, membership, checksum), seed derivation via PBKDF2-HMAC-SHA512 (2048 rounds), mnemonic-to-entropy roundtrip. OS CSPRNG entropy source. 2048-word English wordlist from official BIP-39 specification. 57 tests across 8 test functions. Registered in run_selftest (module 10/25) and unified_audit_runner (protocol_security).
Zero-Knowledge Proof Layer (zk.hpp/zk.cpp) -- non-interactive Schnorr knowledge proofs, DLEQ (discrete log equality) proofs, and Bulletproof range proofs (64-bit). All proving operations use CT layer (constant-time); verification uses FAST layer (public data). Fiat-Shamir via tagged SHA-256. Nothing-up-my-sleeve generators (no trusted setup).
ZK Batch Operations -- batch_range_verify() for efficient multi-proof verification, batch_commit() for Pedersen commitment generation.
MSM-optimized Bulletproof verifier -- multi-scalar multiplication (Pippenger) merges 144 points into a single MSM; Montgomery batch inversion for s_coeff (1 inv + 126 muls vs 64 inversions). 1.93x speedup (5,079 -> 2,634 us).
GPU ZK kernels -- Pedersen commitment (pedersen.cuh) and ZK proof primitives (zk.cuh) for CUDA backend.
GPU ZK + BIP-324 C ABI -- 5 new ufsecp_gpu_* batch operations wired through the full GpuBackend → C ABI stack: zk_knowledge_verify_batch, zk_dleq_verify_batch, bulletproof_verify_batch, bip324_aead_encrypt_batch, bip324_aead_decrypt_batch. CUDA fully implemented; OpenCL/Metal stubs with TODO(parity). Shared BIP-324 device code extracted to cuda/include/bip324.cuh (ChaCha20-Poly1305 AEAD). GPU C ABI total: 8 → 13 backend-neutral batch operations.
GPU CT ZK proving (ct_zk.cuh) -- constant-time knowledge proof and DLEQ proof on CUDA, using the full CT scalar multiplication layer. Deterministic nonce derivation with SHA-256 tagged hash and XOR hedging. Batch kernels for both operations.
OpenCL ZK proving/verification (secp256k1_zk.cl) -- knowledge proof, DLEQ proof, batch prove/verify kernels. Uses fast-path wNAF-5 scalar multiplication.
Metal ZK proving/verification (secp256k1_zk.h, kernels 19-22) -- knowledge proof, DLEQ proof with batch kernels. Uses branchless affine_select scalar multiplication.
GPU ZK test coverage -- test_ct_smoke.cu expanded to 9 tests: CT knowledge prove+verify and CT DLEQ prove+verify round-trips verified on GPU (RTX 5060 Ti, Blackwell SM 12.0).
ZK Benchmarks in bench_unified Section 8.5: Pedersen commit, Knowledge prove/verify, DLEQ prove/verify, Bulletproof range prove/verify with throughput numbers.
24 ZK tests in test_zk.cpp: knowledge proof, DLEQ, Bulletproof range proof correctness, soundness, serialization, batch verification, edge cases.
Unified Wallet API (wallet.hpp/wallet.cpp) -- chain-agnostic key management, address generation, message signing, and public key recovery. Single wallet:: namespace works identically across Bitcoin, Ethereum, Tron, and all 28 supported coins.
Bitcoin message signing (message_signing.hpp/message_signing.cpp) -- BIP-137/Electrum compatible: bitcoin_message_hash(), bitcoin_sign_message(), bitcoin_verify_message(), bitcoin_recover_message(), bitcoin_sig_to_base64(), bitcoin_sig_from_base64().
P2SH-P2WPKH (nested/wrapped SegWit) address generation -- address_p2sh_p2wpkh(), coin_address_p2sh_p2wpkh(), wallet::get_address_p2sh_p2wpkh(). Produces 3... addresses for backward-compatible SegWit (BIP-49).
P2SH (pay-to-script-hash) address primitive -- address_p2sh(), coin_address_p2sh() from raw 20-byte script hash.
P2WSH (witness script hash) address primitive -- address_p2wsh() from 32-byte witness script hash (SegWit v0, bc1q... 32-byte program).
CashAddr encoding (Bitcoin Cash BIP-0185) -- cashaddr_encode(), address_cashaddr(), coin_address_cashaddr(), wallet::get_address_cashaddr(). Produces bitcoincash:q... addresses.
Tron (TRX) coin descriptor -- coin_type=195, TRON_BASE58 encoding (Keccak-256 hash + 0x41 prefix + Base58Check). 28th coin in registry.
5 wallet address helpers -- get_address_p2pkh(), get_address_p2wpkh(), get_address_p2sh_p2wpkh(), get_address_p2tr(), get_address_cashaddr().
chain_id field in CoinParams for EIP-155 signing (Ethereum=1, BSC=56, Polygon=137, etc.).
19 new tests -- 12 in test_coins.cpp (P2SH-P2WPKH, CashAddr, P2SH, Tron), 7 in test_wallet.cpp (key management, signing, address formats, recovery).
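The Montgomery batch inversion mentioned in the MSM-optimized Bulletproof verifier entry trades many inversions for a single inversion plus multiplications. The sketch below shows the textbook form over a small illustrative prime (2^61 - 1) rather than the secp256k1 group order. All names are hypothetical, and its multiplication count (roughly 3n) is that of the textbook variant, not the figure quoted for the project's implementation. It relies on the __int128 compiler extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Small illustrative prime modulus; the real code works modulo the curve order.
constexpr std::uint64_t kP = (1ULL << 61) - 1;

inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return static_cast<std::uint64_t>((static_cast<unsigned __int128>(a) * b) % kP);
}

// Single inversion via Fermat's little theorem: a^(p-2) mod p.
inline std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1, base = a % kP, e = kP - 2;
    while (e != 0) {
        if (e & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        e >>= 1;
    }
    return result;
}

// Montgomery's trick: invert every (nonzero) element in place with one inv_mod call.
inline void batch_invert(std::vector<std::uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;
    std::vector<std::uint64_t> prefix(n);
    std::uint64_t acc = 1;
    for (std::size_t i = 0; i < n; ++i) {
        assert(v[i] % kP != 0);    // zero has no inverse
        prefix[i] = acc;           // product of v[0..i-1]
        acc = mul_mod(acc, v[i]);
    }
    std::uint64_t inv = inv_mod(acc);  // inverse of the product of all elements
    for (std::size_t i = n; i-- > 0;) {
        const std::uint64_t original = v[i];
        v[i] = mul_mod(inv, prefix[i]);  // inverse of v[i]
        inv = mul_mod(inv, original);    // inverse of the product of v[0..i-1]
    }
}
```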

Fixed

CUDA pedersen.cuh -- fixed .d[] -> .limbs[] member access on ScalarData/FieldElementData (7+ occurrences), removed calls to non-existent field_normalize(), removed duplicate field_sqrt() definition (already in secp256k1.cuh). These bugs existed since file creation but were never triggered because no test included pedersen.cuh before.
CUDA zk.cuh -- fixed .d[] -> .limbs[] on Scalar and FieldElement in knowledge_verify_device, dleq_verify_device, and range_verify_inner. Fixed undefined SCALAR_ONE constant with inline initializer.

Changed

coin_address() CASHADDR dispatch now correctly routes to coin_address_cashaddr() -- Bitcoin Cash addresses generate via CashAddr instead of falling through to Base58Check.
All 28 coins now generate addresses correctly (was 27; BCH fixed, Tron added).
ARM64 Android hash dispatch -- hash_accel now routes sha256_33, sha256_32, hash160_33, and sha256_compress_dispatch through ARMv8 SHA-256 instructions when building for AArch64 targets with SHA2 support. On RK3588 / Android NDK r27.2 this reduced ecdsa_sign from 25.89 us to 22.22 us, schnorr_sign (precomputed) from 17.73 us to 16.67 us, and ct::ecdsa_sign from 70.50 us to 67.11 us, with verify paths remaining effectively flat.
x86 Schnorr batch verify allocation path -- batch_verify.cpp now reserves the full batch size for the uncached x-only pubkey cache instead of capping capacity at 64. Local i5-14400F reruns reduced uncached schnorr_batch_verify from 20.27 us/sig to about 19.94-20.06 us/sig at N=128 and from 18.56 us/sig to about 18.01-18.45 us/sig at N=192, with the comprehensive test suite remaining green.

[3.22.0] - 2026-03-10

Minor release: v3.21.1 -> v3.22.0 | Modular Ethereum layer | ABI extension (backward compatible)
New feature: full Ethereum signing/recovery support with conditional build

Added

Modular Ethereum layer -- EIP-191, EIP-155, ecrecover, personal_sign with conditional CMake option SECP256K1_BUILD_ETHEREUM (default ON). Bitcoin-only builds exclude all Ethereum code via -DSECP256K1_BUILD_ETHEREUM=OFF. (#144)
eth_signing.hpp/.cpp -- eip191_hash(), eth_personal_sign(), eth_sign_hash(), ecrecover(), eth_personal_verify() with EIP-155 chain ID encoding (v = 35+2*chainId+recid).
C ABI Ethereum functions -- 6 new ufsecp_* functions: ufsecp_keccak256, ufsecp_eth_address, ufsecp_eth_address_checksummed, ufsecp_eth_personal_hash, ufsecp_eth_sign, ufsecp_eth_ecrecover. All guarded by #ifdef SECP256K1_BUILD_ETHEREUM.
Ethereum test suite -- 32 tests across 7 groups (EIP-155 encoding, EIP-191 hash, eth_sign_hash, ecrecover, personal_sign+verify, multi-chain, Keccak-256 vectors). Registered as standalone target + run_selftest module.
Ethereum benchmarks -- 8 benchmarks in bench_unified Section 6.5: keccak256, ethereum_address, eip191_hash, eth_sign_hash, ecdsa_sign_recoverable, ecrecover, eth_personal_sign, eip55_checksum.
libsecp256k1 recovery comparison -- enabled ENABLE_MODULE_RECOVERY=1 in libsecp_provider; added sign_recoverable + recover benchmarks with 3-column apples-to-apples comparison rows.
Ethereum audit module -- registered in unified_audit_runner under protocol_security section with conditional compilation guard.
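The EIP-155 encoding quoted above, v = 35 + 2*chainId + recid, can be illustrated with a small encode/decode pair. The function and struct names here are hypothetical and are not the eth_signing.hpp API; legacy v values (27/28) are deliberately left out of the sketch.

```cpp
#include <cstdint>

// EIP-155: v = 35 + 2 * chainId + recid, with recid in {0, 1}.
inline std::uint64_t eip155_encode_v(std::uint64_t chain_id, unsigned recid) {
    return 35 + 2 * chain_id + (recid & 1u);
}

struct DecodedV {
    std::uint64_t chain_id;
    unsigned recid;
    bool ok;  // false when v is below the EIP-155 range
};

inline DecodedV eip155_decode_v(std::uint64_t v) {
    if (v < 35) return {0, 0, false};  // legacy 27/28 values are not handled here
    return {(v - 35) / 2, static_cast<unsigned>((v - 35) & 1u), true};
}
```

For Ethereum mainnet (chain ID 1) this yields v = 37 or 38, depending on the recovery id.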

Changed

coin_address.cpp -- EIP-55 dispatch wrapped with #ifdef SECP256K1_BUILD_ETHEREUM guard; returns empty string when Ethereum module not built.
test_coins.cpp -- All Ethereum-specific tests (Keccak-256, EIP-55, BIP-44 Ethereum paths) wrapped with #ifdef SECP256K1_BUILD_ETHEREUM for clean Bitcoin-only builds.

[3.21.1] - 2026-03-09

Patch release: v3.21.0 -> v3.21.1 | Bug fixes, CI hardening, Metal audit | ABI compatible
No breaking changes -- drop-in upgrade from v3.21.0

Added

Metal Unified Audit Runner -- 27 audit modules across 8 sections for Apple GPU (Metal) backend, matching OpenCL audit runner coverage. (#134)

Fixed

CPUID PIC safety -- replaced raw cpuid inline asm with __get_cpuid_count() from <cpuid.h> to avoid EBX register conflicts with clang PIC mode and coverage instrumentation. Eliminates SIGILL on clang-17 Release builds. (#135)
CI MARCH level -- downgraded from x86-64-v3 to x86-64-v2 in CI/SonarCloud workflows to avoid AVX2/FMA instructions on runners without those extensions. (#135)
Metal metallib path -- audit runner now searches for secp256k1_kernels.metallib (the name CMake copies) in addition to secp256k1.metallib. (#135)
Benchmark CI output path -- moved benchmark JSON output to /tmp/ to prevent git checkout conflicts with gh-pages branch. (#135)
OpenCL audit 6-bug fix -- fixed 6 bugs in OpenCL backend, achieving 27/27 audit PASS status; dead code removal and Scorecard .sigstore cleanup. (#133)
macOS amd64 build -- fixed build failure on macOS x86-64. (#132)
Code scanning alerts -- resolved 6 clang-tidy findings in test_edge_cases.cpp: unused using declaration, const-correctness on zero_bytes / zero_sig, unchecked std::remove() returns, and redundant one - one expression. (#136)
Delta audit findings -- addressed audit delta findings with GPU audit runners and documentation updates. (#126)

Dependencies

benchmark-action/github-action-benchmark bumped (#131)
actions/dependency-review-action 4.8.3 -> 4.9.0 (#130)
github/codeql-action 4.32.4 -> 4.32.6 (#129)
actions/setup-go 6.2.0 -> 6.3.0 (#128)
ruby/setup-ruby 1.288.0 -> 1.290.0 (#125)
actions/setup-dotnet 5.1.0 -> 5.2.0 (#124)
step-security/harden-runner 2.15.0 -> 2.15.1 (#123)
actions/setup-node 6.2.0 -> 6.3.0 (#122)
actions/cache 4.2.3 -> 5.0.3 (#121)

[3.21.0] - 2026-03-08

Cumulative release: v3.14.0 -> v3.21.0 | 120+ commits | ABI compatible
No breaking changes -- drop-in upgrade from v3.14.x

Added

ABI layout guards -- compile-time static_assert checks on struct sizes and constant lengths in ufsecp.h to catch ABI breaks. (#118)
docs/CRYPTO_INVARIANTS.md -- comprehensive crypto invariant reference for integrators and auditors. (#118)
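A compile-time ABI layout guard of the kind described above might look like the following sketch. The constants and struct are hypothetical stand-ins, since the actual names and sizes in ufsecp.h are not listed in this changelog.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical public constants and struct; real names in ufsecp.h may differ.
#define EXAMPLE_PUBKEY_COMPRESSED_LEN 33
#define EXAMPLE_SIG_COMPACT_LEN 64

struct example_sig_t {
    std::uint8_t data[EXAMPLE_SIG_COMPACT_LEN];
};

// If a later change alters a size or layout, the build fails instead of
// silently breaking callers compiled against the old header.
static_assert(EXAMPLE_PUBKEY_COMPRESSED_LEN == 33, "compressed pubkey length is part of the ABI");
static_assert(sizeof(example_sig_t) == 64, "signature struct size is part of the ABI");
static_assert(offsetof(example_sig_t, data) == 0, "data must be the first member");
```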

Fixed

Precompute cache validation -- 65 bytes/point -> 1 byte/point minimum bound check; path separator fix on Windows. (#118)
Fuzzer edge case -- fuzz_point k=n-1 infinity assertion. (#118)
CI stabilization -- SECP256K1_MARCH option to avoid ccache/-march=native mismatch SIGILL; benchmark regression filter MIN_REGRESSION_NS=50. (#118)
Packaging workflow -- handle immutable GitHub releases by recreating with preserved metadata when gh release upload --clobber fails (HTTP 422). (#119)
LTO/ASM build fix -- restrict -flto=thin (Clang) and -flto (GCC) to C++ sources only via $<COMPILE_LANGUAGE:CXX> generator expression, preventing the system assembler from seeing unsupported LTO flags on .S files. (#120)

[3.20.0] - 2026-03-07

Cumulative release: v3.14.0 -> v3.20.0 | 120+ commits | ABI compatible
No breaking changes -- drop-in upgrade from v3.14.x

This release consolidates all work from v3.15.0 through v3.19.0, plus 19 additional commits (PRs #90--#111) into a single stable release.

1. Security & Constant-Time Hardening

RISC-V CT timing leak fixes -- dudect testing on SiFive U74 detected 5 persistent timing leaks (field_sqr, scalar_is_zero, scalar_sub, scalar_window, ct_compare). Fixed with register-only value_barrier() and corrected rdcycle timer (no fence). (#v3.19.0)
CT SafeGCD scalar inverse -- replaced Fermat chain (294 scalar ops) with constant-time Bernstein-Yang divsteps-59 port from bitcoin-core/secp256k1. ct::scalar_inverse: 10,650 ns -> 1,671 ns (6.4x faster). CT ECDSA Sign: 26,942 ns -> 15,360 ns (43% faster). Fermat chain preserved for non-__int128 platforms (ESP32). (#v3.18.0)
Secret zeroization -- ecdsa_sign(), rfc6979_nonce(), musig2_nonce_gen() now guarantee secure_erase() of all intermediate secrets (k, k_inv, z, V, K, HMAC state) on every code path. (#v3.17.0)
Sign-then-verify countermeasure -- both ecdsa_sign() and ct::schnorr_sign() verify the signature before returning; failure zeroes the result. (#v3.17.0)
BIP-340 strict parsing -- Scalar::parse_bytes_strict, FieldElement::parse_bytes_strict, SchnorrSignature::parse_strict reject all malformed inputs. C ABI uses strict parsing internally. UFSECP_BITCOIN_STRICT CMake option for compile-time enforcement. (#v3.16.0)
CT buffer erasure -- volatile function-pointer trick in ct::schnorr_sign and ct::ecdsa_sign (same technique as libsecp256k1). (#v3.16.0)
Hedged ECDSA -- ecdsa_sign_hedged() + rfc6979_nonce_hedged() implementing RFC 6979 Section 3.6 with 32-byte aux_rand mixed into HMAC-DRBG. (#v3.17.0)
PrivateKey strong type -- private_key.hpp: wraps fast::Scalar, no implicit conversion, secure_erase in destructor, [[nodiscard]] accessors. CT overloads for ECDSA/Schnorr operations. (#v3.17.0)
Formal CT verification -- Valgrind ctgrind (SECP256K1_CLASSIFY/DECLASSIFY markers), Fiat-Crypto direct linkage (6085 cross-checks), ct-verif LLVM pass. (#v3.17.0, #v3.16.0)
Schnorr parity fix + branchless scalar_window -- corrected BIP-340 parity bit, branchless CT implementation on RISC-V, platform-specific path on x86/ARM. (#v3.15.0)
Point on-curve validation -- audited 18 deserialization paths, fixed 4 CRITICAL + 1 HIGH + 3 LOW missing validations. (#v3.17.0)
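The volatile function-pointer trick mentioned under CT buffer erasure can be sketched as follows. This is an illustration of the general technique, not the project's secure_erase(); the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Calling memset through a volatile function pointer forces the compiler to
// load the pointer at run time, so it cannot prove the call is a plain memset
// on a dead buffer and remove it as a dead store.
inline void secure_erase_sketch(void* ptr, std::size_t len) {
    static void* (*const volatile memset_ptr)(void*, int, std::size_t) = std::memset;
    memset_ptr(ptr, 0, len);
}
```

A typical call site wipes a stack buffer holding a nonce or key immediately before returning, on both success and error paths.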

2. Performance

L1 I-cache optimization -- __attribute__((noinline)) on point add/double functions reduced verify hot path below L1 I-cache threshold. ECDSA verify ratio vs libsecp: 0.82x -> 0.92x (+12%). (#v3.19.0)
BIP-352 affine add fast path -- 1.20x speedup for silent payment scanning. (#95)
FE52 fast path for scalar_mul_with_plan + GLV MSM -- optimized multi-scalar multiplication with FE52-native operations. (#92)
ECDSA recovery 1.9x speedup -- replaced 3 separate scalar muls with single dual_scalar_mul_gen_point(u1, u2, R) using 4-stream GLV Strauss. Recovery: ~69 us -> ~36 us. (#v3.15.0)
Precompute cache atomic write -- write-then-rename pattern prevents CTest parallel test flakes. File size validation on load. (#v3.17.1)

3. Testing & Audit

Google Wycheproof ECDSA -- 89 test cases, 10 categories. (#v3.17.0)
Google Wycheproof ECDH -- 36 test cases, 7 categories. (#v3.17.0)
FROST RFC 9591 invariants -- 7 ciphersuite-independent invariants + exhaustive 3-of-5 signing across all C(5,3) = 10 subsets. (#v3.16.0)
MuSig2 BIP-327 vectors -- 35 reference tests. (#v3.16.0)
FFI round-trip tests -- 103 boundary tests for Schnorr, ECDSA, pubkey, ECDH, tweaking, and error paths. (#v3.16.0)
Fiat-Crypto cross-checks -- 752 field arithmetic checks against Coq-extracted reference. (#v3.16.0)
Cross-platform audit campaign -- 7 configurations (Windows/Linux/CI x86-64, ESP32-S3, RISC-V 64), all AUDIT-READY (40--49 modules each). (#v3.16.1)
Cross-platform benchmark campaign -- 4 platforms (x86-64, ARM64, RISC-V, ESP32-S3) with identical apples-to-apples suite vs libsecp256k1 v0.7.2. (#v3.16.1)
ASan buffer overread fix -- suite_15_ffi_ecdh_edge() 32-byte buffer corrected to 33-byte for compressed pubkey. (#v3.17.1)
Batch serialization + bench_unified coverage -- extended benchmark suite. (#94)
Test count: grew from ~29 to 31 core tests + 49 audit modules.

4. CI/CD & Code Quality

OpenSSF Scorecard hardening -- all GitHub Actions pinned to SHA, harden-runner on every job, persist-credentials: false, pip hash pinning, Dependabot. (#v3.15.0, #v3.16.0)
CT verification CI -- ct-arm64.yml (native Apple Silicon dudect), ct-verif.yml (compile-time LLVM pass), valgrind-ct.yml (taint analysis). (#v3.16.0)
ClusterFuzzLite -- integrated with UBSan vptr compatibility, LTO disabled in fuzz builds to prevent link failures. (#v3.15.0, #108)
Docker local CI -- docker-compose.ci.yml, pre-push hook, ~5 min full validation. (#v3.16.0)
Performance regression gate -- per-commit benchmark with 150% threshold. (#v3.16.0)
SARIF output -- unified_audit_runner --sarif for GitHub Code Scanning. (#v3.16.0)
SonarCloud Quality Gate -- coverage 61.8% -> 85.8%, duplication below threshold, CPD exclusion for CT variants. (#v3.15.0)
5,150+ code scanning alerts resolved -- mass clang-tidy, cppcheck, CodeQL remediation across v3.15.0--v3.15.3 + PRs #102, #105, #109, #111.
Code deduplication -- -817 lines net across 8 files: point.cpp (-765), glv.cpp (-64), benchmark_harness.hpp, ufsecp_impl.cpp, selftest.cpp, field.cpp, scalar.cpp + new shared detail/arith64.hpp. (#110)
CI dependency bumps -- actions/attest-build-provenance v4.1.0, sigstore/cosign-installer v4.0.0, step-security/harden-runner v2.15.0, actions/upload-artifact v7.0.0. (#80--#84)
ClusterFuzzLite + MSan failures fixed. (#91)
6 CI workflow failures resolved. (#90)

5. Platform Support

ESP32-S3 -- bench_hornet benchmark data, 40-module audit. (#107, #v3.16.1)
WASM / Emscripten -- SECP256K1_NO_INT128 auto-defined, FAST_52BIT disabled, precompute generator bypass, GLV+Shamir fallback. (#v3.15.0)
ARM64 Android -- bench_hornet port with clock_gettime, libsecp_bench.c for cross-compilation. (#v3.16.1)
RISC-V real hardware -- Milk-V Mars benchmarks, 4/6 ops faster (2.02x--3.08x), value_barrier register-only fix. (#v3.16.1, #v3.19.0)
Preprocessor branch repair -- fixed broken conditional compilation in point ops. (#104)

6. Build & Packaging

Benchmark diagnostics -- Schnorr verify sub-operation diagnostics (SHA256, FE52_inv, parse_strict) added to bench_unified. (#v3.19.0)
Build hardening -- clean -Werror -Wall -Wextra -Wpedantic build, fixed -Wsign-conversion in SafeGCD, -Wstringop-overflow in base58. (#v3.19.0, #v3.15.1)
MSVC / GCC / Clang compatibility -- resolved __int128 pedantic warnings, using declaration restoration, duplicate const qualifiers. (#v3.15.0, #v3.15.1)
Audit UX -- centralized CHECK macro with ASCII progress bar, Windows stdout fix. (#v3.16.0)
z_one_ member fix -- removed from constructor initializer lists, restored as member with normalize() methods. (#96, #97)
SonarCloud fixes -- FieldElement52::to_bytes_into() deduplication, null checks, crypto impl exclusion. (#93, #98, #99)

7. Documentation

BENCHMARKING.md -- complete guide for all 4 platforms.
AUDIT_GUIDE.md -- 40/48-module audit how-to.
FROST_COMPLIANCE.md -- RFC 9591/BIP-FROST checkpoint matrix.
COMPATIBILITY.md -- BIP-340 strict encoding notes.
BINDINGS_ERROR_MODEL.md -- strict semantics for binding authors.
ADOPTERS.md -- production/development/hobby adopter categories.
GitHub Discussion templates -- Q&A, Show-and-Tell, Ideas, Integration Help.

Cross-Platform Benchmark Results (vs libsecp256k1 v0.7.2)

x86-64 (i5-14400F @ 2.50 GHz, GCC 14.2.0, Ubuntu 24.04)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	7.45 us	13.48 us	17.86 us	2.40x	1.33x
ECDSA verify	20.39 us	--	21.93 us	1.08x	--
Schnorr sign	5.86 us	10.85 us	12.58 us	2.15x	1.16x
Schnorr verify	21.49 us	--	22.57 us	1.05x	--
k*G	5.39 us	9.67 us	12.78 us	2.37x	1.32x

x86-64 (i7-11700 @ 2.50 GHz, Clang 21.1.0)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	8.06 us	15.74 us	21.67 us	2.69x	1.38x
ECDSA verify	29.06 us	--	26.62 us	0.92x	--
Schnorr sign	6.42 us	13.59 us	17.07 us	2.66x	1.26x
Schnorr verify	28.67 us	--	27.72 us	0.97x	--
k*G	4.29 us	11.86 us	17.59 us	4.10x	1.48x

ARM64 (Cortex-A55 @ YF_022A, Clang 18.0.1 NDK r27)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	27.98 us	71.91 us	76.35 us	2.73x	1.06x
ECDSA verify	146.95 us	--	148.42 us	1.01x	--
Schnorr sign	20.11 us	64.00 us	64.95 us	3.23x	1.02x
Schnorr verify	147.59 us	--	149.06 us	1.01x	--
k*G	17.46 us	--	63.20 us	3.62x	--

RISC-V 64 (SiFive U74-MC @ 1.5 GHz, GCC 13.3.0, Milk-V Mars)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	81.25 us	159.25 us	164.12 us	2.02x	1.03x
ECDSA verify	235.50 us	--	221.37 us	0.94x	--
Schnorr sign	56.37 us	133.45 us	133.01 us	2.36x	1.00x
Schnorr verify	239.44 us	--	225.07 us	0.94x	--
k*G	40.60 us	--	125.05 us	3.08x	--

ESP32-S3 (Xtensa LX7 @ 240 MHz, GCC 14.2.0, ESP-IDF 5.5.1)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	7,600 us	7,951 us	9,538 us	1.25x	1.20x
ECDSA verify	18,446 us	--	29,329 us	1.59x	--
Schnorr sign	6,640 us	7,051 us	9,451 us	1.42x	1.34x
Schnorr verify	19,023 us	--	27,203 us	1.43x	--
k*G	6,273 us	--	7,214 us	1.15x	--

BIP-352 Silent Payments Pipeline (bench_bip352)

External standalone benchmark isolating the full BIP-352 scanning pipeline: k*P -> serialize -> tagged_SHA256 -> k*G -> point_add -> serialize -> prefix match. Equalized compiler flags (-O3 -march=native), 10K ops, 11 passes, median.

x86-64 (i5-14400F @ 2.50 GHz, GCC 14.2.0, Ubuntu 24.04)

Operation	libsecp256k1	UltrafastSecp256k1	Ratio
k*P (scalar mul)	21,083 ns	16,689 ns	1.26x
k*G (generator mul)	11,082 ns	5,149 ns	2.15x
k*G (precomputed tables)	--	4,396 ns	2.52x
Point addition	1,794 ns	1,336 ns	1.34x
Tagged SHA-256	457 ns	43 ns	10.6x
Serialize compressed	22 ns	9 ns	2.4x
Full pipeline	33,642 ns	25,079 ns	1.34x

ARM64 (Cortex-A55 @ YF_022A, Clang 18.0.3 NDK r27)

Operation	libsecp256k1	UltrafastSecp256k1	Ratio
k*P (scalar mul)	131,694 ns	130,596 ns	1.01x
k*G (generator mul)	59,199 ns	16,056 ns	3.69x
k*G (precomputed tables)	--	12,626 ns	4.69x
Point addition	8,161 ns	3,232 ns	2.52x
Tagged SHA-256	971 ns	431 ns	2.25x
Serialize compressed	44 ns	12 ns	3.7x
Full pipeline	200,289 ns	153,385 ns	1.31x

RISC-V 64 (SiFive U74-MC @ 1.5 GHz, GCC 13.3.0, Milk-V Mars)

Operation	libsecp256k1	UltrafastSecp256k1	Ratio
k*P (scalar mul)	196,364 ns	201,162 ns	0.98x
k*G (generator mul)	133,519 ns	45,016 ns	2.97x
k*G (precomputed tables)	--	38,995 ns	3.42x
Point addition	14,699 ns	5,449 ns	2.70x
Tagged SHA-256	5,227 ns	1,688 ns	3.10x
Serialize compressed	274 ns	131 ns	2.1x
Full pipeline	354,234 ns	257,996 ns	1.37x

Validation prefix: 0xb63b4601066a6971 (all platforms, both libraries match).

Detailed per-version changelogs follow below for historical reference.

[3.19.0] - 2026-03-04

No breaking changes -- drop-in upgrade from v3.18.x | ABI compatible
Focus: RISC-V constant-time hardening, L1 I-cache optimization, -Werror clean build

1. RISC-V Constant-Time Timing Leak Fixes

Root cause: dudect testing on SiFive U74 (in-order core) detected 5 persistent timing leaks (|t| > 10): field_sqr, scalar_is_zero, scalar_sub, scalar_window, ct_compare. The compiler (Clang 21) reordered instructions across barriers, and store-buffer retirement latency created data-dependent timing differences.
Fix (v1): Added "memory" clobber to RISC-V value_barrier() and explicit barriers at critical points in ct_field.cpp and ct_scalar.cpp.
Fix (v2): Reverted to register-only barrier asm volatile("" : "+r"(v)) on RISC-V. The "memory" clobber forced store-to-load-forwarding sequences with data-dependent retirement latency on U74's store buffer (zero-coalescing), creating false-positive dudect leaks in field_add, field_is_zero, and scalar_add. Register-only is sufficient on in-order cores because the pipeline cannot reorder past asm volatile.
rdcycle timer fix: Removed hardware fence from RISC-V rdcycle timer -- the fence drained the store buffer synchronously, capturing its data-dependent retirement latency and producing false-positive timing leaks. Matches x86 rdtscp and ARM64 cntvct_el0 behavior (neither drains the store buffer).
Files changed: cpu/include/secp256k1/ct/ops.hpp, cpu/src/ct_field.cpp, cpu/src/ct_scalar.cpp, audit/test_ct_sidechannel.cpp
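The register-only barrier from Fix (v2) is the one-line inline asm quoted above. The sketch below wraps it in a function and uses it in a mask-based select to show where such a barrier typically sits. It assumes a GCC/Clang-compatible compiler; ct_select is a hypothetical helper for illustration, not the project's API.

```cpp
#include <cstdint>

// Register-only optimization barrier: the empty asm statement claims to read
// and modify v in a register, so the compiler cannot reason about its value
// across this point. There is deliberately no "memory" clobber.
inline std::uint64_t value_barrier(std::uint64_t v) {
    asm volatile("" : "+r"(v));
    return v;
}

// Mask-based select: returns a when flag is true, b otherwise, without
// branching on flag in the source.
inline std::uint64_t ct_select(bool flag, std::uint64_t a, std::uint64_t b) {
    // 0 - 1 wraps to all-ones; 0 - 0 stays zero.
    const std::uint64_t mask = value_barrier(0 - static_cast<std::uint64_t>(flag));
    return (a & mask) | (b & ~mask);
}
```

Passing the mask through the barrier keeps the compiler from recognizing it as a boolean and turning the select back into a conditional branch.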

2. L1 I-Cache Optimization (ECDSA Verify)

Root cause: ECDSA verify performance was 0.82x vs libsecp256k1 due to L1 instruction cache thrashing. Aggressive inlining of point arithmetic functions (jac52_add, jac52_double, etc.) caused the hot verify loop to exceed L1 I-cache capacity (32 KB).
Fix: Added __attribute__((noinline)) to point add/double functions, reducing code size in the verify hot path below L1 I-cache threshold.
Performance: ECDSA verify ratio vs libsecp: 0.82x -> 0.92x (+12% improvement)
Files changed: cpu/src/point.cpp

3. Benchmark Diagnostics

Added Schnorr verify sub-operation diagnostics (SHA256, FE52_inv, parse_strict) to bench_unified.cpp for identifying verify bottlenecks.
Files changed: cpu/bench/bench_unified.cpp

4. Build Hardening

Fixed -Wsign-conversion warnings in ct_scalar.cpp SafeGCD divsteps_59() function. Added explicit static_cast for int64_t <-> uint64_t conversions that were previously implicit. Clean -Werror -Wall -Wextra -Wpedantic build.
Files changed: cpu/src/ct_scalar.cpp

x86-64 Benchmark Results (i7-11700 @ 2.50 GHz, Clang 21.1.0)

Operation	Ultra FAST	Ultra CT	libsecp256k1	Ratio (fast)	Ratio (CT)
ECDSA sign	8.06 us	15.74 us	21.67 us	2.69x	1.38x
ECDSA verify	29.06 us	--	26.62 us	0.92x	--
Schnorr sign	6.42 us	13.59 us	17.07 us	2.66x	1.26x
Schnorr verify	28.67 us	--	27.72 us	0.97x	--
k*G	4.29 us	11.86 us	17.59 us	4.10x	1.48x

[3.18.0] - 2026-03-04

No breaking changes -- drop-in upgrade from v3.17.x | ABI compatible
Focus: CT scalar inverse -- SafeGCD replaces Fermat chain; 6.4x faster ct::scalar_inverse, 43% faster CT ECDSA signing

CT Scalar Inverse: SafeGCD (Bernstein-Yang constant-time divsteps)

Root cause: ct::scalar_inverse() used Fermat's Little Theorem chain (a^{n-2} mod n) with 254 squarings + 40 multiplications = 294 scalar ops, costing ~10,650 ns. This single function accounted for ~40% of CT ECDSA sign latency.
Fix: Replaced with constant-time SafeGCD (Bernstein-Yang divsteps-59), port of secp256k1_modinv64 from bitcoin-core/secp256k1. Fixed 10 rounds x 59 branchless divsteps = 590 total. All loops fixed-count, all conditionals use bitmasks. No ctz, no early termination, no secret-dependent branches.
Performance: ct::scalar_inverse: 10,650 ns -> 1,671 ns (6.4x faster)
Impact on CT ECDSA Sign: 26,942 ns -> 15,360 ns (43% lower latency, about 1.75x faster)
CT vs libsecp ECDSA Sign ratio: 0.80x (slower than libsecp) -> 1.44x (faster than libsecp)
Fallback: Fermat chain preserved for platforms without __int128 (ESP32, etc.)
Files changed: cpu/src/ct_scalar.cpp, cpu/include/secp256k1/ct/scalar.hpp, cpu/bench/bench_unified.cpp
All 32 tests pass (excluding ct_sidechannel long-run)
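
The fixed-count, bitmask-only structure of divsteps can be illustrated with a toy sketch. This is not the library's `divsteps_59()` and not the batched matrix form used by secp256k1_modinv64: it runs the original Bernstein-Yang divstep (delta starting at 1) on small integers and only computes a gcd. It assumes `f` is odd, both inputs are non-negative and below 2^31, and that right shifts of negative values are arithmetic (guaranteed from C++20, true in practice earlier). The step count of 128 is a generous fixed bound chosen for this example.

```cpp
#include <cstdint>

struct DivstepState {
    std::int64_t delta, f, g;
};

// One divstep without branches:
//   if delta > 0 and g is odd: (delta, f, g) -> (1 - delta, g, (g - f) / 2)
//   otherwise:                 (delta, f, g) -> (1 + delta, f, (g + (g & 1) * f) / 2)
inline DivstepState divstep(DivstepState s) {
    const std::int64_t g_odd = -(s.g & 1);                // all-ones if g is odd
    const std::int64_t d_pos = ~((s.delta - 1) >> 63);    // all-ones if delta > 0
    const std::int64_t swap = g_odd & d_pos;              // all-ones if we swap
    const std::int64_t f_signed = (s.f ^ swap) - swap;    // -f when swapping, else f
    const std::int64_t g_next = (s.g + (f_signed & g_odd)) >> 1;  // sum is always even
    const std::int64_t f_next = s.f ^ ((s.f ^ s.g) & swap);       // g when swapping, else f
    const std::int64_t d_next = ((s.delta ^ swap) - swap) + 1;    // 1 - delta or 1 + delta
    return {d_next, f_next, g_next};
}

// Runs a fixed number of divsteps regardless of the inputs; once g reaches
// zero the remaining steps leave f unchanged, so f ends as +/- gcd(f, g).
inline std::int64_t gcd_fixed_steps(std::int64_t f, std::int64_t g) {
    DivstepState s{1, f, g};
    for (int i = 0; i < 128; ++i) {
        s = divstep(s);
    }
    const std::int64_t sign = s.f >> 63;  // all-ones if f is negative
    return (s.f ^ sign) - sign;           // branchless absolute value
}
```

A real modular inverse additionally tracks the transition matrices so that the inverse can be recovered at the end; that part is omitted here.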

[3.17.1] - 2026-03-05

No breaking changes -- drop-in upgrade from v3.17.0 | ABI compatible
Focus: CI reliability fixes -- precompute cache race condition + ASan buffer overread

1. Precompute Cache Atomic Write (bip340_vectors CI flake fix)

Root cause: save_precompute_cache_locked() wrote directly to cache_w{N}.bin. When CTest runs tests in parallel (-j$(nproc)), a reader process could see a partially-written cache file from another writer, loading corrupt precompute tables. This caused intermittent bip340_vectors failures where scalar_mul_generator() produced wrong results for large scalars (higher window tables not yet written).
Fix: Atomic write-then-rename pattern -- cache is written to cache_w{N}.bin.tmp.{pid}, then atomically renamed to cache_w{N}.bin. Readers always see either the old complete file or the new complete file.
Additional hardening: load_precompute_cache_locked() now validates expected file size (computed from header window_count * digit_count * 65 + sizeof(CacheHeader)) before reading point data. Truncated or partially-written files are rejected immediately.
Files changed: cpu/src/precompute.cpp
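
The write-then-rename pattern and the size check can be sketched as below. This is an illustration under assumptions, not the code in `precompute.cpp`: the function names are hypothetical, the caller supplies the unique suffix (the changelog uses the process ID), and C++17 `std::filesystem` is assumed. On POSIX systems `rename` replaces the destination atomically, which is what gives readers an all-or-nothing view.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Writes `data` to "<final_path>.tmp.<unique_suffix>" and then renames the
// temporary file over `final_path`. Returns false on any failure.
inline bool write_file_atomic(const fs::path& final_path,
                              const std::vector<std::uint8_t>& data,
                              const std::string& unique_suffix) {
    fs::path tmp = final_path;
    tmp += ".tmp." + unique_suffix;
    std::error_code ec;
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size()));
        out.flush();
        if (!out) {
            out.close();
            fs::remove(tmp, ec);
            return false;
        }
    }  // file is closed here, before the rename
    fs::rename(tmp, final_path, ec);
    if (ec) {
        fs::remove(tmp, ec);
        return false;
    }
    return true;
}

// Reader-side hardening: reject files whose size does not match the size
// implied by the header before touching the payload.
inline bool has_expected_size(const fs::path& p, std::uintmax_t expected) {
    std::error_code ec;
    const std::uintmax_t actual = fs::file_size(p, ec);
    return !ec && actual == expected;
}
```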

2. ASan Buffer Overread Fix (fuzz_address_bip32_ffi suite 15a)

Root cause: suite_15_ffi_ecdh_edge() passed uint8_t xpub[32] (32 bytes) to ufsecp_ecdh_xonly() which expects const uint8_t pubkey33[33] (33-byte compressed pubkey). ASan detected a 1-byte stack-buffer-overflow in FieldElement::parse_bytes_strict().
Fix: Changed buffer to uint8_t xpub[33] with fill_random(xpub, 33) to correctly match the API's 33-byte compressed pubkey parameter.
Files changed: audit/test_fuzz_address_bip32_ffi.cpp

[3.17.0] - 2026-03-04

No breaking changes -- drop-in upgrade from v3.16.x | ABI compatible
Focus: Track I Crypto Auditor Gaps -- industrial-grade hardening campaign (16/16 items DONE)

1. Secret Zeroization (I1)

[I1-1] ECDSA fast-path scalar zeroization -- ecdsa_sign() rewritten with single-cleanup-path structure; k, k_inv, z guaranteed secure_erase() on all code paths
[I1-2] RFC 6979 HMAC state zeroization -- rfc6979_nonce(): V, K, x_bytes, buf97 all zeroed before every return via secure_erase()
[I1-3] MuSig2 secret zeroization -- sk_bytes, aux_hash, t zeroed after XOR block in musig2_nonce_gen()
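
One way to get "erased on every return path" is a scope guard. The sketch below swaps in an RAII guard for illustration; the changelog describes a single-cleanup-path structure, and the library's actual `secure_erase()` is not shown here. All names are hypothetical, and the volatile byte loop is just one common way to discourage the compiler from eliding the wipe.

```cpp
#include <cstddef>

// Overwrites n bytes with zero through a volatile pointer so the stores
// are not removed as dead writes.
inline void secure_erase_sketch(void* p, std::size_t n) {
    volatile unsigned char* vp = static_cast<volatile unsigned char*>(p);
    for (std::size_t i = 0; i < n; ++i) {
        vp[i] = 0;
    }
}

// Erases the referenced buffer when the guard goes out of scope,
// including on early returns.
class ScopedErase {
public:
    ScopedErase(void* p, std::size_t n) : p_(p), n_(n) {}
    ~ScopedErase() { secure_erase_sketch(p_, n_); }
    ScopedErase(const ScopedErase&) = delete;
    ScopedErase& operator=(const ScopedErase&) = delete;

private:
    void* p_;
    std::size_t n_;
};

// Toy function with two exits; the scratch buffer is wiped on both.
inline int use_secret(bool fail_early, unsigned char* scratch, std::size_t n) {
    ScopedErase guard(scratch, n);
    if (n > 0) scratch[0] = 0x42;  // pretend this is secret material
    if (fail_early) return -1;     // early exit: guard still erases
    return 0;
}
```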

2. Fault Attack Countermeasures (I2)

[I2-1] ECDSA sign-then-verify -- ecdsa_sign() verifies signature before returning; failure zeroes result. Both fast and CT paths hardened
[I2-2] Schnorr sign-then-verify -- ct::schnorr_sign() verifies via schnorr_verify() before returning; failure returns empty signature

3. Test Vector Coverage (I3)

[I3-1] Google Wycheproof ECDSA -- audit/test_wycheproof_ecdsa.cpp: 89 test cases covering 10 categories (valid, invalid r/s, modified, boundary, wrong key/msg, infinity, high-S, known vectors, Schnorr invalid, degenerate). 89/89 passed
[I3-2] Google Wycheproof ECDH -- audit/test_wycheproof_ecdh.cpp: 36 test cases covering 7 categories (valid ECDH, infinity, off-curve, zero key, commutativity, point validation, variant consistency). 36/36 passed

4. API Misuse Resistance (I4)

[I4-1] MuSig2 nonce CT migration -- fast::Point::generator().scalar_mul() replaced with ct::generator_mul() in MuSig2 nonce generation
[I4-2] Point on-curve validation -- audited 18 deserialization paths; fixed 4 CRITICAL + 1 HIGH + 3 LOW missing validations
[I4-3] PrivateKey strong type -- new cpu/include/secp256k1/private_key.hpp: PrivateKey wrapping fast::Scalar, no implicit conversion, secure_erase in destructor, [[nodiscard]] accessors. CT overloads for ecdsa_sign, schnorr_pubkey, schnorr_keypair_create
[I4-4] aux_rand entropy contract -- comprehensive BIP-340 aux_rand documentation on schnorr_sign() (both overloads) and CT variant: CSPRNG requirement, XOR nonce hedging, all-zeros safety, reuse warnings

5. Formal Verification (I5)

[I5-1] Formal CT verification (Valgrind ctgrind) -- audit/test_ct_verif_formal.cpp: marks secrets as undefined via SECP256K1_CLASSIFY(), runs CT operations (ECDSA sign, Schnorr sign, field/scalar/point ops), then declassifies. Any secret-dependent branch triggers Valgrind error. Same technique as libsecp256k1 valgrind_ctime_test.c and BoringSSL constant_time_test
[I5-2] Fiat-Crypto direct linkage -- audit/test_fiat_crypto_linkage.cpp: Fiat-Crypto secp256k1_64 reference implementation (MIT License, Coq-extracted) embedded directly; 6085 cross-checks at function level (mul, sqr, add, sub, neg) with 100% output parity

6. Protocol-Level Hardening (I6)

[I6-1] Hedged ECDSA -- ecdsa_sign_hedged() + rfc6979_nonce_hedged() implementing RFC 6979 Section 3.6 with 32-byte aux_rand mixed into HMAC-DRBG. Both fast and CT variants. Sign-then-verify + secure_erase included
[I6-2] FROST BIP-387 compliance -- docs/FROST_COMPLIANCE.md: RFC 9591/BIP-FROST checkpoint matrix covering DKG, nonce generation, signing, verification, aggregation. 5 deviation notes, 4 recommendations
[I6-3] Batch verify randomness audit -- audit/test_batch_randomness.cpp: 1022 checks confirming hash-derived (SHA256) deterministic weights -- not PRNG-dependent. Bellare-Garay immune

7. Test Suite Growth

Test count: 29 -> 31 (added ct_verif_formal, fiat_crypto_linkage)
All 31 tests pass (31/31)

[3.16.1] - 2026-03-02

No breaking changes -- drop-in upgrade from v3.16.0 | ABI compatible
Focus: cross-platform benchmark campaign and audit on real hardware

1. Cross-Platform Benchmark Campaign (bench_hornet)

Benchmarked 4 platforms with an identical 6-operation apple-to-apple suite against bitcoin-core/libsecp256k1 v0.7.2:
- x86-64: Intel i7-11700 @ 2.50 GHz, Clang 21.1.0 (Windows)
- ARM64: Cortex-A55 (YF_022A), Clang 18.0.1 (Android NDK 27)
- RISC-V 64: SiFive U74-MC @ 1.5 GHz, GCC 13.3.0 (Milk-V Mars, real hardware)
- ESP32-S3: Xtensa LX7 @ 240 MHz, GCC 14.2.0 (ESP-IDF 5.5.1, real hardware)
Added a CT-vs-CT fair comparison since libsecp256k1 is always constant-time; the separate CT-vs-CT section shows true relative performance for signing operations
Generated 13 report files (JSON + TXT per platform, plus a cross-platform comparison)

2. Cross-Platform Audit Campaign (unified_audit_runner)

7 platform configurations, all AUDIT-READY (48/49, 46/46, 45/45, or 40/40 modules depending on platform):
- Windows x86-64 (Clang 21.1.0): 48/49 PASS
- Linux Docker x86-64 (GCC 13.3.0): 48/49 PASS
- Linux CI x86-64 (Clang 17.0.6): 46/46 PASS
- Linux CI x86-64 (GCC 13.3.0): 46/46 PASS
- Windows CI x86-64 (MSVC 1944): 45/45 PASS
- ESP32-S3 real hardware (GCC 14.2.0): 40/40 PASS (8 modules skipped as platform-incompatible)
- RISC-V 64 real hardware (GCC 13.3.0): 48/49 PASS (0 modules skipped, 1 advisory)
Updated PLATFORM_AUDIT.md with all 7 configurations

3. ARM64 Android Benchmark Port

bench_hornet_android.cpp -- complete bench_hornet port for ARM64 Android using clock_gettime, median-of-5 timing, and a 32-key pool
libsecp_bench.c -- libsecp256k1 apple-to-apple benchmark for Android NDK cross-compilation
android/CMakeLists.txt -- added C language support, bench_hornet target, and LIBSECP_SRC_DIR

4. RISC-V Benchmark on Real Hardware

Cross-compiled bench_hornet for rv64gc_zba_zbb and deployed to Milk-V Mars via SCP
Results: UltrafastSecp256k1 wins 4 of 6 operations (2.02x--3.08x faster), loses both Verify variants (0.94x)
CT-vs-CT comparison: signing operations are essentially tied (1.00x--1.03x); Verify remains at 0.94x

5. Build & CI Fixes

audit-report.yml -- scoped security-events permission to job level instead of workflow level
Dockerfile.ci -- pinned ubuntu:24.04 to a SHA-256 digest for reproducibility
sanitizer_scale.hpp -- added iteration scaling for sanitizer builds
ESP32 audit -- expanded sdkconfig, CMakeLists, and audit_main to support 40-module auditing
.gitignore -- expanded to exclude local scratch files, build logs, and temporary scripts

6. Documentation

BENCHMARKING.md -- complete guide covering how to run bench_hornet on all 4 platforms
AUDIT_GUIDE.md -- complete guide covering how to run the 40/48-module audit on any platform
Updated examples README with stability markers ([STABLE], [EXPERIMENTAL])

[3.16.0] - 2026-03-01

No breaking changes -- drop-in upgrade from v3.15.x | ABI compatible

1. Security Hardening

BIP-340 strict parsing -- added Scalar::parse_bytes_strict, FieldElement::parse_bytes_strict, and SchnorrSignature::parse_strict, which reject all malformed inputs (#73)
CT buffer erasure -- ct::schnorr_sign and ct::ecdsa_sign now erase intermediate nonces via a volatile function-pointer trick (same technique used by libsecp256k1)
lift_x deduplication -- consolidated duplicated code in Schnorr verify/sign into a single static lift_x() helper
Y-parity fix -- switched to limbs()[0] & 1 instead of a byte-level parity check
Pragma balance fix -- removed a misbalanced #pragma GCC diagnostic push/pop pair in ct_field.cpp
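
For the Y-parity item above, the equivalence between the two checks can be sketched on a hypothetical layout: four little-endian 64-bit limbs holding a fully reduced value. The library's actual limb representation may differ (for example 5x52), and the shortcut is only valid when the element is normalized.

```cpp
#include <array>
#include <cstdint>

// Hypothetical representation: value = l[0] + l[1]*2^64 + l[2]*2^128 + l[3]*2^192.
using Limbs = std::array<std::uint64_t, 4>;

// Big-endian 32-byte serialization (the slower route to the parity bit).
inline std::array<std::uint8_t, 32> to_bytes_be(const Limbs& l) {
    std::array<std::uint8_t, 32> out{};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 8; ++j) {
            out[31 - (8 * i + j)] = static_cast<std::uint8_t>(l[i] >> (8 * j));
        }
    }
    return out;
}

// Parity via serialization: inspect the last (least significant) byte.
inline bool is_odd_via_bytes(const Limbs& l) {
    return (to_bytes_be(l)[31] & 1) != 0;
}

// Parity directly from the lowest limb: no serialization needed.
inline bool is_odd_via_limb(const Limbs& l) {
    return (l[0] & 1) != 0;
}
```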

2. Audit Infrastructure

Advisory flag -- marked ct_sidechannel_smoke as advisory in unified_audit_runner so that timing flakes on shared CI runners do not fail the overall audit
carry_propagation cross-validation -- the test now verifies the generator-optimized path against the generic GLV path and prints hex diagnostics on ARM64 mismatch
BIP-340 strict test suite -- added 31 tests covering reject-zero, reject-overflow, reject-p-plus, and accept-valid scenarios for all strict parsing APIs

3. Local CI (Docker)

docker-compose.ci.yml -- single-command orchestration for all 14 CI jobs
pre-push target -- docker compose run --rm pre-push validates warnings, tests, ASan, and the audit in approximately 5 minutes
audit job -- docker/run_ci.sh audit mirrors audit-report.yml (GCC-13 + Clang-17)
ccache integration -- Docker volume persistence for fast rebuilds
pre-push hook -- scripts/hooks/pre-push blocks pushes on CI failure
PowerShell wrapper -- scripts/pre-push-ci.ps1 for Windows users

4. Documentation

COMPATIBILITY.md -- BIP-340 strict encoding compatibility notes
BINDINGS_ERROR_MODEL.md -- BIP-340 strict semantics for binding authors
SECURITY.md -- updated Memory Handling section (library-side erasure), Planned items checklist, and API Stability references
UFSECP_BITCOIN_STRICT -- new CMake option to enforce strict-only parsing at compile time

5. Build & CI

packaging.yml -- fixed a release workflow race condition (gh release upload with retry)
C ABI -- ufsecp_schnorr_verify, ufsecp_schnorr_sign, and ufsecp_xonly_pubkey_parse now use strict parsing internally

6. CT Verification CI

ct-arm64.yml -- native ARM64 / Apple Silicon dudect on macos-14 M1: smoke tests per-PR, full suite nightly
ct-verif.yml -- compile-time constant-time verification via the ct-verif LLVM pass (deterministic, not statistical)
valgrind-ct.yml -- Valgrind VALGRIND_MAKE_MEM_UNDEFINED taint analysis that detects secret-dependent branches at the binary level
MuSig2/FROST dudect -- protocol-level timing tests for musig2_partial_sign, frost_sign, and frost_lagrange_coefficient

7. Audit Infrastructure (SARIF & Regression)

SARIF output -- unified_audit_runner --sarif generates SARIF v2.1.0 reports for GitHub Code Scanning
bench-regression.yml -- per-commit performance regression gate with a 120% threshold (fail-on-alert)
audit-report.yml -- now uploads SARIF results to GitHub Code Scanning (linux-gcc job)

8. OpenSSF Scorecard Hardening

Pinned actions -- all GitHub Actions pinned to full SHA (codeql-action v4.32.4, upload-artifact v6.0.0)
harden-runner -- added to discord-commits and packaging RPM jobs
persist-credentials: false -- applied to all checkout steps with write permissions (benchmark, docs, packaging, release, bench-regression)
Standardized versions -- audited and hardened 13 workflow files

9. FROST RFC 9591 Protocol Invariant Tests

test_rfc9591_invariants -- verifies 7 ciphersuite-independent invariants: verification_share = signing_share * G, Lagrange interpolation of Y_i, Feldman VSS commitment, partial signature linearity, partial signature verification, wrong-share rejection, and nonce commitment consistency
test_rfc9591_3of5 -- exhaustive 3-of-5 FROST signing across all C(5,3) = 10 participant subsets with BIP-340 verification
valgrind_ct_check.sh -- fixed binary path (audit/ instead of cpu/) for test_ct_sidechannel_standalone
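
The exhaustive 3-of-5 coverage in test_rfc9591_3of5 amounts to iterating over every 3-element subset of participants 1..5. A minimal sketch of that enumeration follows; the function name is hypothetical and the signing itself is left out.

```cpp
#include <array>
#include <vector>

// Returns all C(5,3) = 10 participant subsets in lexicographic order.
inline std::vector<std::array<int, 3>> subsets_3_of_5() {
    std::vector<std::array<int, 3>> out;
    for (int a = 1; a <= 5; ++a) {
        for (int b = a + 1; b <= 5; ++b) {
            for (int c = b + 1; c <= 5; ++c) {
                out.push_back(std::array<int, 3>{a, b, c});
            }
        }
    }
    return out;
}
```

A test harness would run one full signing round per returned subset and verify the aggregated signature each time.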

10. Audit UX

audit_check.hpp -- centralized CHECK macro with a 20-character ASCII progress bar ([####................] N OK), reporting every 4096 iterations
22 audit .cpp files -- migrated from per-file CHECK macros to the shared audit_check.hpp
Windows stdout fix -- applied setvbuf(stdout, nullptr, _IONBF, 0) for unbuffered output on Windows (avoids _IOLBF crash)

11. New Audit Modules

test_musig2_bip327_vectors.cpp -- 35 BIP-327 MuSig2 reference tests covering key aggregation, nonce aggregation, signing, and verification
test_ffi_round_trip.cpp -- 103 FFI round-trip boundary tests covering Schnorr, ECDSA, pubkey, ECDH, tweaking, and error paths
test_fiat_crypto_vectors.cpp -- expanded to 752 cross-checks of field arithmetic against the Fiat-Crypto reference implementation

12. Community

ADOPTERS.md -- added production, development, and hobby adopter categories
GitHub Discussion templates -- added Q&A, Show-and-Tell, Ideas, and Integration Help categories

[3.15.3] - 2026-03-01

Fixed -- Code Quality (136 code scanning alerts resolved)

bench_hornet.cpp -- 73 fixes covering const-correctness, braces, cert-err33-c, modernize-use-auto, implicit-widening, reserved-identifier, and init-variables
glv.cpp -- 33 fixes for const-correctness in the GLV_MULADD macro and k_arr array
audit_integration.cpp -- 10 fixes for const-correctness, cert-err33-c, and sizes[] arrays
point.cpp -- 5 fixes for const-correctness on Jacobian addition intermediates
precompute.cpp -- 2 fixes: modernize-use-auto and simplify-boolean-expr
Dismissed 12 containerOutOfBounds alerts (Cppcheck false positives)

[3.15.1] - 2026-03-01

Fixed -- Build Compatibility (MSVC / WASM / armv7 / GCC -Wpedantic)

schnorr.cpp -- FieldElement::from_bytes() was called with const uint8_t* instead of const std::array<uint8_t,32>&; added a copy into std::array before the call. This had broken MSVC, WASM, and armv7 builds.
glv.cpp -- suppressed the GCC -Wpedantic warning for the __int128 extension type using #pragma GCC diagnostic push/pop.
glv.cpp -- removed the unused mul_shift_384 runtime function in the __int128 path (only the template mul_shift_384_const is used).

[3.15.0] - 2026-03-01

104 commits since v3.14.0 | 368 files changed | +45,388 / -7,639 lines
No breaking changes -- drop-in upgrade from v3.14.0 | ABI compatible -- SOVERSION unchanged

1. Security & Constant-Time Hardening

Schnorr parity fix -- corrected the parity bit computation in BIP-340 signatures (#48)
Z=0 guard deduplication -- added point edge-case tests (#49)
CT branchless scalar_window -- branchless implementation on RISC-V, branched on x86/ARM (#42--#44)
value_barrier -- added after mask derivation in ct_compare, plus a WASM KAT target
is_zero_mask -- RISC-V branchless assembly with triple barrier and rdcycle fence
reverse-scan ct_compare -- uses an interleaved test data pattern

2. WASM / Emscripten Support

SECP256K1_NO_INT128 -- automatically defined on Emscripten
SECP256K1_FAST_52BIT -- disabled for Emscripten targets
Precompute generator bypass -- avoids timeouts on WASM
GLV+Shamir fallback -- replaced wNAF w=5 with an optimal double-and-add implementation
KAT test -- resolved SINGLE_FILE=1 and ESM conflicts

3. CI/CD Infrastructure

OpenSSF Scorecard -- pinned all actions to SHA and added harden-runner (#52)
pip deps pinned by hash -- improved supply chain security (#52)
ClusterFuzzLite -- integrated with UBSan vptr sanitizer compatibility
Cppcheck + Mutation testing + SARIF -- added new CI workflows
Fuzz + Protocol tests -- enabled in all CI jobs

4. Code Quality (~5,150 alerts fixed)

~4,600 code scanning alerts -- mass cleanup (#53)
~550 code scanning alerts -- batch 2 (#56)
Duplicate const qualifiers -- fixed GCC-13 build failure (#54, #55)
using declarations -- restored declarations removed by clang-tidy, required for MSVC/ESP32/WASM (#57)
audit_field.cpp -- fixed unused variable -Werror failure (#58)

5. SonarCloud Quality Gate

SHA-256 SonarCloud blocker -- suppressed S3519 buf_ overflow false positive (#50, #60, #61)
Coverage -- raised from 61.8% to 85.8% (exclusions: audit/, include/ufsecp/) (#59)
Duplication -- reduced from 3.3% to below threshold via CT variant CPD exclusion (#61)
cpp:S876 -- suppressed CT masking unsigned negation warning (#59)
Codecov exclusions -- corrected configuration (#50)

6. Audit Framework

A--M audit framework -- complete audit scripts with a cross-platform test plan
audit_ct -- raised timing sanity threshold for CI from 1.5x to 2.0x
AUDIT_COVERAGE.md -- full CI infrastructure documentation
Unified runner + CI workflow -- added evidence collection scripts

7. Testing

MuSig2 + FROST -- advanced protocol tests (Phase II)
Parser fuzz -- DER, Schnorr, and Pubkey fuzzing
Cross-library differential test -- verified against bitcoin-core/libsecp256k1
Address/BIP32/FFI fuzz tests -- added
FROST KAT tests -- added
Point edge-case tests -- added
FE52 Jacobian is_on_curve -- added for FAST_52BIT platforms
FieldElement::operator== normalize -- handles non-canonical limb values correctly

8. Build & Platform

MSVC -- added SECP256K1_NOINLINE macro and fixed s_gen4 race condition
Reproducible builds -- signed releases with SBOM
Fuzz point -- avoided precomputed-table timeouts under sanitizers

9. Performance -- ECDSA Recovery (1.9x speedup)

ecdsa_recover() rewritten -- replaced 3 separate scalar multiplications (s*R, z*G, r^-1 * result) with a single call to dual_scalar_mul_gen_point(u1, u2, R) using 4-stream GLV Strauss with interleaved wNAF. Recovery now matches libsecp256k1 performance (~36 us vs. the previous ~69 us).
lift_x() parity optimization -- replaced to_bytes() serialization (32-byte encode) with a direct limbs()[0] & 1 parity check for y-coordinate odd/even detection.
Dudect cache artifact false positives -- fixed 11 smoke-test false positives in constant-time side-channel tests by tightening thresholds and isolating cache effects.

10. Platform Assembly

ARM64 -- added CSEL branchless conditionals and EXTR optimization for field squaring.
RISC-V -- applied preload optimization for field multiply assembly and reduced register pressure in field_asm52_riscv64.S.
Field operations -- refactored field.cpp with improved Montgomery path selection.

11. Apple-to-Apple Benchmark

bench_apple_to_apple -- definitive head-to-head benchmark against libsecp256k1 v0.6.0 covering 13 operations with the same compiler, flags, and assembly. Uses IQR outlier removal and median-of-11 passes. Result: 7 FASTER, 5 EQUAL, 0 SLOWER (geometric mean 0.68x = UltrafastSecp256k1 is 1.47x faster on the 13-op suite). Note: this geometric mean is not weighted by real-world operation frequency. Workloads dominated by k*G (signing, key generation) will see the full benefit; workloads dominated by k*P (ECDH, BIP-352 scanning, key tweaking) may not -- see bench_unified ratio table for per-operation breakdown (#87).

12. Documentation & Bindings (continuation of v3.14.0)

Release artifacts -- signed SHA256SUMS manifest with verification instructions
ABI versioning policy -- documented
7 Phase III documents -- covering audit, invariants, bug bounty, thread safety, and more
User guide + FAQ -- added

[3.14.0] - 2026-02-25

Added -- Language Bindings (12 languages, 41-function C API parity)

Java -- 22 new JNI functions and 3 helper classes (RecoverableSignature, WifDecoded, TaprootOutputKeyResult): full coverage of ECDSA sign/verify, DER encoding, recovery, ECDH, Schnorr, BIP-32, BIP-39, taproot, WIF, address encoding, and tagged hash
Swift -- 20 new functions: DER encode/decode, recovery sign/recover, ECDH, tagged hash, BIP-32/39, taproot, WIF, and address encoding
React Native -- 15 new functions: DER, recovery, ECDH, Schnorr, BIP-32/39, taproot, WIF, address, and tagged hash
Python -- 3 new functions: ctx_clone(), last_error(), last_error_msg()
Rust -- 2 new functions: last_error(), last_error_msg()
Dart -- 1 new function: ctx_clone()
Go, Node.js, C#, Ruby, PHP -- already complete (verified; no changes needed)
9 new binding READMEs -- for c_api, dart, go, java, php, python, ruby, rust, and swift
Selftest report API -- added SelftestReport and SelftestCase structs in selftest.hpp; tally() refactored for programmatic reporting

Fixed -- Documentation & Packaging

Package naming corrected across all documentation -- renamed libsecp256k1-fast* to libufsecp* (apt, rpm, arch); CMake target secp256k1-fast-cpu to secp256k1::fast; linker flag -lsecp256k1-fast-cpu to -lfastsecp256k1; pkg-config Libs -lsecp256k1-fast-cpu to -lfastsecp256k1
RPM spec renamed -- from libsecp256k1-fast.spec to libufsecp.spec
Debian control -- source libufsecp, binary packages libufsecp3/libufsecp-dev
Arch PKGBUILD -- pkgname=libufsecp, provides=('libufsecp')
3 existing binding READMEs fixed -- Node.js, C#, and React Native: removed inaccurate CT-layer claims (the C API uses the fast:: path only)
README dead link -- fixed INDUSTRIAL_ROADMAP_WORKING.md to point to ROADMAP.md

Fixed -- CI / Build

-Werror=unused-function -- added [[maybe_unused]] to get_platform_string() in selftest.cpp
Scorecard CI -- pinned ubuntu:24.04 by SHA digest in Dockerfile.local-ci

[3.13.1] - 2026-02-24

Fixed

Critical: GLV decomposition overflow in ct::scalar_mul() -- ct_mul_256x_lo128_mod used a single-phase reduction (256x128-bit), which overflowed when GLV's c1/c2 rounded to exactly 2^128. Additionally, the lambda*k2 computation only read 2 lower limbs of k2_abs, silently dropping limb[2]=1. This caused incorrect results for approximately 5 out of 64 random scalar inputs. Replaced with a full ct_scalar_mul_mod_n(): 4x4 schoolbook multiply producing an 8-limb product, followed by 3-phase reduce_512 (512 -> 385 -> 258 -> 256 bits), matching libsecp256k1's algorithm. Both the 5x52 (__int128) and 4x64 (portable U128/mul64) paths were fixed.
GLV constant minus_b2 -- changed from a 128-bit b2_pos to a full 256-bit Scalar(n - b2), and updated the decomposition formula from scalar_sub(p1, p2) to scalar_add(p1, p2) since both constants are already negated
-Werror=unused-function -- added [[maybe_unused]] to diagnostic helpers print_scalar() and print_point_xy() in diag_scalar_mul.cpp
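
The 4x4 schoolbook multiply mentioned in the GLV fix above can be sketched as follows. This is an illustration only, assuming the GCC/Clang `unsigned __int128` extension and little-endian 64-bit limbs; it is not the library's `ct_scalar_mul_mod_n()`, and the three-phase reduction modulo n is omitted. The point is that all four limbs of both operands contribute, so an operand equal to exactly 2^128 (limb[2] = 1) is handled correctly.

```cpp
#include <array>
#include <cstdint>

using u128 = unsigned __int128;  // GCC/Clang extension

// Full 256x256 -> 512-bit product, limbs in little-endian order.
inline std::array<std::uint64_t, 8> mul_256x256(const std::array<std::uint64_t, 4>& a,
                                                const std::array<std::uint64_t, 4>& b) {
    std::array<std::uint64_t, 8> r{};
    for (int i = 0; i < 4; ++i) {
        std::uint64_t carry = 0;
        for (int j = 0; j < 4; ++j) {
            // a[i]*b[j] + r[i+j] + carry is at most 2^128 - 1, so it fits in u128.
            const u128 t = static_cast<u128>(a[i]) * b[j] + r[i + j] + carry;
            r[i + j] = static_cast<std::uint64_t>(t);
            carry = static_cast<std::uint64_t>(t >> 64);
        }
        r[i + 4] = carry;  // this limb has not been written by earlier rows
    }
    return r;
}
```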

Removed

Dead code: ct_mul_lo128_mod() and ct_mul_256x_lo128_mod() (replaced by ct_scalar_mul_mod_n)

Performance

CT scalar_mul overhead vs. the fast path: 1.05x (25.3 us vs. 24.0 us) -- no regression

[3.13.0] - 2026-02-24

Added

BIP-32 official test vectors TV1--TV5 -- 90 comprehensive checks covering master key derivation, hardened/normal child paths, and public-only derivation chains (test_bip32_vectors.cpp)
Nightly CI workflow -- daily extended verification with a differential correctness check using a 100x multiplier (~1.3M checks) and dudect full-mode statistical analysis (30 min, t=4.5 threshold)
Differential test CLI/env multiplier -- differential_test accepts --multiplier=N or the UFSECP_DIFF_MULTIPLIER environment variable; default 1 preserves existing CI behavior

Fixed

BIP-32 public key decompression -- public_key() now correctly decompresses from the compressed prefix + x-coordinate via the y^2 = x^3 + 7 square root with a parity check; previously it treated the x-coordinate as a scalar, producing incorrect public keys for public-only derivation
pub_prefix field in ExtendedKey -- now stores the y-parity byte (0x02/0x03) across to_public(), derive_child(), and serialize() for correct compressed public key round-trip
SonarCloud ct_sidechannel exclusion -- changed -E ct_sidechannel to the exact-match -E "^ct_sidechannel$" to prevent accidental exclusion of other tests

[3.12.3] - 2026-02-24

Fixed

Valgrind "still reachable" false positives -- added valgrind.supp suppression file for precomputed wNAF/comb table allocations that are intentionally kept for program lifetime
CTest memcheck integration -- switched from enable_testing() to include(CTest) for proper Valgrind memcheck support
Security audit CI -- added --suppressions flag and exact-match ct_sidechannel exclusion in Valgrind step
ASan heap-buffer-overflow in dudect smoke mode -- fixed buffer overread in timing analysis
aarch64 cross-compilation -- added missing toolchain file for ARM64 CI builds

[3.12.2] - 2026-02-24

Security

Branchless ct_compare -- rewritten with bitwise arithmetic and asm volatile value barriers; dudect |t| dropped from 22.29 to 2.17, eliminating a timing side-channel leak
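
A branchless comparison of this kind can be sketched as below. The function name and signature are hypothetical, and the sketch omits the value barriers the entry mentions, so a sufficiently aggressive compiler could still introduce branches. It compares two big-endian byte strings, scanning from the least significant byte so that a more significant difference overrides earlier ones.

```cpp
#include <cstddef>
#include <cstdint>

// Returns -1 if a < b, 0 if a == b, 1 if a > b (big-endian byte strings).
// The loop always visits every byte and uses only arithmetic and masks.
inline int ct_compare_sketch(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::uint32_t lt_acc = 0;  // 1 if the most significant difference has a < b
    std::uint32_t gt_acc = 0;  // 1 if the most significant difference has a > b
    for (std::size_t i = n; i-- > 0;) {
        const std::uint32_t x = a[i];
        const std::uint32_t y = b[i];
        const std::uint32_t lt = (x - y) >> 31;    // 1 if x < y (subtraction wraps)
        const std::uint32_t gt = (y - x) >> 31;    // 1 if x > y
        const std::uint32_t keep = (lt | gt) - 1;  // all-ones if the bytes are equal
        lt_acc = (lt_acc & keep) | lt;
        gt_acc = (gt_acc & keep) | gt;
    }
    return static_cast<int>(gt_acc) - static_cast<int>(lt_acc);
}
```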

Fixed

SonarCloud coverage collection -- now uses run_selftest as the primary llvm-cov binary (it links the full library); the coverage report now reflects actual test execution
Dead code elimination in precompute.cpp -- RDTSC() gated behind SECP256K1_PROFILE_DECOMP; multiply_u64/mul64x64/mul_256 unified to call _umul128() instead of duplicating __int128 inline
GCC #pragma clang diagnostic warnings -- wrapped in #ifdef __clang__ guards in 3 test files
GCC -Wstringop-overflow -- bounds check in base58check_encode (address.cpp)
All -Werror warnings resolved -- 41 files across library, tests, and benchmarks
Clang-tidy CI -- filters .S assembly files out of the analysis and adds --quiet and parallel xargs
Unused variable -- removed compressed in bip32.cpp to_public()

Changed

const on hot-path intermediates -- ~60 FieldElement52 write-once variables in point.cpp marked const
Benchmark exclusion -- sonar-project.properties excludes benchmark files from coverage calculation
CPD minimum tokens -- set to 100 in sonar-project.properties

Added

GOVERNANCE.md -- BDFL governance model with continuity plan (bus factor)
ROADMAP.md -- 12-month project roadmap (Mar 2026 - Feb 2027)
CONTRIBUTING.md -- Developer Certificate of Origin (DCO) requirement
OpenSSF Best Practices badge -- added to README
Code scanning fixes -- resolved alerts #281, #282

[3.12.1] - 2026-02-23

Security

Bumped wheel 0.45.1 -> 0.46.2 -- fixes CVE-2026-24049 (path traversal in wheel unpack)
Bumped setuptools 75.8.0 -> 78.1.1 -- fixes CVE-2025-47273 (path traversal via vendored wheel)

Changed

VERSION.txt updated to 3.12.1

[3.12.0] - 2026-02-23

Security -- CI/CD Hardening & Supply-Chain Protection

SHA-pinned all GitHub Actions -- every action uses immutable commit SHA instead of mutable tags
Harden Runner -- step-security/harden-runner v2.14.2 on every CI job (egress audit)
CodeQL -- upgraded to v4.32.4, job-level security-events: write, custom query filters
OpenSSF Scorecard -- daily scorecard workflow with SARIF upload
SonarCloud -- CI-based code quality analysis with build-wrapper
pip hash pinning -- --require-hashes on all pip install steps in release/CI workflows
Dependabot -- configured for GitHub Actions, pip, npm, NuGet, Cargo ecosystems
Branch protection -- required reviews, dismiss stale, strict status checks on main

Fixed

66+ code scanning alerts resolved -- unused variables, permissions, hardcoded credentials, scorecard findings
StepSecurity remediation -- merged PR #25 with fixes for GHA best practices

Changed

Dependabot PRs #26-#32 merged -- codeql-action v4.32.4, setup-dotnet v5.1.0, upload-artifact v6.0.0, download-artifact v7.0.0, scorecard-action v2.4.3, attest-build-provenance v3.2.0, sonarqube-scan-action v7.0.0
Rust workspace Cargo.toml -- added for Dependabot Cargo ecosystem support

Added

docs/CODING_STANDARDS.md -- comprehensive coding standards for OpenSSF CII badge
CONTRIBUTING.md requirements section -- explicit contribution requirements with links
Full AGPL-3.0 LICENSE text -- replaced summary with standard text for GitHub license detection

[3.11.0] - 2026-02-23

Performance -- Effective-Affine & RISC-V Optimization

Effective-affine GLV table -- batch-normalize P-multiples to affine in scalar_mul_glv52, eliminating Z-coordinate arithmetic from the main loop. Point Add 821->159 ns on x86-64.
RISC-V auto-detect CPU -- CMake reads /proc/cpuinfo uarch field to set -mcpu=sifive-u74 automatically. 28-34% speedup on Milk-V Mars (Scalar Mul 235->154 us).
RISC-V ThinLTO propagation -- ARCH_FLAGS propagated via INTERFACE compile+link options so ThinLTO codegen uses correct CPU scheduling at link time.
RISC-V Zba/Zbb fix -- explicit -march=rv64gc_zba_zbb alongside -mcpu since Clang's sifive-u74 model omits these extensions.
ARM64 10x26 field representation -- verified as optimal for Cortex-A76 (74 ns mul vs 100 ns with 5x52).

Performance -- Embedded

SafeGCD30 field inverse -- GCD-based modular inverse for non-__int128 platforms: ESP32 118 us (was 3 ms).
SafeGCD30 scalar inverse -- same technique for scalar field; optimized SHA-256/HMAC/RFC-6979 for embedded.
ESP32 4-stream GLV Strauss -- parallel endomorphism streams + Z^2-verify optimization.
CT layer optimizations -- comprehensive CT optimization pass for embedded targets.

Changed

Unified benchmark harness -- all 4 bench binaries share common framework with IQR outlier removal and RDTSCP/chrono auto-selection.
CMake 4.x compatibility -- standalone build support with cmake_minimum_required(VERSION 3.18) + project-level CTest.
Disable RISC-V FE52 asm -- C++ __int128 inline is 26-33% faster than hand-written FE52 assembly on RISC-V.
Benchmark data refresh -- all platforms re-measured: x86-64 (Clang 21), ARM64 (RK3588), RISC-V (Milk-V Mars).
Remove competitor comparison tables -- benchmarks show only UltrafastSecp256k1 results.

Added

Lightning donation -- shrec@stacker.news badge in README.
ARM64 5x52 MUL/UMULH kernel -- interleaved multiply for exploration (10x26 remains default).
ESP32 comprehensive benchmark -- full benchmark matching x86 format.

Fixed

CI Unicode cleanup -- replaced all Unicode characters with ASCII across codebase.
CI benchmark parse fix -- reset baseline for Unicode-free benchmark output.
Orphaned submodule -- removed stale cpu/secp256k1 submodule entry.

Acknowledgments

Stacker News, Delving Bitcoin, and @0xbitcoiner for community support.

[3.10.0] - 2026-02-21

Performance -- CT Hot-Path Optimization (Phases 5-15)

5x52 field representation -- switched point internals from 4x64 to FieldElement52, enabling __int128 lazy reduction across all CT operations
Direct asm bypass -- CT field_mul/field_sqr now call hand-tuned 5x52 multiply/square directly: 70 ns -> 33 ns
GLV endomorphism -- CT scalar_mul via lambda-decomposition + interleaved double-and-add: 304 us -> 20 us
CT generator_mul precomputed table -- 16-entry precomputed-G table with batch inversion: 310 us -> 9.8 us (31x speedup)
Batch inversion + Brier-Joye unified add -- Montgomery's trick for multi-point normalization
Hamburg signed-digit + batch doubling -- compact signed-digit recoding with merged double passes
128-bit split + w=15 for G-stream verify -- Shamir-style dual-stream with wider window: ~14% verify speedup
AVX2 CT table lookup -- _mm256_cmpeq_epi64 + _mm256_and_si256 constant-time table scan
Effective-affine P table -- batch-normalize P-multiples to skip Z-coordinate arithmetic in main loop
Schnorr keypair/pubkey caching + FE52 sqrt -- avoid redundant serialization in sign/verify
FE52-native inverse + isomorphic table build + GCD inv_var -- SafeGCD field inverse stays in 52-bit form
Format conversion elimination -- removed to_fe()/from_fe() round-trips on every CT hot path
Redundant normalize elimination -- ct_field_mul_impl/square_impl produce already-reduced results
Schnorr X-check + Y-parity combined -- single Z-inverse for both x-coordinate check and y-parity in FE52
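
The constant-time table scan named in the AVX2 entry above can be illustrated in portable scalar form. This is only a sketch under stated assumptions: `TableEntry` and `ct_table_lookup_sketch` are hypothetical names, the four-limb entry layout is assumed, and the library's actual code uses AVX2 intrinsics rather than this loop.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 4-limb table entry; the library's real entry layout is not
// shown in this changelog.
struct TableEntry {
    std::uint64_t limb[4];
};

// Touch every entry and keep only the one whose position equals `index`.
// The sequence of loads does not depend on `index`; only the mask does.
inline TableEntry ct_table_lookup_sketch(const TableEntry* table,
                                         std::size_t n,
                                         std::size_t index) {
    TableEntry out{};  // zero-initialized accumulator
    for (std::size_t i = 0; i < n; ++i) {
        // diff is zero only when i == index.
        const std::uint64_t diff = static_cast<std::uint64_t>(i ^ index);
        // (diff | -diff) has its top bit set exactly when diff != 0.
        const std::uint64_t nonzero = (diff | (std::uint64_t{0} - diff)) >> 63;
        // All ones when i == index, all zeros otherwise.
        const std::uint64_t mask = nonzero - 1;
        for (int k = 0; k < 4; ++k) {
            out.limb[k] |= table[i].limb[k] & mask;
        }
    }
    return out;
}
```

An out-of-range `index` yields an all-zero entry in this sketch. Whether a compiler preserves the branch-free form is not guaranteed by the language, which is one reason timing tests such as dudect are run on the real code.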

Performance -- I-Cache Optimization

noinline on jac52_add_mixed_inplace -- prevents inlining of 800+ byte function body into tight loops: 59% I-cache miss reduction

Fixed

scalar_mul_glv52 infinity guard -- early return on base.is_infinity() || scalar.is_zero() prevents zero-inverse crash in Montgomery batch trick (CI #128-131 regression)
CT complete_add fallback -- uses affine x()/y() instead of raw Jacobian X()/Y()
MSVC fallback -- field_neg arity, is_equal_mask, GLV decompose, y_bytes redefinition
Cross-platform FE52 guard -- SECP256K1_FAST_52BIT gating prevents compilation on 32-bit targets

Changed

Dead code removal -- removed functions superseded by Z-ratio normalization path
Barrett -> specialized GLV multiplies -- replaced generic Barrett reduction with curve-specific multiply

CI / Infrastructure

npm/nuget publishing fix -- corrected CI workflow for package publishing
Comprehensive audit suite -- 8 suites, 641K checks, cryptographic correctness validation
CT operations benchmark -- bench_ct_vs_libsecp with per-operation ns/op and throughput
dudect timing test -- side-channel timing leakage detection for CT operations
Doxyfile version auto-injection -- VERSION.txt -> Doxyfile at configure time

[3.6.0] - 2026-02-20

Added -- GPU Signature Operations (CUDA)

ECDSA Sign on GPU -- ecdsa_sign_batch_kernel with RFC 6979 deterministic nonces, low-S normalization. 204.8 ns / 4.88 M/s per signature.
ECDSA Verify on GPU -- ecdsa_verify_batch_kernel with Shamir's trick + GLV endomorphism. 410.1 ns / 2.44 M/s per verification.
ECDSA Sign Recoverable on GPU -- ecdsa_sign_recoverable_batch_kernel with recovery ID computation. 311.5 ns / 3.21 M/s.
ECDSA Recover on GPU -- ecdsa_recover_batch_kernel for public key recovery from signature + recid.
Schnorr Sign (BIP-340) on GPU -- schnorr_sign_batch_kernel with tagged hash midstates. 273.4 ns / 3.66 M/s.
Schnorr Verify (BIP-340) on GPU -- schnorr_verify_batch_kernel with x-only pubkey verification. 354.6 ns / 2.82 M/s.
6 new batch kernel wrappers in secp256k1.cu -- all with __launch_bounds__(128, 2) matching scalar_mul kernels.
5 GPU signature benchmarks in bench_cuda.cu -- ECDSA sign, verify, sign+recid, Schnorr sign, Schnorr verify.
prepare_ecdsa_test_data() helper -- generates valid signatures on GPU for verify benchmark correctness.

To our knowledge, no other open-source GPU library provides secp256k1 ECDSA + Schnorr sign/verify, and this is the only production-ready multi-backend (CUDA + OpenCL + Metal) GPU secp256k1 library.

Changed

CUDA benchmark numbers updated -- Scalar Mul improved to 225.8 ns (was 266.5 ns) and Field Inv to 10.2 ns (was 12.1 ns) after the __launch_bounds__ thread-count fix (128 vs 256 mismatch).
README -- Added blockchain coin badges (Bitcoin, Ethereum, +25), GPU signature benchmark tables, 27-coin supported coins section, SEO metadata footer, updated performance headline.
BENCHMARKS.md -- Split CUDA section into Core ECC + GPU Signature Operations; updated all comparison tables.

Fixed

CUDA benchmark thread mismatch -- Benchmarks used 256 threads/block but kernels declared __launch_bounds__(128, 2), causing 0.0 ns results. Fixed to use 128 threads.

[3.4.0] - 2026-02-19

Added -- Stable C ABI (`ufsecp`)

Complete C ABI library -- ufsecp.dll / libufsecp.so / libufsecp.dylib with 45 exported symbols, opaque ufsecp_ctx handle, and structured error model (11 error codes)
Headers: ufsecp.h (main API, 37 functions), ufsecp_version.h (ABI versioning), ufsecp_error.h (error codes)
Implementation: ufsecp_impl.cpp wrapping C++ core into C-linkage with zero heap allocations on hot paths
Build system: include/ufsecp/CMakeLists.txt -- shared + static build, standalone or sub-project mode, pkg-config template (ufsecp.pc.in)
API coverage: key generation, ECDSA sign/verify/recover, Schnorr BIP-340 sign/verify, SHA-256, ECDH (compressed/xonly/raw), BIP-32 HD derivation, Bitcoin addresses (P2PKH/P2WPKH/P2TR), WIF encode/decode, DER serialization, public key tweak (add/mul), selftest
SUPPORTED_GUARANTEES.md -- Tier 1/2/3 stability guarantees documentation
examples/hello_world.c -- Minimal usage example

Added -- Dual-Layer Constant-Time Architecture

Always-on dual layers -- secp256k1::fast::* (public operations) and secp256k1::ct::* (secret-key operations) are always active simultaneously; no flag-based selection
CT layer -- Complete addition formula (12M+2S), fixed-trace scalar multiplication, constant-time table lookup
Valgrind/MSAN markers -- SECP256K1_CLASSIFY() / SECP256K1_DECLASSIFY() for verifiable constant-time guarantees

Added -- SHA-256 Hardware Acceleration

SHA-NI hardware dispatch -- Runtime CPUID detection for Intel SHA Extensions; transparent fallback to software implementation
Zero-overhead dispatch -- Function pointer set once at init, no branching in hot path
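
The "function pointer set once at init" pattern described above can be sketched as follows. Everything here is illustrative: the byte-sum routines stand in for the SHA-256 implementations, and `cpu_has_accel_sketch()` is a hypothetical probe where a real build would query CPUID.

```cpp
#include <cstddef>
#include <cstdint>

// Common signature shared by both implementations (hypothetical).
using ChecksumFn = std::uint32_t (*)(const std::uint8_t*, std::size_t);

// Portable fallback: sums bytes front to back.
inline std::uint32_t checksum_portable(const std::uint8_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

// Placeholder for the hardware path: sums bytes back to front, so it
// returns the same value as the portable routine.
inline std::uint32_t checksum_accelerated(const std::uint8_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = n; i > 0; --i) s += p[i - 1];
    return s;
}

// Hypothetical feature probe; always reports "no acceleration" in this sketch.
inline bool cpu_has_accel_sketch() { return false; }

// Pick the implementation for a given capability flag.
inline ChecksumFn select_checksum(bool has_accel) {
    return has_accel ? &checksum_accelerated : &checksum_portable;
}

// Resolved once during initialization; later calls go through the pointer
// without re-checking CPU features.
static const ChecksumFn g_checksum = select_checksum(cpu_has_accel_sketch());
```

Callers invoke `g_checksum(data, len)` directly, so the feature check is paid once rather than on every call.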

Added -- C# P/Invoke Bindings & Benchmarks

bindings/csharp/UfsepcBenchmark/ -- .NET 8.0 project with complete P/Invoke declarations for all 45 ufsecp functions
68 correctness tests -- 12 categories covering key ops, ECDSA, Schnorr, SHA-256, ECDH, BIP-32, addresses, DER round-trip, recovery, WIF, tweaks, selftest
19 benchmarks -- SHA-256: 137ns, ECDSA Sign: 11.89us, Verify: 47.95us, Schnorr Sign: 10.68us, KeyGen: 1.22us
P/Invoke overhead measured -- ~10-40ns per call (negligible)

Changed

ufsecp_ctx_create() takes no flags parameter -- dual-layer CT architecture is always active

[3.3.0] - 2026-02-16

Added -- Comprehensive Benchmarks

Metal GPU benchmark (bench_metal.mm): 9 operations -- Field Mul/Add/Sub/Sqr/Inv, Point Add/Double, Scalar Mul (Pxk), Generator Mul (Gxk). Matches CUDA benchmark format with warmup, kernel-only timing, and throughput tables.
3 new Metal GPU kernels: field_add_bench, field_sub_bench, field_inv_bench in secp256k1_kernels.metal
WASM benchmark (bench_wasm.mjs): Node.js benchmark for all WASM-exported operations -- Pubkey Create (Gxk), Point Mul, Point Add, ECDSA Sign/Verify, Schnorr Sign/Verify, SHA-256 (32B/1KB)
WASM benchmark runs automatically in CI (Node.js 20 setup + execution)

Added -- Security & Maturity

SECURITY.md v3.2 with vulnerability reporting guidelines
THREAT_MODEL.md with detailed threat analysis
API stability guarantees documented
Fuzz testing documentation and libFuzzer harnesses
Selftest modes: smoke (fast), ci (full), stress (extended)
Repro bundle support for deterministic test reproduction
Sanitizer CI integration (ASan/UBSan/TSan)

Added -- Testing

Boundary KAT vectors for field limb boundaries
Batch inverse sweep tests
Unified test runner (12 test files consolidated into single runner)

Added -- Documentation

Batch inverse & mixed addition API reference with examples (full point, X-only, CUDA, division, scratch reuse, Montgomery trick)
CHANGELOG.md (this file), CODE_OF_CONDUCT.md
Benchmark dashboard link in README

Changed

Benchmark alert threshold 120% -> 150% (reduces false positive alerts on shared CI runners)
README: added Apple Silicon/Metal badges, CI status badge, version badge, benchmark dashboard link
Feature coverage table updated to v3.3.0
Badge layout reorganized: CI/Bench/Release first, then GPU backends, then platforms

Fixed

Metal shader compilation errors (MSL address space mismatches, jacobian_to_affine ordering)
Metal: skip generator_mul test on non-Apple7+ paravirtual devices (CI fix)
Keccak rotl64 undefined behavior (shift by 0)
macOS build flags for Clang compatibility
Metal metal2.4 shader standard for newer Xcode toolchains
WASM runtime crash: removed --closure 1, added -fno-exceptions, increased initial memory to 4MB
Bitcoin CoinFeatures header fix
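
For context on the Keccak rotl64 entry above: a naive `(x << n) | (x >> (64 - n))` shifts by 64 when `n` is 0, which is undefined behavior in C++. One common way to avoid it is shown below as a sketch; `rotl64_safe` is a hypothetical name and this is not necessarily the exact fix applied in the library.

```cpp
#include <cstdint>

// Rotate left by n bits with well-defined behavior for every n,
// including 0 and multiples of 64.
inline std::uint64_t rotl64_safe(std::uint64_t x, unsigned n) {
    n &= 63u;  // reduce the rotation count to 0..63
    // (64 - n) & 63 maps n == 0 to a right shift by 0 instead of 64.
    return (x << n) | (x >> ((64u - n) & 63u));
}
```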

Removed

Unused .cuh files and sorted_ecc_db
Database/lookup/bloom references from public documentation
AI-generated text removed from README

[3.2.0] - 2026-02-16

Added -- Coins Layer

Multi-coin infrastructure -- coins/coin_params.hpp with constexpr CoinParams definitions for 27 secp256k1-based cryptocurrencies: Bitcoin, Litecoin, Dogecoin, Dash, Ethereum, Bitcoin Cash, Bitcoin SV, Zcash, DigiByte, Namecoin, Peercoin, Vertcoin, Viacoin, Groestlcoin, Syscoin, BNB Smart Chain, Polygon, Avalanche, Fantom, Arbitrum, Optimism, Ravencoin, Flux, Qtum, Horizen, Bitcoin Gold, Komodo
Unified address generation -- coin_address(), coin_address_p2pkh(), coin_address_p2wpkh(), coin_address_p2tr() with automatic encoding dispatch per coin (Base58Check / Bech32 / EIP-55)
Per-coin WIF encoding -- coin_wif_encode() with coin-specific prefix bytes
Full key derivation pipeline -- coin_derive() takes private key + CoinParams -> public key + address + WIF in one call
Coin registry -- find_by_ticker("BTC"), find_by_coin_type(60), ALL_COINS[] array for iteration

Added -- Ethereum & EVM Support

Keccak-256 hash -- Standard Keccak-256 (NOT SHA3-256; Ethereum-compatible 0x01 padding), incremental API (Keccak256State::update/finalize), one-shot keccak256() (coins/keccak256.hpp, src/keccak256.cpp)
Ethereum addresses (EIP-55) -- ethereum_address() with mixed-case checksummed output, ethereum_address_raw(), ethereum_address_bytes(), eip55_checksum(), eip55_verify() (coins/ethereum.hpp, src/ethereum.cpp)
EVM chain compatibility -- Same address derivation works for BSC, Polygon, Avalanche, Fantom, Arbitrum, Optimism

Added -- BIP-44 HD Derivation

Coin-type derivation -- coin_derive_key() with automatic purpose selection: BIP-86 (Taproot) for Bitcoin, BIP-84 (SegWit) for Litecoin, BIP-44 (legacy) for Dogecoin/Ethereum
Path construction -- coin_derive_path() builds m/purpose'/coin_type'/account'/change/index
Seed-to-address pipeline -- coin_address_from_seed() full pipeline: seed -> BIP-32 master -> BIP-44 derivation -> coin address
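
The path shape `m/purpose'/coin_type'/account'/change/index` listed above can be built as text like this. It is a minimal sketch: `derive_path_sketch` and its parameters are hypothetical, and the library's `coin_derive_path()` may use a different signature or return type.

```cpp
#include <cstdint>
#include <string>

// Builds the textual derivation path. The first three levels are hardened
// (marked with an apostrophe); change and index are not.
inline std::string derive_path_sketch(std::uint32_t purpose,
                                      std::uint32_t coin_type,
                                      std::uint32_t account,
                                      std::uint32_t change,
                                      std::uint32_t index) {
    return "m/" + std::to_string(purpose) + "'/" +
           std::to_string(coin_type) + "'/" +
           std::to_string(account) + "'/" +
           std::to_string(change) + "/" +
           std::to_string(index);
}
```

For example, purpose 44 with coin type 60 (Ethereum) gives `m/44'/60'/0'/0/0` for the first account, external chain, first address.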

Added -- Custom Generator Point & Curve Context

CurveContext -- context.hpp with custom generator point support, curve order (raw bytes), cofactor, and name (CurveContext::secp256k1_default(), CurveContext::with_generator(), CurveContext::custom())
Context-aware operations -- derive_public_key(privkey, &ctx), scalar_mul_G(scalar, &ctx), effective_generator(&ctx) -- nullptr = standard secp256k1, custom context = custom G
Zero-overhead default -- Standard secp256k1 usage with nullptr context has no extra cost

Added -- Tests

test_coins -- 32 tests covering CurveContext, CoinParams registry, Keccak-256 vectors, EIP-55 checksum, Bitcoin/Litecoin/Dogecoin/Dash/Ethereum addresses, WIF encoding, BIP-44 path/derivation, custom generator derivation, full multi-coin pipeline

[3.1.0] - 2026-02-15

Added -- Cryptographic Protocols

Pedersen Commitments -- pedersen_commit(value, blinding), pedersen_verify(), pedersen_verify_sum() (homomorphic balance proofs), pedersen_blind_sum(), pedersen_switch_commit() (Mimblewimble switch commitments); nothing-up-my-sleeve generators H and J via SHA-256 try-and-increment (cpu/include/pedersen.hpp, cpu/src/pedersen.cpp)
FROST Threshold Signatures -- frost_keygen_begin() / frost_keygen_finalize() (Feldman VSS distributed key generation), frost_sign_nonce_gen() / frost_sign() (partial signature rounds), frost_verify_partial(), frost_aggregate() -> standard BIP-340 SchnorrSignature; frost_lagrange_coefficient() helper (cpu/include/frost.hpp, cpu/src/frost.cpp)
Adaptor Signatures -- Schnorr adaptor: schnorr_adaptor_sign(), schnorr_adaptor_verify(), schnorr_adaptor_adapt(), schnorr_adaptor_extract(); ECDSA adaptor: ecdsa_adaptor_sign(), ecdsa_adaptor_verify(), ecdsa_adaptor_adapt(), ecdsa_adaptor_extract() -- for atomic swaps and DLCs (cpu/include/adaptor.hpp, cpu/src/adaptor.cpp)
MuSig2 multi-signatures (BIP-327) -- Key aggregation (KeyAgg), deterministic nonce generation, 2-round signing protocol, partial sig verify, Schnorr-compatible aggregate signatures (cpu/include/musig2.hpp, cpu/src/musig2.cpp)
ECDH key exchange -- ecdh_compute (SHA-256 of compressed point), ecdh_compute_xonly (SHA-256 of x-coordinate), ecdh_compute_raw (raw x-coordinate) (cpu/include/ecdh.hpp, cpu/src/ecdh.cpp)
ECDSA public key recovery -- ecdsa_sign_recoverable (deterministic recid), ecdsa_recover (reconstruct pubkey from signature + recid), compact 65-byte serialization (cpu/include/recovery.hpp, cpu/src/recovery.cpp)
Taproot (BIP-341/342) -- Tweak hash, output key computation, private key tweaking, commitment verification, TapLeaf/TapBranch hashing, Merkle root/proof construction (cpu/include/taproot.hpp, cpu/src/taproot.cpp)
BIP-32 HD key derivation -- Master key from seed, hardened/normal child derivation, path parsing (m/0'/1/2h), Base58Check serialization (xprv/xpub), RIPEMD-160 fingerprinting (cpu/include/bip32.hpp, cpu/src/bip32.cpp)
BIP-352 Silent Payments -- silent_payment_address(), SilentPaymentAddress::encode(), silent_payment_create_output(), silent_payment_scan() with ECDH-based stealth addressing and multi-output support (cpu/include/address.hpp, cpu/src/address.cpp)

Added -- Address & Encoding

Bitcoin Address Generation -- hash160() (RIPEMD-160 of SHA-256), base58check_encode() / base58check_decode(), bech32_encode() / bech32_decode() (BIP-173/BIP-350, Bech32/Bech32m), address_p2pkh(), address_p2wpkh(), address_p2tr(), wif_encode() / wif_decode() (cpu/include/address.hpp, cpu/src/address.cpp)

Added -- Core Algorithms

Multi-scalar multiplication -- Shamir's trick (2-point) + Strauss interleaved wNAF (n-point) (cpu/include/multiscalar.hpp, cpu/src/multiscalar.cpp)
Batch signature verification -- Schnorr and ECDSA batch verify with random linear combination; identify_invalid() to pinpoint bad signatures (cpu/include/batch_verify.hpp, cpu/src/batch_verify.cpp)
SHA-512 -- Header-only implementation for HMAC-SHA512 / BIP-32 (cpu/include/sha512.hpp)
Constant-time byte utilities -- ct_equal, ct_is_zero, ct_compare, ct_memzero (volatile + asm barrier), ct_memcpy_if, ct_memswap_if, ct_select_byte (cpu/include/ct_utils.hpp)
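
To illustrate the idea behind the constant-time byte utilities above, here is a sketch of a comparison and a select that avoid data-dependent branches. The `_sketch` names and signatures are hypothetical and are not taken from ct_utils.hpp.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true only if both buffers hold the same bytes. Every byte is
// examined regardless of where the first difference occurs.
inline bool ct_equal_sketch(const std::uint8_t* a, const std::uint8_t* b,
                            std::size_t len) {
    std::uint8_t acc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        acc |= static_cast<std::uint8_t>(a[i] ^ b[i]);  // collect any difference
    }
    // acc == 0 wraps to 0xFFFFFFFF, so bit 8 is set; acc in 1..255 leaves it clear.
    return (((static_cast<std::uint32_t>(acc) - 1u) >> 8) & 1u) != 0u;
}

// Returns a when flag is true, b otherwise, using a mask instead of a branch.
inline std::uint8_t ct_select_byte_sketch(std::uint8_t a, std::uint8_t b,
                                          bool flag) {
    const std::uint8_t mask =
        static_cast<std::uint8_t>(0u - static_cast<unsigned>(flag));  // 0xFF or 0x00
    return static_cast<std::uint8_t>((a & mask) |
                                     (b & static_cast<std::uint8_t>(~mask)));
}
```

The C++ standard gives no timing guarantees, so code like this still needs verification on the target compiler and CPU.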

Added -- Performance

AVX2/AVX-512 SIMD batch field ops -- Runtime CPUID detection, auto-dispatching batch_field_add/sub/mul/sqr, Montgomery batch inverse (1 inversion + 3(n-1) multiplications) (cpu/include/field_simd.hpp, cpu/src/field_simd.cpp)
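
The cost quoted above for Montgomery batch inversion (1 inversion + 3(n-1) multiplications) follows from the structure of the algorithm, sketched here over a toy prime field. The modulus 2^31 - 1 and all function names are assumptions for illustration; the library operates on the secp256k1 field with its own types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kP = 2147483647ULL;  // toy prime 2^31 - 1, not the secp256k1 prime

inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;  // operands are below 2^31, so the product fits in 64 bits
}

// Single inversion via Fermat's little theorem: a^(p-2) mod p.
inline std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1, base = a % kP, e = kP - 2;
    while (e != 0) {
        if (e & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        e >>= 1;
    }
    return result;
}

// Inverts every element in place. All inputs must be nonzero mod kP.
inline void batch_inverse_sketch(std::vector<std::uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;
    std::vector<std::uint64_t> prefix(n);
    prefix[0] = v[0];
    for (std::size_t i = 1; i < n; ++i) {
        prefix[i] = mul_mod(prefix[i - 1], v[i]);  // n-1 multiplications
    }
    std::uint64_t acc = inv_mod(prefix[n - 1]);  // the only inversion
    for (std::size_t i = n - 1; i > 0; --i) {
        const std::uint64_t inv_i = mul_mod(acc, prefix[i - 1]);  // inverse of v[i]
        acc = mul_mod(acc, v[i]);  // acc becomes the inverse of prefix[i-1]
        v[i] = inv_i;
    }
    v[0] = acc;  // 2(n-1) multiplications in the backward pass
}
```

The forward pass spends n-1 multiplications and the backward pass 2(n-1), which gives the 3(n-1) total.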

Added -- GPU Optimization

Occupancy auto-tune utility -- gpu_occupancy.cuh with optimal_launch_1d() (uses cudaOccupancyMaxPotentialBlockSize), query_occupancy(), and startup device diagnostics
Warp-level reduction primitives -- warp_reduce_sum(), warp_reduce_sum64(), warp_reduce_or(), warp_broadcast(), warp_aggregated_atomic_add() in reusable header
__launch_bounds__ on library kernels -- field_mul/add/sub/inv_kernel (256,4), scalar_mul_batch/generator_mul_batch_kernel (128,2), point_add/dbl_kernel (256,4), hash160_pubkey_kernel (256,4)

Added -- Build & Packaging

PGO build scripts -- build_pgo.sh (Linux, Clang/GCC auto-detect) and build_pgo.ps1 (Windows, MSVC/ClangCL)
MSVC PGO support -- CMakeLists.txt now handles /GL + /GENPROFILE / /USEPROFILE for MSVC in addition to Clang/GCC
vcpkg manifest -- vcpkg.json with optional features (asm, cuda, lto)
Conan 2.x recipe -- conanfile.py with CMakeToolchain integration and shared/fPIC/asm/lto options
Benchmark dashboard CI -- GitHub Actions workflow (benchmark.yml) running benchmarks on Linux + Windows, parse_benchmark.py for JSON output, github-action-benchmark integration with 120% alert threshold

Added -- Tests (237 new)

test_v4_features -- 90 tests: Pedersen (basic/homomorphic/balance/switch/serialization/zero-value), FROST (Lagrange/keygen/2-of-3 signing), Adaptor (Schnorr basic/ECDSA basic/identity), Address (Base58Check/Bech32/Bech32m/hash160/P2PKH/P2WPKH/P2TR/WIF/consistency), Silent Payments (address/flow/multi-output)
test_ecdh_recovery_taproot -- 76 tests: ECDH, Recovery, Taproot, CT Utils, Wycheproof vectors
test_multiscalar_batch -- 16 tests: Shamir edge cases, multi-scalar sums, Schnorr & ECDSA batch verify
test_bip32 -- 28 tests: HMAC-SHA512 vectors, BIP-32 TV1 master/child keys, path derivation, serialization
test_musig2 -- 19 tests: key aggregation, nonce generation, 2-of-2 & 3-of-3 signing
test_simd_batch -- 8 tests: SIMD detection, batch add/sub/mul/sqr, batch inverse

Fixed

SHA-512 K[23] constant -- Single-digit typo (0x76f988da831153b6 -> 0x76f988da831153b5) that caused all SHA-512 hashes to be incorrect
MuSig2 per-signer Y parity -- musig2_partial_sign() now negates the secret key when the signer's public key has odd Y (required for x-only pubkey compatibility)

[3.0.0] - 2026-02-11

Added -- Cryptographic Primitives

ECDSA (RFC 6979) -- Deterministic signing & verification (cpu/include/ecdsa.hpp)
Schnorr BIP-340 -- x-only signing & verification (cpu/include/schnorr.hpp)
SHA-256 -- Standalone hash, zero-dependency (cpu/include/sha256.hpp)
Constant-time benchmarks -- CT layer micro-benchmarks via CTest

Added -- Platform Support

iOS -- CMake toolchain, XCFramework build script, SPM (Package.swift), CocoaPods (UltrafastSecp256k1.podspec), C++ umbrella header
WebAssembly (Emscripten) -- C API (11 functions), JS wrapper (secp256k1.mjs), TypeScript declarations, npm package @ultrafastsecp256k1/wasm
ROCm / HIP -- CUDA <-> HIP portability layer (gpu_compat.h), all 24 PTX asm blocks guarded with #if SECP256K1_USE_PTX + portable __int128 alternatives, dual CUDA/HIP CMake build
Android NDK -- arm64-v8a CI build with NDK r27c

Added -- Infrastructure

CI/CD (GitHub Actions) -- Linux (gcc-13/clang-17 x Release/Debug), Windows (MSVC), macOS (AppleClang), iOS (OS + Simulator + XCFramework), WASM (Emscripten), Android (NDK), ROCm (Docker)
Doxygen -> GitHub Pages -- Auto-generated API docs on push to main
Fuzzing harness -- tests/fuzz_field.cpp for libFuzzer field arithmetic testing
Version header -- cmake/version.hpp.in auto-generates SECP256K1_VERSION_* macros
.clang-format + .editorconfig -- Consistent code formatting
Desktop example app -- examples/desktop_example.cpp with CTest integration
CMake install -- install(TARGETS) + install(DIRECTORY) for system-wide deployment

Changed

Search kernels relocated -- cuda/include/ -> cuda/app/ (cleaner library vs. app separation)
README -- 7 CI badges, comprehensive build instructions for all platforms

[!] Testers Wanted

We need community testers for platforms we cannot fully validate in CI:

iOS -- Real device testing (iPhone/iPad with Xcode)

AMD GPU (ROCm/HIP) -- AMD Radeon RX / Instinct hardware

If you have access to these platforms, please run the build and report results! Open an issue at https://github.com/shrec/Secp256K1fast/issues

[2.0.0] - 2026-02-11

Added

Shared POD types (include/secp256k1/types.hpp): Canonical data layouts (FieldElementData, ScalarData, AffinePointData, JacobianPointData, MidFieldElementData) with static_assert layout guarantees across all backends
CUDA edge case tests (10 new): zero scalar, order scalar, point cancellation, infinity operand, add/dbl consistency, commutativity, associativity, field inv edges, scalar mul cross-check, distributive -- now 40/40 total
OpenCL edge case tests (8 new): matching coverage -- now 40/40 total
Shared test vectors (tests/test_vectors.hpp): canonical K*G vectors, edge scalars, large scalar pairs, hex utilities
CTest integration for CUDA (cuda/CMakeLists.txt)
CPU data()/from_data() accessors on FieldElement and Scalar for zero-cost cross-backend interop
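
A shared POD type with compile-time layout checks, as described in the entries above, might look roughly like the following. This is a sketch under assumptions: the four 64-bit limb layout and the `Sketch`-suffixed names are hypothetical, and the real definitions in include/secp256k1/types.hpp are not shown in this changelog.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical canonical layout: four 64-bit limbs, 32 bytes total.
struct FieldElementDataSketch {
    std::uint64_t limbs[4];
};

// Compile-time checks that every backend sees the same bytes.
static_assert(sizeof(FieldElementDataSketch) == 32,
              "must be exactly 256 bits");
static_assert(std::is_standard_layout<FieldElementDataSketch>::value,
              "must be standard layout");
static_assert(std::is_trivially_copyable<FieldElementDataSketch>::value,
              "must be safe to copy byte-wise between host and device");

// CPU-side wrapper exposing the shared layout, in the spirit of
// the data()/from_data() accessors.
class FieldElementSketch {
public:
    static FieldElementSketch from_data(const FieldElementDataSketch& d) {
        FieldElementSketch fe;
        fe.d_ = d;
        return fe;
    }
    const FieldElementDataSketch& data() const { return d_; }

private:
    FieldElementDataSketch d_{};
};
```

Because the wrapper stores the POD directly, handing a value to another backend is a plain copy with no conversion step.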

Changed

CUDA: FieldElement, Scalar, AffinePoint are now using aliases to shared POD types (zero overhead, no API change)
OpenCL: Added static_assert layout compatibility checks + to_data()/ from_data() conversion utilities
OpenCL point ops optimized: 3-temp point doubling (was 12-temp), alias-safe mixed addition
CUDA point ops optimized: Local-variable rewrite eliminates pointer aliasing -- Point Double 2.29x faster (1.6->0.7 ns), Point Add 1.91x faster (2.1->1.1 ns), kG 2.25x faster (485->216 ns). CUDA now beats OpenCL on all point ops.
PTX inline assembly for NVIDIA OpenCL: Field ops now at parity with CUDA
Benchmarks updated: Full CUDA + OpenCL numbers on RTX 5060 Ti

Performance (RTX 5060 Ti, kernel-only)

CUDA kG: 216.1 ns (4.63 M/s) -- CUDA 1.37x faster than OpenCL
OpenCL kG: 295.1 ns (3.39 M/s)
Point Double: CUDA 0.7 ns (1,352 M/s), OpenCL 0.9 ns -- CUDA 1.29x
Point Add: CUDA 1.1 ns (916 M/s), OpenCL 1.6 ns -- CUDA 1.45x
Field Mul: 0.2 ns on both (4,139 M/s)

[1.0.0] - 2026-02-11

Added

Complete secp256k1 field arithmetic
Point addition, doubling, and multiplication
Scalar arithmetic
GLV endomorphism optimization
Assembly optimizations:
- x86-64 BMI2/ADX (3-5x speedup)
- RISC-V RV64GC (2-3x speedup)
- RISC-V Vector Extension (RVV) support
CUDA batch operations
Memory-mapped database support
Comprehensive documentation

Performance

x86-64 field multiplication: ~8ns (assembly)
RISC-V field multiplication: ~75ns (assembly)
CUDA batch throughput: 8M ops/s (RTX 4090)

Legend:

Added - New features
Changed - Changes in existing functionality
Deprecated - Soon-to-be removed features
Removed - Removed features
Fixed - Bug fixes
Security - Security fixes


