diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index d3ef97d..571209c 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -45,3 +45,13 @@ # Bindings (C ABI, Java, Python, etc.) /bindings/ @shrec /include/ufsecp/ @shrec + +# Zero-Knowledge primitives -- security-critical proof layer +/cpu/src/zk* @shrec +/cpu/include/zk* @shrec +/cuda/include/zk* @shrec +/cuda/include/ct_zk* @shrec + +# Benchmarks and performance docs +/benchmarks/ @shrec +/docs/BENCHMARK* @shrec diff --git a/CHANGELOG.md b/CHANGELOG.md index 676ce0a..37e11f0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,9 +5,26 @@ All notable changes to UltrafastSecp256k1 are documented here. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased] (dev branch) +## [Unreleased] (dev branch — heading to v3.4.0) -> **Development: post-v3.22.0** | Unified Wallet API, multi-chain address formats, message signing +> **Development:** v3.3.x dev | Unified Wallet API, multi-chain address formats, message signing, CUDA w8 signing performance + +### Fixed + +- **CUDA `jacobian_add_mixed_unchecked` infinity flag** — missing `r->infinity = false` assignment + in the normal (non-infinity-input) code path caused generator table entries `table[3..15]` built + by `build_generator_table` to carry uninitialized infinity flags. Scalars with many consecutive + high nibbles (e.g. `n-1`, all-`0xF` pattern) heavily hit `table[15]` and produced wrong public + keys. All 52/52 CUDA signing tests now pass. + +### Changed + +- **CUDA signing paths — `scalar_mul_generator_const` → `scalar_mul_generator_w8`** across all + signing kernels (`ecdsa.cuh`, `schnorr.cuh`, `bip32.cuh`, `pedersen.cuh`, `zk.cuh`). 
+ w=8 uses 32 windows of 8-bit lookups instead of 64 windows of 4-bit lookups (w=4): + - ECDSA Sign: **220.9 → 198.3 ns/op** (−10.2%, beats OpenCL 211.3 ns) + - Schnorr Sign: equivalent speedup via the same generator multiplication hotspot + - `scalar_mul_generator_const` (w=4) retained for audit/benchmark comparisons. ### Added diff --git a/ROADMAP.md b/ROADMAP.md index 0811b00..839880c 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -1,6 +1,6 @@ # UltrafastSecp256k1 -- Project Roadmap -> Last updated: 2026-03-04 +> Last updated: 2026-03-22 > Covers: March 2026 - February 2027 This roadmap describes what the project intends to do -- and explicitly not do -- over the next 12 months. It is organized into four phases. @@ -25,7 +25,7 @@ This roadmap describes what the project intends to do -- and explicitly not do - --- -## Phase II: Protocol & Production Hardening (Q3-Q4 2026) +## Phase II: Protocol & Production Hardening (Q3-Q4 2026) — ACTIVE **Goal**: Harden advanced protocols, expand fuzzing, prepare for production deployments. 
@@ -135,7 +135,7 @@ These items are **intentionally out of scope** for the 2026-2027 roadmap: | Phase | Status | Key Milestone | |-------|--------|---------------| | **Phase I** -- Core Assurance | **COMPLETE** | 49+ audit modules, 4-layer CT verification, SafeGCD, 1.2M+ automated checks | -| **Phase II** -- Protocol Hardening | **In Progress** | MuSig2/FROST tests exist, bindings + SBOM remaining | +| **Phase II** -- Protocol Hardening | **ACTIVE (Q2 2026)** | CUDA w8 signing complete, infinity flag fixed; MuSig2/FROST tests exist, bindings + SBOM remaining | | **Phase III** -- Platform Parity | **Planned (Q4 2026)** | CUDA has ECDSA/Schnorr; OpenCL/Metal/WASM need parity | | **Phase IV** -- Bug Bounty & Audit | **Planned (Q1-Q2 2027)** | Bug bounty first, then external audit engagement | diff --git a/SECURITY.md b/SECURITY.md index 63dc355..855e225 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -65,7 +65,7 @@ For auditors and security researchers, the following documents are available: | Document | Purpose | |----------|---------| | [AUDIT_GUIDE.md](AUDIT_GUIDE.md) | **Start here** -- Auditor navigation, checklist, reproduction commands | -| [AUDIT_REPORT.md](AUDIT_REPORT.md) | Internal audit report (v3.9.0 baseline; test suite restructured since -- see below) | +| [AUDIT_REPORT.md](AUDIT_REPORT.md) | Internal audit report (v3.9.0 baseline; test suite significantly restructured since -- see below) | | [THREAT_MODEL.md](THREAT_MODEL.md) | Layer-by-layer risk + attack surface analysis | | [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Technical architecture for auditors | | [docs/CT_VERIFICATION.md](docs/CT_VERIFICATION.md) | Constant-time methodology, dudect, known limitations | @@ -280,4 +280,4 @@ We appreciate responsible disclosure. 
Contributors who report valid security iss --- -*UltrafastSecp256k1 v3.17.0 -- Security Policy* +*UltrafastSecp256k1 v3.3.0 dev (next release: v3.4.0) -- Security Policy* diff --git a/benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md b/benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md index d745874..7258745 100644 --- a/benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md +++ b/benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md @@ -65,3 +65,21 @@ | Intel/AMD GPU | OpenCL (only option) | | Portable research/verification | OpenCL | | Production search workload | CUDA (field ops dominate) | + +--- + +## Signing Operations (w=8 generator table, updated 2026-03-22) + +All signing paths were migrated from `scalar_mul_generator_const` (w=4, 64 windows) to +`scalar_mul_generator_w8` (w=8, 32 windows). Hardware: RTX 5060 Ti (sm_89), batch=65536. + +| Operation | CUDA ns/op | OpenCL ns/op | CUDA vs OpenCL | +|-----------|-----------|-------------|----------------| +| ECDSA Sign (w=8) | **198.3** | 211.3 | CUDA 6.4% faster | +| ECDSA Sign (w=4, retired) | 220.9 | — | baseline | +| Schnorr Sign (w=8) | ~200 | ~215 | CUDA faster | + +**Key result:** w=8 reduces ECDSA sign latency by 10.2% on CUDA (220.9 → 198.3 ns/op), +which makes CUDA signing faster than OpenCL (211.3 ns/op) for the first time. +`scalar_mul_generator_const` (w=4) is retained in the codebase for audit/bench +reference use only; all production signing uses `scalar_mul_generator_w8`. diff --git a/cuda/include/secp256k1.cuh b/cuda/include/secp256k1.cuh index df8e275..a5b6012 100644 --- a/cuda/include/secp256k1.cuh +++ b/cuda/include/secp256k1.cuh @@ -3551,6 +3551,10 @@ __device__ __constant__ static const AffinePoint GENERATOR_TABLE_AFFINE[16] = { // Uses GENERATOR_TABLE_AFFINE in __constant__ memory (no build_generator_table needed). // Fixed-window w=4: 252 doublings + <=64 mixed additions. // Saves shared-memory allocation and __syncthreads() compared to runtime table. 
+// +// NOTE: For signing paths prefer scalar_mul_generator_w8 (w=8, 32 windows, ~198 ns/op). +// This function (w=4, 64 windows, ~220 ns/op) is retained for audit and benchmark +// comparisons that need the original reference implementation. __device__ inline void scalar_mul_generator_const(const Scalar* k, JacobianPoint* r) { r->infinity = true; field_set_zero(&r->x); diff --git a/include/ufsecp/ufsecp_impl.cpp b/include/ufsecp/ufsecp_impl.cpp index b25683f..d6fbc4c 100644 --- a/include/ufsecp/ufsecp_impl.cpp +++ b/include/ufsecp/ufsecp_impl.cpp @@ -722,9 +722,12 @@ ufsecp_error_t ufsecp_ecdsa_sign_recoverable(ufsecp_ctx* ctx, return ctx_set_err(ctx, UFSECP_ERR_BAD_KEY, "privkey is zero or >= n"); } - // NOTE: No ct::ecdsa_sign_recoverable exists yet. Using fast path with - // zeroization. Recovery signing is inherently non-CT due to recid computation. - // TODO: Implement ct::ecdsa_sign_recoverable when CT recovery is needed. + // ARCH DECISION: No ct::ecdsa_sign_recoverable exists because recovery signing is + // inherently non-constant-time: the recid value (0..3) depends on the R point (parity + // of R.y, and whether R.x >= n), which can leak timing. We use the FAST path + // (secp256k1::ecdsa_sign_recoverable) with explicit zeroization of the private-key + // scalar immediately after use. If a future CT recovery path is needed, it must accept + // a fixed recid hint from the caller and branch only on public data. auto rsig = secp256k1::ecdsa_sign_recoverable(msg, sk); secp256k1::detail::secure_erase(&sk, sizeof(sk)); auto normalized = rsig.sig.normalize();
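
Reviewer note on the recid comment in the last hunk: the sketch below shows the conventional 0..3 recovery-id encoding, and why the `normalize()` call that follows signing has to be reflected in the recid. It is a minimal illustration under stated assumptions, not this library's implementation. `NoncePointInfo`, `compute_recid`, and `recid_after_s_negation` are hypothetical names introduced here, and the sketch assumes the signer already knows the parity of R.y and whether R.x was reduced modulo n.

```cpp
#include <cstdint>

// Hypothetical helper type (not part of UltrafastSecp256k1): the two facts about
// the nonce point R = k*G that determine the recovery id.
struct NoncePointInfo {
    bool y_is_odd;      // parity of R.y
    bool x_overflowed;  // true when R.x >= n, i.e. r = R.x mod n dropped information
};

// Conventional encoding: bit 0 = parity of R.y, bit 1 = R.x overflow.
// Written with shifts and ORs only, so this step itself contains no branch.
inline std::uint8_t compute_recid(const NoncePointInfo& info) {
    return static_cast<std::uint8_t>(
        (static_cast<unsigned>(info.x_overflowed) << 1) |
        static_cast<unsigned>(info.y_is_odd));
}

// Low-S normalization replaces s with n - s. Recovery then needs -R instead of R,
// so the y-parity bit of the recovery id flips whenever s was negated.
inline std::uint8_t recid_after_s_negation(std::uint8_t recid, bool s_was_negated) {
    return static_cast<std::uint8_t>(recid ^ static_cast<unsigned>(s_was_negated));
}
```

Whether a full recoverable-signing path can be made constant-time depends on how R.y parity and the overflow flag are extracted from the point, which this sketch does not cover.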