docs: Wave 1 sync — version 3.3.0→3.4.0, CHANGELOG w8+infinity fix, ROADMAP/SECURITY/bench updates
- CHANGELOG [Unreleased]: add CUDA w8 signing migration (220.9→198.3 ns) and jacobian_add_mixed_unchecked infinity flag bug fix (52/52 tests pass)
- ROADMAP.md: date 2026-03-22, Phase II marked ACTIVE
- SECURITY.md: version tag corrected (v3.3.0 dev, next v3.4.0)
- CODEOWNERS: add ZK layer paths (cpu/src/zk*, cuda/include/zk*, ct_zk*) and benchmark/docs paths
- cuda/include/secp256k1.cuh: NOTE comment on scalar_mul_generator_const directing signing code to scalar_mul_generator_w8 (w=8, ~198 ns)
- include/ufsecp/ufsecp_impl.cpp: replace CT recoverable TODO with explicit arch decision comment (recid computation is non-CT by design)
- benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md: add Signing Operations section with w=8 numbers (CUDA 198.3 ns vs OpenCL 211.3 ns)
parent a129bafe89
commit f565bc1207
.github/CODEOWNERS (10 lines changed)
@@ -45,3 +45,13 @@
 # Bindings (C ABI, Java, Python, etc.)
 /bindings/ @shrec
 /include/ufsecp/ @shrec
+
+# Zero-Knowledge primitives -- security-critical proof layer
+/cpu/src/zk* @shrec
+/cpu/include/zk* @shrec
+/cuda/include/zk* @shrec
+/cuda/include/ct_zk* @shrec
+
+# Benchmarks and performance docs
+/benchmarks/ @shrec
+/docs/BENCHMARK* @shrec
CHANGELOG.md (21 lines changed)
@@ -5,9 +5,26 @@ All notable changes to UltrafastSecp256k1 are documented here.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unreleased] (dev branch)
+## [Unreleased] (dev branch — heading to v3.4.0)
 
-> **Development: post-v3.22.0** | Unified Wallet API, multi-chain address formats, message signing
+> **Development:** v3.3.x dev | Unified Wallet API, multi-chain address formats, message signing, CUDA w8 signing performance
 
+### Fixed
+
+- **CUDA `jacobian_add_mixed_unchecked` infinity flag** — missing `r->infinity = false` assignment
+  in the normal (non-infinity-input) code path caused generator table entries `table[3..15]` built
+  by `build_generator_table` to carry uninitialized infinity flags. Scalars with many consecutive
+  high nibbles (e.g. `n-1`, all-`0xF` pattern) heavily hit `table[15]` and produced wrong public
+  keys. All 52/52 CUDA signing tests now pass.
+
+### Changed
+
+- **CUDA signing paths — `scalar_mul_generator_const` → `scalar_mul_generator_w8`** across all
+  signing kernels (`ecdsa.cuh`, `schnorr.cuh`, `bip32.cuh`, `pedersen.cuh`, `zk.cuh`).
+  w=8 uses 32 windows of 8-bit lookups instead of 64 windows of 4-bit lookups (w=4):
+  - ECDSA Sign: **220.9 → 198.3 ns/op** (−10.2%, beats OpenCL 211.3 ns)
+  - Schnorr Sign: equivalent speedup via the same generator multiplication hotspot
+  - `scalar_mul_generator_const` (w=4) retained for audit/benchmark comparisons.
+
 ### Added
 
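A miniature model helps show why a single missing flag assignment can corrupt several generator-table entries at once. The sketch below is hypothetical and is not the project's CUDA code: `ToyPoint`, `add_mixed`, `build_table`, and `value_of` are invented for illustration, the "group" is the integers modulo a small prime under addition, and the "uninitialized" flag is simulated by pre-filling every output slot with `infinity = true` (reading genuinely uninitialized memory would be undefined behaviour in C++). As in the Fixed entry, entries 3..15 are produced by chained mixed additions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy parameter (assumption): a small prime standing in for the real curve group.
constexpr uint64_t kToyModulus = 1000003;

struct ToyPoint {
    uint64_t x;     // the toy group element itself
    bool infinity;  // explicit identity flag, modelled on JacobianPoint::infinity
};

// Hypothetical mixed addition r = a + b. With fixed == false, the normal path
// forgets to clear r->infinity, mirroring the missing assignment.
void add_mixed(const ToyPoint* a, const ToyPoint* b, ToyPoint* r, bool fixed) {
    assert(a->x < kToyModulus && b->x < kToyModulus);
    if (a->infinity) { *r = *b; return; }  // infinity-input path copies b, flag included
    if (b->infinity) { *r = *a; return; }
    r->x = (a->x + b->x) % kToyModulus;    // normal path writes the value...
    if (fixed) {
        r->infinity = false;               // ...and only the fixed version clears the flag
    }
}

// Builds table[i] = i*G for i in 0..15, starting from slots with a stale flag.
std::array<ToyPoint, 16> build_table(ToyPoint g, bool fixed) {
    std::array<ToyPoint, 16> table;
    for (auto& t : table) t = ToyPoint{0, true};           // simulated garbage flags
    table[1] = g;
    table[2] = ToyPoint{(2 * g.x) % kToyModulus, false};   // doubling path sets the flag
    for (int i = 3; i < 16; ++i) {
        add_mixed(&table[i - 1], &g, &table[i], fixed);    // chained mixed additions
    }
    return table;
}

// A consumer that, like a windowed multiply, treats a flagged entry as the identity.
uint64_t value_of(const ToyPoint& p) { return p.infinity ? 0 : p.x; }
```

With the assignment missing, `table[3]` keeps its stale flag, so the next addition treats it as the identity and the error propagates down the chain. In this toy, every odd entry from 3 to 15 reads as the identity, including `table[15]`, the entry that an all-`0xF` scalar selects in every w=4 window.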
ROADMAP.md
@@ -1,6 +1,6 @@
 # UltrafastSecp256k1 -- Project Roadmap
 
-> Last updated: 2026-03-04
+> Last updated: 2026-03-22
 > Covers: March 2026 - February 2027
 
 This roadmap describes what the project intends to do -- and explicitly not do -- over the next 12 months. It is organized into four phases.
@@ -25,7 +25,7 @@ This roadmap describes what the project intends to do -- and explicitly not do -
 
 ---
 
-## Phase II: Protocol & Production Hardening (Q3-Q4 2026)
+## Phase II: Protocol & Production Hardening (Q3-Q4 2026) — ACTIVE
 
 **Goal**: Harden advanced protocols, expand fuzzing, prepare for production deployments.
 
@@ -135,7 +135,7 @@ These items are **intentionally out of scope** for the 2026-2027 roadmap:
 | Phase | Status | Key Milestone |
 |-------|--------|---------------|
 | **Phase I** -- Core Assurance | **COMPLETE** | 49+ audit modules, 4-layer CT verification, SafeGCD, 1.2M+ automated checks |
-| **Phase II** -- Protocol Hardening | **In Progress** | MuSig2/FROST tests exist, bindings + SBOM remaining |
+| **Phase II** -- Protocol Hardening | **ACTIVE (Q2 2026)** | CUDA w8 signing complete, infinity flag fixed; MuSig2/FROST tests exist, bindings + SBOM remaining |
 | **Phase III** -- Platform Parity | **Planned (Q4 2026)** | CUDA has ECDSA/Schnorr; OpenCL/Metal/WASM need parity |
 | **Phase IV** -- Bug Bounty & Audit | **Planned (Q1-Q2 2027)** | Bug bounty first, then external audit engagement |
 
SECURITY.md
@@ -65,7 +65,7 @@ For auditors and security researchers, the following documents are available:
 | Document | Purpose |
 |----------|---------|
 | [AUDIT_GUIDE.md](AUDIT_GUIDE.md) | **Start here** -- Auditor navigation, checklist, reproduction commands |
-| [AUDIT_REPORT.md](AUDIT_REPORT.md) | Internal audit report (v3.9.0 baseline; test suite restructured since -- see below) |
+| [AUDIT_REPORT.md](AUDIT_REPORT.md) | Internal audit report (v3.9.0 baseline; test suite significantly restructured since -- see below) |
 | [THREAT_MODEL.md](THREAT_MODEL.md) | Layer-by-layer risk + attack surface analysis |
 | [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) | Technical architecture for auditors |
 | [docs/CT_VERIFICATION.md](docs/CT_VERIFICATION.md) | Constant-time methodology, dudect, known limitations |
@@ -280,4 +280,4 @@ We appreciate responsible disclosure. Contributors who report valid security iss
 
 ---
 
-*UltrafastSecp256k1 v3.17.0 -- Security Policy*
+*UltrafastSecp256k1 v3.3.0 dev (next release: v3.4.0) -- Security Policy*
benchmarks/comparison/cuda_vs_opencl_rtx5060ti.md
@@ -65,3 +65,21 @@
 | Intel/AMD GPU | OpenCL (only option) |
 | Portable research/verification | OpenCL |
 | Production search workload | CUDA (field ops dominate) |
+
+---
+
+## Signing Operations (w=8 generator table, updated 2026-03-22)
+
+All signing paths migrated from `scalar_mul_generator_const` (w=4, 64 windows) to
+`scalar_mul_generator_w8` (w=8, 32 windows). Hardware: RTX 5060 Ti (sm_89), batch=65536.
+
+| Operation | CUDA ns/op | OpenCL ns/op | CUDA vs OpenCL |
+|-----------|-----------|-------------|----------------|
+| ECDSA Sign (w=8) | **198.3** | 211.3 | CUDA 6.4% faster |
+| ECDSA Sign (w=4, retired) | 220.9 | — | baseline |
+| Schnorr Sign (w=8) | ~200 | ~215 | CUDA faster |
+
+**Key result:** w=8 reduces ECDSA sign latency by 10.2% on CUDA (220.9 → 198.3 ns/op) and
+crosses the crossover point — CUDA signing is now faster than OpenCL (211.3 ns/op) for the
+first time. `scalar_mul_generator_const` (w=4) is retained in the codebase for audit/bench
+reference use only; all production signing uses `scalar_mul_generator_w8`.
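The relative figures in the Signing Operations table can be recomputed from the ns/op values, and it matters which definition of "faster" is used. The helpers below are hypothetical (`pct_reduction`, `pct_throughput_gain`, and `round1` are invented names). The first computes the latency reduction `(before - after) / before`. The second computes the throughput gain `before / after - 1`, since ops/s is proportional to `1 / (ns/op)`.

```cpp
#include <cassert>
#include <cmath>

// Percentage by which latency drops going from `before` to `after` (both in ns/op).
double pct_reduction(double before, double after) {
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}

// Percentage by which throughput (ops/s, proportional to 1 / ns_per_op) rises.
double pct_throughput_gain(double before, double after) {
    assert(after > 0.0);
    return (before / after - 1.0) * 100.0;
}

// Rounds to one decimal place, matching the precision used in the table.
double round1(double x) { return std::round(x * 10.0) / 10.0; }
```

By this arithmetic, 220.9 → 198.3 ns/op is a 10.2% latency reduction (about 11.4% more signatures per second). Comparing 198.3 ns/op with 211.3 ns/op gives about 6.2% lower latency, or about 6.6% higher throughput.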
cuda/include/secp256k1.cuh
@@ -3551,6 +3551,10 @@ __device__ __constant__ static const AffinePoint GENERATOR_TABLE_AFFINE[16] = {
 // Uses GENERATOR_TABLE_AFFINE in __constant__ memory (no build_generator_table needed).
 // Fixed-window w=4: 252 doublings + <=64 mixed additions.
 // Saves shared-memory allocation and __syncthreads() compared to runtime table.
+//
+// NOTE: For signing paths prefer scalar_mul_generator_w8 (w=8, 32 windows, ~198 ns/op).
+// This function (w=4, 64 windows, ~220 ns/op) is retained for audit and benchmark
+// comparisons that need the original reference implementation.
 __device__ inline void scalar_mul_generator_const(const Scalar* k, JacobianPoint* r) {
     r->infinity = true;
     field_set_zero(&r->x);
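The "Fixed-window w=4: 252 doublings + <=64 mixed additions" comment follows from how a fixed-window method walks a 256-bit scalar. There are 64 windows, 4 doublings before every window except the first (63 × 4 = 252), and at most one table addition per window. The sketch below is a generic single-table fixed-window multiply over a toy additive group (integers modulo 2^31 - 1), not the project's CUDA kernel. `fixed_window_mul`, `window`, `reference_mul`, and `OpCount` are hypothetical. The sketch does not model how `scalar_mul_generator_w8` actually lays out its tables. It counts doublings and additions so the window arithmetic can be checked for both w=4 and w=8.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy modulus (assumption): 2^31 - 1 is prime; this is NOT the secp256k1 field.
constexpr uint64_t kToyPrime = 2147483647;

struct OpCount {
    int doublings = 0;
    int additions = 0;
};

// Returns the i-th W-bit window (i = 0 is most significant) of a 256-bit big-endian scalar.
template <int W>
unsigned window(const std::array<uint8_t, 32>& k, int i) {
    static_assert(W == 4 || W == 8, "sketch handles w=4 and w=8 only");
    if (W == 8) return k[i];
    uint8_t byte = k[i / 2];
    return (i % 2 == 0) ? (byte >> 4) : (byte & 0x0F);
}

// Left-to-right fixed-window multiply k*G in the additive group Z_p,
// using one precomputed table table[d] = d*G for d in [0, 2^W).
template <int W>
uint64_t fixed_window_mul(const std::array<uint8_t, 32>& k, uint64_t g, OpCount* ops) {
    assert(g < kToyPrime);
    constexpr int kWindows = 256 / W;
    std::array<uint64_t, (1u << W)> table{};
    for (unsigned d = 1; d < (1u << W); ++d) table[d] = (table[d - 1] + g) % kToyPrime;

    uint64_t acc = 0;  // identity ("point at infinity")
    for (int i = 0; i < kWindows; ++i) {
        if (i > 0) {  // no doublings are needed before the first window
            for (int j = 0; j < W; ++j) { acc = (acc + acc) % kToyPrime; ++ops->doublings; }
        }
        unsigned d = window<W>(k, i);
        if (d != 0) { acc = (acc + table[d]) % kToyPrime; ++ops->additions; }
    }
    return acc;
}

// Reference result: (k mod p) * G mod p, reducing k byte by byte (Horner's rule).
uint64_t reference_mul(const std::array<uint8_t, 32>& k, uint64_t g) {
    assert(g < kToyPrime);
    uint64_t r = 0;
    for (uint8_t b : k) r = (r * 256 + b) % kToyPrime;
    return (r * g) % kToyPrime;
}
```

With an all-`0xFF` scalar every window is nonzero, so the counts hit their maxima: 252 doublings and 64 additions at w=4, and 248 doublings and 32 additions at w=8 under this single-table scheme. At w=4, every window of that scalar also selects `table[15]`.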
include/ufsecp/ufsecp_impl.cpp
@@ -722,9 +722,12 @@ ufsecp_error_t ufsecp_ecdsa_sign_recoverable(ufsecp_ctx* ctx,
         return ctx_set_err(ctx, UFSECP_ERR_BAD_KEY, "privkey is zero or >= n");
     }
 
-    // NOTE: No ct::ecdsa_sign_recoverable exists yet. Using fast path with
-    // zeroization. Recovery signing is inherently non-CT due to recid computation.
-    // TODO: Implement ct::ecdsa_sign_recoverable when CT recovery is needed.
+    // ARCH DECISION: No ct::ecdsa_sign_recoverable exists because recovery signing is
+    // inherently non-constant-time — the recid value (0..3) depends on the R point's x
+    // coordinate, leaking timing. We use the FAST path (secp256k1::ecdsa_sign_recoverable)
+    // with explicit zeroization of the private-key scalar immediately after use.
+    // If a future CT recovery path is needed, it must accept a fixed recid hint from the
+    // caller and branch only on public data.
     auto rsig = secp256k1::ecdsa_sign_recoverable(msg, sk);
     secp256k1::detail::secure_erase(&sk, sizeof(sk));
     auto normalized = rsig.sig.normalize();
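For reference, the recovery id discussed in the ARCH DECISION comment conventionally packs two bits about the nonce point R = k·G. Bit 0 is the parity of R's y coordinate. Bit 1 records whether R's x coordinate was at least the group order n, so that r = x mod n differs from x. If the signature is normalized to low-s by replacing s with n - s, bit 0 flips, because (r, n - s) corresponds to the point -R, which has the same x and the opposite y parity. The helper below is hypothetical: `recovery_id` is not a function from this codebase or from libsecp256k1. The sketch assumes the caller has already computed the three booleans, and it says nothing about how `secp256k1::ecdsa_sign_recoverable` or `normalize()` handle recid internally.

```cpp
#include <cassert>

// Hypothetical helper: packs an ECDSA recovery id from facts about R = k*G.
//   r_y_is_odd     : parity of R's affine y coordinate
//   r_x_overflowed : true if R.x >= n, so r = R.x - n
//   s_negated      : true if s was replaced by n - s (low-s normalization)
int recovery_id(bool r_y_is_odd, bool r_x_overflowed, bool s_negated) {
    int recid = (r_x_overflowed ? 2 : 0) | (r_y_is_odd ? 1 : 0);
    if (s_negated) {
        recid ^= 1;  // (r, n - s) matches -R, whose y parity is flipped
    }
    assert(recid >= 0 && recid <= 3);
    return recid;
}
```

Because the x-overflow case occurs with probability of roughly 2^-128 on secp256k1, recid is 0 or 1 in practice.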