[BREAKGLASS] Append-only mirror of github.com/sparrowwallet/UltrafastSecp256k1

UltrafastSecp256k1 — Fastest Open-Source secp256k1 Library

Zero-dependency, multi-backend secp256k1 elliptic curve cryptography library: GPU-accelerated ECDSA & Schnorr signatures, constant-time side-channel protection, and 12+ platform targets including CUDA, Metal, OpenCL, ROCm, WebAssembly, RISC-V, ESP32, and STM32.

4.88 M ECDSA signs/s · 2.44 M ECDSA verifies/s · 3.66 M Schnorr signs/s · 2.82 M Schnorr verifies/s — single GPU (RTX 5060 Ti)

Why UltrafastSecp256k1?

Fastest open-source GPU signatures — no other library provides secp256k1 ECDSA + Schnorr sign/verify on CUDA, OpenCL, and Metal
Zero dependencies — pure C++20, no Boost, no OpenSSL, compiles anywhere with a conforming compiler
Dual-layer security — variable-time FAST path for throughput, constant-time CT path for secret-key operations
12+ platforms — x86-64, ARM64, RISC-V, WASM, iOS, Android, ESP32, STM32, CUDA, Metal, OpenCL, ROCm

Quick links: Discord · Benchmarks · Build Guide · API Reference · Security Policy · Threat Model · Porting Guide

⚠️ Security Notice

Research & Development Project — Not Audited

This library has not undergone independent security audits. It is provided for research, educational, and experimental purposes.

❌ Not recommended for production without independent cryptographic audit
✅ All self-tests pass (76/76 including all backends)
✅ Dual-layer constant-time architecture (FAST + CT always active)
✅ Stable C ABI (ufsecp) with 45 exported functions
✅ Fuzz-tested core arithmetic (libFuzzer + ASan)

Report vulnerabilities via GitHub Security Advisories or email payysoon@gmail.com. For production cryptographic systems, prefer audited libraries like libsecp256k1.

secp256k1 Feature Overview

Category	Component	Status
Core	Field, Scalar, Point, GLV, Precompute	✅
Assembly	x64 MASM/GAS, BMI2/ADX, ARM64 MUL/UMULH, RISC-V RV64GC	✅
SIMD	AVX2/AVX-512 batch ops, Montgomery batch inverse	✅
Constant-Time	CT field/scalar/point — no secret-dependent branches	✅
ECDSA	Sign/Verify, RFC 6979, DER/Compact, low-S, Recovery	✅
Schnorr	BIP-340 sign/verify, tagged hashing, x-only pubkeys	✅
ECDH	Key exchange (raw, xonly, SHA-256)	✅
Multi-scalar	Strauss/Shamir dual-scalar multiplication	✅
Batch verify	ECDSA + Schnorr batch verification	✅
BIP-32/44	HD derivation, path parsing, xprv/xpub, coin-type	✅
MuSig2	BIP-327, key aggregation, 2-round signing	✅
Taproot	BIP-341/342, tweak, Merkle tree	✅
Pedersen	Commitments, homomorphic, switch commitments	✅
FROST	Threshold signatures, t-of-n	✅
Adaptor	Schnorr + ECDSA adaptor signatures	✅
Address	P2PKH, P2WPKH, P2TR, Base58, Bech32/m, EIP-55	✅
Coins	27 blockchains, auto-dispatch	✅
Hashing	SHA-256 (SHA-NI), SHA-512, HMAC, Keccak-256	✅
C ABI	`ufsecp` stable FFI (45 exports, C/C#/Python/Go/Rust/…)	✅
GPU	CUDA, Metal, OpenCL, ROCm kernels	✅
Platforms	x64, ARM64, RISC-V, ESP32, STM32, WASM, iOS, Android	✅

secp256k1 GPU Acceleration (CUDA / OpenCL / Metal / ROCm)

UltrafastSecp256k1 is the only open-source library that provides full secp256k1 ECDSA + Schnorr sign/verify on GPU across four backends:

Backend	Hardware	kG/s	ECDSA Sign	ECDSA Verify	Schnorr Sign	Schnorr Verify
CUDA	RTX 5060 Ti	4.59 M/s	4.88 M/s	2.44 M/s	3.66 M/s	2.82 M/s
OpenCL	RTX 5060 Ti	3.39 M/s	—	—	—	—
Metal	Apple M3 Pro	0.33 M/s	—	—	—	—
ROCm (HIP)	AMD GPUs	Portable	—	—	—	—

CUDA 12.0, sm_86;sm_89, batch=16K signatures. Metal 2.4, 8×32-bit Comba limbs, 18 GPU cores.

CUDA Core ECC Operations (Kernel-Only Throughput)

Operation	Time/Op	Throughput
Field Mul	0.2 ns	4,142 M/s
Field Add	0.2 ns	4,130 M/s
Field Inv	10.2 ns	98.35 M/s
Point Add	1.6 ns	619 M/s
Point Double	0.8 ns	1,282 M/s
Scalar Mul (P×k)	225.8 ns	4.43 M/s
Generator Mul (G×k)	217.7 ns	4.59 M/s
Batch Inv (Montgomery)	2.9 ns	340 M/s
Jac→Affine (per-pt)	14.9 ns	66.9 M/s

GPU Signature Operations (ECDSA + Schnorr)

Operation	Time/Op	Throughput	Protocol
ECDSA Sign	204.8 ns	4.88 M/s	RFC 6979 + low-S
ECDSA Verify	410.1 ns	2.44 M/s	Shamir + GLV
ECDSA Sign+Recid	311.5 ns	3.21 M/s	Recoverable (EIP-155)
Schnorr Sign	273.4 ns	3.66 M/s	BIP-340
Schnorr Verify	354.6 ns	2.82 M/s	BIP-340 + GLV

CUDA vs OpenCL Comparison (RTX 5060 Ti)

Operation	CUDA	OpenCL	Winner
Field Mul	0.2 ns	0.2 ns	Tie
Field Inv	10.2 ns	14.3 ns	CUDA 1.40×
Point Double	0.8 ns	0.9 ns	CUDA 1.13×
Point Add	1.6 ns	1.6 ns	Tie
kG (Generator Mul)	217.7 ns	295.1 ns	CUDA 1.36×

Benchmarks: 2026-02-14, Linux x86_64, NVIDIA Driver 580.126.09. Both kernel-only (no buffer allocation/copy overhead).

Apple Metal (M3 Pro) — Kernel-Only

Operation	Time/Op	Throughput
Field Mul	1.9 ns	527 M/s
Field Inv	106.4 ns	9.40 M/s
Point Add	10.1 ns	98.6 M/s
Point Double	5.1 ns	196 M/s
Scalar Mul (P×k)	2.94 μs	0.34 M/s
Generator Mul (G×k)	3.00 μs	0.33 M/s

Metal 2.4, 8×32-bit Comba limbs, Apple M3 Pro (18 GPU cores, Unified Memory 18 GB)

secp256k1 ECDSA & Schnorr Signatures (BIP-340, RFC 6979)

Full signature support across CPU and GPU:

ECDSA: RFC 6979 deterministic nonces, low-S normalization, DER/Compact encoding, public key recovery (recid)
Schnorr: BIP-340 compliant — tagged hashing, x-only public keys
Batch verification: ECDSA and Schnorr batch verify
Multi-scalar: Shamir's trick (k₁×G + k₂×Q) for fast verification
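
The idea behind Shamir's trick can be sketched outside the library. The example below is an illustration only: it works in the multiplicative group modulo a small prime instead of on secp256k1 points, so a squaring stands in for a point doubling and a multiplication for a point addition. The names `kP`, `mul_mod`, `pow_mod`, and `shamir_pow` are hypothetical and are not part of the UltrafastSecp256k1 API.

```cpp
#include <cassert>
#include <cstdint>

// Toy modulus (the Mersenne prime 2^31 - 1); an assumption for illustration only.
constexpr std::uint64_t kP = 2147483647ULL;

// Modular multiplication; operands are below 2^31, so the product fits in 64 bits.
inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;
}

// Plain square-and-multiply: base^exp mod kP.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t acc = 1;
    base %= kP;
    while (exp != 0) {
        if (exp & 1) acc = mul_mod(acc, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return acc;
}

// Shamir's trick: g^a * h^b with a single shared chain of squarings.
// The elliptic-curve analog computes a*G + b*Q with one shared chain of doublings.
std::uint64_t shamir_pow(std::uint64_t g, std::uint64_t a,
                         std::uint64_t h, std::uint64_t b) {
    g %= kP;
    h %= kP;
    assert(g != 0 && h != 0);
    const std::uint64_t gh = mul_mod(g, h);  // precomputed g*h (analog of G + Q)
    std::uint64_t acc = 1;
    for (int bit = 63; bit >= 0; --bit) {
        acc = mul_mod(acc, acc);             // one squaring per bit, shared by both exponents
        const bool abit = (a >> bit) & 1;
        const bool bbit = (b >> bit) & 1;
        if (abit && bbit)  acc = mul_mod(acc, gh);
        else if (abit)     acc = mul_mod(acc, g);
        else if (bbit)     acc = mul_mod(acc, h);
    }
    return acc;
}
```

`shamir_pow(3, 12345, 7, 67890)` gives the same result as `mul_mod(pow_mod(3, 12345), pow_mod(7, 67890))` while performing only one pass over the bits. The loop branches on exponent bits, so this form is variable-time and fits verification, where both scalars are public.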

CPU Signature Benchmarks (x86-64, Clang 19, AVX2, Release)

Operation	Time	Throughput
ECDSA Sign (RFC 6979)	8.5 μs	118,000 op/s
ECDSA Verify	23.6 μs	42,400 op/s
Schnorr Sign (BIP-340)	6.8 μs	146,000 op/s
Schnorr Verify (BIP-340)	24.0 μs	41,600 op/s
Key Generation (CT)	9.5 μs	105,500 op/s
Key Generation (fast)	5.5 μs	182,000 op/s
ECDH	23.9 μs	41,800 op/s

Schnorr signing achieves roughly 25% higher throughput than ECDSA signing, largely because it needs no scalar modular inversion (ECDSA computes k⁻¹ mod n for every signature). Measured single-core, pinned, 2026-02-21.

Constant-Time secp256k1 (Side-Channel Resistance)

The ct:: namespace provides constant-time operations for secret-key material — no secret-dependent branches or memory access patterns:

Operation	Fast	CT	Overhead
Field Mul	17 ns	23 ns	1.35×
Field Inverse	0.8 μs	1.7 μs	2.05×
Complete Addition	—	276 ns	—
Scalar Mul (k×P)	23.6 μs	26.6 μs	1.13×
Generator Mul (k×G)	5.3 μs	9.9 μs	1.86×

CT layer provides: ct::field_mul, ct::field_inv, ct::scalar_mul, ct::point_add_complete, ct::point_dbl

Use the CT layer for: private key operations, signing, nonce generation, ECDH. Use the FAST layer for: verification, public key derivation, batch processing, benchmarks.

See THREAT_MODEL.md for a full layer-by-layer risk assessment.
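
As a rough illustration of what "no secret-dependent branches or memory access patterns" means in practice, the sketch below shows two common building blocks: a masked select and a table lookup that reads every entry. This is a generic technique, not the library's `ct::` code; the names `ct_mask`, `ct_select`, and `ct_lookup` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns all-ones if flag == 1 and all-zeros if flag == 0, without branching.
// The assert is a debug-only precondition and is compiled out with NDEBUG.
inline std::uint64_t ct_mask(std::uint64_t flag) {
    assert(flag <= 1);
    return static_cast<std::uint64_t>(0) - flag;
}

// out[i] = flag ? a[i] : b[i] for every limb, with no branch on flag.
inline void ct_select(std::uint64_t* out, const std::uint64_t* a,
                      const std::uint64_t* b, std::size_t limbs,
                      std::uint64_t flag) {
    const std::uint64_t m = ct_mask(flag);
    for (std::size_t i = 0; i < limbs; ++i) {
        out[i] = (a[i] & m) | (b[i] & ~m);
    }
}

// Table lookup that touches every entry, so the memory access pattern
// does not depend on the (secret) index.
inline std::uint64_t ct_lookup(const std::uint64_t* table, std::size_t n,
                               std::size_t index) {
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t x = static_cast<std::uint64_t>(i ^ index);
        // eq is 1 when i == index and 0 otherwise, computed arithmetically.
        const std::uint64_t eq =
            1 ^ ((x | (static_cast<std::uint64_t>(0) - x)) >> 63);
        result |= table[i] & ct_mask(eq);
    }
    return result;
}
```

Source-level branchlessness is only a starting point: an optimizing compiler may still emit branches, so constant-time code is normally checked at the assembly level or with timing tests.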

secp256k1 Benchmarks — Cross-Platform Comparison

CPU: x86-64 vs ARM64 vs RISC-V

Operation	x86-64 (Clang 21, AVX2)	ARM64 (Cortex-A76)	RISC-V (Milk-V Mars)
Field Mul	17 ns	74 ns	95 ns
Field Square	14 ns	50 ns	70 ns
Field Add	1 ns	8 ns	11 ns
Field Inverse	1 μs	2 μs	4 μs
Point Add	159 ns	992 ns	1 μs
Generator Mul (k×G)	5 μs	14 μs	33 μs
Scalar Mul (k×P)	25 μs	131 μs	154 μs

GPU: CUDA vs OpenCL vs Metal

Operation	CUDA (RTX 5060 Ti)	OpenCL (RTX 5060 Ti)	Metal (M3 Pro)
Field Mul	0.2 ns	0.2 ns	1.9 ns
Field Inv	10.2 ns	14.3 ns	106.4 ns
Point Add	1.6 ns	1.6 ns	10.1 ns
Generator Mul (G×k)	217.7 ns	295.1 ns	3.00 μs

Embedded: ESP32-S3 vs ESP32 vs STM32

Operation	ESP32-S3 LX7 (240 MHz)	ESP32 LX6 (240 MHz)	STM32F103 (72 MHz)
Field Mul	6,105 ns	6,993 ns	15,331 ns
Field Square	5,020 ns	6,247 ns	12,083 ns
Field Add	850 ns	985 ns	4,139 ns
Field Inv	2,524 μs	609 μs	1,645 μs
Fast Scalar × G	5,226 μs	6,203 μs	37,982 μs
CT Scalar × G	15,527 μs	—	—
CT Generator × k	4,951 μs	—	—

Field Representation: 5×52 vs 4×64

Operation	4×64	5×52	Speedup
Multiplication	42 ns	15 ns	2.76×
Squaring	31 ns	13 ns	2.44×
Addition	4.3 ns	1.6 ns	2.69×
Add chain (32 ops)	286 ns	57 ns	5.01×

5×52 uses __int128 lazy reduction — ideal for 64-bit platforms.
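
To make the two layouts concrete, here is a minimal sketch of repacking a 256-bit value between 4×64 and 5×52 limbs, plus a carry-free ("lazy") addition that relies on the 12 spare bits in each 52-bit limb. It is an assumption-laden illustration, not the library's field code: the type and function names are hypothetical, and modular reduction is left out entirely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Limbs4 = std::array<std::uint64_t, 4>;  // 4 x 64-bit, little-endian, fully packed
using Limbs5 = std::array<std::uint64_t, 5>;  // 5 x 52-bit, little-endian, 12 spare bits each

constexpr std::uint64_t kMask52 = (1ULL << 52) - 1;

// Repack a 256-bit value from 4x64 into 5x52 limbs.
Limbs5 to_5x52(const Limbs4& a) {
    return {
        a[0] & kMask52,
        ((a[0] >> 52) | (a[1] << 12)) & kMask52,
        ((a[1] >> 40) | (a[2] << 24)) & kMask52,
        ((a[2] >> 28) | (a[3] << 36)) & kMask52,
        a[3] >> 16,  // top limb holds the remaining 48 bits
    };
}

// Lazy addition: limb-wise add with no carry propagation.
// Valid as long as the spare bits of each limb do not overflow.
Limbs5 add_lazy(const Limbs5& a, const Limbs5& b) {
    Limbs5 r{};
    for (int i = 0; i < 5; ++i) r[i] = a[i] + b[i];
    return r;
}

// Propagate carries so limbs 0..3 are below 2^52 again (no modular reduction).
Limbs5 normalize_carries(Limbs5 a) {
    for (int i = 0; i < 4; ++i) {
        a[i + 1] += a[i] >> 52;
        a[i] &= kMask52;
    }
    return a;
}

// Pack back into 4x64; expects carry-normalized input that fits in 256 bits.
Limbs4 to_4x64(const Limbs5& a) {
    assert((a[4] >> 48) == 0);
    return {
        a[0] | (a[1] << 52),
        (a[1] >> 12) | (a[2] << 40),
        (a[2] >> 24) | (a[3] << 28),
        (a[3] >> 36) | (a[4] << 16),
    };
}
```

In the 4×64 layout every addition has to ripple a carry through all limbs. With 5×52, several additions can be accumulated first and the carries resolved once, which is consistent with the larger speedup the table shows for the 32-operation add chain.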

For full benchmark results, see docs/BENCHMARKS.md.

secp256k1 on Embedded (ESP32 / STM32 / ARM Cortex-M)

UltrafastSecp256k1 runs on resource-constrained microcontrollers with portable C++ (no __int128, no assembly required):

ESP32-S3 (Xtensa LX7 @ 240 MHz): Fast scalar × G in 5.2 ms, CT generator × k in 4.9 ms
ESP32-PICO-D4 (Xtensa LX6 @ 240 MHz): Scalar × G in 6.2 ms, CT layer available (44.8 ms CT)
STM32F103 (ARM Cortex-M3 @ 72 MHz): Scalar × G in 38 ms with ARM inline assembly (UMULL/ADDS/ADCS)
Android ARM64 (RK3588, Cortex-A76 @ 2.256 GHz): Scalar × G in 14 μs, Scalar × P in 131 μs, ECDSA Sign 30 μs

All 37 library tests pass on every embedded target. See examples/esp32_test/ and examples/stm32_test/.
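
On targets without `__int128`, a wide multiply is typically assembled from 32×32→64-bit partial products, which is the operation that instructions such as UMULL provide. The sketch below shows that decomposition in portable C++. It is a generic illustration under that assumption, not the library's embedded code path; `U128`, `mul_64x64`, and `mac` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// 128-bit value as two 64-bit halves.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// 64x64 -> 128-bit multiply built from four 32x32 -> 64-bit partial products.
U128 mul_64x64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t a_lo = a & 0xFFFFFFFFULL, a_hi = a >> 32;
    const std::uint64_t b_lo = b & 0xFFFFFFFFULL, b_hi = b >> 32;

    const std::uint64_t p00 = a_lo * b_lo;
    const std::uint64_t p01 = a_lo * b_hi;
    const std::uint64_t p10 = a_hi * b_lo;
    const std::uint64_t p11 = a_hi * b_hi;

    // Middle column: a sum of three values below 2^32, so it cannot overflow.
    const std::uint64_t mid =
        (p00 >> 32) + (p01 & 0xFFFFFFFFULL) + (p10 & 0xFFFFFFFFULL);

    U128 r;
    r.lo = (p00 & 0xFFFFFFFFULL) | (mid << 32);
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}

// Multiply-accumulate: acc += a * b.
// The caller is responsible for keeping the 128-bit accumulator from overflowing.
void mac(U128& acc, std::uint64_t a, std::uint64_t b) {
    const U128 p = mul_64x64(a, b);
    const std::uint64_t lo = acc.lo + p.lo;
    const std::uint64_t carry = (lo < acc.lo) ? 1 : 0;
    const std::uint64_t hi = acc.hi + p.hi + carry;
    assert(hi >= acc.hi);  // debug check for accumulator overflow
    acc.lo = lo;
    acc.hi = hi;
}
```

A field multiplication on a 32-bit core repeats this pattern across all limb pairs, which helps explain why the microcontroller timings above are orders of magnitude higher than the 64-bit desktop figures.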

Porting to New Platforms

See PORTING.md for a step-by-step checklist to add new CPU architectures, embedded targets, or GPU backends.

WASM secp256k1 (Browser & Node.js)

WebAssembly build via Emscripten — runs secp256k1 in any modern browser or Node.js:

./scripts/build_wasm.sh        # → build-wasm/dist/

Output: secp256k1_wasm.wasm + secp256k1.mjs (ES6 module with TypeScript declarations). See wasm/README.md for JavaScript/TypeScript integration.

secp256k1 Batch Modular Inverse (Montgomery Trick)

All backends include batch modular inversion — a critical building block for Jacobian→Affine conversion:

Backend	Function	Notes
CPU	`fe_batch_inverse(FieldElement*, size_t)`	Montgomery trick with scratch buffer
CUDA	`batch_inverse_montgomery` / `batch_inverse_kernel`	GPU Montgomery trick kernel
Metal	`batch_inverse`	Chunked parallel threadgroups
OpenCL	Inline PTX inverse	Batch via host orchestration

Algorithm: Montgomery batch inverse computes N field inversions using only 1 modular inversion + 3(N−1) multiplications, amortizing the expensive inversion across the entire batch.

For N=1024: ~500× cheaper than individual inversions. A single field inversion costs ~3.5 μs (Fermat), while batch amortizes to ~7 ns per element.
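
The operation count above can be checked against a small stand-alone sketch. It runs over a toy prime field (2^31 − 1) rather than the secp256k1 field, and the names `kP`, `mul_mod`, `inv_mod`, and `batch_inverse` are hypothetical; the library's `fe_batch_inverse` has its own signature and is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime 2^31 - 1 (an assumption for illustration; not the secp256k1 field prime).
constexpr std::uint64_t kP = 2147483647ULL;

inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;
}

// Fermat inversion: a^(p-2) mod p. This is the expensive step being amortized.
std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1;
    std::uint64_t e = kP - 2;
    a %= kP;
    while (e != 0) {
        if (e & 1) result = mul_mod(result, a);
        a = mul_mod(a, a);
        e >>= 1;
    }
    return result;
}

// In-place batch inversion of nonzero, reduced elements:
// 1 inversion + 3(N-1) multiplications.
void batch_inverse(std::vector<std::uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;
    for (std::uint64_t x : v) assert(x != 0 && x < kP);

    std::vector<std::uint64_t> prefix(n);  // scratch: prefix[i] = v[0] * ... * v[i]
    prefix[0] = v[0];
    for (std::size_t i = 1; i < n; ++i) {
        prefix[i] = mul_mod(prefix[i - 1], v[i]);    // N-1 multiplications
    }

    std::uint64_t inv = inv_mod(prefix[n - 1]);      // the single inversion
    for (std::size_t i = n - 1; i > 0; --i) {
        const std::uint64_t vi = v[i];
        v[i] = mul_mod(inv, prefix[i - 1]);          // inverse of v[i]
        inv = mul_mod(inv, vi);                      // now the inverse of prefix[i-1]
    }
    v[0] = inv;
}
```

The forward pass costs N−1 multiplications and the backward pass 2(N−1), which gives the 3(N−1) total quoted above. A single zero element would corrupt the whole batch, so callers need to exclude zeros (for example, the Z coordinate of the point at infinity) beforehand.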

Mixed Addition (Jacobian + Affine)

Branchless mixed addition (add_mixed_inplace) uses the madd-2007-bl formula: 7M + 4S (vs 11M + 5S for full Jacobian add).

#include <secp256k1/point.hpp>
using namespace secp256k1::fast;

Point P = Point::generator();
FieldElement gx = P.x(), gy = P.y();

// Compute 2G using mixed add (7M + 4S)
Point Q = Point::generator();
Q.add_mixed_inplace(gx, gy);  // Q = G + G = 2G

// Batch walk: P, P+G, P+2G, ...
Point walker = P;
for (int i = 0; i < 1000; ++i) {
    walker.add_mixed_inplace(gx, gy);  // walker += G each step
}

GPU Pattern: H-Product Serial Inversion

Production GPU apps use a memory-efficient variant: instead of storing full Z coordinates, jacobian_add_mixed_h returns H = U2 − X1 separately. Since Z_k = Z_0 · H_0 · H_1 · … · H_{k-1}, the entire Z chain is invertible from H values + initial Z_0.

Cost: 1 Fermat inversion + 2N multiplications per thread (vs N Fermat inversions naively).

See apps/secp256k1_search_gpu_only/gpu_only.cu (step kernel) + unified_split.cuh (batch inversion kernel)
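
The relation Z_k = Z_0 · H_0 · … · H_{k-1} is enough to recover every 1/Z_k with one inversion. The sketch below shows that bookkeeping on the CPU over a toy prime field; it is not the CUDA kernel referenced above, and `invert_z_chain` and its helpers are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime 2^31 - 1 (an assumption for illustration; not the secp256k1 field prime).
constexpr std::uint64_t kP = 2147483647ULL;

inline std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) {
    return (a * b) % kP;
}

// Fermat inversion: a^(p-2) mod p.
std::uint64_t inv_mod(std::uint64_t a) {
    std::uint64_t result = 1;
    std::uint64_t e = kP - 2;
    a %= kP;
    while (e != 0) {
        if (e & 1) result = mul_mod(result, a);
        a = mul_mod(a, a);
        e >>= 1;
    }
    return result;
}

// Given Z_0 and the stored H values, return z_inv where z_inv[k-1] = 1 / Z_k
// for k = 1..N, using one inversion and about 2N multiplications.
std::vector<std::uint64_t> invert_z_chain(std::uint64_t z0,
                                          const std::vector<std::uint64_t>& h) {
    const std::size_t n = h.size();
    std::vector<std::uint64_t> z_inv(n);
    if (n == 0) return z_inv;
    assert(z0 != 0 && z0 < kP);

    // Forward pass: rebuild Z_N from Z_0 and the H values (N multiplications).
    std::uint64_t z = z0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(h[i] != 0 && h[i] < kP);
        z = mul_mod(z, h[i]);
    }

    // Single inversion of the final Z.
    std::uint64_t inv = inv_mod(z);  // 1 / Z_N

    // Backward pass: 1/Z_{k-1} = (1/Z_k) * H_{k-1} (N multiplications).
    for (std::size_t k = n; k > 0; --k) {
        z_inv[k - 1] = inv;
        inv = mul_mod(inv, h[k - 1]);
    }
    return z_inv;
}
```

Only the H values and Z_0 have to be kept per thread, which is where the memory saving over storing every intermediate Z comes from. An H of zero (the doubling or cancellation case in mixed addition) breaks the chain and has to be handled separately.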

secp256k1 Stable C ABI (`ufsecp`) — FFI Bindings

Starting with v3.4.0, UltrafastSecp256k1 ships a stable C ABI — ufsecp — designed for FFI bindings (C#, Python, Rust, Go, Java, Node.js, etc.):

┌──────────────────────────────────────────────────┐
│                  Your Application                │
│          (C, C#, Python, Go, Rust, …)            │
└──────────────────┬───────────────────────────────┘
                   │  ufsecp C ABI (45 functions)
┌──────────────────▼───────────────────────────────┐
│           ufsecp.dll / libufsecp.so              │
│  Opaque ctx  │  Error model  │  ABI versioning   │
├──────────────┴───────────────┴───────────────────┤
│   FAST layer (variable-time public ops)          │
├──────────────────────────────────────────────────┤
│   CT layer (constant-time secret-key ops)        │
└──────────────────────────────────────────────────┘

Both layers are always active — public operations use FAST; secret-key operations (sign, derive, ECDH) use CT internally.

Quick Start (C)

#include "ufsecp.h"

ufsecp_ctx* ctx = NULL;
ufsecp_ctx_create(&ctx);

// Generate keypair
unsigned char seckey[32], pubkey[33];
ufsecp_keygen(ctx, seckey, pubkey);

// ECDSA sign
unsigned char msg[32] = { /* SHA-256 hash */ };
unsigned char sig[64];
ufsecp_ecdsa_sign(ctx, seckey, msg, sig);

// Verify
int valid = 0;
ufsecp_ecdsa_verify(ctx, pubkey, 33, msg, sig, &valid);

ufsecp_ctx_destroy(ctx);

API Coverage

Category	Functions
Context	`ctx_create`, `ctx_destroy`, `selftest`, `last_error`
Keys	`keygen`, `seckey_verify`, `pubkey_create`, `pubkey_parse`, `pubkey_serialize`
ECDSA	`ecdsa_sign`, `ecdsa_verify`, `ecdsa_sign_der`, `ecdsa_verify_der`, `ecdsa_recover`
Schnorr	`schnorr_sign`, `schnorr_verify`
SHA-256	`sha256` (SHA-NI accelerated)
ECDH	`ecdh_compressed`, `ecdh_xonly`, `ecdh_raw`
BIP-32	`bip32_from_seed`, `bip32_derive_child`, `bip32_serialize`
Address	`address_p2pkh`, `address_p2wpkh`, `address_p2tr`
WIF	`wif_encode`, `wif_decode`
Tweak	`pubkey_tweak_add`, `pubkey_tweak_mul`
Version	`version`, `abi_version`, `version_string`

See SUPPORTED_GUARANTEES.md for Tier 1/2/3 stability guarantees.

secp256k1 Use Cases

Transaction Signing & Verification — Bitcoin, Ethereum, and 25+ blockchain transaction signing at CPU or GPU scale
Batch Signature Verification — verify thousands of ECDSA/Schnorr signatures per second for block validation
HD Wallet Key Derivation — BIP-32/44 hierarchical deterministic derivation with 27-coin address generation
Embedded IoT Signing — ESP32 and STM32 on-device key generation and transaction signing
High-Throughput Indexing — GPU-accelerated public key derivation for address indexing services
Zero-Knowledge Proof Systems — Pedersen commitments, adaptor signatures for ZK protocols
Multi-Party Computation — MuSig2 (BIP-327) and FROST threshold signing
Cross-Platform Cryptographic Services — single codebase across server (CUDA), desktop (OpenCL/Metal), mobile (ARM64), browser (WASM), and embedded (ESP32/STM32)
Cryptographic Research & Benchmarking — field/group operation microbenchmarks, algorithm variant comparison

Testers Wanted

We need community testers for platforms we cannot fully validate in CI:

iOS — Build & run on real iPhone/iPad hardware with Xcode

AMD GPU (ROCm/HIP) — Test on AMD Radeon RX / Instinct GPUs

Open an issue with your results!

Building secp256k1 from Source (CMake)

Prerequisites

CMake 3.18+
C++20 compiler (GCC 11+, Clang/LLVM 15+, MSVC 2022+ with -DSECP256K1_ALLOW_MSVC=ON)
CUDA Toolkit 12.0+ (optional, for GPU)
Ninja (recommended)

CPU-Only Build

cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

With CUDA GPU Support

cmake -S . -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DSECP256K1_BUILD_CUDA=ON
cmake --build build -j

WebAssembly (Emscripten)

./scripts/build_wasm.sh        # → build-wasm/dist/

iOS (XCFramework)

./scripts/build_xcframework.sh  # → build-xcframework/output/

Universal XCFramework (arm64 device + arm64 simulator). Also available via Swift Package Manager and CocoaPods.

Build Options

Option	Default	Description
`SECP256K1_USE_ASM`	ON	Assembly optimizations (x64/ARM64/RISC-V)
`SECP256K1_BUILD_CUDA`	OFF	CUDA GPU support
`SECP256K1_BUILD_OPENCL`	OFF	OpenCL GPU support
`SECP256K1_BUILD_ROCM`	OFF	ROCm/HIP GPU support (AMD)
`SECP256K1_BUILD_TESTS`	ON	Test suite
`SECP256K1_BUILD_BENCH`	ON	Benchmarks
`SECP256K1_RISCV_FAST_REDUCTION`	ON	Fast modular reduction (RISC-V)
`SECP256K1_RISCV_USE_VECTOR`	ON	RVV vector extension (RISC-V)

For detailed build instructions, see docs/BUILDING.md.

secp256k1 Quick Start (C++ Examples)

Basic Point Operations

#include <secp256k1/field.hpp>
#include <secp256k1/point.hpp>
#include <secp256k1/scalar.hpp>
#include <iostream>

using namespace secp256k1::fast;

int main() {
    // Public key derivation: private_key × G = public_key
    // Note: secp256k1::fast is variable-time; for real secret keys use the ct:: layer.
    auto generator = Point::generator();
    auto private_key = Scalar::from_hex(
        "E9873D79C6D87DC0FB6A5778633389F4453213303DA61F20BD67FC233AA33262"
    );
    auto public_key = generator * private_key;

    std::cout << "Public Key X: " << public_key.x().to_hex() << "\n";
    std::cout << "Public Key Y: " << public_key.y().to_hex() << "\n";
    return 0;
}

g++ -std=c++20 example.cpp -lsecp256k1-fast-cpu -o example && ./example

GPU Batch Multiplication

#include <secp256k1_cuda/batch_operations.hpp>
#include <secp256k1/point.hpp>
#include <secp256k1/scalar.hpp>
#include <iostream>
#include <vector>

using namespace secp256k1::fast;

int main() {
    std::vector<Point> base_points(1'000'000, Point::generator());
    std::vector<Scalar> scalars(1'000'000);
    for (auto& s : scalars) s = Scalar::random();

    cuda::BatchConfig config{.device_id = 0, .threads_per_block = 256, .streams = 4};
    auto results = cuda::batch_multiply(base_points, scalars, config);

    std::cout << "Processed " << results.size() << " point multiplications\n";
    return 0;
}

secp256k1 Security Model (FAST vs CT)

Two security profiles are always active — no flag-based selection:

FAST Profile (Default)

Maximum throughput, variable-time algorithms
Use for: verification, batch processing, public key derivation, benchmarking
⚠️ Not safe for secret key operations — timing side-channels possible

CT / Hardened Profile (`ct::` namespace)

Constant-time arithmetic — no secret-dependent branches or memory access
Roughly 1.1–2× slower than FAST on x86-64 (see the overhead table in the Constant-Time section); about 3× for scalar × G on ESP32-S3
Use for: signing, private key handling, nonce generation, ECDH

Choose the appropriate profile for your use case. Using FAST with secret data is a security vulnerability. See THREAT_MODEL.md for full details.

secp256k1 Supported Coins (27 Blockchains)

#	Coin	Ticker	Address Types	Derivation Path Prefix
1	Bitcoin	BTC	P2PKH, P2WPKH (Bech32), P2TR (Bech32m)	m/86'/0'
2	Ethereum	ETH	EIP-55 Checksum	m/44'/60'
3	Litecoin	LTC	P2PKH, P2WPKH	m/84'/2'
4	Dogecoin	DOGE	P2PKH	m/44'/3'
5	Bitcoin Cash	BCH	P2PKH	m/44'/145'
6	Bitcoin SV	BSV	P2PKH	m/44'/236'
7	Zcash	ZEC	P2PKH (transparent)	m/44'/133'
8	Dash	DASH	P2PKH	m/44'/5'
9	DigiByte	DGB	P2PKH, P2WPKH	m/44'/20'
10	Namecoin	NMC	P2PKH	m/44'/7'
11	Peercoin	PPC	P2PKH	m/44'/6'
12	Vertcoin	VTC	P2PKH, P2WPKH	m/44'/28'
13	Viacoin	VIA	P2PKH	m/44'/14'
14	Groestlcoin	GRS	P2PKH, P2WPKH	m/44'/17'
15	Syscoin	SYS	P2PKH	m/44'/57'
16	BNB Smart Chain	BNB	EIP-55	m/44'/60'
17	Polygon	MATIC	EIP-55	m/44'/60'
18	Avalanche	AVAX	EIP-55 (C-Chain)	m/44'/60'
19	Fantom	FTM	EIP-55	m/44'/60'
20	Arbitrum	ARB	EIP-55	m/44'/60'
21	Optimism	OP	EIP-55	m/44'/60'
22	Ravencoin	RVN	P2PKH	m/44'/175'
23	Flux	FLUX	P2PKH	m/44'/19167'
24	Qtum	QTUM	P2PKH	m/44'/2301'
25	Horizen	ZEN	P2PKH	m/44'/121'
26	Bitcoin Gold	BTG	P2PKH	m/44'/156'
27	Komodo	KMD	P2PKH	m/44'/141'

All EVM chains (ETH, BNB, MATIC, AVAX, FTM, ARB, OP) share the same address format (EIP-55 checksummed hex).
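
The path prefixes in the table follow the BIP-32 text notation, where an apostrophe marks a hardened index (the index with bit 31 set). A minimal parser for that notation might look like the sketch below. It is an illustration only: `parse_path` and `coin_type` are hypothetical names and do not describe the library's own path-parsing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

constexpr std::uint32_t kHardened = 0x80000000u;  // BIP-32 hardened-index bit

// Parse a path such as "m/44'/60'/0'/0/0" into child indices.
// Returns std::nullopt on malformed input.
std::optional<std::vector<std::uint32_t>> parse_path(const std::string& path) {
    if (path.empty() || path[0] != 'm') return std::nullopt;
    std::vector<std::uint32_t> out;
    std::size_t pos = 1;
    while (pos < path.size()) {
        if (path[pos] != '/') return std::nullopt;
        ++pos;
        std::uint64_t value = 0;
        std::size_t digits = 0;
        while (pos < path.size() && path[pos] >= '0' && path[pos] <= '9') {
            value = value * 10 + static_cast<std::uint64_t>(path[pos] - '0');
            if (value >= kHardened) return std::nullopt;  // index must fit in 31 bits
            ++pos;
            ++digits;
        }
        if (digits == 0) return std::nullopt;
        std::uint32_t index = static_cast<std::uint32_t>(value);
        if (pos < path.size() && (path[pos] == '\'' || path[pos] == 'h')) {
            index |= kHardened;  // hardened marker
            ++pos;
        }
        out.push_back(index);
    }
    return out;
}

// Coin type is the second component (purpose'/coin_type'/...), hardened bit stripped.
std::uint32_t coin_type(const std::vector<std::uint32_t>& indices) {
    assert(indices.size() >= 2);
    return indices[1] & ~kHardened;
}
```

With this sketch, `coin_type(*parse_path("m/44'/60'"))` evaluates to 60, the value shared by all EVM chains in the table.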

secp256k1 Architecture

UltrafastSecp256k1/
├── cpu/                 # CPU-optimized implementation
│   ├── include/         # Public headers (field.hpp, scalar.hpp, point.hpp, ecdsa.hpp, schnorr.hpp)
│   ├── src/             # Implementation (field_asm_x64.asm, field_asm_riscv64.S, ...)
│   ├── fuzz/            # libFuzzer harnesses
│   └── tests/           # Unit tests
├── cuda/                # CUDA GPU acceleration
├── opencl/              # OpenCL GPU acceleration
├── metal/               # Apple Metal GPU acceleration
├── wasm/                # WebAssembly (Emscripten)
├── android/             # Android NDK (ARM64)
├── include/ufsecp/      # Stable C ABI
├── examples/
│   ├── esp32_test/      # ESP32-S3 Xtensa LX7 port
│   └── stm32_test/      # STM32F103 ARM Cortex-M3 port
└── docs/                # Documentation

secp256k1 Testing & Verification

Built-in Selftest

Every executable runs a deterministic Known Answer Test (KAT) on startup, covering all arithmetic operations:

Mode	Time	When	What
smoke	~1-2s	App startup, embedded	Core KAT (10 scalar mul, field/scalar identities, boundary vectors)
ci	~30-90s	Every push (CI)	Smoke + cross-checks, bilinearity, NAF/wNAF, batch sweeps, algebraic stress
stress	~10-60min	Nightly / manual	CI + 1000 random scalar muls, 500 field triples, batch inverse up to 8192

#include "secp256k1/selftest.hpp"
using namespace secp256k1::fast;

Selftest(true, SelftestMode::smoke);              // Fast startup check
Selftest(true, SelftestMode::ci);                  // Full CI suite
Selftest(true, SelftestMode::stress, 0xDEADBEEF); // Nightly with custom seed

Sanitizer Builds

cmake --preset cpu-asan && cmake --build build/cpu-asan -j    # ASan + UBSan
cmake --preset cpu-tsan && cmake --build build/cpu-tsan -j    # TSan (data races)
ctest --test-dir build/cpu-asan --output-on-failure

Fuzz Testing

libFuzzer harnesses cover core arithmetic (cpu/fuzz/):

Target	What it tests
`fuzz_field`	add/sub round-trip, mul identity, square, inverse
`fuzz_scalar`	add/sub, mul identity, distributive law
`fuzz_point`	on-curve check, negate, compress round-trip, dbl vs add

Platform CI Coverage

Platform	Backend	Compiler	Status
Linux x64	CPU	GCC 13 / Clang 17	✅ CI
Linux x64	CPU	Clang 17 (ASan+UBSan)	✅ CI
Linux x64	CPU	Clang 17 (TSan)	✅ CI
Windows x64	CPU	MSVC 2022	✅ CI
macOS ARM64	CPU + Metal	AppleClang	✅ CI
iOS ARM64	CPU	Xcode	✅ CI
Android ARM64	CPU	NDK r27c	✅ CI
WebAssembly	CPU	Emscripten	✅ CI
ROCm/HIP	CPU + GPU	ROCm 6.3	✅ CI

secp256k1 Benchmark Targets

Target	Description
`bench_comprehensive`	Full field/point/batch/signature suite
`bench_scalar_mul`	k×G and k×P with wNAF analysis
`bench_ct`	Fast-vs-CT overhead comparison
`bench_atomic_operations`	Individual ECC building block latencies
`bench_field_52`	4×64 vs 5×52 field representation
`bench_ecdsa_multiscalar`	k₁×G + k₂×Q (Shamir vs separate)
`bench_jsf_vs_shamir`	JSF vs Windowed Shamir comparison
`bench_adaptive_glv`	GLV window size sweep (8–20)
`bench_comprehensive_riscv`	RISC-V optimized benchmark suite
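
For readers unfamiliar with the wNAF recoding mentioned for `bench_scalar_mul`, the sketch below computes a width-w non-adjacent form for a small integer. It is a textbook version restricted to 64-bit inputs; whether the library uses exactly this recoding is an assumption, and `wnaf` and `wnaf_value` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Width-w NAF of k, least significant digit first.
// Each digit is zero or odd with |d| < 2^(w-1), and any w consecutive
// digits contain at most one nonzero entry.
std::vector<int> wnaf(std::uint64_t k, int w) {
    assert(w >= 2 && w <= 8);
    assert(k < (1ULL << 63));  // leave headroom for the k - d adjustment
    std::vector<int> digits;
    const std::int64_t window = std::int64_t{1} << w;  // 2^w
    const std::int64_t half = window >> 1;             // 2^(w-1)
    while (k != 0) {
        int d = 0;
        if (k & 1) {
            // r = k mod 2^w, mapped into the signed range (-2^(w-1), 2^(w-1)).
            std::int64_t r = static_cast<std::int64_t>(
                k & static_cast<std::uint64_t>(window - 1));
            if (r >= half) r -= window;
            d = static_cast<int>(r);
            k -= static_cast<std::uint64_t>(r);  // wraps correctly for negative r
        }
        digits.push_back(d);
        k >>= 1;
    }
    return digits;
}

// Recombine the digits: sum of d_i * 2^i, evaluated from the top digit down.
std::int64_t wnaf_value(const std::vector<int>& digits) {
    std::int64_t v = 0;
    for (std::size_t i = digits.size(); i > 0; --i) {
        v = 2 * v + digits[i - 1];
    }
    return v;
}
```

Because nonzero digits are sparse, a scalar multiplication driven by this recoding performs fewer point additions than plain double-and-add, at the cost of a small table of odd multiples of the base point.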

Research Statement

This library explores the performance ceiling of secp256k1 across CPU architectures (x64, ARM64, RISC-V, Cortex-M, Xtensa) and GPUs (CUDA, OpenCL, Metal, ROCm). Zero external dependencies. Pure C++20.

API Stability

C++ API: Not yet stable. Breaking changes may occur before v4.0. Core layers (field, scalar, point, ECDSA, Schnorr) are mature. Experimental layers (MuSig2, FROST, Adaptor, Pedersen, Taproot, HD, Coins) may change.

C ABI (ufsecp): Stable from v3.4.0. ABI version tracked separately. See SUPPORTED_GUARANTEES.md.

Documentation

Document	Description
API Reference	Full C++ and C ABI reference
Build Guide	Detailed build instructions for all platforms
Benchmarks	Complete benchmark results and methodology
Threat Model	Layer-by-layer security risk assessment
Security Policy	Vulnerability reporting and audit status
Porting Guide	Add new platforms, architectures, GPU backends
RISC-V Optimizations	RISC-V assembly details
ESP32 Setup	ESP32 embedded development guide
Contributing	Development guidelines
Changelog	Version history

Contributing

Contributions are welcome! Please read CONTRIBUTING.md.

git clone https://github.com/shrec/UltrafastSecp256k1.git
cd UltrafastSecp256k1
cmake -S . -B build-dev -G Ninja -DCMAKE_BUILD_TYPE=Debug
cmake --build build-dev -j
ctest --test-dir build-dev --output-on-failure

License

GNU Affero General Public License v3.0 (AGPL-3.0)

✅ Use, modify, and distribute under AGPL-3.0
✅ Must disclose source code
✅ Must provide network access to source if run as a service

Commercial License: For proprietary use without AGPL obligations, contact payysoon@gmail.com.

See LICENSE for full details.

Contact & Community

Channel	Link
Issues	GitHub Issues
Discussions	GitHub Discussions
Wiki	Documentation Wiki
Benchmarks	Live Dashboard
Security	Report Vulnerability
Commercial	payysoon@gmail.com

Acknowledgements

UltrafastSecp256k1 is an independent implementation — written from scratch with our own architecture, GPU pipeline, embedded ports, and optimization techniques. At the same time, no project exists in a vacuum. The published research, specifications, and open discussions from the wider cryptographic community helped us refine our own ideas and validate our results.

We want to acknowledge the teams whose public work informed parts of our journey:

bitcoin-core/secp256k1 — The reference C library whose published research on constant-time field arithmetic and endomorphism-based scalar multiplication (GLV, Strauss, Pippenger) helped us benchmark and verify our own independent implementations on GPU and embedded targets.
Bitcoin Core contributors — For open specifications (BIP-340 Schnorr, BIP-341 Taproot, RFC 6979) and a correctness-first engineering culture that benefits everyone building in this space.
Pieter Wuille, Jonas Nick, Tim Ruffing and the libsecp256k1 maintainers — For publicly sharing their research on side-channel resistance, exhaustive testing, and field representation trade-offs. Their published findings helped us make better decisions when designing our own architecture.

We share our optimizations, GPU kernels, embedded ports, and cross-platform techniques freely — because open-source cryptography grows stronger when knowledge flows in every direction.

Special thanks to the Stacker News and Delving Bitcoin communities for their early support and technical feedback.

Extra gratitude to @0xbitcoiner for the initial outreach and for helping bridge the project with the wider Bitcoin developer ecosystem.

⚡ Support the Project

If you find UltrafastSecp256k1 useful, consider supporting its development!

Lightning Address: shrec@stacker.news — send sats via any Lightning wallet or stacker.news/shrec

UltrafastSecp256k1 — The fastest open-source secp256k1 library. GPU-accelerated ECDSA & Schnorr signatures for Bitcoin, Ethereum, and 25+ blockchains. Zero dependencies. Constant-time layer. 12+ platforms.
