sparrowwallet/UltrafastSecp256k1

shrec 8af7320c60

Harden audit and fix Windows CUDA build

2026-03-25 14:36:36 +00:00

GPU Validation Matrix

Unified view of GPU backend validation coverage in UltrafastSecp256k1.

The machine-readable companion for backend status, publishability, and artifact requirements is docs/GPU_BACKEND_EVIDENCE.json. When this document and the JSON diverge, treat the JSON as the enforcement source.

This document answers four practical questions for each backend:

Do we have correctness tests?
Do we have a unified self-audit runner?
Do we have benchmark coverage?
Do we have host-side integration tests?

It is intended as an engineering checklist, not a marketing page.

C ABI Ops -- Per-Backend Status

The C ABI layer (ufsecp_gpu.h) currently exposes 13 backend-neutral GPU batch operations, and CUDA, OpenCL, and Metal all implement that stable public surface. UFSECP_ERR_GPU_UNSUPPORTED (104) remains part of the ABI: it reports unsupported backend selection, missing device/runtime capability, or an invalid execution context. It does not indicate a standing parity gap in the stable 13-op surface.

Operation	CUDA	OpenCL	Metal	Data Class
`generator_mul_batch`	implemented	implemented	implemented	PUBLIC
`ecdsa_verify_batch`	implemented	implemented	implemented	PUBLIC
`schnorr_verify_batch`	implemented	implemented	implemented	PUBLIC
`ecdh_batch`	implemented	implemented	implemented	SECRET
`hash160_pubkey_batch`	implemented	implemented	implemented	PUBLIC
`msm`	implemented	implemented	implemented	PUBLIC
`frost_verify_partial_batch`	implemented	implemented	implemented	PUBLIC
`ecrecover_batch`	implemented	implemented	implemented	PUBLIC
`zk_knowledge_verify_batch`	implemented	implemented	implemented	PUBLIC
`zk_dleq_verify_batch`	implemented	implemented	implemented	PUBLIC
`bulletproof_verify_batch`	implemented	implemented	implemented	PUBLIC
`bip324_aead_encrypt_batch`	implemented	implemented	implemented	PUBLIC
`bip324_aead_decrypt_batch`	implemented	implemented	implemented	SECRET
Total (unified GPU C ABI)	13/13	13/13	13/13
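
The totals row can be recomputed mechanically from the table. The following is a minimal sketch that transcribes the table into Python data; the names `OPS`, `STATUS`, `parity_totals`, and `secret_ops` are illustrative and are not part of the repository or of ufsecp_gpu.h.

```python
# Illustrative transcription of the op table above; not repository code.
BACKENDS = ("CUDA", "OpenCL", "Metal")

# op name -> data class, copied from the table in document order.
OPS = {
    "generator_mul_batch": "PUBLIC",
    "ecdsa_verify_batch": "PUBLIC",
    "schnorr_verify_batch": "PUBLIC",
    "ecdh_batch": "SECRET",
    "hash160_pubkey_batch": "PUBLIC",
    "msm": "PUBLIC",
    "frost_verify_partial_batch": "PUBLIC",
    "ecrecover_batch": "PUBLIC",
    "zk_knowledge_verify_batch": "PUBLIC",
    "zk_dleq_verify_batch": "PUBLIC",
    "bulletproof_verify_batch": "PUBLIC",
    "bip324_aead_encrypt_batch": "PUBLIC",
    "bip324_aead_decrypt_batch": "SECRET",
}

# Every cell in the table reads "implemented", so the status grid is uniform.
STATUS = {op: {backend: "implemented" for backend in BACKENDS} for op in OPS}


def parity_totals(status):
    """Return {backend: 'implemented/total'} for a status grid."""
    total = len(status)
    return {
        backend: "{}/{}".format(
            sum(1 for row in status.values() if row[backend] == "implemented"),
            total,
        )
        for backend in BACKENDS
    }


def secret_ops(ops):
    """Return the ops whose data class is SECRET, in table order."""
    return [name for name, data_class in ops.items() if data_class == "SECRET"]


if __name__ == "__main__":
    print(parity_totals(STATUS))
    print(secret_ops(OPS))
```

Keeping the data class next to each op name makes it easy to single out the two SECRET-class operations for separate handling in reviews.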

Expansion Roadmap

Unified 13/13 GPU C ABI parity is closed across CUDA, OpenCL, and Metal.
The five ZK/BIP-324 batch ops are implemented on all three backends and exposed through the stable C ABI.
Remaining GPU governance work is no longer backend parity; it is hardware-backed publishability, artifact retention, and cross-device reproducibility.

C ABI Test Coverage

Test	Scope	Guard
`gpu_abi_gate`	ABI surface, error codes, discovery, lifecycle, NULL handling	GPU host + ufsecp
`gpu_ops_equivalence`	GPU vs CPU reference for the stable public op surface; unsupported paths remain negative-test coverage	GPU host + ufsecp
`gpu_host_api_negative`	NULL ptrs, count=0, invalid backend/device, error strings	GPU host + ufsecp
`gpu_backend_matrix`	Backend enumeration, device info, per-backend op probing	GPU host + ufsecp
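
The GPU-versus-CPU equivalence idea behind `gpu_ops_equivalence` can be sketched as follows. This is a hypothetical harness, not the repository's test: `GpuError`, `check_equivalence`, and the SHA-256 stand-in ops are assumptions made for illustration, and only the error code value 104 comes from this document.

```python
import hashlib

UFSECP_ERR_GPU_UNSUPPORTED = 104  # value stated in this document


class GpuError(Exception):
    """Hypothetical wrapper for a non-zero status from a GPU batch call."""

    def __init__(self, code):
        super().__init__("GPU call failed with code {}".format(code))
        self.code = code


def check_equivalence(gpu_batch_op, cpu_reference, inputs):
    """Compare a batch op against a per-item CPU reference.

    Returns "match", "mismatch", or "unsupported". Any error other than
    UFSECP_ERR_GPU_UNSUPPORTED is re-raised so it cannot be mistaken
    for a legitimately unsupported path.
    """
    try:
        gpu_out = list(gpu_batch_op(inputs))
    except GpuError as err:
        if err.code == UFSECP_ERR_GPU_UNSUPPORTED:
            return "unsupported"
        raise
    expected = [cpu_reference(item) for item in inputs]
    return "match" if gpu_out == expected else "mismatch"


# Stand-ins for illustration only: SHA-256 plays the role of the real op.
def cpu_reference(item):
    return hashlib.sha256(item).digest()


def fake_gpu_ok(items):
    return [hashlib.sha256(item).digest() for item in items]


def fake_gpu_unsupported(items):
    raise GpuError(UFSECP_ERR_GPU_UNSUPPORTED)


if __name__ == "__main__":
    print(check_equivalence(fake_gpu_ok, cpu_reference, [b"a", b"b"]))
    print(check_equivalence(fake_gpu_unsupported, cpu_reference, [b"a"]))
```

Reporting "unsupported" as its own outcome keeps negative-path coverage visible instead of folding it into pass or fail.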

CI and Local Verification

Environment	CUDA	OpenCL	Metal	Tests
Local (dev machine)	[OK] RTX 5060 Ti	[OK] RTX 5060 Ti	N/A (Linux)	All 49 tests pass including gpu_abi_gate, gpu_ops_equivalence, gpu_host_api_negative, gpu_backend_matrix
Local Docker parity	[OK]* host GPU passthrough	[OK]* host OpenCL/runtime passthrough	N/A (Linux)	Same Linux CI toolchain via `docker-compose.ci.yml` / `Dockerfile.local-ci`; GPU slices remain host-hardware dependent
GitHub Actions CI	N/A (no GPU runners)	N/A (no GPU runners)	[OK] macOS (lifecycle)	Metal discovery + lifecycle via macOS job

Note: GitHub Actions standard runners do not have NVIDIA GPUs or OpenCL devices, but that is not the only reproducible Linux path. The repository ships a containerized local CI stack in Dockerfile.local-ci, docker-compose.ci.yml, and docs/LOCAL_CI.md, so contributors can reproduce the same Linux toolchain in Docker on their own machines. GPU validation still requires the host GPU driver/runtime stack and appropriate device passthrough into the container or local process.

Summary

Backend	Correctness Tests	Unified Audit	Unified Bench	Host / Integration	Notes
CUDA	[OK]	[OK]	[OK]	[OK]	Strongest GPU validation path today
ROCm/HIP	[!] Planned / Source-Shared	[!] Planned / Source-Shared	[!] Planned / Source-Shared	[!] Planned / Source-Shared	Optional future backend expansion; current absence of AMD hardware does not invalidate existing audited backends
OpenCL	[OK]	[OK]	[OK]	[OK]	Good coverage, entry points are more distributed
Metal	[OK]	[OK]	[OK]	[OK]	Good coverage on Apple platforms

ROCm/HIP reuses the CUDA/HIP source tree and runners, but real AMD GPU validation is still pending.

That pending state is intentionally non-blocking for current audit validity. CPU, CUDA, OpenCL, and Metal assurance claims remain grounded in their own existing evidence paths; ROCm/HIP only becomes relevant once the project chooses to publish real AMD-backed claims.

This status is intentionally fail-closed in docs/GPU_BACKEND_EVIDENCE.json: ROCm/HIP remains non-publishable until AMD hardware-backed benchmark and audit artifacts exist.
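
A fail-closed rule of this kind might look like the sketch below. The JSON layout is an assumption: the real schema of docs/GPU_BACKEND_EVIDENCE.json is not shown in this document, so the `backends` key, the `status` field, and the sample values are hypothetical. Only the names `rocm-hip`, `validated`, `hardware_backed`, and `publishable` are taken from the text.

```python
import json

# Hypothetical sample; the real evidence file may be structured differently.
SAMPLE = json.loads("""
{
  "backends": {
    "cuda":     {"status": "validated", "hardware_backed": true,  "publishable": true},
    "rocm-hip": {"status": "planned",   "hardware_backed": false, "publishable": false}
  }
}
""")


def is_publishable(evidence, backend):
    """Fail closed: anything missing or ambiguous counts as not publishable."""
    entry = evidence.get("backends", {}).get(backend, {})
    return (
        entry.get("publishable") is True
        and entry.get("status") == "validated"
        and entry.get("hardware_backed") is True
    )


def publishability_violations(evidence):
    """List backends flagged publishable without validated, hardware-backed evidence."""
    violations = []
    for name, entry in evidence.get("backends", {}).items():
        if entry.get("publishable") is True and not is_publishable(evidence, name):
            violations.append(name)
    return sorted(violations)


if __name__ == "__main__":
    print(is_publishable(SAMPLE, "rocm-hip"))
    print(publishability_violations(SAMPLE))
```

The strict `is True` comparisons are deliberate: a missing field or a truthy string such as "yes" does not unlock publication.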

CUDA / ROCm

Main Entry Points

Benchmark: gpu_bench_unified.cu
Audit runner: gpu_audit_runner.cu
Full test suite: test_suite.cu
CT smoke: test_ct_smoke.cu
Specialized benches:

Coverage

Area	Status	Notes
Field arithmetic	[OK]	Included in selftest + audit runner + unified bench
Scalar arithmetic	[OK]	Included in unified bench and audit runner
Point operations	[OK]	Add/double/kG/kP covered
ECDSA	[OK]	Sign/verify in bench + audit
Schnorr	[OK]	Sign/verify in bench + audit
ECDH	[OK]	Present in audit runner
Recovery	[OK]	Present in audit runner
Batch verify	[OK]	Included in audit runner
BIP32	[OK]	Present in audit runner
CT GPU path	[OK]	Bench + CT smoke present
Real workload benches	[OK]	BIP-352 and ZK present

Current Strength

CUDA is the most unified backend today. If someone asks, "Which GPU backend has the cleanest validation story?" the answer is CUDA.

For Linux reproducibility, that story is not limited to the original developer workstation: the same local CI environment can be recreated through the repo's Docker-based local parity infrastructure, then paired with host GPU access for CUDA/OpenCL runs.

Remaining Engineering Gaps

ROCm/HIP should not be treated as validated until tested on real AMD hardware.
Keep cross-device reproducibility artifacts organized by GPU model and driver version.
Keep backend-specific regression logs together with benchmark JSON/TXT artifacts.
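
One way to keep artifacts grouped by GPU model and driver version is a deterministic path scheme. The layout below (`artifacts/<backend>/<gpu-model>/<driver-version>/<kind>.<ext>`) is a hypothetical convention, not the repository's actual layout, and the driver version in the usage line is a placeholder.

```python
import re
from pathlib import PurePosixPath

ALLOWED_KINDS = ("bench", "audit", "regression")  # assumed artifact kinds
ALLOWED_EXTS = ("json", "txt", "log")             # assumed file formats


def slug(text):
    """Lowercase and collapse runs outside [a-z0-9.] into single dashes."""
    return re.sub(r"[^a-z0-9.]+", "-", text.lower()).strip("-")


def artifact_path(backend, gpu_model, driver_version, kind, ext):
    """Build a path that keeps logs and reports for one device/driver together."""
    if kind not in ALLOWED_KINDS:
        raise ValueError("unknown artifact kind: {!r}".format(kind))
    if ext not in ALLOWED_EXTS:
        raise ValueError("unknown artifact extension: {!r}".format(ext))
    return PurePosixPath(
        "artifacts",
        slug(backend),
        slug(gpu_model),
        slug(driver_version),
        "{}.{}".format(kind, ext),
    )


if __name__ == "__main__":
    # "0.0.0" is a placeholder driver version, not a recorded value.
    print(artifact_path("CUDA", "RTX 5060 Ti", "0.0.0", "bench", "json"))
```

Because the directory is derived only from backend, device, and driver, the regression log and the benchmark JSON/TXT for one run land side by side.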

OpenCL

Main Entry Points

Audit runner: opencl_audit_runner.cpp
Selftest: opencl_selftest.cpp
Extended test + bench: opencl_extended_test.cpp
Basic test harness: test_opencl.cpp
Benchmark app: bench_opencl.cpp

Coverage

Area	Status	Notes
Field arithmetic	[OK]	Covered by selftest + extended test
Point operations	[OK]	Covered by selftest + extended test
Scalar / hash / ECDSA / Schnorr / ECDH / recovery / MSM	[OK]	Covered via extended kernel set + host test
Audit report generation	[OK]	`opencl_audit_runner` exists
Benchmark coverage	[OK]	`opencl_benchmark` + extended test bench mode
Host integration	[OK]	Dedicated host-side extended test

Current Strength

OpenCL has broad native validation coverage already and now matches CUDA and Metal on the stable public GPU C ABI surface.

The repo's local Docker parity environment also helps standardize the Linux-side toolchain for OpenCL validation; the remaining variable is the host OpenCL ICD/device stack, not the surrounding build/test container.

Remaining Engineering Gaps

Entry points are more fragmented than CUDA.
A single "OpenCL unified benchmark" story should stay easy to discover in docs.
Cross-vendor reports should be organized clearly: NVIDIA OpenCL, AMD OpenCL, Intel OpenCL.

Metal

Main Entry Points

Audit runner: metal_audit_runner.mm
Extended test + bench: metal_extended_test.mm
Host test: test_metal_host.cpp
App bench/test: metal_test.mm
Metal bench app: bench_metal.mm

Coverage

Area	Status	Notes
Field arithmetic	[OK]	Covered in tests and app bench
Point operations	[OK]	Covered in tests and app bench
Extended crypto ops	[OK]	Covered by extended test
Audit report generation	[OK]	`metal_audit_runner` exists
Benchmark coverage	[OK]	Bench mode and app bench exist
Host integration	[OK]	Dedicated host test present

Current Strength

Metal has a reasonably complete validation stack and is already beyond "demo backend" level.

Remaining Engineering Gaps

Keep Apple GPU model coverage explicit in benchmark docs.
Keep shader/library build steps easy to reproduce from CI and local machines.

Recommended Backend Checklist

Use this checklist before calling a GPU backend "fully validated" for a release candidate:

Backend selftest passes
Backend audit runner passes
Unified benchmark runs and emits report
Host-side integration test passes
One real-device benchmark artifact is saved
One real-device audit artifact is saved
Driver/toolkit version is recorded
JSON + TXT reports are archived
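
The checklist can be tracked as data so that a release script reports exactly which items are still open. This is a minimal sketch; the `BackendChecklist` class and its field names are illustrative paraphrases of the eight items above and do not exist in the repository.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BackendChecklist:
    """One boolean per checklist item above; everything defaults to not done."""

    selftest_passes: bool = False
    audit_runner_passes: bool = False
    unified_bench_emits_report: bool = False
    host_integration_passes: bool = False
    device_bench_artifact_saved: bool = False
    device_audit_artifact_saved: bool = False
    driver_toolkit_version_recorded: bool = False
    json_txt_reports_archived: bool = False


def missing_items(checklist):
    """Return the names of unchecked items, in checklist order."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def fully_validated(checklist):
    """A backend counts as fully validated only when nothing is missing."""
    return not missing_items(checklist)


if __name__ == "__main__":
    partial = BackendChecklist(selftest_passes=True, audit_runner_passes=True)
    print(fully_validated(partial))
    print(missing_items(partial))
```

Defaulting every field to `False` means a newly added backend starts as unvalidated until each item is explicitly recorded.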

ROCm/HIP Promotion Checklist

ROCm/HIP is source-shared with the CUDA/HIP portability layer, but source compatibility alone does not promote it. Promotion from `planned` to `validated` with `publishable: true` requires all of the following on real AMD hardware:

docs/GPU_BACKEND_EVIDENCE.json is updated so that `rocm-hip` becomes `validated`, `hardware_backed: true`, and `publishable: true` in the same change.
Benchmark artifacts exist in both JSON and text form for the recorded AMD device.
Audit runner output exists for the same hardware and is reproducible from the recorded command path.
Driver/runtime metadata is archived, including ROCm driver version and AMD device model.
Any published numbers identify the exact device class and do not reuse CUDA labels or NVIDIA-only evidence.
python3 scripts/preflight.py --gpu-evidence and python3 scripts/validate_assurance.py pass after the evidence update.

Until those conditions are met, ROCm/HIP remains deliberately fail-closed for public benchmark and validation claims. This is a containment rule for future AMD-specific claims, not a defect in the validity of the current audit and validation surfaces.

This checklist is enforceable through:

python3 scripts/preflight.py --gpu-evidence
python3 scripts/validate_assurance.py

Practical Reading

If the goal is day-to-day engineering confidence:

Start with CUDA as the reference GPU backend.
Treat OpenCL and Metal as validated but separately operationalized backends.
Treat ROCm/HIP as source-compatible with CUDA, but not promotion-eligible until the AMD hardware-backed checklist above is complete.
