fix: Metal device validation + GPU audit presets + docs + examples (#146)
* fix: CUDA RIPEMD160 r2 table + ECDH y-parity + GPU-side conversion
- hash160.cuh: fix transposed r2[46..47] (was 13,4 -> correct 4,13)
- ecdh.cuh: derive the 02/03 prefix from y-parity so SHA-256(prefix||x) matches CPU ecdh_compute
- gpu_backend_cuda.cu: GPU-side Jacobian<->compressed conversion
  via batch_jac_to_compressed_kernel and batch_compressed_to_jac_kernel;
  fix bytes_to_scalar/bytes_to_field byte order;
  add msm_reduce_and_compress_kernel for GPU-side MSM accumulation;
  remove dead host-side FieldElement code
- gpu/CMakeLists.txt: remove gpu_backend_cuda_host.cpp
- bindings/rust: add hex dev-dep, fix abi_version() call in smoke test
- bindings/nodejs: add koffi-based smoke test (Node 22 compatible)
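
For reference, the Jacobian-to-compressed conversion and the y-parity prefix used in the ECDH hash can be sketched on the host as follows. This is a minimal illustration, not the kernel code: the function names `jac_to_compressed` and `ecdh_hash` are hypothetical, and the sketch assumes standard secp256k1 Jacobian coordinates (x = X/Z^2, y = Y/Z^3) and does not handle the point at infinity.

```python
import hashlib

# secp256k1 field prime and generator coordinates (public constants).
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def jac_to_compressed(X: int, Y: int, Z: int) -> bytes:
    """Convert a Jacobian point to its 33-byte compressed encoding.

    Hypothetical helper; Z == 0 (point at infinity) is not handled.
    """
    z_inv = pow(Z, -1, P)            # modular inverse, Python 3.8+
    z_inv2 = z_inv * z_inv % P
    x = X * z_inv2 % P               # x = X / Z^2
    y = Y * z_inv2 * z_inv % P       # y = Y / Z^3
    prefix = b"\x03" if y & 1 else b"\x02"   # prefix encodes y-parity
    return prefix + x.to_bytes(32, "big")


def ecdh_hash(X: int, Y: int, Z: int) -> bytes:
    """SHA-256 over prefix||x, the form the commit says ecdh_compute uses."""
    return hashlib.sha256(jac_to_compressed(X, Y, Z)).digest()


if __name__ == "__main__":
    # The same affine point with opposite y gives a different prefix and hash.
    print(ecdh_hash(GX, GY, 1).hex())
    print(ecdh_hash(GX, P - GY, 1).hex())
```

Hard-coding the prefix to 02 would produce the right digest only for points with even y, which is the kind of mismatch the ecdh.cuh fix addresses.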
GPU test results: 154 passed, 0 failed
  gpu_abi_gate: 44/44
  gpu_ops_equivalence: 55/55
  gpu_host_api_negative: 55/55
Binding tests:
  Rust (cargo test): 13/13
  Node.js (koffi): 12/12
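
A note on why the hash160.cuh transposition needed an equivalence test to surface. The sketch below assumes `r2` is the RIPEMD-160 right-line message word selection table (r' in the specification); the names `R_RIGHT` and `rounds_are_permutations` are illustrative only. It shows that swapping entries 46 and 47 still leaves every 16-entry round a valid permutation of 0..15, so a structural check on the table cannot detect the bug.

```python
# RIPEMD-160 right-line word selection r'(j), j = 0..79, per the specification.
R_RIGHT = [
    5, 14, 7, 0, 9, 2, 11, 4, 13, 6, 15, 8, 1, 10, 3, 12,
    6, 11, 3, 7, 0, 13, 5, 10, 14, 15, 8, 12, 4, 9, 1, 2,
    15, 5, 1, 3, 7, 14, 6, 9, 11, 8, 12, 2, 10, 0, 4, 13,
    8, 6, 4, 1, 3, 11, 15, 0, 5, 12, 2, 13, 9, 7, 10, 14,
    12, 15, 10, 4, 1, 5, 8, 7, 6, 2, 13, 14, 0, 3, 9, 11,
]


def rounds_are_permutations(table):
    """True if each 16-entry round uses every message word exactly once."""
    return all(
        sorted(table[i:i + 16]) == list(range(16))
        for i in range(0, 80, 16)
    )


# Reproduce the transposition described in the commit (13,4 instead of 4,13).
buggy = list(R_RIGHT)
buggy[46], buggy[47] = buggy[47], buggy[46]

if __name__ == "__main__":
    print(R_RIGHT[46:48], rounds_are_permutations(R_RIGHT))  # correct table
    print(buggy[46:48], rounds_are_permutations(buggy))      # still "valid"
```

Both tables pass the permutation check, so only a comparison of digests against a reference implementation distinguishes them.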
* examples: add 6-language example suite (C, Python, Rust, Node.js, Go, Java)
Comprehensive CPU + GPU examples for all supported binding languages:
- C: Direct C ABI calls, 16 demo sections (CPU + GPU)
- Python: ctypes + Ufsecp wrapper, 14 sections (CPU + GPU)
- Rust: Safe ufsecp crate wrapper, 9 sections (CPU only)
- Node.js: koffi FFI, 12 sections (CPU + GPU)
- Go: Pure cgo, 14 sections (CPU + GPU)
- Java: JNA FFI, 14 sections (CPU + GPU)
Each example covers: key generation, ECDSA (sign/verify/recover/DER),
Schnorr (BIP-340), ECDH, hashing (SHA-256, Hash160), Bitcoin addresses
(P2PKH, P2WPKH, P2TR), WIF encoding, BIP-32 HD derivation, and Taproot.
GPU examples demonstrate: backend discovery, batch key generation,
batch ECDSA verify, batch Hash160, and multi-scalar multiplication.
All 6 examples tested and verified against RTX 5060 Ti (CUDA + OpenCL).
* examples: add GPU+Pedersen to all 6 languages, expand README
- Rust: add GPU sections [10]-[15] + Pedersen, add GPU+Pedersen FFI to ufsecp-sys
- Node.js: add BIP-32, Taproot, Pedersen sections [8]-[10], renumber GPU [11]-[15]
- Python: add Pedersen section [10] via direct ctypes
- Go: add Pedersen section [10] via cgo
- Java: add Pedersen JNA declarations + section [10]
- Rust safe wrapper: add Context::as_ptr() for GPU FFI access
- examples/README.md: comprehensive rewrite with all 6 languages, build/run
  instructions, feature coverage matrix, embedded platforms, troubleshooting
- README.md: add Highlights, Performance, Architecture stack diagram,
  Hardware Compatibility table (16 platforms), Embedded Targets, Examples
  index, Use Cases section; expand Architecture with bindings + source tree
All 6 examples tested: C 16/16, Rust 15/15, Python 15/15, Node.js 15/15,
Go 15/15, Java 15/15 -- all sections pass including GPU operations.
* docs: fix GPU backend maturity labels, add docs links, clean repo hygiene
- .gitignore: add rules for example build artifacts (binaries, node_modules,
  target/, Cargo.lock, package-lock.json, .class files)
- README.md: fix GPU overclaims -- OpenCL is partial (4/6 ops), Metal is
  experimental (discovery only); add GPU API, Validation Matrix, Feature
  Maturity, Supported Guarantees, Examples to Documentation table
- FEATURE_MATURITY.md: fix contradictions -- ECDSA/Schnorr verify GPU column
  corrected from 'all 3' to 'CUDA' (OpenCL UNSUPPORTED per ufsecp_gpu.h);
  BIP-32 HD GPU corrected from 'all 3' to '-' (no GPU C ABI path)
- GPU_VALIDATION_MATRIX.md: add CI and Local Verification table documenting
  that CUDA/OpenCL tests pass locally (RTX 5060 Ti), GH Actions lacks GPU
  runners; all 4 GPU C ABI tests (gpu_abi_gate, gpu_ops_equivalence,
  gpu_host_api_negative, gpu_backend_matrix) confirmed in ctest matrix (49
  total) and pass
* fix: Metal device index validation + canonical GPU audit presets
- Metal backend: reject out-of-range device_index in init() before
  creating MetalRuntime (fixes gpu_host_api_negative on macOS CI)
- CMakePresets.json: add cuda-audit, cuda-audit-5060ti configure/build
  presets and testPresets for reproducible 49-test GPU verification
- docs/BUILDING.md: document canonical GPU audit build path
- docs/LOCAL_CI.md: add GPU proof-path quick-reference
- docs/README.md: update docs index entry
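
The Metal fix follows a validate-before-allocate pattern: check device_index against the discovered device count and return an error before any runtime object is created. The sketch below is a language-neutral illustration in Python; `Backend`, `ERR_OK`, and `ERR_BAD_DEVICE` are hypothetical names and do not reflect the actual backend classes or error codes.

```python
ERR_OK = 0          # hypothetical status codes, for illustration only
ERR_BAD_DEVICE = 1


class Backend:
    """Toy stand-in for a GPU backend with a fixed number of devices."""

    def __init__(self, device_count: int):
        self.device_count = device_count
        self.runtime = None  # created only after validation succeeds

    def init(self, device_index: int) -> int:
        # Reject out-of-range indices first, so a failed init leaves
        # the backend untouched and no runtime is ever constructed.
        if not 0 <= device_index < self.device_count:
            return ERR_BAD_DEVICE
        self.runtime = {"device": device_index}
        return ERR_OK


if __name__ == "__main__":
    backend = Backend(device_count=1)
    print(backend.init(5), backend.runtime)  # rejected, runtime stays None
    print(backend.init(0), backend.runtime)  # accepted
```

Ordering the check ahead of runtime creation is what lets a negative test pass an invalid index and observe a clean error rather than a partially initialized backend.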
---------
Co-authored-by: shrec <shrec@users.noreply.github.com>