Commit Graph

10 Commits

Author SHA1 Message Date
shrec
8af7320c60
Harden audit and fix Windows CUDA build
2026-03-25 14:36:36 +00:00
shrec
461ecca3c1
docs: add 5 ZK/BIP-324 GPU batch ops to all docs
Update 8 documentation files to reflect the new GPU C ABI surface:
- CHANGELOG: 18→23 gpu functions, added entry for GPU ZK+BIP-324 C ABI
- README: 8 ops → 13 ops (8 core + 5 CUDA-first)
- BACKEND_ASSURANCE_MATRIX: new feature rows + parity tracking stubs
- BACKEND_PARITY: new GPU batch ops section, 3.4.0 note
- GPU_VALIDATION_MATRIX: 8→13 ops, CUDA 13/13, OCL/Metal 8/13
- API_REFERENCE: 5 new C ABI function signatures
- FEATURE_ASSURANCE_LEDGER: 18→23 gpu functions, 5 new batch op rows
- FEATURE_MATURITY: 18→23 gpu functions count
2026-03-24 16:43:52 +00:00
shrec
3a86fcef1e
Harden ABI and finish bindings validation
2026-03-23 02:30:44 +00:00
shrec
852255bf72
docs: sync GPU ABI and batch signing docs
2026-03-22 16:57:51 +00:00
shrec
c7914410db
Merge remote-tracking branch 'origin/fix/metal-validation-gpu-presets' into dev
2026-03-21 14:15:44 +00:00
shrec
fe64eb2744
infra: self-hosted GPU CI runner + workflow + release evidence
- Add gpu-selfhosted.yml: CUDA CI on parking-gpu (RTX 5060 Ti)
  - Triggers: push/PR dev+main, nightly 04:00 UTC, manual dispatch
  - GPU C ABI tests, unified audit, benchmarks, backend matrix
  - Environment manifest + per-commit proof summary artifacts
  - 90-day artifact retention
- Update release.yml: GPU evidence bundle in releases
- Add INFRA_AMPLIFICATION_TODO.md: completed P0-P4 tracking
- Fix audit/CMakeLists.txt: prefer ufsecp_shared for GPU test linking
- Update GPU_VALIDATION_MATRIX.md, README, docs, ufsecp_gpu.h
2026-03-15 23:16:17 +00:00
Vano Chkheidze
d16f254f8f
fix: Metal device validation + GPU audit presets + docs + examples (#146)
* fix: CUDA RIPEMD160 r2 table + ECDH y-parity + GPU-side conversion

- hash160.cuh: fix transposed r2[46..47] (was 13,4 -> correct 4,13)
- ecdh.cuh: compute y-parity for SHA-256(02/03||x) to match CPU ecdh_compute
- gpu_backend_cuda.cu: GPU-side Jacobian<->compressed conversion
  via batch_jac_to_compressed_kernel and batch_compressed_to_jac_kernel;
  fix bytes_to_scalar/bytes_to_field byte order;
  add msm_reduce_and_compress_kernel for GPU-side MSM accumulation;
  remove dead host-side FieldElement code
- gpu/CMakeLists.txt: remove gpu_backend_cuda_host.cpp
- bindings/rust: add hex dev-dep, fix abi_version() call in smoke test
- bindings/nodejs: add koffi-based smoke test (Node 22 compatible)
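
The r2 fix above concerns the right-line message word order of RIPEMD-160. As a minimal sketch, the table below lists that order as given in the published RIPEMD-160 specification and shows why a swapped pair at positions 46 and 47 is easy to miss. The flat, zero-based 80-entry layout and the name `R2` are assumptions chosen to match the commit's `r2[46..47]` notation; the actual contents of hash160.cuh are not shown in this log.

```python
# Right-line message word order for RIPEMD-160: five rounds of 16 entries.
# Values follow the published specification. The flat zero-based layout and
# the name R2 are assumptions made to mirror the commit's r2[46..47].
R2 = [
    5, 14, 7, 0, 9, 2, 11, 4, 13, 6, 15, 8, 1, 10, 3, 12,
    6, 11, 3, 7, 0, 13, 5, 10, 14, 15, 8, 12, 4, 9, 1, 2,
    15, 5, 1, 3, 7, 14, 6, 9, 11, 8, 12, 2, 10, 0, 4, 13,
    8, 6, 4, 1, 3, 11, 15, 0, 5, 12, 2, 13, 9, 7, 10, 14,
    12, 15, 10, 4, 1, 5, 8, 7, 6, 2, 13, 14, 0, 3, 9, 11,
]


def rounds_are_permutations(table):
    """Check that each 16-entry round uses every message word exactly once."""
    return all(
        sorted(table[i:i + 16]) == list(range(16))
        for i in range(0, len(table), 16)
    )


def with_swapped_46_47(table):
    """Return a copy with entries 46 and 47 exchanged (the transposed form)."""
    bad = list(table)
    bad[46], bad[47] = bad[47], bad[46]
    return bad
```

A structural check such as `rounds_are_permutations` accepts both the correct table and the transposed one, so a swap of this kind is only caught by comparing digests against a reference implementation or known test vectors.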

GPU test results: 154 passed, 0 failed
  gpu_abi_gate:          44/44
  gpu_ops_equivalence:   55/55
  gpu_host_api_negative: 55/55

Binding tests:
  Rust (cargo test):     13/13
  Node.js (koffi):       12/12

* examples: add 6-language example suite (C, Python, Rust, Node.js, Go, Java)

Comprehensive CPU + GPU examples for all supported binding languages:

- C:       Direct C ABI calls, 16 demo sections (CPU + GPU)
- Python:  ctypes + Ufsecp wrapper, 14 sections (CPU + GPU)
- Rust:    Safe ufsecp crate wrapper, 9 sections (CPU only)
- Node.js: koffi FFI, 12 sections (CPU + GPU)
- Go:      Pure cgo, 14 sections (CPU + GPU)
- Java:    JNA FFI, 14 sections (CPU + GPU)

Each example covers: key generation, ECDSA (sign/verify/recover/DER),
Schnorr (BIP-340), ECDH, hashing (SHA-256, Hash160), Bitcoin addresses
(P2PKH, P2WPKH, P2TR), WIF encoding, BIP-32 HD derivation, and Taproot.

GPU examples demonstrate: backend discovery, batch key generation,
batch ECDSA verify, batch Hash160, and multi-scalar multiplication.

All 6 examples tested and verified against RTX 5060 Ti (CUDA + OpenCL).

* examples: add GPU+Pedersen to all 6 languages, expand README

- Rust: add GPU sections [10]-[15] + Pedersen, add GPU+Pedersen FFI to ufsecp-sys
- Node.js: add BIP-32, Taproot, Pedersen sections [8]-[10], renumber GPU [11]-[15]
- Python: add Pedersen section [10] via direct ctypes
- Go: add Pedersen section [10] via cgo
- Java: add Pedersen JNA declarations + section [10]
- Rust safe wrapper: add Context::as_ptr() for GPU FFI access
- examples/README.md: comprehensive rewrite with all 6 languages, build/run
  instructions, feature coverage matrix, embedded platforms, troubleshooting
- README.md: add Highlights, Performance, Architecture stack diagram,
  Hardware Compatibility table (16 platforms), Embedded Targets, Examples
  index, Use Cases section; expand Architecture with bindings + source tree

All 6 examples tested: C 16/16, Rust 15/15, Python 15/15, Node.js 15/15,
Go 15/15, Java 15/15 -- all sections pass including GPU operations.

* docs: fix GPU backend maturity labels, add docs links, clean repo hygiene

- .gitignore: add rules for example build artifacts (binaries, node_modules,
  target/, Cargo.lock, package-lock.json, .class files)
- README.md: fix GPU overclaims -- OpenCL is partial (4/6 ops), Metal is
  experimental (discovery only); add GPU API, Validation Matrix, Feature
  Maturity, Supported Guarantees, Examples to Documentation table
- FEATURE_MATURITY.md: fix contradictions -- ECDSA/Schnorr verify GPU column
  corrected from 'all 3' to 'CUDA' (OpenCL UNSUPPORTED per ufsecp_gpu.h);
  BIP-32 HD GPU corrected from 'all 3' to '-' (no GPU C ABI path)
- GPU_VALIDATION_MATRIX.md: add CI and Local Verification table documenting
  that CUDA/OpenCL tests pass locally (RTX 5060 Ti), GH Actions lacks GPU
  runners; all 4 GPU C ABI tests (gpu_abi_gate, gpu_ops_equivalence,
  gpu_host_api_negative, gpu_backend_matrix) confirmed in ctest matrix (49
  total) and pass

* fix: Metal device index validation + canonical GPU audit presets

- Metal backend: reject out-of-range device_index in init() before
  creating MetalRuntime (fixes gpu_host_api_negative on macOS CI)
- CMakePresets.json: add cuda-audit, cuda-audit-5060ti configure/build
  presets and testPresets for reproducible 49-test GPU verification
- docs/BUILDING.md: document canonical GPU audit build path
- docs/LOCAL_CI.md: add GPU proof-path quick-reference
- docs/README.md: update docs index entry
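
The Metal fix above rejects an out-of-range device index before any runtime object is created. A minimal sketch of that ordering follows. The status codes, the `Runtime` stand-in, and `init_backend` are hypothetical names for illustration and are not the backend's real API.

```python
# Hypothetical status codes; the backend's real error enum is not shown here.
OK, ERR_BAD_DEVICE = 0, 1


class Runtime:
    """Stand-in for an expensive device runtime (hypothetical)."""

    created = 0  # counts constructions so the ordering can be observed

    def __init__(self, device_index):
        Runtime.created += 1
        self.device_index = device_index


def init_backend(device_index, device_count):
    """Validate the index first, so a bad request never builds a runtime."""
    if not (0 <= device_index < device_count):
        return ERR_BAD_DEVICE, None
    return OK, Runtime(device_index)
```

With one device present, `init_backend(3, 1)` returns the error code and leaves `Runtime.created` at zero, which is the property a negative test would check.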

---------

Co-authored-by: shrec <shrec@users.noreply.github.com>
2026-03-16 02:42:54 +04:00
shrec
6d92a882f3
docs: fix GPU backend maturity labels, add docs links, clean repo hygiene
- .gitignore: add rules for example build artifacts (binaries, node_modules,
  target/, Cargo.lock, package-lock.json, .class files)
- README.md: fix GPU overclaims -- OpenCL is partial (4/6 ops), Metal is
  experimental (discovery only); add GPU API, Validation Matrix, Feature
  Maturity, Supported Guarantees, Examples to Documentation table
- FEATURE_MATURITY.md: fix contradictions -- ECDSA/Schnorr verify GPU column
  corrected from 'all 3' to 'CUDA' (OpenCL UNSUPPORTED per ufsecp_gpu.h);
  BIP-32 HD GPU corrected from 'all 3' to '-' (no GPU C ABI path)
- GPU_VALIDATION_MATRIX.md: add CI and Local Verification table documenting
  that CUDA/OpenCL tests pass locally (RTX 5060 Ti), GH Actions lacks GPU
  runners; all 4 GPU C ABI tests (gpu_abi_gate, gpu_ops_equivalence,
  gpu_host_api_negative, gpu_backend_matrix) confirmed in ctest matrix (49
  total) and pass
2026-03-15 21:15:55 +00:00
shrec
0afb34402d
gpu: complete P0-P2 GPU API TODO -- CUDA/OpenCL/Metal
CUDA completeness (P0):
- Implement ecdh_batch kernel (thin wrapper over cuda::ecdh_compute)
- Implement MSM: scatter k*P kernels + host-side accumulation
- Split compilation: extract FieldElement-dependent code to
  gpu_backend_cuda_host.cpp (host compiler, not nvcc) with POD-only
  header gpu_cuda_host_helpers.h
- Fix namespace: extern kernel decls in secp256k1::cuda:: (not gpu::)
- Add POSITION_INDEPENDENT_CODE for CUDA static lib (shared lib linking)
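
The ecdh_batch kernel wraps ecdh_compute, whose output is described elsewhere in this log as SHA-256(02/03||x). The sketch below shows that hashing step on the host, assuming the shared point is already available as affine integers `x` and `y`. The function name is hypothetical, and the scalar multiplication that produces the shared point is out of scope here.

```python
import hashlib


def ecdh_hash_compressed(x: int, y: int) -> bytes:
    """Hash the 33-byte compressed encoding of the shared point (x, y).

    The prefix byte is 0x02 when y is even and 0x03 when y is odd, so the
    y-parity must be computed correctly for two implementations to agree.
    """
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return hashlib.sha256(prefix + x.to_bytes(32, "big")).digest()


# Illustrative input only: the secp256k1 generator coordinates, used here as
# a known on-curve point, not as a real shared secret.
P = 2**256 - 2**32 - 977
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
```

Because the field prime is odd, negating `y` (taking `P - GY`) flips its parity, so `ecdh_hash_compressed(GX, GY)` and `ecdh_hash_compressed(GX, P - GY)` differ. A backend that hard-codes one prefix would disagree with the CPU path on roughly half of all inputs.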

Test infrastructure (P0):
- test_gpu_ops_equivalence.cpp: 6 equivalence tests (CPU vs GPU)
- test_gpu_host_api_negative.cpp: 8 groups, 55 negative checks
- test_gpu_backend_matrix.cpp: backend enum + per-backend op probing
- Wire 3 new test targets in audit/CMakeLists.txt

OpenCL expansion (P1):
- Implement ecdh_batch, hash160_pubkey_batch, msm (4/6 ops now)
- ECDSA/Schnorr verify remain UNSUPPORTED stubs

Error model hardening (P1):
- Reorder all backends: count=0 check before NULL check (no-op is valid)
- Unsupported stubs check is_ready() before returning
- MSM n=0 returns Ok, infinity returns Arith
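
The reordering above (count=0 check before NULL check) can be sketched as follows. The status codes and `batch_op` are hypothetical, and `None` stands in for a NULL pointer at the C ABI.

```python
# Hypothetical status codes; the library's real error enum is not shown here.
OK, ERR_NULL_ARG = 0, 1


def batch_op(items, count, out):
    """Validate arguments in the order the commit describes."""
    if count == 0:
        return OK  # an empty batch is a valid no-op, even with null buffers
    if items is None or out is None:
        return ERR_NULL_ARG
    for i in range(count):
        out[i] = items[i]  # placeholder for the real per-item work
    return OK
```

Under this order, a caller that passes an empty batch with null buffers gets `OK`, while the same null buffers with a non-zero count are still rejected.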

Metal scope clarity (P2):
- Mark Metal backend EXPERIMENTAL in header + ufsecp_gpu.h
- All Metal stubs check is_ready()

LTO/CUDA compatibility:
- Guard INTERFACE -flto propagation when CUDA is enabled
  (nvcc LTO v12 vs GCC LTO v14 version mismatch)
- Add IPO opt-out for GPU-linked test targets

Documentation:
- GPU_VALIDATION_MATRIX.md: feature maturity table
- TEST_MATRIX.md: 3 new GPU test targets
- AUDIT_TRACEABILITY.md: GPU coverage entries
- ufsecp_gpu.h: feature maturity section (CUDA 6/6, OpenCL 4/6, Metal 0/6)
- C example: include/ufsecp/examples/gpu_example.c
2026-03-15 18:24:12 +00:00
shrec
331beb01ac
security: comprehensive hardening + adversarial testing + debug invariants
- Parser strictness: FE::parse_bytes_strict at ABI boundary (batch verify,
  batch identify, adaptor verify)
- MuSig2 nonce reuse fix (CRITICAL): secnonce zeroed after partial_sign
- BIP-32 overflow fix: uint64_t bounds check on derivation index
- BIP-39 validate: null ctx guard added
- ECIES: HMAC covers ephemeral pubkey, OS CSPRNG, strict prefix validation
- Zeroization audit: secure_erase across 34 functions (ufsecp_impl, ecdh, ecies)
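
The BIP-32 item above mentions a bounds check on the derivation index. The log does not show the actual fix, so the following is only a sketch of what such a check can look like when parsing one path component. In C the digits would be accumulated in a wide type such as uint64_t and compared against the limit before narrowing; here Python's arbitrary-precision integers play that role. The function name is hypothetical.

```python
HARDENED = 0x80000000  # BIP-32 hardened-child offset (2**31)


def parse_child_index(component: str) -> int:
    """Parse one path component such as "0" or "44'" into a child number."""
    hardened = component.endswith(("'", "h"))
    digits = component[:-1] if hardened else component
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"invalid path component: {component!r}")
    value = int(digits)  # wide accumulator: cannot wrap around
    if value >= HARDENED:
        # Reject before the hardened bit is applied, so the result can
        # never silently wrap or collide with another index.
        raise ValueError(f"index out of range: {component!r}")
    return value | HARDENED if hardened else value
```

For example, `parse_child_index("44'")` yields `0x8000002C`, while `"2147483648"` and `"4294967296'"` are rejected instead of wrapping into a different child number.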

Testing:
- test_adversarial_protocol.cpp: 37 tests, 7 categories (MuSig2, FROST,
  Silent Payments, adaptor sigs, BIP-32, FFI hostile-caller), 150/150 pass
- CPU-GPU cross-verification: gen_mul, field_mul, ECDSA (GPU audit 45/45)
- Debug invariants deployed: 15 SECP_ASSERT_* insertions across point.cpp,
  ecdsa.cpp, schnorr.cpp, scalar.cpp (zero overhead in Release)

Documentation:
- docs/FEATURE_ASSURANCE_LEDGER.md: 96 API functions, 25 categories,
  8 coverage dimensions
- docs/GPU_VALIDATION_MATRIX.md: CUDA/OpenCL/Metal validation checklist
- docs/ENGINEERING_WORKDOC.md

All tests: 42/42 CTest PASS
2026-03-13 23:52:25 +00:00