UltrafastSecp256k1

shrec 6d0703d65c		2026-03-21 23:06:23 +00:00
bench(cuda): BENCH_MULTI=20 in full benchmark loops — matches autotuner throughput

Full-benchmark GLV+LUT loops previously fired 1 kernel per CudaTimer interval (single-dispatch overhead ~4.5 ns). Autotuner already used sample_repeats=20. Now both measure the same way:

Before: GPU+LUT 95.2 ns / 10.50 M/s (1 kernel/timer)
After:  GPU+LUT 90.6 ns / 11.04 M/s (20 kernels/timer, divided by 20)

Update OpenCL CUDA reference: 95.2 → 90.6 ns.
bench_bip352_opencl.cpp	bench(cuda): BENCH_MULTI=20 in full benchmark loops — matches autotuner throughput	2026-03-21 23:06:23 +00:00
bench_opencl.cpp	Merge remote-tracking branch 'origin/release/v3.22.0' into dev	2026-03-21 14:14:49 +00:00
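
The measurement change described in the latest commit (several launches per timer interval, with the elapsed time divided by the launch count) can be illustrated with a host-only sketch. This is not the repository's code: `time_per_call_ns` and `throughput_mops` are hypothetical names, and `std::chrono::steady_clock` stands in for the CudaTimer mentioned in the commit message. The sketch only shows how a fixed per-interval overhead is spread across `batch` launches.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the repository): run `launch` `batch` times
// inside a single timer interval and return the mean nanoseconds per call.
// Any fixed cost of starting and stopping the timer is divided by `batch`.
template <typename F>
double time_per_call_ns(F&& launch, int batch) {
    assert(batch > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < batch; ++i) {
        launch();
    }
    auto stop = std::chrono::steady_clock::now();
    double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return total_ns / batch;
}

// Convert nanoseconds per operation to millions of operations per second:
// (1e9 ns/s / ns_per_op) / 1e6 = 1e3 / ns_per_op.
double throughput_mops(double ns_per_op) {
    return 1e3 / ns_per_op;
}
```

Calling `time_per_call_ns(fn, 1)` corresponds to the old one-launch-per-interval scheme and `time_per_call_ns(fn, 20)` to the batched one. The conversion agrees with the commit's figures: 95.2 ns corresponds to about 10.50 M/s and 90.6 ns to about 11.04 M/s.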