opencl+metal: wire bulletproof_verify_batch — close last parity gap
- opencl/kernels/secp256k1_zk.cl: remove #if 0 guard (881 lines of bulletproof code re-enabled); fix range_verify_full_impl address-space qualifiers: bp_G / bp_H now __global const AffinePoint*; per-iteration private copy (AffinePoint g_pt = bp_G[i]) before passing to scalar_mul_impl. - gpu/src/gpu_backend_opencl.cpp: replace Unsupported stub with real dispatch via range_proof_poly_batch kernel; add bp_poly_batch_ member + cleanup; update ensure_zk_kernels() to register the new kernel; parse 324-byte wire format (4x65-byte uncompressed + 2x32-byte scalars) into RangeProofPolyOCL GPU layout. - gpu/src/gpu_backend_metal.mm: replace Unsupported stub with real dispatch via range_proof_poly_batch kernel; build RangeProofPolyMetal (320B) from 324-byte wire format using be32_to_metal_fe / be32_to_metal_scalar helpers. - docs/BACKEND_ASSURANCE_MATRIX.md: bulletproof row stub->Y for OpenCL+Metal; parity tracking now shows zero remaining stubs. - CHANGELOG.md: document bulletproof parity closure. All three backends (CUDA, OpenCL, Metal) now implement bulletproof_verify_batch. Zero Unsupported stubs remain in the GPU backend surface.
This commit is contained in:
parent
d329026a01
commit
849c9df309
@ -13,8 +13,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
|
||||
- **ClusterFuzzLite expanded to 5 targets**: added `cpu/fuzz/fuzz_ecdsa.cpp` (ECDSA sign→verify invariant, wrong-msg false-positive check, parse_compact_strict robustness) and `cpu/fuzz/fuzz_schnorr.cpp` (BIP-340 sign→verify, adversarial from_bytes verify, wrong-msg check).
|
||||
|
||||
### GPU Backend
|
||||
- **OpenCL parity**: wired `zk_knowledge_verify_batch`, `zk_dleq_verify_batch`, `bip324_aead_encrypt_batch`, `bip324_aead_decrypt_batch` in `gpu_backend_opencl.cpp` — 4 new kernels matching CUDA surface. Closes assurance gap **#5/#6** (partial; bulletproof batch remains PARITY-EXCEPTION, documented in `BACKEND_ASSURANCE_MATRIX.md`).
|
||||
- **Metal parity**: wired `zk_knowledge_verify_batch`, `zk_dleq_verify_batch`, `bip324_aead_encrypt_batch`, `bip324_aead_decrypt_batch` in `gpu_backend_metal.mm` — all four Metal kernels were already present in `secp256k1_kernels.metal`; dispatch code now connected. Also fixed `zk_knowledge_verify_batch` Metal kernel: was incorrectly treating pubkey buffer as a scalar (scalar×G); corrected to `lift_x` to recover the full point from x-coordinate. Full CUDA ↔ OpenCL ↔ Metal parity on all ZK and BIP-324 batch ops. `bulletproof_verify_batch` remains PARITY-EXCEPTION on all GPU backends.
|
||||
- **Bulletproof parity (OpenCL + Metal)**: resolved the last remaining PARITY-EXCEPTION. OpenCL: removed `#if 0` guard in `secp256k1_zk.cl`; fixed `range_verify_full_impl` address-space qualifiers (`__global const AffinePoint*` for `bp_G`/`bp_H`, with per-iteration private copy before `scalar_mul_impl`); wired `bulletproof_verify_batch` host dispatch via `range_proof_poly_batch` kernel (matches CUDA poly-check behavior). Metal: wired `bulletproof_verify_batch` host dispatch via `range_proof_poly_batch` kernel; host converts 324-byte proof wire format to `RangeProofPolyMetal` GPU structs. Full CUDA ↔ OpenCL ↔ Metal parity — zero `Unsupported` stubs remaining.
|
||||
- **OpenCL parity**: wired `zk_knowledge_verify_batch`, `zk_dleq_verify_batch`, `bip324_aead_encrypt_batch`, `bip324_aead_decrypt_batch` in `gpu_backend_opencl.cpp` — 4 new kernels matching CUDA surface.
|
||||
- **Metal parity**: wired `zk_knowledge_verify_batch`, `zk_dleq_verify_batch`, `bip324_aead_encrypt_batch`, `bip324_aead_decrypt_batch` in `gpu_backend_metal.mm` — all four Metal kernels were already present in `secp256k1_kernels.metal`; dispatch code now connected. Also fixed `zk_knowledge_verify_batch` Metal kernel: was incorrectly treating pubkey buffer as a scalar (scalar×G); corrected to `lift_x` to recover the full point from x-coordinate.
|
||||
- **CUDA 13 compatibility**: replaced deprecated `cudaDeviceProp::clockRate` / `::memoryClockRate` fields (removed in CUDA 13) with `cudaDeviceGetAttribute(cudaDevAttrClockRate/MemoryClockRate)` under `#if CUDART_VERSION >= 13000` guard. Backward-compatible with CUDA 12. Reported by @craigraw compiling with CUDA 13 on RTX 5080.
|
||||
|
||||
|
||||
|
||||
@ -21,7 +21,7 @@
|
||||
| ZK proofs | Y | Y | Y | Y | Y |
|
||||
| ZK knowledge verify batch | - | - | Y | Y | Y |
|
||||
| ZK DLEQ verify batch | - | - | Y | Y | Y |
|
||||
| Bulletproof verify batch | - | - | Y | stub | stub |
|
||||
| Bulletproof verify batch | - | - | Y | Y | Y |
|
||||
| BIP-324 AEAD encrypt batch | - | - | Y | Y | Y |
|
||||
| BIP-324 AEAD decrypt batch | - | - | Y | Y | Y |
|
||||
| Multi-scalar mul | Y | - | Y | Y | Y |
|
||||
@ -44,11 +44,13 @@
|
||||
|
||||
| Operation | Backend | Tracking note |
|
||||
|-----------|---------|---------------|
|
||||
| `bulletproof_verify_batch` | OpenCL, Metal | CUDA implemented; OpenCL/Metal blocked by address-space qualifier issues in kernel — tracked as PARITY-EXCEPTION |
|
||||
| *(none — all parity gaps resolved)* | — | — |
|
||||
|
||||
> All other ZK and BIP-324 batch operations (`zk_knowledge_verify_batch`,
|
||||
> `zk_dleq_verify_batch`, `bip324_aead_encrypt_batch`, `bip324_aead_decrypt_batch`)
|
||||
> are now **fully implemented on all three GPU backends** (CUDA, OpenCL, Metal).
|
||||
> All ZK and BIP-324 batch operations, including `bulletproof_verify_batch`, are now
|
||||
> **fully implemented on all three GPU backends** (CUDA, OpenCL, Metal).
|
||||
> OpenCL kernel `#if 0` guard removed; address-space qualifier fix applied to
|
||||
> `range_verify_full_impl` (added `__global` to `bp_G`/`bp_H`, local copy in loop).
|
||||
> Metal host dispatch wired via `range_proof_poly_batch` kernel.
|
||||
> Resolved 2026-03-24.
|
||||
|
||||
### Current permanent exceptions
|
||||
|
||||
@ -840,12 +840,82 @@ public:
|
||||
return GpuError::Ok;
|
||||
}
|
||||
|
||||
// PARITY-EXCEPTION(Metal): bulletproof kernel blocked upstream; OpenCL also #if 0.
|
||||
// See docs/BACKEND_ASSURANCE_MATRIX.md row "bulletproof_verify_batch".
|
||||
GpuError bulletproof_verify_batch(
|
||||
const uint8_t*, const uint8_t*, const uint8_t*,
|
||||
size_t, uint8_t*) override
|
||||
{ return set_error(GpuError::Unsupported, "bulletproof: kernel pending address-space fixes"); }
|
||||
const uint8_t* proofs324, const uint8_t* commitments65,
|
||||
const uint8_t* H_generator65, size_t count,
|
||||
uint8_t* out_results) override
|
||||
{
|
||||
if (!is_ready()) return set_error(GpuError::Device, "context not initialised");
|
||||
if (count == 0) { clear_error(); return GpuError::Ok; }
|
||||
if (!proofs324 || !commitments65 || !H_generator65 || !out_results)
|
||||
return set_error(GpuError::NullArg, "NULL buffer");
|
||||
|
||||
auto err = ensure_library();
|
||||
if (err != GpuError::Ok) return err;
|
||||
|
||||
/* Convert big-endian 32 bytes → MetalFieldElem (same format as scalar). */
|
||||
auto be32_to_metal_fe = [](const uint8_t be[32]) -> MetalFieldElem {
|
||||
MetalFieldElem fe;
|
||||
for (int i = 0; i < 8; i++) {
|
||||
int base = (7 - i) * 4;
|
||||
fe.limbs[i] = ((uint32_t)be[base] << 24) |
|
||||
((uint32_t)be[base+1] << 16) |
|
||||
((uint32_t)be[base+2] << 8) |
|
||||
((uint32_t)be[base+3]);
|
||||
}
|
||||
return fe;
|
||||
};
|
||||
|
||||
/* Parse uncompressed point (65 bytes: 04 || x[32] || y[32]) → MetalAffinePoint */
|
||||
auto parse_pt65 = [&be32_to_metal_fe](const uint8_t pt65[65]) -> MetalAffinePoint {
|
||||
return { be32_to_metal_fe(pt65 + 1), be32_to_metal_fe(pt65 + 33) };
|
||||
};
|
||||
|
||||
/* Build GPU-layout RangeProofPolyGPU structs (320 bytes each):
|
||||
* 4 x MetalAffinePoint (A, S, T1, T2) + 2 x MetalScalar256 (tau_x, t_hat)
|
||||
* Wire format per proof (324 bytes): 4 x 65-byte uncompressed + 2 x 32-byte scalars */
|
||||
struct RangeProofPolyMetal {
|
||||
MetalAffinePoint A, S, T1, T2;
|
||||
MetalScalar256 tau_x, t_hat;
|
||||
};
|
||||
static_assert(sizeof(RangeProofPolyMetal) == 320, "struct layout mismatch");
|
||||
|
||||
auto buf_proofs = runtime_->alloc_buffer_shared(count * sizeof(RangeProofPolyMetal));
|
||||
auto* proofs_out = static_cast<RangeProofPolyMetal*>(buf_proofs.contents());
|
||||
for (size_t i = 0; i < count; ++i) {
|
||||
const uint8_t* p = proofs324 + i * 324;
|
||||
proofs_out[i].A = parse_pt65(p);
|
||||
proofs_out[i].S = parse_pt65(p + 65);
|
||||
proofs_out[i].T1 = parse_pt65(p + 130);
|
||||
proofs_out[i].T2 = parse_pt65(p + 195);
|
||||
proofs_out[i].tau_x = be32_to_metal_scalar(p + 260);
|
||||
proofs_out[i].t_hat = be32_to_metal_scalar(p + 292);
|
||||
}
|
||||
|
||||
auto buf_commits = runtime_->alloc_buffer_shared(count * sizeof(MetalAffinePoint));
|
||||
auto* commits_out = static_cast<MetalAffinePoint*>(buf_commits.contents());
|
||||
for (size_t i = 0; i < count; ++i)
|
||||
commits_out[i] = parse_pt65(commitments65 + i * 65);
|
||||
|
||||
auto buf_hgen = runtime_->alloc_buffer_shared(sizeof(MetalAffinePoint));
|
||||
*static_cast<MetalAffinePoint*>(buf_hgen.contents()) = parse_pt65(H_generator65);
|
||||
|
||||
auto buf_res = runtime_->alloc_buffer_shared(count * sizeof(uint32_t));
|
||||
uint32_t n32 = (uint32_t)count;
|
||||
auto buf_n = runtime_->alloc_buffer_shared(sizeof(uint32_t));
|
||||
std::memcpy(buf_n.contents(), &n32, sizeof(n32));
|
||||
|
||||
auto pipe = runtime_->make_pipeline("range_proof_poly_batch");
|
||||
runtime_->dispatch_sync(pipe, (uint32_t)count, 64u,
|
||||
{&buf_proofs, &buf_commits, &buf_hgen, &buf_res, &buf_n});
|
||||
|
||||
const auto* res = static_cast<const uint32_t*>(buf_res.contents());
|
||||
for (size_t i = 0; i < count; ++i)
|
||||
out_results[i] = res[i] ? 1 : 0;
|
||||
|
||||
clear_error();
|
||||
return GpuError::Ok;
|
||||
}
|
||||
|
||||
GpuError bip324_aead_encrypt_batch(
|
||||
const uint8_t* keys32, const uint8_t* nonces12,
|
||||
|
||||
@ -144,6 +144,7 @@ public:
|
||||
frost_init_attempted_ = false;
|
||||
if (zk_knowledge_verify_) { clReleaseKernel(zk_knowledge_verify_); zk_knowledge_verify_ = nullptr; }
|
||||
if (zk_dleq_verify_) { clReleaseKernel(zk_dleq_verify_); zk_dleq_verify_ = nullptr; }
|
||||
if (bp_poly_batch_) { clReleaseKernel(bp_poly_batch_); bp_poly_batch_ = nullptr; }
|
||||
if (zk_program_) { clReleaseProgram(zk_program_); zk_program_ = nullptr; }
|
||||
zk_init_attempted_ = false;
|
||||
if (bip324_aead_encrypt_) { clReleaseKernel(bip324_aead_encrypt_); bip324_aead_encrypt_ = nullptr; }
|
||||
@ -1063,13 +1064,106 @@ public:
|
||||
return GpuError::Ok;
|
||||
}
|
||||
|
||||
// PARITY-EXCEPTION(OpenCL): bulletproof kernel is #if 0 blocked in secp256k1_zk.cl
|
||||
// pending address-space qualifier fixes (__global/__private params).
|
||||
// See docs/BACKEND_ASSURANCE_MATRIX.md row "bulletproof_verify_batch".
|
||||
GpuError bulletproof_verify_batch(
|
||||
const uint8_t*, const uint8_t*, const uint8_t*,
|
||||
size_t, uint8_t*) override
|
||||
{ return set_error(GpuError::Unsupported, "bulletproof: OpenCL kernel pending address-space fixes"); }
|
||||
const uint8_t* proofs324, const uint8_t* commitments65,
|
||||
const uint8_t* H_generator65, size_t count,
|
||||
uint8_t* out_results) override
|
||||
{
|
||||
if (!is_ready()) return set_error(GpuError::Device, "context not initialised");
|
||||
if (count == 0) { clear_error(); return GpuError::Ok; }
|
||||
if (!proofs324 || !commitments65 || !H_generator65 || !out_results)
|
||||
return set_error(GpuError::NullArg, "NULL buffer");
|
||||
|
||||
auto err = ensure_zk_kernels();
|
||||
if (err != GpuError::Ok) return err;
|
||||
|
||||
auto* cl_ctx = static_cast<cl_context>(ctx_->native_context());
|
||||
auto* queue = static_cast<cl_command_queue>(ctx_->native_queue());
|
||||
cl_int clerr;
|
||||
|
||||
/* Parse 324-byte proofs into host-side RangeProofPolyGPU struct.
|
||||
* Wire layout per proof: 4 × 65-byte uncompressed points (A, S, T1, T2)
|
||||
* + 2 × 32-byte BE scalars (tau_x, t_hat) = 324 bytes.
|
||||
* GPU struct layout: 4 × AffinePoint(64B) + 2 × Scalar(32B) = 320 bytes. */
|
||||
struct RangeProofPolyOCL {
|
||||
secp256k1::opencl::AffinePoint A, S, T1, T2;
|
||||
secp256k1::opencl::Scalar tau_x, t_hat;
|
||||
};
|
||||
std::vector<RangeProofPolyOCL> h_proofs(count);
|
||||
for (size_t i = 0; i < count; ++i) {
|
||||
const uint8_t* p = proofs324 + i * 324;
|
||||
if (!pubkey65_to_affine(p, &h_proofs[i].A)) return set_error(GpuError::BadKey, "invalid proof A");
|
||||
if (!pubkey65_to_affine(p + 65, &h_proofs[i].S)) return set_error(GpuError::BadKey, "invalid proof S");
|
||||
if (!pubkey65_to_affine(p + 130, &h_proofs[i].T1)) return set_error(GpuError::BadKey, "invalid proof T1");
|
||||
if (!pubkey65_to_affine(p + 195, &h_proofs[i].T2)) return set_error(GpuError::BadKey, "invalid proof T2");
|
||||
bytes_to_scalar(p + 260, &h_proofs[i].tau_x);
|
||||
bytes_to_scalar(p + 292, &h_proofs[i].t_hat);
|
||||
}
|
||||
|
||||
std::vector<secp256k1::opencl::AffinePoint> h_commits(count);
|
||||
for (size_t i = 0; i < count; ++i) {
|
||||
if (!pubkey65_to_affine(commitments65 + i * 65, &h_commits[i]))
|
||||
return set_error(GpuError::BadKey, "invalid commitment");
|
||||
}
|
||||
|
||||
secp256k1::opencl::AffinePoint h_gen;
|
||||
if (!pubkey65_to_affine(H_generator65, &h_gen))
|
||||
return set_error(GpuError::BadKey, "invalid H generator");
|
||||
|
||||
cl_mem d_proofs = clCreateBuffer(cl_ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
|
||||
sizeof(RangeProofPolyOCL) * count, h_proofs.data(), &clerr);
|
||||
if (clerr != CL_SUCCESS) return set_error(GpuError::Memory, "bp proof buffer");
|
||||
|
||||
cl_mem d_commits = clCreateBuffer(cl_ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
|
||||
sizeof(secp256k1::opencl::AffinePoint) * count,
|
||||
h_commits.data(), &clerr);
|
||||
if (clerr != CL_SUCCESS) {
|
||||
clReleaseMemObject(d_proofs);
|
||||
return set_error(GpuError::Memory, "bp commit buffer");
|
||||
}
|
||||
|
||||
cl_mem d_hgen = clCreateBuffer(cl_ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
|
||||
sizeof(secp256k1::opencl::AffinePoint), &h_gen, &clerr);
|
||||
if (clerr != CL_SUCCESS) {
|
||||
clReleaseMemObject(d_commits); clReleaseMemObject(d_proofs);
|
||||
return set_error(GpuError::Memory, "bp h-gen buffer");
|
||||
}
|
||||
|
||||
cl_mem d_res = clCreateBuffer(cl_ctx, CL_MEM_WRITE_ONLY,
|
||||
sizeof(int) * count, nullptr, &clerr);
|
||||
if (clerr != CL_SUCCESS) {
|
||||
clReleaseMemObject(d_hgen); clReleaseMemObject(d_commits); clReleaseMemObject(d_proofs);
|
||||
return set_error(GpuError::Memory, "bp result buffer");
|
||||
}
|
||||
|
||||
cl_uint cl_count = static_cast<cl_uint>(count);
|
||||
clSetKernelArg(bp_poly_batch_, 0, sizeof(cl_mem), &d_proofs);
|
||||
clSetKernelArg(bp_poly_batch_, 1, sizeof(cl_mem), &d_commits);
|
||||
clSetKernelArg(bp_poly_batch_, 2, sizeof(cl_mem), &d_hgen);
|
||||
clSetKernelArg(bp_poly_batch_, 3, sizeof(cl_mem), &d_res);
|
||||
clSetKernelArg(bp_poly_batch_, 4, sizeof(cl_uint), &cl_count);
|
||||
|
||||
size_t global = count;
|
||||
clerr = clEnqueueNDRangeKernel(queue, bp_poly_batch_, 1, nullptr,
|
||||
&global, nullptr, 0, nullptr, nullptr);
|
||||
if (clerr != CL_SUCCESS) {
|
||||
clReleaseMemObject(d_res); clReleaseMemObject(d_hgen);
|
||||
clReleaseMemObject(d_commits); clReleaseMemObject(d_proofs);
|
||||
return set_error(GpuError::Launch, "bp_poly_batch kernel launch failed");
|
||||
}
|
||||
clFinish(queue);
|
||||
|
||||
std::vector<int> h_res(count);
|
||||
clEnqueueReadBuffer(queue, d_res, CL_TRUE, 0,
|
||||
sizeof(int) * count, h_res.data(), 0, nullptr, nullptr);
|
||||
for (size_t i = 0; i < count; ++i)
|
||||
out_results[i] = h_res[i] ? 1 : 0;
|
||||
|
||||
clReleaseMemObject(d_res); clReleaseMemObject(d_hgen);
|
||||
clReleaseMemObject(d_commits); clReleaseMemObject(d_proofs);
|
||||
clear_error();
|
||||
return GpuError::Ok;
|
||||
}
|
||||
|
||||
/* -- BIP-324 AEAD batch operations (OpenCL via secp256k1_bip324.cl) ----- */
|
||||
|
||||
@ -1279,6 +1373,7 @@ private:
|
||||
cl_program zk_program_ = nullptr;
|
||||
cl_kernel zk_knowledge_verify_ = nullptr;
|
||||
cl_kernel zk_dleq_verify_ = nullptr;
|
||||
cl_kernel bp_poly_batch_ = nullptr; /* range_proof_poly_batch */
|
||||
bool zk_init_attempted_ = false;
|
||||
|
||||
/* BIP-324 AEAD kernel handles (lazy-loaded via secp256k1_bip324.cl) */
|
||||
@ -1578,7 +1673,7 @@ private:
|
||||
|
||||
/* -- Lazy-load ZK proof OpenCL program --------------------------------- */
|
||||
GpuError ensure_zk_kernels() {
|
||||
if (zk_knowledge_verify_ && zk_dleq_verify_) return GpuError::Ok;
|
||||
if (zk_knowledge_verify_ && zk_dleq_verify_ && bp_poly_batch_) return GpuError::Ok;
|
||||
if (zk_init_attempted_)
|
||||
return set_error(GpuError::Launch, "ZK kernel init previously failed");
|
||||
zk_init_attempted_ = true;
|
||||
@ -1642,6 +1737,14 @@ private:
|
||||
return set_error(GpuError::Launch, "zk_dleq_verify_batch kernel not found");
|
||||
}
|
||||
|
||||
bp_poly_batch_ = clCreateKernel(zk_program_, "range_proof_poly_batch", &err);
|
||||
if (err != CL_SUCCESS) {
|
||||
clReleaseKernel(zk_dleq_verify_); zk_dleq_verify_ = nullptr;
|
||||
clReleaseKernel(zk_knowledge_verify_); zk_knowledge_verify_ = nullptr;
|
||||
clReleaseProgram(zk_program_); zk_program_ = nullptr;
|
||||
return set_error(GpuError::Launch, "range_proof_poly_batch kernel not found");
|
||||
}
|
||||
|
||||
return GpuError::Ok;
|
||||
}
|
||||
|
||||
|
||||
@ -595,9 +595,6 @@ __kernel void zk_dleq_verify_batch(
|
||||
// =============================================================================
|
||||
// 3. Bulletproof Range Proof (64-bit)
|
||||
// =============================================================================
|
||||
// Note: Bulletproof section requires OpenCL address-space fixes before enabling.
|
||||
// Knowledge and DLEQ proofs above are fully functional.
|
||||
#if 0 // Bulletproof: pending address-space fixes for __global/__private params
|
||||
// Full Bulletproof range proof verification on OpenCL.
|
||||
// Ported from CUDA implementation (commit 02ac59d).
|
||||
//
|
||||
@ -937,8 +934,8 @@ inline int range_verify_full_impl(
|
||||
const RangeProofGPU* proof,
|
||||
const AffinePoint* commitment,
|
||||
const AffinePoint* H_gen,
|
||||
const AffinePoint* bp_G, // 64 G_i generators
|
||||
const AffinePoint* bp_H, // 64 H_i generators
|
||||
__global const AffinePoint* bp_G, // 64 G_i generators (__global: too large for private)
|
||||
__global const AffinePoint* bp_H, // 64 H_i generators (__global: too large for private)
|
||||
const ZKTagMidstate* bp_ip_midstate)
|
||||
{
|
||||
// ---- Fiat-Shamir: recompute y, z, x ----
|
||||
@ -1154,8 +1151,10 @@ inline int range_verify_full_impl(
|
||||
scalar_mul_mod_n_impl(&proof->a, &s_coeff[i], &a_si);
|
||||
scalar_sub_mod_n_impl(&neg_z, &a_si, &g_coeff);
|
||||
|
||||
/* Copy generator point from __global to __private before scalar_mul_impl */
|
||||
AffinePoint g_pt = bp_G[i];
|
||||
JacobianPoint g_term;
|
||||
scalar_mul_impl(&g_term, &g_coeff, &bp_G[i]);
|
||||
scalar_mul_impl(&g_term, &g_coeff, &g_pt);
|
||||
point_add_impl(&msm_acc, &msm_acc, &g_term);
|
||||
|
||||
// H_i: (z + z2*2^i*y_inv^i) - b*s_inv[i]*y_inv^i
|
||||
@ -1169,8 +1168,10 @@ inline int range_verify_full_impl(
|
||||
scalar_mul_mod_n_impl(&b_si, &y_inv_powers[i], &b_si_yi);
|
||||
scalar_sub_mod_n_impl(&h_pcheck, &b_si_yi, &h_coeff);
|
||||
|
||||
/* Copy generator point from __global to __private before scalar_mul_impl */
|
||||
AffinePoint h_pt = bp_H[i];
|
||||
JacobianPoint h_term;
|
||||
scalar_mul_impl(&h_term, &h_coeff, &bp_H[i]);
|
||||
scalar_mul_impl(&h_term, &h_coeff, &h_pt);
|
||||
point_add_impl(&msm_acc, &msm_acc, &h_term);
|
||||
}
|
||||
|
||||
@ -1478,4 +1479,3 @@ __kernel void pedersen_verify_sum(
|
||||
if (z_bytes[i] != 0) z_zero = 0;
|
||||
*result = (sum.infinity || z_zero);
|
||||
}
|
||||
#endif // Bulletproof: pending address-space fixes
|
||||
|
||||
Loading…
Reference in New Issue
Block a user