- Split embedded OpenCL kernel_source into kernel_parts[] array so no single string literal exceeds MSVC's 65535-byte limit. clCreateProgramWithSource now receives multiple source strings. - Added batch-inversion kernels (field_inv, affine_add, jac_to_affine) using per-workgroup Montgomery's trick with __local memory. - OpenCL BIP352 benchmark scaffold and kernel stubs. - Source graph kit for indexed codebase exploration. - Assorted doc, benchmark, and audit report updates.
8.1 KiB
Android Guide -- UltrafastSecp256k1
Full CPU library port for Android -- ARM64 (arm64-v8a), ARMv7 (armeabi-v7a), x86_64/x86 (emulator).
Architecture
android/
+-- CMakeLists.txt # Android-specific CMake build
+-- build_android.sh # Linux/macOS build script
+-- build_android.ps1 # Windows PowerShell build script
+-- jni/
| +-- secp256k1_jni.cpp # JNI bridge (C++ -> Java/Kotlin)
+-- kotlin/
| +-- com/secp256k1/native/
| +-- Secp256k1.kt # Kotlin wrapper class
+-- example/ # Full Android application example
| +-- build.gradle.kts
| +-- src/main/
| +-- cpp/CMakeLists.txt
| +-- kotlin/.../MainActivity.kt
+-- output/ # Build output (jniLibs/)
ABI Support
| ABI | Architecture | __int128 |
Assembly | Notes |
|---|---|---|---|---|
arm64-v8a |
ARMv8-A + crypto + NEON | [OK] | [OK] ARM64 ASM (MUL/UMULH) | Primary target |
armeabi-v7a |
ARMv7-A + NEON | [FAIL] (32-bit) | [FAIL] | SECP256K1_NO_INT128 fallback |
x86_64 |
x86-64 + SSE4.2 | [OK] | [FAIL] (cross-compile) | For emulator |
x86 |
i686 + SSE3 | [FAIL] (32-bit) | [FAIL] | For emulator |
Note
: ARM64 inline assembly optimization is now enabled --
MUL/UMULHinstructions for field arithmetic (mul, sqr, add, sub, neg). This provides ~5x speedup compared to generic C++ code for scalar_mul operations.
Quick Start
Prerequisites
- Android NDK r25+ (r26c recommended)
- CMake 3.18+
- Ninja
Build (Command Line)
# Linux/macOS
export ANDROID_NDK_HOME=/path/to/android-ndk-r26c
cd libs/UltrafastSecp256k1/android/
./build_android.sh arm64-v8a
# Windows PowerShell
$env:ANDROID_NDK_HOME = "C:\Users\user\AppData\Local\Android\Sdk\ndk\26.1.10909125"
cd libs\UltrafastSecp256k1\android\
.\build_android.ps1 -ABIs arm64-v8a
Build (Manual CMake)
cmake -S android -B build-android-ndk-arm64 \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-28 \
-DANDROID_STL=c++_static \
-DCMAKE_BUILD_TYPE=Release \
-G Ninja
cmake --build build-android-ndk-arm64 --target bench_hornet -j
adb shell 'mkdir -p /data/local/tmp/ufsecp'
adb push build-android-ndk-arm64/bench_hornet /data/local/tmp/ufsecp/bench_hornet
adb shell 'chmod 755 /data/local/tmp/ufsecp/bench_hornet && /data/local/tmp/ufsecp/bench_hornet'
Use a clean Android-only build directory. Reusing a build directory first configured from the
repository root can trigger a CMake source/cache mismatch when switching to android/ as the source tree.
Output
android/output/jniLibs/
+-- arm64-v8a/
| +-- libsecp256k1_jni.so # ~200-400 KB
+-- armeabi-v7a/
| +-- libsecp256k1_jni.so
+-- x86_64/
| +-- libsecp256k1_jni.so
+-- x86/
+-- libsecp256k1_jni.so
Integration in an Android Project
Option 1: Pre-built JNI (simplest)
- Copy
output/jniLibs/to your Android project:
app/src/main/jniLibs/
+-- arm64-v8a/libsecp256k1_jni.so
+-- x86_64/libsecp256k1_jni.so
- Copy
Secp256k1.ktto your Kotlin source:
app/src/main/kotlin/com/secp256k1/native/Secp256k1.kt
- Use it:
Secp256k1.init()
val pubkey = Secp256k1.ctScalarMulGenerator(privkey)
Option 2: Gradle CMake Integration
In app/build.gradle.kts:
android {
externalNativeBuild {
cmake {
path = file("src/main/cpp/CMakeLists.txt")
}
}
defaultConfig {
externalNativeBuild {
cmake {
abiFilters += listOf("arm64-v8a", "x86_64")
arguments += "-DANDROID_STL=c++_static"
}
}
}
}
app/src/main/cpp/CMakeLists.txt:
cmake_minimum_required(VERSION 3.18)
project(MyApp LANGUAGES CXX)
add_subdirectory(/path/to/UltrafastSecp256k1/android ${CMAKE_BINARY_DIR}/secp256k1)
API
Fast API (Maximum Speed)
// Initialization
Secp256k1.init()
// Point operations
val g = Secp256k1.getGenerator() // G (65 bytes)
val g2 = Secp256k1.pointDouble(g) // 2G
val g3 = Secp256k1.pointAdd(g2, g) // 3G
val neg = Secp256k1.pointNegate(g) // -G
val compressed = Secp256k1.pointCompress(g) // 33 bytes
// Scalar x Point (NOT side-channel safe!)
val result = Secp256k1.scalarMulGenerator(k) // k*G
val result2 = Secp256k1.scalarMulPoint(k, point) // k*P
// Scalar arithmetic
val sum = Secp256k1.scalarAdd(a, b)
val product = Secp256k1.scalarMul(a, b)
val diff = Secp256k1.scalarSub(a, b)
CT API (side-channel resistant)
Use for all private key operations:
// Key generation (CT)
val pubkey = Secp256k1.ctScalarMulGenerator(privkey)
// k*P (CT)
val result = Secp256k1.ctScalarMulPoint(k, point)
// ECDH shared secret (CT)
val secret = Secp256k1.ctEcdh(myPrivkey, theirPubkey)
When to Use CT vs Fast
| Operation | API | Reason |
|---|---|---|
| Private key -> Public key | CT | Key is secret |
| ECDH | CT | Private key is involved |
| Signing | CT | nonce/key leak = catastrophe |
| Signature verification | Fast | Public data only |
| Point aggregation | Fast | Public data only |
| Batch verification | Fast | Maximum speed |
Platform Details
ARM64 Optimizations
Inline Assembly (cpu/src/field_asm_arm64.cpp):
field_mul_arm64-- 4x4 schoolbook MUL/UMULH + secp256k1 fast reduction (85 ns/op)field_sqr_arm64-- Optimized squaring (10 mul vs 16) (66 ns/op)field_add_arm64-- ADDS/ADCS + branchless normalization (18 ns/op)field_sub_arm64-- SUBS/SBCS + conditional add p (16 ns/op)field_neg_arm64-- Branchless p - a with CSEL
NDK Clang additionally uses:
- NEON: 128-bit SIMD (implicit in ARMv8-A)
- Crypto extensions: AES/SHA hardware acceleration
__int128: 64x64->128 multiplication (in scalar/field operations)- Auto-vectorization:
-ftree-vectorize -funroll-loops
Benchmark Results (RK3588, Cortex-A55/A76)
| Operation | ARM64 ASM | Generic C++ | Speedup |
|---|---|---|---|
| field_mul (a*b mod p) | 85 ns | ~350 ns | ~4x |
| field_sqr (a^2 mod p) | 66 ns | ~280 ns | ~4x |
| field_add (a+b mod p) | 18 ns | ~30 ns | ~1.7x |
| field_sub (a-b mod p) | 16 ns | ~28 ns | ~1.8x |
| field_inverse | 2,621 ns | ~11,000 ns | ~4x |
| fast scalar_mul (k*G) | 7.6 us | ~40 us | ~5.3x |
| fast scalar_mul (k*P) | 77.6 us | ~400 us | ~5.1x |
| CT scalar_mul (k*G) | 545 us | ~400 us | 0.7x* |
| ECDH (full CT) | 545 us | -- | -- |
* CT mode uses generic C++ (for constant-time guarantees)
Android ARM64 rerun retained on-device SHA2 dispatch
Measured on the connected RK3588 Android device with bench_hornet after wiring the ARMv8 SHA2
path into hash_accel.cpp hot wrappers:
| Operation | Baseline | Retained result | Delta |
|---|---|---|---|
| ECDSA sign | 25.89 us | 22.22 us | 14.2% faster |
| Schnorr sign (precomputed) | 17.73 us | 16.67 us | 6.0% faster |
| Schnorr sign (raw privkey) | 33.01 us | 31.99 us | 3.1% faster |
| CT ECDSA sign | 70.50 us | 67.11 us | 4.8% faster |
The same rerun rejected forced 4x64 point ops, GLV window retuning, and keeping Android PGO as the default path because they did not outperform the retained SHA2 dispatch result on this device.
ARMv7 (32-bit) Limitations
- No
__int128->SECP256K1_NO_INT128fallback (portable 64x64->128) - NEON VFPv4 available
- ~2-3x slower than ARM64
Android-Specific CMake Changes
Automatically in CPU CMakeLists.txt:
-march=native->-march=armv8-a+crypto(cross-compile)-mbmi2 -madxexcluded on ARMSECP256K1_NO_INT128=1on 32-bit targets- x86 assembly excluded (cannot compile on ARM)
Troubleshooting
NDK Not Found
export ANDROID_NDK_HOME=/full/path/to/ndk
c++_static linkage error
In build.gradle.kts:
cmake { arguments += "-DANDROID_STL=c++_static" }
UnsatisfiedLinkError at Runtime
Check that libsecp256k1_jni.so is in the correct ABI folder (jniLibs/arm64-v8a/).
32-bit build warnings
Normal on ARMv7/x86 builds -- SECP256K1_NO_INT128 is automatically enabled.