UltrafastSecp256k1/docs/wiki/Android-Guide.md
shrec fea2420fe7
fix: MSVC C2026 string limit (#173), OpenCL batch-inv kernels, source graph tooling
- Split embedded OpenCL kernel_source into kernel_parts[] array
  so no single string literal exceeds MSVC's 65535-byte limit.
  clCreateProgramWithSource now receives multiple source strings.
- Added batch-inversion kernels (field_inv, affine_add, jac_to_affine)
  using per-workgroup Montgomery's trick with __local memory.
- OpenCL BIP352 benchmark scaffold and kernel stubs.
- Source graph kit for indexed codebase exploration.
- Assorted doc, benchmark, and audit report updates.
2026-03-19 16:43:55 +00:00

8.1 KiB

Android Guide -- UltrafastSecp256k1

Full CPU library port for Android -- ARM64 (arm64-v8a), ARMv7 (armeabi-v7a), x86_64/x86 (emulator).

Architecture

android/
+-- CMakeLists.txt           # Android-specific CMake build
+-- build_android.sh         # Linux/macOS build script
+-- build_android.ps1        # Windows PowerShell build script
+-- jni/
|   +-- secp256k1_jni.cpp    # JNI bridge (C++ -> Java/Kotlin)
+-- kotlin/
|   +-- com/secp256k1/native/
|       +-- Secp256k1.kt     # Kotlin wrapper class
+-- example/                 # Full Android application example
|   +-- build.gradle.kts
|   +-- src/main/
|       +-- cpp/CMakeLists.txt
|       +-- kotlin/.../MainActivity.kt
+-- output/                  # Build output (jniLibs/)

ABI Support

ABI Architecture __int128 Assembly Notes
arm64-v8a ARMv8-A + crypto + NEON [OK] [OK] ARM64 ASM (MUL/UMULH) Primary target
armeabi-v7a ARMv7-A + NEON [FAIL] (32-bit) [FAIL] SECP256K1_NO_INT128 fallback
x86_64 x86-64 + SSE4.2 [OK] [FAIL] (cross-compile) For emulator
x86 i686 + SSE3 [FAIL] (32-bit) [FAIL] For emulator

Note

: ARM64 inline assembly optimization is now enabled -- MUL/UMULH instructions for field arithmetic (mul, sqr, add, sub, neg). This provides ~5x speedup compared to generic C++ code for scalar_mul operations.

Quick Start

Prerequisites

  • Android NDK r25+ (r26c recommended)
  • CMake 3.18+
  • Ninja

Build (Command Line)

# Linux/macOS
export ANDROID_NDK_HOME=/path/to/android-ndk-r26c
cd libs/UltrafastSecp256k1/android/
./build_android.sh arm64-v8a

# Windows PowerShell
$env:ANDROID_NDK_HOME = "C:\Users\user\AppData\Local\Android\Sdk\ndk\26.1.10909125"
cd libs\UltrafastSecp256k1\android\
.\build_android.ps1 -ABIs arm64-v8a

Build (Manual CMake)

cmake -S android -B build-android-ndk-arm64 \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-28 \
    -DANDROID_STL=c++_static \
    -DCMAKE_BUILD_TYPE=Release \
    -G Ninja

cmake --build build-android-ndk-arm64 --target bench_hornet -j

adb shell 'mkdir -p /data/local/tmp/ufsecp'
adb push build-android-ndk-arm64/bench_hornet /data/local/tmp/ufsecp/bench_hornet
adb shell 'chmod 755 /data/local/tmp/ufsecp/bench_hornet && /data/local/tmp/ufsecp/bench_hornet'

Use a clean Android-only build directory. Reusing a build directory first configured from the repository root can trigger a CMake source/cache mismatch when switching to android/ as the source tree.

Output

android/output/jniLibs/
+-- arm64-v8a/
|   +-- libsecp256k1_jni.so      # ~200-400 KB
+-- armeabi-v7a/
|   +-- libsecp256k1_jni.so
+-- x86_64/
|   +-- libsecp256k1_jni.so
+-- x86/
    +-- libsecp256k1_jni.so

Integration in an Android Project

Option 1: Pre-built JNI (simplest)

  1. Copy output/jniLibs/ to your Android project:
app/src/main/jniLibs/
+-- arm64-v8a/libsecp256k1_jni.so
+-- x86_64/libsecp256k1_jni.so
  1. Copy Secp256k1.kt to your Kotlin source:
app/src/main/kotlin/com/secp256k1/native/Secp256k1.kt
  1. Use it:
Secp256k1.init()
val pubkey = Secp256k1.ctScalarMulGenerator(privkey)

Option 2: Gradle CMake Integration

In app/build.gradle.kts:

android {
    externalNativeBuild {
        cmake {
            path = file("src/main/cpp/CMakeLists.txt")
        }
    }
    defaultConfig {
        externalNativeBuild {
            cmake {
                abiFilters += listOf("arm64-v8a", "x86_64")
                arguments += "-DANDROID_STL=c++_static"
            }
        }
    }
}

app/src/main/cpp/CMakeLists.txt:

cmake_minimum_required(VERSION 3.18)
project(MyApp LANGUAGES CXX)
add_subdirectory(/path/to/UltrafastSecp256k1/android ${CMAKE_BINARY_DIR}/secp256k1)

API

Fast API (Maximum Speed)

// Initialization
Secp256k1.init()

// Point operations
val g = Secp256k1.getGenerator()             // G (65 bytes)
val g2 = Secp256k1.pointDouble(g)            // 2G
val g3 = Secp256k1.pointAdd(g2, g)           // 3G
val neg = Secp256k1.pointNegate(g)           // -G
val compressed = Secp256k1.pointCompress(g)  // 33 bytes

// Scalar x Point (NOT side-channel safe!)
val result = Secp256k1.scalarMulGenerator(k)      // k*G
val result2 = Secp256k1.scalarMulPoint(k, point)  // k*P

// Scalar arithmetic
val sum = Secp256k1.scalarAdd(a, b)
val product = Secp256k1.scalarMul(a, b)
val diff = Secp256k1.scalarSub(a, b)

CT API (side-channel resistant)

Use for all private key operations:

// Key generation (CT)
val pubkey = Secp256k1.ctScalarMulGenerator(privkey)

// k*P (CT)
val result = Secp256k1.ctScalarMulPoint(k, point)

// ECDH shared secret (CT)
val secret = Secp256k1.ctEcdh(myPrivkey, theirPubkey)

When to Use CT vs Fast

Operation API Reason
Private key -> Public key CT Key is secret
ECDH CT Private key is involved
Signing CT nonce/key leak = catastrophe
Signature verification Fast Public data only
Point aggregation Fast Public data only
Batch verification Fast Maximum speed

Platform Details

ARM64 Optimizations

Inline Assembly (cpu/src/field_asm_arm64.cpp):

  • field_mul_arm64 -- 4x4 schoolbook MUL/UMULH + secp256k1 fast reduction (85 ns/op)
  • field_sqr_arm64 -- Optimized squaring (10 mul vs 16) (66 ns/op)
  • field_add_arm64 -- ADDS/ADCS + branchless normalization (18 ns/op)
  • field_sub_arm64 -- SUBS/SBCS + conditional add p (16 ns/op)
  • field_neg_arm64 -- Branchless p - a with CSEL

NDK Clang additionally uses:

  • NEON: 128-bit SIMD (implicit in ARMv8-A)
  • Crypto extensions: AES/SHA hardware acceleration
  • __int128: 64x64->128 multiplication (in scalar/field operations)
  • Auto-vectorization: -ftree-vectorize -funroll-loops

Benchmark Results (RK3588, Cortex-A55/A76)

Operation ARM64 ASM Generic C++ Speedup
field_mul (a*b mod p) 85 ns ~350 ns ~4x
field_sqr (a^2 mod p) 66 ns ~280 ns ~4x
field_add (a+b mod p) 18 ns ~30 ns ~1.7x
field_sub (a-b mod p) 16 ns ~28 ns ~1.8x
field_inverse 2,621 ns ~11,000 ns ~4x
fast scalar_mul (k*G) 7.6 us ~40 us ~5.3x
fast scalar_mul (k*P) 77.6 us ~400 us ~5.1x
CT scalar_mul (k*G) 545 us ~400 us 0.7x*
ECDH (full CT) 545 us -- --

* CT mode uses generic C++ (for constant-time guarantees)

Android ARM64 rerun retained on-device SHA2 dispatch

Measured on the connected RK3588 Android device with bench_hornet after wiring the ARMv8 SHA2 path into hash_accel.cpp hot wrappers:

Operation Baseline Retained result Delta
ECDSA sign 25.89 us 22.22 us 14.2% faster
Schnorr sign (precomputed) 17.73 us 16.67 us 6.0% faster
Schnorr sign (raw privkey) 33.01 us 31.99 us 3.1% faster
CT ECDSA sign 70.50 us 67.11 us 4.8% faster

The same rerun rejected forced 4x64 point ops, GLV window retuning, and keeping Android PGO as the default path because they did not outperform the retained SHA2 dispatch result on this device.

ARMv7 (32-bit) Limitations

  • No __int128 -> SECP256K1_NO_INT128 fallback (portable 64x64->128)
  • NEON VFPv4 available
  • ~2-3x slower than ARM64

Android-Specific CMake Changes

Automatically in CPU CMakeLists.txt:

  • -march=native -> -march=armv8-a+crypto (cross-compile)
  • -mbmi2 -madx excluded on ARM
  • SECP256K1_NO_INT128=1 on 32-bit targets
  • x86 assembly excluded (cannot compile on ARM)

Troubleshooting

NDK Not Found

export ANDROID_NDK_HOME=/full/path/to/ndk

c++_static linkage error

In build.gradle.kts:

cmake { arguments += "-DANDROID_STL=c++_static" }

UnsatisfiedLinkError at Runtime

Check that libsecp256k1_jni.so is in the correct ABI folder (jniLibs/arm64-v8a/).

32-bit build warnings

Normal on ARMv7/x86 builds -- SECP256K1_NO_INT128 is automatically enabled.