mineracks-ckbunker-hsm-sign/docs/WHY.md
mineracks 9d380f5013 Initial import: CKBunker HSM validation harness
WebSocket client + CLI harness + pytest suite that exercises each axis of
a CKBunker + Coldcard Mk4 policy and asserts the expected outcomes, including
the critical negative test that a large PSBT without TOTP is rejected with
a specific 'rule #1: need user(s) confirmation' reason.

Configuration via .env / YAML / CLI flags, two pre-crafted test PSBTs as
fixtures (generation guide in fixtures/README.md), dashboard counter
scraper as sanity check, design rationale in docs/.
2026-04-14 10:50:04 +10:00


# Why this harness exists, and why it's written the way it is
## Why a harness at all
The Coldcard HSM's whole value proposition is that the **policy on the
device is what enforces safety** — not the VM, not the network, not the
operator. That's a great story, until someone mis-installs a policy file
and nobody notices because the "happy path" (small, auto-approved txs)
still works.

Failure modes this harness is designed to catch:
1. **Policy rule collapse** — the auto-approve rule (Rule #2) is loaded
but the user-auth rule (Rule #1) is missing or weakened, so large
transactions sign without 2FA. The **`rule1_without_totp_rejects`
test** is the single most important assertion: it attempts to sign an
above-threshold transaction without TOTP and requires a specific
rejection reason.
2. **TOTP secret drift** — authenticator app rotated, backup unclear, or
a policy rewrite issued a new secret without updating the operator's
phone. The **`rule1_with_totp_signs` test** catches this before you
need to send a real transaction.
3. **Coldcard USB detach** — Proxmox USB passthrough occasionally
detaches after host reboots. CKBunker starts, the UI renders, but the
Coldcard isn't actually attached. The **`message_signing` test**
catches this cheaply (no UTXO needed).
4. **Cloudflare Access regression** — an accident in the Zero Trust
dashboard exposes the bunker to the internet. The harness doesn't
directly test CF Access policy, but running it via the Tailscale IP
while periodically curl-ing the public hostname catches the
"SSO gate missing" case.
5. **Silent server rejection** — CKBunker returns an HTTP 200 with a
rejection modal, not an HTTP error code. Automated clients that only
check HTTP status can "succeed" against a server that refused to
sign. The harness parses the modal and treats rejections as failures
when a signature was expected.
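
The "curl the public hostname" check from item 4 can be scripted. The sketch below is illustrative only and is not part of the harness. It assumes that an Access-protected hostname answers an unauthenticated request either with a redirect to a login page on `*.cloudflareaccess.com` or with a 401/403; the function name `looks_gated` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def looks_gated(status: int, location: Optional[str]) -> bool:
    """Return True if an unauthenticated response looks like an SSO gate.

    Assumption: Cloudflare Access either redirects to a login page hosted
    on *.cloudflareaccess.com or refuses the request with 401/403.
    """
    if status in (401, 403):
        return True
    if status in (301, 302, 303, 307, 308) and location:
        host = urlparse(location).hostname or ""
        return host.endswith(".cloudflareaccess.com")
    # Anything else (notably a 200 serving the bunker UI) is treated as ungated.
    return False
```

Feed it the status code and `Location` header of a request made with redirects disabled. A `False` result for the public hostname is the "SSO gate missing" case and should page someone.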
## Why WebSocket, not HTTP
CKBunker's web UI and its signing protocol live on the same WebSocket
endpoint. The HTTP endpoints render HTML only. If you only speak HTTP
you can **watch** the counters but can't **cause** a sign. The harness
needs to cause signs — so WebSocket.

An unfortunate side-effect: Cloudflare Access with service tokens
doesn't pass the WebSocket upgrade cleanly. This is why the harness
assumes a private ingress (Tailscale) is available even for
CF-fronted deployments.
## Why a custom client and not upstream
Upstream CKBunker ships a `ckbunker` console script, but in `v0.9.1` it
has a broken import path (tries to `import main` from outside the
package). There is no packaged Python client. The 500-line client in
`ckbunker_hsm_sign/client.py` is hand-rolled against the observed
WebSocket protocol — small enough to audit, big enough to be useful,
and stable because CKBunker's own Vue front-end doesn't change often.

The cost: if upstream changes frame shapes, this harness will need an
update. The protocol doc (`PROTOCOL.md`) captures the current shapes so
future changes are easy to diff.
## Why the harness doesn't generate PSBTs
**Generating spendable PSBTs requires the Coldcard's xpub, a UTXO, and
a recipient.** That's significant state that differs per deployment. The
harness stays deployment-agnostic by accepting **pre-crafted PSBT
fixtures** (see [`fixtures/README.md`](../fixtures/README.md)).

This also means you don't risk spending real sats on a validation run.
The same `large.psbt` can be re-used indefinitely for the reject-path
test because the Coldcard rejects on **amount**, not UTXO availability.
## Why config over code
Every deployment has its own policy shape. Rather than hard-code
"10,000 sats" as the auto-approve cap, the harness reads thresholds
from `config.yaml` and asserts them against outcomes. If your Rule #2
per-txn cap is 50,000 sats, you:
1. Edit `config.yaml` — set `policy.auto_approve.per_txn_sats: 50000`.
2. Craft `small.psbt` at 49,000 sats and `large.psbt` at 100,000 sats.
3. Run the harness.

No code changes. The **outcomes** the harness asserts are framed as
"this PSBT should/shouldn't sign in this path", not "this specific sat
amount should sign".
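
As a rough illustration of the idea, the sketch below reads the cap from a parsed config mapping and checks that the two fixtures sit on opposite sides of it. Only the key path `policy.auto_approve.per_txn_sats` comes from this document; the function names and the plain-dict stand-in for `config.yaml` are assumptions made for the example.

```python
from typing import Any, Dict


def auto_approve_cap(config: Dict[str, Any]) -> int:
    """Read policy.auto_approve.per_txn_sats from a parsed config mapping."""
    return int(config["policy"]["auto_approve"]["per_txn_sats"])


def fixtures_straddle_cap(small_sats: int, large_sats: int, cap: int) -> bool:
    """True when small.psbt is strictly below the cap and large.psbt strictly above.

    Keeping both fixtures away from the boundary avoids depending on how
    the device treats an amount exactly equal to the cap.
    """
    return small_sats < cap < large_sats


# Stand-in for the result of parsing config.yaml (hypothetical loader output).
config = {"policy": {"auto_approve": {"per_txn_sats": 50000}}}
cap = auto_approve_cap(config)
```

With the values from the steps above, `fixtures_straddle_cap(49000, 100000, cap)` holds; raising the cap in the config without re-crafting `large.psbt` would make it fail before any signing is attempted.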
## Why pytest AND a CLI
Different operators want different ergonomics:
- **`hsm_validate.py`** (CLI): human-readable coloured output, runs the
tests in order, exits 0/1/2. Good for on-call dashboards, cron monitors,
demoing to stakeholders.
- **`pytest tests/`**: integrates with existing CI, produces JUnit XML,
lets you parametrise against multiple environments. Good for
automated deploy gates.

Both paths share the same client, fixtures, and config loader, so nothing
is duplicated.
## Why the tests are numbered (`test_01`, `test_02` …)
pytest runs tests in collection order, which by default follows file
names and then definition order within a file. The numbered prefixes pin
that order explicitly, so a run reads top-to-bottom (including in
`pytest -v` output) and matches the narrative of the CLI harness. This
helps when screenshotting a run for an incident report: the sequence
looks sensible.
## Why we scrape the dashboard at all
The counters test is a **sanity check against client-side deception**.
If a future bug in the client mis-identifies a rejection as a
signature (or vice versa), the dashboard deltas reveal it: the
Coldcard doesn't lie about whether it signed, and the dashboard
reflects Coldcard state. If the harness says "4 signs, 1 reject" but
the dashboard shows "0 signs, 0 rejects", the harness and the device
disagree: either the client is misreading responses or the requests
never reached the Coldcard.

The scraper is tolerant: CKBunker versions vary in HTML shape, so if
the regex can't find the numbers the test skips rather than fails.
The real signing assertions already prove end-to-end correctness.
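
A tolerant scraper of this kind might look like the sketch below. The regexes, the function names, and the sample markup are all hypothetical; the real dashboard HTML may look nothing like this, which is exactly why a failed match returns `None` instead of raising.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: a label followed, within a few non-digit
# characters of markup, by the counter value.
_SIGNED_RE = re.compile(r"signed\D{0,40}?(\d+)", re.IGNORECASE)
_REJECTED_RE = re.compile(r"rejected\D{0,40}?(\d+)", re.IGNORECASE)


def parse_counters(html: str) -> Optional[Tuple[int, int]]:
    """Return (signed, rejected), or None if either number can't be found."""
    signed = _SIGNED_RE.search(html)
    rejected = _REJECTED_RE.search(html)
    if signed is None or rejected is None:
        return None
    return int(signed.group(1)), int(rejected.group(1))


def deltas_match(
    before: Tuple[int, int],
    after: Tuple[int, int],
    expected_signs: int,
    expected_rejects: int,
) -> bool:
    """Compare dashboard deltas with what the client claims happened."""
    return (after[0] - before[0], after[1] - before[1]) == (
        expected_signs,
        expected_rejects,
    )
```

In a pytest test, a `None` from `parse_counters` would translate to `pytest.skip(...)`, while a `False` from `deltas_match` would be a hard failure.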
## Why rejections aren't exceptions
A rejection is a successful policy evaluation — the **Coldcard did
exactly what it was configured to do**. Treating rejections as Python
exceptions would:
- force every call site into try/except
- conflate policy behaviour with transport errors (network, timeout)
- hide the rejection reason behind an exception type

Instead, `SignResult.status` is an enum with four values (`SIGNED`,
`REJECTED`, `TIMEOUT`, `WS_ERROR`) and the caller asserts the status it
expects. `is_expected_rejection("rule #1")` keeps the specific-reason
check terse.
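
The shape described above could be sketched roughly as follows. The four status names and `is_expected_rejection` come from this document; the `Status` class name, the `reason` field, the case-insensitive match, and the choice to make `is_expected_rejection` a method are assumptions, and the real client may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    SIGNED = "signed"
    REJECTED = "rejected"
    TIMEOUT = "timeout"
    WS_ERROR = "ws_error"


@dataclass(frozen=True)
class SignResult:
    status: Status
    reason: Optional[str] = None  # assumed field: rejection text from the device

    def is_expected_rejection(self, needle: str) -> bool:
        """True only for a policy rejection whose reason mentions `needle`."""
        return (
            self.status is Status.REJECTED
            and self.reason is not None
            and needle.lower() in self.reason.lower()
        )
```

A reject-path test then reduces to one assertion, for example `assert result.is_expected_rejection("rule #1")`, and a timeout or transport error fails that assertion instead of being mistaken for a policy decision.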
## Why "don't broadcast" is the default
`submit_psbt` accepts a `broadcast=True` flag that asks CKBunker to
push the signed tx. The harness always sends `broadcast=false`. A
validation run should never touch the mempool. Operators who want to
drive real signings via this client should use it directly, not via
the harness.
## Why there's no CI/CD templating
Every shop's CI is different (GitHub Actions, Drone, Gitea Actions,
Jenkins, Woodpecker). Providing a single-vendor pipeline template
would add maintenance burden without saving meaningful integration
time. The `hsm_validate.py` CLI exits 0 on success and non-zero on
failure, which is all any CI needs. Integration examples live in the
README.