# Why this harness exists, and why it's written the way it is

## Why a harness at all

The Coldcard HSM's whole value proposition is that the **policy on the
device is what enforces safety** — not the VM, not the network, not the
operator. That's a great story, until someone mis-installs a policy file
and nobody notices because the "happy path" (small, auto-approved txs)
still works.

Failure modes this harness is designed to catch:

1. **Policy rule collapse** — the auto-approve rule (Rule #2) is loaded
   but the user-auth rule (Rule #1) is missing or weakened, so large
   transactions sign without 2FA. The **`rule1_without_totp_rejects`
   test** is the single most important assertion: it attempts to sign an
   above-threshold transaction without TOTP and requires a specific
   rejection reason.

2. **TOTP secret drift** — authenticator app rotated, backup unclear, or
   a policy rewrite issued a new secret without updating the operator's
   phone. The **`rule1_with_totp_signs` test** catches this before you
   need to send a real transaction.

3. **Coldcard USB detach** — Proxmox USB passthrough occasionally
   detaches after host reboots. CKBunker starts, the UI renders, but the
   Coldcard isn't actually attached. The **`message_signing` test**
   catches this cheaply (no UTXO needed).

4. **Cloudflare Access regression** — an accident in the Zero Trust
   dashboard exposes the bunker to the internet. The harness doesn't
   directly test CF Access policy, but running it via the Tailscale IP
   while periodically curl-ing the public hostname catches the
   "SSO gate missing" case.

5. **Silent server rejection** — CKBunker returns an HTTP 200 with a
   rejection modal, not an HTTP error code. Automated clients that only
   check HTTP status can "succeed" against a server that refused to
   sign. The harness parses the modal and treats rejections as failures
   when a signature was expected.
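
The trap behind failure mode 5 fits in a few lines. This is a minimal sketch that assumes a hypothetical `Reply` record carrying the HTTP status plus whatever modal the server rendered. The field names, the `"rejected"` marker, and the rejection text are illustrative; they are not CKBunker's actual response format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Hypothetical view of what a client gets back from a sign request."""
    http_status: int
    modal_kind: Optional[str] = None  # e.g. "rejected"; None if no modal was shown
    modal_text: str = ""


def naive_ok(reply: Reply) -> bool:
    # Wrong: a policy rejection still arrives as HTTP 200.
    return reply.http_status == 200


def signed_ok(reply: Reply) -> bool:
    # Right: a 200 only counts as a signature if no rejection modal came with it.
    return reply.http_status == 200 and reply.modal_kind != "rejected"


rejected = Reply(200, "rejected", "rejected by policy (illustrative text)")
print(naive_ok(rejected), signed_ok(rejected))  # True False
```

A check that stops at the status code accepts this reply. The harness takes the second approach: when a signature is expected, the reply must arrive without a rejection attached.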

## Why WebSocket, not HTTP

CKBunker's web UI and its signing protocol live on the same WebSocket
endpoint. The HTTP endpoints render HTML only. If you only speak HTTP
you can **watch** the counters but can't **cause** a sign. The harness
needs to cause signs — so WebSocket.

An unfortunate side-effect: Cloudflare Access with service tokens
doesn't pass the WebSocket upgrade cleanly. This is why the harness
assumes a private ingress (Tailscale) is available even for
CF-fronted deployments.
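
Signing happens over the WebSocket, so the client needs a `ws://` or `wss://` URL for the private ingress, not the public hostname. Below is a minimal sketch that derives that URL from a configured base URL. The `/websocket` path, the helper name, and the example Tailscale address are assumptions chosen for illustration; none of them come from CKBunker.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the WebSocket path is an assumption; check PROTOCOL.md for the real one.
WS_PATH = "/websocket"


def ws_url(base_url: str, path: str = WS_PATH) -> str:
    """Turn an http(s) base URL (e.g. a Tailscale address) into the ws(s) URL."""
    parts = urlsplit(base_url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(ws_url("http://100.64.0.10:8080"))  # ws://100.64.0.10:8080/websocket
```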

## Why a custom client and not upstream

Upstream CKBunker ships a `ckbunker` console script, but in `v0.9.1` it
has a broken import path (tries to `import main` from outside the
package). There is no packaged Python client. The 500-line client in
`ckbunker_hsm_sign/client.py` is hand-rolled against the observed
WebSocket protocol — small enough to audit, big enough to be useful,
and stable because CKBunker's own Vue front-end doesn't change often.

The cost: if upstream changes frame shapes, this harness will need an
update. The protocol doc (`PROTOCOL.md`) captures the current shapes so
future changes are easy to diff.
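
Those diffs stay cheap if the client fails loudly when a frame's shape changes, rather than quietly misreading it. This sketch rests on two assumptions: frames arrive as JSON text, and the caller knows which keys each frame type must carry. `ProtocolDrift`, `parse_frame`, and the key names are hypothetical. The real frame shapes are the ones recorded in `PROTOCOL.md`.

```python
import json
from typing import Any, Dict, Iterable


class ProtocolDrift(Exception):
    """Hypothetical: a frame no longer has the shape the client was written against."""


def parse_frame(raw: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Decode one WebSocket text frame and fail loudly if expected keys are gone."""
    frame = json.loads(raw)
    if not isinstance(frame, dict):
        raise ProtocolDrift(f"expected a JSON object, got {type(frame).__name__}")
    missing = [key for key in required_keys if key not in frame]
    if missing:
        raise ProtocolDrift(f"frame missing {missing}: {raw[:80]}")
    return frame
```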

## Why the harness doesn't generate PSBTs

**Generating spendable PSBTs requires the Coldcard's xpub, a UTXO, and
a recipient.** That's significant state that differs per deployment. The
harness stays deployment-agnostic by accepting **pre-crafted PSBT
fixtures** (see [`fixtures/README.md`](../fixtures/README.md)).

This also means you don't risk spending real sats on a validation run.
The same `large.psbt` can be re-used indefinitely for the reject-path
test because the Coldcard rejects on **amount**, not UTXO availability.
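
Fixtures are crafted elsewhere and reused indefinitely, so it is worth confirming that a file really is a PSBT before sending it. This minimal sketch assumes a fixture may be stored either as raw binary or as base64 text, since this document doesn't say which. `load_psbt_fixture` is a hypothetical helper. The magic-byte prefix comes from BIP 174.

```python
import base64
import binascii
from pathlib import Path

# BIP 174: every serialized PSBT starts with the magic bytes "psbt" followed by 0xff.
PSBT_MAGIC = b"psbt\xff"


def load_psbt_fixture(path: Path) -> bytes:
    """Return raw PSBT bytes from a fixture stored as binary or as base64 text."""
    data = path.read_bytes()
    if data.startswith(PSBT_MAGIC):
        return data  # already binary
    compact = b"".join(data.split())  # drop newlines/spaces from wrapped base64
    try:
        decoded = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"{path}: neither a binary nor a base64 PSBT") from exc
    if not decoded.startswith(PSBT_MAGIC):
        raise ValueError(f"{path}: valid base64, but not a PSBT")
    return decoded
```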

## Why config over code

Every deployment has its own policy shape. Rather than hard-code
"10,000 sats" as the auto-approve cap, the harness reads thresholds
from `config.yaml` and asserts them against outcomes. If your Rule #2
per-txn cap is 50,000 sats, you:

1. Edit `config.yaml` — set `policy.auto_approve.per_txn_sats: 50000`.
2. Craft `small.psbt` at 49,000 sats and `large.psbt` at 100,000 sats.
3. Run the harness.

No code changes. The **outcomes** the harness asserts are framed as
"this PSBT should/shouldn't sign in this path", not "this specific sat
amount should sign".
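
Because the thresholds live in config, the harness can also check the fixtures against the policy before any signing happens. This sketch assumes `cfg` is the dict produced by a YAML loader such as PyYAML's `yaml.safe_load`. It also assumes the caller already knows each fixture's amount; extracting amounts from the PSBTs is out of scope here. `per_txn_cap` and `check_fixture_amounts` are hypothetical names. Only the `policy.auto_approve.per_txn_sats` key comes from this document.

```python
from typing import Any, Mapping


def per_txn_cap(cfg: Mapping[str, Any]) -> int:
    """Read policy.auto_approve.per_txn_sats from an already-parsed config.yaml."""
    return int(cfg["policy"]["auto_approve"]["per_txn_sats"])


def check_fixture_amounts(cfg: Mapping[str, Any], small_sats: int, large_sats: int) -> None:
    """Fail early if the fixtures don't straddle the auto-approve cap."""
    cap = per_txn_cap(cfg)
    # Stay strictly off the boundary so the check doesn't depend on whether
    # the policy treats the cap as inclusive.
    if not small_sats < cap:
        raise ValueError(f"small.psbt ({small_sats} sats) must be below the {cap}-sat cap")
    if not large_sats > cap:
        raise ValueError(f"large.psbt ({large_sats} sats) must be above the {cap}-sat cap")


cfg = {"policy": {"auto_approve": {"per_txn_sats": 50000}}}
check_fixture_amounts(cfg, small_sats=49_000, large_sats=100_000)  # passes silently
```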

## Why pytest AND a CLI

Different operators want different ergonomics:

- **`hsm_validate.py`** (CLI) — human-readable coloured output, runs the
  tests in order, exits 0/1/2. Good for oncall dashboards, cron monitors,
  demoing to stakeholders.
- **`pytest tests/`** — integrates with existing CI, produces JUnit XML,
  lets you parametrise against multiple environments. Good for
  automated deploy gates.

Both paths share the same client, fixtures, and config loader — there's
no duplication.
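
One way to picture the shared-code arrangement is a single ordered list of checks that both front-ends consume. This is a hypothetical sketch, not the harness's actual layout. `Check`, `run_checks`, and the PASS/FAIL output are illustrative. The sketch models only exit codes 0 and 1, because this document doesn't describe what the CLI's exit code 2 means.

```python
from typing import Callable, List, Tuple

# A check is a name plus a zero-argument callable that raises AssertionError on failure.
Check = Tuple[str, Callable[[], None]]


def run_checks(checks: List[Check]) -> int:
    """Run checks in list order, print one line each, return a process exit code."""
    failures = 0
    for name, check in checks:
        try:
            check()
        except AssertionError as exc:
            failures += 1
            print(f"FAIL  {name}: {exc}")
        else:
            print(f"PASS  {name}")
    return 0 if failures == 0 else 1
```

On the pytest side, the same list could drive `@pytest.mark.parametrize("name, check", CHECKS)` with a test body that simply calls `check()`. A check added once then appears in both the CLI output and the JUnit XML.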

## Why the tests are numbered (`test_01`, `test_02` …)

pytest doesn't guarantee execution order across files. The numbered
prefixes ensure the order reads top-to-bottom when presented (by
collection order and by `pytest -v` output), matching the narrative
of the CLI harness. This helps when screenshotting a run for an
incident report — the sequence looks sensible.
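
The prefixes only work with zero-padding, because names are compared as strings. Here is a quick illustration. It assumes pytest's default behaviour of collecting the files in a directory in name order; plugins that shuffle tests, such as pytest-randomly, override this. The file names are hypothetical.

```python
# Hypothetical file names; only the prefixes matter here.
unpadded = ["test_2_totp.py", "test_10_counters.py", "test_1_message.py"]
padded = ["test_02_totp.py", "test_10_counters.py", "test_01_message.py"]

print(sorted(unpadded))
# ['test_10_counters.py', 'test_1_message.py', 'test_2_totp.py']
print(sorted(padded))
# ['test_01_message.py', 'test_02_totp.py', 'test_10_counters.py']
```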

## Why we scrape the dashboard at all

The counters test is a **sanity check against client-side misreporting**.
If a future bug in the client mis-identifies a rejection as a
signature (or vice versa), the dashboard deltas reveal it: the
Coldcard doesn't lie about whether it signed, and the dashboard
reflects Coldcard state. If the harness reports "4 signs, 1 reject"
for a run but the dashboard moved by "0 signs, 0 rejects", something
between the client and the Coldcard is wrong (in the client's parsing
or at the network layer).

The scraper is tolerant: CKBunker versions vary in HTML shape, so if
the regex can't find the numbers the test skips rather than fails.
The real signing assertions already prove end-to-end correctness.
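
A tolerant scraper combined with the delta comparison might look like the sketch below. It rests on one loud assumption: the dashboard labels its counters "Signed" and "Rejected", with only tags, whitespace, or a colon between each label and its number. Real markup varies by CKBunker version. That is exactly why `scrape_counters`, a hypothetical helper, returns `None` instead of raising.

```python
import re
from typing import Dict, Mapping, Optional

# Hypothetical labels: assumes the dashboard prints "Signed" / "Rejected" next to
# each counter, possibly with tags, whitespace, or a colon in between.
_GAP = r"(?:\s|<[^>]*>|:)*"
_COUNTERS = {
    "signs": re.compile(r"Signed" + _GAP + r"(\d+)", re.IGNORECASE),
    "rejects": re.compile(r"Rejected" + _GAP + r"(\d+)", re.IGNORECASE),
}


def scrape_counters(html: str) -> Optional[Dict[str, int]]:
    """Return the counters, or None if this page's shape isn't recognised."""
    found = {}
    for key, pattern in _COUNTERS.items():
        match = pattern.search(html)
        if match is None:
            return None  # tolerant: the caller skips instead of failing
        found[key] = int(match.group(1))
    return found


def deltas_match(before: Mapping[str, int], after: Mapping[str, int],
                 signs: int, rejects: int) -> bool:
    """True if the dashboard moved by exactly what the harness observed."""
    return (after["signs"] - before["signs"] == signs
            and after["rejects"] - before["rejects"] == rejects)
```

In a pytest test, a `None` result would map to `pytest.skip(...)`. Two scrapes bracketing the run then feed `deltas_match`, together with the harness's own tally.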

## Why rejections aren't exceptions

A rejection is a successful policy evaluation — the **Coldcard did
exactly what it was configured to do**. Treating rejections as Python
exceptions would:

- force every call site into try/except
- conflate policy behaviour with transport errors (network, timeout)
- hide the rejection reason behind an exception type

Instead, `SignResult.status` is an enum with four values (`SIGNED`,
`REJECTED`, `TIMEOUT`, `WS_ERROR`) and the caller asserts the status it
expects. `is_expected_rejection("rule #1")` keeps the specific-reason
check terse.
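
The sketch below shows one shape consistent with that description, with hypothetical internals. The four status names and `is_expected_rejection` come from this document. The `SignStatus` enum name, the `reason` and `signed_psbt` fields, and the case-insensitive substring match are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SignStatus(Enum):
    SIGNED = "signed"
    REJECTED = "rejected"   # the policy said no: a normal, reportable outcome
    TIMEOUT = "timeout"     # transport problem, not a policy decision
    WS_ERROR = "ws_error"   # transport problem, not a policy decision


@dataclass(frozen=True)
class SignResult:
    status: SignStatus
    reason: str = ""                    # rejection text, if any
    signed_psbt: Optional[bytes] = None

    def is_expected_rejection(self, needle: str) -> bool:
        """True only for a REJECTED result whose reason mentions `needle`."""
        return (self.status is SignStatus.REJECTED
                and needle.lower() in self.reason.lower())
```

Call sites then read as plain assertions, for example `assert result.status is SignStatus.SIGNED` or `assert result.is_expected_rejection("rule #1")`, with no try/except wrapped around policy outcomes.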

## Why "don't broadcast" is the default

`submit_psbt` accepts a `broadcast` flag; passing `broadcast=True` asks
CKBunker to push the signed tx. The harness always passes
`broadcast=False`, because a validation run should never touch the
mempool. Operators who want to drive real signings should call the
client directly, not go through the harness.
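
A keyword-only flag that defaults to `False`, with the harness pinning it explicitly, makes that default hard to override by accident. The signature below is a hypothetical sketch of the idea, not the real `submit_psbt` signature. `harness_submit` and `RecordingClient` are illustrative names.

```python
from typing import Any, List, Protocol


class SubmitsPsbt(Protocol):
    # Hypothetical signature: keyword-only, so a stray positional True can't
    # turn broadcasting on by accident.
    def submit_psbt(self, psbt: bytes, *, broadcast: bool = False) -> Any: ...


def harness_submit(client: SubmitsPsbt, psbt: bytes) -> Any:
    """The harness's only path to signing: broadcast is pinned to False."""
    return client.submit_psbt(psbt, broadcast=False)


class RecordingClient:
    """Test double that records every broadcast flag it receives."""

    def __init__(self) -> None:
        self.flags: List[bool] = []

    def submit_psbt(self, psbt: bytes, *, broadcast: bool = False) -> str:
        self.flags.append(broadcast)
        return "signed"
```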

## Why there's no CI/CD templating

Every shop's CI is different (GitHub Actions, Drone, Gitea Actions,
Jenkins, Woodpecker). Providing a single-vendor pipeline template
would add maintenance burden without saving meaningful integration
time. The `hsm_validate.py` CLI exits 0 on success and non-zero on
failure, which is all any CI needs. Integration examples live in the
README.
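
Some shops prefer a Python deploy step to a raw shell command. There, the gate reduces to a return-code check. This is a minimal sketch: `gate` is a hypothetical helper, and it does not cover any arguments `hsm_validate.py` might need in your setup.

```python
import subprocess
import sys
from typing import Sequence


def gate(cmd: Sequence[str]) -> bool:
    """Run a validation command; only exit code 0 lets the deploy continue."""
    return subprocess.run(list(cmd)).returncode == 0


# Stand-in commands showing the rule: 0 passes, anything else blocks.
print(gate([sys.executable, "-c", "pass"]))                 # True
print(gate([sys.executable, "-c", "raise SystemExit(2)"]))  # False
```

For example, `gate([sys.executable, "hsm_validate.py"])` blocks on any non-zero exit, including a failure to launch the script at all.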