# Why this harness exists, and why it's written the way it is

## Why a harness at all

The Coldcard HSM's whole value proposition is that the **policy on the device is what enforces safety**: not the VM, not the network, not the operator. That's a great story, until someone mis-installs a policy file and nobody notices because the "happy path" (small, auto-approved txs) still works.

Failure modes this harness is designed to catch:

1. **Policy rule collapse**: the auto-approve rule (Rule #2) is loaded but the user-auth rule (Rule #1) is missing or weakened, so large transactions sign without 2FA. The **`rule1_without_totp_rejects` test** is the single most important assertion: it attempts to sign an above-threshold transaction without TOTP and requires a specific rejection reason.
2. **TOTP secret drift**: authenticator app rotated, backup unclear, or a policy rewrite issued a new secret without updating the operator's phone. The **`rule1_with_totp_signs` test** catches this before you need to send a real transaction.
3. **Coldcard USB detach**: Proxmox USB passthrough occasionally detaches after host reboots. CKBunker starts, the UI renders, but the Coldcard isn't actually attached. The **`message_signing` test** catches this cheaply (no UTXO needed).
4. **Cloudflare Access regression**: an accident in the Zero Trust dashboard exposes the bunker to the internet. The harness doesn't directly test CF Access policy, but running it via the Tailscale IP while periodically curl-ing the public hostname catches the "SSO gate missing" case.
5. **Silent server rejection**: CKBunker returns an HTTP 200 with a rejection modal, not an HTTP error code. Automated clients that only check HTTP status can "succeed" against a server that refused to sign. The harness parses the modal and treats rejections as failures when a signature was expected.

## Why WebSocket, not HTTP

CKBunker's web UI and its signing protocol live on the same WebSocket endpoint.
The HTTP endpoints render HTML only. If you only speak HTTP you can **watch** the counters but can't **cause** a sign. The harness needs to cause signs, so WebSocket.

An unfortunate side-effect: Cloudflare Access with service tokens doesn't pass the WebSocket upgrade cleanly. This is why the harness assumes a private ingress (Tailscale) is available even for CF-fronted deployments.

## Why a custom client and not upstream

Upstream CKBunker ships a `ckbunker` console script, but in `v0.9.1` it has a broken import path (tries to `import main` from outside the package). There is no packaged Python client.

The 500-line client in `ckbunker_hsm_sign/client.py` is hand-rolled against the observed WebSocket protocol: small enough to audit, big enough to be useful, and stable because CKBunker's own Vue front-end doesn't change often.

The cost: if upstream changes frame shapes, this harness will need an update. The protocol doc (`PROTOCOL.md`) captures the current shapes so future changes are easy to diff.

## Why the harness doesn't generate PSBTs

**Generating spendable PSBTs requires the Coldcard's xpub, a UTXO, and a recipient.** That's significant state that differs per deployment. The harness stays deployment-agnostic by accepting **pre-crafted PSBT fixtures** (see [`fixtures/README.md`](../fixtures/README.md)).

This also means you don't risk spending real sats on a validation run. The same `large.psbt` can be re-used indefinitely for the reject-path test because the Coldcard rejects on **amount**, not UTXO availability.

## Why config over code

Every deployment has its own policy shape. Rather than hard-code "10,000 sats" as the auto-approve cap, the harness reads thresholds from `config.yaml` and asserts them against outcomes. If your Rule #2 per-txn cap is 50,000 sats, you:

1. Edit `config.yaml`: set `policy.auto_approve.per_txn_sats: 50000`.
2. Craft `small.psbt` at 49,000 sats and `large.psbt` at 100,000 sats.
3. Run the harness.

No code changes.
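
The three steps above can be illustrated with a minimal sketch. It is not the harness's actual loader: `AutoApprovePolicy`, `load_policy`, and `expected_path` are hypothetical names, the `config` dict stands in for the parsed contents of `config.yaml`, and treating the cap as inclusive is an assumption to check against the real policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoApprovePolicy:
    """Hypothetical container for the Rule #2 threshold read from config."""
    per_txn_sats: int


def load_policy(config: dict) -> AutoApprovePolicy:
    # `config` stands in for the parsed contents of config.yaml.
    cap = config["policy"]["auto_approve"]["per_txn_sats"]
    return AutoApprovePolicy(per_txn_sats=int(cap))


def expected_path(policy: AutoApprovePolicy, amount_sats: int) -> str:
    # Assumption: the cap is inclusive; verify against the device policy.
    return "auto_approve" if amount_sats <= policy.per_txn_sats else "user_auth"


config = {"policy": {"auto_approve": {"per_txn_sats": 50000}}}
policy = load_policy(config)

# Fixture amounts from the example above sit on either side of the cap.
assert expected_path(policy, 49_000) == "auto_approve"  # small.psbt
assert expected_path(policy, 100_000) == "user_auth"    # large.psbt
```
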
The **outcomes** the harness asserts are framed as "this PSBT should/shouldn't sign in this path", not "this specific sat amount should sign".

## Why pytest AND a CLI

Different operators want different ergonomics:

- **`hsm_validate.py`** (CLI): human-readable coloured output, runs the tests in order, exits 0/1/2. Good for on-call dashboards, cron monitors, demoing to stakeholders.
- **`pytest tests/`**: integrates with existing CI, produces JUnit XML, lets you parametrise against multiple environments. Good for automated deploy gates.

Both paths share the same client, fixtures, and config loader, so there's no duplication.

## Why the tests are numbered (`test_01`, `test_02` …)

pytest doesn't guarantee execution order across files. The numbered prefixes ensure the order reads top-to-bottom when presented (by collection order and by `pytest -v` output), matching the narrative of the CLI harness. This helps when screenshotting a run for an incident report: the sequence looks sensible.

## Why we scrape the dashboard at all

The counters test is a **sanity check against client-side deception**. If a future bug in the client mis-identifies a rejection as a signature (or vice versa), the dashboard deltas reveal it: the Coldcard doesn't lie about whether it signed, and the dashboard reflects Coldcard state. If the harness says "4 signs, 1 reject" but the dashboard shows "0 signs, 0 rejects", something is wrong at the network layer.

The scraper is tolerant: CKBunker versions vary in HTML shape, so if the regex can't find the numbers the test skips rather than fails. The real signing assertions already prove end-to-end correctness.

## Why rejections aren't exceptions

A rejection is a successful policy evaluation: the **Coldcard did exactly what it was configured to do**.
Treating rejections as Python exceptions would:

- force every call site into try/except
- conflate policy behaviour with transport errors (network, timeout)
- hide the rejection reason behind an exception type

Instead, `SignResult.status` is an enum with four values (`SIGNED`, `REJECTED`, `TIMEOUT`, `WS_ERROR`) and the caller asserts the status it expects. `is_expected_rejection("rule #1")` keeps the specific-reason check terse.

## Why "don't broadcast" is the default

`submit_psbt` accepts a `broadcast=True` flag that asks CKBunker to push the signed tx. The harness always sends `broadcast=false`. A validation run should never touch the mempool. Operators who want to drive real signings via this client should use it directly, not via the harness.

## Why there's no CI/CD templating

Every shop's CI is different (GitHub Actions, Drone, Gitea Actions, Jenkins, Woodpecker). Providing a single-vendor pipeline template would add maintenance burden without saving meaningful integration time. The `hsm_validate.py` CLI returns exit code 0 on success, 1 on failure, which is all any CI needs. Integration examples live in the README.
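
To make the exit-code contract concrete, here is a minimal vendor-neutral sketch of a deploy gate. `run_gate` is a hypothetical helper, not part of the harness; it relies only on the convention stated above that exit code 0 means success and anything else blocks the deploy.

```python
import subprocess
import sys


def run_gate(cmd: list[str]) -> bool:
    """Run a validation command and report whether the deploy may proceed."""
    # check=False: a non-zero exit is an expected outcome, not an exception.
    completed = subprocess.run(cmd, check=False)
    # Only exit code 0 counts as success; any other code blocks the deploy.
    return completed.returncode == 0
```

A pipeline step could then call something like `sys.exit(0 if run_gate([sys.executable, "hsm_validate.py"]) else 1)`, where the script path and the absence of arguments are assumptions to adjust for the real invocation.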