# Demo: validating a real production CKBunker deployment

This walkthrough shows the harness run against a live, rack-mounted CKBunker + Coldcard Mk4 in HSM mode. Every screenshot is from a real validation run on production hardware, paired with the exact test in this repo that asserts the outcome you see.

Use it as reference for what "good" looks like when you run the harness against your own deployment. Environment details (IPs, domain names, device serials) have been generalised; your values will differ.

---

## The deployment being validated

```
┌──────────────────────────────────────┐
│  Client (laptop / CI runner)         │
│  python hsm_validate.py              │
└──────┬───────────────────────────────┘
       │ Tailscale WireGuard overlay
       ▼
┌──────────────────────────────────────┐
│  CKBunker VM                         │
│  Ubuntu 24.04, Python 3.12           │
│  ckbunker.service (systemd)          │
│  hsm. (CF Tunnel)                    │
│  http://:9823                        │
└──────┬───────────────────────────────┘
       │ USB HID passthrough
       ▼
┌──────────────────────────────────────┐
│  Coldcard Mk4 in HSM mode            │
│  " HSM approval" policy              │
│  Rule #1 / #2 / TOTP enforcement     │
└──────────────────────────────────────┘
```

**Policy installed on the Coldcard** (abbreviated; yours may differ in thresholds):

```
Rule #1: ≤ 0.001 BTC/txn,  ≤ 0.005 BTC/period   (needs TOTP from user "operator")
Rule #2: ≤ 0.0001 BTC/txn, ≤ 0.0005 BTC/period  (auto-approved)
Velocity period: 1440 min (24 h)
Message signing: any path allowed
MicroSD logging: on
Boot to HSM: on (6-digit escape code)
```

---

## Physical setup

The Coldcard Mk4 is rack-mounted and USB-attached to the Proxmox host that runs the CKBunker VM. It stays in HSM mode continuously; the keypad is the only channel for policy changes.

[Image: Production rack, CKBunker HSM installation] Production rack view. The Coldcard Mk4 is installed in the lower shelf of the stack, USB-tethered to the host running the CKBunker VM. USB passthrough on the hypervisor is configured by vendor/product ID (`d13e:cc10`) so the device survives VM restarts.

[Image: Coldcard Mk4 installed, front panel] *Coldcard Mk4 front panel in HSM mode. The keypad is the only path to change policy. Nothing the harness does, and nothing any remote attacker can do, affects what's shown here.*

[Image: Coldcard rear, USB tether and ports] *Rear view showing the USB tether. Once the policy is loaded and Boot-to-HSM is enabled, the only way back to the main menu is the 6-digit escape code entered within 60 seconds of power-on.*

---

## The policy: configured once, enforced forever

The Coldcard's policy is loaded on-device via keypad + MicroSD. The CKBunker web UI lets you *author* the policy file before it gets signed and shipped to the Coldcard, but **it cannot modify an already-installed policy over the wire**.

[Image: Bunker Setup, Other Policy] Bunker Setup → Other Policy. The 6-digit "Boot To HSM" escape code is the only secret that can take the Coldcard out of HSM mode once the policy is live. The free-form approval note shows on the Coldcard screen when signing, providing a human-readable identifier for the active policy.

The harness reads your expected thresholds from `config.yaml` and asserts every outcome against them. Your policy shape can differ from the example; adjust `policy.auto_approve.per_txn_sats` etc. to match what you actually installed.

---

## Test 1: `connectivity`

The cheapest check: HTTP reachable, WebSocket URL extractable from the page, session cookie obtained.

```bash
./hsm_validate.py --tests connectivity
```

No UI screenshot here: this happens before any user-visible action. On success you'll see:

```
✓ connectivity    HTTP + WS endpoint reachable (0.3s)
    WebSocket URL: ws://:9823/websocket/CBG5KH5BCCG6W3BXDH5QQY5Q
    Session cookies: yes
```

If **Session cookies: none** appears, you're most likely hitting a CF-Access-protected URL without a service token; auth will fail on the WebSocket upgrade. Switch `CKBUNKER_URL` to your private ingress.

---

## Test 2: `message_signing`

CKBunker can sign an arbitrary text message with a key derived from the Coldcard seed. The server never sees the key; it forwards the message to the Coldcard and returns the signature.

[Image: CK Bunker, Text Message Signing] Tools → Text Message Signing on the CKBunker UI. Derivation path `m/84'/0'/0'/1`, segwit (bech32) address. The "Sign Message" button triggers the same WebSocket action the harness invokes programmatically.

The harness verifies the returned signature by sending it back through a wallet (Sparrow in this example) to confirm it validates against the expected address:

[Image: Sparrow, Verification Succeeded] Verification succeeded in Sparrow for a CKBunker-produced signature. If this fails, either the Coldcard isn't the device you think it is, or the derivation path in the harness config doesn't match the wallet you're verifying against.

Why this test is cheap and valuable: it doesn't need a UTXO, doesn't affect spending counters, and catches about 80% of "the Coldcard is detached" or "wrong Coldcard" problems in one second.

---

## Test 3: `rule2_auto_approve`

Sub-threshold PSBT (under your Rule #2 per-txn cap) signs with **no TOTP**.

### Via the web UI

[Image: CK Bunker, small tx signing page] Signing page for a 9,000-sat PSBT (under a 10,000-sat Rule #2 cap). The Transaction Preview expands the PSBT; "Authorizing User" / "One-Time Code" fields are left empty because Rule #2 does not require them. The policy summary at the bottom is always visible so operators can verify against what's displayed.

[Image: CK Bunker, Transaction signed] *Coldcard approved and signed without any human interaction. Approvals counter ticks up; Amount Spent accumulates against the 24-hour velocity budget. The signature came back under a second later.*

### Via the harness

```bash
./hsm_validate.py --tests rule2_auto_approve
```

The harness uses the identical WebSocket protocol the browser uses:

[Image: CLI, cksign small transaction] Terminal output from the harness client signing a sub-threshold PSBT. It opens a WebSocket, uploads the PSBT, waits for the Coldcard to evaluate policy, and writes the signed PSBT. No TOTP prompt because Rule #2 does not require one.

To check the output is actually valid, load it in a wallet:

[Image: Sparrow, signed small PSBT ready to broadcast] The resulting signed PSBT loaded into Sparrow: "Pay 9,000 sats", the Coldcard signature row is fully filled, Broadcast button is live. End-to-end: harness → CKBunker → Coldcard → signed PSBT → wallet → (would be) broadcast. The harness itself never broadcasts; that's the operator's choice.

Don't broadcast these test PSBTs during a validation run. Re-use the same `small.psbt` fixture across runs while the UTXO is still unspent in your watch-only wallet.

---

## Test 4: `rule1_without_totp_rejects` (**the critical assertion**)

The single most important test in this harness. A PSBT over your Rule #2 cap is submitted **without** a TOTP code. The Coldcard must reject it.

[Image: Sparrow, unsigned 100,000 sat PSBT] An unsigned 100,000-sat test PSBT (0.001 BTC), above the 10,000-sat auto-approve cap but within the 100,000-sat user-auth cap. A correctly-configured policy should **refuse** to sign this without TOTP.
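The expectation for this PSBT follows from the thresholds in the example policy. As a rough sketch, the expected outcome can be written as a pure function. This is a simplified model for reasoning about what the harness should see, not the Coldcard's actual policy engine: `Rule`, `expected_outcome`, and the single shared `spent_sats` running total are all hypothetical, and the caps are the example values from this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One spending rule from the example policy (hypothetical model)."""
    name: str
    per_txn_sats: int
    per_period_sats: int
    needs_totp: bool


# Example thresholds: 0.001 / 0.005 BTC (Rule #1), 0.0001 / 0.0005 BTC (Rule #2).
RULE_1 = Rule("rule #1", 100_000, 500_000, needs_totp=True)
RULE_2 = Rule("rule #2", 10_000, 50_000, needs_totp=False)


def expected_outcome(amount_sats: int, has_totp: bool, spent_sats: int = 0) -> Optional[str]:
    """Return the name of the first rule that would allow the spend, or None.

    Assumption: a rule allows a spend when the amount fits its per-txn cap,
    the running total fits its per-period cap, and TOTP is present if required.
    """
    for rule in (RULE_1, RULE_2):
        if amount_sats > rule.per_txn_sats:
            continue
        if spent_sats + amount_sats > rule.per_period_sats:
            continue
        if rule.needs_totp and not has_totp:
            continue
        return rule.name
    return None


print(expected_outcome(9_000, has_totp=False))    # rule #2
print(expected_outcome(100_000, has_totp=False))  # None (must be rejected)
print(expected_outcome(100_000, has_totp=True))   # rule #1
```

If your installed caps differ, substitute them before comparing against harness results.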

[Image: CK Bunker, Failed: rejected by Coldcard] The Coldcard responds: **"Rejected: rule #1: need user(s) confirmation, rule #2: would exceed period spending"**. The CKBunker VM had **no power to override this**; the rejection comes from the Coldcard's policy engine. The Refusals counter increments.
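The rejection string above appears to carry one clause per rule, separated by commas. A check for a specific rule clause could be sketched as follows. This is only an illustration: `parse_rejection` and `mentions_rule` are hypothetical helpers, and the message format is inferred from the single example shown here, so a reason text containing extra commas would need different handling.

```python
from typing import Dict


def parse_rejection(reason: str) -> Dict[str, str]:
    """Split 'Rejected: rule #1: why, rule #2: why' into {rule: why}."""
    body = reason.strip()
    if body.lower().startswith("rejected:"):
        body = body[len("rejected:"):]
    clauses: Dict[str, str] = {}
    for part in body.split(","):
        # Each clause is assumed to look like '<rule name>: <explanation>'.
        rule, sep, why = part.strip().partition(":")
        if sep:
            clauses[rule.strip().lower()] = why.strip()
    return clauses


def mentions_rule(reason: str, rule: str) -> bool:
    """True when the rejection names the given rule as one of its clauses."""
    return rule.lower() in parse_rejection(reason)


reason = "Rejected: rule #1: need user(s) confirmation, rule #2: would exceed period spending"
print(parse_rejection(reason))
print(mentions_rule(reason, "rule #1"))
```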

The harness asserts not just "some rejection happened" but **that the reason contains "rule #1"**:

```python
# tests/test_04_rule1_without_totp_rejects.py
assert res.is_expected_rejection("rule #1"), (
    f"expected a 'rule #1: need user(s) confirmation' rejection, "
    f"got status={res.status.value} reason={res.reason!r}"
)
```

### What failure looks like

If this test fails because the Coldcard signed an above-threshold PSBT without TOTP, your policy is broken. The harness explicitly flags this case:

```
✗ rule1_without_totp_rejects    policy NOT enforced: large PSBT was signed without TOTP — STOP AND INVESTIGATE
```

Action: exit HSM mode via the escape code and re-install the policy.

---

## Test 5: `rule1_with_totp_signs`

The same large PSBT. A fresh TOTP code. Should sign cleanly.

[Image: CLI, cksign with TOTP] Terminal output from the harness: the `--totp` flag auto-generates a 6-digit code from the stored `TOTP_SECRET` (shown here as `579322`, valid for 6 seconds). The client submits the code as a user authorisation, then uploads the PSBT. The Coldcard verifies TOTP against its seeded secret, sees Rule #1 is satisfied, and signs.

[Image: Sparrow, signed large PSBT ready to broadcast] The signed large PSBT in Sparrow: the same 100k-sat transaction that was rejected in test 4, now with a valid Coldcard signature. The only difference: a 6-digit code held exclusively by the authorised user. The key and policy never moved.
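For reference, a 6-digit code like the one in the terminal output can be derived from a base32 secret with the standard library alone. The sketch below assumes the usual RFC 6238 parameters (HMAC-SHA1, 30-second step, 6 digits); it is not the harness's own implementation, and `totp` and `seconds_left` are hypothetical names. The secret used is the published RFC 6238 test key, not a real one.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for a base32 secret."""
    now = time.time() if at is None else at
    counter = int(now // step)
    # Base32 secrets are often stored without padding; restore it before decoding.
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    key = base64.b32decode(padded)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, RFC 4226 section 5.3
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def seconds_left(at: Optional[float] = None, step: int = 30) -> int:
    """Seconds until the current code rolls over to the next one."""
    now = time.time() if at is None else at
    return step - int(now % step)


# RFC 6238 test key: ASCII "12345678901234567890" encoded as base32.
RFC_TEST_SECRET = "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"
print(totp(RFC_TEST_SECRET, at=59))  # 287082
print(seconds_left(at=59))           # 1
```

Because the code depends only on the secret and the current 30-second window, a clock that is off by more than a window on either side produces a code the device will not accept.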

If this test fails with a `bad TOTP code` reason:

- Your Mac / runner clock is out of sync. TOTP has a 30-second window; check `ntpdate -q time.apple.com` or equivalent.
- Your `TOTP_SECRET` env var is stale (TOTP was rotated on the Coldcard but the secret on disk wasn't updated).
- The user name in your config doesn't match the user named in the policy's Rule #1.

---

## Test 6: `counters_tracked`

A sanity check that the **server-visible counters** moved as expected. This catches the unlikely case where the harness thinks a sign happened but the CKBunker / Coldcard don't agree.

[Image: CK Bunker, dashboard counters after demo] Dashboard after the full harness run: **4 Approvals** (message sign, small PSBT via UI, small PSBT via CLI, large PSBT via CLI+TOTP), **1 Refusal** (the large PSBT attempted without TOTP), **0.00218 BTC** cumulative "amount spent" in the current velocity window. The refusal is the smoking-gun proof that the policy is active.
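The dashboard values above are the kind of numbers a counter check compares. A minimal sketch of a before/after delta comparison is shown here, with hypothetical names (`Counters`, `counters_match`) and made-up figures; it does not reproduce the harness's scraper or its actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Snapshot of the two dashboard counters (hypothetical shape)."""
    approvals: int
    refusals: int


def counters_match(before: Counters, after: Counters, signed: int, rejected: int) -> bool:
    """True when the counter deltas equal what the client observed itself."""
    return (
        after.approvals - before.approvals == signed
        and after.refusals - before.refusals == rejected
    )


# Illustrative numbers only: three signs and one rejection seen by the client.
before = Counters(approvals=10, refusals=2)
after = Counters(approvals=13, refusals=3)
print(counters_match(before, after, signed=3, rejected=1))  # True
print(counters_match(before, after, signed=4, rejected=1))  # False
```

Comparing deltas rather than absolute values keeps the check valid on a device that has already accumulated approvals from earlier activity.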

The harness snapshots the counters before and after running the signing tests, computes the deltas, and asserts they match the number of approvals/rejections it saw in its own results. If the numbers agree, everything from the WebSocket to the device-visible state is consistent.

If this test skips with "could not parse dashboard counters on this CKBunker version", the scraper regex didn't find the numbers in your CKBunker's HTML. The signing tests already proved correctness; file an issue with your CKBunker page source if you'd like regex support.

---

## The end-to-end picture

All six tests tell you, in one short run, whether the whole trust model is intact:

| Layer validated                               | Tests that cover it |
|-----------------------------------------------|---------------------|
| Network / Tailscale / Cloudflare reachability | 1                   |
| CKBunker service running, WS protocol intact  | 1, 2, 3, 4, 5       |
| Coldcard reachable, USB passthrough live      | 2                   |
| Coldcard policy loaded, Rule #2 path active   | 3                   |
| **Coldcard policy Rule #1 gate enforced**     | **4**               |
| TOTP secret in sync between device + holder   | 5                   |
| Server state tracks device decisions          | 6                   |

A green run is a strong signal that the HSM is doing its job. A red run on test 4 is the kind of finding you'd want to wake up for.

---

## Running this yourself

Every capability shown here maps to one test in [the test suite](../tests/). To reproduce on your own deployment:

1. Follow the setup in the top-level [README](../README.md).
2. Adjust `config.yaml` to match your policy's per-txn and per-period caps.
3. Craft two PSBTs per [fixtures/README.md](../fixtures/README.md): one below your Rule #2 cap, one above it.
4. Run `./hsm_validate.py`.

A passing run should match the flow in this document. A failing run should tell you exactly which layer of the HSM contract is broken.