docs: add DEMO.md — full walkthrough against a real production deployment
Adds a demonstration doc showing every harness test mapped to the UI state you should see on a correctly-configured CKBunker + Coldcard HSM. Each screenshot is paired with the specific test that asserts the outcome, plus guidance on what failure at that step means. Sensitive/site-specific identifiers (IPs, domain, device serial, CF tunnel UUID) are generalised so the doc reads as a template for any deployment. 15 screenshots in docs/images/ cover: physical rack installation, policy config UI, message signing end-to-end, sub-threshold auto-sign via web UI and CLI, the critical policy-rejection case, TOTP-authorised signing, and dashboard counter verification.
@@ -6,6 +6,8 @@ Runs a short, structured sequence of tests against a live CKBunker and exits non
 
 > **The critical test**: a transaction above your auto-approve cap is submitted without 2FA. The Coldcard must reject it with a specific `rule #1: need user(s) confirmation` error. If it signs, something is catastrophically wrong with your policy and the harness exits with a loud failure.
 
+📖 **See [`docs/DEMO.md`](docs/DEMO.md) for a full walkthrough** against a real rack-mounted production deployment, with screenshots of every test showing the expected UI state. Use it as the reference for "what good looks like" before you run the harness on your own CKBunker.
+
 ---
 
 ## Table of contents
@@ -438,9 +440,11 @@ parameter from it.
 │ └── README.md ← how to generate test PSBTs
 │
 └── docs/
+├── DEMO.md ← full demo against a real production deployment
 ├── PROTOCOL.md ← CKBunker WebSocket protocol reference
 ├── WHY.md ← design rationale
-└── POLICY_RECOMMENDATIONS.md ← how to design a two-tier policy
+├── POLICY_RECOMMENDATIONS.md ← how to design a two-tier policy
+└── images/ ← screenshots used in DEMO.md
 ```
 
 ---
docs/DEMO.md (new file, 308 lines)
@@ -0,0 +1,308 @@
# Demo: validating a real production CKBunker deployment

This walkthrough shows the harness run against a live, rack-mounted
CKBunker + Coldcard Mk4 in HSM mode. Every screenshot is from a real
validation run on production hardware, paired with the exact test in this
repo that asserts the outcome you see. Use it as a reference for what "good"
looks like when you run the harness against your own deployment.

Environment details (IPs, domain names, device serials) have been generalised;
your values will differ.

---

## The deployment being validated

```
┌──────────────────────────────────────┐
│  Client (laptop / CI runner)         │
│  python hsm_validate.py              │
└──────┬───────────────────────────────┘
       │ Tailscale WireGuard overlay
       ▼
┌──────────────────────────────────────┐
│  CKBunker VM                         │
│  Ubuntu 24.04, Python 3.12           │
│  ckbunker.service (systemd)          │
│  hsm.<your-domain> (CF Tunnel)       │
│  http://<tailnet-ip>:9823            │
└──────┬───────────────────────────────┘
       │ USB HID passthrough
       ▼
┌──────────────────────────────────────┐
│  Coldcard Mk4 in HSM mode            │
│  "<your org> HSM approval" policy    │
│  Rule #1 / #2 / TOTP enforcement     │
└──────────────────────────────────────┘
```

**Policy installed on the Coldcard** (abbreviated; yours may differ in
thresholds):

```
Rule #1: ≤ 0.001 BTC/txn, ≤ 0.005 BTC/period (needs TOTP from user "operator")
Rule #2: ≤ 0.0001 BTC/txn, ≤ 0.0005 BTC/period (auto-approved)
Velocity period: 1440 min (24 h)
Message signing: any path allowed
MicroSD logging: on
Boot to HSM: on (6-digit escape code)
```

---

## Physical setup

The Coldcard Mk4 is rack-mounted and USB-attached to the Proxmox host
that runs the CKBunker VM. It stays in HSM mode continuously; the keypad
is the only channel for policy changes.

<figure>
<img src="images/14-production-rack.jpg" alt="Production rack: CKBunker HSM installation">
<figcaption><em>Production rack view. The Coldcard Mk4 is installed in the lower shelf of the stack, USB-tethered to the host running the CKBunker VM. USB passthrough on the hypervisor is configured by vendor/product ID (<code>d13e:cc10</code>) so the device survives VM restarts.</em></figcaption>
</figure>

<figure>
<img src="images/13-coldcard-installed-closeup.jpg" alt="Coldcard Mk4 installed: front panel">
<figcaption><em>Coldcard Mk4 front panel in HSM mode. The keypad is the only path to change policy. Nothing the harness does, and nothing any remote attacker can do, affects what's shown here.</em></figcaption>
</figure>

<figure>
<img src="images/15-coldcard-ports.jpg" alt="Coldcard rear: USB tether and ports">
<figcaption><em>Rear view showing the USB tether. Once the policy is loaded and Boot-to-HSM is enabled, the only way back to the main menu is the 6-digit escape code entered within 60 seconds of power-on.</em></figcaption>
</figure>

---

## The policy: configured once, enforced forever

The Coldcard's policy is loaded on-device via keypad + MicroSD. The
CKBunker web UI lets you *author* the policy file before it gets signed
and shipped to the Coldcard, but **it cannot modify an already-installed
policy over the wire**.

<figure>
<img src="images/07-policy-bunker-setup.png" alt="Bunker Setup: Other Policy">
<figcaption><em>Bunker Setup → Other Policy. The 6-digit "Boot To HSM" escape code is the only secret that can take the Coldcard out of HSM mode once the policy is live. The free-form approval note shows on the Coldcard screen when signing, providing a human-readable identifier for the active policy.</em></figcaption>
</figure>

The harness reads your expected thresholds from `config.yaml` and asserts
every outcome against them. Your policy shape can differ from the example;
adjust `policy.auto_approve.per_txn_sats` etc. to match what you
actually installed.

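To make the relationship between thresholds and expected outcomes concrete, here is a minimal sketch that classifies a PSBT amount against the two-tier example policy. The `Policy` class, its field names, and `expected_outcome` are hypothetical and are not taken from the harness; only the 10,000-sat and 100,000-sat per-transaction caps come from the example policy in this document. Velocity (per-period) limits are deliberately ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical field names; values mirror the example policy in this doc.
    auto_approve_per_txn_sats: int = 10_000   # Rule #2 cap (0.0001 BTC)
    user_auth_per_txn_sats: int = 100_000     # Rule #1 cap (0.001 BTC)


def expected_outcome(amount_sats: int, totp_supplied: bool,
                     policy: Policy = Policy()) -> str:
    """Return the outcome a correctly enforced policy should produce.

    Per-period (velocity) limits are not modelled in this sketch.
    """
    if amount_sats <= policy.auto_approve_per_txn_sats:
        return "sign"  # Rule #2: auto-approved, no TOTP needed
    if amount_sats <= policy.user_auth_per_txn_sats:
        # Rule #1: only signs when the named user supplies a TOTP code
        return "sign" if totp_supplied else "reject"
    return "reject"  # above every rule's per-transaction cap
```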
---

## Test 1: `connectivity`

The cheapest check: HTTP reachable, WebSocket URL extractable from the
page, session cookie obtained.

```bash
./hsm_validate.py --tests connectivity
```

No UI screenshot: this happens before any user-visible action. On success
you'll see:

```
✓ connectivity HTTP + WS endpoint reachable (0.3s)
WebSocket URL: ws://<bunker>:9823/websocket/CBG5KH5BCCG6W3BXDH5QQY5Q
Session cookies: yes
```

If **Session cookies: none** appears, you're most likely hitting a
CF-Access-protected URL without a service token, so auth will fail on the
WebSocket upgrade. Switch `CKBUNKER_URL` to your private ingress.

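The "WebSocket URL extractable from the page" step could be sketched roughly as follows. This is an illustration under assumptions, not the harness's actual code: the `/websocket/<token>` shape is inferred from the sample output above, while the regex, the token character set, and the `extract_ws_url` name are hypothetical. Your CKBunker version may embed the URL differently.

```python
import re
from typing import Optional
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumed shape, based on the sample output: a quoted /websocket/<token>
# path, optionally preceded by an absolute ws:// or wss:// origin.
WS_PATH_RE = re.compile(
    r"""["'](?P<url>(?:wss?://[^"'\s]+)?/websocket/[A-Z0-9]+)["']"""
)


def extract_ws_url(base_url: str, html: str) -> Optional[str]:
    """Find the WebSocket URL in the page HTML, or return None."""
    match = WS_PATH_RE.search(html)
    if match is None:
        return None
    # Resolve a relative path against the page URL, then map http(s) to ws(s).
    url = urljoin(base_url, match.group("url"))
    parts = urlsplit(url)
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    return urlunsplit(
        (scheme, parts.netloc, parts.path, parts.query, parts.fragment)
    )
```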
---

## Test 2: `message_signing`

CKBunker can sign an arbitrary text message with a key derived from the
Coldcard seed. The server never sees the key; it forwards the message to
the Coldcard and returns the signature.

<figure>
<img src="images/02-message-signing-ui.png" alt="CK Bunker: Text Message Signing">
<figcaption><em>Tools → Text Message Signing on the CKBunker UI. Derivation path <code>m/84'/0'/0'/1</code>, segwit (bech32) address. The "Sign Message" button triggers the same WebSocket action the harness invokes programmatically.</em></figcaption>
</figure>

The harness verifies the returned signature by sending it back through a
wallet (Sparrow in this example) to confirm it validates against the
expected address:

<figure>
<img src="images/01-message-signed-verified.png" alt="Sparrow: Verification Succeeded">
<figcaption><em>Verification succeeded in Sparrow for a CKBunker-produced signature. If this fails, either the Coldcard isn't the device you think it is, or the derivation path in the harness config doesn't match the wallet you're verifying against.</em></figcaption>
</figure>

Why this test is cheap and valuable: it doesn't need a UTXO, doesn't
affect spending counters, and catches about 80% of "the Coldcard is
detached" or "wrong Coldcard" problems in one second.

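For readers curious what a wallet actually checks, the sketch below computes the digest that a conventional Bitcoin text-message signature commits to: a length-prefixed magic string, a varint message length, the message, then double SHA-256. It assumes the signature follows that widely used legacy message-signing format, and it does not perform the elliptic-curve verification itself, which needs a secp256k1 library. The function names are illustrative only.

```python
import hashlib

# 0x18 is the length (24) of the text that follows it.
MAGIC = b"\x18Bitcoin Signed Message:\n"


def varint(n: int) -> bytes:
    """Encode an integer as a Bitcoin variable-length integer."""
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def message_digest(message: str) -> bytes:
    """Double-SHA256 digest over magic + varint(len) + message bytes."""
    body = message.encode("utf-8")
    payload = MAGIC + varint(len(body)) + body
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()
```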
---

## Test 3: `rule2_auto_approve`

A sub-threshold PSBT (under your Rule #2 per-txn cap) should sign with **no TOTP**.

### Via the web UI

<figure>
<img src="images/05-tx-small-signing.png" alt="CK Bunker: small tx signing page">
<figcaption><em>Signing page for a 9,000-sat PSBT (under a 10,000-sat Rule #2 cap). The Transaction Preview expands the PSBT; "Authorizing User" / "One-Time Code" fields are left empty because Rule #2 does not require them. The policy summary at the bottom is always visible so operators can verify against what's displayed.</em></figcaption>
</figure>

<figure>
<img src="images/06-tx-small-success.png" alt="CK Bunker: Transaction signed">
<figcaption><em>Coldcard approved and signed without any human interaction. Approvals counter ticks up; Amount Spent accumulates against the 24-hour velocity budget. The signature came back under a second later.</em></figcaption>
</figure>

### Via the harness

```bash
./hsm_validate.py --tests rule2_auto_approve
```

The harness uses the identical WebSocket protocol the browser uses:

<figure>
<img src="images/08-cli-sign-small.png" alt="CLI: cksign small transaction">
<figcaption><em>Terminal output from the harness client signing a sub-threshold PSBT. It opens a WebSocket, uploads the PSBT, waits for the Coldcard to evaluate policy, and writes the signed PSBT. No TOTP prompt because Rule #2 does not require one.</em></figcaption>
</figure>

To check the output is actually valid, load it in a wallet:

<figure>
<img src="images/09-tx-small-broadcast-ready.png" alt="Sparrow: signed small PSBT ready to broadcast">
<figcaption><em>The resulting signed PSBT loaded into Sparrow: "Pay 9,000 sats", the Coldcard signature row is fully filled, Broadcast button is live. End-to-end: harness → CKBunker → Coldcard → signed PSBT → wallet → (would be) broadcast. The harness itself never broadcasts; that's the operator's choice.</em></figcaption>
</figure>

Don't broadcast these test PSBTs during a validation run. Re-use the same
`small.psbt` fixture across runs while the UTXO is still unspent in your
watch-only wallet.

---

## Test 4: `rule1_without_totp_rejects` (**the critical assertion**)

The single most important test in this harness. A PSBT over your Rule #2
cap is submitted **without** a TOTP code. The Coldcard must reject it.

<figure>
<img src="images/03-tx-large-unsigned.png" alt="Sparrow: unsigned 100,000 sat PSBT">
<figcaption><em>An unsigned 100,000-sat test PSBT (0.001 BTC), above the 10,000-sat auto-approve cap but within the 100,000-sat user-auth cap. A correctly-configured policy should <strong>refuse</strong> to sign this without TOTP.</em></figcaption>
</figure>

<figure>
<img src="images/04-tx-large-rejected.png" alt="CK Bunker: Failed, rejected by Coldcard">
<figcaption><em>The Coldcard responds: <strong>"Rejected: rule #1: need user(s) confirmation, rule #2: would exceed period spending"</strong>. The CKBunker VM had <strong>no power to override this</strong>: the rejection comes from the Coldcard's policy engine. The Refusals counter increments.</em></figcaption>
</figure>

The harness asserts not just "some rejection happened" but **that the
reason contains "rule #1"**:

```python
# tests/test_04_rule1_without_totp_rejects.py
assert res.is_expected_rejection("rule #1"), (
    f"expected a 'rule #1: need user(s) confirmation' rejection, "
    f"got status={res.status.value} reason={res.reason!r}"
)
```

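To show why matching on the reason matters, here is a hypothetical sketch of a result type that would support the assertion above. Only `is_expected_rejection`, `status.value`, and `reason` appear in the original snippet; the `Status` enum members and the `SignResult` class itself are assumptions made for illustration and may not match the harness.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical status values; the harness may name these differently.
    SIGNED = "signed"
    REJECTED = "rejected"
    ERROR = "error"


@dataclass
class SignResult:
    status: Status
    reason: Optional[str] = None

    def is_expected_rejection(self, needle: str) -> bool:
        """True only for a rejection whose reason mentions `needle`."""
        return (
            self.status is Status.REJECTED
            and self.reason is not None
            and needle.lower() in self.reason.lower()
        )
```

A status-only check would also pass when the Coldcard refuses for an unrelated reason, such as an exhausted velocity budget, which would hide the fact that the Rule #1 gate was never exercised.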
### What failure looks like

If the Coldcard signs an above-threshold PSBT without TOTP, this test
fails and your policy is broken. The harness explicitly flags this case:

```
✗ rule1_without_totp_rejects policy NOT enforced: large PSBT was signed
without TOTP — STOP AND INVESTIGATE
```

Action: exit HSM mode via the escape code and re-install the policy.

---

## Test 5: `rule1_with_totp_signs`

The same large PSBT, submitted with a fresh TOTP code, should sign cleanly.

<figure>
<img src="images/10-cli-sign-totp.png" alt="CLI: cksign with TOTP">
<figcaption><em>Terminal output from the harness: the <code>--totp</code> flag auto-generates a 6-digit code from the stored <code>TOTP_SECRET</code> (shown here as <code>579322</code>, valid for 6 seconds). The client submits the code as a user authorisation, then uploads the PSBT. The Coldcard verifies TOTP against its seeded secret, sees Rule #1 is satisfied, and signs.</em></figcaption>
</figure>

<figure>
<img src="images/11-tx-large-broadcast-ready.png" alt="Sparrow: signed large PSBT ready to broadcast">
<figcaption><em>The signed large PSBT in Sparrow: the same 100k-sat transaction that was rejected in test 4, now with a valid Coldcard signature. The only difference: a 6-digit code held exclusively by the authorised user. The key and policy never moved.</em></figcaption>
</figure>

If this test fails with a `bad TOTP code` reason, check the following:

- Your Mac / runner clock is out of sync. TOTP has a 30-second window;
  check `ntpdate -q time.apple.com` or equivalent.
- Your `TOTP_SECRET` env var is stale (TOTP was rotated on the Coldcard
  but the secret on disk wasn't updated).
- The user name in your config doesn't match the user named in the
  policy's Rule #1.

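When debugging a `bad TOTP code` failure it can help to compute the code independently and compare it with what the harness sends. The sketch below is a standard-library implementation of RFC 6238 TOTP, assuming the common defaults (HMAC-SHA1, 30-second step, 6 digits, base32-encoded secret). Those defaults and the function names are assumptions; this is not the harness's own `--totp` implementation. `seconds_remaining` shows how close the current code is to expiring, which is useful when clock skew is suspected.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 over the current time step."""
    cleaned = secret_b32.upper().replace(" ", "")
    cleaned += "=" * (-len(cleaned) % 8)  # restore stripped base32 padding
    key = base64.b32decode(cleaned)
    counter = int((time.time() if now is None else now) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def seconds_remaining(now: Optional[float] = None, step: int = 30) -> int:
    """Seconds until the current code rolls over."""
    t = time.time() if now is None else now
    return step - int(t) % step
```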
---

## Test 6: `counters_tracked`

A sanity check that the **server-visible counters** moved as expected.
This catches the unlikely case where the harness thinks a sign happened
but the CKBunker / Coldcard don't agree.

<figure>
<img src="images/12-dashboard-counters.png" alt="CK Bunker: dashboard counters after demo">
<figcaption><em>Dashboard after the full harness run: <strong>4 Approvals</strong> (message sign, small PSBT via UI, small PSBT via CLI, large PSBT via CLI+TOTP), <strong>1 Refusal</strong> (the large PSBT attempted without TOTP), <strong>0.00218 BTC</strong> cumulative "amount spent" in the current velocity window. The refusal is the smoking-gun proof that the policy is active.</em></figcaption>
</figure>

The harness snapshots the counters before and after running the signing
tests, computes the deltas, and asserts they match the number of
approvals/rejections it saw in its own results. If the numbers agree,
everything from the WebSocket to the device-visible state is consistent.

If this test skips with "could not parse dashboard counters on this
CKBunker version", the scraper regex didn't find the numbers in your
CKBunker's HTML. The signing tests already proved correctness; file an
issue with your CKBunker page source if you'd like regex support.

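The before/after comparison described above amounts to a small delta check. The following is a minimal sketch under assumptions: the `Counters` type, its field names, and `counters_consistent` are hypothetical, and the real harness obtains its snapshots by scraping the dashboard rather than constructing them by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    # Hypothetical snapshot of the dashboard's counters.
    approvals: int
    refusals: int


def counters_consistent(before: Counters, after: Counters,
                        observed_signs: int,
                        observed_rejections: int) -> bool:
    """True when the dashboard deltas match what the client observed."""
    return (
        after.approvals - before.approvals == observed_signs
        and after.refusals - before.refusals == observed_rejections
    )
```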
---

## The end-to-end picture

All six tests tell you, in one short run, whether the whole trust model
is intact:

| Layer validated                               | Tests that cover it |
|-----------------------------------------------|---------------------|
| Network / Tailscale / Cloudflare reachability | 1                   |
| CKBunker service running, WS protocol intact  | 1, 2, 3, 4, 5       |
| Coldcard reachable, USB passthrough live      | 2                   |
| Coldcard policy loaded, Rule #2 path active   | 3                   |
| **Coldcard policy Rule #1 gate enforced**     | **4**               |
| TOTP secret in sync between device + holder   | 5                   |
| Server state tracks device decisions          | 6                   |

A green run is a strong signal that the HSM is doing its job. A red run
on test 4 is the kind of finding you'd want to wake up for.

---

## Running this yourself

Every capability shown here maps to one test in [the test suite](../tests/).
To reproduce on your own deployment:

1. Follow the setup in the top-level [README](../README.md).
2. Adjust `config.yaml` to match your policy's per-txn and per-period caps.
3. Craft two PSBTs per [fixtures/README.md](../fixtures/README.md):
   one below your Rule #2 cap, one above it.
4. Run `./hsm_validate.py`.

A passing run should match the flow in this document. A failing run
should tell you exactly which layer of the HSM contract is broken.

docs/images/01-message-signed-verified.png (new binary file, 158 KiB)
docs/images/02-message-signing-ui.png (new binary file, 144 KiB)
docs/images/03-tx-large-unsigned.png (new binary file, 134 KiB)
docs/images/04-tx-large-rejected.png (new binary file, 130 KiB)
docs/images/05-tx-small-signing.png (new binary file, 115 KiB)
docs/images/06-tx-small-success.png (new binary file, 137 KiB)
docs/images/07-policy-bunker-setup.png (new binary file, 134 KiB)
docs/images/08-cli-sign-small.png (new binary file, 77 KiB)
docs/images/09-tx-small-broadcast-ready.png (new binary file, 146 KiB)
docs/images/10-cli-sign-totp.png (new binary file, 151 KiB)
docs/images/11-tx-large-broadcast-ready.png (new binary file, 192 KiB)
docs/images/12-dashboard-counters.png (new binary file, 199 KiB)
docs/images/13-coldcard-installed-closeup.jpg (new binary file, 158 KiB)
docs/images/14-production-rack.jpg (new binary file, 227 KiB)
docs/images/15-coldcard-ports.jpg (new binary file, 159 KiB)