WebSocket client + CLI harness + pytest suite that exercises each axis of a CKBunker + Coldcard Mk4 policy and asserts the expected outcomes, including the critical negative test that a large PSBT without TOTP is rejected with a specific 'rule #1: need user(s) confirmation' reason. Configuration via .env / YAML / CLI flags, two pre-crafted test PSBTs as fixtures (generation guide in fixtures/README.md), dashboard counter scraper as sanity check, design rationale in docs/.
# Policy design recommendations

Not exhaustive: consult the upstream
[Coldcard HSM docs](https://coldcardwallet.com/docs/ckbunker-hsm) for the
full grammar. This file captures the two-tier pattern the harness is
designed around and why it's a reasonable starting point for a signing
HSM that backs automation.

## The two-tier pattern

```
Rule #2 (auto-sign, no user auth)
    per-txn ≤ X sats
    period  ≤ N × X sats   (N ≈ 5, so a handful of small sends per window)

Rule #1 (user-auth via TOTP)
    per-txn ≤ Y sats       (Y ≫ X, but still a small fraction of custody)
    period  ≤ M × Y sats

(implicit) Rule #3: anything else is rejected; on-device keypad/MicroSD
    required to authorise.
```

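
The pattern above can be modelled in a few lines of Python. This is a rough sketch for reasoning about the policy, not the Coldcard's implementation: the `Tier` class, the `decide` function, the example caps, and the assumption that each rule tracks its own period budget are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One rule of the two-tier pattern (hypothetical model, not a Coldcard type)."""
    name: str
    per_txn_cap: int   # X or Y, in sats
    period_cap: int    # N*X or M*Y, in sats
    needs_totp: bool
    spent: int = 0     # sats approved so far in the current period (assumed per rule)


def decide(tiers, amount_sats, totp_ok):
    """Return the name of the first tier that approves the spend, else 'rejected'."""
    for tier in tiers:
        if tier.needs_totp and not totp_ok:
            continue  # user confirmation missing for this rule
        if amount_sats > tier.per_txn_cap:
            continue  # over the per-transaction cap
        if tier.spent + amount_sats > tier.period_cap:
            continue  # would exceed the period budget
        tier.spent += amount_sats
        return tier.name
    return "rejected"  # the implicit Rule #3


# Example values only: Y = 100,000, M = 3, X = 10,000, N = 5.
tiers = [
    Tier("rule #1", per_txn_cap=100_000, period_cap=300_000, needs_totp=True),
    Tier("rule #2", per_txn_cap=10_000, period_cap=50_000, needs_totp=False),
]
```

With these example values, `decide(tiers, 50_000, totp_ok=False)` returns `"rejected"`, which is the same shape as the harness's negative test: a large spend without TOTP must not be signed.
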
### Why two tiers

- **Single-tier "always require TOTP"** makes the HSM useless for
  automation: every BTCPay callback, every n8n webhook, every monitoring
  script wakes a human.
- **Single-tier "always auto-sign"** is indistinguishable from a hot
  wallet with extra steps.
- Two tiers let routine small sends go through untouched while keeping
  human-in-the-loop pressure on anything larger.

### Picking X (auto-approve cap)

Rule of thumb: the **most expensive single automated action** you're
comfortable with happening unattended. Examples:

| Automation                             | Sensible X (sats) |
|----------------------------------------|-------------------|
| Lightning channel rebalance            | 50,000 – 200,000  |
| BTCPay invoice settlement              | 10,000 – 50,000   |
| Routine small withdrawals (newsletter) | 5,000 – 20,000    |
| Dev test sends                         | 1,000 – 5,000     |

Pick **the smallest X** that covers your routine traffic. Anything
larger is a Rule #1 event, worth waking the TOTP holder for.

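
One way to turn "the smallest X that covers your routine traffic" into a number is to take the largest routine amount you have observed and round it up. The helper below is a minimal sketch under that assumption; `pick_x` and its `round_to` parameter are hypothetical names, and the sample amounts are made up.

```python
import math


def pick_x(routine_amounts_sats, round_to=1_000):
    """Smallest cap (rounded up to a multiple of round_to) covering every routine amount."""
    if not routine_amounts_sats:
        raise ValueError("need at least one observed routine amount")
    largest = max(routine_amounts_sats)
    return math.ceil(largest / round_to) * round_to


# Made-up history of automated sends, in sats.
history = [1_200, 4_800, 3_100]
x_cap = pick_x(history)
```

If one outlier dominates the history, it may be better to treat it as a Rule #1 event and leave it out of the list rather than raise X for everything.
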
### Picking N (period multiplier)

- Too low (N=1): one send at the per-txn cap empties the period budget,
  so the next send fails even though it's within the per-txn cap.
- Too high (N≥10): an attacker who steals the VM can drain the budget
  faster than a human will notice.
- Reasonable: N = 3 to 5. Combined with a 24 h velocity window, this
  caps the *catastrophic* loss from a VM compromise at N×X (at most
  ~5×X) per day.

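
The effect of N is easy to check with a toy simulation of one velocity window. This is a sketch only: `simulate_sends` is a hypothetical helper, the amounts are examples, and it assumes the period budget is a simple running total within the window.

```python
def simulate_sends(per_txn_cap, period_cap, amounts):
    """Return, for each send in one velocity window, whether it would be approved."""
    spent = 0
    results = []
    for amount in amounts:
        ok = amount <= per_txn_cap and spent + amount <= period_cap
        if ok:
            spent += amount  # only approved sends consume the budget
        results.append(ok)
    return results


X = 10_000  # example per-txn cap in sats

# N = 1: the second send fails although it is below the per-txn cap.
n1 = simulate_sends(X, 1 * X, [10_000, 5_000])

# N = 5: an attacker sending X repeatedly is stopped after five sends.
n5 = simulate_sends(X, 5 * X, [X] * 6)
worst_case_loss = X * sum(n5)  # sats lost in one window
```

Here `n1` is `[True, False]`, and `worst_case_loss` is 50,000 sats, i.e. N×X.
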
### Picking Y (user-auth cap)

A hard ceiling on what TOTP alone can authorise. For sends above Y,
the only path is keypad + MicroSD, i.e. physical presence at the device.

Common shapes:

- **Operational float** wallet: Y = 10×X. Big enough to cover a busy
  day; small enough that losing the TOTP secret isn't an existential
  problem.
- **Hot reserve**: Y = 0 (no Rule #1). Forces all non-routine sends
  through physical presence.

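
To compare the two shapes, it can help to write down the worst-case loss per velocity window for each compromise scenario. The function below is a back-of-the-envelope sketch: `worst_case_loss_per_period` is a hypothetical name, the numbers are examples, and it assumes the two rules have independent period budgets.

```python
def worst_case_loss_per_period(x, n, y, m):
    """Upper bound on sats lost in one velocity window, by compromise scenario."""
    auto_budget = n * x   # Rule #2: no user auth needed
    totp_budget = m * y   # Rule #1: needs the TOTP secret
    return {
        "vm_only": auto_budget,
        "vm_and_totp": auto_budget + totp_budget,
    }


# Operational float: Y = 10 * X.
float_wallet = worst_case_loss_per_period(x=10_000, n=5, y=100_000, m=3)

# Hot reserve: Y = 0, so a stolen TOTP secret adds nothing.
hot_reserve = worst_case_loss_per_period(x=10_000, n=5, y=0, m=3)
```

If the `vm_and_totp` figure for your chosen Y would be an existential loss, Y is too high for that wallet.
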
## Velocity period

The Coldcard resets counters after `velocity_minutes` of wall-clock time.
1440 (24 h) is the standard choice. Shorter windows (60–240 min) make
the HSM safer during active use but noisier during quiet periods
(routine sends hit the reset mid-day). Longer windows (> 24 h) make a
compromise more painful to recover from (a drained budget stays drained
for longer).

## Message signing

Useful for:

- proving control of an address to auditors / regulators
- proof-of-reserves (signed message with timestamp)
- sanity-checking Coldcard reachability (the harness's
  `message_signing` test)

Usually safe to enable on any path, since message signing doesn't spend
funds. If you need to restrict it, the policy supports a BIP32 path
regex.

## Boot-to-HSM

**Always enable** for production. Without it, anyone with physical
access to the device (and the PIN) can navigate out of HSM mode by
tapping the menu.

**Always set a 6-digit escape code.** A device that can never leave HSM
mode is terrifying and operationally wrong (you will need to enrol new
users, update policy, etc.). The escape code must be typed within 60
seconds of Coldcard boot, which is a reasonable safety margin.

**Record the escape code in a separate place from the seed backup.** A
password manager on the TOTP holder's phone is fine; the same piece
of paper as the seed words is not.

## Logging

- **MicroSD logging ON**: a tamper-evident, on-device audit trail that
  survives VM compromise. Cost: you must physically eject the MicroSD
  to review it.
- **Fail-if-cant-log OFF**: otherwise a MicroSD hiccup halts signing.
  Default is fine.

## Storage locker read count

CKBunker encrypts its local state with a key held in the Coldcard's
Storage Locker. The Locker has a **read counter**: typical policies
allow 13 reads before the Locker self-wipes. This means:

- CKBunker can restart up to 13 times before you need to re-install the
  policy.
- Heavy debugging (restarting CKBunker to try things) burns reads fast.
- After policy reinstall, the counter resets.

Monitor restart frequency. If you find yourself restarting CKBunker
often, investigate *why* rather than spending Locker reads.

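
Tracking the remaining reads can be as simple as a counter kept next to your deployment notes. The class below is a minimal sketch of that bookkeeping: `LockerReadBudget` and its warning threshold are hypothetical, it does not talk to the Coldcard or CKBunker, and the default of 13 is just the typical figure mentioned above.

```python
from dataclasses import dataclass


@dataclass
class LockerReadBudget:
    """Operator-side tally of Storage Locker reads since the last policy install."""
    allowed_reads: int = 13   # typical figure; use the value from your own policy
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.allowed_reads - self.used, 0)

    def record_restart(self) -> int:
        """Count one CKBunker restart (one Locker read) and return reads left."""
        self.used += 1
        return self.remaining

    def should_warn(self, threshold: int = 3) -> bool:
        """True once the remaining reads drop to the threshold or below."""
        return self.remaining <= threshold

    def reset_after_policy_reinstall(self) -> None:
        self.used = 0


budget = LockerReadBudget()
for _ in range(10):          # e.g. ten restarts during a debugging session
    budget.record_restart()
```

After the ten restarts in this example, `budget.remaining` is 3 and `budget.should_warn()` is true, which is a reasonable point to stop restarting and plan a policy reinstall.
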