mineracks-ckbunker-hsm-sign/docs/POLICY_RECOMMENDATIONS.md
mineracks 9d380f5013 Initial import: CKBunker HSM validation harness
WebSocket client + CLI harness + pytest suite that exercises each axis of
a CKBunker + Coldcard Mk4 policy and asserts the expected outcomes, including
the critical negative test that a large PSBT without TOTP is rejected with
a specific 'rule #1: need user(s) confirmation' reason.

Configuration via .env / YAML / CLI flags, two pre-crafted test PSBTs as
fixtures (generation guide in fixtures/README.md), dashboard counter
scraper as sanity check, design rationale in docs/.
2026-04-14 10:50:04 +10:00


# Policy design recommendations
Not exhaustive — consult the upstream
[Coldcard HSM docs](https://coldcardwallet.com/docs/ckbunker-hsm) for the
full grammar. This file captures the two-tier pattern the harness is
designed around and why it's a reasonable starting point for a signing
HSM that backs automation.
## The two-tier pattern
```
Rule #2 (auto-sign, no user auth)
    per-txn ≤ X sats
    period  ≤ N × X sats   (N ≈ 5, so a handful of small sends per window)

Rule #1 (user-auth via TOTP)
    per-txn ≤ Y sats       (Y ≫ X, but still a small fraction of custody)
    period  ≤ M × Y sats

(implicit) Rule #3: anything else is rejected; on-device keypad/MicroSD
                    required to authorise.
```
### Why two tiers
- **Single-tier "always require TOTP"** makes the HSM useless for
automation: every BTCPay callback, every n8n webhook, every monitoring
script wakes a human.
- **Single-tier "always auto-sign"** is indistinguishable from a hot
wallet with extra steps.
- Two tiers let routine small sends go through untouched while keeping
human-in-the-loop pressure on anything larger.
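
The tiering above can be sketched as a small classifier. This is an illustration only: `Outcome` and `classify` are hypothetical names, only the per-transaction caps are modelled (no period budgets), and it is not how the Coldcard itself evaluates rules.

```python
from enum import Enum


class Outcome(Enum):
    AUTO_SIGN = "rule #2: auto-sign, no user auth"
    NEEDS_TOTP = "rule #1: user auth via TOTP"
    REJECTED = "no rule matched: physical presence required"


def classify(amount_sats: int, x_cap: int, y_cap: int) -> Outcome:
    """Return the tier a send of `amount_sats` falls into.

    x_cap is the auto-approve cap (X), y_cap the user-auth cap (Y).
    Period budgets are deliberately ignored in this sketch.
    """
    if amount_sats <= 0:
        raise ValueError("amount must be positive")
    if amount_sats <= x_cap:
        return Outcome.AUTO_SIGN
    if amount_sats <= y_cap:
        return Outcome.NEEDS_TOTP
    return Outcome.REJECTED
```

With X = 10,000 and Y = 100,000, a 5,000 sat send is auto-signed, a 50,000 sat send needs TOTP, and a 500,000 sat send is rejected.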
### Picking X (auto-approve cap)
Rule of thumb: the **most expensive single automated action** you're
comfortable with happening unattended. Examples:

| Automation                             | Sensible X (sats) |
|----------------------------------------|-------------------|
| Lightning channel rebalance            | 50,000–200,000    |
| BTCPay invoice settlement              | 10,000–50,000     |
| Routine small withdrawals (newsletter) | 5,000–20,000      |
| Dev test sends                         | 1,000–5,000       |

Pick **the smallest X** that covers your routine traffic. Anything
larger is a Rule #1 event — worth waking the TOTP holder for.
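
One way to apply "the smallest X that covers your routine traffic" is to take the largest routine send you have observed and round it up. A minimal sketch, assuming you have a list of past automated send amounts; `smallest_cap` and its `round_to` parameter are hypothetical helpers, not part of the harness.

```python
def smallest_cap(routine_amounts_sats: list[int], round_to: int = 1_000) -> int:
    """Smallest X (rounded up to `round_to` sats) covering every routine send."""
    if not routine_amounts_sats:
        raise ValueError("need at least one observed amount")
    if round_to <= 0:
        raise ValueError("round_to must be positive")
    largest = max(routine_amounts_sats)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-largest // round_to) * round_to
```

For example, observed sends of 12,000, 18,500 and 9,000 sats give X = 19,000. Anything that would push X noticeably higher is a candidate for Rule #1 instead.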
### Picking N (period multiplier)
- Too low (N=1): first sign empties the period budget, second sign fails
even though it's within per-txn cap.
- Too high (N≥10): an attacker who steals the VM can drain the budget
faster than a human will notice.
- Reasonable: N = 3 to 5. Combined with a 24 h velocity window, this
caps the *catastrophic* loss from a VM compromise at ~5×X per day.
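
The effect of N can be checked with a small simulation of one velocity window. This is a sketch under simplifying assumptions (a single window, budget equal to N × X, hypothetical function name `simulate_window`); it does not reproduce the device's accounting.

```python
def simulate_window(sends_sats: list[int], x_cap: int, n: int) -> list[bool]:
    """Return, per send, whether it would be auto-signed within one window.

    A send passes only if it is within the per-txn cap X and the
    running total stays within the period budget N * X.
    """
    budget = n * x_cap
    spent = 0
    results = []
    for amount in sends_sats:
        ok = amount <= x_cap and spent + amount <= budget
        if ok:
            spent += amount
        results.append(ok)
    return results
```

With X = 50,000 and N = 1, two sends of 50,000 give `[True, False]`: the second fails despite being within the per-txn cap. With N = 5, five such sends pass and the sixth fails, so the total approved in the window never exceeds 5 × X.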
### Picking Y (user-auth cap)
A hard ceiling on what TOTP alone can authorise. For amounts above Y,
the only path is keypad + MicroSD, i.e. physical presence at the device.
Common shapes:
- **Operational float** wallet: Y = 10×X. Big enough to cover a busy
day; small enough that losing the TOTP secret isn't an existential
problem.
- **Hot reserve**: Y = 0 (no Rule #1). Forces all non-routine sends
through physical presence.
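
The two shapes above reduce to a one-line rule each. A minimal sketch; `user_auth_cap` and the shape labels are hypothetical names chosen for illustration.

```python
def user_auth_cap(x_cap_sats: int, wallet_shape: str) -> int:
    """Return Y for the two wallet shapes described above."""
    if x_cap_sats <= 0:
        raise ValueError("X must be positive")
    if wallet_shape == "operational_float":
        return 10 * x_cap_sats  # Y = 10 × X
    if wallet_shape == "hot_reserve":
        return 0  # no Rule #1: everything above X needs physical presence
    raise ValueError(f"unknown wallet shape: {wallet_shape!r}")
```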
## Velocity period
The Coldcard resets its spending counters after `velocity_minutes` of
wall-clock time. 1440 (24 h) is the standard choice. Shorter windows
(60–240 min) make the HSM safer during active use but noisier during
quiet periods (routine sends hit the reset mid-day). Longer windows
(> 24 h) make a compromise more painful to recover from (the budget an
attacker spent stays exhausted for longer).
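
The interaction between the period budget and the window length can be sketched as below. Assumptions to note: the class name `VelocityWindow` is hypothetical, time is passed in as plain minutes, and the window is taken to start at the first spend after a reset, which may not match the device's exact behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VelocityWindow:
    budget_sats: int
    velocity_minutes: int = 1440  # 24 h
    spent_sats: int = 0
    window_start: Optional[float] = None  # minutes; None until first spend

    def try_spend(self, amount_sats: int, now_minutes: float) -> bool:
        """Record the spend and return True if it fits the current window."""
        expired = (
            self.window_start is not None
            and now_minutes - self.window_start >= self.velocity_minutes
        )
        if self.window_start is None or expired:
            # Assumed behaviour: a fresh window starts and counters reset.
            self.window_start = now_minutes
            self.spent_sats = 0
        if self.spent_sats + amount_sats > self.budget_sats:
            return False
        self.spent_sats += amount_sats
        return True
```

With a 100,000 sat budget and a 60-minute window, a 60,000 sat spend at minute 0 passes, a second one at minute 30 fails, and a third at minute 60 passes because the window has rolled over.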
## Message signing
Useful for:
- proving control of an address to auditors / regulators
- proof-of-reserves (signed message with timestamp)
- sanity-checking Coldcard reachability (the harness's
`message_signing` test)

Usually safe to enable on any path, because message signing doesn't
spend funds. If you need to restrict it, the policy supports a BIP32
path regex.
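
To show what restricting message signing by derivation path looks like, here is a sketch using Python's `re` module. The pattern, the chosen path and `path_allowed` are illustrative assumptions; the exact pattern syntax the policy accepts should be taken from the upstream docs.

```python
import re

# Hypothetical restriction: only external-chain keys under m/84'/0'/0'.
ALLOWED_MSG_PATH = re.compile(r"m/84'/0'/0'/0/\d+")


def path_allowed(path: str) -> bool:
    """True if `path` matches the allowed pattern in full."""
    return ALLOWED_MSG_PATH.fullmatch(path) is not None
```

Here `m/84'/0'/0'/0/7` is accepted, while the change-chain path `m/84'/0'/0'/1/7` and anything under `m/44'` are refused.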
## Boot-to-HSM
**Always enable** for production. Without it, anyone with physical
access to the device (and the PIN) can navigate out of HSM mode by
tapping the menu.

**Always set a 6-digit escape code.** A device that can never leave HSM
mode is terrifying and operationally wrong (you will need to enrol new
users, update policy, etc.). The escape code must be typed within 60
seconds of Coldcard boot, which is a reasonable safety margin.

**Record the escape code in a separate place from the seed backup.** A
password manager on the TOTP holder's phone is fine; the same piece of
paper as the seed words is not.
## Logging
- **MicroSD logging ON**: an on-device audit trail that survives VM
compromise, so the record stays tamper-evident even if the VM itself is
tampered with. Cost: you must physically eject the MicroSD to review it.
- **Fail-if-cant-log OFF**: otherwise a MicroSD hiccup halts signing.
Default is fine.
## Storage locker read count
CKBunker encrypts its local state with a key held in the Coldcard's
Storage Locker. The Locker has a **read counter** — typical policies
allow 13 reads before the Locker self-wipes. This means:
- CKBunker can restart up to 13 times before you need to re-install the
policy.
- Heavy debugging (restarting CKBunker to try things) burns reads fast.
- After policy reinstall, the counter resets.

Monitor restart frequency. If you find yourself restarting CKBunker
often, investigate *why* rather than spending Locker reads.
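
A monitoring script can track the remaining read budget with simple arithmetic. This sketch assumes each CKBunker restart costs exactly one Locker read and that you count restarts yourself; both function names and the `reserve` threshold are hypothetical.

```python
def locker_reads_left(allowed_reads: int, restarts_since_install: int) -> int:
    """Reads remaining before the Locker would self-wipe (never negative)."""
    if allowed_reads < 0 or restarts_since_install < 0:
        raise ValueError("counts must be non-negative")
    return max(allowed_reads - restarts_since_install, 0)


def should_alert(allowed_reads: int, restarts_since_install: int, reserve: int = 3) -> bool:
    """True once the remaining reads drop to the reserve threshold or below."""
    return locker_reads_left(allowed_reads, restarts_since_install) <= reserve
```

Wiring `should_alert` into existing monitoring gives a warning while there is still time to plan a policy reinstall.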