mineracks-ckbunker-hsm-sign/docs/POLICY_RECOMMENDATIONS.md

# Policy design recommendations

Not exhaustive — consult the upstream
[Coldcard HSM docs](https://coldcardwallet.com/docs/ckbunker-hsm) for the
full grammar. This file captures the two-tier pattern the harness is
designed around and why it's a reasonable starting point for a signing
HSM that backs automation.

## The two-tier pattern

```
Rule #2 (auto-sign, no user auth)
  per-txn  ≤ X sats
  period   ≤ N × X sats    (N ≈ 5, so a handful of small sends per window)

Rule #1 (user-auth via TOTP)
  per-txn  ≤ Y sats        (Y ≫ X, but still a small fraction of custody)
  period   ≤ M × Y sats

(implicit) Rule #3: anything else is rejected — on-device keypad/MicroSD
required to authorise.
```
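
Read as a decision procedure, the pattern looks roughly like the sketch below. It is a hypothetical Python model, not the Coldcard policy grammar: `TwoTierPolicy` and `classify` are illustrative names. It assumes that each rule keeps its own period total, and that a send which does not fit the auto-sign rule falls through to the TOTP rule.

```python
from dataclasses import dataclass


@dataclass
class TwoTierPolicy:
    """Hypothetical model of the two-tier pattern; all amounts are in sats."""
    x: int  # per-txn cap for auto-sign (Rule #2)
    n: int  # period multiplier for auto-sign
    y: int  # per-txn cap for TOTP-authorised sends (Rule #1); 0 disables the tier
    m: int  # period multiplier for TOTP-authorised sends

    def classify(self, amount: int, auto_spent: int = 0, totp_spent: int = 0) -> str:
        """Return the tier that would handle a send of `amount` sats.

        `auto_spent` and `totp_spent` are the totals already signed under
        each rule in the current velocity period (assumed to be per-rule).
        """
        if amount <= self.x and auto_spent + amount <= self.n * self.x:
            return "auto-sign"
        if amount <= self.y and totp_spent + amount <= self.m * self.y:
            return "totp"
        return "reject"  # implicit Rule #3: physical presence required


policy = TwoTierPolicy(x=20_000, n=5, y=200_000, m=3)
print(policy.classify(15_000))                     # auto-sign
print(policy.classify(15_000, auto_spent=90_000))  # totp (auto budget exhausted)
print(policy.classify(150_000))                    # totp
print(policy.classify(500_000))                    # reject
```

Setting `y=0` in this model rejects every send above the auto-sign tier.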

### Why two tiers

- **Single-tier "always require TOTP"** makes the HSM useless for
  automation: every BTCPay callback, every n8n webhook, every monitoring
  script wakes a human.
- **Single-tier "always auto-sign"** is indistinguishable from a hot
  wallet with extra steps.
- Two tiers let routine small sends go through untouched while keeping
  human-in-the-loop pressure on anything larger.

### Picking X (auto-approve cap)

Rule of thumb: the **most expensive single automated action** you're
comfortable with happening unattended. Examples:

| Automation                             | Sensible X (sats) |
|----------------------------------------|-------------------|
| Lightning channel rebalance            | 50,000 – 200,000  |
| BTCPay invoice settlement              | 10,000 – 50,000   |
| Routine small withdrawals (newsletter) | 5,000 – 20,000    |
| Dev test sends                         | 1,000 – 5,000     |

Pick **the smallest X** that covers your routine traffic. Anything
larger is a Rule #1 event — worth waking the TOTP holder for.
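
One way to find "the smallest X that covers your routine traffic" is to take it from a sample of past automated sends. The sketch below assumes you have such a sample; `smallest_cap` and the `coverage` parameter are hypothetical names. Setting `coverage` below 1.0 deliberately pushes the largest outliers to Rule #1.

```python
import math


def smallest_cap(amounts_sats, coverage=1.0):
    """Smallest per-txn cap that admits at least `coverage` of the observed sends."""
    if not amounts_sats:
        raise ValueError("need at least one observed amount")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(amounts_sats)
    # Index of the largest send that must still fit under the cap.
    k = math.ceil(coverage * len(ordered)) - 1
    return ordered[k]


routine = [4_000, 6_500, 7_200, 9_000, 12_000, 48_000]
print(smallest_cap(routine))       # 48000: covers everything, including the outlier
print(smallest_cap(routine, 0.8))  # 12000: the 48,000 sat send becomes a Rule #1 event
```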

### Picking N (period multiplier)

- Too low (N=1): a single send at the cap X exhausts the period budget,
  so the next send fails even though it is within the per-txn cap.
- Too high (N≥10): an attacker who steals the VM can drain a large
  budget before a human notices.
- Reasonable: N = 3 to 5. Combined with a 24 h velocity window, this
  caps the *catastrophic* loss from a VM compromise at N×X per day
  (at most 5×X for this range).
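
The effect of N can be checked with a small simulation of one velocity period. This is an illustrative model only: `simulate_period` is a hypothetical helper, and it assumes the period budget is exactly N×X.

```python
def simulate_period(x, n, amounts):
    """Return (accepted, spent) for sends attempted within one velocity period.

    A send is accepted if it is within the per-txn cap `x` and does not
    push the running total above the period budget `n * x`.
    """
    budget = n * x
    spent = 0
    accepted = []
    for amount in amounts:
        ok = amount <= x and spent + amount <= budget
        if ok:
            spent += amount
        accepted.append(ok)
    return accepted, spent


X = 20_000
# N=1: one send at the cap uses the whole budget, so a later small send fails.
print(simulate_period(X, 1, [X, 5_000]))  # ([True, False], 20000)
# N=5: an attacker sending X repeatedly is stopped after five sends.
print(simulate_period(X, 5, [X] * 8))
```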

### Picking Y (user-auth cap)

A hard ceiling on what TOTP alone can authorise. For custody above Y,
the only path is keypad + MicroSD — physical presence at the device.

Common shapes:

- **Operational float** wallet: Y = 10×X. Big enough to cover a busy
  day; small enough that losing the TOTP secret isn't an existential
  problem.
- **Hot reserve**: Y = 0 (no Rule #1). Forces all non-routine sends
  through physical presence.

## Velocity period

The Coldcard resets counters after `velocity_minutes` of wall-clock time.
1440 (24 h) is the standard choice. Shorter windows (60–240 min) make
the HSM safer during active use, provided the period cap is scaled down
with the window, but noisier during quiet periods (routine sends hit the
reset mid-day). Longer windows (> 24 h) make a compromise more painful
to recover from (a drained budget stays exhausted for longer).
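
The interaction between window length and exposure is simple arithmetic. The sketch below gives a rough estimate of what an attacker could drain in a day. It assumes the budget resets fully at each window boundary and that windows are aligned to the start of the day; `worst_case_daily_loss` is a hypothetical helper.

```python
import math

MINUTES_PER_DAY = 1440


def worst_case_daily_loss(period_cap_sats, velocity_minutes):
    """Rough estimate of sats drained in 24 h if every window's budget is emptied."""
    if velocity_minutes <= 0:
        raise ValueError("velocity_minutes must be positive")
    # At least one window is always in play, even when the window exceeds a day.
    windows = max(1, math.ceil(MINUTES_PER_DAY / velocity_minutes))
    return windows * period_cap_sats


cap = 100_000  # e.g. N=5, X=20,000
for minutes in (60, 240, 1440, 2880):
    print(minutes, worst_case_daily_loss(cap, minutes))
```

With an unchanged cap, a 60-minute window allows 24 budgets per day in this model, so a shorter window only reduces exposure if the period cap shrinks with it.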

## Message signing

Useful for:

- proving control of an address to auditors / regulators
- proof-of-reserves (signed message with timestamp)
- sanity-checking Coldcard reachability (the harness's
  `message_signing` test)

Usually safe to enable on any path — message signing doesn't spend
funds. If you need to restrict it, the policy supports a BIP32 path
regex.
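
To illustrate the idea of restricting message signing by derivation path, the sketch below checks a path against an allow-list pattern using Python's `re` module. The pattern, the `ALLOWED` constant, and `may_sign_message` are all hypothetical, and Python regex syntax is used only for illustration; take the actual pattern syntax from the upstream Coldcard docs.

```python
import re

# Hypothetical allow-list: any address index on the external chain of the
# first BIP84 account. Not taken from any real policy.
ALLOWED = re.compile(r"m/84'/0'/0'/0/\d+")


def may_sign_message(path: str) -> bool:
    """True if the whole derivation path matches the allow-list pattern."""
    return ALLOWED.fullmatch(path) is not None


print(may_sign_message("m/84'/0'/0'/0/7"))  # True
print(may_sign_message("m/84'/0'/0'/1/7"))  # False: change chain
print(may_sign_message("m/44'/0'/0'/0/7"))  # False: different purpose
```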

## Boot-to-HSM

**Always enable** for production. Without it, anyone with physical
access to the device (and the PIN) can navigate out of HSM mode by
tapping the menu.

**Always set a 6-digit escape code.** A device that can never leave HSM
mode is terrifying and operationally wrong (you will need to enrol new
users, update policy, etc.). The escape code must be typed within 60
seconds of Coldcard boot, which is a reasonable safety margin.

**Record the escape code in a separate place from the seed backup.** A
password manager on the TOTP holder's phone is fine; not the same piece
of paper as the seed words.

## Logging

- **MicroSD logging ON**: an on-device, tamper-evident audit trail that
  survives VM compromise. Cost: you must physically eject the MicroSD to
  review it.
- **Fail-if-cant-log OFF**: otherwise a MicroSD hiccup halts signing.
  The default is fine.

## Storage locker read count

CKBunker encrypts its local state with a key held in the Coldcard's
Storage Locker. The Locker has a **read counter** — typical policies
allow 13 reads before the Locker self-wipes. This means:

- CKBunker can restart up to 13 times before you need to re-install the
  policy.
- Heavy debugging (restarting CKBunker to try things) burns reads fast.
- After policy reinstall, the counter resets.

Monitor restart frequency. If you find yourself restarting CKBunker
often, investigate *why* rather than spending Locker reads.
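
A minimal sketch of the restart monitoring suggested above. It assumes each CKBunker restart costs exactly one Locker read and uses the 13-read figure quoted in this section. `reads_remaining`, `should_alert`, and the `reserve` threshold are hypothetical, and counting restarts is left to whatever supervises the process.

```python
LOCKER_READ_LIMIT = 13  # figure quoted above; check the value in your own policy


def reads_remaining(restarts_since_install, limit=LOCKER_READ_LIMIT):
    """Locker reads left, assuming one read per CKBunker restart."""
    if restarts_since_install < 0:
        raise ValueError("restart count cannot be negative")
    return max(0, limit - restarts_since_install)


def should_alert(restarts_since_install, limit=LOCKER_READ_LIMIT, reserve=3):
    """True once the remaining reads drop to the reserve or below."""
    return reads_remaining(restarts_since_install, limit) <= reserve


print(reads_remaining(4))  # 9
print(should_alert(4))     # False
print(should_alert(10))    # True: only 3 reads left
```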