# Operator's Manual — mineracks distributed policy-enforced multisig HSM

**A 2-of-3 threshold HSM that auto-signs cold→hot refills under on-device policy, with no human in the loop.**

This manual is everything you need to set the system up and **operate it safely**. Read §1–§3 before you
touch anything — the safety model is not optional context, it is the reason the design is shaped the way it
is, and operating it without understanding it will get funds lost.

| | |
|---|---|
| **Status** | A **live reference deployment** runs at `multisighsm.mineracks.com` on real Bitcoin signet — every spend is a genuine on-chain 2-of-3 co-signature enforced by on-device policy. This manual documents both the **reference configuration** (**[REF]**) and the **production hardware procedure** (**[PROD]**). |
| **Audience** | The operator running the treasury / refill tier (you), and a reviewer evaluating it. |
| **Companion docs** | Design & market rationale: [`README.md`](./README.md) · live demo: `multisighsm.mineracks.com` |
| **Scope** | The **cold/warm tier and the cold→hot refill pipe** — low-throughput, high-stakes. **NOT** your high-TPS hot-wallet signer (that's MPC's job — see §1.2). |

---

## 1. Read this first — the security model

### 1.1 What the system is

Three independent **Coldcard** hardware signers, each in **HSM mode** under its own spending policy, each on a
**physically separate host** (ideally one genuinely offsite). A **keyless coordinator** watches your hot
wallet, builds a refill transaction when it runs low, fans the unsigned PSBT to **any two** of the three
signers, they **auto-sign under their on-device policy with no human**, and the coordinator combines,
finalizes and broadcasts. Any 2 of 3 can move funds; lose any one signer and you keep operating; lose two and
funds are **frozen safe**.

### 1.2 What it is NOT

It is **not a hot-wallet signing engine.** Coldcard signing is seconds-per-PSBT and is not built for the
hundreds–thousands of signatures/hour a busy exchange hot wallet does, and it carries no FIPS/PKCS#11
certification an auditor or insurer will expect for the primary hot engine. **Keep using an MPC platform for
the hot wallet.** This system secures the **95% of reserves behind it** and the **refill pipe** between cold
and hot — where automation with hardware keys you physically hold beats both a manual 3am ceremony and a
custodian.

### 1.3 The three enforcement layers (and which one you actually trust)

| Layer | Where it runs | Enforces | Trust property |
|---|---|---|---|
| **On-device policy** | Each Coldcard secure element | per-txn cap · velocity · **address whitelist** · message-signing paths | **Tamper-proof. This is the safety floor.** A compromised host cannot lift it. |
| **Coordinator global velocity cap** | The keyless coordinator (software) | the *authoritative* total-per-period across all signers | **Operational, not safety.** Precise day-to-day limit; bypassable if the coordinator is compromised — but bounded by the layer above. |
| **Quorum (2-of-3)** | The protocol | no single signer can move funds; no single signer outage freezes funds | Structural. |

> **The golden rule: the coordinator may *limit*, but only the hardware may *bound*.** Size the on-device
> limits as your real safety envelope; treat the coordinator cap as a tighter operational convenience.

### 1.4 What a compromise can and cannot do

**Coordinator fully compromised (worst realistic software breach):**
- It **holds no keys** → cannot sign or forge. Every spend still needs **two Coldcards** to each pass their
  own on-device whitelist + cap + velocity.
- It **cannot redirect funds off the whitelist** → at worst it prematurely refills *your own* hot-wallet
  addresses, never an attacker's address.
- It **cannot exceed the devices' velocity ceilings.** That ceiling is the true catastrophic bound.
- **Blast radius = 1.5 × the per-device velocity ceiling `V`** (with any-2-of-3): every spend burns two
  signatures for one unit of value, so 3 devices × `V` ÷ 2 sigs = `1.5V` extractable before the hardware
  freezes it. **Size accordingly — see §6.3.**
- *Residual:* a compromised coordinator **plus** a compromised hot wallet could drain up to `1.5V` into the
  (whitelisted) hot wallet and out. That's why `V` lives on the secure element, not in software.

**One or more signer hosts compromised:**
- Owning **one** signer host moves nothing (need two; the device still enforces its own policy and holds the
  key in its secure element — the host can't extract it).
- Owning **two** signer hosts: an attacker can produce two signatures, but each device **still enforces its
  policy** — so spends are still bounded by per-txn cap + velocity and confined to the whitelist. To steal,
  an attacker would need to defeat **two independent secure elements' policies**, not two Linux hosts.

**Coordinator offline:** **fail-safe for safety.** No coordinator → no PSBT is built → no spend → the cap
trivially holds. You lose the ability to *refill* (a liveness gap, §1.5), never control of funds.

### 1.5 No single point of failure — and the one you must engineer around

- **Keys:** 2-of-3 across independent failure domains. Survives losing any one signer.
- **Global velocity counter:** **derived from the blockchain**, not a single-host file (see §6.2). Any
  coordinator replica on any host recomputes the same number from chain history → no single ledger to lose or
  tamper.
- **Coordinator liveness:** a single coordinator is a *liveness* SPOF (if it's down you can't refill). Run it
  **replicated across the same independent hosts as the signers** so any replica can drive a refill. Because
  the counter is chain-derived and the cap is bounded by hardware, replicas need no shared state and a rogue
  replica can't exceed the hardware bound.

---

## 2. Architecture & topology

```
hot wallet (MPC / your existing engine) ── monitored ──┐
                   ▲                                   │
                   │ signed refill broadcast           │ "balance below floor"
                   │                                   ▼
┌──────────────────┴───────────────────────────────────────────────┐
│  KEYLESS COORDINATOR (replicated; holds NO keys)                 │
│  • watches hot-wallet floor  • builds PSBT from watch-only wallet│
│  • global velocity cap (chain-derived)  • fans to any 2 of 3     │
│  • combines → finalizes → broadcasts                             │
└───────┬───────────────────┬───────────────────┬──────────────────┘
        │ tailnet           │ tailnet           │ tailnet
┌───────▼──────┐    ┌───────▼──────┐   ┌────────▼─────┐
│ signer host A│    │ signer host B│   │ signer host C│  ← independent failure domains
│ Coldcard +   │    │ Coldcard +   │   │ Coldcard +   │    (power / switch / site)
│ signer-agent │    │ signer-agent │   │ signer-agent │    one ideally OFFSITE
│ HSM policy A │    │ HSM policy B │   │ HSM policy C │
└──────────────┘    └──────────────┘   └──────────────┘
  keys live ONLY on the secure elements; agents hold none
                            │
           watch-only 2-of-3 descriptor wallet
                 on a Bitcoin full node
```

**Components:**
- **Signer host (×3)** — a small machine (NUC/Pi/VM) with a USB-attached Coldcard, running a **signer agent**:
  a thin authenticated tailnet service wrapping `ckcc-protocol`. Receives `{psbt, wallet_id}`, ensures its
  device is HSM-started with the 2-of-3 registered, signs, returns `partial_psbt` or `denied(reason)`.
  **Holds no keys.** **[REF]** the reference deployment runs all three as segregated `--mk5 --hsm` Coldcard
  signers, each on its own unix socket.
- **Coordinator** — builds PSBTs, enforces the global cap, fans out 2-of-3, combines/broadcasts, exposes the
  control surface. **[REF]** `orchestrator.py` (`multisig-orchestrator`, `:8099`).
- **Watch-only wallet** — a `bitcoind` descriptor wallet tracking the 2-of-3 descriptor. **Reuse, don't
  build.** **[REF]** a watch-only wallet on a signet node.
- **Bitcoin node** — provides the watch-only wallet, builds PSBTs, broadcasts, and is the source of truth for
  the velocity counter. **[REF]** a signet node (RPC over the tailnet).

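The signer-agent contract listed above (`{psbt, wallet_id}` in, `partial_psbt` or `denied(reason)` out) can be sketched as below. This is a minimal illustration and not the reference `orchestrator.py`: the class names, `collect_partials`, and the one-callable-per-signer interface are hypothetical, and the authenticated tailnet transport is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SignRequest:
    psbt: str        # base64-encoded unsigned PSBT
    wallet_id: str   # name of the registered multisig wallet


@dataclass(frozen=True)
class SignResult:
    signer: str
    partial_psbt: Optional[str] = None   # set when the device signed
    denied_reason: Optional[str] = None  # set when the device refused


# Hypothetical transport: one callable per signer agent.
Signer = Callable[[SignRequest], SignResult]


def collect_partials(request: SignRequest,
                     signers: dict[str, Signer],
                     quorum: int = 2) -> list[SignResult]:
    """Ask signers in order until `quorum` partial signatures are collected."""
    partials: list[SignResult] = []
    for name, sign in signers.items():
        try:
            result = sign(request)
        except ConnectionError:
            continue  # signer offline: move on to the next one
        if result.partial_psbt is not None:
            partials.append(result)
        if len(partials) == quorum:
            return partials
    raise RuntimeError(f"quorum not reached: {len(partials)} of {quorum} signatures")
```

In a real coordinator the collected partials would then go to the node's `combinepsbt`, `finalizepsbt` and `sendrawtransaction` RPCs. Trying signers in order is a simplification; §6.3 explains why the signer pair should rotate.
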
---

## 3. Failure-domain placement (the make-or-break)

This is the single most important deployment decision, and the reasoning is concrete: nominally independent
hosts can still fail *together* — a shared power feed or UPS, a common chassis or drive batch, the same
hypervisor or network switch, or correlated hardware faults. **If two of three keys sit in the same failure
domain, a single event can take both down and freeze the treasury.** Therefore:

- The three signers **MUST** sit in **independent failure domains** — different physical hosts, ideally
  different power circuits / UPS / network switches.
- **At least one signer should be genuinely offsite** (a small physical box over Tailscale). A cloud VPS
  cannot host a USB Coldcard, but a Pi/NUC at a second location can be signer #3.
- Spread the coordinator replicas across those same domains.
- Do **not** co-locate two signers on hosts that share a single point of failure (same PSU, same switch, same
  rack PDU, same hypervisor). Quorum HA is worthless if one event takes out two keys.

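The placement rules above lend themselves to a mechanical check before you deploy. The sketch below is illustrative only: the inventory format, the attribute names (`host`, `psu`, `pdu`, `switch`, `hypervisor`, `site`) and the function `placement_problems` are assumptions, not part of the reference code.

```python
from itertools import combinations

# Attributes that two signers must never share (illustrative names).
SHARED_FORBIDDEN = ("host", "psu", "pdu", "switch", "hypervisor")


def placement_problems(signers: dict[str, dict[str, str]]) -> list[str]:
    """Return human-readable problems with a signer placement inventory."""
    problems = []
    ordered = sorted(signers.items(), key=lambda kv: kv[0])
    for (a, attrs_a), (b, attrs_b) in combinations(ordered, 2):
        for key in SHARED_FORBIDDEN:
            value = attrs_a.get(key)
            if value is not None and value == attrs_b.get(key):
                problems.append(f"{a} and {b} share {key} {value!r}")
    # At least one signer should be offsite, so expect two or more distinct sites.
    if len({attrs.get("site") for attrs in signers.values()}) < 2:
        problems.append("no signer is offsite: all signers share one site")
    return problems
```

An empty list means no pair of signers shares a listed attribute and at least two sites are in use. It cannot detect correlations you did not record, such as a common drive batch.
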
> **[REF] note:** the reference deployment runs all three signers on one host for convenience — that has **no**
> failure-domain independence and is for functional validation. Production uses three independent hosts (above);
> never hold mainnet value on a single host that runs more than one signer.

---

## 4. Prerequisites & bill of materials

**[PROD] hardware:**
- **3× Coldcard Mk5** (or Q) — the current device; dual secure elements (ATECC608 + DS28C36B). HSM mode +
  multisig (P2SH/P2WSH) co-signing are supported in firmware.
- **3× signer hosts** in independent failure domains (one offsite), each with a free USB port.
- **Steel backups** for 3 seeds + a durable record of the wallet descriptor.

**Software / services:**
- A **Bitcoin full node** (watch-only-capable; descriptor wallets). Mainnet for production; signet for the lab.
- **Tailscale** on every signer host + coordinator (signer agents are RPC *clients* — they don't bind the
  tailnet IP, so the bind-race gotcha doesn't apply).
- `ckcc-protocol` (the `ckcc` CLI / Python lib) on each signer host.
- The coordinator + signer-agent software (`orchestrator.py` is the reference coordinator).
- (Optional) **CKBunker** as a human break-glass UI, kept **off** the automated critical path.

**Skills:** comfortable with bitcoind RPC, descriptors/PSBT, Coldcard HSM mode, and Linux service ops.

---

## 5. Initial setup

> ⚠️ **Do a complete dry-run on signet or testnet first** (the lab does exactly this). Only move to mainnet
> once you have rehearsed a refill, a failover, a policy change, and a full restore from backup.

### Step 1 — Generate three independent seeds
Generate a **distinct** seed on **each** Coldcard (never clone one seed to three devices — that defeats the
whole model). Record each 24-word seed to steel and store the three **geographically separated**. Note each
device's master fingerprint + the BIP-48 account xpub (`m/48'/0'/0'/2'` mainnet; `m/48'/1'/0'/2'` signet).

### Step 2 — Build the 2-of-3 descriptor + watch-only wallet
Construct `wsh(sortedmulti(2, key1, key2, key3))` from the three `[fingerprint/48h/0h/0h/2h]xpub` keys
(receive `/0/*` and change `/1/*` branches). Create a **watch-only** descriptor wallet on the node and import
both branches (`importdescriptors`, private keys disabled, internal=true for the change branch). Record the
descriptor durably — see §9, losing it makes recovery painful even with all three seeds.

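As a rough illustration of Step 2, the sketch below assembles the receive and change descriptors and a payload shaped for bitcoind's `importdescriptors` RPC. The helper names, the `range_end` default and the fingerprint/xpub values in the usage are placeholders invented for the example. Each descriptor still needs a checksum appended (for instance from `getdescriptorinfo`) before the node will accept it.

```python
def multisig_descriptors(keys, threshold=2, coin_type=0):
    """Build (receive, change) wsh(sortedmulti) descriptors, without checksums.

    keys: list of (master_fingerprint, account_xpub) tuples.
    coin_type: 0 for mainnet, 1 for signet/testnet (BIP-48 account path).
    """
    origin = f"48h/{coin_type}h/0h/2h"

    def branch(index):
        parts = ",".join(f"[{fp}/{origin}]{xpub}/{index}/*" for fp, xpub in keys)
        return f"wsh(sortedmulti({threshold},{parts}))"

    return branch(0), branch(1)


def import_request(receive, change, range_end=999):
    """Payload for `importdescriptors`; the change branch is marked internal.

    "timestamp": "now" suits a brand-new wallet; when restoring an existing
    wallet, use its creation time so the node rescans history.
    """
    return [
        {"desc": receive, "active": True, "internal": False,
         "timestamp": "now", "range": [0, range_end]},
        {"desc": change, "active": True, "internal": True,
         "timestamp": "now", "range": [0, range_end]},
    ]
```

For example, `multisig_descriptors([("aaaaaaaa", "xpubA"), ("bbbbbbbb", "xpubB"), ("cccccccc", "xpubC")])` returns the `/0/*` and `/1/*` descriptors for three placeholder keys.
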
### Step 3 — Register the 2-of-3 wallet on EACH Coldcard
Each device must have the multisig wallet **registered** so it recognises change back to the wallet as
internal (and doesn't mistake it for an external send that the whitelist/velocity would block). Export the
Coldcard multisig config and `ckcc upload -m <wallet.txt>` to each device (confirm on-device).

### Step 4 — Author the HSM policy (per device)
Policy is **JSON, versioned in this repo**, deployed to each device. Minimum viable policy (see §6 for the
full reference and the diverse-policy option):

```json
{
  "must_log": true,
  "period": 60,
  "msg_paths": ["any"],
  "rules": [
    { "max_amount": 8000,
      "per_period": 33333,
      "wallet": "ckms23",
      "whitelist": ["<hot-wallet deposit address 1>", "<...>"] }
  ]
}
```
(`per_period` here is the **per-device** ceiling `V`; see §6.3 for why `V = ⅔ × global cap`.)

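Before pushing a policy file to a device, a small pre-flight check can catch the mistakes this manual warns about (unfilled placeholders, a missing `wallet` binding, logging switched off). The sketch below is a hypothetical helper, not part of the reference code, and the specific checks are this example's own choices rather than rules enforced by the Coldcard.

```python
def lint_policy(policy: dict) -> list[str]:
    """Return a list of problems found in an HSM policy dict (empty = none found)."""
    problems = []
    if policy.get("must_log") is not True:
        problems.append("must_log should be true so every decision is logged")
    rules = policy.get("rules", [])
    if not rules:
        problems.append("policy has no rules")
    for i, rule in enumerate(rules):
        if not rule.get("wallet"):
            problems.append(f"rule {i}: no wallet named, change may trip the whitelist")
        whitelist = rule.get("whitelist", [])
        if not whitelist:
            problems.append(f"rule {i}: empty whitelist")
        if any(addr.startswith("<") for addr in whitelist):
            problems.append(f"rule {i}: whitelist still contains a placeholder")
        cap, velocity = rule.get("max_amount"), rule.get("per_period")
        if cap is None or velocity is None:
            problems.append(f"rule {i}: max_amount and per_period must both be set")
        elif cap > velocity:
            problems.append(f"rule {i}: max_amount exceeds per_period")
    return problems
```

Run against the example policy above, it reports the two placeholder whitelist entries and nothing else.
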
### Step 4b — Enrol the TOTP "owner" (only if you use the surge tier)
If your policy has a TOTP-gated **surge tier** (§6.2b — a rule with `users:["owner"], min_users:1`), enrol the
`owner` TOTP user on **each** Coldcard **before** loading the policy. All three signers must hold the **same**
secret (one authenticator code has to work for whichever two devices sign), so the device-picks-its-own-QR
path does **not** apply here — a shared secret has to be generated once and loaded onto all three.

> 🔑 **Who generates it matters (production vs demo).** **Production:** *the owner* generates the secret
> (in their authenticator app, or offline) and loads it onto each Coldcard **directly over USB during setup** —
> **never through the coordinator**. Then a fully-compromised coordinator can't mint surge codes; at spend time
> it only **relays** the owner's live 6-digit code (it never needs the secret). **Demo only:** the coordinator
> generates the secret + shows the enrolment QR for convenience — acceptable on signet with no real funds, but
> it raises a coordinator-compromise blast radius from tier-1 up to the surge ceiling, so don't do it in prod.

Mechanically: `create_user("owner", USER_AUTH_TOTP, <shared-secret>)` on each device (standard RFC-6238, so any
authenticator app works). The secret must persist so re-arming a rebooted signer re-enrols the same value.

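For the owner-generates-it-offline path, the following sketch shows one way to create a shared base32 secret and compute the standard RFC 6238 code from it, using only the Python standard library. The function names are hypothetical and the 20-byte secret length is an assumption; confirm what length your Coldcard firmware accepts before enrolling. Loading the secret onto the devices is deliberately not shown.

```python
import base64
import hashlib
import hmac
import secrets
import struct
import time
from typing import Optional


def new_shared_secret(num_bytes: int = 20) -> str:
    """Generate a random TOTP secret, base32-encoded (length is an assumption)."""
    return base64.b32encode(secrets.token_bytes(num_bytes)).decode("ascii")


def totp(secret_b32: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1, the default used by authenticator apps."""
    key = base64.b32decode(secret_b32)
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Comparing `totp(secret)` with the code shown by the owner's authenticator app is a quick way to confirm the same secret was enrolled everywhere. Run it on an offline machine, never on the coordinator.
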
### Step 5 — Load policy + start HSM mode (MIND THE ORDERING)
> 🪤 **Sharp edge (ckbunker issue #12):** loading an HSM policy can *delete a registered multisig wallet* on
> the Coldcard. **Order: register the multisig wallet (Step 3) → enrol the TOTP user (Step 4b, if any) → load
> the policy → verify the wallet still exists on the device.** Re-check `hsm_status` and the registered-wallet
> count after every policy load.

Load the policy and enter HSM mode on each device (the on-device approval is two-step: confirm, then a random
digit to save). HSM mode is a **one-way trip** until reboot — design for it.

### Step 6 — Configure the coordinator
- **Global velocity cap `G`** — the authoritative total-per-period (chain-derived; §6.2).
- **Per-device ceiling `V = ⅔ × G`** in each device policy (§6.3), and enable **round-robin** of the signer
  pair so rotation stays even.
- **Whitelist** = your hot-wallet deposit addresses (anonymous/untrusted callers can never add to it).
- **Refill trigger** — the hot-wallet floor that starts a refill, and the refill amount.
- The coordinator's HMAC session secret lives **only on the host** (`chmod 600`, **never in git**).

### Step 7 — Verify before funding
Quorum check: **≥2 of 3** signers online **and** HSM-active **and** wallet-registered. Then, on signet/testnet:
run a within-policy refill (expect sign+broadcast), an over-cap spend (expect on-device refusal), an off-list
spend (expect refusal), and a velocity-exceeding burst (expect the coordinator to block at `G`). Confirm the
chain-derived counter increments on broadcast and the on-device refusal reasons read correctly.

---

## 6. Policy reference

### 6.1 The four on-device gates (tamper-proof, per device)
| Gate | Policy field | What it does | Notes |
|---|---|---|---|
| Per-txn cap | `max_amount` | refuses any single spend over the cap | sats |
| Velocity | `per_period` + top-level `period` | refuses once this device's signed total in the window exceeds it | **per-device**, counted locally — see §6.3 |
| Whitelist | `whitelist: [addr…]` | external outputs must be on this list; change back to `wallet` is exempt | the strongest control — confines *where* funds can go |
| Message paths | `msg_paths` | which derivation paths may sign messages (`["any"]` or specific) | proof-of-control without moving funds |
| (binding) | `wallet` | names the registered multisig so change is recognised as internal | required, or change trips the whitelist |
| (audit) | `must_log: true` | device writes a log entry per decision | feed into monitoring (§8) |

### 6.2 The coordinator global velocity cap (the authoritative limit)
On-device velocity counters **drift** under a rotating "any 2 of 3" (each device only counts what *it* signed),
and a device debits its velocity budget the moment it signs **even if the PSBT is later dropped and never
broadcast**. So no single device's counter reflects true global outflow. The coordinator therefore enforces
the real cap, and:
- it counts **only real broadcasts** — so dropped/refused PSBTs never burn budget;
- it **includes mempool (0-conf) sends** — they're counted the instant they broadcast, so a burst of spends
  inside one ~10-min block interval can't slip past (abandoned/conflicted txns are excluded so a dropped tx
  doesn't permanently burn budget);
- it is **chain-derived** — computed from the watch-only wallet's on-chain `send` total over the window
  (`listtransactions`), not a single-host file, so any replica recomputes it and there is no ledger SPOF.

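The counting rules above can be sketched as a pure function over `listtransactions`-shaped entries. This is an illustration under stated assumptions, not the reference implementation: `window_outflow_sats` is a hypothetical name, the window is passed in seconds, fees are ignored, and fetching enough history from the node is left to the caller.

```python
from decimal import Decimal

SATS_PER_BTC = Decimal(100_000_000)


def window_outflow_sats(transactions, now: int, window_seconds: int) -> int:
    """Sum wallet sends inside the window, in sats.

    transactions: entries shaped like bitcoind's `listtransactions` output
    (keys used: category, amount in BTC, confirmations, abandoned, time).
    """
    total = 0
    for tx in transactions:
        if tx.get("category") != "send":
            continue  # only outgoing payments count against the cap
        if tx.get("abandoned") or tx.get("confirmations", 0) < 0:
            continue  # abandoned or conflicted: never left the wallet
        if tx["time"] < now - window_seconds:
            continue  # older than the velocity window
        # 0-conf (mempool) sends have confirmations == 0 and are included.
        total += int(abs(Decimal(str(tx["amount"]))) * SATS_PER_BTC)
    return total
```

Because the function depends only on the node's view of the wallet, any coordinator replica that queries the same node history gets the same number.
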
### 6.2b Surge tier — a TOTP-gated higher tier (human in the loop)
For occasional large moves, add a **second, ordered rule** that permits a higher `max_amount` + `per_period`
(and/or extra whitelist) but requires the owner's **TOTP** (`users:["owner"], min_users:1`). Coldcard
evaluates rules **first-match**, so routine spends match the automated tier-1 and a larger one falls through to
tier-2 and is refused unless a valid code is presented. **Secure model:** the owner reads the 6-digit code from
a normal authenticator app (standard RFC-6238); the **keyless coordinator only relays it** (`user_auth`) — the
secret lives on the devices + the owner's app, never on the coordinator, so a compromised coordinator can't
mint codes. The coordinator's global cap gets a matching surge ceiling. **Sizing:** the surge ceilings raise
the §1.4 blast-radius bound *when a code is present* — set them to the most you'd ever authorise in one
human-approved move, and treat entering a code as approving a spend up to the surge cap. *(Live for signed-in
users on the reference deployment; exercised end-to-end on real signet.)*

### 6.3 Sizing — the 1.5× rule (do not skip this)
A compromised coordinator that ignores `G` is bounded by the hardware ceilings. With any-2-of-3, worst-case
extractable = **`1.5 × V`** (each spend burns two signatures for one unit of value; `3V ÷ 2`). Therefore:

- **Set each device's `per_period` `V = ⅔ × G`.** Then worst-case `1.5 × ⅔G = G` — the hardware bound equals
  your intended global velocity.
- **Round-robin the signer pair** so rotation is even, otherwise busy devices hit `⅔G` early and honest
  refills stall (raising `V` for liveness margin pushes the worst case back above `G`).
- **Topology dial:**
  - *any-2-of-3 auto-signers* → worst case **1.5×**, but "lose any one, keep running" unattended.
  - *2 fixed auto-signers + 1 human break-glass (offline)* → every routine spend needs both online devices, so
    `V ≥ G`, but the offline key can't be used unattended → worst case **1.0×**, at the cost of unattended
    failover. Choose deliberately.

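The 1.5× rule can be checked with a few lines of arithmetic. The sketch below derives `V` from `G` and then simulates an attacker who always uses the two devices with the most budget left. It models each device as a simple per-window budget that refuses once the next spend would exceed `V`, which is a simplifying assumption about the on-device counter. Both function names are hypothetical.

```python
def per_device_ceiling(global_cap_sats: int) -> int:
    """V = 2/3 x G, rounded down to whole sats."""
    return (2 * global_cap_sats) // 3


def worst_case_extraction(v: int, amount: int, devices: int = 3) -> int:
    """Total an attacker can move in one window using fixed-size 2-signature spends."""
    remaining = [v] * devices
    extracted = 0
    while True:
        # Pick the two devices with the most remaining velocity budget.
        a, b = sorted(range(devices), key=lambda i: remaining[i], reverse=True)[:2]
        if remaining[b] < amount:
            break  # fewer than two devices can still sign this amount
        remaining[a] -= amount
        remaining[b] -= amount
        extracted += amount
    return extracted
```

With `V = 10` and 1-sat spends the simulation extracts 15, exactly `1.5 × V`. With `G = 50,000` sats, `per_device_ceiling` gives 33,333, the same figure as the `per_period` in the Step 4 example, and 8,000-sat spends top out at 48,000, just under `G`.
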
---

## 7. Day-to-day operations

**Automated refill (the normal path, no human):** hot wallet drops below the floor → coordinator builds a PSBT
to a whitelisted address from the watch-only wallet → checks the global cap → fans to 2 of 3 → devices
auto-sign under policy → combine → broadcast. Watch it confirm on your explorer.

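The first two decisions in that path (is a refill needed, and does it fit under the global cap) can be sketched as a small pure function. The name `plan_refill`, its parameters and the choice to clamp the refill to the per-transaction cap are assumptions made for illustration; the reference coordinator may decide differently.

```python
from typing import Optional


def plan_refill(hot_balance: int, floor: int, refill_amount: int,
                spent_in_window: int, global_cap: int,
                max_amount: int) -> tuple[Optional[int], str]:
    """Return (amount_in_sats or None, reason). All values are in sats."""
    if hot_balance >= floor:
        return None, "hot wallet above floor"
    # Never ask the devices for more than their per-txn cap would allow.
    amount = min(refill_amount, max_amount)
    if spent_in_window + amount > global_cap:
        return None, "global velocity cap would be exceeded"
    return amount, "refill"
```

For example, `plan_refill(1_000, 5_000, 10_000, 0, 50_000, 8_000)` returns `(8000, "refill")`. A `None` amount means no PSBT is built at all, which matches the fail-safe behaviour described in §1.4.
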
**Change a policy value (cap / velocity / period):** edit the versioned policy, push to each device, **re-arm**
all signers. Re-arming restarts the HSM session and **resets the per-device velocity counters** — expected.
Re-verify the registered wallet survived (§5 Step 5 trap).

**Add a whitelisted destination:** add the address to the policy `whitelist`, push, re-arm. Only the operator
can do this; untrusted callers can never extend the whitelist.

**Take a signer down for maintenance:** quorum tolerates **one** down — the other two keep signing. Bring it
back, re-arm it (see **Boot-to-signing-ready** below), confirm quorum is 3 again. **Never** take a second one
down while one is already offline (that freezes spending until one returns).

**Boot-to-signing-ready:** a Coldcard needs PIN + HSM-mode entry after any power loss. Unattended operation
means the signer agent must restore the device to signing-ready automatically after a reboot, and monitoring
must confirm it did — a device that silently fails to return erodes quorum. Rehearse: reboot a signer host and
confirm the agent re-arms it and quorum self-heals.

---

## 8. Monitoring & alerting (non-negotiable for unattended operation)

Wire these into your observability stack (e.g. Loki + Grafana + alerting):
- **Quorum health** — are **≥2 of 3** signers online **and** HSM-active **and** wallet-registered? **Alert the
  moment it drops to exactly 2** (one more failure = frozen).
- **Velocity near limit** — global cap approaching for the period; per-device counters near `V`.
- **Policy denials** — every on-device refusal (the `must_log` trail) → alert; a spike may signal an attack or
  a misconfiguration.
- **USB / device health** — VMs surviving a host crash can carry latent USB/udev damage; don't repeat-restart,
  restore from backup.
- **Refill anomalies** — refills outside expected cadence/size (a compromised coordinator's tell).

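The quorum-health rule above reduces to counting signers that pass all three conditions. The sketch below is a hypothetical helper (the `SignerStatus` fields and the state names are invented for the example) that a monitoring job could call on each poll.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignerStatus:
    online: bool
    hsm_active: bool
    wallet_registered: bool

    @property
    def ready(self) -> bool:
        # A signer only counts toward quorum when all three conditions hold.
        return self.online and self.hsm_active and self.wallet_registered


def quorum_state(statuses: dict[str, SignerStatus], threshold: int = 2) -> str:
    """Classify quorum: 'healthy', 'degraded' (alert now) or 'frozen'."""
    ready = sum(1 for status in statuses.values() if status.ready)
    if ready < threshold:
        return "frozen"    # spending has stopped until a signer returns
    if ready == threshold:
        return "degraded"  # one more failure freezes spending
    return "healthy"
```

Treat `degraded` as a page-worthy alert rather than a warning, since it is the last state in which refills still work.
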
---

## 9. Backup, recovery & DR

- **3 seeds**, each to steel, **geographically separated**. 2-of-3 survives losing **one** seed.
- **The descriptor is load-bearing.** Losing it makes recovery painful **even with all three seeds** — store
  the wallet descriptor offsite, independently of the seeds (in your vault).
- **Rehearse recovery** before funding mainnet: reconstruct the watch-only wallet from the descriptor on a
  fresh node, recover a signer from seed, re-register the multisig, reload policy, sign a test spend.
- **Coordinator state is disposable** — it's chain-derived; a replacement coordinator recomputes the velocity
  counter from chain history with zero handed-over state.

---

## 10. Incident response & break-glass

| Situation | Response |
|---|---|
| One signer down | Operate on the remaining two; restore + re-arm the third; do not drop a second. |
| Two signers down | Spending is **frozen (safe)**. Restore one to resume. This is the design working. |
| Suspected coordinator compromise | Funds are bounded by §1.4 (whitelist + `1.5V`). Rotate the coordinator host/secret; the hardware caps already contain the blast radius. Review the policy-denial + refill logs. |
| Large / exception spend needed | Use the **human break-glass** path (3rd human-held key / CKBunker), outside the automated policy. |
| Suspected key compromise (one device) | One key alone moves nothing. Rotate to a fresh 2-of-3 (new seeds + descriptor), sweep funds under the old policy to the new wallet. |

---

## 11. Known sharp edges (read before production)

- **USB passthrough pins a VM to its host** — a VM with a physical USB Coldcard **cannot live-migrate**. These
  signer VMs deliberately break the "freely migratable" model; the multisig *is* the HA.
- **HSM + multisig together is advanced / lightly-charted** — soak on signet for a long time before mainnet.
- **CKBunker is niche** (v0.9.1, "at your own risk") — keep it as the human break-glass surface only; the
  automated path is the signer-agent over `ckcc-protocol`, not CKBunker.
- **The #12 ordering trap** (§5 Step 5) — register wallet → load policy → verify wallet survived.
- **Velocity counters don't compose across devices** — that's the whole reason the authoritative cap is the
  coordinator's chain-derived one; per-device velocity is the hardware bound, not the operational truth.
- **Post-host-crash latent damage** — a signer VM that survived a host hard-crash can carry subclinical FS/USB
  damage; restore from backup rather than repeat-restarting.

---

## 12. Regulatory note

Running this for **your own** funds (treasury / your own refill tier) is **not** custody-of-others. Offering it
as **custody-as-a-service for third parties** is **regulated activity in AU (AUSTRAC / financial services)** —
get legal advice before productising. Same flag as a regulated swap service.

---