docs: expand crabbox user guide

Peter Steinberger 2026-05-07 00:47:41 +01:00
parent e82281ff08
commit f031e9d1aa
24 changed files with 2968 additions and 139 deletions


@ -76,8 +76,9 @@ Run history and inspection are intentionally handled by the Crabbox CLI and repo
Pick whichever matches your intent:
- **Get the mental model:** [How Crabbox Works](how-it-works.md), [Architecture](architecture.md), [Orchestrator](orchestrator.md).
- **Use the CLI:** [CLI](cli.md), [Commands](commands/README.md), [Features](features/README.md), [Actions hydration](features/actions-hydration.md), [Browser portal](features/portal.md), [Telemetry](features/telemetry.md).
- **Start here:** [Getting started](getting-started.md), [How Crabbox Works](how-it-works.md), [Concepts and glossary](concepts.md).
- **Get the mental model:** [Architecture](architecture.md), [Orchestrator](orchestrator.md).
- **Use the CLI:** [CLI](cli.md), [Commands](commands/README.md), [Features](features/README.md), [Configuration](features/configuration.md), [Actions hydration](features/actions-hydration.md), [Browser portal](features/portal.md), [Telemetry](features/telemetry.md).
- **Pick or add a target:** [Provider reference](providers/README.md), [Providers feature overview](features/providers.md), [Provider authoring](features/provider-authoring.md), [Provider backends](provider-backends.md), [AWS](providers/aws.md), [Hetzner](providers/hetzner.md), [Static SSH](providers/ssh.md), [Blacksmith Testbox](providers/blacksmith-testbox.md), [Daytona](providers/daytona.md), [Islo](providers/islo.md), [Interactive desktop and VNC](features/interactive-desktop-vnc.md).
- **Operate it:** [Operations](operations.md), [Observability](observability.md), [Troubleshooting](troubleshooting.md), [Performance](performance.md).
- **Set it up or audit it:** [Infrastructure](infrastructure.md), [Security](security.md), [Source Map](source-map.md), [MVP Plan](mvp-plan.md).


@ -3,27 +3,63 @@
`crabbox attach` follows recorded events for an active coordinator run.
```sh
crabbox attach run_...
crabbox attach --id run_... --after 42
crabbox attach run_abcdef123456
crabbox attach --id run_abcdef123456 --after 42
crabbox attach run_abcdef123456 --poll 500ms
```
Stdout and stderr preview events are written back to stdout and stderr.
Lifecycle events are printed to stderr with their sequence number, phase,
timestamp, and message. When the run has already finished, `attach` prints any
remaining events and exits.
## Behavior
Flags:
`attach` polls the coordinator for new run events on a fixed interval,
prints them as they arrive, and exits when the run finishes.
- stdout and stderr preview events are written back to stdout and stderr,
preserving the stream split;
- lifecycle events (lease, bootstrap, sync, command-start, finish, release)
are printed to stderr with their sequence number, phase, timestamp, and
message;
- when the run has already finished, `attach` prints any remaining events
and exits;
- when the run is still active, `attach` polls until it sees a `finish`
event.
`attach` is not detached command execution. It follows the events the
original CLI is emitting; if that CLI process dies, the run state remains
inspectable through [history](history.md), [events](events.md), and
[logs](logs.md), but `attach` cannot resurrect it.
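The polling behavior described above can be sketched in shell. This is an illustration under stated assumptions, not the CLI's implementation: `fetch_events` is a hypothetical stand-in for whatever returns `seq type ...` lines newer than a given sequence number, and treating `command.finish` as the terminal event is an assumption.

```sh
# Sketch only: poll a fetch command until a finish event shows up.
# $1 = name of a command or function that prints "seq type ..." lines newer
#      than the sequence number it receives (hypothetical stand-in)
# $2 = sequence number to resume after (like --after), default 0
# $3 = seconds to sleep between polls (like --poll), default 1
poll_until_finish() {
  fetch_events=$1
  after=${2:-0}
  interval=${3:-1}
  while :; do
    batch=$("$fetch_events" "$after")
    if [ -n "$batch" ]; then
      printf '%s\n' "$batch"
      # Remember the highest sequence number seen so the next poll resumes there.
      last=$(printf '%s\n' "$batch" | awk '$1 ~ /^[0-9]+$/ { n = $1 } END { print n }')
      [ -n "$last" ] && after=$last
      # Stop once the batch contains the assumed terminal event.
      if printf '%s\n' "$batch" | awk '$2 == "command.finish" { f = 1 } END { exit !f }'; then
        return 0
      fi
    fi
    sleep "$interval"
  done
}
```

Tracking the last sequence number is what makes reconnecting cheap: a restarted loop can pick up from that number instead of replaying the whole log.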
## Bounded Output
Output events are a bounded preview. The coordinator caps stdout/stderr
capture at 64 KiB per run and records an `output.truncated` marker when the
cap is reached. Use [logs](logs.md) for the larger retained command output
after completion.
## Flags
```text
--id <run-id> run id
--after <seq> resume after this event sequence
--id <run-id> run id (also accepted as a positional argument)
--after <seq> resume after this event sequence number
--poll <duration> polling interval, default 1s
```
`attach` follows events emitted by the original CLI. It is not detached command
execution. If the original CLI process dies, the last recorded phase remains
inspectable through [history](history.md), [events](events.md), and
[logs](logs.md).
## Use Cases
Output events are a bounded preview. Use [logs](logs.md) for the retained
command output after completion.
- watch a long warmup or run from a second terminal without disturbing the
original CLI;
- monitor an agent-launched run while doing something else locally;
- replay events from a known sequence (`--after`) when reconnecting after
a network blip.
## Direct Mode
Direct-provider mode does not record runs centrally, so `attach` has no
event stream to follow. Use shell output from the original CLI instead.
Related docs:
- [logs](logs.md)
- [events](events.md)
- [history](history.md)
- [run](run.md)
- [History and logs](../features/history-logs.md)


@ -9,23 +9,107 @@ crabbox cache warm --id blue-lobster -- pnpm install --frozen-lockfile
crabbox cache purge --id blue-lobster --kind pnpm --force
```
`--id` accepts the stable `cbx_...` ID or an active friendly slug. Cache commands that SSH to the box touch the lease and validate the local repo claim; add `--reclaim` to move an existing claim.
Cache kinds:
## Subcommands
```text
pnpm
npm
docker
git
all
cache stats show usage for each cache kind on the lease
cache warm run a command in the synced workdir to populate caches
cache purge delete one or all cache kinds (requires --force)
```
`cache warm` runs a command in the synced repo workdir for that lease. On boxes prepared by `crabbox actions hydrate`, it uses the hydrated `$GITHUB_WORKSPACE` and sources the workflow env handoff like `crabbox run`.
`--id` accepts the canonical `cbx_...` lease ID or an active friendly
slug. Cache commands SSH to the box, touch the lease, and validate the
local repo claim. Add `--reclaim` to move an existing claim from another
repo.
Repo `cache.pnpm`, `cache.npm`, `cache.docker`, and `cache.git` toggles control which kinds `stats` reports and which kinds `purge --kind all` removes.
## Cache Kinds
```text
pnpm /var/cache/crabbox/pnpm
npm /var/cache/crabbox/npm
docker Docker layer/image cache (host-managed)
git /var/cache/crabbox/git (shared origin objects)
all every kind enabled in repo config
```
Repo `cache.pnpm`, `cache.npm`, `cache.docker`, and `cache.git` toggles
control which kinds `stats` reports and which kinds `purge --kind all`
removes. Disabled kinds are omitted from stats, are not purged by
`--kind all`, and asking to purge a disabled specific kind fails early.
## stats
```sh
crabbox cache stats --id blue-lobster
```
Prints sizes for each enabled cache kind:
```text
pnpm 8.4GiB
npm 1.2GiB
docker 18.7GiB
git 430MiB
```
`--json` returns the same data as a structured object.
## warm
```sh
crabbox cache warm --id blue-lobster -- pnpm install --frozen-lockfile
crabbox cache warm --id blue-lobster -- docker compose pull
```
Runs a command in the synced repo workdir for that lease. On boxes
prepared by `crabbox actions hydrate`, it uses the hydrated
`$GITHUB_WORKSPACE` and sources the workflow env handoff, just like
`crabbox run` does.
Use warm for one-off cache priming when you do not want to record a full
run history entry.
## purge
```sh
crabbox cache purge --id blue-lobster --kind pnpm --force
crabbox cache purge --id blue-lobster --kind all --force
```
Removes the named cache kind from the lease. `--force` is required to
prevent accidental purges. If `cache.maxGB` is set, purge is rarely
needed - the runner trims the oldest entries automatically when caches
exceed the cap.
## Flags
```text
--id <lease-id-or-slug> target lease (required)
--kind pnpm|npm|docker|git|all for purge
--force required for purge
--reclaim move local claim from another repo
--json stats as JSON
```
## When To Use Cache
Caches are speed hints, not source of truth. The synced worktree remains
authoritative.
- Use `cache stats` to confirm a long-lived warm box is benefiting from
cached packages.
- Use `cache warm` to prime a fresh lease before handing it to agents that
run many short commands.
- Use `cache purge` when a corrupt cache is poisoning a build (rare;
usually the underlying tool's own cache reset works first).
Disposable leases lose cache state when the VM is deleted; kept leases
can reuse cache state across repeated agent runs. For shared baked
images, see [Prebaked runner images](../features/prebaked-images.md).
Related docs:
- [Performance](../performance.md)
- [Cache controls](../features/cache.md)
- [Performance](../performance.md)
- [run](run.md)
- [actions](actions.md)


@ -1,29 +1,77 @@
# cleanup
`crabbox cleanup` sweeps direct-provider leftovers.
`crabbox cleanup` sweeps direct-provider leftovers based on Crabbox labels.
```sh
crabbox cleanup --dry-run
crabbox cleanup
```
Cleanup refuses to run when a coordinator is configured. Brokered cleanup belongs to the Durable Object alarm.
`crabbox machine cleanup` is preserved as a compatibility alias.
Direct cleanup skips kept machines, deletes expired ready/leased/active machines, and gives running/provisioning machines an extra stale safety window. It relies on provider labels such as `lease`, `slug`, `expires_at`, and `state`.
## Behavior
Static SSH targets are existing hosts, so `provider=ssh` has nothing to sweep.
Cleanup refuses to run when a coordinator is configured. Brokered cleanup
belongs to the Durable Object alarm; sweeping provider resources behind the
coordinator can race live brokered leases.
Flags:
In direct-provider mode, cleanup is intentionally conservative:
- skip machines tagged `keep=true`;
- skip machines in `running` or `provisioning` state until the extra stale
safety window passes (expiry plus 12 hours);
- delete machines that are clearly expired in `ready`, `leased`, or
`active` states;
- delete machines that have been inactive past expiry.
Selection is label-driven. Cleanup uses `lease`, `slug`, `expires_at`,
`last_touched_at`, `state`, and `keep` labels written when the machine was
created. Resources without Crabbox labels are never touched.
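The conservative rules above can be read as a small decision function. The sketch below is a simplified model, not the shipped logic: it takes timestamps as epoch seconds (the labels themselves carry ISO timestamps), ignores `last_touched_at`, and the `skip=stale-window` and `skip=not-expired` strings are hypothetical; only `skip=keep` and `delete` appear in the documented output.

```sh
# Sketch only: model the direct-mode cleanup decision for one machine.
# $1 = keep label ("true" or anything else)
# $2 = state label (running, provisioning, ready, leased, active, ...)
# $3 = expiry as epoch seconds (assumed already parsed from expires_at)
# $4 = current time as epoch seconds
cleanup_decision() {
  keep=$1
  state=$2
  expires=$3
  now=$4
  stale_window=$((12 * 60 * 60))  # extra 12 hours for running/provisioning

  if [ "$keep" = "true" ]; then
    echo "skip=keep"
    return 0
  fi

  case "$state" in
    running|provisioning)
      # Only delete once expiry plus the stale safety window has passed.
      if [ "$now" -gt $((expires + stale_window)) ]; then
        echo "delete"
      else
        echo "skip=stale-window"
      fi
      ;;
    *)
      if [ "$now" -gt "$expires" ]; then
        echo "delete"
      else
        echo "skip=not-expired"
      fi
      ;;
  esac
}
```

Checking `keep` first mirrors the ordering of the list: a kept machine is never deleted, whatever its state or expiry.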
Static SSH targets are existing operator-owned hosts, so `provider=ssh`
has nothing to sweep. Cleanup exits early for that provider.
## Output
`--dry-run` lists every decision without taking action:
```text
--provider hetzner|aws
--target linux|macos|windows
--windows-mode normal|wsl2
--static-host <host>
--static-user <user>
--static-port <port>
--static-work-root <path>
--dry-run
hetzner cx53 hz-12345 lease=cbx_abcdef123456 slug=blue-lobster keep=true skip=keep
hetzner cx53 hz-67890 lease=cbx_abcdef234567 slug=amber-crab expires_at=2026-05-01T17:30:00Z delete
```
`crabbox machine cleanup` remains as a compatibility alias.
Without `--dry-run`, the same lines print but each `delete` is followed by
`deleted` after the provider call returns. Failures print the provider
error and continue with the next candidate.
## Flags
```text
--provider hetzner|aws provider to sweep (delegated providers do not need cleanup)
--target linux|macos|windows for AWS, restrict by target
--windows-mode normal|wsl2 when target=windows
--static-host <host> ignored (provider=ssh has nothing to sweep)
--static-user <user> ignored
--static-port <port> ignored
--static-work-root <path> ignored
--dry-run log decisions without making provider calls
```
## When To Run
- after a CLI process crashed mid-warmup and left a server behind;
- when migrating from direct mode to brokered mode (sweep first, then
switch);
- as a safety net after rotating provider credentials;
- never as part of a brokered workflow - the coordinator owns that path.
For brokered fleets, audit `crabbox admin leases --state active` and use
`crabbox admin release` instead.
Related docs:
- [stop](stop.md)
- [admin](admin.md)
- [Lifecycle cleanup](../features/lifecycle-cleanup.md)
- [Orchestrator](../orchestrator.md)
- [Operations](../operations.md)


@ -1,29 +1,101 @@
# doctor
`crabbox doctor` checks local prerequisites and broker/provider access.
`crabbox doctor` runs the local preflight before you commit to a long
workflow. It is fast (under a second on a healthy machine), provisions
nothing, and never calls a billable provider API.
```sh
crabbox doctor
crabbox doctor --provider aws
crabbox doctor --provider hetzner --target linux
crabbox doctor --provider ssh --target windows --windows-mode normal --static-host win-dev.local
```
It checks local tools, user config permissions, per-lease key generation support,
coordinator health when configured, and direct-provider API access otherwise. If
`CRABBOX_SSH_KEY` is explicitly set, it also validates that private key and
matching `.pub` file.
For `provider=ssh`, doctor checks that the static SSH host is reachable and has
the tools required by the selected target mode.
Flags:
## What It Checks
```text
--provider hetzner|aws|ssh
--target linux|macos|windows
--windows-mode normal|wsl2
--static-host <host>
--static-user <user>
--static-port <port>
--static-work-root <path>
config config files load and parse, required keys are present
auth broker token is set, signed token is valid, identity resolves
network coordinator URL reachable, DNS works, SSH transport probes work
ssh SSH key path readable, key permissions sane, ssh-keygen on PATH
tools rsync, git, ssh, ssh-keygen present and executable
```
For `--provider ssh`, doctor also probes the static host: SSH reachability
on the configured port, target-required tools (`bash`, `git`, `rsync`,
`tar` for POSIX targets; OpenSSH, PowerShell, and `tar` for native
Windows), and `static.workRoot` writability.
When `CRABBOX_SSH_KEY` is explicitly set, doctor validates the private key
and the matching `.pub` file. When unset, it skips that check because
per-lease keys do not need a global key.
For the full list of checks, including how each one decides between
`fail`, `skip`, and `ok`, see
[Doctor checks](../features/doctor.md).
## Output
```text
config:
ok user config: ~/.config/crabbox/config.yaml
ok repo config: ./.crabbox.yaml
ok provider: aws
ok target: linux
auth:
ok broker: https://crabbox.openclaw.ai
ok owner: alex@example.com
network:
ok coordinator dns
ok coordinator https
ssh:
ok ssh-keygen present
skip ssh.key unset (per-lease keys will be used)
tools:
ok git
ok rsync
ok ssh
ok ssh-keygen
```
Failures swap the leading `ok` for `fail` and add a remediation hint:
```text
auth:
fail broker token is missing - run `crabbox login`
```
Exit code is `0` on full success, `2` on any failure. Skips never change
the exit code.
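For wrappers that want a one-line summary next to the exit code, the sketch below counts result lines by status. It assumes the indented `ok` / `fail` / `skip` layout from the sample output above; `doctor_summary` is a hypothetical helper, not part of the CLI.

```sh
# Sketch only: count ok/fail/skip lines in `crabbox doctor` text output.
# Section headers such as "auth:" are ignored because their first word
# is not one of the three status keywords.
doctor_summary() {
  awk '
    $1 == "ok"   { ok++ }
    $1 == "fail" { fail++ }
    $1 == "skip" { skip++ }
    END { printf "ok=%d fail=%d skip=%d\n", ok, fail, skip }
  '
}
```

Usage would look like `crabbox doctor | doctor_summary`. The exit code of `crabbox doctor` remains the authoritative pass/fail signal.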
## Flags
```text
--provider hetzner|aws|ssh provider to validate
--target linux|macos|windows target OS for ssh provider checks
--windows-mode normal|wsl2 when target=windows
--static-host <host> static SSH host
--static-user <user> static SSH user override
--static-port <port> static SSH port override
--static-work-root <path> static target work root
```
## When To Run
- before the first `crabbox run` on a new machine;
- after rotating the broker token;
- after editing the user config (`~/.config/crabbox/config.yaml`) or repo
config (`.crabbox.yaml`);
- in agent boot sequences as a sanity check;
- when triaging "Crabbox is broken" reports - doctor often catches the
problem before the user has to describe it.
Doctor is safe to run from `pre-commit`, scheduled jobs, and CI smoke
because it never provisions, never costs money, and never modifies state.
Related docs:
- [Doctor checks](../features/doctor.md)
- [Configuration](../features/configuration.md)
- [Auth and admin](../features/auth-admin.md)
- [Network and reachability](../features/network.md)
- [Troubleshooting](../troubleshooting.md)


@ -3,34 +3,79 @@
`crabbox events` prints the coordinator event log for a recorded run.
```sh
crabbox events run_...
crabbox events --id run_... --after 42 --limit 100
crabbox events run_... --json
crabbox events run_abcdef123456
crabbox events --id run_abcdef123456 --after 42 --limit 100
crabbox events run_abcdef123456 --json
```
Coordinator-backed `crabbox run` creates a durable `run_...` handle before it
leases or syncs. The CLI appends lifecycle events as the run advances through
leasing, bootstrap, sync, command execution, output streaming, finish, and
release.
## What Events Are Recorded
Human output includes sequence number, event type, phase, stream, timestamp, and
short message or output text. JSON output returns the raw event records.
Output events are a bounded preview: stdout/stderr capture stops after 64 KiB
per run and records an `output.truncated` marker. Use `crabbox logs` for the
larger retained command output.
Coordinator-backed `crabbox run` creates a durable `run_...` handle before
it leases or syncs. The CLI appends ordered events as the run advances:
Flags:
- `lease.acquire.start`, `lease.acquire.success`, `lease.acquire.fail`;
- `bootstrap.wait`, `bootstrap.ready`;
- `sync.start`, `sync.skip`, `sync.success`, `sync.fail`;
- `command.start`, `command.finish`;
- `output.stdout`, `output.stderr`, `output.truncated`;
- `release.start`, `release.success`, `release.fail`.
Each event carries a sequence number, event type, phase, optional stream
(stdout/stderr), timestamp, and short message or output text.
## Output
Human output prints sequence number, event type, phase, stream, timestamp,
and message:
```text
--id <run-id> run id
--after <seq> only show events after this sequence
--limit <n> default 500, maximum 500
--json print JSON
1 lease.acquire.start plan 2026-05-07T07:42:18Z
2 lease.acquire.success plan 2026-05-07T07:42:21Z leased=cbx_abcdef123456 slug=blue-lobster
3 bootstrap.wait provision 2026-05-07T07:42:21Z
4 bootstrap.ready provision 2026-05-07T07:43:05Z
5 sync.start sync 2026-05-07T07:43:05Z
6 sync.success sync 2026-05-07T07:43:08Z files=184 bytes=12.4MiB
7 command.start run 2026-05-07T07:43:08Z pnpm test
8 output.stdout run 2026-05-07T07:43:09Z > vitest run
9 output.stdout run 2026-05-07T07:43:11Z ✓ src/foo.test.ts (8)
...
42 command.finish run 2026-05-07T07:45:32Z exit=0
43 release.success release 2026-05-07T07:45:34Z
```
Related:
`--json` returns the raw event records.
## Bounded Output Capture
Output events are a bounded preview. The coordinator caps stdout/stderr
capture at 64 KiB per run and records an `output.truncated` marker when
the cap is reached. The retained log is larger, at up to 8 MiB per run;
use [logs](logs.md) to read it.
## Flags
```text
--id <run-id> run id (also accepted as a positional argument)
--after <seq> only show events after this sequence number
--limit <n> maximum number of events, default 500, maximum 500
--json print JSON
```
`--after` is what `attach` uses internally - resume from a known sequence
without replaying the whole event log.
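To resume from a known sequence, a script first needs the last sequence number it saw. The sketch below pulls it from saved human-format output, assuming the sequence number is the first column as in the sample above; `last_seq` is a hypothetical helper name.

```sh
# Sketch only: print the highest-positioned numeric sequence number found
# in the first column of saved `crabbox events` output (read from stdin).
# Non-numeric lines, such as an elided "..." row, are skipped.
last_seq() {
  awk '$1 ~ /^[0-9]+$/ { n = $1 } END { if (n != "") print n }'
}
```

Assuming the earlier output was saved to a file such as `events.txt` (a hypothetical name), a follow-up call could be `crabbox events run_abcdef123456 --after "$(last_seq < events.txt)"`.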
## Use Cases
- post-mortem on a failed run when you need the exact sequence of phases;
- correlating a failed step with the timestamps of surrounding sync or
bootstrap events;
- scripting a status check that filters by event type;
- archiving event records for runs that exceeded the retained log cap.
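The status-check use case above might look like the following sketch. It assumes the column layout of the human output (sequence, event type, phase, timestamp, message) and the `exit=<code>` message on `command.finish` shown in the sample; both helper names are hypothetical, and `--json` is likely a sturdier input for production scripts.

```sh
# Sketch only: keep the event lines whose type column matches $1.
events_of_type() {
  awk -v want="$1" '$2 == want'
}

# Sketch only: print the exit code from the first command.finish event,
# assuming its message contains an "exit=<code>" token.
command_exit() {
  awk '
    $2 == "command.finish" {
      for (i = 3; i <= NF; i++) {
        if ($i ~ /^exit=/) {
          sub(/^exit=/, "", $i)
          print $i
          exit
        }
      }
    }
  '
}
```

Usage would look like `crabbox events run_abcdef123456 | events_of_type sync.fail` or `crabbox events run_abcdef123456 | command_exit`.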
Related docs:
- [history](history.md)
- [attach](attach.md)
- [logs](logs.md)
- [attach](attach.md)
- [results](results.md)
- [History and logs](../features/history-logs.md)


@ -1,27 +1,106 @@
# init
`crabbox init` onboards a repository for agent-first remote verification.
It writes the minimum config needed for `crabbox run` and sets up the
optional Actions hydration bridge and agent skill.
```sh
crabbox init
crabbox init --force
crabbox init --workflow .github/workflows/crabbox-test.yml
```
It writes:
## Files It Writes
- `.crabbox.yaml`
- `.github/workflows/crabbox.yml`
- `.agents/skills/crabbox/SKILL.md`
```text
.crabbox.yaml repo defaults (provider, profile, class, sync, env)
.github/workflows/crabbox.yml Actions hydration stub (optional)
.agents/skills/crabbox/SKILL.md agent-facing skill instructions
```
The generated workflow is intentionally conservative. It is a starting point for repo-specific hydration, not a full replacement for CI. Edit it to install dependencies, start service containers, and warm caches before agents begin repeated `crabbox run` calls.
By default `init` will not overwrite existing files. `--force` overrides
that and replaces them with freshly generated content.
The workflow contract is the same one used by `crabbox actions hydrate`: it accepts the Crabbox lease ID and dynamic runner label, runs on that self-hosted runner, writes a ready marker under `$HOME/.crabbox/actions`, and keeps the job alive for the remote command loop.
## `.crabbox.yaml`
Flags:
A starting template that includes:
- a default `profile` and `class`;
- `sync.exclude` covering common heavy directories;
- `env.allow` with conservative defaults (`CI`, `NODE_OPTIONS`,
`PROJECT_*`);
- `actions.workflow` pointing at the generated workflow stub;
- `cache` toggles for pnpm, npm, docker, and git.
Open the file after `init` and adjust it to match the repo:
- pick the right `class` for the workload;
- add repo-specific `sync.exclude` patterns;
- expand `env.allow` for project-specific tunables;
- pin `sync.baseRef` to the project's default branch.
See [Configuration](../features/configuration.md) for the full schema.
## `.github/workflows/crabbox.yml`
The generated workflow is intentionally conservative. It is a starting
point for repo-specific hydration, not a full replacement for CI. Edit it
to install dependencies, start service containers, and warm caches before
agents begin repeated `crabbox run` calls.
The workflow contract is the one used by `crabbox actions hydrate`:
- accepts the Crabbox lease ID and dynamic runner label;
- runs on that self-hosted runner registered by Crabbox;
- writes a ready marker under `$HOME/.crabbox/actions`;
- keeps the job alive so the local CLI can run repeated commands in the
hydrated workspace.
If the repo does not plan to use Actions hydration, you can delete the
workflow. `crabbox run` works fine without it - hydration is optional.
## `.agents/skills/crabbox/SKILL.md`
Repo-local agent instructions. The generated skill explains:
- when to use Crabbox vs running locally;
- how to acquire and reuse leases;
- which commands the agent should prefer (`warmup`, `run --id`, `stop`);
- what env vars the project allows;
- where to find repo-specific test commands.
Edit this file to match how you want agents to operate in the repo. The
skill is read by OpenClaw and similar agent runtimes that auto-discover
`.agents/skills/`.
## Flags
```text
--force overwrite generated files
--config <path> repo config path
--workflow <path> workflow path
--skill <path> agent skill path
--config <path> repo config path (default ./.crabbox.yaml)
--workflow <path> Actions workflow path (default .github/workflows/crabbox.yml)
--skill <path> agent skill path (default .agents/skills/crabbox/SKILL.md)
```
## Idempotency
`init` is safe to re-run. Without `--force`, it leaves existing files
alone and exits with a summary of what would be created. With `--force`,
it replaces files atomically.
## After Init
```sh
crabbox doctor # validate the config
crabbox sync-plan # preview what would sync
crabbox warmup # acquire a lease
crabbox run -- pnpm test # run a command
```
Related docs:
- [Configuration](../features/configuration.md)
- [Repository onboarding](../features/repository-onboarding.md)
- [Actions hydration](../features/actions-hydration.md)
- [Sync](../features/sync.md)
- [Getting started](../getting-started.md)


@ -1,6 +1,8 @@
# inspect
`crabbox inspect` prints detailed lease and provider metadata.
`crabbox inspect` prints detailed lease and provider metadata. Use it for
debugging coordinator state, provider labels, expiry, SSH target details,
and Tailscale metadata.
```sh
crabbox inspect --id blue-lobster
@ -9,23 +11,60 @@ crabbox inspect --id blue-lobster --json
crabbox inspect --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local
```
Use this for debugging coordinator state, provider labels, expiry, and SSH target details.
## Output
Flags:
Human output prints lease state, provider, server type, public IP, work
root, owner, org, idle timeout, TTL, expiry, last touched, the resolved
SSH command for the selected network mode, and any Tailscale metadata the
lease carries.
```text
--id <lease-id-or-slug>
--provider hetzner|aws|ssh
--target linux|macos|windows
--windows-mode normal|wsl2
--static-host <host>
--static-user <user>
--static-port <port>
--static-work-root <path>
--network auto|tailscale|public
--json
lease=cbx_abcdef123456 slug=blue-lobster
state=active provider=aws server=i-0abcdef0123456789 type=c7a.48xlarge
host=203.0.113.10 user=crabbox port=2222 work_root=/work/crabbox
owner=alex@example.com org=openclaw
idle_timeout=30m0s ttl=90m0s
created_at=2026-05-07T07:42:18Z last_touched=2026-05-07T07:55:12Z expires_at=2026-05-07T08:25:12Z
ssh: ssh -i ~/.config/crabbox/testboxes/cbx_abcdef123456/id_ed25519 -p 2222 crabbox@203.0.113.10
tailscale: state=ok ipv4=100.64.0.5 fqdn=blue-lobster.tail-scale.ts.net tags=tag:crabbox
```
JSON output includes non-secret Tailscale metadata when present. Human output
prints both the provider host and the resolved SSH command for the selected
network.
JSON output returns the structured record, including non-secret Tailscale
metadata. Secrets (broker tokens, provider keys, VNC passwords) are never
included.
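For quick shell checks against the human output, a single `key=value` field can be pulled out as sketched below. This assumes the space-separated `key=value` layout from the sample; `lease_field` is a hypothetical helper, and the JSON output is the safer choice when a field may contain spaces.

```sh
# Sketch only: print the value of the first key=value token whose key is $1,
# reading `crabbox inspect` human output on stdin.
lease_field() {
  awk -v key="$1" '
    {
      for (i = 1; i <= NF; i++) {
        n = index($i, "=")
        if (n > 0 && substr($i, 1, n - 1) == key) {
          print substr($i, n + 1)
          exit
        }
      }
    }
  '
}
```

Usage would look like `crabbox inspect --id blue-lobster | lease_field expires_at`. Because the first match wins, `state` resolves to the lease state rather than the Tailscale state that appears later in the output.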
## Flags
```text
--id <lease-id-or-slug> lease to inspect; required for managed providers
--provider hetzner|aws|ssh override the configured provider
--target linux|macos|windows
--windows-mode normal|wsl2
--static-host <host> static SSH host for provider=ssh
--static-user <user> static SSH user override
--static-port <port> static SSH port override
--static-work-root <path> static target work root
--network auto|tailscale|public select which address inspect prints
--json print JSON
```
## Inspect vs Status vs List
- `inspect` is the long-form record for one lease, including provider
metadata, label state, and the resolved SSH command;
- `status` is the shorter "is this lease healthy right now" check, with
optional `--wait` and bounded telemetry;
- `list` is the table view across many leases, scoped by owner/org or
fleet-wide for admins.
Use `inspect` when something is unexpected and you want all the detail in
one place. Use `status` when an automation needs a quick liveness check.
Use `list` when you are looking for a specific lease across the pool.
Related docs:
- [status](status.md)
- [list](list.md)
- [ssh](ssh.md)
- [Identifiers](../features/identifiers.md)
- [Network and reachability](../features/network.md)


@ -7,9 +7,31 @@ crabbox logout
crabbox logout --json
```
The broker URL and provider are left in place so a later `crabbox login` or `crabbox login --token-stdin` can reuse them.
The broker URL and provider stay in place so a later `crabbox login` or
`crabbox login --token-stdin` can reuse them. Per-lease SSH keys, repo
claims, and history records are unaffected.
After logout:
- `crabbox whoami` exits with code 3 (`auth failure`);
- `crabbox run` and `crabbox warmup` against the coordinator fail with the
same code;
- direct-provider mode keeps working when local provider credentials
(AWS SDK, `HCLOUD_TOKEN`) are present, because direct mode does not need
the broker token.
Use logout when:
- a token has leaked or you want to rotate it;
- you are switching the operator identity on a shared workstation;
- you are testing the unauthenticated path.
To clear everything (URL, provider, token, profile defaults), edit the user
config file directly. `crabbox config path` prints the location.
Related docs:
- [login](login.md)
- [whoami](whoami.md)
- [Auth and admin](../features/auth-admin.md)
- [Configuration](../features/configuration.md)


@ -1,20 +1,70 @@
# logs
`crabbox logs` prints the retained remote output for a recorded run.
`crabbox logs` prints the retained command output for a recorded run.
```sh
crabbox logs run_...
crabbox logs --id run_...
crabbox logs run_... --json
crabbox logs run_abcdef123456
crabbox logs --id run_abcdef123456
crabbox logs run_abcdef123456 --json
```
The plain form writes the log text to stdout. `--json` returns run metadata plus the log.
## What Gets Stored
Logs are bounded remote stdout/stderr captures. The CLI keeps up to 8 MiB per run and the coordinator stores larger captures in chunks, so failures from noisy parallel runs remain visible without turning run history into unlimited archival storage.
When `crabbox run` runs against a coordinator, it streams remote stdout and
stderr to the local terminal *and* records a bounded copy on the
coordinator. The CLI keeps up to 8 MiB of capture per run; the coordinator
stores larger captures in chunks so a noisy parallel run does not exceed
Durable Object storage limits.
Output beyond the cap is truncated with an `output.truncated` marker on the
last event so the consumer knows the tail is missing.
## Output
The plain form writes the log text to stdout. `--json` returns run metadata
plus the log:
```json
{
"runId": "run_abcdef123456",
"leaseId": "cbx_abcdef123456",
"exitCode": 0,
"truncated": false,
"log": "..."
}
```
`--json` is stable enough for scripts that filter by exit code and want the
log text in one payload.
## Flags
```text
--id <run-id> run id (also accepted as a positional argument)
--json print JSON with metadata and log text
```
## When To Use Logs vs Events vs Attach
- `logs` returns the retained command output. Use it when you want the full
bounded transcript after the run has finished.
- `events` returns ordered run events (lease, sync, command, output chunks,
finish). Use it when you need to know *what happened* and *when*.
- `attach` follows live events. Use it when the run is still active and you
want to watch it without re-attaching the original CLI.
Logs and events are independent surfaces - logs stay focused on command
output, events stay focused on lifecycle.
## Direct Mode
Direct-provider mode does not record runs centrally, so `crabbox logs` has
nothing to fetch. Use shell output or the local terminal log instead.
Related docs:
- [history](history.md)
- [events](events.md)
- [attach](attach.md)
- [results](results.md)
- [History and logs](../features/history-logs.md)


@ -1,17 +1,22 @@
# results
`crabbox results` prints structured test summaries attached to a recorded run.
`crabbox results` prints structured test summaries attached to a recorded
run.
```sh
crabbox run --id cbx_... --junit junit.xml -- go test ./...
crabbox results run_...
crabbox results run_... --json
crabbox run --id cbx_abcdef123456 --junit junit.xml -- go test ./...
crabbox results run_abcdef123456
crabbox results run_abcdef123456 --json
```
Results are attached only when `crabbox run` is told where to find remote JUnit XML. Use either:
## When Results Are Attached
Results are attached only when `crabbox run` is told where to find remote
JUnit XML. Use either:
```sh
crabbox run --junit junit.xml -- <command...>
crabbox run --junit junit.xml,reports/junit.xml -- <command...>
```
or repo config:
@ -23,10 +28,76 @@ results:
- reports/junit.xml
```
Human output shows totals and failed test cases. JSON output returns the stored summary. Stored summaries keep aggregate counts but cap bulky failure details.
After the command exits, the CLI reads each remote file from the workdir,
parses JUnit, and sends only the summary to the coordinator. Raw XML is not
stored. Multiple JUnit files are merged into a single summary, so a
multi-report test setup still produces one result record.
## Output
Human output shows totals and the names of failed test cases:
```text
run_abcdef123456 lease=cbx_abcdef123456 command="pnpm test"
totals: tests=412 failures=2 errors=0 skipped=4 time=42.318s
failures:
src/auth.test.ts > login → returns user
src/sync.test.ts > rsync → handles deletes
```
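A script that only needs a pass/fail signal could read the `totals:` line, as in the sketch below. It assumes the `key=value` layout of the sample above; `results_passed` is a hypothetical helper, and the JSON form is probably the better input for anything long-lived.

```sh
# Sketch only: succeed (exit 0) when the totals line reports no failures
# and no errors; fail when it reports any, or when no totals line is found.
results_passed() {
  awk '
    $1 == "totals:" {
      seen = 1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if ((kv[1] == "failures" || kv[1] == "errors") && kv[2] + 0 > 0) bad = 1
      }
    }
    END { exit (seen && !bad) ? 0 : 1 }
  '
}
```

Usage would look like `crabbox results run_abcdef123456 | results_passed && echo "suite is green"`.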
`--json` returns the stored structured summary:
```json
{
"runId": "run_abcdef123456",
"totals": { "tests": 412, "failures": 2, "errors": 0, "skipped": 4, "timeSeconds": 42.318 },
"failures": [
{ "suite": "src/auth.test.ts", "name": "login → returns user" },
{ "suite": "src/sync.test.ts", "name": "rsync → handles deletes" }
],
"files": [
{ "path": "junit.xml", "size": 12345 }
]
}
```
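A script consuming the `--json` output might look like the sketch below. It relies only on the fields visible in the example above (`totals`, `failures`, `suite`, `name`); the helper names are hypothetical, and the real payload may carry additional fields.

```python
import json


def failed_cases(results_json):
    """Return 'suite > name' strings for each failure in a results summary."""
    summary = json.loads(results_json)
    return [f'{f["suite"]} > {f["name"]}' for f in summary.get("failures", [])]


def suite_passed(results_json):
    """True when the summary reports no failures and no errors."""
    totals = json.loads(results_json)["totals"]
    return totals["failures"] == 0 and totals["errors"] == 0
```

In practice the input string would come from capturing the standard output of `crabbox results run_... --json`.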
## Limits
The coordinator caps stored summaries:
- aggregate counters (tests, failures, errors, skipped) are kept verbatim;
- failed-case entries are capped to a bounded list;
- long strings (test names, suite names, message bodies) are truncated;
- file lists keep paths and sizes, never raw bytes.
This keeps the result record small enough for the lease detail page and
the run detail page to render without paging through gigabytes of XML.
## Flags
```text
--id <run-id> run id (also accepted as a positional argument)
--json print JSON
```
## When To Use Results vs Logs
- `results` is the structured summary - "did the suite pass, and which
cases failed?";
- `logs` is the retained command output - "what did the command print?".
Use `results` for dashboards and quick triage. Use `logs` when you need to
read the actual stack trace.
## Future Formats
Today only JUnit XML is supported. Vitest JSON, Go `test2json`, and
flaky-test correlation across runs are tracked in
[Test results](../features/test-results.md).
Related docs:
- [run](run.md)
- [history](history.md)
- [logs](logs.md)
- [Test results](../features/test-results.md)


@ -1,25 +1,80 @@
# sync-plan
`crabbox sync-plan` prints the local sync manifest without leasing a box.
Use it to preview what `crabbox run` would send before paying for a cold
sync, or after editing `.crabboxignore` to confirm artifacts dropped out
of the manifest.
```sh
crabbox sync-plan
crabbox sync-plan --limit 10
crabbox sync-plan --limit 25 --json
```
It uses the same Git file-list manifest, `.crabboxignore`, and config excludes
as `crabbox run`, then prints:
## What It Reads
`sync-plan` uses the same Git file-list manifest, `.crabboxignore`, and
`sync.exclude` rules as `crabbox run`:
- tracked files from `git ls-files --cached`;
- nonignored untracked files from
`git ls-files --others --exclude-standard`;
- root `.crabboxignore` patterns;
- repo-local `sync.exclude` patterns;
- Crabbox's default cache/build excludes.
It does not require a lease, does not call the broker, and does not call
any provider API.
## Output
Default output prints:
- candidate file count and total bytes;
- tracked deletes that would be applied remotely;
- largest files;
- largest first or second-level directories.
- the largest files;
- the largest first- or second-level directories.
Use it before a cold sync when the preflight estimate looks too large, or after
editing `.crabboxignore` to confirm that local artifacts dropped out of the
manifest.
```text
files: 1843
bytes: 312.5MiB
tracked deletes: 0
largest files:
84.5MiB assets/demo.mp4
12.4MiB fixtures/sample-data.json
...
largest directories:
140.2MiB assets
80.1MiB fixtures
...
```
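One plausible way to derive the "largest directories" list is to credit each file's size to its first-level and second-level parent directories and then rank the totals. The Python sketch below does that; the function name and the tie-breaking rule are assumptions, and the real CLI may aggregate differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def largest_directories(files, limit=5):
    """Rank first- and second-level directories by total file size.

    files: iterable of (relative_posix_path, size_bytes) pairs.
    limit: number of entries to return; 0 returns the full list.
    """
    sizes = defaultdict(int)
    for path, size in files:
        parts = PurePosixPath(path).parts[:-1]  # directory components only
        for depth in (1, 2):
            if len(parts) >= depth:
                sizes["/".join(parts[:depth])] += size
    # Largest first; ties are broken alphabetically for stable output.
    ranked = sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if limit == 0 else ranked[:limit]
```

Files at the repository root have no parent directory in this model and therefore only show up in the "largest files" list.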
## Flags
```text
--limit <n> show this many files and directories in each top list (default 5)
--json print structured JSON output
```
`--limit 0` shows the full lists (use sparingly; large repos produce big
output).
## Use Cases
- preview a first sync before warming a beast-class lease;
- find sneaky directories that grew (`.cache/`, `dist/`, generated assets);
- audit `.crabboxignore` after adding new excludes;
- compare repo footprint over time as part of repo health checks.
The numbers `sync-plan` prints are upper bounds; rsync's actual transfer
size depends on what is already on the remote runner. A repeat sync after a
warmup is much smaller because the manifest matches the remote fingerprint
and rsync ships only changed bytes.
Related docs:
- [run](run.md)
- [Sync](../features/sync.md)
- [Configuration](../features/configuration.md)


@ -1,21 +1,77 @@
# whoami
`crabbox whoami` verifies broker auth and prints the identity the coordinator sees.
`crabbox whoami` verifies broker auth and prints the identity the
coordinator sees.
```sh
crabbox whoami
crabbox whoami --json
```
Human output:
## Human Output
```text
user=steipete@gmail.com org=openclaw auth=github broker=https://crabbox.openclaw.ai
user=alex@example.com org=openclaw auth=github broker=https://crabbox.openclaw.ai
```
Identity normally comes from the signed GitHub login token. Shared bearer-token automation reports owner/org from `X-Crabbox-Owner` and `X-Crabbox-Org`; the CLI fills those from `CRABBOX_OWNER`, Git email env, `git config user.email`, and `CRABBOX_ORG`. Raw Cloudflare Access identity headers are ignored; only a verified Access JWT email can become the bearer-token owner. JSON output also reports the forwarded auth mode, such as `github` or `bearer`.
The fields:
- `user` - the resolved owner email.
- `org` - the organization namespace, when set.
- `auth` - the authentication mode the coordinator accepted (`github` for
signed login tokens, `bearer` for shared automation tokens).
- `broker` - the configured coordinator URL.
## JSON Output
```json
{
"owner": "alex@example.com",
"org": "openclaw",
"auth": "github",
"broker": "https://crabbox.openclaw.ai",
"tokenSource": "user-config",
"accessJwtVerified": false
}
```
JSON output also reports the forwarded auth mode, where the token came
from (`user-config`, `env`, `stdin`), and whether a verified Cloudflare
Access JWT was present.
## Identity Sources
Identity normally comes from the signed GitHub login token. The browser
flow embeds the verified GitHub email and allowed-org membership in a
short-lived signed token; the coordinator extracts owner/org from that
token, not from headers.
Shared bearer-token automation reports owner/org from `X-Crabbox-Owner` and
`X-Crabbox-Org`. The CLI fills those headers from:
- `CRABBOX_OWNER` env (highest precedence);
- `GIT_AUTHOR_EMAIL` or `GIT_COMMITTER_EMAIL` env;
- `git config user.email`;
- `CRABBOX_ORG` env for the org header.
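The precedence above can be sketched in Python as follows. This is an illustration, not the CLI's code: the function names are hypothetical, and trying `GIT_AUTHOR_EMAIL` before `GIT_COMMITTER_EMAIL` and ignoring blank values are assumptions the text does not spell out.

```python
def resolve_owner(env, git_config_email=None):
    """Pick the owner email using the documented precedence."""
    # Assumption: author email is consulted before committer email.
    for key in ("CRABBOX_OWNER", "GIT_AUTHOR_EMAIL", "GIT_COMMITTER_EMAIL"):
        value = env.get(key, "").strip()
        if value:
            return value
    # Last resort: the value of `git config user.email`, if any.
    return (git_config_email or "").strip() or None


def identity_headers(env, git_config_email=None):
    """Build the owner/org headers sent with shared bearer-token requests."""
    headers = {}
    owner = resolve_owner(env, git_config_email)
    if owner:
        headers["X-Crabbox-Owner"] = owner
    org = env.get("CRABBOX_ORG", "").strip()
    if org:
        headers["X-Crabbox-Org"] = org
    return headers
```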
Raw Cloudflare Access identity headers are ignored. Only a verified Access
JWT email (with the JWT validated against the Cloudflare team's public
keys) can become the bearer-token owner.
## Exit Codes
```text
0 identity resolved successfully
2 broker URL or token missing
3 auth failure (token rejected, GitHub org membership missing, etc.)
```
Use `whoami` in CI scripts before any long workflow to fail fast on auth
issues.
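A fail-fast preflight could be wrapped in a small Python helper like the one below. The exit-code meanings are taken from the table above; the helper names are hypothetical, and the sketch assumes `crabbox` is on `PATH`.

```python
import subprocess

# Exit-code meanings as documented in the table above.
WHOAMI_EXIT_MEANINGS = {
    0: "identity resolved successfully",
    2: "broker URL or token missing",
    3: "auth failure",
}


def describe_whoami_exit(code):
    """Translate a whoami exit code into a short human-readable message."""
    return WHOAMI_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def preflight(argv=("crabbox", "whoami", "--json")):
    """Run whoami and return (ok, message) without raising on auth failure."""
    proc = subprocess.run(list(argv), capture_output=True, text=True)
    return proc.returncode == 0, describe_whoami_exit(proc.returncode)
```

A CI job would call `preflight()` first and stop with a nonzero status when `ok` is false, before any lease is requested.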
Related docs:
- [login](login.md)
- [logout](logout.md)
- [Auth and admin](../features/auth-admin.md)
- [Broker auth and routing](../features/broker-auth-routing.md)

256
docs/concepts.md Normal file

@ -0,0 +1,256 @@
# Concepts
Read when:
- you encounter a Crabbox term you do not recognize;
- you are writing docs and want to stay consistent with existing usage;
- you need a single page that lays out the vocabulary.
This page is a glossary. It defines the nouns and the verbs Crabbox uses
across the CLI, broker, providers, and docs. When two synonyms exist, the
preferred form is in **bold**.
## Compute Vocabulary
**Lease** - a time-bounded reservation of a remote runner that Crabbox
created or resolved. Has a canonical ID (`cbx_...`), a friendly slug, an
idle timeout, a TTL, and a state (`active`, `released`, `expired`,
`failed`). Leases are the unit of cost accounting and cleanup.
**Runner** - the remote machine itself. Provisioned by the provider,
prepared by cloud-init, used for one or more leases. Crabbox does not
distinguish between a Hetzner cloud server, an AWS EC2 instance, and a
static SSH host beyond what the provider backend tells it - all are
runners.
**Box** / **Testbox** - informal synonyms for runner. Used in the README and
some early docs. Prefer "runner" in new docs unless the surrounding context
is talking about leases as a product (in which case "box" reads better).
**Pool** - the set of currently active runners visible to a user, org, or
the whole fleet. `crabbox list` and `/v1/pool` both expose it.
**Slug** - the friendly name for a lease. Looks like `blue-lobster`.
Generated from a stable hash of the lease ID; collisions append a 4-hex
suffix. See [Identifiers](features/identifiers.md).
**Lease ID** - the canonical machine-friendly identifier
(`cbx_abcdef123456`). Used in labels, logs, and APIs. Always 16 chars.
**Run** - a single `crabbox run` invocation against a coordinator. Has a
`run_...` ID, an owning lease, a command, an exit code, and a record in
coordinator history.
## Roles
**CLI** - the local Go binary `crabbox`. Owns config, sync, command
execution, output streaming, and per-lease SSH keys. See
[Architecture](architecture.md).
**Broker** / **Coordinator** - the Cloudflare Worker plus Fleet Durable
Object. Owns provider credentials, lease state, expiry, cleanup alarms,
usage, and cost. Both terms are used interchangeably; "coordinator" is
preferred in feature docs that emphasize state, "broker" when emphasizing
the trust boundary between CLI and provider.
**Provider** - a Crabbox component that knows how to acquire, resolve,
list, and release runners on a backing service. Built-in providers: AWS,
Hetzner, Static SSH, Blacksmith Testbox, Daytona, Islo. See
[Provider reference](providers/README.md).
**Backend** - the Go interface a provider implements:
`SSHLeaseBackend` for providers that hand Crabbox a real SSH target,
`DelegatedRunBackend` for providers that own command execution
themselves. See [Provider backends](provider-backends.md).
**Operator** - a person with broker-side access (admin token, Cloudflare
config). Operators run `crabbox admin` commands and image bake/promote
flows.
**Agent** - an LLM-backed process invoking Crabbox through the CLI or the
OpenClaw plugin. Agents are first-class users of Crabbox; the docs
intentionally write for both humans and agents.
## Modes
**Brokered mode** / **coordinator mode** - the normal path, where the CLI
talks to the Cloudflare Worker for lease creation, lease state, and
cleanup. Provider secrets stay broker-side. Used for shared team
infrastructure.
**Direct mode** / **direct-provider mode** - the local-debug fallback, where
the CLI talks straight to the provider API (AWS SDK, Hetzner API, Daytona
SDK, Islo SDK). No coordinator, no central history, no spend caps. Use
when you are debugging the broker itself.
**Static mode** - lease behavior for `provider: ssh`. The host is
operator-owned; Crabbox does not provision or delete it. Bypasses both
broker and direct provisioning paths.
**Delegated mode** - the path used by Blacksmith, Islo, and the Daytona
`run` flow. The provider owns command execution and streams output back to
Crabbox. Crabbox-owned sync (`--sync-only`, `--checksum`) is rejected;
sync timing reports `sync=delegated`.
## Commands
**warmup** - acquire a lease and keep it ready. No command runs yet.
**run** - acquire or reuse a lease, sync, run a command, stream output,
release.
**stop** - release a specific lease and delete its provider resources.
**cleanup** - sweep direct-provider leftovers based on labels. Refuses
when a coordinator is configured.
**reuse** - using `--id` (or a slug) to pick an existing lease instead of
creating a new one. Both `warmup` (idempotent) and `run` accept `--id`.
**reclaim** - move a local claim from one repo to another so a lease
created in repo A can be reused from repo B. Required because Crabbox
binds leases to repos by default.
**hydrate** - prepare a runner with project dependencies, usually by
dispatching a real GitHub Actions job that registers an ephemeral
self-hosted runner. The CLI then runs the local command in the hydrated
workspace. See [Actions hydration](features/actions-hydration.md).
## State
**Idle timeout** - the duration a lease may go without heartbeats before
the broker auto-releases it. Default 30m. Reset by every heartbeat or
explicit touch.
**TTL** - the absolute maximum wall-clock lifetime of a lease. Default
90m. Cannot be extended by heartbeats. `expiresAt = min(createdAt + ttl,
lastTouchedAt + idleTimeout)`.
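The expiry formula can be checked with a few lines of Python. The defaults mirror the 30m idle timeout and 90m TTL stated in this section; the function and constant names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_IDLE_TIMEOUT = timedelta(minutes=30)
DEFAULT_TTL = timedelta(minutes=90)


def expires_at(created_at, last_touched_at,
               ttl=DEFAULT_TTL, idle_timeout=DEFAULT_IDLE_TIMEOUT):
    """expiresAt = min(createdAt + ttl, lastTouchedAt + idleTimeout)."""
    return min(created_at + ttl, last_touched_at + idle_timeout)
```

For a lease created at 12:00, a heartbeat at 12:10 moves the idle deadline to 12:40, while a heartbeat at 13:20 cannot push expiry past the 13:30 TTL limit.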
**Heartbeat** - a `POST /v1/leases/{id}/heartbeat` call sent by the CLI
during long-running commands. Updates `lastTouchedAt`, can ship telemetry
samples, and can update idle timeout when explicitly requested.
**Touch** - lower-level term for updating a lease's state and idle timer.
The provider's `Touch` method handles direct-provider state updates;
heartbeat is the brokered equivalent.
**Reserved cost** - the worst-case TTL cost the broker reserves for a
lease at creation time (`hourlyRate × ttl`). Charged against the monthly
spend cap until the lease ends; freed on release. Distinct from elapsed
runtime cost, which is reported by `crabbox usage`.
**Estimated cost** - elapsed-runtime cost for a lease, computed from the
hourly rate and the time spent in `active`. What `crabbox usage` reports
as a billing approximation.
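The difference between the two cost figures is easiest to see side by side. The sketch below uses hypothetical function names and expresses TTL in hours and active time in seconds for convenience; any rate passed in is illustrative, not real provider pricing.

```python
def reserved_cost(hourly_rate, ttl_hours):
    """Worst-case cost held against the spend cap: hourlyRate x ttl."""
    return hourly_rate * ttl_hours


def estimated_cost(hourly_rate, active_seconds):
    """Elapsed-runtime cost based on time spent in the active state."""
    return hourly_rate * active_seconds / 3600
```

With an illustrative rate of 0.40 per hour and the default 90m TTL, the reservation is about 0.60, while a lease released after 20 minutes of activity has an estimated cost of roughly 0.13.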
## Sync
**Manifest** - the NUL-delimited list of paths Crabbox will sync, built
from `git ls-files --cached` and `git ls-files --others --exclude-standard`.
**Fingerprint** - a hash of the commit, dirty file metadata, and manifest.
When the local fingerprint matches the remote one, Crabbox skips rsync.
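A minimal Python sketch of the manifest and fingerprint ideas follows. It assumes the NUL-delimited form comes from Git's `-z` output, and it picks SHA-256 and a `(path, size, mtime)` tuple for dirty-file metadata purely for illustration; Crabbox's actual hash inputs and encoding are not specified here.

```python
import hashlib


def parse_manifest(raw):
    """Split NUL-delimited path output (bytes) into a list of paths."""
    return [p.decode("utf-8", "surrogateescape") for p in raw.split(b"\0") if p]


def fingerprint(commit, dirty_metadata, manifest_paths):
    """Hash the commit, dirty-file metadata, and manifest into one digest.

    dirty_metadata: iterable of (path, size, mtime_ns) tuples (an assumption).
    """
    h = hashlib.sha256()
    h.update(commit.encode() + b"\0")
    for path, size, mtime_ns in sorted(dirty_metadata):
        h.update(f"{path}\0{size}\0{mtime_ns}\0".encode("utf-8", "surrogateescape"))
    h.update(b"\0")  # separator between the metadata and manifest sections
    for path in sorted(manifest_paths):
        h.update(path.encode("utf-8", "surrogateescape") + b"\0")
    return h.hexdigest()
```

Sorting both inputs makes the digest independent of listing order, so an unchanged tree yields the same fingerprint on every run.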
**Git seeding** - the optional first-sync step where Crabbox fetches the
configured origin/base ref into the runner's Git directory before rsync,
so changed-file diffs are available remotely.
**Base ref** - the Git ref that Crabbox seeds and hydrates. Default
`main`. Configurable per repo in `sync.baseRef`.
**Sanity check** - a guardrail run after rsync that detects mass tracked
deletions, missing manifest entries, and other suspicious sync outcomes.
## Capabilities
**Desktop** - lease capability that adds Xvfb + XFCE + x11vnc. Required
for `crabbox vnc`, `crabbox webvnc`, and most `--browser` UI runs.
**Browser** - lease capability that installs Chrome/Chromium and exports
`BROWSER`/`CHROME_BIN`. Useful for Playwright/Vitest/etc. without a full
QA harness.
**Code** - lease capability that installs code-server bound to loopback.
Used by `crabbox code` and the portal `/code/` bridge.
**Tailscale** - optional reachability layer for managed Linux leases.
Joins the lease to the configured tailnet so clients on the tailnet can
reach the runner without the public IP. Distinct from the network mode
(`--network tailscale`) that selects which plane the CLI uses.
## Backplane
**Durable Object** - the Cloudflare Worker primitive that holds Crabbox
fleet state. Crabbox uses one fleet Durable Object so all scheduling
decisions are serialized.
**Alarm** - the Durable Object scheduling primitive that fires on a future
timestamp. Crabbox uses alarms for idle-timeout sweeps and TTL cleanup.
**Portal** - the server-rendered web UI hosted by the same Worker. Pages
under `/portal/...`. See [Browser portal](features/portal.md).
**Bridge** - a portal endpoint that proxies traffic to a loopback service
on the lease (VNC, code-server). Bridges authenticate against the portal
session, then talk to the lease over the internal SSH plane.
## Identity
**Owner** - the email address that owns a lease. Resolved from the signed
GitHub login token, `CRABBOX_OWNER`, Git env, or `git config user.email`.
**Org** - the GitHub-style organization namespace for a lease. Resolved
from the signed token or `CRABBOX_ORG`. Used for usage scoping and
multi-tenant cost caps.
**Allowed org** - the GitHub org membership the broker requires before
issuing a signed login token. Configured per Cloudflare Worker.
**Admin token** - the separately scoped token required for `/v1/pool`,
admin lease routes, and fleet-wide listing. Held more closely than the
shared automation token.
**Cloudflare Access** - optional protection layer in front of the Worker.
When configured, the Worker only trusts the `CF-Access-Jwt-Assertion`
header (verified upstream); raw identity headers from the client are
ignored.
## Storage
**State directory** - where the CLI keeps local state (claims, per-lease
keys, known_hosts). Defaults to `$XDG_STATE_HOME/crabbox`, falling back to
the platform-specific user config directory.
**Claim** - a JSON file under the state directory binding a lease to a
repo. Required for `crabbox run --id` to resolve slugs and to refuse
cross-repo reuse without `--reclaim`.
**Workdir** / **work root** - the directory on the runner where Crabbox
syncs the repo. Default `/work/crabbox` on Linux; provider-specific on
Windows and macOS.
## Documentation
**Source map** - the doc page that points each user-facing behavior at the
implementation file behind it. Updated when behavior changes. See
[Source map](source-map.md).
**Feature page** - a doc under `docs/features/<name>.md` describing what
Crabbox does in one capability area. Owns the conceptual story; commands
and providers cross-link from here.
**Command page** - a doc under `docs/commands/<name>.md` describing the
flags, behavior, and exit codes of one CLI command. One per top-level
command, kept in sync with `--help` by `scripts/check-command-docs.mjs`.
**Provider page** - a doc under `docs/providers/<name>.md` describing one
provider's targets, config keys, env vars, sync behavior, and expected
failures.
Related docs:
- [How Crabbox Works](how-it-works.md)
- [Architecture](architecture.md)
- [CLI](cli.md)
- [Configuration](features/configuration.md)
- [Provider backends](provider-backends.md)


@ -8,39 +8,62 @@ Read when:
- you are deciding where a behavior belongs;
- you need the feature-level contract before changing code.
Core features:
## Foundations
- [Configuration](configuration.md): precedence, YAML schema, profiles, classes, env vars.
- [Identifiers](identifiers.md): lease IDs, slugs, run IDs, claims, and how lookup resolves.
- [Doctor checks](doctor.md): what `crabbox doctor` validates and how to extend it.
- [Network and reachability](network.md): `--network auto|tailscale|public`, port fallback, public/tailnet planes.
- [Lease capabilities](capabilities.md): `--desktop`, `--browser`, and `--code` selection rules.
- [Environment forwarding](env-forwarding.md): name-based env allowlist for the remote command.
## Brokered fleet
- [Coordinator](coordinator.md): brokered leases through Cloudflare Workers and Durable Objects.
- [Broker auth and routing](broker-auth-routing.md): GitHub login, shared bearer tokens, optional Cloudflare Access, and Worker routes.
- [Browser portal](portal.md): authenticated lease/run UI, detail pages, bridge routes, and runner visibility.
- [Broker auth and routing](broker-auth-routing.md): GitHub login, shared bearer tokens, optional Cloudflare Access, and Worker routes.
- [Auth and admin](auth-admin.md): login/logout/whoami and trusted operator controls.
- [Telemetry](telemetry.md): lightweight Linux load, memory, disk, uptime, and run resource samples.
- [History and logs](history-logs.md): coordinator run records, events, and retained remote output.
- [Cost and usage](cost-usage.md): guardrails, provider-backed pricing, and reporting.
- [Lifecycle cleanup](lifecycle-cleanup.md): release, expiry, keep mode, and direct cleanup.
## Providers
- [Providers](providers.md): provider overview, target matrix, classes, and fallback.
- [Provider backends](../provider-backends.md): implementation guide for adding a new provider/backend/plugin.
- [Provider authoring](provider-authoring.md): step-by-step guide for adding a provider package.
- [Capacity and fallback](capacity-fallback.md): class chains, market spot/on-demand, region/AZ routing.
- [Provider backends](../provider-backends.md): contract reference for backend interfaces and registration.
- [Authoring a provider](provider-authoring.md): step-by-step guide to writing a new provider.
- [AWS](aws.md): EC2 Linux, Windows, WSL2, EC2 Mac, capacity, AMIs, and security groups.
- [Hetzner](hetzner.md): Linux-only managed Hetzner behavior, classes, and cleanup.
- [Blacksmith Testbox](blacksmith-testbox.md): delegated Testbox backend behavior.
- [Daytona](daytona.md): Daytona SDK/toolbox sandbox leases with optional short-lived SSH access.
- [Islo](islo.md): delegated Islo sandbox runs using the Islo Go SDK.
## Runners and reachability
- [Tailscale](tailscale.md): optional tailnet reachability for managed Linux leases and static hosts.
- [Runner bootstrap](runner-bootstrap.md): cloud-init, installed tools, SSH port, and readiness.
- [Prebaked runner images](prebaked-images.md): provider-owned image storage and the image/cache/state boundary.
- [Image bake runbook](image-bake-runbook.md): exact AWS bake, candidate smoke, promotion, rollback, and cleanup flow.
- [SSH keys](ssh-keys.md): per-lease keys, provider key cleanup, and local storage.
## Sync, run, and recording
- [Sync](sync.md): Git file-list manifests, rsync, fingerprints, excludes, guardrails, and sanity checks.
- [Actions hydration](actions-hydration.md): let GitHub Actions prepare a runner, then sync local work into that workspace.
- [Interactive desktop and VNC](interactive-desktop-vnc.md): VNC hub, support matrix, tunnel model, and QA boundaries.
- [Linux VNC](vnc-linux.md), [Windows VNC](vnc-windows.md), [macOS VNC](vnc-macos.md): OS-specific desktop setup and troubleshooting.
- [SSH keys](ssh-keys.md): per-lease keys, provider key cleanup, and local storage.
- [Cost and usage](cost-usage.md): guardrails, provider-backed pricing, and reporting.
- [History and logs](history-logs.md): coordinator run records, events, and retained remote output.
- [Telemetry](telemetry.md): lightweight Linux load, memory, disk, uptime, and run resource samples.
- [Test results](test-results.md): JUnit summaries attached to recorded runs.
- [Cache controls](cache.md): inspect, purge, and warm remote package/build caches.
- [Auth and admin](auth-admin.md): login/logout/whoami and trusted operator controls.
- [Lifecycle cleanup](lifecycle-cleanup.md): release, expiry, keep mode, and direct cleanup.
## Integrations
- [OpenClaw plugin](openclaw-plugin.md): agent tools that wrap the CLI.
- [Repository onboarding](repository-onboarding.md): `crabbox init`, repo config, workflow stub, and agent skill.
- [Source map](../source-map.md): implementation files behind documented behavior.
Command docs:
## Command docs
- [doctor](../commands/doctor.md)
- [init](../commands/init.md)


@ -0,0 +1,191 @@
# Lease Capabilities
Read when:
- adding `--desktop`, `--browser`, or `--code` to a workflow;
- changing how Crabbox detects whether a lease can host a visible desktop;
- adding a new lease capability flag.
Lease capabilities are opt-in features that change what a managed runner can
do beyond running headless commands. They are a separate concept from the
provider feature set declared in `ProviderSpec.Features`: the feature set
says "this provider can support a desktop"; the lease capability says "this
lease was created with a desktop and exposes one right now".
## The Three Capabilities
```text
--desktop visible desktop with a loopback VNC server
--browser Chrome/Chromium installed and exported via $BROWSER and $CHROME_BIN
--code code-server bound to a loopback port for portal/code bridging
```
All three default to off. They have to be requested at lease creation time
(`crabbox warmup --desktop`) and reused afterwards. A lease created without a
capability cannot grow it later.
## Selection And Validation
Capability flags follow a two-step validation:
1. **Provider feature check.** When the user sets a capability flag,
`validateRequestedCapabilities` looks up the selected provider's
`Spec.Features` and rejects the request if the matching feature
(`FeatureDesktop`, `FeatureBrowser`, `FeatureCode`) is missing. Hetzner
Linux supports all three; Blacksmith Testbox supports none.
2. **Lease label check.** When reusing a lease (`--id`),
`enforceManagedLeaseCapabilities` checks the matching label
(`desktop=true`, `browser=true`, `code=true`) on the existing lease. If
the label is missing, Crabbox refuses with a hint to warm a new lease.
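The two checks can be modeled in a few lines of Python. This is a simplified sketch, not a port of `validateRequestedCapabilities` or `enforceManagedLeaseCapabilities`: the Python names, the exception type, and the error wording are all hypothetical.

```python
class CapabilityError(Exception):
    """Raised when a requested capability cannot be honored."""


def validate_requested(requested, provider_features):
    """Step 1: every requested capability must be a provider feature."""
    missing = [c for c in requested if c not in provider_features]
    if missing:
        raise CapabilityError(f"provider does not support: {', '.join(missing)}")


def enforce_lease_labels(requested, lease_labels):
    """Step 2: a reused lease must carry a `<capability>=true` label."""
    missing = [c for c in requested if lease_labels.get(c) != "true"]
    if missing:
        raise CapabilityError(
            f"lease lacks {', '.join(missing)}; warm a new lease with those flags"
        )
```

Step 1 runs whenever a capability flag is set; step 2 only applies on `--id` reuse of a managed lease.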
For static SSH targets, label enforcement is skipped because Crabbox does not
own the host. The capability is detected probe-by-probe instead - `--desktop`
on a static target probes the loopback VNC port; `--browser` on a static
target probes for Chrome and exports `BROWSER`/`CHROME_BIN` from what it
finds.
`--code` is currently restricted to managed Linux leases. The validator
rejects it for Windows, macOS, and static SSH.
## Desktop
When a managed Linux lease is created with `--desktop`, bootstrap installs:
- Xvfb (virtual framebuffer);
- a slim XFCE session;
- x11vnc bound to `127.0.0.1:5900`;
- a randomized VNC password at `/var/lib/crabbox/vnc.password`;
- screenshot tooling (`scrot`) and ffmpeg.
`crabbox vnc --id ...` opens an SSH tunnel to that loopback port. The user's
local VNC viewer talks through the tunnel and uses the password the CLI
fetches from `/var/lib/crabbox/vnc.password`. There is no public VNC port; the
loopback bind is the security boundary.
Static targets must already expose loopback VNC at `127.0.0.1:5900`. macOS
hosts can enable Screen Sharing; Windows hosts need a VNC server bound to
loopback (TightVNC works).
For per-OS detail and known limits, see:
- [Linux VNC](vnc-linux.md);
- [Windows VNC](vnc-windows.md);
- [macOS VNC](vnc-macos.md);
- [Interactive desktop and VNC](interactive-desktop-vnc.md).
When the run injects environment, Crabbox also sets:
```text
DISPLAY=:99
CRABBOX_DESKTOP=1
```
Tools that respect `DISPLAY` will draw onto the desktop the lease created.
## Browser
`--browser` adds a usable browser to the lease without dragging in a full QA
test environment.
On managed Linux:
- Google Chrome stable when available;
- Chromium fallback;
- native addon build helpers (`build-essential`, `libgbm-dev`,
`libnss3-dev`, etc.) so dependency installs that compile against Chromium
succeed.
On static targets, Crabbox probes for an existing browser and reports an
error if none is found. `requestedCapabilityEnv` shells out to the host:
- macOS: `/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`;
- Windows: `chrome.exe` or `msedge.exe` from PATH or the standard install
directories;
- Linux: `$BROWSER`, `$CHROME_BIN`, then `google-chrome`, `chromium`, or
`chromium-browser` from PATH.
The detected path is exported into the run as:
```text
BROWSER=/path/to/browser
CHROME_BIN=/path/to/browser
CRABBOX_BROWSER=1
```
Test runners that read `BROWSER` or `CHROME_BIN` (Vitest, Playwright, etc.)
work without extra plumbing. If a browser is requested but no binary is
found, the run aborts before the command starts.
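The Linux search order and the exported variables can be sketched as below. The function names are hypothetical, and the injectable `which` parameter exists only to make the sketch easy to exercise; the macOS and Windows lookups are omitted.

```python
import shutil

# PATH candidates in the order listed above for Linux hosts.
LINUX_CANDIDATES = ("google-chrome", "chromium", "chromium-browser")


def find_linux_browser(env, which=shutil.which):
    """Return the first browser found via $BROWSER, $CHROME_BIN, then PATH."""
    for var in ("BROWSER", "CHROME_BIN"):
        if env.get(var):
            return env[var]
    for name in LINUX_CANDIDATES:
        path = which(name)
        if path:
            return path
    return None


def browser_env(path):
    """Build the variables exported into the run, or abort if none was found."""
    if path is None:
        raise RuntimeError("browser requested but no binary found")
    return {"BROWSER": path, "CHROME_BIN": path, "CRABBOX_BROWSER": "1"}
```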
## Code
`--code` provisions code-server on managed Linux leases:
- installs the binary at `/usr/local/bin/code-server`;
- binds to a loopback port (default `8080`);
- generates an auth token stored in coordinator state.
The portal and `crabbox code --id ...` open a code-server tab through the
authenticated portal bridge at `/portal/leases/{id-or-slug}/code/`. The bridge
proxies HTTP and WebSocket traffic to the loopback port; the code-server
auth token is injected by the bridge so the user does not see it. There is no
public code-server port.
Code is managed-Linux-only because the bridge depends on the lease shape and
the cloud-init that prepares the binary. Windows, macOS, and static SSH are
intentionally not supported today.
## Capability Labels
Managed lease records carry capability labels so list, status, and detail
pages can render the capability matrix without re-probing the host:
```text
desktop=true|false
browser=true|false
code=true|false
```
`enforceManagedLeaseCapabilities` reads these labels to gate `--desktop`,
`--browser`, and `--code` on `--id` reuse paths. The labels are written when
the lease is created and never flipped on a live lease.
## Composing Capabilities
Capabilities are independent - any combination is allowed where the
provider supports them:
```sh
crabbox warmup --desktop # desktop only
crabbox warmup --desktop --browser # browser running on the desktop
crabbox warmup --desktop --browser --code # full interactive box
crabbox warmup --browser # headless browser, no VNC
crabbox warmup --code # editor-only Linux lease
```
Capability bootstrap adds installation time. A bare lease is the fastest to
warm; a lease with all three takes the longest. Use the lightest combination
that satisfies the workflow.
## Static Targets
For static SSH hosts, capability validation degrades to probe-based detection:
- `--desktop`: probe `127.0.0.1:5900` over SSH; fail with a clear error if
the port is not bound;
- `--browser`: probe for a browser binary using the OS-specific search list;
fail if none found;
- `--code` is rejected (managed Linux only).
This is intentional. Crabbox is not responsible for installing software on
operator-owned static hosts; if the box does not expose the capability, the
run should not silently fall back.
Related docs:
- [warmup command](../commands/warmup.md)
- [run command](../commands/run.md)
- [vnc command](../commands/vnc.md)
- [webvnc command](../commands/webvnc.md)
- [code command](../commands/code.md)
- [Interactive desktop and VNC](interactive-desktop-vnc.md)
- [Browser portal](portal.md)


@ -0,0 +1,215 @@
# Capacity And Fallback
Read when:
- adding or changing machine classes;
- debugging "why did Crabbox pick this instance type?";
- working on AWS spot/on-demand fallback or Hetzner location fallback;
- configuring multi-region or multi-AZ capacity for AWS.
Crabbox cares about capacity in three ways:
1. **Class fallback** - the ordered list of provider types that satisfy a
class request.
2. **Market fallback** - AWS-specific Spot to On-Demand failover within a
class.
3. **Region/AZ routing** - where the broker tries to provision when capacity
is tight in a single zone.
Hetzner only deals with class fallback. AWS deals with all three. Static
SSH, Blacksmith, Daytona, and Islo do not have capacity fallback because
the operator or external service controls the underlying resources.
## Classes
Class names are provider-agnostic intent labels:
```text
standard typical CI lane
fast ~2x more cores than standard for parallel-friendly suites
large memory-heavy or many-process workloads
beast maximum capacity within the provider's burstable family
```
Each provider maps the four class names to an ordered list of concrete
instance types. The list is the fallback chain: try the first; if rejected,
try the second; and so on.
The full Hetzner and AWS class tables live in
[Providers](providers.md#hetzner-summary). That page also lists the AWS
Windows, Windows WSL2, and macOS class maps.
## When Class Fallback Triggers
Hetzner falls back when:
- the requested server type is unavailable in the configured location;
- the project quota rejects the request;
- the API returns a transient capacity error.
AWS falls back when:
- the instance type is rejected by capacity in the chosen Availability Zone;
- the account policy denies the type (e.g. quota = 0 vCPUs);
- the spot request is rejected by capacity.
Quota rejections are detected from the API error code rather than scraped
from the message string, so the fallback is deterministic. The next
candidate in the chain is tried until either one succeeds or the chain is
exhausted.
When the chain is exhausted, Crabbox returns exit code 4 (`no capacity`) and
the error includes `provisioningAttempts` that record which types were
tried, why each failed, and where (region/AZ for AWS). The same metadata is
attached to the failed lease record on the coordinator so operators can
inspect what went wrong without rerunning the workflow.
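The fallback loop can be modeled as the following Python sketch. The `try_provision` callback, the attempt record shape, and the exception class are hypothetical stand-ins; only the "try in order, record every attempt, exit 4 when exhausted" behavior comes from the text above.

```python
class NoCapacity(Exception):
    """Raised when every candidate in the chain was rejected."""

    exit_code = 4

    def __init__(self, attempts):
        super().__init__("no capacity")
        self.provisioning_attempts = attempts


def provision_with_fallback(candidates, try_provision):
    """Try each instance type in order until one succeeds.

    try_provision(instance_type) must return (ok, reason).
    """
    attempts = []
    for instance_type in candidates:
        ok, reason = try_provision(instance_type)
        attempts.append({"type": instance_type, "ok": ok, "reason": reason})
        if ok:
            return instance_type, attempts
    raise NoCapacity(attempts)
```

Passing a single-element candidate list models the explicit-type case: one rejection immediately exhausts the chain.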
## Explicit Type Override
`--type c7a.16xlarge` and the matching `type:` config key skip the class
fallback chain and request that specific instance type. The contract is
"give me this exact type, not a fallback". If the provider rejects it,
Crabbox fails loudly with exit code 4 and does not silently choose a
different type.
Use `--type` when:
- you want deterministic capacity for benchmarks;
- you are pinning a specific generation for a known-bug workaround;
- you are debugging the capacity layer itself.
For everything else, prefer a class - the fallback chain handles transient
rejections without operator intervention.
## AWS Market Fallback
AWS supports two markets: `spot` and `on-demand`.
```yaml
capacity:
market: spot
fallback: on-demand-after-120s
```
`capacity.market: spot` requests Spot capacity first. `capacity.fallback:
on-demand-after-120s` falls back to On-Demand for the same instance type
when Spot fails to come up within 120 seconds. Set `fallback` to `none` (or
omit it) to never fall back to On-Demand.
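As a rough sketch of how a `fallback` value could be interpreted: the helper below is hypothetical, and it assumes the number in `on-demand-after-120s` is a configurable count of seconds. The text only shows the `120s` form and `none`.

```python
import re


def parse_market_fallback(value):
    """Return the Spot wait in seconds before On-Demand, or None for no fallback."""
    if value is None or value in ("", "none"):
        return None  # omitted or "none": never fall back to On-Demand
    match = re.fullmatch(r"on-demand-after-(\d+)s", value)
    if match is None:
        raise ValueError(f"unsupported capacity.fallback value: {value!r}")
    return int(match.group(1))
```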
Per-command overrides:
```sh
crabbox warmup --market spot
crabbox run --market on-demand -- pnpm test
```
The `--market` flag overrides `capacity.market` for one lease without
rewriting repo config. Use it when an account is temporarily out of Spot
quota or when Spot interruption rates spike.
## AWS Capacity Hints
The brokered AWS path uses Service Quotas and EC2 placement scoring to
preflight large requests:
```yaml
capacity:
  hints: true
  largeClasses:
    - large
    - beast
```
When `hints: true` and the class is in `largeClasses`:
- the broker calls Service Quotas to check applied Spot or On-Demand vCPU
limits;
- candidates that exceed quota are recorded as quota attempts and skipped;
- remaining candidates are scored with `GetSpotPlacementScores` (Spot mode)
to pick the most-available region/AZ.
The result is a single provisioning attempt that picks the best location
and skips known-rejected types instead of letting the chain stumble through
them sequentially.
Hints apply only on the brokered (Worker) path. Direct AWS mode still falls
back through the class chain but does not run quota or placement preflight.
## Region And Availability Zone Routing
```yaml
capacity:
  regions:
    - eu-west-1
    - us-east-1
  availabilityZones:
    - eu-west-1a
    - eu-west-1b
```
`regions` is the ordered list of AWS regions the broker considers when
multiple regions are configured. Single-region setups use `aws.region` and
leave `capacity.regions` empty; multi-region setups list every region the
broker may launch into.
`availabilityZones` narrows the per-region zone selection. The broker uses
Spot placement scoring across the listed AZs and picks the highest-scoring
zone that has capacity.
Regions are tried in order; AZs within a region are scored. If every AZ in
a region rejects the request, Crabbox advances to the next region.
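The routing rule (regions in order, AZs by score) can be illustrated with a small sketch. `pick_location`, `score`, and `has_capacity` are hypothetical stand-ins; in the real broker the scores would come from Spot placement scoring.

```python
def pick_location(regions, zones_by_region, score, has_capacity):
    """Walk regions in order; within a region try AZs from highest score down.

    score(zone) and has_capacity(zone) are hypothetical callbacks.
    Returns (region, zone), or None when every zone in every region rejects.
    """
    for region in regions:
        zones = sorted(zones_by_region.get(region, []), key=score, reverse=True)
        for zone in zones:
            if has_capacity(zone):
                return region, zone
        # Every AZ in this region rejected the request: advance to the next.
    return None
```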
## Fallback Strategies
```yaml
capacity:
  strategy: most-available
```
| Value | Behavior |
|:------|:---------|
| `most-available` (default) | use placement scoring or class chain order |
| `cheapest` | prefer types with the lowest live hourly price (when known) |
| `provider-default` | follow the provider's own placement defaults |
`cheapest` is currently honored on the brokered AWS path that has live
pricing. Hetzner does not differentiate strategies because its server-type
prices are consistent across locations.
## Direct Mode Differences
Direct provider mode (no coordinator) supports class fallback but has no
quota preflight, no placement score, no `provisioningAttempts` metadata, and
no central history. Direct AWS still respects `--market` and the `fallback`
config key, so spot-to-on-demand failover works locally - just without the
diagnostic richness the broker provides.
If a direct AWS run exits with code 4, run the same command through the
broker once to get structured `provisioningAttempts` evidence; then go back
to direct mode for the rest of the iteration loop.
## Failure Surface
Capacity failures map to:
```text
exit 4 no capacity every candidate in the chain was rejected
exit 5 provisioning failed a candidate was accepted but never reached SSH
exit 8 lease expired long warmup exceeded the configured TTL before SSH
```
The accompanying error message names the chain, the markets that were
tried, and (for brokered runs) `provisioningAttempts` you can inspect with:
```sh
crabbox history --lease cbx_...
```
Related docs:
- [Providers](providers.md)
- [AWS](../providers/aws.md)
- [Hetzner](../providers/hetzner.md)
- [Cost and usage](cost-usage.md)
- [Orchestrator](../orchestrator.md)
- [Operations](../operations.md)


@ -0,0 +1,368 @@
# Configuration
Read when:
- adding a new config key, env override, or flag;
- debugging "why is Crabbox using value X here?";
- onboarding a repo and choosing what belongs in repo config vs user config;
- reviewing the YAML schema that `crabbox config show` and `crabbox init`
emit.
Crabbox configuration is layered. The CLI loads values from five sources and
merges them in a deterministic order. Each source is optional - the binary
boots with sane defaults for everything.
## Precedence
```text
flags > env > repo-local crabbox.yaml/.crabbox.yaml > user config > defaults
```
Layers are applied lowest precedence first: defaults, then user config, then
repo config, then env vars, then flags. Each layer replaces only the fields it
explicitly sets; unset fields fall through to the layer below.
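The layering rule can be sketched as a field-by-field merge. This is an illustration under stated assumptions, not the CLI's code: `merge_layer` and `resolve_config` are hypothetical names, `None` stands in for "unset", and nested maps are assumed to merge key by key rather than being replaced wholesale.

```python
def merge_layer(base, override):
    """Return base with every explicitly set field in override applied on top."""
    merged = dict(base)
    for key, value in override.items():
        if value is None:
            continue  # unset field: fall through to the lower layer
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_layer(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(defaults, user, repo, env, flags):
    """Apply the five layers from lowest to highest precedence."""
    config = defaults
    for layer in (user, repo, env, flags):
        config = merge_layer(config, layer)
    return config
```

For example, a repo layer that sets only `lease.ttl` leaves `lease.idleTimeout` from the defaults untouched, while a `--class` flag still wins over a `class` set in user config.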
`crabbox config show` prints the merged configuration as the CLI sees it after
all five layers run. `--json` is stable enough to diff in scripts.
`crabbox config path` prints the user config file path so other tools can
edit it without parsing prose.
## File Locations
```text
macOS user: ~/Library/Application Support/crabbox/config.yaml
Linux user: ~/.config/crabbox/config.yaml
XDG override: $XDG_CONFIG_HOME/crabbox/config.yaml
repo: ./crabbox.yaml or ./.crabbox.yaml at repo root
explicit: $CRABBOX_CONFIG (any path)
```
If `CRABBOX_CONFIG` is set, it overrides the repo-local search and replaces
the effective repo config. User config is never replaced by the env override.
State that does not belong in either YAML file:
- live lease records (those are coordinator-owned);
- per-lease SSH private keys (those live under the user config dir but not in
`config.yaml`);
- provider secrets (those live in the broker environment, your shell env, or
a credential manager).
## YAML Schema
The full schema below merges what `crabbox init` emits and what advanced
operators set in user config. Most repos only need a small subset.
### Top-level
```yaml
broker:
  url: https://crabbox.openclaw.ai
  provider: aws
  token: <signed-github-token-or-shared-token>
  access:
    clientId: <cloudflare-access-service-token-id>
    clientSecret: <cloudflare-access-service-token-secret>
provider: aws # default provider when --provider is not set
target: linux # default target OS
windows:
  mode: normal # normal or wsl2 when target=windows
profile: project-check
class: beast # standard | fast | large | beast
type: c7a.48xlarge # explicit provider type, overrides class fallback
network: auto # auto | tailscale | public
lease:
  idleTimeout: 30m
  ttl: 90m
```
### Capacity
```yaml
capacity:
  market: spot # spot | on-demand
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
  regions:
    - eu-west-1
    - us-east-1
  availabilityZones:
    - eu-west-1a
    - eu-west-1b
  largeClasses:
    - large
    - beast
```
### AWS
```yaml
aws:
  region: eu-west-1
  ami: ami-0123456789abcdef0
  securityGroupId: sg-0abcdef0123456789
  subnetId: subnet-0abcdef0123456789
  instanceProfile: crabbox-runner
  rootGB: 400
  sshCidrs:
    - 203.0.113.0/24
  macHostId: h-0123456789abcdef0
```
### Hetzner
Hetzner credentials and image come from broker-side config. Repos do not need
a `hetzner:` block unless they pin a class or location.
### Static SSH
```yaml
provider: ssh
target: macos
static:
  host: mac-studio.local
  user: steipete
  port: "22"
  workRoot: /Users/steipete/crabbox
```
### Blacksmith Testbox
```yaml
provider: blacksmith-testbox
blacksmith:
  org: openclaw
  workflow: .github/workflows/ci-check-testbox.yml
  job: test
  ref: main
  idleTimeout: 90m
  debug: false
```
### Daytona
```yaml
provider: daytona
daytona:
  snapshot: openclaw-crabbox
  apiKey: <daytona-api-key> # prefer DAYTONA_API_KEY env
```
### Sync
```yaml
sync:
  delete: true
  checksum: false
  gitSeed: true
  fingerprint: true
  baseRef: main
  timeout: 15m
  warnFiles: 50000
  warnBytes: 5368709120
  failFiles: 150000
  failBytes: 21474836480
  allowLarge: false
  exclude:
    - node_modules
    - .turbo
    - dist
```
A `.crabboxignore` file at the repo root appends to `sync.exclude`. See
[Sync](sync.md) for the matcher rules.
### Env Forwarding
```yaml
env:
  allow:
    - CI
    - NODE_OPTIONS
    - PROJECT_*
```
`env.allow` is name-based and supports trailing wildcards. Crabbox forwards
matching local env vars to the remote command. Secrets do not belong in
`env.allow`; pass them through provider-side mechanisms.
### Actions
```yaml
actions:
  workflow: .github/workflows/crabbox.yml
  job: test
  ref: main
  fields:
    - crabbox_docker_cache=true
  runnerLabels:
    - crabbox
  ephemeral: true
  runnerVersion: latest
```
### Cache
```yaml
cache:
  pnpm: true
  npm: true
  docker: true
  git: true
  maxGB: 80
  purgeOnRelease: false
```
### Results
```yaml
results:
  junit:
    - junit.xml
    - reports/junit.xml
```
### SSH
```yaml
ssh:
  key: ~/.ssh/id_ed25519
  user: crabbox
  port: "2222"
  fallbackPorts:
    - "22"
```
### Tailscale
```yaml
tailscale:
  enabled: false
  tags:
    - tag:crabbox
  hostnameTemplate: crabbox-{slug}
  authKeyEnv: CRABBOX_TAILSCALE_AUTH_KEY
  exitNode: ""
  exitNodeAllowLanAccess: false
```
## Profiles
Profiles are named bundles of config that get applied as a layer on top of
user/repo config. They live under a `profiles:` map and are selected by
`--profile` or `profile:` in repo config.
```yaml
profiles:
  project-check:
    class: beast
    sync:
      baseRef: main
    env:
      allow:
        - PROJECT_*
  smoke:
    class: standard
    lease:
      ttl: 30m
```
Use profiles when one repo has multiple test lanes with different machine
classes, sync rules, or env allowlists. A repo without profiles never needs
the block.
## Machine Classes
A machine class is a provider-agnostic name for "standard", "fast", "large",
or "beast" capacity. Each provider maps the class to a list of concrete
instance/server types and falls back through the list when the first
candidate cannot be provisioned.
| Class | Intent |
|:------|:-------|
| `standard` | typical CI lane |
| `fast` | roughly twice the cores of `standard`, for parallel-friendly suites |
| `large` | memory-heavy or many-process workloads |
| `beast` | maximum capacity within the provider's burstable family |
Class-to-type mappings live in [Providers](providers.md). When you set
`type:`, that exact provider type wins and the class is ignored. The
`--type` and `type:` paths intentionally do not fall back; they fail loud
if the provider rejects the type.
## Environment Variables
Every YAML key has a `CRABBOX_*` env override. The full list is in
[CLI](../cli.md#environment-variables). Common ones:
```text
CRABBOX_COORDINATOR
CRABBOX_COORDINATOR_TOKEN
CRABBOX_PROVIDER
CRABBOX_TARGET
CRABBOX_PROFILE
CRABBOX_DEFAULT_CLASS
CRABBOX_IDLE_TIMEOUT
CRABBOX_TTL
CRABBOX_NETWORK
CRABBOX_OWNER
CRABBOX_ORG
```
Provider credentials live outside the Crabbox env namespace because they are
provider-native:
```text
HCLOUD_TOKEN / HETZNER_TOKEN
AWS_PROFILE / AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN
DAYTONA_API_KEY / DAYTONA_JWT_TOKEN
BLACKSMITH_* (read by the Blacksmith CLI)
ISLO_API_KEY (read by the Islo SDK)
```
## What Belongs Where
| Setting | User config | Repo config | Profile | Notes |
|:--------|:------------|:------------|:--------|:------|
| `broker.url` and `broker.token` | yes | no | no | Per-machine identity. |
| `provider`, `class`, `type` | optional default | yes | yes | Per-repo defaults; profiles for lanes. |
| `sync.exclude`, `sync.fingerprint`, `sync.baseRef` | no | yes | yes | Lives with the repo. |
| `env.allow` | no | yes | yes | Repo decides what is safe to forward. |
| Per-user SSH key path | yes | no | no | Personal preference. |
| `aws.region`, `aws.ami` | optional | yes | yes | Repos can pin region. |
| Tailscale tags and template | yes | yes | yes | Both layers can set this. |
| Profiles | yes | yes | n/a | Either layer can define profiles. |
The rule of thumb: anything other repos should inherit when they clone goes in
repo config; anything tied to one operator's machine goes in user config.
## Validation
The CLI validates config eagerly:
- `parseNetworkMode` rejects `--network` values outside `auto|tailscale|public`;
- `validateNetworkConfig` requires `tailscale.tags` when `tailscale.enabled`
is true and rejects Tailscale on Blacksmith and static providers;
- `validateRequestedCapabilities` rejects `--desktop`, `--browser`, or
`--code` for providers whose `Spec.Features` does not list the matching
feature flag;
- `crabbox doctor` runs a richer set of checks against config, network
reachability, and SSH keys.
When validation fails, `crabbox` exits with code 2 and a message that names
the offending field.
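A minimal sketch of the network-related rules above, assuming the provider names used elsewhere on this page (`blacksmith-testbox`, `ssh`). `validate_network` is a hypothetical stand-in for what `parseNetworkMode` and `validateNetworkConfig` are described as doing, not their actual code.

```python
NETWORK_MODES = ("auto", "tailscale", "public")
NO_TAILSCALE_PROVIDERS = ("blacksmith-testbox", "ssh")  # Blacksmith and static


def validate_network(mode, provider, tailscale_enabled, tailscale_tags):
    """Return a list of error strings, each naming the offending field."""
    errors = []
    if mode not in NETWORK_MODES:
        errors.append(f"network: {mode!r} is not one of auto|tailscale|public")
    if tailscale_enabled:
        if not tailscale_tags:
            errors.append("tailscale.tags: required when tailscale.enabled is true")
        if provider in NO_TAILSCALE_PROVIDERS:
            errors.append(f"tailscale.enabled: not supported on provider {provider!r}")
    return errors
```

A caller would print the errors and exit with code 2 when the list is non-empty.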
Related docs:
- [CLI](../cli.md)
- [config command](../commands/config.md)
- [doctor command](../commands/doctor.md)
- [Sync](sync.md)
- [Providers](providers.md)
- [Capacity and fallback](capacity-fallback.md)
- [Network and reachability](network.md)

docs/features/doctor.md Normal file

@ -0,0 +1,172 @@
# Doctor Checks
Read when:
- adding a new precheck before users run long workflows;
- debugging an unexpected `doctor` failure;
- deciding whether a check belongs in `doctor` or somewhere else.
`crabbox doctor` is the local preflight. It validates the things that have
silently broken commands in the past so users get an answer before they
spend ten minutes on a failed lease.
The command is fast (under a second on a healthy machine), local-only,
non-destructive, and never talks to provider APIs that might cost money.
## Categories
Doctor groups checks under five categories:
```text
config config files load and parse, required keys are present
auth broker token is set, signed token is valid, identity resolves
network coordinator URL reachable, DNS works, SSH transport probes work
ssh SSH key path readable, key type acceptable, ssh-keygen on PATH
tools rsync, git, ssh, ssh-keygen present and executable
```
Each category emits one or more pass/fail/skip lines. Failures are listed
first; passes and skips follow in deterministic order so the output is
diffable across runs.
## What `config` Checks
- The user config file parses without error.
- The repo config (when present) parses without error.
- Provider name resolves through `ProviderFor`.
- Target OS is one of `linux`, `macos`, `windows`.
- Network mode is one of `auto`, `tailscale`, `public`.
- Tailscale config validates when `tailscale.enabled: true` (tags non-empty,
hostname template non-empty, exit-node-allow-lan-access requires an
exit node, target is `linux`, provider is not Blacksmith or static).
- Class is one of `standard`, `fast`, `large`, `beast` when set; explicit
`type:` values are accepted as-is.
## What `auth` Checks
- A broker URL is configured if the user expects coordinator mode.
- A broker token is present when the URL is configured.
- The signed token (when GitHub login was used) decodes and is not expired.
- Owner can be resolved from `CRABBOX_OWNER`, Git env, or
`git config user.email`.
- `whoami` succeeds against the configured coordinator with the stored
token.
When auth is missing, doctor prints `crabbox login` as the next step.
## What `network` Checks
- The coordinator URL resolves via DNS.
- The coordinator is reachable over HTTPS within a small timeout.
- When `--network tailscale` is configured, `tailscale status` reports a
joined client.
- SSH transport probes succeed for the primary port and fall back to the
configured fallback ports.
DNS is checked before HTTPS so a broken DNS responder does not look like a
broker outage.
## What `ssh` Checks
- The configured SSH key path (`ssh.key` or `CRABBOX_SSH_KEY`) is readable
when set.
- The key file has a sensible permissions mode (warn on group/world
readable).
- `ssh-keygen` is on PATH so per-lease key generation works.
- The user's `~/.ssh/known_hosts` is writable (if it exists).
When `ssh.key` is unset, doctor skips the path validation - per-lease keys
do not need a global key.
## What `tools` Checks
- `git` is on PATH.
- `rsync` is on PATH.
- `ssh` is on PATH.
- `ssh-keygen` is on PATH.
The check is path-based, not version-based. Crabbox tolerates any reasonably
modern version of these tools.
## What Doctor Does Not Do
Doctor stays local on purpose. It does not:
- start a real lease or provision a server;
- talk to AWS, Hetzner, Daytona, Islo, or any provider API;
- run `git ls-files` against the repo (that belongs in `crabbox sync-plan`);
- estimate costs;
- modify config or rotate keys.
Anything that costs money or has side effects belongs in a different
command. Doctor is for "before I run anything, is my machine sane?" and
should be safe to run from `pre-commit` hooks, agent boot, or CI smoke.
## Output Shape
```text
config:
ok user config: ~/.config/crabbox/config.yaml
ok repo config: ./.crabbox.yaml
ok provider: aws
ok target: linux
ok network: auto
auth:
ok broker: https://crabbox.openclaw.ai
ok owner: alex@example.com
ok org: openclaw
network:
ok coordinator dns
ok coordinator https
ssh:
ok ssh-keygen present
skip ssh.key unset (per-lease keys will be used)
tools:
ok git
ok rsync
ok ssh
ok ssh-keygen
```
Failures swap the leading `ok` for `fail` and add a remediation hint:
```text
auth:
fail broker token is missing - run `crabbox login`
```
Skips swap `ok` for `skip` and explain why the check did not run:
```text
network:
skip coordinator unconfigured (direct provider mode)
```
Exit code is `0` on full success, `2` on any failure. Skips do not change
the exit code.
## Adding A Check
Doctor checks live in `internal/cli/doctor.go`. Each check returns a
`doctorResult{ Status, Category, Subject, Detail, Remediation }`. The CLI
sorts results by category, then by subject, so output stays stable.
Rules for new checks:
- they must run in under ~100ms;
- they must not call out to a paid API or write any state;
- they must produce a `Remediation` string when they fail;
- they should `skip` (not `fail`) when the configuration genuinely does
not apply (e.g. SSH key check when `ssh.key` is unset).
Tests in `doctor_test.go` exercise the result struct and ordering. Add a
test for the new check that asserts the failure message and remediation
text so future refactors do not silently regress the user-facing output.
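To make the rules concrete, here is a sketch of a check that follows them. It is written in Python for brevity even though the real checks live in Go; `DoctorResult` mirrors the fields named above, while `check_ssh_key`, `exit_code`, and the remediation wording are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class DoctorResult:
    status: str  # "ok", "fail", or "skip"
    category: str
    subject: str
    detail: str = ""
    remediation: str = ""


def check_ssh_key(key_path):
    """Local, read-only check: skip when unset, fail with a remediation hint."""
    if not key_path:
        return DoctorResult("skip", "ssh", "ssh.key", "unset (per-lease keys will be used)")
    if not os.access(os.path.expanduser(key_path), os.R_OK):
        return DoctorResult(
            "fail", "ssh", "ssh.key",
            f"{key_path} is not readable",
            "check the path in ssh.key or CRABBOX_SSH_KEY",
        )
    return DoctorResult("ok", "ssh", "ssh.key", key_path)


def exit_code(results):
    """Skips do not change the exit code; any failure does."""
    return 2 if any(r.status == "fail" for r in results) else 0
```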
Related docs:
- [doctor command](../commands/doctor.md)
- [Configuration](configuration.md)
- [Network and reachability](network.md)
- [SSH keys](ssh-keys.md)
- [Source map](../source-map.md)


@ -0,0 +1,155 @@
# Environment Forwarding
Read when:
- adding a new env var that the remote command needs to see;
- debugging "why is `$CI` empty inside `crabbox run`?";
- writing a repo config that lets agents set tunable values without flags;
- reviewing a PR that loosens or tightens the env allowlist.
By default, `crabbox run` does not forward arbitrary local environment
variables to the remote command. Forwarding is opt-in and name-based: the
repo declares which variable names are allowed, and Crabbox forwards only
those that are present locally.
## Why Allowlist
Agents and CI environments run with rich and sometimes sensitive
environments: tokens, private credentials, terminal paths, vendor-specific
debug flags. Forwarding everything would:
- leak secrets to remote runners;
- introduce non-determinism between local and CI runs;
- make it impossible to reason about what affects a remote command.
Allowlist forwarding makes the contract explicit. The repo decides what
"counts" as input to the remote command, and the user can audit the
allowlist in `crabbox.yaml`.
## Configuration
```yaml
env:
  allow:
    - CI
    - NODE_OPTIONS
    - PROJECT_*
```
Rules:
- entries are env var names, not values;
- a trailing `*` is a prefix wildcard (`PROJECT_*` matches `PROJECT_FOO`,
`PROJECT_BAR`);
- inline wildcards (`PROJECT_*_DEBUG`) are not supported;
- match is exact and case-sensitive;
- empty entries are ignored.
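The matching rules above fit in a few lines. The sketch below is illustrative only; `matches_allowlist` and `forwarded_env` are hypothetical names, not Crabbox functions.

```python
def matches_allowlist(name, allow):
    """Exact, case-sensitive match, with a trailing '*' as a prefix wildcard."""
    for entry in allow:
        if not entry:
            continue  # empty entries are ignored
        if entry.endswith("*"):
            if name.startswith(entry[:-1]):
                return True
        elif name == entry:
            return True
    return False


def forwarded_env(local_env, allow):
    """Keep only variables that are set locally and match the allowlist."""
    return {k: v for k, v in local_env.items() if matches_allowlist(k, allow)}
```

Because only a trailing `*` is special, an entry such as `PROJECT_*_DEBUG` is treated as a literal name in this sketch and so matches nothing useful. How a bare `*` entry should behave is not stated in the text; here it would match every name.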
The user-side override is `CRABBOX_ENV_ALLOW`, a comma-separated list:
```sh
CRABBOX_ENV_ALLOW='CI,NODE_OPTIONS,PROJECT_*' crabbox run -- pnpm test
```
`CRABBOX_ENV_ALLOW` replaces the repo allowlist for that command rather than
appending to it. Use it for one-off tests; persistent allowances belong in
`env.allow`.
## What Gets Forwarded
For each env var in the allowlist, Crabbox checks whether the variable is
set locally. If it is, the variable is forwarded to the remote command with
the same name and value. If it is not set locally, nothing is forwarded -
Crabbox does not invent values.
The remote command sees the variables as part of its environment when run
through SSH:
```sh
ssh runner 'cd workdir && CI=true NODE_OPTIONS=--max_old_space_size=4096 pnpm test'
```
Quoting and escaping happen automatically. Values that contain shell
metacharacters are passed through safely.
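One way to build such a remote command safely is to quote every value and argument before joining them. This is a sketch using Python's standard `shlex.quote`; `remote_command` is a hypothetical helper, and Crabbox's own quoting may differ. The assignments are placed directly in front of the command so they apply to it rather than to `cd`.

```python
import shlex


def remote_command(workdir, env, argv):
    """Build a single shell string: cd into workdir, then run argv with env set."""
    assignments = " ".join(
        f"{name}={shlex.quote(value)}" for name, value in sorted(env.items())
    )
    command = " ".join(shlex.quote(arg) for arg in argv)
    prefix = f"{assignments} " if assignments else ""
    return f"cd {shlex.quote(workdir)} && {prefix}{command}"
```

Values containing spaces, quotes, or `;` end up inside single quotes, so the remote shell sees them as data rather than syntax.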
## Capability-Injected Env
A small set of env vars is injected by Crabbox itself when the matching
capability is requested. These bypass the allowlist because Crabbox owns
them:
```text
DISPLAY=:99 when --desktop
CRABBOX_DESKTOP=1 when --desktop
BROWSER=<path> when --browser, after probe
CHROME_BIN=<path> when --browser, after probe
CRABBOX_BROWSER=1 when --browser
```
User-allowed env vars override capability-injected ones if they overlap.
Repos that need a different `BROWSER` value can include `BROWSER` in
`env.allow` and set it locally.
## Secrets
Do not put secrets in `env.allow` even if forwarding seems convenient.
Secrets belong in:
- the broker environment (Cloudflare Worker secrets) for provider
credentials;
- the operator's credential store (`op`, AWS Vault, etc.) for short-lived
tokens;
- per-runner image bake when the secret should be on every lease;
- post-bootstrap secret injection in repo-owned setup scripts (devcontainer,
mise, repo-controlled `bin/setup`).
Crabbox forwards values it sees locally. If a secret leaks into the
allowlist, every run of every contributor will leak it.
## Examples
```yaml
env:
  allow:
    - CI             # mark a remote command as CI-driven
    - NODE_OPTIONS   # adjust Node memory in test suites
    - PYTEST_ADDOPTS # tune pytest flags from the local env
    - PROJECT_*      # repo's own debug knobs
    - VITEST_*       # let agents override vitest config
    - DEBUG          # `debug` package selector
```
Common things you usually do not allow:
```text
HOME, USER, PATH, SHELL runner already has its own
SSH_* leaks SSH agent state
GITHUB_TOKEN use Actions hydration or runner setup
AWS_* use IAM roles or instance profile
*_API_KEY, *_TOKEN use a secret manager
```
## Inspecting Forwarding
`crabbox run --debug` prints the set of env vars that were forwarded for
that invocation. Use it to verify that the allowlist matches expectations
before debugging "why does the remote command not see this variable?".
```sh
$ crabbox run --debug -- env | grep '^PROJECT'
[crabbox] forwarding env: CI NODE_OPTIONS PROJECT_FOO PROJECT_BAR
PROJECT_FOO=value
PROJECT_BAR=other-value
```
Variables that match the allowlist but are unset locally are not in the
forwarded list, so the debug line is the source of truth for "what did the
remote command actually see".
Related docs:
- [Sync](sync.md)
- [Configuration](configuration.md)
- [run command](../commands/run.md)
- [Capabilities](capabilities.md)
- [Security](../security.md)


@ -0,0 +1,199 @@
# Identifiers
Read when:
- changing how Crabbox names leases, slugs, runs, or claims;
- debugging "why does `crabbox run --id` not find this lease?";
- adding a new lookup form (alias, provider id, anything that resolves to a
lease).
Crabbox names every long-lived thing twice: once with a stable canonical ID
that machines compare, and once with a friendly slug that humans type. This
page lists the identifiers, where they come from, and how lookup resolves
across them.
## Lease ID
Canonical lease IDs look like:
```text
cbx_abcdef123456
```
The pattern is fixed: the literal `cbx_` prefix followed by 12 hex characters.
`isCanonicalLeaseID` enforces it as a regex; anything else is treated as a
slug or alias.
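The pattern can be expressed as a regular expression. The sketch below is a Python equivalent of what `isCanonicalLeaseID` is described as doing; it assumes lowercase hex digits, which matches the examples but is not stated explicitly.

```python
import re

# Literal "cbx_" prefix followed by exactly 12 hex characters.
_CANONICAL_LEASE_ID = re.compile(r"cbx_[0-9a-f]{12}")


def is_canonical_lease_id(value):
    """True only when the whole string matches the canonical pattern."""
    return _CANONICAL_LEASE_ID.fullmatch(value) is not None
```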
The CLI mints a provisional lease ID before calling the broker. The broker
may return a different final ID (when the Worker dedupes a retried request,
for example); the CLI then moves the local SSH key directory and claim file
from the provisional ID to the final ID with `MoveStoredTestboxKey` and
re-keys references accordingly.
Provider resources reference the lease ID through Crabbox labels:
```text
crabbox-lease=cbx_abcdef123456
```
That label is what `crabbox cleanup` and `crabbox list` use to map a provider
machine back to a Crabbox lease.
## Slug
Slugs are friendly, human-typeable lease names. They look like:
```text
blue-lobster
amber-crab
silver-shrimp
```
Slugs are generated from a stable hash of the lease ID, so the same lease
always gets the same slug. The vocabulary is small (14 adjectives, 8 nouns)
because Crabbox is intentionally a small fleet. When a slug collides with an
existing active lease, `slugWithCollisionSuffix` appends a 4-hex suffix
keyed by the seed:
```text
blue-lobster-1234
```
The collision path is rare in normal use - a single user's active leases
rarely exceed the 14 × 8 = 112 unique base slugs.
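A sketch of hash-derived slugs with a collision suffix. Everything here is an assumption made for illustration: the word lists contain only the words that appear in the examples above (the real vocabulary is larger), and SHA-256 stands in for whatever hash and seed Crabbox actually uses.

```python
import hashlib

# Illustrative subset only; the real lists have 14 adjectives and 8 nouns.
ADJECTIVES = ["blue", "amber", "silver"]
NOUNS = ["lobster", "crab", "shrimp"]


def base_slug(lease_id):
    """Derive a stable adjective-noun pair from a hash of the lease ID."""
    digest = hashlib.sha256(lease_id.encode("utf-8")).digest()
    adjective = ADJECTIVES[digest[0] % len(ADJECTIVES)]
    noun = NOUNS[digest[1] % len(NOUNS)]
    return f"{adjective}-{noun}"


def slug_for(lease_id, active_slugs):
    """Return the base slug, or add a 4-hex suffix when it is already active."""
    slug = base_slug(lease_id)
    if slug not in active_slugs:
        return slug
    suffix = hashlib.sha256(lease_id.encode("utf-8")).hexdigest()[-4:]
    return f"{slug}-{suffix}"
```

Deriving both the words and the suffix from the lease ID keeps the result deterministic: the same lease and the same set of active slugs always produce the same name.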
Slugs are normalized everywhere they are accepted. `normalizeLeaseSlug` keeps
only `[a-z0-9-]`, collapses runs of separators, and trims leading/trailing
dashes. `Blue_Lobster` and `BLUE-LOBSTER` resolve to `blue-lobster`.
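A Python approximation of the normalization described above. It assumes input is lowercased first and that every character outside `[a-z0-9]` counts as a separator; the Go `normalizeLeaseSlug` may differ in edge cases.

```python
import re


def normalize_lease_slug(value):
    """Lowercase, turn runs of other characters into one dash, trim dashes."""
    dashed = re.sub(r"[^a-z0-9]+", "-", value.lower())
    return dashed.strip("-")
```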
## Provider Name
Each managed lease also gets a per-provider resource name that includes the
slug and a hash of the lease ID, so the provider console shows useful names:
```text
crabbox-blue-lobster-7f8a2c1d
```
That name is what shows up as the EC2 `Name` tag, the Hetzner server name,
and the Daytona sandbox name. It is derived from `leaseProviderName(leaseID,
slug)`; the function falls back to `crabbox-cbx-...` if the slug is empty.
## Run ID
Each `crabbox run` against a coordinator also gets a durable run handle:
```text
run_abcdef123456
```
A run is created before the lease is acquired so events can be appended for
leasing failures, sync failures, and command output even when the run never
reaches command-start. Run IDs are stable across a single invocation;
retrying the same command produces a new run.
`crabbox history`, `crabbox events`, `crabbox attach`, `crabbox logs`, and
`crabbox results` all accept run IDs. Slugs do not resolve to runs - only to
leases.
## Local Claims
Reusable leases get a JSON claim file stored under the user state directory:
```text
$XDG_STATE_HOME/crabbox/claims/cbx_abcdef123456.json
```
When `XDG_STATE_HOME` is not set, claims live next to user config in
`~/Library/Application Support/crabbox/state/claims` on macOS or
`~/.config/crabbox/state/claims` on Linux.
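The lookup of the claims directory can be sketched as below. `claims_dir` is a hypothetical helper that encodes only the three locations named above; it does not account for other overrides (for example `XDG_CONFIG_HOME`), whose effect on the claims path is not stated here.

```python
import os
import sys


def claims_dir(environ=None, platform=None):
    """Return the directory that would hold claim files for this user."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    state_home = environ.get("XDG_STATE_HOME")
    if state_home:
        return os.path.join(state_home, "crabbox", "claims")
    home = environ.get("HOME") or os.path.expanduser("~")
    if platform == "darwin":
        return os.path.join(
            home, "Library", "Application Support", "crabbox", "state", "claims"
        )
    return os.path.join(home, ".config", "crabbox", "state", "claims")
```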
The claim payload looks like:
```json
{
  "leaseID": "cbx_abcdef123456",
  "slug": "blue-lobster",
  "provider": "aws",
  "repoRoot": "/Users/steipete/Projects/openclaw",
  "claimedAt": "2026-05-07T07:42:18Z",
  "lastUsedAt": "2026-05-07T07:55:12Z",
  "idleTimeoutSeconds": 1800
}
```
Claims do three things:
- bind a lease to one repo so wrappers and agents do not silently reuse a
lease against a different checkout;
- give `crabbox run --id blue-lobster` a slug-to-canonical-ID translation
without round-tripping the broker;
- power "is this lease still mine?" checks before destructive operations
(`stop`, `cleanup`, `actions register`).
A conflicting claim (same lease, different repo) refuses commands by default;
`--reclaim` overrides the check and rewrites the claim atomically.
Static SSH leases tag their claims with `provider: ssh` so the resolver knows
the lease bypasses the coordinator. Coordinator-backed claims leave
`provider` blank because the coordinator owns provider tracking.
## SSH Key Storage
Per-lease SSH key directories are keyed by lease ID:
```text
~/.config/crabbox/testboxes/cbx_abcdef123456/id_ed25519
~/.config/crabbox/testboxes/cbx_abcdef123456/id_ed25519.pub
~/.config/crabbox/testboxes/cbx_abcdef123456/known_hosts
```
The provisional → final lease ID move uses `os.Rename` on the directory so
the key, public key, and known_hosts file all migrate atomically. The
provider key name (`crabbox-cbx-abcdef123456`) is what the cloud account
sees.
## Resolving An Identifier
`crabbox <command> --id <value>` accepts:
- a canonical `cbx_...` lease ID;
- a normalized slug (`blue-lobster`, `Blue Lobster`, `BLUE_LOBSTER` all resolve
to the same lease);
- in coordinator mode, also the slug as known to the broker, regardless of
case.
Resolution order:
1. Read the local claim store for the literal identifier or any slug match
in `claims/`.
2. If a matching claim exists, use its `leaseID` as the canonical handle.
3. If no claim is found and a coordinator is configured, ask the coordinator
to resolve the identifier (slug or canonical ID).
4. For static SSH and direct-provider modes, fall back to the provider's
`Resolve` implementation (`SSHLeaseBackend.Resolve`).
The first source that returns a hit wins. This is why `--id blue-lobster`
works from any directory once the warmup ran in some other repo - the local
claim translates slug to lease ID before the broker is involved.
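The resolution order can be sketched as a chain of lookups where the first hit wins. `resolve_lease` and its callback parameters are hypothetical; claims are modeled as plain dictionaries with the `leaseID` and `slug` fields shown earlier, and slug normalization is approximated inline.

```python
import re


def _normalize(value):
    # Approximation of slug normalization: lowercase, dashes, trimmed.
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def resolve_lease(identifier, claims, coordinator=None, provider=None):
    """Return (lease_id, source); the first source that returns a hit wins.

    coordinator and provider are optional callables that take the identifier
    and return a lease ID or None. Both are stand-ins for the real lookups.
    """
    wanted_slug = _normalize(identifier)
    for claim in claims:
        if claim["leaseID"] == identifier or claim["slug"] == wanted_slug:
            return claim["leaseID"], "claim"
    for source, lookup in (("coordinator", coordinator), ("provider", provider)):
        if lookup is None:
            continue
        lease_id = lookup(identifier)
        if lease_id:
            return lease_id, source
    return None, None
```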
## Identifier Lifetime
```text
provisional lease ID newLeaseID() call → broker returns final ID
final lease ID broker accepts → stored in claim, key dir, labels
slug computed on first lease creation, stable forever
provider name derived from lease ID + slug
run ID minted per crabbox run when a coordinator is configured
```
Slugs are neither reserved nor deliberately recycled. When a lease ends, its
slug is free again, and a future lease reuses it only if its ID happens to
hash to the same words; the small vocabulary makes that collision-by-hash
possible but rare in practice.
Related docs:
- [Coordinator](coordinator.md)
- [SSH keys](ssh-keys.md)
- [Lifecycle cleanup](lifecycle-cleanup.md)
- [Source map](../source-map.md)

docs/features/network.md Normal file

@ -0,0 +1,195 @@
# Network And Reachability
Read when:
- choosing between `--network auto`, `tailscale`, or `public`;
- debugging "Crabbox can SSH but my browser can't reach the desktop";
- changing how Crabbox falls back between the public IP and the tailnet IP;
- adjusting SSH port fallbacks for restrictive operator networks.
A Crabbox lease can be reachable through more than one network plane.
Brokered Linux leases can join a Tailscale tailnet, brokered AWS Windows and
EC2 Mac leases stay public, and static SSH targets can be on either depending
on how the operator configured them. The CLI picks one plane per command and
prints which it picked.
## Modes
```text
--network auto prefer tailnet when reachable, otherwise fall back to public
--network tailscale require tailnet reachability; fail otherwise
--network public ignore tailnet metadata and use the public address
```
`auto` is the default. It optimizes for "do not surprise me": prefer tailnet
when both client and runner are on the tailnet, fall back transparently to
the public path when the client is off-tailnet.
`tailscale` is the strict mode. Use it when you specifically want to verify
tailnet reachability or when the public IP is firewalled to a CI runner that
your local box cannot reach.
`public` is the escape hatch. Use it when the tailnet metadata is stale, when
you are debugging public-network issues, or when the client cannot reach the
tailnet for unrelated reasons.
The mode applies to `crabbox ssh`, `crabbox run`, `crabbox vnc`, and
`crabbox webvnc`. `crabbox status --network auto` also resolves through this
path so the printed address matches what later commands will use.
## How `auto` Picks A Plane
For a lease with tailnet metadata, `auto` mode:
1. reads `tailscale_fqdn`, `tailscale_ipv4`, and `tailscale_hostname` from the
server labels;
2. probes the first non-empty option over SSH with a 5-second TCP transport
probe;
3. uses that target if the probe succeeds;
4. falls back to the public IP and prints `network=public` with the reason
`tailscale_unreachable`.
For a lease with no tailnet metadata, `auto` is just public mode.
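The selection steps can be sketched as follows. `pick_plane` and `tcp_probe` are hypothetical names, the default port of 2222 is an assumption for the example, and the probe is injectable so the logic can be exercised without a network.

```python
import socket


def tcp_probe(host, port, timeout=5.0):
    """Return True when a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_plane(labels, public_ip, port=2222, probe=tcp_probe):
    """Prefer the first non-empty tailnet address; fall back to public."""
    keys = ("tailscale_fqdn", "tailscale_ipv4", "tailscale_hostname")
    target = next((labels.get(k) for k in keys if labels.get(k)), None)
    if target is None:
        # No tailnet metadata: auto behaves like public mode.
        return {"network": "public", "host": public_ip, "reason": ""}
    if probe(target, port):
        return {"network": "tailscale", "host": target, "reason": ""}
    return {"network": "public", "host": public_ip, "reason": "tailscale_unreachable"}
```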
Static SSH targets behave the same way when the static host name is a
MagicDNS or `100.x` address. If the operator points `static.host` at a
MagicDNS name, `--network tailscale` works without any other configuration -
the address is already on the tailnet.
## Public Reachability
Brokered AWS Linux, AWS Windows, AWS Mac, Hetzner Linux, Daytona, and Islo
leases all expose at least one public address. Crabbox stores the public
address on the server record and uses it whenever the network mode resolves
to `public`.
Public addresses are gated by the provider's security group / firewall. AWS
managed leases use the `crabbox-runners` security group with SSH ingress
limited to the configured CIDRs or the request source IP. Hetzner managed
leases use the cloud firewall attached to the project; the broker keeps it
limited to the operator's IPs.
If your client IP changes during a long warmup, the existing security group
rule may not include the new IP. Re-running `crabbox status` updates the
rule so that it includes the current IP again.
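To check by hand whether a client IP would pass a CIDR-limited ingress rule,
a small sketch like the following may help. The function name and the sample
CIDRs are hypothetical; the real rule lives in the provider's security group
or firewall.
```python
import ipaddress
from typing import Iterable


def ip_allowed(client_ip: str, allowed_cidrs: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR blocks."""
    addr = ipaddress.ip_address(client_ip)
    # strict=False tolerates entries such as 203.0.113.10/24 with host bits set.
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )
```
If this returns `False` for your current address, the rule is likely stale.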
## Tailnet Reachability
When a managed Linux lease is created with `--tailscale`, cloud-init:
- installs the Tailscale package;
- joins the tailnet with the configured tags (default `tag:crabbox`);
- writes non-secret metadata to `/var/lib/crabbox/tailscale-*`;
- extends `crabbox-ready` with a bounded check that a `100.x` address has
been assigned;
- discards the auth key after `tailscale up` so it never persists.
The metadata Crabbox stores on the lease record:
```text
tailscale=true
tailscale_hostname=blue-lobster
tailscale_fqdn=blue-lobster.tail-scale.ts.net
tailscale_ipv4=100.64.0.5
tailscale_state=ok
tailscale_tags=tag:crabbox
tailscale_exit_node=...
tailscale_exit_node_allow_lan_access=true|false
```
Brokered leases get a one-shot auth key minted by the Worker via Tailscale
OAuth (`worker/src/tailscale.ts`). Direct-provider leases use a key from
`CRABBOX_TAILSCALE_AUTH_KEY`. The auth key is never stored on the runner.
When the metadata says the lease is on the tailnet but the client cannot
reach it, the most common reasons are:
- the client is not joined to the tailnet (`tailscale status` on the client);
- ACLs block the tag pair from reaching `100.x`;
- the runner's `tailscaled` process died (rare; readiness probes catch it
before the lease is handed back).
`crabbox status --id <lease> --network tailscale` is the fastest way to test
tailnet reachability after lease creation.
## SSH Port And Fallback
Crabbox runs SSH on a non-standard port by default to keep noise out of the
provider firewall logs:
```yaml
ssh:
port: "2222"
fallbackPorts:
- "22"
```
`ssh.port` is the primary port the bootstrap binds to. `ssh.fallbackPorts` is
an ordered list of additional ports the CLI will try when the primary port
is unreachable - typically because the operator's egress is restricted, the
sshd has not bound the new port yet, or cloud-init is still mid-flight.
Fallback rules:
- the CLI tries primary first, then each fallback in order;
- the first port that opens a TCP connection wins for that command;
- the winning port is reused for the rest of that command; the next command
probes again;
- the CLI prints `ssh-port-fallback=22` when fallback was used.
Set `ssh.fallbackPorts: []` or `CRABBOX_SSH_FALLBACK_PORTS=none` to disable
fallback entirely. Some networks prefer this so a misconfigured `2222` rule
fails loud instead of quietly using `22`.
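The fallback rules can be modeled as a short loop. This is a sketch under
assumptions: `pick_ssh_port` and the injected `can_connect` callable are
hypothetical names, and the CLI's actual probing code is not shown in this
guide.
```python
from typing import Callable, Optional, Sequence, Tuple


def pick_ssh_port(
    host: str,
    primary: int,
    fallbacks: Sequence[int],
    can_connect: Callable[[str, int], bool],
) -> Tuple[Optional[int], bool]:
    """Return (port, used_fallback); port is None when nothing is reachable."""
    # Primary first, then each fallback in the configured order.
    for port in (primary, *fallbacks):
        if can_connect(host, port):
            return port, port != primary
    return None, False
```
With an empty `fallbacks` list the function can only ever return the primary
port or `None`, which is the "fail loud" behavior described above.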
## Loopback-Bound Capabilities
Lease capabilities (desktop, code) are bound to loopback on purpose so they
do not need provider firewall changes:
```text
VNC 127.0.0.1:5900 reached via SSH tunnel
code-server 127.0.0.1:8080 reached via portal bridge
```
The network mode does not change loopback bindings. `--network` only changes
which interface the SSH tunnel or portal bridge uses to talk to the lease.
Loopback stays loopback: the services are reachable only from the runner
itself, whichever mode is selected.
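For intuition, a manual tunnel to the loopback-bound VNC port would use
standard OpenSSH port forwarding. The sketch below only builds the argument
list; `vnc_tunnel_argv` and the `ssh_target` value are hypothetical, and how
`crabbox vnc` sets up its own tunnel is not specified here.
```python
from typing import List


def vnc_tunnel_argv(
    ssh_target: str,
    ssh_port: int = 2222,
    local_port: int = 5900,
    remote_port: int = 5900,
) -> List[str]:
    """Build an ssh command that forwards a local port to the runner's loopback."""
    return [
        "ssh",
        "-N",                      # no remote command, forwarding only
        "-p", str(ssh_port),       # SSH port on the lease
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        ssh_target,                # e.g. user@address (hypothetical)
    ]
```
Whichever plane `--network` selects only changes the address in `ssh_target`;
the `127.0.0.1` side of the forward stays the same.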
## Static Hosts
Static SSH targets honor the same modes:
- `--network public` uses `static.host` as configured;
- `--network tailscale` requires `static.host` to be a MagicDNS name or
`100.x` address, then probes for SSH reachability;
- `--network auto` defers to the resolved address: if `static.host` is on
the tailnet, that is what `auto` uses; otherwise it is public.
Tailscale-managed bootstrap (`--tailscale`) is rejected for static providers.
Static hosts are operator-owned; Crabbox does not install Tailscale on them.
Set `static.host` to a tailnet address and select `--network tailscale`
explicitly.
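A rough way to tell whether a `static.host` value is on the tailnet is to
test for Tailscale's `100.64.0.0/10` address range or a `.ts.net` MagicDNS
suffix. This is a heuristic sketch with a hypothetical function name; it does
not cover short MagicDNS hostnames, and Crabbox's own check may differ.
```python
import ipaddress

# Tailscale assigns node addresses from the CGNAT range 100.64.0.0/10.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def looks_like_tailnet(host: str) -> bool:
    """Heuristic: True for a Tailscale IPv4 address or a *.ts.net name."""
    try:
        return ipaddress.ip_address(host) in TAILSCALE_CGNAT
    except ValueError:
        # Not an IP literal, so treat it as a DNS name.
        return host.rstrip(".").lower().endswith(".ts.net")
```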
## Failure Surface
When a network mode cannot be satisfied, the CLI exits with code 5 and a
message that names the mode and the lease:
```text
network=tailscale requested but lease cbx_... has no tailnet address
network=tailscale requested for static host mac-studio but SSH is not reachable
network=tailscale requested but blue-lobster.tail-scale.ts.net is not reachable over SSH
```
`auto` mode never fails on a tailnet probe; it falls back to public and
records the reason. The `network=public reason=tailscale_unreachable` log
line is the diagnostic signal that the tailnet plane is unhealthy even
though the command kept working.
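A wrapper script could watch for that signal in captured CLI output. The
sketch below assumes the log line consists of space-separated `key=value`
tokens as quoted above; `network_fallback_reason` is a hypothetical helper,
not part of Crabbox.
```python
from typing import Optional


def network_fallback_reason(output: str) -> Optional[str]:
    """Return the fallback reason if the output reports a public fallback."""
    for line in output.splitlines():
        # Collect key=value tokens on this line; other words are ignored.
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if fields.get("network") == "public" and "reason" in fields:
            return fields["reason"]
    return None
```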
Related docs:
- [Tailscale](tailscale.md)
- [Runner bootstrap](runner-bootstrap.md)
- [SSH keys](ssh-keys.md)
- [vnc command](../commands/vnc.md)
- [ssh command](../commands/ssh.md)
- [doctor command](../commands/doctor.md)


@ -0,0 +1,165 @@
# OpenClaw Plugin
Read when:
- enabling Crabbox as a plugin inside OpenClaw;
- changing the plugin tools, schema, or wrapper behavior;
- understanding why some Crabbox surfaces are CLI-only and not plugin tools.
The Crabbox repository root is also a native OpenClaw plugin package. When
OpenClaw loads the plugin, it exposes a small set of agent tools that shell
out to the user's installed `crabbox` binary. The plugin does not embed the
CLI or duplicate any of its logic - it is a thin contract for safe, allowlisted
invocations.
## Plugin Manifest
`openclaw.plugin.json` declares the plugin id, the tools it owns, and the
config schema:
```json
{
"id": "crabbox",
"name": "Crabbox",
"description": "Run Crabbox remote testbox checks from OpenClaw.",
"activation": { "onStartup": true },
"contracts": {
"tools": [
"crabbox_run",
"crabbox_warmup",
"crabbox_status",
"crabbox_list",
"crabbox_stop"
]
},
"configSchema": { ... }
}
```
The runtime entrypoint is `index.js`. Tests in `index.test.js` lock the tool
schemas, argv shapes, output trimming, and config validation so a future
refactor cannot silently change the agent-facing contract.
## Tools
```text
crabbox_run run a command on a leased remote box
crabbox_warmup acquire a warm box for repeated commands
crabbox_status query a lease's state
crabbox_list list visible leases for the current owner/org
crabbox_stop stop a lease and release its provider resources
```
Each tool accepts an argv array of strings plus an optional `env` object of
string values. The plugin enforces both with a JSON schema before invoking
the binary, so an agent cannot pass arbitrary shell commands or non-string
env values.
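The shape check amounts to something like the following. The plugin itself is
JavaScript (`index.js`) and uses a JSON schema; this Python sketch only models
the rule, and the `argv` field name, function name, and error strings are
assumptions.
```python
from typing import Any, List


def validate_tool_input(payload: Any) -> List[str]:
    """Return a list of validation errors; an empty list means the input is OK."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    errors: List[str] = []
    argv = payload.get("argv")
    if not isinstance(argv, list) or not all(isinstance(a, str) for a in argv):
        errors.append("argv must be an array of strings")
    env = payload.get("env", {})  # env is optional
    if not isinstance(env, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in env.items()
    ):
        errors.append("env must be an object of string values")
    return errors
```
Because the arguments stay a list of strings rather than one shell string,
nothing in them is interpreted by a shell on the agent side.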
`crabbox_run`, `crabbox_warmup`, and `crabbox_stop` can be disabled per
install by setting `allowRun`, `allowWarmup`, or `allowStop` to `false` in
plugin config. `crabbox_status` and `crabbox_list` are read-only and always
allowed.
## Config
The plugin accepts only six config keys, all optional:
```json
{
"binary": "crabbox",
"maxOutputBytes": 60000,
"timeoutSeconds": 1800,
"allowRun": true,
"allowWarmup": true,
"allowStop": true
}
```
| Key | Default | Effect |
|:----|:--------|:-------|
| `binary` | `crabbox` | Path to the Crabbox binary. Set when the binary is not on PATH. |
| `maxOutputBytes` | 60000 | Max captured stdout/stderr returned to the model per call. |
| `timeoutSeconds` | 1800 | Default wrapper timeout for a Crabbox CLI invocation. |
| `allowRun` | true | Gate `crabbox_run`. |
| `allowWarmup` | true | Gate `crabbox_warmup`. |
| `allowStop` | true | Gate `crabbox_stop`. |
Crabbox config (broker URL, provider, token, profile, class) lives in the
user/repo config files. The plugin does not duplicate those keys; it inherits
them from whatever `crabbox config show` would return for the agent's
working directory.
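The defaults and gates in the table above can be modeled as a merge plus a
lookup. This is a sketch, not the plugin's code: `resolve_config` and
`tool_allowed` are hypothetical names, and rejecting unknown keys is an
assumption based on the "only six config keys" wording.
```python
from typing import Any, Dict, Mapping

DEFAULTS: Dict[str, Any] = {
    "binary": "crabbox",
    "maxOutputBytes": 60000,
    "timeoutSeconds": 1800,
    "allowRun": True,
    "allowWarmup": True,
    "allowStop": True,
}

# Tools without an entry here (status, list) are read-only and always allowed.
GATES = {
    "crabbox_run": "allowRun",
    "crabbox_warmup": "allowWarmup",
    "crabbox_stop": "allowStop",
}


def resolve_config(user: Mapping[str, Any]) -> Dict[str, Any]:
    """Overlay user config on the defaults, rejecting keys outside the schema."""
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **user}


def tool_allowed(tool: str, config: Mapping[str, Any]) -> bool:
    """Return True unless the tool has a gate that is set to false."""
    gate = GATES.get(tool)
    return True if gate is None else bool(config[gate])
```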
## Output Handling
The plugin captures stdout and stderr separately, trims each to
`maxOutputBytes`, and reports the exit code, the trimmed bytes, and a
truncation flag back to the model. Truncated output gets a tail marker so
agents know they did not get the full transcript:
```text
... [output truncated; 12345 of 87654 bytes shown]
```
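A minimal model of the trimming step, assuming the wrapper keeps the first
`maxOutputBytes` bytes and appends the marker at the end. Which part of the
output the real plugin keeps is not stated here, and `trim_output` is a
hypothetical name.
```python
from typing import Tuple


def trim_output(data: bytes, max_bytes: int = 60000) -> Tuple[str, bool]:
    """Return (text, truncated), appending a tail marker when bytes were cut."""
    if len(data) <= max_bytes:
        return data.decode("utf-8", errors="replace"), False
    # Cutting on a byte boundary can split a multi-byte character, so decode
    # with replacement instead of failing.
    kept = data[:max_bytes].decode("utf-8", errors="replace")
    marker = f"\n... [output truncated; {max_bytes} of {len(data)} bytes shown]"
    return kept + marker, True
```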
Long-running tools still respect `timeoutSeconds`. When the wrapper times
out, the plugin sends SIGTERM, waits a short grace period, then escalates to
SIGKILL. The exit code in the response reflects the wrapper outcome, not the
inner remote command.
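The SIGTERM-then-SIGKILL escalation follows a common subprocess pattern,
sketched here in Python for illustration. The plugin is JavaScript, so this
is not its code; `run_with_timeout` and the 5-second grace period are
assumptions (the guide only says "a short grace period").
```python
import subprocess
from typing import Any, Dict, Sequence


def run_with_timeout(
    argv: Sequence[str],
    timeout_seconds: float = 1800,
    grace_seconds: float = 5.0,
) -> Dict[str, Any]:
    """Run argv, terminating it politely and then forcefully on timeout."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    timed_out = False
    try:
        stdout, stderr = proc.communicate(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()  # SIGTERM on POSIX
        try:
            stdout, stderr = proc.communicate(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX
            stdout, stderr = proc.communicate()
    return {
        "exit_code": proc.returncode,  # wrapper outcome, not the remote command
        "timed_out": timed_out,
        "stdout": stdout,
        "stderr": stderr,
    }
```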
## What Belongs In The CLI Instead
History, log inspection, attach, results, usage, and admin operations are
intentionally not plugin tools. They are best run from a shell-capable agent:
```sh
crabbox history --lease cbx_...
crabbox events run_... --after 0 --limit 50
crabbox attach run_...
crabbox logs run_...
crabbox results run_...
crabbox usage --scope user
crabbox admin leases --state active
crabbox cleanup --dry-run
```
Reasons for keeping these out of the plugin:
- they often produce more output than `maxOutputBytes` can usefully capture;
- agents tend to want raw logs they can grep, not trimmed model output;
- admin tools are easier to gate at the shell level (env, allowlists) than
through plugin config;
- `crabbox attach` is interactive by design.
## Provider Allowlist
The plugin schema constrains the `provider` argument to the providers
Crabbox actually supports:
```text
aws | hetzner | ssh | blacksmith-testbox | blacksmith | daytona | islo
```
Adding a provider to the CLI requires updating this list in `index.js` and
the test fixture in `index.test.js`. The schema is the agent-facing contract;
without the update, the new provider would be rejected by JSON validation
before reaching the binary.
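In effect the schema acts as an enum check. A Python model of that check,
with a hypothetical function name and the provider values copied from the
list above:
```python
SUPPORTED_PROVIDERS = frozenset(
    {
        "aws",
        "hetzner",
        "ssh",
        "blacksmith-testbox",
        "blacksmith",
        "daytona",
        "islo",
    }
)


def check_provider(provider: str) -> str:
    """Return the provider unchanged, or raise if it is not in the allowlist."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return provider
```
A provider added to the CLI but not to this set would fail here, before any
process is spawned.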
## When To Update
Edit the plugin when you:
- add or remove a provider;
- add a new agent-safe tool (read-only, owner-scoped, bounded output);
- change argv conventions across all `crabbox` commands (rare);
- update default timeouts or output budgets.
Run `node --test index.test.js` after every change. The tests exercise the
schema, argv handling, and output trimming end-to-end.
Related docs:
- [docs/README.md](../README.md) - top-level overview includes the plugin.
- [Source map](../source-map.md) - `package.json`, `openclaw.plugin.json`,
`index.js`, `index.test.js`.
- [run command](../commands/run.md) - what `crabbox_run` ultimately invokes.
- [warmup command](../commands/warmup.md) - what `crabbox_warmup` invokes.
- [stop command](../commands/stop.md) - what `crabbox_stop` invokes.

docs/getting-started.md

@ -0,0 +1,232 @@
# Getting Started
Read when:
- you are new to Crabbox and want a working `run` in 10 minutes;
- you are evaluating Crabbox for a repo and want to see the shape;
- you want a reference for what a typical onboarding looks like.
This is a cookbook, not a reference. It walks through one repo end to end,
from install to `crabbox run -- pnpm test`. For deeper coverage, follow the
links in each step.
## Step 1. Install
```sh
brew install openclaw/tap/crabbox
```
Verify the install:
```sh
crabbox --version
crabbox doctor
```
`crabbox doctor` should print `ok` for `tools` (git, rsync, ssh,
ssh-keygen). It is fine if `auth` and `network` are still missing - we set
those next.
If you do not have Homebrew, GitHub Releases ship signed tarballs for macOS,
Linux, and Windows. Download the matching archive from
<https://github.com/openclaw/crabbox/releases>.
## Step 2. Log In
```sh
crabbox login
```
`login` opens a browser to the GitHub OAuth flow. The broker exchanges the
OAuth code, verifies your GitHub org membership, and writes a signed token
to your user config. From then on, every `crabbox` command authenticates
automatically.
```sh
crabbox whoami
```
`whoami` confirms the resolved owner, org, broker URL, and selected provider.
If you are running Crabbox in a CI environment that cannot open a browser,
use shared-token auth:
```sh
printf '%s' "$TOKEN" | crabbox login \
--url https://crabbox.openclaw.ai \
--provider aws \
--token-stdin
```
See [Auth and admin](features/auth-admin.md) for the full identity model.
## Step 3. Onboard A Repo
Inside the repo:
```sh
crabbox init
```
`init` writes three files:
```text
.crabbox.yaml repo defaults (profile, class, sync, env)
.github/workflows/crabbox.yml Actions hydration stub (optional)
.agents/skills/crabbox/SKILL.md agent-facing skill instructions
```
Open `.crabbox.yaml` and fill in:
- `profile`: a name for this lane (e.g. `project-check`);
- `class`: `standard`, `fast`, `large`, or `beast`;
- `sync.exclude`: directories that should not be sent to the runner;
- `env.allow`: env vars the remote command should see.
Then run:
```sh
crabbox sync-plan
```
`sync-plan` previews what would be sent: file count, total bytes, and the
biggest files. If it shows surprises (a `dist/` folder, a `.cache/` you
forgot, a 2 GiB asset), tighten `sync.exclude` and re-run. The duration of
the first sync to a fresh runner scales with this size.
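To get a feel for what such a preview measures, the sketch below walks a
directory tree and reports the same three numbers. It is an approximation
under assumptions: `preview_sync` is a hypothetical helper that matches
excluded directories by name only, whereas the real `sync.exclude` patterns
and `sync-plan` logic may behave differently.
```python
import os
from typing import Any, Dict, Iterable, List, Tuple


def preview_sync(root: str, exclude_dirs: Iterable[str], top: int = 3) -> Dict[str, Any]:
    """Count files and bytes under root, skipping directories named in exclude_dirs."""
    excluded = set(exclude_dirs)
    sizes: List[Tuple[int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks to avoid counting their targets
            sizes.append((os.path.getsize(path), os.path.relpath(path, root)))
    sizes.sort(reverse=True)
    return {
        "files": len(sizes),
        "bytes": sum(size for size, _ in sizes),
        "largest": sizes[:top],
    }
```
Comparing the result with and without a candidate exclude shows how much a
single directory contributes to the first sync.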
## Step 4. Warm A Box
```sh
crabbox warmup
```
Warmup acquires a lease through the broker, provisions the runner,
bootstraps SSH and tooling, and prints a slug + lease ID:
```text
leased cbx_abcdef123456 slug=blue-lobster provider=aws server=i-0123 type=c7a.48xlarge ip=203.0.113.10 idle_timeout=30m0s expires=2026-05-07T17:30:00Z
```
The lease is now waiting for commands. Idle timeout (default 30m) and TTL
(default 90m) bound how long it lives before the broker reclaims it.
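Scripts that need the slug or lease ID for later `--id` calls can parse the
`leased` line. This sketch assumes the format shown above (the word `leased`,
the lease ID, then space-separated `key=value` pairs); `parse_leased_line` is
a hypothetical helper and the output format is not a documented contract.
```python
from typing import Dict


def parse_leased_line(line: str) -> Dict[str, str]:
    """Parse a 'leased <id> key=value ...' line into a dict of fields."""
    parts = line.split()
    if len(parts) < 2 or parts[0] != "leased":
        raise ValueError("not a leased line")
    # Everything after the lease ID is expected to be key=value tokens.
    fields = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
    fields["lease_id"] = parts[1]
    return fields
```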
## Step 5. Run A Command
```sh
crabbox run --id blue-lobster -- pnpm test
```
What happens:
1. The CLI verifies SSH readiness on the lease.
2. It seeds remote Git from your origin/base ref, then rsyncs the dirty
working tree.
3. It runs the command over SSH, streaming stdout/stderr.
4. It heartbeats the broker so the lease does not idle out mid-test.
5. It records a `run_...` history entry with sync time, command time, exit
code, and (for Linux) bounded telemetry samples.
You can omit `--id` for a one-shot run:
```sh
crabbox run -- pnpm test
```
That acquires a fresh lease, runs the command, and releases the lease when
the command exits. Use this for ad-hoc tests; use `warmup` + `--id` for
iterative work.
## Step 6. Inspect History
```sh
crabbox history
crabbox events run_abcdef123456
crabbox logs run_abcdef123456
crabbox results run_abcdef123456
```
`history` lists recent runs for the lease or owner. `events` prints ordered
events (lease, sync, command, output chunks, finish). `logs` returns the
retained command output. `results` parses any JUnit reports the run
attached.
If you prefer a UI, the portal route `/portal/runs/run_abcdef123456` renders
the same data as a browser page.
## Step 7. Stop The Lease
When you are done:
```sh
crabbox stop blue-lobster
```
Stop releases the lease, deletes the provider machine, removes the local
claim, and frees reserved cost. If you forget, the broker idle alarm
releases the lease automatically.
```sh
crabbox cleanup --dry-run
```
`cleanup` is a sweep for direct-provider leftovers. It refuses to run when
a coordinator is configured because brokered cleanup is the alarm's job.
## Common Variations
Keep a lease alive for a long working session:
```sh
crabbox warmup --idle-timeout 4h --ttl 8h
crabbox run --id blue-lobster -- pnpm test
crabbox run --id blue-lobster -- pnpm bench
crabbox stop blue-lobster
```
Open a desktop session:
```sh
crabbox warmup --desktop
crabbox vnc --id blue-lobster --open
```
Open a code-server tab:
```sh
crabbox warmup --code
crabbox code --id blue-lobster --open
```
Use a Mac Studio you already own:
```yaml
# .crabbox.yaml
provider: ssh
target: macos
static:
host: mac-studio.local
user: steipete
port: "22"
workRoot: /Users/steipete/crabbox
```
```sh
crabbox run -- xcodebuild test
```
Use AWS instead of the configured default:
```sh
crabbox run --provider aws --class beast -- pnpm test
```
## Where To Go Next
- [How Crabbox Works](how-it-works.md) - the mental model.
- [CLI](cli.md) - the full command surface and exit codes.
- [Commands](commands/README.md) - one page per command.
- [Features](features/README.md) - one page per feature.
- [Configuration](features/configuration.md) - YAML schema and precedence.
- [Providers](features/providers.md) - which provider to pick.
- [Provider authoring](features/provider-authoring.md) - add a new provider.
- [Troubleshooting](troubleshooting.md) - what to do when a step fails.