crabbox/docs/features/sync.md
2026-05-05 02:29:06 -07:00

83 lines
3.5 KiB
Markdown

# Sync
Read when:
- changing rsync behavior;
- debugging missing or stale files on a runner;
- changing Git seeding, fingerprints, excludes, or env forwarding.
Crabbox syncs the current checkout to the leased runner before running a command.
It syncs the Git-managed working set, not the whole directory tree:
- tracked files from `git ls-files --cached`;
- nonignored untracked files from `git ls-files --others --exclude-standard`;
- root `.crabboxignore` patterns, repo-local `sync.exclude` patterns, and
Crabbox's default cache/build excludes.
Ignored build output, dependency folders, `.git`, and common local caches stay out of the transfer. This keeps first syncs close to the code that CI would see while still letting agents test uncommitted edits.
Sync flow:
1. pick the local repository root;
2. seed remote Git from the configured origin/base ref when possible;
3. build a NUL-delimited sync manifest from Git's tracked and nonignored file list;
4. print a file/byte estimate and enforce configured large-sync guardrails;
5. compute or reuse a sync fingerprint from the commit, dirty metadata, and manifest;
6. skip rsync when the fingerprint matches;
7. feed the manifest to rsync with `--files-from=- --from0`;
8. delete previously managed remote files that disappeared from the manifest when delete sync is enabled;
9. run sanity checks for mass tracked deletions;
10. hydrate configured base-ref history for changed-test workflows.
The remote manifest deletion step only removes paths Crabbox previously synced. It does not delete workflow-created state, package caches, `.git`, or other local runner files outside the managed file list. Native Windows static targets use the same Git manifest but transfer it as a tar archive over OpenSSH instead of rsync.
In remote Git worktrees, Crabbox stores its sync metadata under `.git/crabbox` so repository status stays clean. Crabbox does not delete files under the worktree `.crabbox/` directory; that path remains available for repository-owned files and config.
Important controls:
```text
CRABBOX_SYNC_CHECKSUM
CRABBOX_SYNC_DELETE
CRABBOX_SYNC_GIT_SEED
CRABBOX_SYNC_FINGERPRINT
CRABBOX_SYNC_BASE_REF
CRABBOX_SYNC_TIMEOUT
CRABBOX_SYNC_WARN_FILES
CRABBOX_SYNC_WARN_BYTES
CRABBOX_SYNC_FAIL_FILES
CRABBOX_SYNC_FAIL_BYTES
CRABBOX_SYNC_ALLOW_LARGE
CRABBOX_ENV_ALLOW
```
Defaults:
```yaml
sync:
timeout: 15m
warnFiles: 50000
warnBytes: 5368709120
failFiles: 150000
failBytes: 21474836480
allowLarge: false
```
`crabbox run --force-sync-large` bypasses the fail thresholds for one run. `--debug` adds rsync progress/stat output; normal syncs still print a heartbeat when rsync is quiet for a while.
Use `crabbox sync-plan` to inspect the local manifest before leasing a box. It prints the candidate file count, total bytes, and the largest files/directories using the same excludes as `run`.
Repo-local config should hold project-specific excludes and env allowlists. Secrets must not be passed as command-line arguments or broad env globs.
Use `.crabboxignore` when you only need repo-local sync exclusions. The file is
read from the repository root. Blank lines and lines starting with `#` are
ignored; remaining lines are appended to `sync.exclude` and use the same matcher
as config excludes. Crabbox intentionally supports only `.crabboxignore`; there
is no short alias.
Related docs:
- [CLI](../cli.md)
- [run command](../commands/run.md)
- [sync-plan command](../commands/sync-plan.md)
- [Repository onboarding](repository-onboarding.md)