diff --git a/CHANGELOG.md b/CHANGELOG.md index aa7774b..415f6dd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,6 +8,8 @@ ### Fixed +- Fixed recorded run logs so long noisy commands are stored in bounded chunks instead of losing the failure evidence between the first output events and the final tail. + ## 0.3.0 - 2026-05-02 Crabbox 0.3.0 makes brokered runs much easier to observe and debug, adds diff --git a/docs/architecture.md b/docs/architecture.md index 49f2efd..c642453 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -81,7 +81,7 @@ Core stored records: ```sql leases(id, slug, provider, cloud_id, region, owner, org, profile, class, server_type, server_id, server_name, provider_key, host, ssh_user, ssh_port, work_root, keep, ttl_seconds, idle_timeout_seconds, estimated_hourly_usd, max_estimated_usd, state, created_at, updated_at, last_touched_at, expires_at, released_at, ended_at) runs(id, lease_id, slug, owner, org, provider, class, server_type, command_json, state, exit_code, sync_ms, command_ms, duration_ms, log_bytes, log_truncated, results_json, started_at, ended_at) -runlog(run_id, bounded_stdout_stderr_tail) +runlog(run_id, bounded_stdout_stderr_capture) ``` State transitions: diff --git a/docs/cli.md b/docs/cli.md index 0fd0939..d1126b2 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -189,7 +189,7 @@ Behavior: 6. Sync current repo, unless a matching sync fingerprint lets Crabbox skip rsync. 7. Seed remote Git from the configured origin/base ref before first sync when possible. 8. Run command over SSH. -9. Stream remote output, append run events, and retain the latest log tail in coordinator history. +9. Stream remote output, append run events, and retain bounded command output in coordinator history. 10. Heartbeat coordinator leases in the background. 11. Release lease unless `--keep` is set. 12. Exit with the remote command exit code. diff --git a/docs/commands/attach.md b/docs/commands/attach.md index 561339d..5b60bf1 100644 --- a/docs/commands/attach.md +++ b/docs/commands/attach.md @@ -26,5 +26,4 @@ inspectable through [history](history.md), [events](events.md), and [logs](logs.md). Output events are a bounded preview. Use [logs](logs.md) for the retained -command output tail after completion. - +command output after completion. diff --git a/docs/commands/events.md b/docs/commands/events.md index cc3b604..6c46207 100644 --- a/docs/commands/events.md +++ b/docs/commands/events.md @@ -17,7 +17,7 @@ Human output includes sequence number, event type, phase, stream, timestamp, and short message or output text. JSON output returns the raw event records. Output events are a bounded preview: stdout/stderr capture stops after 64 KiB per run and records an `output.truncated` marker. Use `crabbox logs` for the -retained command output tail. +larger retained command output. Flags: diff --git a/docs/commands/logs.md b/docs/commands/logs.md index 3b25e33..f5b1bfe 100644 --- a/docs/commands/logs.md +++ b/docs/commands/logs.md @@ -1,6 +1,6 @@ # logs -`crabbox logs` prints the retained remote output tail for a recorded run. +`crabbox logs` prints the retained remote output for a recorded run. ```sh crabbox logs run_... @@ -10,7 +10,7 @@ crabbox logs run_... --json The plain form writes the log text to stdout. `--json` returns run metadata plus the log. -Logs are bounded tails of remote stdout/stderr. They are for debugging recent runs, not unlimited archival. +Logs are bounded remote stdout/stderr captures. The CLI keeps up to 8 MiB per run and the coordinator stores larger captures in chunks, so failures from noisy parallel runs remain visible without turning run history into unlimited archival storage. Related docs: diff --git a/docs/commands/run.md b/docs/commands/run.md index 97261e3..fa3e8f8 100644 --- a/docs/commands/run.md +++ b/docs/commands/run.md @@ -31,9 +31,9 @@ Before the first rsync into a Git checkout, Crabbox tries to seed the remote wor After sync, Crabbox runs a remote sanity check. If the remote checkout reports at least 200 tracked deletions, Crabbox fails before running tests unless local `CRABBOX_ALLOW_MASS_DELETIONS=1` is set. -When a coordinator is configured, Crabbox records each remote command as a run history item. `crabbox history` lists those records and `crabbox logs ` prints the retained remote output tail. Log retention is intentionally bounded so a noisy command cannot fill Durable Object storage. +When a coordinator is configured, Crabbox records each remote command as a run history item. `crabbox history` lists those records and `crabbox logs ` prints retained remote output. Log retention is intentionally bounded so a noisy command cannot fill Durable Object storage. -Add `--junit ` or configure `results.junit` to attach JUnit XML summaries to the run record. `crabbox results ` then prints failed tests without reading the raw log tail. +Add `--junit ` or configure `results.junit` to attach JUnit XML summaries to the run record. `crabbox results ` then prints failed tests without reading the raw log. Use `crabbox sync-plan` to inspect the same local manifest without leasing a box when a sync estimate looks unexpectedly large. diff --git a/docs/features/README.md b/docs/features/README.md index d149b55..0595efc 100644 --- a/docs/features/README.md +++ b/docs/features/README.md @@ -19,7 +19,7 @@ Core features: - [Actions hydration](actions-hydration.md): let GitHub Actions prepare a runner, then sync local work into that workspace. - [SSH keys](ssh-keys.md): per-lease keys, provider key cleanup, and local storage. - [Cost and usage](cost-usage.md): guardrails, provider-backed pricing, and reporting. -- [History and logs](history-logs.md): coordinator run records, events, and retained remote output tails. +- [History and logs](history-logs.md): coordinator run records, events, and retained remote output. - [Test results](test-results.md): JUnit summaries attached to recorded runs. - [Cache controls](cache.md): inspect, purge, and warm remote package/build caches. - [Auth and admin](auth-admin.md): login/logout/whoami and trusted operator controls. diff --git a/docs/features/history-logs.md b/docs/features/history-logs.md index 6ed66ce..223e404 100644 --- a/docs/features/history-logs.md +++ b/docs/features/history-logs.md @@ -19,7 +19,7 @@ CLI finishes that run with: - total duration; - owner and org; - provider, class, and server type; -- retained remote output tail. +- retained remote output. Use: @@ -32,10 +32,11 @@ crabbox logs run_... ``` History records and run events live in the Fleet Durable Object. Log text is -stored separately from run metadata and intentionally capped to the latest tail -so noisy commands cannot exhaust storage. Event output capture is also bounded; -use `crabbox attach` for active run previews and `crabbox logs` for the retained -command output tail. +stored separately from run metadata and intentionally capped so noisy commands +cannot exhaust storage. Logs larger than one storage value are chunked by the +coordinator and reassembled by `crabbox logs`. Event output capture is also +bounded; use `crabbox attach` for active run previews and `crabbox logs` for the +retained command output. Direct-provider mode does not have central history. Use shell output or local terminal logs there. diff --git a/docs/features/test-results.md b/docs/features/test-results.md index f3c59bd..9a17e55 100644 --- a/docs/features/test-results.md +++ b/docs/features/test-results.md @@ -6,7 +6,7 @@ Read when: - changing how failed tests are summarized; - debugging why `crabbox results` has no data. -Crabbox can attach JUnit XML summaries to coordinator run history. The agent uses this so a failed run can answer "which tests failed?" without scraping a large log tail. +Crabbox can attach JUnit XML summaries to coordinator run history. The agent uses this so a failed run can answer "which tests failed?" without scraping a large raw log. Configure per run: diff --git a/docs/observability.md b/docs/observability.md index f4cd5b2..64464c3 100644 --- a/docs/observability.md +++ b/docs/observability.md @@ -7,7 +7,7 @@ Read when: - finding a remote machine for SSH inspection; - correlating Actions hydration with the remote workspace. -Crabbox exposes operational visibility through CLI commands, coordinator usage summaries, retained run history/log tails, provider labels, GitHub Actions run links, and Worker logs. The reliable path is to keep the lease ID and run ID together. +Crabbox exposes operational visibility through CLI commands, coordinator usage summaries, retained run history/logs, provider labels, GitHub Actions run links, and Worker logs. The reliable path is to keep the lease ID and run ID together. ## Lease State @@ -49,7 +49,7 @@ Reports include lease count, active lease count, elapsed runtime, estimated elap Coordinator-backed `crabbox run` creates a durable run record before leasing starts, appends lifecycle events while the CLI progresses, and finishes the run -with exit code, timing, and the latest retained output tail. +with exit code, timing, and retained command output. Use: @@ -65,8 +65,8 @@ bin/crabbox results run_... History is for command debugging, not unlimited log archival. Events are ordered phase and output chunks for reconnect/inspection, and `attach` can follow those -events while the original CLI is still alive. Logs are bounded tails of remote -stdout/stderr. Test results are stored as structured summaries when `--junit` +events while the original CLI is still alive. Logs are bounded retained remote +stdout/stderr captures. Test results are stored as structured summaries when `--junit` or `results.junit` is configured. ## Remote Debugging diff --git a/docs/security.md b/docs/security.md index edc4a65..c46d12c 100644 --- a/docs/security.md +++ b/docs/security.md @@ -150,7 +150,7 @@ Do not store: - file contents; - SSH keys. -Coordinator run records keep bounded stdout/stderr tails and optional structured JUnit summaries for debugging. +Coordinator run records keep bounded stdout/stderr captures and optional structured JUnit summaries for debugging. ## Future Audit Trail diff --git a/internal/cli/coordinator.go b/internal/cli/coordinator.go index c77602f..c908fdb 100644 --- a/internal/cli/coordinator.go +++ b/internal/cli/coordinator.go @@ -509,11 +509,13 @@ func (c *CoordinatorClient) CreateRun(ctx context.Context, leaseID string, cfg C func (c *CoordinatorClient) FinishRun(ctx context.Context, runID string, exitCode int, sync, command time.Duration, log string, truncated bool, results *TestResultSummary) (CoordinatorRun, error) { var res CoordinatorRunResponse + logChunks := splitRunLogChunks(log) err := c.do(ctx, http.MethodPost, "/v1/runs/"+url.PathEscape(runID)+"/finish", map[string]any{ "exitCode": exitCode, "syncMs": sync.Milliseconds(), "commandMs": command.Milliseconds(), - "log": log, + "log": runLogFallbackPreview(log, truncated), + "logChunks": logChunks, "logTruncated": truncated, "results": results, }, &res) diff --git a/internal/cli/coordinator_test.go b/internal/cli/coordinator_test.go index d222dc8..02dce46 100644 --- a/internal/cli/coordinator_test.go +++ b/internal/cli/coordinator_test.go @@ -116,6 +116,41 @@ func TestCoordinatorRunEvents(t *testing.T) { } } +func TestCoordinatorFinishRunSendsLogChunks(t *testing.T) { + var finishBody map[string]any + server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + if r.Method != http.MethodPost || r.URL.Path != "/v1/runs/run_123/finish" { + t.Fatalf("unexpected request %s %s", r.Method, r.URL.Path) + } + if err := json.NewDecoder(r.Body).Decode(&finishBody); err != nil { + t.Fatal(err) + } + _, _ = w.Write([]byte(`{"run":{"id":"run_123","leaseID":"","owner":"peter@example.com","org":"openclaw","provider":"aws","class":"standard","serverType":"t3.small","command":["pnpm","test"],"state":"failed","phase":"failed","exitCode":1,"logBytes":0,"logTruncated":false,"startedAt":"2026-05-02T00:00:00Z"}}`)) + })) + defer server.Close() + client := CoordinatorClient{BaseURL: server.URL, Client: server.Client()} + log := strings.Repeat("x", coordinatorRunLogChunkBytes) + "tail" + if _, err := client.FinishRun(context.Background(), "run_123", 1, time.Second, 2*time.Second, log, false, nil); err != nil { + t.Fatal(err) + } + chunks, ok := finishBody["logChunks"].([]any) + if !ok { + t.Fatalf("logChunks body=%#v", finishBody["logChunks"]) + } + if len(chunks) != 2 { + t.Fatalf("logChunks=%d, want 2", len(chunks)) + } + if got := chunks[0].(string); len(got) != coordinatorRunLogChunkBytes { + t.Fatalf("first chunk length=%d, want %d", len(got), coordinatorRunLogChunkBytes) + } + if got := chunks[1].(string); got != "tail" { + t.Fatalf("second chunk=%q, want tail", got) + } + if got := finishBody["log"].(string); len(got) != runLogFallbackPreviewBytes || !strings.HasSuffix(got, "tail") { + t.Fatalf("fallback log length=%d suffix=%q", len(got), got[len(got)-4:]) + } +} + func TestCurlConfigKeepsBearerTokenInConfig(t *testing.T) { client := CoordinatorClient{ BaseURL: "https://example.test", diff --git a/internal/cli/run_output_events.go b/internal/cli/run_output_events.go index 615f492..6df07dc 100644 --- a/internal/cli/run_output_events.go +++ b/internal/cli/run_output_events.go @@ -89,7 +89,7 @@ func outputTruncatedEventInput() CoordinatorRunEventInput { return CoordinatorRunEventInput{ Type: "output.truncated", Phase: "command", - Message: fmt.Sprintf("stdout/stderr event capture capped at %d bytes; use crabbox logs for the retained output tail", runEventOutputMaxBytes), + Message: fmt.Sprintf("stdout/stderr event capture capped at %d bytes; use crabbox logs for retained command output", runEventOutputMaxBytes), } } diff --git a/internal/cli/runlog.go b/internal/cli/runlog.go index 6c84407..4c3bb63 100644 --- a/internal/cli/runlog.go +++ b/internal/cli/runlog.go @@ -2,7 +2,11 @@ package cli import "sync" -const maxRunLogBytes = 64 * 1024 +const ( + maxRunLogBytes = 8 * 1024 * 1024 + coordinatorRunLogChunkBytes = 64 * 1024 + runLogFallbackPreviewBytes = 64 * 1024 +) type runLogBuffer struct { mu sync.Mutex @@ -39,3 +43,35 @@ func (b *runLogBuffer) Truncated() bool { defer b.mu.Unlock() return b.truncated } + +func splitRunLogChunks(log string) []string { + if len(log) == 0 { + return nil + } + chunks := make([]string, 0, (len(log)+coordinatorRunLogChunkBytes-1)/coordinatorRunLogChunkBytes) + start := 0 + size := 0 + for index, char := range log { + charSize := len(string(char)) + if size > 0 && size+charSize > coordinatorRunLogChunkBytes { + chunks = append(chunks, log[start:index]) + start = index + size = 0 + } + size += charSize + } + if start < len(log) { + chunks = append(chunks, log[start:]) + } + return chunks +} + +func runLogFallbackPreview(log string, truncated bool) string { + if !truncated && len(log) <= runLogFallbackPreviewBytes { + return log + } + if len(log) <= runLogFallbackPreviewBytes { + return log + } + return log[len(log)-runLogFallbackPreviewBytes:] +} diff --git a/internal/cli/runlog_test.go b/internal/cli/runlog_test.go index f2df465..e358cb0 100644 --- a/internal/cli/runlog_test.go +++ b/internal/cli/runlog_test.go @@ -65,3 +65,28 @@ func TestRunLogBufferConcurrentWrites(t *testing.T) { t.Fatalf("log missing expected output: %q", log) } } + +func TestSplitRunLogChunks(t *testing.T) { + log := strings.Repeat("a", coordinatorRunLogChunkBytes) + "tail" + chunks := splitRunLogChunks(log) + if len(chunks) != 2 { + t.Fatalf("chunks=%d, want 2", len(chunks)) + } + if len(chunks[0]) != coordinatorRunLogChunkBytes { + t.Fatalf("first chunk length=%d, want %d", len(chunks[0]), coordinatorRunLogChunkBytes) + } + if got := strings.Join(chunks, ""); got != log { + t.Fatalf("joined chunks length=%d, want %d", len(got), len(log)) + } +} + +func TestRunLogFallbackPreviewKeepsTail(t *testing.T) { + log := strings.Repeat("a", runLogFallbackPreviewBytes) + "tail" + preview := runLogFallbackPreview(log, true) + if len(preview) != runLogFallbackPreviewBytes { + t.Fatalf("preview length=%d, want %d", len(preview), runLogFallbackPreviewBytes) + } + if !strings.HasSuffix(preview, "tail") { + t.Fatalf("preview does not keep tail: suffix=%q", preview[len(preview)-8:]) + } +} diff --git a/worker/src/fleet.ts b/worker/src/fleet.ts index 0e0fe0e..d09a821 100644 --- a/worker/src/fleet.ts +++ b/worker/src/fleet.ts @@ -25,6 +25,10 @@ import type { import { costLimits, enforceCostLimits, leaseCost, requestOrg, usageSummary } from "./usage"; const fleetID = "default"; +const maxStoredRunLogBytes = 8 * 1024 * 1024; +const runLogChunkBytes = 64 * 1024; +const textEncoder = new TextEncoder(); +const textDecoder = new TextDecoder(); export class FleetDurableObject implements DurableObject { constructor( @@ -400,7 +404,7 @@ export class FleetDurableObject implements DurableObject { if (!run || !this.runVisibleToRequest(run, request)) { return notFound(); } - const log = (await this.state.storage.get(runLogKey(runID))) ?? ""; + const log = await this.readRunLog(runID); return new Response(log, { headers: { "content-type": "text/plain; charset=utf-8" }, }); @@ -453,13 +457,13 @@ export class FleetDurableObject implements DurableObject { run.state = run.exitCode === 0 ? "succeeded" : "failed"; run.phase = run.state; run.endedAt = now.toISOString(); - const log = input.log ?? ""; - run.logBytes = new TextEncoder().encode(log).byteLength; - run.logTruncated = Boolean(input.logTruncated); + const logInput = normalizeRunLogInput(input); + run.logBytes = logInput.bytes; + run.logTruncated = logInput.truncated; if (input.results) { run.results = boundedTestResults(input.results); } - await this.state.storage.put(runLogKey(runID), log); + await this.writeRunLog(runID, logInput.log); await this.putRun(run); await this.appendRunEventRecord(run, { type: "command.finished", @@ -469,6 +473,35 @@ export class FleetDurableObject implements DurableObject { return json({ run }); } + private async readRunLog(runID: string): Promise { + const chunks = await this.state.storage.list({ prefix: runLogChunkPrefix(runID) }); + if (chunks.size > 0) { + return [...chunks.entries()] + .toSorted(([left], [right]) => left.localeCompare(right)) + .map(([, chunk]) => chunk) + .join(""); + } + return (await this.state.storage.get(runLogKey(runID))) ?? ""; + } + + private async writeRunLog(runID: string, log: string): Promise { + await this.deleteRunLogChunks(runID); + if (textEncoder.encode(log).byteLength <= runLogChunkBytes) { + await this.state.storage.put(runLogKey(runID), log); + return; + } + await this.state.storage.put(runLogKey(runID), ""); + const chunks = splitRunLogByBytes(log, runLogChunkBytes); + await Promise.all( + chunks.map((chunk, index) => this.state.storage.put(runLogChunkKey(runID, index), chunk)), + ); + } + + private async deleteRunLogChunks(runID: string): Promise { + const chunks = await this.state.storage.list({ prefix: runLogChunkPrefix(runID) }); + await Promise.all([...chunks.keys()].map((key) => this.state.storage.delete(key))); + } + private async listRuns(request: Request): Promise { const url = new URL(request.url); const leaseID = url.searchParams.get("leaseID") ?? ""; @@ -746,6 +779,14 @@ function runLogKey(runID: string): string { return `runlog:${runID}`; } +function runLogChunkPrefix(runID: string): string { + return `runlog:${runID}:chunk:`; +} + +function runLogChunkKey(runID: string, index: number): string { + return `${runLogChunkPrefix(runID)}${String(index).padStart(6, "0")}`; +} + function runEventPrefix(runID: string): string { return `runevent:${runID}:`; } @@ -816,6 +857,52 @@ function finiteQueryNumber(value: string | null): number | undefined { return Number.isFinite(parsed) && parsed >= 0 ? Math.trunc(parsed) : undefined; } +function normalizeRunLogInput(input: RunFinishRequest): { + log: string; + bytes: number; + truncated: boolean; +} { + const chunkLog = Array.isArray(input.logChunks) + ? input.logChunks.map((chunk) => String(chunk)).join("") + : ""; + const rawLog = chunkLog || input.log || ""; + const bounded = truncateUtf8Tail(rawLog, maxStoredRunLogBytes); + const rawBytes = textEncoder.encode(rawLog).byteLength; + return { + log: bounded, + bytes: Math.min(rawBytes, maxStoredRunLogBytes), + truncated: Boolean(input.logTruncated) || rawBytes > maxStoredRunLogBytes, + }; +} + +function splitRunLogByBytes(log: string, maxBytes: number): string[] { + const chunks: string[] = []; + let current = ""; + let currentBytes = 0; + for (const char of log) { + const charBytes = textEncoder.encode(char).byteLength; + if (current && currentBytes + charBytes > maxBytes) { + chunks.push(current); + current = ""; + currentBytes = 0; + } + current += char; + currentBytes += charBytes; + } + if (current) { + chunks.push(current); + } + return chunks; +} + +function truncateUtf8Tail(value: string, maxBytes: number): string { + const encoded = textEncoder.encode(value); + if (encoded.byteLength <= maxBytes) { + return value; + } + return textDecoder.decode(encoded.slice(encoded.byteLength - maxBytes)); +} + const MAX_RESULT_FILES = 50; const MAX_RESULT_FAILURES = 100; const MAX_RESULT_STRING_BYTES = 4096; diff --git a/worker/src/types.ts b/worker/src/types.ts index c05dda6..57a2b7d 100644 --- a/worker/src/types.ts +++ b/worker/src/types.ts @@ -163,6 +163,7 @@ export interface RunFinishRequest { syncMs?: number; commandMs?: number; log?: string; + logChunks?: string[]; logTruncated?: boolean; results?: TestResultSummary; } diff --git a/worker/test/fleet.test.ts b/worker/test/fleet.test.ts index 85e8b0a..c8fd04b 100644 --- a/worker/test/fleet.test.ts +++ b/worker/test/fleet.test.ts @@ -549,6 +549,50 @@ describe("fleet run history", () => { expect(await logs.text()).toBe("ok\n"); }); + it("records chunked run logs so failures do not disappear from long output", async () => { + const storage = new MemoryStorage(); + const fleet = testFleet(storage); + const create = await fleet.fetch( + request("POST", "/v1/runs", { + body: { + leaseID: "cbx_000000000001", + provider: "aws", + class: "beast", + serverType: "c7a.48xlarge", + command: ["pnpm", "test"], + }, + }), + ); + expect(create.status).toBe(201); + const { run } = (await create.json()) as { run: { id: string } }; + const chunkA = `${"a".repeat(70_000)}\nFAIL src/example.test.ts\n`; + const chunkB = `${"b".repeat(70_000)}\nELIFECYCLE Test failed\n`; + + const finish = await fleet.fetch( + request("POST", `/v1/runs/${run.id}/finish`, { + body: { + exitCode: 1, + log: "fallback tail only\n", + logChunks: [chunkA, chunkB], + }, + }), + ); + expect(finish.status).toBe(200); + const finished = (await finish.json()) as { + run: { state: string; logBytes: number; logTruncated: boolean }; + }; + expect(finished.run.state).toBe("failed"); + expect(finished.run.logBytes).toBe(chunkA.length + chunkB.length); + expect(finished.run.logTruncated).toBe(false); + expect(storage.value(`runlog:${run.id}`)).toBe(""); + + const logs = await fleet.fetch(request("GET", `/v1/runs/${run.id}/logs`)); + const logText = await logs.text(); + expect(logText).toContain("FAIL src/example.test.ts"); + expect(logText).toContain("ELIFECYCLE Test failed"); + expect(logText).not.toContain("fallback tail only"); + }); + it("records resolved lease metadata instead of caller-supplied fallback guesses", async () => { const storage = new MemoryStorage(); const fleet = testFleet(storage);