fix: retain chunked run logs

2026-05-02 12:06:59 +01:00 · 2026-05-02 12:06:59 +01:00 · 5b18b7d53c
commit 5b18b7d53c
parent 819640fbf7
20 changed files with 261 additions and 29 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -8,6 +8,8 @@

 ### Fixed

+- Fixed recorded run logs so long noisy commands are stored in bounded chunks instead of losing the failure evidence between the first output events and the final tail.
+
 ## 0.3.0 - 2026-05-02

 Crabbox 0.3.0 makes brokered runs much easier to observe and debug, adds
--- a/docs/architecture.md
+++ b/docs/architecture.md
@ -81,7 +81,7 @@ Core stored records:
 ```sql
 leases(id, slug, provider, cloud_id, region, owner, org, profile, class, server_type, server_id, server_name, provider_key, host, ssh_user, ssh_port, work_root, keep, ttl_seconds, idle_timeout_seconds, estimated_hourly_usd, max_estimated_usd, state, created_at, updated_at, last_touched_at, expires_at, released_at, ended_at)
 runs(id, lease_id, slug, owner, org, provider, class, server_type, command_json, state, exit_code, sync_ms, command_ms, duration_ms, log_bytes, log_truncated, results_json, started_at, ended_at)
-runlog(run_id, bounded_stdout_stderr_tail)
+runlog(run_id, bounded_stdout_stderr_capture)
 ```

 State transitions:
--- a/docs/cli.md
+++ b/docs/cli.md
@ -189,7 +189,7 @@ Behavior:
 6. Sync current repo, unless a matching sync fingerprint lets Crabbox skip rsync.
 7. Seed remote Git from the configured origin/base ref before first sync when possible.
 8. Run command over SSH.
-9. Stream remote output, append run events, and retain the latest log tail in coordinator history.
+9. Stream remote output, append run events, and retain bounded command output in coordinator history.
 10. Heartbeat coordinator leases in the background.
 11. Release lease unless `--keep` is set.
 12. Exit with the remote command exit code.
--- a/docs/commands/attach.md
+++ b/docs/commands/attach.md
@ -26,5 +26,4 @@ inspectable through [history](history.md), [events](events.md), and
 [logs](logs.md).

 Output events are a bounded preview. Use [logs](logs.md) for the retained
-command output tail after completion.
-
+command output after completion.
--- a/docs/commands/events.md
+++ b/docs/commands/events.md
@ -17,7 +17,7 @@ Human output includes sequence number, event type, phase, stream, timestamp, and
 short message or output text. JSON output returns the raw event records.
 Output events are a bounded preview: stdout/stderr capture stops after 64 KiB
 per run and records an `output.truncated` marker. Use `crabbox logs` for the
-retained command output tail.
+larger retained command output.

 Flags:

--- a/docs/commands/logs.md
+++ b/docs/commands/logs.md
@ -1,6 +1,6 @@
 # logs

-`crabbox logs` prints the retained remote output tail for a recorded run.
+`crabbox logs` prints the retained remote output for a recorded run.

 ```sh
 crabbox logs run_...
@ -10,7 +10,7 @@ crabbox logs run_... --json

 The plain form writes the log text to stdout. `--json` returns run metadata plus the log.

-Logs are bounded tails of remote stdout/stderr. They are for debugging recent runs, not unlimited archival.
+Logs are bounded remote stdout/stderr captures. The CLI keeps up to 8 MiB per run and the coordinator stores larger captures in chunks, so failures from noisy parallel runs remain visible without turning run history into unlimited archival storage.

 Related docs:

--- a/docs/commands/run.md
+++ b/docs/commands/run.md
@ -31,9 +31,9 @@ Before the first rsync into a Git checkout, Crabbox tries to seed the remote wor

 After sync, Crabbox runs a remote sanity check. If the remote checkout reports at least 200 tracked deletions, Crabbox fails before running tests unless local `CRABBOX_ALLOW_MASS_DELETIONS=1` is set.

-When a coordinator is configured, Crabbox records each remote command as a run history item. `crabbox history` lists those records and `crabbox logs <run-id>` prints the retained remote output tail. Log retention is intentionally bounded so a noisy command cannot fill Durable Object storage.
+When a coordinator is configured, Crabbox records each remote command as a run history item. `crabbox history` lists those records and `crabbox logs <run-id>` prints retained remote output. Log retention is intentionally bounded so a noisy command cannot fill Durable Object storage.

-Add `--junit <path>` or configure `results.junit` to attach JUnit XML summaries to the run record. `crabbox results <run-id>` then prints failed tests without reading the raw log tail.
+Add `--junit <path>` or configure `results.junit` to attach JUnit XML summaries to the run record. `crabbox results <run-id>` then prints failed tests without reading the raw log.

 Use `crabbox sync-plan` to inspect the same local manifest without leasing a box when a sync estimate looks unexpectedly large.

--- a/docs/features/README.md
+++ b/docs/features/README.md
@ -19,7 +19,7 @@ Core features:
 - [Actions hydration](actions-hydration.md): let GitHub Actions prepare a runner, then sync local work into that workspace.
 - [SSH keys](ssh-keys.md): per-lease keys, provider key cleanup, and local storage.
 - [Cost and usage](cost-usage.md): guardrails, provider-backed pricing, and reporting.
- [History and logs](history-logs.md): coordinator run records, events, and retained remote output tails.
+- [History and logs](history-logs.md): coordinator run records, events, and retained remote output.
 - [Test results](test-results.md): JUnit summaries attached to recorded runs.
 - [Cache controls](cache.md): inspect, purge, and warm remote package/build caches.
 - [Auth and admin](auth-admin.md): login/logout/whoami and trusted operator controls.
--- a/docs/features/history-logs.md
+++ b/docs/features/history-logs.md
@ -19,7 +19,7 @@ CLI finishes that run with:
 - total duration;
 - owner and org;
 - provider, class, and server type;
- retained remote output tail.
+- retained remote output.

 Use:

@ -32,10 +32,11 @@ crabbox logs run_...
 ```

 History records and run events live in the Fleet Durable Object. Log text is
-stored separately from run metadata and intentionally capped to the latest tail
-so noisy commands cannot exhaust storage. Event output capture is also bounded;
-use `crabbox attach` for active run previews and `crabbox logs` for the retained
-command output tail.
+stored separately from run metadata and intentionally capped so noisy commands
+cannot exhaust storage. Logs larger than one storage value are chunked by the
+coordinator and reassembled by `crabbox logs`. Event output capture is also
+bounded; use `crabbox attach` for active run previews and `crabbox logs` for the
+retained command output.

 Direct-provider mode does not have central history. Use shell output or local terminal logs there.

--- a/docs/features/test-results.md
+++ b/docs/features/test-results.md
@ -6,7 +6,7 @@ Read when:
 - changing how failed tests are summarized;
 - debugging why `crabbox results` has no data.

-Crabbox can attach JUnit XML summaries to coordinator run history. The agent uses this so a failed run can answer "which tests failed?" without scraping a large log tail.
+Crabbox can attach JUnit XML summaries to coordinator run history. The agent uses this so a failed run can answer "which tests failed?" without scraping a large raw log.

 Configure per run:

--- a/docs/observability.md
+++ b/docs/observability.md
@ -7,7 +7,7 @@ Read when:
 - finding a remote machine for SSH inspection;
 - correlating Actions hydration with the remote workspace.

-Crabbox exposes operational visibility through CLI commands, coordinator usage summaries, retained run history/log tails, provider labels, GitHub Actions run links, and Worker logs. The reliable path is to keep the lease ID and run ID together.
+Crabbox exposes operational visibility through CLI commands, coordinator usage summaries, retained run history/logs, provider labels, GitHub Actions run links, and Worker logs. The reliable path is to keep the lease ID and run ID together.

 ## Lease State

@ -49,7 +49,7 @@ Reports include lease count, active lease count, elapsed runtime, estimated elap

 Coordinator-backed `crabbox run` creates a durable run record before leasing
 starts, appends lifecycle events while the CLI progresses, and finishes the run
-with exit code, timing, and the latest retained output tail.
+with exit code, timing, and retained command output.

 Use:

@ -65,8 +65,8 @@ bin/crabbox results run_...

 History is for command debugging, not unlimited log archival. Events are ordered
 phase and output chunks for reconnect/inspection, and `attach` can follow those
-events while the original CLI is still alive. Logs are bounded tails of remote
-stdout/stderr. Test results are stored as structured summaries when `--junit`
+events while the original CLI is still alive. Logs are bounded retained remote
+stdout/stderr captures. Test results are stored as structured summaries when `--junit`
 or `results.junit` is configured.

 ## Remote Debugging
--- a/docs/security.md
+++ b/docs/security.md
@ -150,7 +150,7 @@ Do not store:
 - file contents;
 - SSH keys.

-Coordinator run records keep bounded stdout/stderr tails and optional structured JUnit summaries for debugging.
+Coordinator run records keep bounded stdout/stderr captures and optional structured JUnit summaries for debugging.

 ## Future Audit Trail

--- a/internal/cli/coordinator.go
+++ b/internal/cli/coordinator.go
@ -509,11 +509,13 @@ func (c *CoordinatorClient) CreateRun(ctx context.Context, leaseID string, cfg C

 func (c *CoordinatorClient) FinishRun(ctx context.Context, runID string, exitCode int, sync, command time.Duration, log string, truncated bool, results *TestResultSummary) (CoordinatorRun, error) {
 	var res CoordinatorRunResponse
+	logChunks := splitRunLogChunks(log)
 	err := c.do(ctx, http.MethodPost, "/v1/runs/"+url.PathEscape(runID)+"/finish", map[string]any{
 		"exitCode":     exitCode,
 		"syncMs":       sync.Milliseconds(),
 		"commandMs":    command.Milliseconds(),
-		"log":          log,
+		"log":          runLogFallbackPreview(log, truncated),
+		"logChunks":    logChunks,
 		"logTruncated": truncated,
 		"results":      results,
 	}, &res)
--- a/internal/cli/coordinator_test.go
+++ b/internal/cli/coordinator_test.go
@ -116,6 +116,41 @@ func TestCoordinatorRunEvents(t *testing.T) {
 	}
 }

+func TestCoordinatorFinishRunSendsLogChunks(t *testing.T) {
+	var finishBody map[string]any
+	server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		if r.Method != http.MethodPost || r.URL.Path != "/v1/runs/run_123/finish" {
+			t.Fatalf("unexpected request %s %s", r.Method, r.URL.Path)
+		}
+		if err := json.NewDecoder(r.Body).Decode(&finishBody); err != nil {
+			t.Fatal(err)
+		}
+		_, _ = w.Write([]byte(`{"run":{"id":"run_123","leaseID":"","owner":"peter@example.com","org":"openclaw","provider":"aws","class":"standard","serverType":"t3.small","command":["pnpm","test"],"state":"failed","phase":"failed","exitCode":1,"logBytes":0,"logTruncated":false,"startedAt":"2026-05-02T00:00:00Z"}}`))
+	}))
+	defer server.Close()
+	client := CoordinatorClient{BaseURL: server.URL, Client: server.Client()}
+	log := strings.Repeat("x", coordinatorRunLogChunkBytes) + "tail"
+	if _, err := client.FinishRun(context.Background(), "run_123", 1, time.Second, 2*time.Second, log, false, nil); err != nil {
+		t.Fatal(err)
+	}
+	chunks, ok := finishBody["logChunks"].([]any)
+	if !ok {
+		t.Fatalf("logChunks body=%#v", finishBody["logChunks"])
+	}
+	if len(chunks) != 2 {
+		t.Fatalf("logChunks=%d, want 2", len(chunks))
+	}
+	if got := chunks[0].(string); len(got) != coordinatorRunLogChunkBytes {
+		t.Fatalf("first chunk length=%d, want %d", len(got), coordinatorRunLogChunkBytes)
+	}
+	if got := chunks[1].(string); got != "tail" {
+		t.Fatalf("second chunk=%q, want tail", got)
+	}
+	if got := finishBody["log"].(string); len(got) != runLogFallbackPreviewBytes || !strings.HasSuffix(got, "tail") {
+		t.Fatalf("fallback log length=%d suffix=%q", len(got), got[len(got)-4:])
+	}
+}
+
 func TestCurlConfigKeepsBearerTokenInConfig(t *testing.T) {
 	client := CoordinatorClient{
 		BaseURL: "https://example.test",
--- a/internal/cli/run_output_events.go
+++ b/internal/cli/run_output_events.go
@ -89,7 +89,7 @@ func outputTruncatedEventInput() CoordinatorRunEventInput {
 	return CoordinatorRunEventInput{
 		Type:    "output.truncated",
 		Phase:   "command",
-		Message: fmt.Sprintf("stdout/stderr event capture capped at %d bytes; use crabbox logs for the retained output tail", runEventOutputMaxBytes),
+		Message: fmt.Sprintf("stdout/stderr event capture capped at %d bytes; use crabbox logs for retained command output", runEventOutputMaxBytes),
 	}
 }

--- a/internal/cli/runlog.go
+++ b/internal/cli/runlog.go
@ -2,7 +2,11 @@ package cli

 import "sync"

-const maxRunLogBytes = 64 * 1024
+const (
+	maxRunLogBytes              = 8 * 1024 * 1024
+	coordinatorRunLogChunkBytes = 64 * 1024
+	runLogFallbackPreviewBytes  = 64 * 1024
+)

 type runLogBuffer struct {
 	mu        sync.Mutex
@ -39,3 +43,35 @@ func (b *runLogBuffer) Truncated() bool {
 	defer b.mu.Unlock()
 	return b.truncated
 }
+
+func splitRunLogChunks(log string) []string {
+	if len(log) == 0 {
+		return nil
+	}
+	chunks := make([]string, 0, (len(log)+coordinatorRunLogChunkBytes-1)/coordinatorRunLogChunkBytes)
+	start := 0
+	size := 0
+	for index, char := range log {
+		charSize := len(string(char))
+		if size > 0 && size+charSize > coordinatorRunLogChunkBytes {
+			chunks = append(chunks, log[start:index])
+			start = index
+			size = 0
+		}
+		size += charSize
+	}
+	if start < len(log) {
+		chunks = append(chunks, log[start:])
+	}
+	return chunks
+}
+
+func runLogFallbackPreview(log string, truncated bool) string {
+	if !truncated && len(log) <= runLogFallbackPreviewBytes {
+		return log
+	}
+	if len(log) <= runLogFallbackPreviewBytes {
+		return log
+	}
+	return log[len(log)-runLogFallbackPreviewBytes:]
+}
--- a/internal/cli/runlog_test.go
+++ b/internal/cli/runlog_test.go
@ -65,3 +65,28 @@ func TestRunLogBufferConcurrentWrites(t *testing.T) {
 		t.Fatalf("log missing expected output: %q", log)
 	}
 }
+
+func TestSplitRunLogChunks(t *testing.T) {
+	log := strings.Repeat("a", coordinatorRunLogChunkBytes) + "tail"
+	chunks := splitRunLogChunks(log)
+	if len(chunks) != 2 {
+		t.Fatalf("chunks=%d, want 2", len(chunks))
+	}
+	if len(chunks[0]) != coordinatorRunLogChunkBytes {
+		t.Fatalf("first chunk length=%d, want %d", len(chunks[0]), coordinatorRunLogChunkBytes)
+	}
+	if got := strings.Join(chunks, ""); got != log {
+		t.Fatalf("joined chunks length=%d, want %d", len(got), len(log))
+	}
+}
+
+func TestRunLogFallbackPreviewKeepsTail(t *testing.T) {
+	log := strings.Repeat("a", runLogFallbackPreviewBytes) + "tail"
+	preview := runLogFallbackPreview(log, true)
+	if len(preview) != runLogFallbackPreviewBytes {
+		t.Fatalf("preview length=%d, want %d", len(preview), runLogFallbackPreviewBytes)
+	}
+	if !strings.HasSuffix(preview, "tail") {
+		t.Fatalf("preview does not keep tail: suffix=%q", preview[len(preview)-8:])
+	}
+}
--- a/worker/src/fleet.ts
+++ b/worker/src/fleet.ts
@ -25,6 +25,10 @@ import type {
 import { costLimits, enforceCostLimits, leaseCost, requestOrg, usageSummary } from "./usage";

 const fleetID = "default";
+const maxStoredRunLogBytes = 8 * 1024 * 1024;
+const runLogChunkBytes = 64 * 1024;
+const textEncoder = new TextEncoder();
+const textDecoder = new TextDecoder();

 export class FleetDurableObject implements DurableObject {
  constructor(
@ -400,7 +404,7 @@ export class FleetDurableObject implements DurableObject {
      if (!run || !this.runVisibleToRequest(run, request)) {
        return notFound();
      }
-      const log = (await this.state.storage.get<string>(runLogKey(runID))) ?? "";
+      const log = await this.readRunLog(runID);
      return new Response(log, {
        headers: { "content-type": "text/plain; charset=utf-8" },
      });
@ -453,13 +457,13 @@ export class FleetDurableObject implements DurableObject {
    run.state = run.exitCode === 0 ? "succeeded" : "failed";
    run.phase = run.state;
    run.endedAt = now.toISOString();
-    const log = input.log ?? "";
-    run.logBytes = new TextEncoder().encode(log).byteLength;
-    run.logTruncated = Boolean(input.logTruncated);
+    const logInput = normalizeRunLogInput(input);
+    run.logBytes = logInput.bytes;
+    run.logTruncated = logInput.truncated;
    if (input.results) {
      run.results = boundedTestResults(input.results);
    }
-    await this.state.storage.put(runLogKey(runID), log);
+    await this.writeRunLog(runID, logInput.log);
    await this.putRun(run);
    await this.appendRunEventRecord(run, {
      type: "command.finished",
@ -469,6 +473,35 @@ export class FleetDurableObject implements DurableObject {
    return json({ run });
  }

+  private async readRunLog(runID: string): Promise<string> {
+    const chunks = await this.state.storage.list<string>({ prefix: runLogChunkPrefix(runID) });
+    if (chunks.size > 0) {
+      return [...chunks.entries()]
+        .toSorted(([left], [right]) => left.localeCompare(right))
+        .map(([, chunk]) => chunk)
+        .join("");
+    }
+    return (await this.state.storage.get<string>(runLogKey(runID))) ?? "";
+  }
+
+  private async writeRunLog(runID: string, log: string): Promise<void> {
+    await this.deleteRunLogChunks(runID);
+    if (textEncoder.encode(log).byteLength <= runLogChunkBytes) {
+      await this.state.storage.put(runLogKey(runID), log);
+      return;
+    }
+    await this.state.storage.put(runLogKey(runID), "");
+    const chunks = splitRunLogByBytes(log, runLogChunkBytes);
+    await Promise.all(
+      chunks.map((chunk, index) => this.state.storage.put(runLogChunkKey(runID, index), chunk)),
+    );
+  }
+
+  private async deleteRunLogChunks(runID: string): Promise<void> {
+    const chunks = await this.state.storage.list<string>({ prefix: runLogChunkPrefix(runID) });
+    await Promise.all([...chunks.keys()].map((key) => this.state.storage.delete(key)));
+  }
+
  private async listRuns(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const leaseID = url.searchParams.get("leaseID") ?? "";
@ -746,6 +779,14 @@ function runLogKey(runID: string): string {
  return `runlog:${runID}`;
 }

+function runLogChunkPrefix(runID: string): string {
+  return `runlog:${runID}:chunk:`;
+}
+
+function runLogChunkKey(runID: string, index: number): string {
+  return `${runLogChunkPrefix(runID)}${String(index).padStart(6, "0")}`;
+}
+
 function runEventPrefix(runID: string): string {
  return `runevent:${runID}:`;
 }
@ -816,6 +857,52 @@ function finiteQueryNumber(value: string | null): number | undefined {
  return Number.isFinite(parsed) && parsed >= 0 ? Math.trunc(parsed) : undefined;
 }

+function normalizeRunLogInput(input: RunFinishRequest): {
+  log: string;
+  bytes: number;
+  truncated: boolean;
+} {
+  const chunkLog = Array.isArray(input.logChunks)
+    ? input.logChunks.map((chunk) => String(chunk)).join("")
+    : "";
+  const rawLog = chunkLog || input.log || "";
+  const bounded = truncateUtf8Tail(rawLog, maxStoredRunLogBytes);
+  const rawBytes = textEncoder.encode(rawLog).byteLength;
+  return {
+    log: bounded,
+    bytes: Math.min(rawBytes, maxStoredRunLogBytes),
+    truncated: Boolean(input.logTruncated) || rawBytes > maxStoredRunLogBytes,
+  };
+}
+
+function splitRunLogByBytes(log: string, maxBytes: number): string[] {
+  const chunks: string[] = [];
+  let current = "";
+  let currentBytes = 0;
+  for (const char of log) {
+    const charBytes = textEncoder.encode(char).byteLength;
+    if (current && currentBytes + charBytes > maxBytes) {
+      chunks.push(current);
+      current = "";
+      currentBytes = 0;
+    }
+    current += char;
+    currentBytes += charBytes;
+  }
+  if (current) {
+    chunks.push(current);
+  }
+  return chunks;
+}
+
+function truncateUtf8Tail(value: string, maxBytes: number): string {
+  const encoded = textEncoder.encode(value);
+  if (encoded.byteLength <= maxBytes) {
+    return value;
+  }
+  return textDecoder.decode(encoded.slice(encoded.byteLength - maxBytes));
+}
+
 const MAX_RESULT_FILES = 50;
 const MAX_RESULT_FAILURES = 100;
 const MAX_RESULT_STRING_BYTES = 4096;
--- a/worker/src/types.ts
+++ b/worker/src/types.ts
@ -163,6 +163,7 @@ export interface RunFinishRequest {
  syncMs?: number;
  commandMs?: number;
  log?: string;
+  logChunks?: string[];
  logTruncated?: boolean;
  results?: TestResultSummary;
 }
--- a/worker/test/fleet.test.ts
+++ b/worker/test/fleet.test.ts
@ -549,6 +549,50 @@ describe("fleet run history", () => {
    expect(await logs.text()).toBe("ok\n");
  });

+  it("records chunked run logs so failures do not disappear from long output", async () => {
+    const storage = new MemoryStorage();
+    const fleet = testFleet(storage);
+    const create = await fleet.fetch(
+      request("POST", "/v1/runs", {
+        body: {
+          leaseID: "cbx_000000000001",
+          provider: "aws",
+          class: "beast",
+          serverType: "c7a.48xlarge",
+          command: ["pnpm", "test"],
+        },
+      }),
+    );
+    expect(create.status).toBe(201);
+    const { run } = (await create.json()) as { run: { id: string } };
+    const chunkA = `${"a".repeat(70_000)}\nFAIL src/example.test.ts\n`;
+    const chunkB = `${"b".repeat(70_000)}\nELIFECYCLE Test failed\n`;
+
+    const finish = await fleet.fetch(
+      request("POST", `/v1/runs/${run.id}/finish`, {
+        body: {
+          exitCode: 1,
+          log: "fallback tail only\n",
+          logChunks: [chunkA, chunkB],
+        },
+      }),
+    );
+    expect(finish.status).toBe(200);
+    const finished = (await finish.json()) as {
+      run: { state: string; logBytes: number; logTruncated: boolean };
+    };
+    expect(finished.run.state).toBe("failed");
+    expect(finished.run.logBytes).toBe(chunkA.length + chunkB.length);
+    expect(finished.run.logTruncated).toBe(false);
+    expect(storage.value<string>(`runlog:${run.id}`)).toBe("");
+
+    const logs = await fleet.fetch(request("GET", `/v1/runs/${run.id}/logs`));
+    const logText = await logs.text();
+    expect(logText).toContain("FAIL src/example.test.ts");
+    expect(logText).toContain("ELIFECYCLE Test failed");
+    expect(logText).not.toContain("fallback tail only");
+  });
+
  it("records resolved lease metadata instead of caller-supplied fallback guesses", async () => {
    const storage = new MemoryStorage();
    const fleet = testFleet(storage);