fix: dispatch repair worker workflow

Peter Steinberger 2026-04-29 13:58:00 +01:00
parent 692bdd98f4
commit 0ef767cf5e
13 changed files with 50 additions and 37 deletions


@ -226,7 +226,7 @@ For a maintainer-facing architecture map of the automation lanes, see
[`docs/INTERNAL_FEATURES.md`](docs/INTERNAL_FEATURES.md).
For the ClawSweeper feedback loop that updates existing generated PRs, see
-[`docs/auto-update-prs.md`](docs/auto-update-prs.md).
+[`docs/repair/auto-update-prs.md`](auto-update-prs.md).
That loop is marker-driven. ClawSweeper comments use hidden
`clawsweeper-verdict:*` markers, and only actionable PR feedback includes
@ -312,7 +312,7 @@ Supported commands:
```
`status` and `explain` post a short status reply. `fix ci`, `address review`,
-and `rebase` dispatch the normal `cluster-worker.yml` repair path, but only for
+and `rebase` dispatch the normal `repair-cluster-worker.yml` repair path, but only for
existing ClawSweeper PRs identified by the `clawsweeper/*` branch.
`automerge` opts an open PR into the bounded review/fix/merge loop. `approve`
is maintainer-only exact-head approval after a human-review pause; it clears


@ -21,7 +21,7 @@ The loop is intentionally small:
to review that PR head.
3. The comment router sees trusted ClawSweeper feedback.
4. ClawSweeper dispatches the existing or adopted job through
-`cluster-worker.yml`.
+`repair-cluster-worker.yml`.
5. The repair worker pushes another commit to the source branch if it finds a
safe, narrow fix, or opens a credited replacement when the source branch
cannot be safely updated.
@ -177,7 +177,7 @@ five automatic ClawSweeper-triggered repair iterations. The per-PR cap is total
across all head SHAs and stops the automatic review/repair loop even when every
iteration produces a new commit.
-Runs for the same job path and mode share the `cluster-worker.yml` concurrency
+Runs for the same job path and mode share the `repair-cluster-worker.yml` concurrency
group, so repeated dispatches queue instead of racing the same branch.
ClawSweeper edits one durable review comment in place. The router keys its
@ -227,13 +227,13 @@ as:
Workflow:
-- `.github/workflows/comment-router.yml`
+- `.github/workflows/repair-comment-router.yml`
Scripts:
-- `scripts/comment-router.ts`
-- `scripts/comment-router-core.ts`
-- `scripts/comment-router-utils.ts`
+- `src/repair/comment-router.ts`
+- `src/repair/comment-router-core.ts`
+- `src/repair/comment-router-utils.ts`
Durable state:
@ -255,7 +255,7 @@ Syntax and workflow checks:
```bash
pnpm run check
-actionlint .github/workflows/comment-router.yml
+actionlint .github/workflows/repair-comment-router.yml
```
Dry-run the router against live recent comments:


@ -106,7 +106,7 @@ replacement PR. Direct mutation still happens outside Codex.
## Cloud Worker Flow
-Workflow: `.github/workflows/cluster-worker.yml`
+Workflow: `.github/workflows/repair-cluster-worker.yml`
The cluster worker has two jobs:
@ -285,7 +285,7 @@ The finalizer scans open ClawSweeper PRs in the target repo. It finds PRs by the
- security hold
When `--dispatch-repairs --execute` is enabled, it dispatches the existing
-cluster job back through `cluster-worker.yml` instead of creating another PR.
+cluster job back through `repair-cluster-worker.yml` instead of creating another PR.
The idempotency key includes target repo, PR number, and head SHA, so the same
PR/head is not repeatedly repaired unless `--allow-repeat` is used.
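
The idempotency rule described in this hunk can be sketched as below. This is a minimal illustration, not the finalizer's code: `repairKey`, `shouldDispatch`, the in-memory `Set`, and the key format are all hypothetical. Only the three key ingredients (target repo, PR number, head SHA) and the `--allow-repeat` override come from the documentation text.

```typescript
// Hypothetical sketch: helper names and key format are assumptions,
// not taken from the finalizer source.
interface RepairTarget {
  repo: string; // e.g. "openclaw/openclaw"
  prNumber: number;
  headSha: string;
}

// Build a key from target repo, PR number, and head SHA.
function repairKey(target: RepairTarget): string {
  return `${target.repo}#${target.prNumber}@${target.headSha}`;
}

// Dispatch once per PR/head unless the caller explicitly allows repeats
// (mirrors the documented --allow-repeat flag).
function shouldDispatch(
  seen: Set<string>,
  target: RepairTarget,
  allowRepeat = false,
): boolean {
  const key = repairKey(target);
  if (seen.has(key) && !allowRepeat) return false;
  seen.add(key);
  return true;
}
```

A new commit on the PR changes the head SHA and therefore the key, so the same PR becomes eligible for repair again without any override.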
@ -295,8 +295,8 @@ clearly transient jobs, and pass branch-caused failures into the repair prompt.
## Self-Heal Failed ClawSweeper Runs
-Workflow: `.github/workflows/self-heal.yml`
-Script: `scripts/self-heal-failed-runs.ts`
+Workflow: `.github/workflows/repair-self-heal.yml`
+Script: `src/repair/self-heal-failed-runs.ts`
Self-heal retries failed ClawSweeper cluster-worker runs. It reads published
`results/runs/*.json`, selects the latest failed run per source job, skips jobs
@ -309,11 +309,11 @@ finalizer/comment command repair path.
## Maintainer Comment Routing
-Workflow: `.github/workflows/comment-router.yml`
+Workflow: `.github/workflows/repair-comment-router.yml`
Scripts:
-- `scripts/comment-router.ts`
-- `scripts/comment-router-core.ts`
+- `src/repair/comment-router.ts`
+- `src/repair/comment-router-core.ts`
Comment routing scans recent target-repo issue/PR comments and accepts only
maintainer-authored commands. Default allowed GitHub `author_association`
@ -326,7 +326,7 @@ values:
Contributor comments are ignored without a reply.
The generated-PR auto-update design is documented in
-[`docs/auto-update-prs.md`](auto-update-prs.md). That lane lets trusted
+[`docs/repair/auto-update-prs.md`](auto-update-prs.md). That lane lets trusted
ClawSweeper comments dispatch a repair run for an existing ClawSweeper PR or a
PR explicitly opted into `clawsweeper:automerge` without allowing arbitrary
comment authors to trigger work.
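
The maintainer-only gate described in this section could look roughly like the sketch below. The default list matches `DEFAULT_ALLOWED_ASSOCIATIONS` as it appears elsewhere in this commit; `isAllowedAuthor` and `CommentLike` are hypothetical names, and the case normalization is an assumption.

```typescript
// Default list as shown in this commit's diff; the helper is a
// hypothetical illustration, not the router's implementation.
const DEFAULT_ALLOWED_ASSOCIATIONS = ["OWNER", "MEMBER", "COLLABORATOR"];

interface CommentLike {
  author_association: string; // field name used by the GitHub REST API
}

// Accept a comment only when its author association is on the allow list.
function isAllowedAuthor(
  comment: CommentLike,
  allowed: readonly string[] = DEFAULT_ALLOWED_ASSOCIATIONS,
): boolean {
  return allowed.includes(comment.author_association.toUpperCase());
}
```

Under this sketch a `CONTRIBUTOR` comment is rejected, which matches the documented behavior that contributor comments are ignored without a reply.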
@ -372,7 +372,7 @@ Behavior:
Repair commands apply to existing ClawSweeper PRs and PRs opted into
`clawsweeper:automerge`. The router finds ClawSweeper PRs by the
`clawsweeper/*` branch, resolves or creates the backing job, posts one
-idempotent response marker, and dispatches `cluster-worker.yml`.
+idempotent response marker, and dispatches `repair-cluster-worker.yml`.
Trusted ClawSweeper comments become `clawsweeper_auto_repair`. Preferred
comments use hidden `clawsweeper-verdict:*` markers and include


@ -5,7 +5,7 @@ commands, finalizers, self-heal, gates, and ledgers, see
[`docs/INTERNAL_FEATURES.md`](INTERNAL_FEATURES.md).
For the trusted ClawSweeper-to-ClawSweeper PR repair loop, see
-[`docs/auto-update-prs.md`](auto-update-prs.md).
+[`docs/repair/auto-update-prs.md`](auto-update-prs.md).
For commit-review findings, ClawSweeper dispatches
`clawsweeper_commit_finding` to this repository. ClawSweeper fetches the latest
@ -194,7 +194,7 @@ Repair commands apply to existing ClawSweeper PRs and to PRs opted into
`clawsweeper/*` branch prefix. Opted-in non-ClawSweeper PRs get an adopted job
at `jobs/<owner>/inbox/automerge-<owner>-<repo>-<pr>.md`.
The router posts one idempotent reply with a hidden marker and dispatches the
-normal `cluster-worker.yml` repair path. It records processed comment versions
+normal `repair-cluster-worker.yml` repair path. It records processed comment versions
in `results/comment-router.json`. For durable ClawSweeper comments,
idempotency is per comment id plus GitHub `updated_at`, and response markers
include the target PR head SHA. That lets edited ClawSweeper comments wake


@ -2,7 +2,12 @@ import type { JsonValue, LooseRecord } from "./json-types.js";
import { DEFAULT_ALLOWED_REPOSITORY_PERMISSIONS } from "./comment-router-core.js";
import { currentProjectRepo, readMaxLiveWorkers } from "./lib.js";
import { assertRepo, commaSet, positiveInteger } from "./comment-router-utils.js";
-import { DEFAULT_HEAD_PREFIX, DEFAULT_TARGET_REPO } from "./constants.js";
+import {
+  DEFAULT_HEAD_PREFIX,
+  DEFAULT_TARGET_REPO,
+  REPAIR_CLUSTER_WORKFLOW,
+  SWEEP_WORKFLOW,
+} from "./constants.js";
export { DEFAULT_HEAD_PREFIX, DEFAULT_TARGET_REPO } from "./constants.js";
const DEFAULT_ALLOWED_ASSOCIATIONS = ["OWNER", "MEMBER", "COLLABORATOR"];
@ -44,7 +49,7 @@ export function readCommentRouterConfig(args: LooseRecord): CommentRouterConfig
);
const workflow = stringSetting(
args.workflow ?? process.env.CLAWSWEEPER_COMMENT_WORKFLOW,
-"cluster-worker.yml",
+REPAIR_CLUSTER_WORKFLOW,
);
const reviewRepo = stringSetting(
args["review-repo"] ?? process.env.CLAWSWEEPER_REVIEW_REPO,
@ -52,7 +57,7 @@ export function readCommentRouterConfig(args: LooseRecord): CommentRouterConfig
);
const reviewWorkflow = stringSetting(
args["review-workflow"] ?? process.env.CLAWSWEEPER_REVIEW_WORKFLOW,
-"sweep.yml",
+SWEEP_WORKFLOW,
);
const runner = stringSetting(
args.runner ?? process.env.CLAWSWEEPER_WORKER_RUNNER,


@ -2,6 +2,9 @@ export const DEFAULT_TARGET_REPO = "openclaw/openclaw";
export const DEFAULT_HEAD_PREFIX = "clawsweeper/";
export const DEFAULT_LABEL = "clawsweeper";
+export const REPAIR_CLUSTER_WORKFLOW = "repair-cluster-worker.yml";
+export const SWEEP_WORKFLOW = "sweep.yml";
export const CLAWSWEEPER_LABEL = "clawsweeper";
export const CLAWSWEEPER_LABEL_COLOR = "F97316";
export const CLAWSWEEPER_LABEL_DESCRIPTION = "Tracked by ClawSweeper automation";
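
The call sites in this commit resolve the workflow name from a CLI argument, then an environment variable, then the shared constant. A minimal sketch of that precedence follows. `resolveWorkflow` is a hypothetical stand-in (the real `stringSetting` helper is not shown in this diff), the environment is passed in as a plain object to keep the example self-contained, and treating blank strings as unset is an assumption.

```typescript
// Value copied from constants.ts in this commit.
const REPAIR_CLUSTER_WORKFLOW = "repair-cluster-worker.yml";

type Env = Record<string, string | undefined>;

// Hypothetical helper: CLI argument wins, then the environment variable,
// then the shared constant. Blank strings count as unset (an assumption).
function resolveWorkflow(args: { workflow?: string }, env: Env): string {
  const candidate = args.workflow ?? env["CLAWSWEEPER_COMMENT_WORKFLOW"];
  const trimmed = candidate?.trim();
  return trimmed ? trimmed : REPAIR_CLUSTER_WORKFLOW;
}
```

Keeping the fallback in one exported constant means a future workflow rename touches `constants.ts` rather than every dispatcher.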


@ -15,6 +15,7 @@ import {
waitForLiveWorkerCapacity,
} from "./lib.js";
import { sleepMs } from "./timing.js";
+import { REPAIR_CLUSTER_WORKFLOW } from "./constants.js";
const args = parseArgs(process.argv.slice(2));
const defaultRunner = process.env.CLAWSWEEPER_WORKER_RUNNER ?? "blacksmith-4vcpu-ubuntu-2404";
@ -23,7 +24,7 @@ const defaultExecutionRunner =
const mode = args.mode ?? "plan";
const runner = args.runner ?? defaultRunner;
const executionRunner = args["execution-runner"] ?? args.execution_runner ?? defaultExecutionRunner;
-const workflow = args.workflow ?? "cluster-worker.yml";
+const workflow = args.workflow ?? REPAIR_CLUSTER_WORKFLOW;
const repo = String(args.repo ?? currentProjectRepo());
const model = String(args.model ?? process.env.CLAWSWEEPER_MODEL ?? "gpt-5.5");
const maxLiveWorkers = readMaxLiveWorkers(args);


@ -14,7 +14,7 @@ import {
} from "./lib.js";
import { ghJson, ghText } from "./github-cli.js";
import { sleepMs } from "./timing.js";
-import { DEFAULT_TARGET_REPO, REVIEW_BOTS } from "./constants.js";
+import { DEFAULT_TARGET_REPO, REPAIR_CLUSTER_WORKFLOW, REVIEW_BOTS } from "./constants.js";
import { numberEnv } from "./env-utils.js";
import { compactText, escapeRegExp } from "./text-utils.js";
@ -35,7 +35,7 @@ const writeReport = Boolean(args["write-report"]);
const execute = Boolean(args.execute);
const dispatchRepairs = Boolean(args["dispatch-repairs"] || args.dispatch || execute);
const workflow = String(
-args.workflow ?? process.env.CLAWSWEEPER_FINALIZER_WORKFLOW ?? "cluster-worker.yml",
+args.workflow ?? process.env.CLAWSWEEPER_FINALIZER_WORKFLOW ?? REPAIR_CLUSTER_WORKFLOW,
);
const runner = String(
args.runner ?? process.env.CLAWSWEEPER_WORKER_RUNNER ?? "blacksmith-4vcpu-ubuntu-2404",


@ -1,5 +1,6 @@
import { ghJson } from "./github-cli.js";
import type { JsonValue, LooseRecord } from "./json-types.js";
+import { REPAIR_CLUSTER_WORKFLOW } from "./constants.js";
import { currentProjectRepo } from "./project-repo.js";
import { sleepMs } from "./timing.js";
@ -20,7 +21,7 @@ export function readMaxLiveWorkers(args: LooseRecord = {}) {
export function liveWorkerCapacity({
repo = currentProjectRepo(),
-workflow = "cluster-worker.yml",
+workflow = REPAIR_CLUSTER_WORKFLOW,
requested = 1,
maxLiveWorkers = DEFAULT_MAX_LIVE_WORKERS,
}: LooseRecord = {}) {
@ -61,7 +62,7 @@ export function waitForLiveWorkerCapacity(options: LooseRecord = {}) {
);
if (requestedCount > max) {
throw new Error(
-`refusing dispatch: requested ${requestedCount} ${options.workflow ?? "cluster-worker.yml"} workers exceeds max-live-workers=${max}`,
+`refusing dispatch: requested ${requestedCount} ${options.workflow ?? REPAIR_CLUSTER_WORKFLOW} workers exceeds max-live-workers=${max}`,
);
}
const pollMs = readPositiveInteger(
@ -91,13 +92,13 @@ export function waitForLiveWorkerCapacity(options: LooseRecord = {}) {
}
throw new Error(
-`timed out waiting for ${options.workflow ?? "cluster-worker.yml"} capacity: ${latest?.active ?? "unknown"} active + ${requestedCount} requested exceeds max-live-workers=${max}`,
+`timed out waiting for ${options.workflow ?? REPAIR_CLUSTER_WORKFLOW} capacity: ${latest?.active ?? "unknown"} active + ${requestedCount} requested exceeds max-live-workers=${max}`,
);
}
export function listActiveWorkflowRuns({
repo = currentProjectRepo(),
-workflow = "cluster-worker.yml",
+workflow = REPAIR_CLUSTER_WORKFLOW,
}: LooseRecord = {}) {
const runs: LooseRecord[] = [];
for (const status of ACTIVE_WORKFLOW_STATUSES) {
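
The two error messages in this file imply a capacity rule: refuse outright when the request alone exceeds the cap, and otherwise wait until active plus requested fits. A minimal sketch of that arithmetic is below. `checkCapacity` and `CapacityResult` are hypothetical names; the real `liveWorkerCapacity` reads active runs from GitHub, whereas this sketch takes the active count as a plain number.

```typescript
// Hypothetical sketch of the capacity arithmetic implied by the error
// messages in this diff; not the actual lib.ts implementation.
interface CapacityResult {
  ok: boolean;
  active: number;
  requested: number;
  max: number;
}

function checkCapacity(active: number, requested: number, max: number): CapacityResult {
  // A request larger than the cap can never be satisfied, so fail fast.
  if (requested > max) {
    throw new Error(
      `refusing dispatch: requested ${requested} workers exceeds max-live-workers=${max}`,
    );
  }
  // Otherwise there is room only when active + requested stays within the cap.
  return { ok: active + requested <= max, active, requested, max };
}
```

A caller would poll until `ok` becomes true or a timeout elapses, which corresponds to the "timed out waiting for ... capacity" branch above.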


@ -16,9 +16,10 @@ import {
} from "./lib.js";
import { ghJson, ghText } from "./github-cli.js";
import { sleepMs } from "./timing.js";
+import { REPAIR_CLUSTER_WORKFLOW } from "./constants.js";
const DEFAULT_REPO = currentProjectRepo();
-const DEFAULT_WORKFLOW = "cluster-worker.yml";
+const DEFAULT_WORKFLOW = REPAIR_CLUSTER_WORKFLOW;
const DEFAULT_RUNNER = process.env.CLAWSWEEPER_WORKER_RUNNER ?? "blacksmith-4vcpu-ubuntu-2404";
const DEFAULT_EXECUTION_RUNNER =
process.env.CLAWSWEEPER_EXECUTION_RUNNER ?? "blacksmith-16vcpu-ubuntu-2404";


@ -15,9 +15,10 @@ import {
} from "./lib.js";
import { ghJson, ghText } from "./github-cli.js";
import { sleepMs } from "./timing.js";
+import { REPAIR_CLUSTER_WORKFLOW } from "./constants.js";
const DEFAULT_REPO = currentProjectRepo();
-const DEFAULT_WORKFLOW = "cluster-worker.yml";
+const DEFAULT_WORKFLOW = REPAIR_CLUSTER_WORKFLOW;
const DEFAULT_RUNNER = process.env.CLAWSWEEPER_WORKER_RUNNER ?? "blacksmith-4vcpu-ubuntu-2404";
const DEFAULT_EXECUTION_RUNNER =
process.env.CLAWSWEEPER_EXECUTION_RUNNER ?? "blacksmith-16vcpu-ubuntu-2404";


@ -4,6 +4,7 @@ import fs from "node:fs";
import path from "node:path";
import { hasSecuritySignalText, parseArgs, parseJob, repoRoot, validateJob } from "./lib.js";
import { ghJson } from "./github-cli.js";
+import { REPAIR_CLUSTER_WORKFLOW } from "./constants.js";
import { readJsonFileIfExists as readJson } from "./json-file.js";
const args = parseArgs(process.argv.slice(2));
@ -291,7 +292,7 @@ function readActiveClusterRuns() {
"--repo",
repo,
"--workflow",
-"cluster-worker.yml",
+REPAIR_CLUSTER_WORKFLOW,
"--status",
status,
"--limit",


@ -289,7 +289,7 @@ test("renderResponse reports trusted repair dispatches without losing guardrails
target: { head_sha: "def456" },
},
{
-workflow: "cluster-worker.yml",
+workflow: "repair-cluster-worker.yml",
job_path: "jobs/openclaw/inbox/example.md",
mode: "autonomous",
model: "gpt-5.5",
@ -298,7 +298,7 @@ test("renderResponse reports trusted repair dispatches without losing guardrails
assert.match(body, /Thanks, ClawSweeper/);
assert.match(body, /clawsweeper-command:456:2026-04-29T07:12:31Z:clawsweeper_auto_repair:def456/);
-assert.match(body, /cluster-worker\.yml/);
+assert.match(body, /repair-cluster-worker\.yml/);
assert.match(body, /safe credited replacement/);
assert.match(body, /narrow fix/);
assert.doesNotMatch(body, /ClawSweeper Repair/i);
@ -359,7 +359,7 @@ test("renderResponse reports automerge repair dispatches", () => {
target: { head_sha: "def457" },
},
{
-workflow: "cluster-worker.yml",
+workflow: "repair-cluster-worker.yml",
job_path: "jobs/openclaw/inbox/automerge-openclaw-openclaw-74156.md",
mode: "autonomous",
model: "gpt-5.5",
@ -367,7 +367,7 @@ test("renderResponse reports automerge repair dispatches", () => {
);
assert.match(body, /picked up the repair feedback/);
-assert.match(body, /cluster-worker\.yml/);
+assert.match(body, /repair-cluster-worker\.yml/);
assert.match(body, /automerge-openclaw-openclaw-74156/);
assert.doesNotMatch(body, /did not dispatch/);
});
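
The marker asserted in the first test above appears to join a comment id, the comment's `updated_at`, an intent, and the PR head SHA with colons. A minimal sketch of building that string follows. The field meanings are inferred from the documentation hunks in this commit, and `responseMarker` and `MarkerParts` are hypothetical names. How the router embeds the marker in the comment body is not shown here and is not assumed.

```typescript
// Hypothetical sketch: field meanings are inferred from the test regex and
// the docs in this commit, not from the router source.
interface MarkerParts {
  commentId: number;
  updatedAt: string; // GitHub `updated_at` timestamp of the source comment
  intent: string; // e.g. "clawsweeper_auto_repair"
  headSha: string; // target PR head SHA
}

function responseMarker(parts: MarkerParts): string {
  return [
    "clawsweeper-command",
    parts.commentId,
    parts.updatedAt,
    parts.intent,
    parts.headSha,
  ].join(":");
}
```

Because `updated_at` and the head SHA are both part of the string, an edited comment or a new PR head produces a different marker, which is consistent with the per-version idempotency the docs describe.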