fix(workflow): leave timeout room for reports

Vincent Koc 2026-04-27 21:14:15 -07:00
parent 8facaf4d62
commit d0e303d624
No known key found for this signature in database
3 changed files with 4 additions and 4 deletions

@@ -102,7 +102,7 @@ Current autonomy posture:
- Replacement fix PR execution must use the recoverable target branch `clownfish/<cluster-id>`. If that branch already exists, resume it instead of starting from scratch. After agent edits and review-fix edits, commit and push checkpoint commits to that branch before expensive validation/review gates so a timed-out run can be requeued without losing the patch. Do not open the PR until validation and Codex `/review` pass.
- Resumed replacement branches may be rebased and narrowly refactored onto current `origin/main`. If the rebase conflicts, let the executor run the Codex rebase-repair loop, resolve conflict markers, continue the rebase, then proceed through the normal validation/review gate. Tune `CLOWNFISH_REBASE_REPAIR_ATTEMPTS` instead of disabling the rebase gate.
- Useful but uneditable or unsafe source PRs are replacement candidates, not human blockers. When a canonical PR is draft, stale, unmergeable, has `maintainer_can_modify=false`, or has broad unrelated churn, emit or execute `replace_uneditable_branch` with full source PR credit instead of waiting for a maintainer decision.
-- Fix execution should provide Codex actual repo-discovery context before editing; repeated "no target repo changes" means tune `scripts/execute-fix-artifact.mjs` before replaying more jobs. GitHub Actions may block Codex bwrap write/review sandboxes, so write-mode and review execution default to `danger-full-access` there after tokens are stripped from the Codex environment. A Codex write preflight must fail fast before the expensive repair loop if sandbox/auth/write access is broken; do not wait through multi-attempt edits to discover startup failures. Keep canary execution bounded: default worker timeout is 30 minutes, fix Codex timeout is 30 minutes, preflight timeout is 2 minutes, Codex model is `gpt-5.5`, and Codex reasoning effort is `medium`. Worker timeout/failure and exhausted `/review` attempts must write blocked artifacts and keep the workflow reporting path alive. Fix executor runs must copy Codex debug logs into the run artifact so timeout failures are inspectable.
+- Fix execution should provide Codex actual repo-discovery context before editing; repeated "no target repo changes" means tune `scripts/execute-fix-artifact.mjs` before replaying more jobs. GitHub Actions may block Codex bwrap write/review sandboxes, so write-mode and review execution default to `danger-full-access` there after tokens are stripped from the Codex environment. A Codex write preflight must fail fast before the expensive repair loop if sandbox/auth/write access is broken; do not wait through multi-attempt edits to discover startup failures. Keep canary execution bounded: default worker timeout is 30 minutes, build-PR step timeout is 30 minutes, fix Codex edit budget is 20 minutes with reserve for artifact writing, preflight timeout is 2 minutes, Codex model is `gpt-5.5`, and Codex reasoning effort is `medium`. Worker timeout/failure and exhausted `/review` attempts must write blocked artifacts and keep the workflow reporting path alive. Fix executor runs must copy Codex debug logs into the run artifact so timeout failures are inspectable.
- Match OpenClaw's CI fast lane for fix validation. Use `blacksmith-4vcpu-ubuntu-2404` for cluster planning/review and `blacksmith-16vcpu-ubuntu-2404` for fix/apply execution. The executor sets `OPENCLAW_LOCAL_CHECK=0` and treats `pnpm check:changed` plus diff checks as the default hard gate. It normalizes target validation commands to `pnpm check:changed` unless `CLOWNFISH_TARGET_VALIDATION_MODE=strict` or `CLOWNFISH_STRICT_TARGET_VALIDATION=1` is explicitly set, so unrelated flaky main CI and broad suites do not block narrow ProjectClownfish fixes.
- After fix execution, run post-flight finalization before the final closeout replay. Post-flight may merge only ProjectClownfish-opened/pushed fix PRs, only after merge preflight, security clearance, resolved review threads, and non-ignored checks are clean. Default ignored checks are `auto-response`, `Labeler`, and `Stale`; configure `CLOWNFISH_POST_FLIGHT_IGNORE_CHECKS` rather than broadening the hard gate in code.
- Prefer `keep_related`, `keep_independent`, `keep_closed`, `fix_needed`, `route_security`, and subcluster notes over blanket `needs_human`.
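
The following hypothetical TypeScript sketch illustrates the fast-lane validation rule from the list above. It is not the code in `scripts/execute-fix-artifact.mjs`. Several details are assumptions: the function names, collapsing all requested commands into a single `pnpm check:changed`, and falling back to the fast lane when strict mode has nothing to run. Only the two strict-mode variables, their values, and `OPENCLAW_LOCAL_CHECK=0` come from the text.

```typescript
// Hypothetical sketch of the fast-lane validation rule; names are illustrative
// and this is not the actual scripts/execute-fix-artifact.mjs logic.
type Env = Record<string, string | undefined>;

const FAST_LANE_COMMAND = "pnpm check:changed";

// Strict validation is opt-in through either documented variable.
function isStrictValidation(env: Env): boolean {
  return (
    env.CLOWNFISH_TARGET_VALIDATION_MODE === "strict" ||
    env.CLOWNFISH_STRICT_TARGET_VALIDATION === "1"
  );
}

// Fast lane: any requested target commands collapse to `pnpm check:changed`.
// Strict mode keeps the requested commands as written. Assumption: when strict
// mode is set but nothing was requested, fall back to the fast lane.
function normalizeValidationCommands(requested: string[], env: Env): string[] {
  if (isStrictValidation(env) && requested.length > 0) {
    return requested.slice();
  }
  return [FAST_LANE_COMMAND];
}

// Environment passed to validation commands: copies the caller's env and sets
// OPENCLAW_LOCAL_CHECK=0, as the text says the executor does.
function validationEnv(env: Env): Env {
  return { ...env, OPENCLAW_LOCAL_CHECK: "0" };
}
```

Under these assumptions, `normalizeValidationCommands(["pnpm test", "pnpm build"], {})` returns `["pnpm check:changed"]`. The same call with `{ CLOWNFISH_STRICT_TARGET_VALIDATION: "1" }` returns both commands unchanged. Because the rule is a pure function of the environment, it can be unit-tested separately from the executor.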

@@ -177,8 +177,8 @@ jobs:
CLOWNFISH_CODEX_REASONING_EFFORT: ${{ vars.CLOWNFISH_CODEX_REASONING_EFFORT || 'medium' }}
CLOWNFISH_CODEX_REVIEW_ATTEMPTS: ${{ vars.CLOWNFISH_CODEX_REVIEW_ATTEMPTS || '2' }}
CLOWNFISH_REBASE_REPAIR_ATTEMPTS: ${{ vars.CLOWNFISH_REBASE_REPAIR_ATTEMPTS || '4' }}
-CLOWNFISH_FIX_CODEX_TIMEOUT_MS: ${{ vars.CLOWNFISH_FIX_CODEX_TIMEOUT_MS || '1800000' }}
-CLOWNFISH_FIX_STEP_TIMEOUT_MS: "1800000"
+CLOWNFISH_FIX_CODEX_TIMEOUT_MS: ${{ vars.CLOWNFISH_FIX_CODEX_TIMEOUT_MS || '1200000' }}
+CLOWNFISH_FIX_STEP_TIMEOUT_MS: "1500000"
CLOWNFISH_FIX_TIMEOUT_RESERVE_MS: ${{ vars.CLOWNFISH_FIX_TIMEOUT_RESERVE_MS || '300000' }}
CLOWNFISH_FIX_PREFLIGHT_TIMEOUT_MS: ${{ vars.CLOWNFISH_FIX_PREFLIGHT_TIMEOUT_MS || '120000' }}
CLOWNFISH_RESOLVE_REVIEW_THREADS: ${{ vars.CLOWNFISH_RESOLVE_REVIEW_THREADS || '1' }}
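
The new defaults fit together arithmetically. A 1200000 ms (20 minute) Codex budget plus the 300000 ms (5 minute) reserve equals the 1500000 ms (25 minute) fix step timeout. The TypeScript sketch below makes that relationship explicit by clamping the Codex budget to `step - reserve`. The clamp, the `FixTimeouts` interface, and the function names are illustrative assumptions, not the executor's actual logic.

```typescript
// Hypothetical sketch: clamp the Codex edit budget so a fixed reserve always
// remains inside the fix step for writing blocked/timeout artifacts.
// The clamping rule is an assumption; the numbers mirror the env block above.
interface FixTimeouts {
  codexMs: number; // CLOWNFISH_FIX_CODEX_TIMEOUT_MS
  stepMs: number; // CLOWNFISH_FIX_STEP_TIMEOUT_MS
  reserveMs: number; // CLOWNFISH_FIX_TIMEOUT_RESERVE_MS
}

const DEFAULT_FIX_TIMEOUTS: FixTimeouts = {
  codexMs: 1200000, // 20 minutes
  stepMs: 1500000, // 25 minutes
  reserveMs: 300000, // 5 minutes
};

// Codex may use at most (step - reserve), and never a negative budget.
function effectiveCodexBudgetMs(t: FixTimeouts): number {
  return Math.max(0, Math.min(t.codexMs, t.stepMs - t.reserveMs));
}

// Time still available inside the step once the Codex budget is spent.
function reportWindowMs(t: FixTimeouts): number {
  return t.stepMs - effectiveCodexBudgetMs(t);
}
```

With the defaults, `effectiveCodexBudgetMs` returns 1200000 and `reportWindowMs` returns 300000. Under this rule, setting the Codex variable back to its previous 1800000 ms value would still yield a 1200000 ms budget. A variable override therefore could not consume the time reserved for timeout artifacts.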

@@ -241,7 +241,7 @@ The workflow needs:
- optional `CLOWNFISH_CODEX_CLI_VERSION` variable to pin and refresh the cached Codex CLI
- optional `CLOWNFISH_MODEL` override for dispatch scripts; default Codex model is `gpt-5.5`
- optional `CLOWNFISH_MAX_LIVE_WORKERS` variable for dispatch/requeue/self-heal worker fan-out; default is `50`
-- optional `CLOWNFISH_CODEX_TIMEOUT_MS` and `CLOWNFISH_FIX_CODEX_TIMEOUT_MS` variables; both default to 30 minutes
+- optional `CLOWNFISH_CODEX_TIMEOUT_MS` and `CLOWNFISH_FIX_CODEX_TIMEOUT_MS` variables; worker planning defaults to 30 minutes, while fix execution defaults to a 20 minute Codex budget inside the 30 minute build-PR step so timeout artifacts can be written
- optional `CLOWNFISH_CODEX_REVIEW_ATTEMPTS` and `CLOWNFISH_RESOLVE_REVIEW_THREADS` variables for agentic merge-prep review loops
Keep exact secret names, token scopes, and execution-window procedures in private operations docs or repository settings notes. Do not put token values or live operational credentials in job files.
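
The workflow variables above follow the `${{ vars.NAME || 'default' }}` pattern. In GitHub Actions expressions, `||` returns its first truthy operand and an empty string is falsy. An unset or empty variable therefore falls back to its default. The following hypothetical TypeScript helper mirrors that behavior for the documented defaults. The helper names and the positive-integer validation are assumptions. `CLOWNFISH_CODEX_TIMEOUT_MS` is also assumed to be in milliseconds, like its fix counterpart.

```typescript
// Hypothetical helper mirroring the `${{ vars.NAME || 'default' }}` pattern:
// an unset or empty variable falls back to the documented default.
type Vars = Record<string, string | undefined>;

function varOr(vars: Vars, name: string, fallback: string): string {
  const value = vars[name];
  return value ? value : fallback;
}

// Parse a positive integer, failing loudly instead of propagating NaN.
function positiveInt(vars: Vars, name: string, fallback: string): number {
  const raw = varOr(vars, name, fallback);
  const n = Number(raw);
  if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

interface ClownfishDefaults {
  model: string;
  maxLiveWorkers: number;
  workerCodexTimeoutMs: number;
  fixCodexTimeoutMs: number;
}

function resolveDefaults(vars: Vars): ClownfishDefaults {
  return {
    model: varOr(vars, "CLOWNFISH_MODEL", "gpt-5.5"),
    maxLiveWorkers: positiveInt(vars, "CLOWNFISH_MAX_LIVE_WORKERS", "50"),
    // Worker planning: 30 minutes.
    workerCodexTimeoutMs: positiveInt(vars, "CLOWNFISH_CODEX_TIMEOUT_MS", "1800000"),
    // Fix execution: 20 minute Codex budget.
    fixCodexTimeoutMs: positiveInt(vars, "CLOWNFISH_FIX_CODEX_TIMEOUT_MS", "1200000"),
  };
}
```

Rejecting a malformed value such as `"soon"` up front surfaces a misconfigured variable immediately, instead of as a `NaN` timeout later in the run.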