1 commit

Author: Vincent Koc
SHA1: afc0a65273
Message: chore: add maintainer setup baseline
Date: 2026-05-22 22:22:16 +08:00
10 changed files with 2135 additions and 0 deletions

@@ -0,0 +1,153 @@
---
name: autoreview
description: "Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer."
---
# Auto Review
Run the bundled structured review helper as a closeout check. This is code review, not Guardian `auto_review` approval routing.
Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.
Use when:
- user asks for Codex review / Claude review / autoreview / second-model review
- after non-trivial code edits, before final/commit/ship
- reviewing a local branch or PR branch after fixes
## Contract
- Treat review output as advisory. Never blindly apply it.
- Verify every finding by reading the real code path and adjacent files.
- Read dependency docs/source/types when the finding depends on external behavior.
- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
- Keep going until structured review returns no accepted/actionable findings.
- If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
- Do not invoke built-in `codex review`, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
- Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
- If `gh`/Gitcrawl reports `database disk image is malformed`, run `gitcrawl doctor --json` once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
- If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
- Do not push just to review. Push only when the user requested push/ship/PR update.
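The capacity-retry rule in the contract can be sketched as a small wrapper. This is a minimal sketch and not part of the helper: `retry_same_command`, `run_once`, and `is_capacity_error` are hypothetical names, and how a capacity error is recognized is an assumption left to the caller.

```python
from __future__ import annotations

from typing import Callable


def retry_same_command(
    run_once: Callable[[], tuple[int, str]],
    is_capacity_error: Callable[[str], bool],
    attempts: int = 3,
) -> tuple[int, str]:
    """Re-run one fixed review command while the engine reports model capacity.

    run_once must always launch the identical command (same engine, same model),
    so a retry never switches or overrides the requested reviewer.
    """
    code, output = run_once()
    for _ in range(attempts - 1):
        if code == 0 or not is_capacity_error(output):
            break  # success, or a failure that retrying will not fix
        code, output = run_once()
    return code, output
```

Because the callable is fixed before the loop starts, the wrapper has no way to fall back to a different engine or model. It can only repeat the original request or give up.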
## Pick Target
Dirty local work:
```bash
<autoreview-helper> --mode local
```
Use this only when the patch is actually unstaged/staged/untracked in the
current checkout. For committed, pushed, or PR work, point the helper at the commit
or branch diff instead; do not force `--mode local` / `--uncommitted` just
because the helper docs mention dirty work first. A clean local review
only proves there is no local patch.
Branch/PR work:
```bash
<autoreview-helper> --mode branch --base origin/main
```
Optional review context is first-class:
```bash
<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json
```
If an open PR exists, use its actual base:
```bash
base=$(gh pr view --json baseRefName --jq .baseRefName)
<autoreview-helper> --mode branch --base "origin/$base"
```
Committed single change:
```bash
<autoreview-helper> --mode commit --commit HEAD
```
or with an absolute path to the `agent-scripts` helper:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
```
Use commit review for already-landed or already-pushed work on `main`. Reviewing
clean `main` against `origin/main` is usually an empty diff after push. For a
small stack, review each commit explicitly or review the branch before merging
with `--base`.
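The target choice described in this section can be sketched as a pure function. It follows the same order as the helper's `choose_target`, but with the git and `gh` lookups replaced by plain arguments; `pick_target` and its parameter names are hypothetical and exist only for illustration.

```python
from __future__ import annotations


def pick_target(
    mode: str,
    dirty: bool,
    branch: str,
    base: str | None = None,
    pr_base: str | None = None,
) -> tuple[str, str | None]:
    """Return (target, base_ref) for a review run.

    dirty   -- True when the checkout has staged, unstaged, or untracked changes
    base    -- an explicit --base value, if one was given
    pr_base -- the base of an open PR (for example "origin/release"), if known
    """
    if mode == "local" or (mode == "auto" and dirty):
        return "local", None
    if mode == "commit":
        return "commit", None
    if mode == "branch" or (mode == "auto" and branch != "main"):
        # Explicit base wins, then the PR base, then origin/main.
        return "branch", base or pr_base or "origin/main"
    raise ValueError("no review target: clean main checkout and no forced mode")
```

For example, `pick_target("auto", dirty=False, branch="main")` raises, which is the case where `--mode commit --commit HEAD` is the right call.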
## Parallel Closeout
Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:
```bash
scripts/autoreview --parallel-tests "<focused test command>"
```
Tradeoff: tests may force code changes that make the review stale. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.
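The parallel pattern can be sketched as follows. This is an illustration under stated assumptions, not the helper's implementation: `review_with_parallel_tests` is a hypothetical name, the test command is passed as an argv list rather than the shell string the helper accepts, and the rule for combining the two exit codes is assumed from the `--parallel-tests` help text ("failure fails the helper").

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable


def review_with_parallel_tests(test_argv: list[str], review: Callable[[], int]) -> int:
    """Start the tests, run the review while they execute, then combine exit codes."""
    tests = subprocess.Popen(test_argv)  # tests begin running immediately
    review_code = review()               # the review runs while tests are in flight
    test_code = tests.wait()             # block until the tests finish
    # Assumption: a failure on either side makes the combined run fail.
    return review_code or test_code


if __name__ == "__main__":
    passing_tests = [sys.executable, "-c", "raise SystemExit(0)"]
    print(review_with_parallel_tests(passing_tests, lambda: 0))
```

A nonzero result only says that something failed. Which side failed, and whether the resulting edits make the other side stale, still has to be decided as described above.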
## Context Efficiency
Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.
## Helper
OpenClaw repo-local helper:
```bash
.agents/skills/autoreview/scripts/autoreview --help
```
`agent-scripts` checkout helper:
```bash
skills/autoreview/scripts/autoreview --help
```
Global helper from `agent-scripts`:
```bash
~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help
```
If installed from `agent-scripts`, the path is:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help
```
The helper:
- chooses dirty local changes first
- otherwise uses the current PR base if `gh pr view` works
- otherwise uses `origin/main` for non-main branches
- supports `--engine codex`, `claude`, `droid`, `copilot`, `pi`, and `opencode`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
- requires an explicit `--model` with `--engine pi` because the helper isolates Pi's config directory during review
- takes `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- writes only to stdout unless `--output` or `--json-output` is set
- supports `--dry-run`, `--parallel-tests`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0 and the validated report has no findings and is not marked `patch is incorrect`
- exits nonzero when accepted/actionable findings are present
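The engine rules in the list above can be sketched as a small resolver. This is a minimal sketch for illustration: `resolve_engine` is a hypothetical function, and the real helper gets the same effect through `argparse` defaults and a check inside its Pi runner.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

ENGINES = ("codex", "claude", "droid", "copilot", "pi", "opencode")


def resolve_engine(
    cli_engine: str | None,
    model: str | None,
    environ: Mapping[str, str] | None = None,
) -> str:
    """Pick the review engine: --engine, then AUTOREVIEW_ENGINE, then codex."""
    env = os.environ if environ is None else environ
    engine = cli_engine or env.get("AUTOREVIEW_ENGINE", "codex")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if engine == "pi" and not model:
        # Pi's config directory is isolated during review, so no default model is available.
        raise ValueError("--engine pi requires --model")
    return engine
```

With an empty environment and no flag the result is `codex`, which matches the rule that Codex stays the default when nothing is set.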
## Final Report
Include:
- review command used
- tests/proof run
- findings accepted/rejected, briefly why
- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected
Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
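A final report can be assembled from the one completed helper run without another review. The sketch below assumes the validated JSON (for example from `--json-output`) has already been loaded into a dict; `closeout_summary` and its output wording are hypothetical, not something the helper prints.

```python
from __future__ import annotations

from typing import Any


def closeout_summary(command: str, tests: str, report: dict[str, Any]) -> str:
    """Build final-report lines from a single completed helper run."""
    findings = report.get("findings", [])
    clean = not findings and report.get("overall_correctness") == "patch is correct"
    lines = [f"review command: {command}", f"tests/proof: {tests}"]
    if clean:
        lines.append("review result: clean (no accepted/actionable findings)")
    else:
        lines.append(f"review result: {len(findings)} finding(s) to accept or reject")
        lines.extend(f"- [{f['priority']}] {f['title']}" for f in findings)
    return "\n".join(lines)
```

The summary reads only data the run already produced, so its wording can be adjusted without triggering another review cycle.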

@@ -0,0 +1,892 @@
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
import tempfile
import textwrap
import time
from pathlib import Path
from typing import Any

SCHEMA: dict[str, Any] = {
    "type": "object",
    "additionalProperties": False,
    "required": [
        "findings",
        "overall_correctness",
        "overall_explanation",
        "overall_confidence",
    ],
    "properties": {
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "required": [
                    "title",
                    "body",
                    "priority",
                    "confidence",
                    "category",
                    "code_location",
                ],
                "properties": {
                    "title": {"type": "string", "minLength": 1, "maxLength": 140},
                    "body": {"type": "string", "minLength": 1, "maxLength": 2000},
                    "priority": {"type": "string", "enum": ["P0", "P1", "P2", "P3"]},
                    "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                    "category": {
                        "type": "string",
                        "enum": ["bug", "security", "regression", "test_gap", "maintainability"],
                    },
                    "code_location": {
                        "type": "object",
                        "additionalProperties": False,
                        "required": ["file_path", "line"],
                        "properties": {
                            "file_path": {"type": "string", "minLength": 1},
                            "line": {"type": "integer", "minimum": 1},
                        },
                    },
                },
            },
        },
        "overall_correctness": {
            "type": "string",
            "enum": ["patch is correct", "patch is incorrect"],
        },
        "overall_explanation": {"type": "string", "minLength": 1, "maxLength": 3000},
        "overall_confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}


def run(
    args: list[str],
    cwd: Path,
    *,
    input_text: str | None = None,
    env: dict[str, str] | None = None,
    check: bool = True,
) -> subprocess.CompletedProcess[str]:
    result = subprocess.run(
        args,
        cwd=cwd,
        input=input_text,
        env=env,
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if check and result.returncode != 0:
        cmd = " ".join(args)
        raise SystemExit(f"command failed ({result.returncode}): {cmd}\n{result.stderr or result.stdout}")
    return result


def git(repo: Path, *args: str, check: bool = True) -> str:
    return run(["git", *args], repo, check=check).stdout


def repo_root() -> Path:
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        text=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        raise SystemExit("autoreview must run inside a git repository")
    return Path(result.stdout.strip()).resolve()


def current_branch(repo: Path) -> str:
    return git(repo, "branch", "--show-current", check=False).strip() or "detached"


def is_dirty(repo: Path) -> bool:
    return bool(git(repo, "status", "--porcelain").strip())


def choose_target(repo: Path, mode: str, base_ref: str | None) -> tuple[str, str | None]:
    branch = current_branch(repo)
    if mode == "local" or (mode == "auto" and is_dirty(repo)):
        return "local", None
    if mode == "commit":
        return "commit", None
    if mode == "branch" or (mode == "auto" and branch != "main"):
        return "branch", base_ref or detect_pr_base(repo) or "origin/main"
    raise SystemExit("no review target: clean main checkout and no forced mode")


def detect_pr_base(repo: Path) -> str | None:
    if not shutil_which("gh"):
        return None
    result = run(["gh", "pr", "view", "--json", "baseRefName", "--jq", ".baseRefName"], repo, check=False)
    base = result.stdout.strip()
    return f"origin/{base}" if result.returncode == 0 and base else None


def shutil_which(name: str) -> str | None:
    for part in os.environ.get("PATH", "").split(os.pathsep):
        candidate = Path(part) / name
        if candidate.exists() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None


def bounded(text: str, limit: int = 180_000) -> str:
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n\n[truncated at {limit} characters]\n"


def read_text(path: Path, limit: int = 40_000) -> str:
    try:
        data = path.read_bytes()
    except OSError as exc:
        return f"[unreadable: {exc}]"
    if b"\0" in data:
        return "[binary file omitted]"
    text = data.decode("utf-8", errors="replace")
    return bounded(text, limit)


def local_bundle(repo: Path) -> str:
    parts = [
        "# Git Status",
        git(repo, "status", "--short"),
        "# Staged Diff",
        git(repo, "diff", "--cached", "--stat"),
        bounded(git(repo, "diff", "--cached", "--patch", "--find-renames")),
        "# Unstaged Diff",
        git(repo, "diff", "--stat"),
        bounded(git(repo, "diff", "--patch", "--find-renames")),
    ]
    untracked = [line for line in git(repo, "ls-files", "--others", "--exclude-standard").splitlines() if line]
    if untracked:
        parts.append("# Untracked Files")
        for rel in untracked:
            path = repo / rel
            parts.append(f"## {rel}\n{read_text(path)}")
    return "\n\n".join(parts)


def branch_bundle(repo: Path, base_ref: str) -> str:
    git(repo, "fetch", "origin", "--quiet", check=False)
    return "\n\n".join(
        [
            "# Branch Diff",
            f"base: {base_ref}",
            git(repo, "diff", "--stat", f"{base_ref}...HEAD"),
            bounded(git(repo, "diff", "--patch", "--find-renames", f"{base_ref}...HEAD")),
        ]
    )


def commit_bundle(repo: Path, commit_ref: str) -> str:
    return "\n\n".join(
        [
            "# Commit Diff",
            f"commit: {commit_ref}",
            git(repo, "show", "--stat", "--format=fuller", commit_ref),
            bounded(git(repo, "show", "--patch", "--find-renames", "--format=fuller", commit_ref)),
        ]
    )


def review_paths(repo: Path, target: str, target_ref: str | None, commit_ref: str) -> set[str]:
    names: set[str] = set()
    if target == "local":
        sources = [
            git(repo, "diff", "--name-only", "--cached"),
            git(repo, "diff", "--name-only"),
            git(repo, "ls-files", "--others", "--exclude-standard"),
        ]
    elif target == "branch":
        assert target_ref
        sources = [git(repo, "diff", "--name-only", f"{target_ref}...HEAD")]
    else:
        sources = [git(repo, "show", "--name-only", "--format=", commit_ref)]
    for source in sources:
        for line in source.splitlines():
            path = line.strip()
            if path:
                names.add(path)
    return names


def load_extra_prompt(args: argparse.Namespace) -> str:
    chunks: list[str] = []
    for value in args.prompt or []:
        chunks.append(value)
    for path in args.prompt_file or []:
        chunks.append(Path(path).read_text())
    return "\n\n".join(chunks)


def load_datasets(args: argparse.Namespace) -> str:
    chunks: list[str] = []
    for spec in args.dataset or []:
        path = Path(spec)
        if path.is_dir():
            raise SystemExit(f"--dataset must be a file, got directory: {path}")
        chunks.append(f"# Dataset: {path}\n{read_text(path)}")
    return "\n\n".join(chunks)


def build_prompt(repo: Path, target: str, target_ref: str | None, bundle: str, extra_prompt: str, datasets: str) -> str:
    target_line = f"{target} {target_ref}" if target_ref else target
    return textwrap.dedent(
        f"""
        You are a senior code reviewer. Review the provided git change bundle only.
        Hard rules:
        - Return exactly one JSON object and nothing else. Do not wrap it in Markdown.
        - The JSON object must match this schema exactly:
        {json.dumps(SCHEMA, indent=2)}
        - Do not modify files.
        - Do not invoke nested reviewers or review tools.
        - Forbidden nested review commands include: codex review, autoreview, claude review, oracle review.
        - You may use read-only tools and web search to inspect files, dependency contracts, upstream docs, current behavior, and security implications.
        - Shell commands, if available, must be read-only inspection commands. Do not run tests, formatters, package installs, generators, network mutation commands, git mutation commands, or commands that write files.
        - Report only actionable defects introduced or exposed by this change.
        - Prefer high-signal findings over style feedback.
        - Include security findings: injection, secret leaks, authz/authn bypass, path traversal, unsafe deserialization, unsafe filesystem or shell use, privacy leaks, and credential handling.
        - Do not reject legitimate functionality merely because it touches shell, filesystem, network, auth, or sensitive data. Report a security finding only when the patch creates a concrete exploitable risk, removes an important safety check, or lacks validation at a trust boundary.
        - For each finding, use the smallest file/line location that demonstrates the issue.
        - If there are no actionable findings, return an empty findings array and mark the patch correct.
        Review target: {target_line}
        Repository: {repo}
        {extra_prompt}
        {datasets}
        # Change Bundle
        {bundle}
        """
    ).strip()


def write_json_temp(data: dict[str, Any]) -> Path:
    handle = tempfile.NamedTemporaryFile("w", suffix=".json", delete=False)
    with handle:
        json.dump(data, handle)
    return Path(handle.name)


def run_codex(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    if not args.tools:
        raise SystemExit("--no-tools is not supported by the Codex engine; use --engine claude --no-tools for a no-tools run")
    schema_path = write_json_temp(SCHEMA)
    output_path = Path(tempfile.NamedTemporaryFile("w", suffix=".json", delete=False).name)
    cmd = [args.codex_bin, "--ask-for-approval", "never"]
    if args.web_search:
        cmd.append("--search")
    if args.model:
        cmd.extend(["--model", args.model])
    cmd.extend(
        [
            "exec",
            "--ephemeral",
            "-C",
            str(repo),
            "-s",
            "read-only",
            "--output-schema",
            str(schema_path),
            "--output-last-message",
            str(output_path),
            "-",
        ]
    )
    result = run(cmd, repo, input_text=prompt, check=False)
    try:
        output = output_path.read_text()
    finally:
        schema_path.unlink(missing_ok=True)
        output_path.unlink(missing_ok=True)
    if result.returncode != 0:
        raise SystemExit(f"codex engine failed ({result.returncode})\n{result.stderr or result.stdout}")
    return output or result.stdout


def run_claude(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    cmd = [
        args.claude_bin,
        "--print",
        "--no-session-persistence",
        "--output-format",
        "json",
        "--json-schema",
        json.dumps(SCHEMA),
    ]
    if args.tools:
        cmd.extend(["--allowedTools", claude_allowed_tools(args)])
    else:
        cmd.extend(["--tools", ""])
    if args.model:
        cmd.extend(["--model", args.model])
    result = run(cmd, repo, input_text=prompt, check=False)
    if result.returncode != 0:
        raise SystemExit(f"claude engine failed ({result.returncode})\n{result.stderr or result.stdout}")
    return result.stdout


def run_droid(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    prompt_path = Path(tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False).name)
    prompt_path.write_text(prompt)
    cmd = [
        args.droid_bin,
        "exec",
        "--cwd",
        str(repo),
        "--output-format",
        "json",
        "-f",
        str(prompt_path),
    ]
    if args.model:
        cmd.extend(["--model", args.model])
    if not args.tools:
        cmd.extend(["--disabled-tools", "*"])
    result = run(cmd, repo, check=False)
    prompt_path.unlink(missing_ok=True)
    if result.returncode != 0:
        raise SystemExit(f"droid engine failed ({result.returncode})\n{result.stderr or result.stdout}")
    return result.stdout


def run_copilot(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    if not args.tools:
        raise SystemExit("--no-tools is not supported by the copilot engine; copilot requires a read-only file view tool to load the review bundle without exposing it in argv")
    with tempfile.TemporaryDirectory(prefix="autoreview-copilot.") as tempdir:
        prompt_path = Path(tempdir) / "prompt.txt"
        prompt_path.write_text(prompt)
        os.chmod(prompt_path, 0o600)
        cmd = [
            args.copilot_bin,
            "-C",
            tempdir,
            "-p",
            "Read ./prompt.txt and follow it exactly. Return only the requested JSON object.",
            "--output-format",
            "json",
            "--stream",
            "off",
            "--no-ask-user",
            "--disable-builtin-mcps",
        ]
        if args.model:
            cmd.extend(["--model", args.model])
        cmd.extend(
            [
                "--available-tools=read_agent,rg,view,web_fetch",
                "--allow-tool=read_agent",
                "--allow-tool=rg",
                "--allow-tool=view",
                "--allow-tool=web_fetch",
            ]
        )
        if args.web_search:
            cmd.append("--allow-all-urls")
        result = run(cmd, Path(tempdir), check=False)
        if result.returncode != 0:
            raise SystemExit(f"copilot engine failed ({result.returncode})\n{result.stderr or result.stdout}")
        return result.stdout


def run_pi(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    if not args.tools:
        raise SystemExit("--no-tools is not supported by the pi engine; use --tools read-only allowlist for review")
    if not args.model:
        raise SystemExit("--engine pi requires --model because autoreview isolates PI_CODING_AGENT_DIR from user settings")
    with tempfile.TemporaryDirectory(prefix="autoreview-pi.") as tempdir:
        temp = Path(tempdir)
        prompt_path = temp / "prompt.txt"
        prompt_path.write_text(prompt)
        os.chmod(prompt_path, 0o600)
        env = os.environ.copy()
        agent_dir = temp / "agent"
        agent_dir.mkdir()
        env["PI_CODING_AGENT_DIR"] = str(agent_dir)
        env["PI_CODING_AGENT_SESSION_DIR"] = str(temp / "sessions")
        env["PI_TELEMETRY"] = "0"
        cmd = [
            args.pi_bin,
            "--no-session",
            "--no-context-files",
            "--no-extensions",
            "--no-skills",
            "--no-prompt-templates",
            "--no-themes",
            "--tools",
            pi_readonly_tools(args),
            "--mode",
            "json",
        ]
        if args.model:
            cmd.extend(["--model", args.model])
        cmd.extend(["-p", f"@{prompt_path}", "Read the attached review prompt and follow it exactly."])
        result = run(cmd, repo, env=env, check=False)
        if result.returncode != 0:
            raise SystemExit(f"pi engine failed ({result.returncode})\n{result.stderr or result.stdout}")
        return result.stdout


def run_opencode(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    if not args.tools:
        raise SystemExit("--no-tools is not supported by the opencode engine; opencode requires read-only tools to load the review bundle")
    with tempfile.TemporaryDirectory(prefix="autoreview-opencode.") as tempdir:
        temp = Path(tempdir)
        config_dir = temp / "config"
        config_dir.mkdir()
        prompt_path = temp / "prompt.txt"
        prompt_path.write_text(prompt)
        os.chmod(prompt_path, 0o600)
        env = os.environ.copy()
        env.update(
            {
                "OPENCODE_CONFIG_DIR": str(config_dir),
                "OPENCODE_CONFIG_CONTENT": json.dumps(opencode_review_config(args)),
                "OPENCODE_DISABLE_PROJECT_CONFIG": "1",
                "OPENCODE_PURE": "1",
                "OPENCODE_DISABLE_AUTOUPDATE": "1",
                "OPENCODE_DISABLE_AUTOCOMPACT": "1",
                "OPENCODE_DISABLE_MODELS_FETCH": "1",
            }
        )
        cmd = [
            args.opencode_bin,
            "run",
            "--pure",
            "--format",
            "json",
            "--agent",
            "autoreview",
            "--dir",
            str(repo),
            "-f",
            str(prompt_path),
        ]
        if args.model:
            cmd.extend(["--model", args.model])
        cmd.append("Read the attached review prompt and follow it exactly. Return only the requested JSON object.")
        result = run(cmd, repo, env=env, check=False)
        if result.returncode != 0:
            raise SystemExit(f"opencode engine failed ({result.returncode})\n{result.stderr or result.stdout}")
        return result.stdout


def pi_readonly_tools(args: argparse.Namespace) -> str:
    return "read,grep,find,ls"


def opencode_review_config(args: argparse.Namespace) -> dict[str, Any]:
    permission = {
        "*": "deny",
        "read": "allow",
        "grep": "allow",
        "glob": "allow",
        "list": "allow",
        "edit": "deny",
        "bash": "deny",
        "task": "deny",
        "todowrite": "deny",
        "question": "deny",
        "repo_clone": "deny",
        "repo_overview": "deny",
        "skill": "deny",
    }
    if args.web_search:
        permission.update(
            {
                "webfetch": "allow",
                "websearch": "allow",
            }
        )
    return {
        "agent": {
            "autoreview": {
                "description": "Read-only structured code review agent",
                "mode": "primary",
                "steps": 8,
                "permission": permission,
            }
        }
    }


def claude_allowed_tools(args: argparse.Namespace) -> str:
    tools = [tool.strip() for tool in args.claude_allowed_tools.split(",") if tool.strip()]
    if not args.web_search:
        tools = [tool for tool in tools if tool not in {"WebSearch", "WebFetch"}]
    return ",".join(tools)


def extract_json(text: str) -> dict[str, Any]:
    stripped = text.strip()
    if not stripped:
        raise SystemExit("review engine returned empty output")
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError as exc:
        fenced_report = parse_json_candidate(stripped)
        if isinstance(fenced_report, dict) and "findings" in fenced_report:
            return fenced_report
        jsonl_report = extract_json_from_jsonl(stripped)
        if jsonl_report:
            return jsonl_report
        raise SystemExit(f"review engine returned non-JSON output: {exc}\n{stripped[:2000]}")
    if isinstance(parsed, dict) and "findings" in parsed:
        return parsed
    if isinstance(parsed, dict) and isinstance(parsed.get("structured_output"), dict):
        return parsed["structured_output"]
    if isinstance(parsed, dict) and isinstance(parsed.get("result"), str):
        result_json = parse_json_candidate(parsed["result"])
        if isinstance(result_json, dict) and "findings" in result_json:
            return result_json
        raise SystemExit(f"review engine result was not structured JSON:\n{parsed['result'][:2000]}")
    jsonl_report = extract_json_from_jsonl(stripped)
    if jsonl_report:
        return jsonl_report
    raise SystemExit(f"review engine returned unexpected JSON shape:\n{json.dumps(parsed)[:2000]}")


def extract_json_from_jsonl(text: str) -> dict[str, Any] | None:
    candidates: list[str] = []
    assistant_stream: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(event, dict):
            continue
        if isinstance(event.get("text"), str):
            candidates.append(event["text"])
            assistant_stream.append(event["text"])
        if isinstance(event.get("delta"), str):
            assistant_stream.append(event["delta"])
        part = event.get("part")
        if isinstance(part, dict) and isinstance(part.get("text"), str):
            candidates.append(part["text"])
            assistant_stream.append(part["text"])
        assistant_event = event.get("assistantMessageEvent")
        if isinstance(assistant_event, dict):
            if isinstance(assistant_event.get("content"), str):
                candidates.append(assistant_event["content"])
            if isinstance(assistant_event.get("delta"), str):
                assistant_stream.append(assistant_event["delta"])
            partial = assistant_event.get("partial")
            if isinstance(partial, dict):
                candidates.extend(extract_text_blocks(partial.get("content")))
        data = event.get("data")
        if isinstance(data, dict) and isinstance(data.get("content"), str):
            candidates.append(data["content"])
        if isinstance(event.get("result"), str):
            candidates.append(event["result"])
        message = event.get("message")
        if isinstance(message, dict):
            texts = extract_text_blocks(message.get("content"))
            candidates.extend(texts)
            if message.get("role") == "assistant":
                assistant_stream.extend(texts)
        messages = event.get("messages")
        if isinstance(messages, list):
            for item in messages:
                if not isinstance(item, dict):
                    continue
                texts = extract_text_blocks(item.get("content"))
                candidates.extend(texts)
                if item.get("role") == "assistant":
                    assistant_stream.extend(texts)
    if assistant_stream:
        candidates.append("".join(assistant_stream))
    for candidate in reversed(candidates):
        parsed = parse_json_candidate(candidate)
        if isinstance(parsed, dict) and "findings" in parsed:
            return parsed
    return None


def extract_text_blocks(value: Any) -> list[str]:
    if isinstance(value, str):
        return [value]
    if not isinstance(value, list):
        return []
    result: list[str] = []
    for item in value:
        if isinstance(item, dict) and isinstance(item.get("text"), str):
            result.append(item["text"])
    return result


def parse_json_candidate(text: str) -> Any | None:
    stripped = text.strip()
    if stripped.startswith("```"):
        lines = stripped.splitlines()
        if lines and lines[0].startswith("```") and lines[-1].strip() == "```":
            stripped = "\n".join(lines[1:-1]).strip()
    try:
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        repaired = repair_invalid_json_escapes(stripped)
        if repaired == stripped:
            return None
        try:
            parsed = json.loads(repaired)
        except json.JSONDecodeError:
            return None
    if isinstance(parsed, str) and parsed != text:
        nested = parse_json_candidate(parsed)
        return nested if nested is not None else parsed
    return parsed
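

def repair_invalid_json_escapes(text: str) -> str:
    # Consume escaped backslashes ("\\") as a unit and keep them, so the second
    # backslash of a valid pair is never mistaken for the start of a bad escape.
    # Any other backslash that does not begin a valid JSON escape is dropped.
    return re.sub(
        r'\\\\|\\(?!["\\/bfnrtu])',
        lambda match: match.group(0) if match.group(0) == "\\\\" else "",
        text,
    )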


def validate_report(
    report: dict[str, Any],
    repo: Path,
    changed_paths: set[str],
    required: list[str],
    required_any: list[str],
) -> None:
    allowed_top = {"findings", "overall_correctness", "overall_explanation", "overall_confidence"}
    extra_top = set(report) - allowed_top
    if extra_top:
        raise SystemExit(f"review JSON has unexpected top-level keys: {sorted(extra_top)}")
    for key in SCHEMA["required"]:
        if key not in report:
            raise SystemExit(f"review JSON missing required key: {key}")
    if not isinstance(report["findings"], list):
        raise SystemExit("review JSON findings must be an array")
    if report.get("overall_correctness") not in {"patch is correct", "patch is incorrect"}:
        raise SystemExit(f"review JSON has invalid overall_correctness: {report.get('overall_correctness')}")
    if not isinstance(report.get("overall_explanation"), str) or not report["overall_explanation"]:
        raise SystemExit("review JSON overall_explanation must be a non-empty string")
    if len(report["overall_explanation"]) > 3000:
        raise SystemExit("review JSON overall_explanation is too long")
    if not number_in_range(report.get("overall_confidence")):
        raise SystemExit("review JSON overall_confidence must be numeric")
    finding_text = ""
    for index, finding in enumerate(report["findings"]):
        if not isinstance(finding, dict):
            raise SystemExit(f"finding {index} must be an object")
        allowed_finding = {"title", "body", "priority", "confidence", "category", "code_location"}
        extra_finding = set(finding) - allowed_finding
        if extra_finding:
            raise SystemExit(f"finding {index} has unexpected keys: {sorted(extra_finding)}")
        for key in allowed_finding:
            if key not in finding:
                raise SystemExit(f"finding {index} missing required key: {key}")
        title = finding.get("title")
        if not isinstance(title, str) or not title or len(title) > 140:
            raise SystemExit(f"finding {index} has invalid title")
        body = finding.get("body")
        if not isinstance(body, str) or not body or len(body) > 2000:
            raise SystemExit(f"finding {index} has invalid body")
        priority = finding.get("priority")
        if priority not in {"P0", "P1", "P2", "P3"}:
            raise SystemExit(f"finding {index} has invalid priority: {priority}")
        if not number_in_range(finding.get("confidence")):
            raise SystemExit(f"finding {index} has invalid confidence")
        category = finding.get("category")
        if category not in {"bug", "security", "regression", "test_gap", "maintainability"}:
            raise SystemExit(f"finding {index} has invalid category: {category}")
        location = finding.get("code_location")
        if not isinstance(location, dict):
            raise SystemExit(f"finding {index} missing code_location")
        rel = str(location.get("file_path", "")).strip()
        line = location.get("line")
        if not rel or not isinstance(line, int) or line < 1:
            raise SystemExit(f"finding {index} has invalid location: {location}")
        if Path(rel).is_absolute() or ".." in Path(rel).parts:
            raise SystemExit(f"finding {index} uses invalid file path: {rel}")
        if rel not in changed_paths:
            raise SystemExit(f"finding {index} points to a file outside the reviewed change: {rel}")
        finding_text += "\n" + json.dumps(finding, sort_keys=True)
    haystack = finding_text.lower()
    for needle in required:
        if needle.lower() not in haystack:
            raise SystemExit(f"required finding text not found: {needle}")
    for group in required_any:
        needles = [needle.strip().lower() for needle in group.split(",") if needle.strip()]
        if needles and not any(needle in haystack for needle in needles):
            raise SystemExit(f"required finding text not found; need one of: {', '.join(needles)}")


def number_in_range(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool) and 0 <= value <= 1


def print_report(report: dict[str, Any]) -> None:
    findings = report["findings"]
    if findings:
        print(f"autoreview findings: {len(findings)}")
    elif report["overall_correctness"] == "patch is incorrect":
        print("autoreview verdict: patch is incorrect without discrete findings")
    else:
        print("autoreview clean: no accepted/actionable findings reported")
    for finding in findings:
        loc = finding["code_location"]
        print(f"[{finding['priority']}] {finding['title']}")
        print(f"{loc['file_path']}:{loc['line']}")
        print(f"{finding['body']}")
        print()
    print(f"overall: {report['overall_correctness']} ({report['overall_confidence']})")
    print(report["overall_explanation"])


def start_parallel_tests(command: str, repo: Path) -> tuple[subprocess.Popen, float]:
    print(f"tests: {command}")
    return subprocess.Popen(command, cwd=repo, shell=True), time.time()


def finish_parallel_tests(proc: subprocess.Popen, started: float) -> int:
    proc.wait()
    print(f"tests exit: {proc.returncode} after {int(time.time() - started)}s")
    return int(proc.returncode or 0)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Bundle-driven AI code review.")
    parser.add_argument("--mode", choices=["auto", "local", "branch", "commit"], default="auto")
    parser.add_argument("--base")
    parser.add_argument("--commit", default="HEAD")
    parser.add_argument("--engine", choices=["codex", "claude", "droid", "copilot", "pi", "opencode"], default=os.environ.get("AUTOREVIEW_ENGINE", "codex"))
    parser.add_argument("--model")
    parser.add_argument("--codex-bin", default=os.environ.get("CODEX_BIN", "codex"))
    parser.add_argument("--claude-bin", default=os.environ.get("CLAUDE_BIN", "claude"))
    parser.add_argument("--droid-bin", default=os.environ.get("DROID_BIN", "droid"))
    parser.add_argument("--copilot-bin", default=os.environ.get("COPILOT_BIN", "copilot"))
    parser.add_argument("--pi-bin", default=os.environ.get("PI_BIN", "pi"))
    parser.add_argument("--opencode-bin", default=os.environ.get("OPENCODE_BIN", "opencode"))
    parser.add_argument("--no-tools", dest="tools", action="store_false", default=True, help="Disable tools for engines that support it. Codex, copilot, pi, and opencode reject no-tools review.")
    parser.add_argument("--no-web-search", dest="web_search", action="store_false", default=True)
    parser.add_argument(
        "--claude-allowed-tools",
        default=os.environ.get(
            "AUTOREVIEW_CLAUDE_TOOLS",
            "Read,Grep,Glob,WebSearch,WebFetch",
        ),
    )
    parser.add_argument("--prompt", action="append", help="Additional review instruction text.")
    parser.add_argument("--prompt-file", action="append", help="Additional review instruction file.")
    parser.add_argument("--dataset", action="append", help="Extra evidence file to include in the review bundle.")
    parser.add_argument("--output", help="Write human output to a file as well as stdout.")
    parser.add_argument("--json-output", help="Write validated structured review JSON.")
    parser.add_argument("--parallel-tests", help="Run a test command concurrently with review; failure fails the helper.")
    parser.add_argument("--require-finding", action="append", default=[], help="Require finding text to contain this substring.")
    parser.add_argument("--require-any-finding", action="append", default=[], help="Require finding text to contain at least one comma-separated substring.")
    parser.add_argument("--expect-findings", action="store_true", help="Treat findings as success; for harness acceptance tests.")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()
    # argparse does not check defaults against `choices`, so reject an invalid
    # AUTOREVIEW_ENGINE value explicitly.
    if args.engine not in {"codex", "claude", "droid", "copilot", "pi", "opencode"}:
        raise SystemExit(f"invalid --engine/AUTOREVIEW_ENGINE: {args.engine}")
    return args


def run_engine(args: argparse.Namespace, repo: Path, prompt: str) -> str:
    if args.engine == "codex":
        return run_codex(args, repo, prompt)
    if args.engine == "claude":
        return run_claude(args, repo, prompt)
    if args.engine == "droid":
        return run_droid(args, repo, prompt)
    if args.engine == "copilot":
        return run_copilot(args, repo, prompt)
    if args.engine == "pi":
        return run_pi(args, repo, prompt)
    if args.engine == "opencode":
        return run_opencode(args, repo, prompt)
    raise SystemExit(f"unsupported engine: {args.engine}")


def main() -> int:
    args = parse_args()
    repo = repo_root()
    target, target_ref = choose_target(repo, args.mode, args.base)
    print(f"autoreview target: {target}")
    print(f"branch: {current_branch(repo)}")
    print(f"engine: {args.engine}")
    print(f"tools: {'on' if args.tools else 'off'}")
    print(f"web_search: {'on' if args.web_search else 'off'}")
    display_ref = args.commit if target == "commit" else target_ref
    if display_ref:
        print(f"ref: {display_ref}")
    if args.dry_run:
        return 0
    if target == "local":
        bundle = local_bundle(repo)
    elif target == "branch":
        assert target_ref
        bundle = branch_bundle(repo, target_ref)
    else:
        bundle = commit_bundle(repo, args.commit)
        target_ref = args.commit
    prompt = build_prompt(repo, target, target_ref, bundle, load_extra_prompt(args), load_datasets(args))
    changed_paths = review_paths(repo, target, target_ref, args.commit)
    print(f"bundle: {len(prompt)} chars")
    tests_proc: tuple[subprocess.Popen, float] | None = None
    if args.parallel_tests:
        tests_proc = start_parallel_tests(args.parallel_tests, repo)
    try:
        raw = run_engine(args, repo, prompt)
        report = extract_json(raw)
        validate_report(report, repo, changed_paths, args.require_finding, args.require_any_finding)
        if args.json_output:
            Path(args.json_output).write_text(json.dumps(report, indent=2) + "\n")
        if args.output:
            original_stdout = sys.stdout
            with Path(args.output).open("w") as handle:
                sys.stdout = Tee(original_stdout, handle)
                try:
                    print_report(report)
                finally:
                    # Restore stdout even if printing the report fails.
                    sys.stdout = original_stdout
        else:
            print_report(report)
    finally:
        # Always reap the parallel test process, even when review fails.
        tests_status = finish_parallel_tests(*tests_proc) if tests_proc else 0
    has_findings = bool(report["findings"])
    overall_incorrect = report["overall_correctness"] == "patch is incorrect"
    if tests_status != 0:
        return 1
    if args.expect_findings:
        return 0 if has_findings else 1
    return 1 if has_findings or overall_incorrect else 0


class Tee:
    def __init__(self, *streams: Any) -> None:
        self.streams = streams

    def write(self, data: str) -> None:
        for stream in self.streams:
            stream.write(data)

    def flush(self) -> None:
        for stream in self.streams:
            stream.flush()


if __name__ == "__main__":
    raise SystemExit(main())


@ -0,0 +1,711 @@
---
name: crabbox
description: Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id.
---
# Crabbox
Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests,
CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed
reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup.
Crabbox is the transport/orchestration surface. The actual backend can be:
- brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like
`cbx_...`, `syncDelegated=false`
- Blacksmith Testbox through Crabbox: delegated provider,
`provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true`
For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the
Crabbox wrapper is acceptable and often preferred when the standing Testbox
rules apply. Do not describe those runs as "AWS Crabbox"; report them as
Testbox-through-Crabbox with the `tbx_...` id and Actions run.
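
The id prefixes above are enough to word a report correctly. A minimal Python sketch, assuming only the `cbx_` / `tbx_` prefixes described in this skill; the function name `describe_lease` is hypothetical and not part of Crabbox:

```python
def describe_lease(lease_id: str) -> str:
    """Map a lease id to the provider wording this skill asks reports to use."""
    if lease_id.startswith("tbx_"):
        return f"Blacksmith Testbox through Crabbox (provider=blacksmith-testbox, id {lease_id})"
    if lease_id.startswith("cbx_"):
        # Direct-provider lease; brokered AWS is the default in .crabbox.yaml.
        return f"direct Crabbox lease (brokered AWS by default, id {lease_id})"
    # Slugs and unknown prefixes cannot be classified from the id alone.
    return f"unknown provider (id {lease_id}); check the run's JSON summary"
```

When the prefix is ambiguous (for example a slug), the `provider` field in the `--timing-json` summary is the safer source.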
Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs
direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`,
`--full-resync`, environment forwarding, capture/download support, or provider
comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw
maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or
the user asks for Testbox/Blacksmith.
## First Checks
- Run from the repo root. Crabbox sync mirrors the current checkout.
- Check the wrapper and providers before remote work:
```sh
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,120p'
../crabbox/bin/crabbox desktop launch --help
../crabbox/bin/crabbox webvnc --help
```
- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
shim can be stale.
- Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider`
means brokered AWS today.
- The brokered AWS default is a Linux developer image in `eu-west-1`; the repo
config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply.
If warmup drifts well past the minute-scale path, verify image promotion,
region/AZ placement, and FSR state before blaming OpenClaw.
- For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with
`--provider blacksmith-testbox` or the repo Testbox helpers when the standing
Testbox policy applies.
- Always report the actual provider and id. `cbx_...` means AWS Crabbox;
`tbx_...` means Blacksmith Testbox through Crabbox. If plain
`blacksmith testbox list` shows no boxes, run `blacksmith testbox list --all`
before concluding no box exists.
- If a warm direct-provider lease smells stale, retry with `--full-resync`
(alias `--fresh-sync`) before replacing the lease. This resets the remote
workdir, skips the fingerprint fast path, reseeds Git when possible, and
uploads the checkout from scratch.
- For live/provider bugs, use the configured secret workflow before downgrading
to mocks. Copy only the exact needed key into the remote process environment
for that one command. Do not print it, do not sync it as a repo file, and do
not leave it in remote shell history or logs. If no secret-safe injection path
is available, say true live provider auth is blocked instead of silently using
a fake key.
- Prefer local targeted tests for tight edit loops. Broad gates belong remote.
- Do not treat inherited shell env as operator intent. In particular,
`OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission
to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or
lint/typecheck fan-out onto the laptop.
- Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly
asks for local proof in the current task. If Testbox is queued or capacity is
constrained, report the blocker and keep only targeted local edit-loop checks
running.
## macOS And Windows Targets
Use these only when the task needs an existing non-Linux host. OpenClaw broad
Linux validation uses the repo Crabbox config unless a provider is explicitly
requested.
Native brokered Windows is available for Windows-specific proof. Use the AWS
developer image in `us-west-2` on demand; it has the expected OpenClaw developer
toolchain and Docker image cache. Keep broad Linux gates on Linux/Testbox unless
the bug is Windows-specific:
```sh
../crabbox/bin/crabbox warmup \
--provider aws \
--target windows \
--windows-mode normal \
--region us-west-2 \
--market on-demand \
--timing-json
```
The hydrate workflow assumes Docker should already be baked into Linux images
and only installs it as a fallback. Do not add per-run Docker installs to proof
commands unless the image probe shows Docker is actually missing.
When the user explicitly asks for brokered macOS runners, use Crabbox AWS
macOS only after confirming the deployed coordinator supports EC2 Mac host
lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota
and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host,
or run the no-spend preflight first:
```sh
crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json
crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json
CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh
```
Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report
paid-host blockers as quota, IAM, coordinator deployment, or host availability
instead of falling back to local macOS.
Crabbox supports static SSH targets:
```sh
../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test"
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test
```
- `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH,
bash, Git, rsync, and tar contract.
- Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar
archive transfer into `static.workRoot`. Direct native Windows runs support
`--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`.
- `crabbox actions hydrate/register` are Linux-only today; use plain
`crabbox run` loops for static macOS and Windows hosts.
- Live proof needs a reachable, operator-managed SSH host. Without one, verify
with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox
Go test suite.
## Direct Brokered AWS Backend
Use this when the task needs direct AWS Crabbox semantics rather than the
prepared Blacksmith Testbox CI environment.
Changed gate:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
```
Full suite:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
```
Focused rerun:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
```
Read the JSON summary. Useful fields:
- `provider`: `aws`
- `leaseId`: `cbx_...`
- `syncDelegated`: `false`
- `commandPhases`: populated when the command prints `CRABBOX_PHASE:<name>`
- `commandMs` / `totalMs`
- `exitCode`
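
To turn those fields into a one-line proof note, something like the following Python sketch may help. It assumes the `--timing-json` summary is a single JSON object with the top-level keys listed above and that `commandMs` / `totalMs` are milliseconds; the helper name `summarize_timing` and the sample values in the usage note are hypothetical:

```python
import json


def summarize_timing(raw: str) -> str:
    """Build a one-line proof summary from a --timing-json object (assumed shape)."""
    data = json.loads(raw)
    parts = [
        f"provider={data.get('provider', 'unknown')}",
        f"lease={data.get('leaseId', 'unknown')}",
        f"syncDelegated={data.get('syncDelegated')}",
        f"exit={data.get('exitCode')}",
    ]
    for key in ("commandMs", "totalMs"):
        value = data.get(key)
        # Skip missing or non-numeric timings instead of guessing.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            parts.append(f"{key}={value / 1000:.1f}s")
    return " ".join(parts)
```

For a made-up summary such as `{"provider": "aws", "leaseId": "cbx_example", "syncDelegated": false, "exitCode": 0, "commandMs": 61500}`, this returns `provider=aws lease=cbx_example syncDelegated=False exit=0 commandMs=61.5s`.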
Crabbox should stop one-shot AWS leases automatically after the run. Verify
cleanup when a run fails, is interrupted, or the command output is unclear:
```sh
../crabbox/bin/crabbox list --provider aws
```
## Blacksmith Testbox Through Crabbox
Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI
environment is the right proof surface:
```sh
node scripts/crabbox-wrapper.mjs run \
--provider blacksmith-testbox \
--blacksmith-org openclaw \
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
--blacksmith-job check \
--blacksmith-ref main \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
-- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm check:changed
```
Read the JSON summary and the Testbox line. Useful fields:
- `provider`: `blacksmith-testbox`
- `leaseId`: `tbx_...`
- `syncDelegated`: `true`
- `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync
- Actions run URL/id from the Testbox output
- `exitCode`
`blacksmith testbox list` may hide hydrating or ready boxes. Use:
```sh
blacksmith testbox list --all
blacksmith testbox status <tbx_id>
```
## Observability Flags
Use these on debugging runs before inventing ad hoc logging:
- `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd,
and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`,
`corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX
`sudo`/`apt`/`bubblewrap` and native Windows
`powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add
`--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo
`run.preflightTools` to replace the list. `default` expands built-ins; `none`
prints only the workspace summary. Preflight is diagnostic only; install
toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or
the run script. On `blacksmith-testbox`, this prints a delegated-unsupported
note because the workflow owns setup.
- `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct
providers and prints `set len=N secret=true` style summaries. On
`blacksmith-testbox`, env forwarding is unsupported; put secrets in the
Testbox workflow instead.
- `--env-from-profile <file>` plus `--allow-env NAME`: loads simple
`export NAME=value` / `NAME=value` lines from a local profile without
executing it, then forwards only allowlisted names. `--allow-env` is
repeatable and comma-separated. Profile values override ambient allowlisted
env values for that run. Direct POSIX, WSL2, and native Windows runs are
supported; delegated providers are not. Crabbox probes the uploaded profile
remotely and prints redacted presence/length metadata before the command.
- `--env-helper <name>`: with `--env-from-profile` on POSIX SSH targets,
persists `.crabbox/env/<name>` and `.crabbox/env/<name>.env` so follow-up
commands on the same lease can run through `./.crabbox/env/<name> <command>`.
Use only on leases you control; the profile stays until cleanup, lease reset,
or `--full-resync`.
- `--script <file>` / `--script-stdin`: upload a local script into
`.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute
directly on POSIX; scripts without a shebang run through `bash`. Native
Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1`
when needed. Arguments after `--` become script args.
- `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a
fresh remote checkout of the GitHub PR. Bare numbers use the current repo's
GitHub origin. Add `--apply-local-patch` only when the current local
`git diff --binary HEAD` should be applied on top of that PR checkout.
- `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir
before syncing. Use after sync fingerprints look wrong, SSH times out before
sync, or rsync watchdog output suggests it. It is redundant with
`--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated
providers.
- `--capture-stdout <path>` / `--capture-stderr <path>`: write remote streams to
local files and keep binary/noisy output out of retained logs. Parent
directories must already exist. These are direct-provider only.
- `--capture-on-fail`: on non-zero direct-provider exits, downloads
`.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`,
`coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed.
- `--keep-on-failure`: leave a failed one-shot lease alive for live debugging
until idle/TTL expiry. Useful on direct providers and delegated one-shots.
- `--timing-json`: final machine-readable timing. Add
`echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell
commands; direct providers and Blacksmith Testbox both report them as
`commandPhases`.
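
The `--env-from-profile` plus `--allow-env` behavior described above (read simple assignment lines without executing the file, forward only allowlisted names, print redacted metadata) can be illustrated locally. This Python sketch is not Crabbox's implementation; the function names and the exact `NAME: set len=N secret=true` line format are assumptions modeled on the description:

```python
import re

# Matches `export NAME=value` and `NAME=value`; anything else is ignored.
_ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_profile(text: str) -> dict[str, str]:
    """Read simple assignments from a profile without executing it."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _ASSIGNMENT.match(line)
        if not match:
            continue  # commands and complex shell syntax are never run
        name, value = match.group(1), match.group(2).strip()
        # Drop one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[name] = value
    return values


def forwardable(profile: dict[str, str], allow: set[str]) -> dict[str, str]:
    """Keep only allowlisted names."""
    return {name: value for name, value in profile.items() if name in allow}


def redacted_summary(env: dict[str, str]) -> list[str]:
    """Report presence and length only, never the value."""
    return [f"{name}: set len={len(value)} secret=true" for name, value in sorted(env.items())]
```

The point of the design is that a profile line such as `echo hi` or `$(curl ...)` is skipped rather than evaluated, and nothing outside the allowlist leaves the local machine.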
Live-provider debug template for direct AWS/Hetzner leases:
```sh
mkdir -p .crabbox/logs
pnpm crabbox:run -- --provider aws \
--preflight \
--allow-env OPENAI_API_KEY,OPENAI_BASE_URL \
--timing-json \
--capture-stdout .crabbox/logs/live-provider.stdout.log \
--capture-stderr .crabbox/logs/live-provider.stderr.log \
--capture-on-fail \
--shell -- \
"echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live"
```
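
After a run like the template above, the captured stdout log can be grouped by the `CRABBOX_PHASE:<name>` markers for quick triage. A minimal Python sketch, assuming each marker sits alone on its own line as printed by `echo`; `split_phases` is a hypothetical local helper and does not reproduce how Crabbox computes `commandPhases`:

```python
MARKER = "CRABBOX_PHASE:"


def split_phases(log_text: str) -> dict[str, list[str]]:
    """Group captured output lines under the most recent phase marker."""
    phases: dict[str, list[str]] = {}
    current = "(before first phase)"
    for line in log_text.splitlines():
        if line.startswith(MARKER):
            current = line[len(MARKER):].strip()
            phases.setdefault(current, [])
            continue
        phases.setdefault(current, []).append(line)
    return phases
```

Reading `.crabbox/logs/live-provider.stdout.log` and passing its text to `split_phases` would then show which phase produced the failing output.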
Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or
`--sync-only` to delegated providers. Also do not pass `--script*`,
`--fresh-pr`, `--full-resync`, or `--env-helper` there. Crabbox rejects these
because the provider owns sync or command transport. `--keep-on-failure` is OK
for delegated one-shots when you need to inspect a failed lease.
## Efficient Bug E2E Verification
Use the smallest Crabbox lane that proves the reported user path, not just the
touched code. Aim for one after-fix E2E proof before commenting, closing, or
opening a PR for a user-visible bug.
When the user says "test in Crabbox", do not simply copy tests to the remote
box and run them there. Crabbox is for remote real-scenario proof: copy or
install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API
call that failed, and capture behavior from that entrypoint. For regressions or
bug reports, prove the broken state first when feasible, then run the same
scenario after the fix.
Pick the lane by symptom:
- Docker/setup/install bug: build a package tarball and run the matching
`scripts/e2e/*-docker.sh` or package script. This proves npm packaging,
install paths, runtime deps, config writes, and container behavior.
- Provider/model/auth bug: prefer true live E2E. Use the configured secret
workflow, then inject the single needed key into Crabbox if needed. Scrub
unrelated provider env vars in the child command so interactive defaults do
not drift to another provider. If only a dummy key is used, label the proof
narrowly, e.g. "UI/install path only; live provider auth not exercised."
- Channel delivery bug: use the channel Docker/live lane when available; include
setup, config, gateway start, send/receive or agent-turn proof, and redacted
logs.
- Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that
creates real state and inspects the resulting files/API output.
- Pure parser/config bug: targeted tests may be enough, but still run a
Crabbox command when OS, package, Docker, secrets, or service lifecycle could
change behavior.
Efficient flow:
1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint
when feasible. If the issue cannot be reproduced, capture the exact command
and observed behavior instead.
2. Patch locally and run narrow local tests for edit speed.
3. Run one Crabbox E2E command that starts from the user-facing entrypoint:
package install, Docker setup, onboarding, channel add, gateway start, or
agent turn as appropriate.
4. Record proof as: provider and lease id (`cbx_...` or `tbx_...`), command,
environment shape, redacted secret source, and copied success/failure output.
5. If the issue says "cannot reproduce", ask for the missing config/log fields
that would distinguish the tested path from the reporter's path.
Keep it efficient:
- Reuse existing E2E scripts and helper assertions before writing ad hoc shell.
- Use `--script <file>` or `--script-stdin` for multi-line E2E commands instead
of quote-heavy `--shell` strings on direct SSH providers.
- Use `--fresh-pr <pr>` when validating an upstream PR in isolation from the
local dirty tree. Add `--apply-local-patch` only when testing a local fixup on
top of that PR.
- Use `--full-resync` before replacing a warmed direct-provider lease when the
remote workdir or sync fingerprint appears stale.
- Use one-shot Crabbox for a single proof; use a reusable Testbox only when
several commands must share built images, installed packages, or live state.
- Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a
candidate tarball; prefer the repo's package helper instead of direct source
execution when the bug might be packaging/install related.
- Keep secrets redacted. It is fine to report key presence, source, and length;
never print secret values.
- Include `--timing-json` on broad or flaky runs when command duration or sync
behavior matters.
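
For the "keep secrets redacted" rule, a small filter can be applied to copied output before it goes into a comment. This is a hedged Python sketch; `redact` and the placeholder format are hypothetical, and it only catches exact occurrences of the known values:

```python
def redact(text: str, secrets: dict[str, str]) -> str:
    """Replace known secret values with a name-and-length placeholder."""
    # Longest values first, so a secret containing a shorter one is replaced whole.
    for name, value in sorted(secrets.items(), key=lambda item: len(item[1]), reverse=True):
        if value:
            text = text.replace(value, f"<redacted {name} len={len(value)}>")
    return text
```

Encoded or truncated forms of a key would slip through, so a manual review of the proof output is still needed.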
Before/after PR proof on delegated Testbox:
- For PRs that should prove "broken before, fixed after", compare base and PR
on the same Testbox when practical. Fetch both refs, create detached temp
worktrees under `/tmp`, install in each, then run the same harness twice.
- Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync
may leave the root dirty with local files; `git checkout` can abort or mix
proof state.
- Temp harness files under `/tmp` do not resolve repo packages by default. Put
the harness inside the worktree, or in ESM use
`createRequire(path.join(process.cwd(), "package.json"))` before requiring
workspace deps such as `@lydell/node-pty`.
- For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only
assertions. Use a real PTY, wait for visible lifecycle markers, send input,
then send control keys and assert process exit/stuck behavior.
- When validating a rebased local branch before push, remember delegated sync
usually validates synced file content on a detached dirty checkout, not a
remote commit object. Record the local head SHA, changed files, Testbox id,
and final success markers; after pushing, ensure the pushed SHA has the same
file content.
- If GitHub CI is still queued but the exact changed content passed Testbox
`pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is
reasonable to merge once required checks allow it. Note any still-running
unrelated shards in the proof comment instead of waiting forever.
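
The before/after comparison above can be planned as a list of steps before anything runs on the box. A Python sketch under stated assumptions: both refs are already fetched, the harness lives inside each worktree, and `before_after_plan`, the `/tmp/proof` root, and the example harness command are hypothetical names chosen for illustration:

```python
from pathlib import PurePosixPath


def before_after_plan(
    base_ref: str,
    pr_ref: str,
    harness: list[str],
    tmp_root: str = "/tmp/proof",
) -> list[tuple[str, list[str]]]:
    """Return (cwd, argv) steps; "." stands for the synced repo root."""
    steps: list[tuple[str, list[str]]] = []
    for label, ref in (("base", base_ref), ("pr", pr_ref)):
        tree = str(PurePosixPath(tmp_root) / label)
        # Detached temp worktree, so the synced repo root is never checked out.
        steps.append((".", ["git", "worktree", "add", "--detach", tree, ref]))
        steps.append((tree, ["pnpm", "install", "--frozen-lockfile"]))
        # Same harness command in both trees, run from inside the worktree.
        steps.append((tree, list(harness)))
    return steps
```

Each step can then be executed with `subprocess.run(argv, cwd=cwd, check=False)` so the base failure and the PR success are both recorded.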
Interactive CLI/onboarding:
- For full-screen or prompt-heavy CLI flows, run the target command inside tmux
on the Crabbox and drive it with `tmux send-keys`; capture proof with
`tmux capture-pane`, redacted through `sed`.
- Prefer deterministic arrow navigation over search typing for Clack-style
searchable selects. Raw `send-keys -l openai` may not trigger filtering in a
tmux pane; inspect option order locally or on-box and send exact Down/Enter
sequences.
- Isolate mutable state with `OPENCLAW_STATE_DIR=$(mktemp -d)`. Plugin npm
installs live under that state dir (`npm/node_modules/...`), not under
`OPENCLAW_CONFIG_DIR`. Verify downloads by checking the state dir, package
lock, and installed package metadata.
- To test automatic setup installs against local package artifacts, use
`OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES=1` plus
`OPENCLAW_PLUGIN_INSTALL_OVERRIDES='{"plugin-id":"npm-pack:/tmp/plugin.tgz"}'`.
Pack with `npm pack`, set an isolated `OPENCLAW_STATE_DIR`, and verify the
package under `npm/node_modules`. Overrides are test-only and must not be
treated as official/trusted-source installs.
- For OpenAI/Codex onboarding proof, the useful markers are the UI line
`Installed Codex plugin`, `npm/node_modules/@openclaw/codex`, and the
package-lock entry showing the bundled `@openai/codex` dependency. A dummy
OpenAI-shaped key can prove only UI/install behavior; it is not live auth.
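
The isolated-state and install-override settings above can be assembled in one place. A minimal Python sketch using only the environment variable names given in this section; the helper names are hypothetical, and the `npm/node_modules` layout is taken from the description rather than verified against OpenClaw:

```python
import json
import os
import tempfile


def plugin_override_env(overrides: dict[str, str]) -> dict[str, str]:
    """Environment for a test-only run with local plugin package overrides."""
    return {
        # Fresh state dir so installs never touch real user state.
        "OPENCLAW_STATE_DIR": tempfile.mkdtemp(prefix="openclaw-state-"),
        "OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES": "1",
        "OPENCLAW_PLUGIN_INSTALL_OVERRIDES": json.dumps(overrides),
    }


def installed_package_dir(env: dict[str, str], package: str) -> str:
    """Where the section says a plugin npm install should appear."""
    return os.path.join(env["OPENCLAW_STATE_DIR"], "npm", "node_modules", *package.split("/"))
```

Merge the returned mapping into the child process environment, for example `{**os.environ, **plugin_override_env({"plugin-id": "npm-pack:/tmp/plugin.tgz"})}`, then check `installed_package_dir(...)` after the flow completes.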
## Reuse And Keepalive
For most Crabbox calls, one-shot is enough. Use reuse only when you need
multiple manual commands on the same hydrated box.
If Crabbox returns a reusable id or you intentionally keep a lease:
```sh
pnpm crabbox:run -- --id <cbx_id-or-slug> --no-sync --timing-json --shell -- "pnpm test <path>"
```
Stop boxes you created before handoff:
```sh
pnpm crabbox:stop -- <id-or-slug>
blacksmith testbox stop --id <tbx_id>
```
## Interactive Desktop And WebVNC
Prefer WebVNC for human inspection because the browser portal can preload the
lease VNC password and avoids a native VNC client's copy/paste/password dance.
Use native `crabbox vnc` only when WebVNC is unavailable, the browser portal is
broken, or the user explicitly wants a local VNC client.
Common desktop flow:
```sh
../crabbox/bin/crabbox warmup --provider hetzner --desktop --browser --class standard --idle-timeout 60m --ttl 240m
../crabbox/bin/crabbox desktop launch --provider hetzner --id <cbx_id-or-slug> --browser --url https://example.com --webvnc --open --take-control
```
Useful WebVNC commands:
```sh
../crabbox/bin/crabbox webvnc --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon start --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc daemon stop --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc reset --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox desktop doctor --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox desktop click --provider hetzner --id <cbx_id-or-slug> --x 640 --y 420
../crabbox/bin/crabbox desktop paste --provider hetzner --id <cbx_id-or-slug> --text "user@example.com"
../crabbox/bin/crabbox desktop key --provider hetzner --id <cbx_id-or-slug> ctrl+l
../crabbox/bin/crabbox artifacts collect --id <cbx_id-or-slug> --all --output artifacts/<slug>
../crabbox/bin/crabbox artifacts publish --dir artifacts/<slug> --pr <number>
```
`desktop launch --webvnc --open` is usually the nicest one-shot: it starts the
browser/app inside the visible session, bridges the lease into the authenticated
WebVNC portal, and opens the portal. Keep browsers windowed for human QA; use
`--fullscreen` only for capture/video workflows.
For human handoff, include `--take-control` so the opened portal viewer gets
keyboard/mouse control automatically instead of landing as an observer.
Human handoff preflight:
- Do not assume a visible desktop or launched browser means the repo CLI/app is
installed, built, or on the interactive terminal's `PATH`.
- Before handing WebVNC to a human tester, prove the expected command from the
same kept lease and from a neutral directory such as `~`.
- If the handoff needs repo-local code, sync/build/link it explicitly on that
lease. Source-tree CLIs often need build output before a symlink works.
- Prefer a real `command -v <expected-command> && <expected-command> --version`
check over a repo-root-only `pnpm ...` command.
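
The same check can be scripted when it needs to run on the lease as part of a larger harness. A Python sketch, assuming Python is available on the box and that the expected command accepts `--version`; `prove_command` is a hypothetical helper:

```python
import shutil
import subprocess
from pathlib import Path


def prove_command(name: str) -> tuple[str, str]:
    """Resolve a command on PATH and run `--version` from the home directory."""
    resolved = shutil.which(name)
    if resolved is None:
        raise SystemExit(f"{name} is not on PATH")
    result = subprocess.run(
        [resolved, "--version"],
        cwd=Path.home(),  # neutral directory, not the repo root
        capture_output=True,
        text=True,
        check=True,
    )
    return resolved, (result.stdout or result.stderr).strip()
```

Running from the home directory is the important part: it catches commands that only work because the repo root or `pnpm` happens to be the current context.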
Generic handoff repair pattern:
```sh
../crabbox/bin/crabbox run --id <cbx_id-or-slug> --full-resync --shell -- \
"set -euo pipefail
pnpm install --frozen-lockfile
pnpm build
sudo ln -sf \"\$PWD/<cli-entry>\" /usr/local/bin/<expected-command>
cd ~
command -v <expected-command>
<expected-command> --version"
```
## If Crabbox Fails
Keep the fallback narrow. First decide whether the failure is Crabbox itself,
the brokered AWS lease, Blacksmith/Testbox, repo hydration, sync, or the test
command.
Fast checks:
```sh
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,140p'
../crabbox/bin/crabbox doctor
command -v blacksmith
blacksmith --version
blacksmith testbox list
```
Common Crabbox-only failures:
- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling
repo, or update/install Crabbox before retrying.
- Bad local config: inspect `.crabbox.yaml`, `crabbox config show`, and
`crabbox whoami`; normal OpenClaw proof should use brokered AWS without
asking for cloud keys.
- Slug/claim confusion: use the raw `cbx_...` / `tbx_...` id, or run one-shot
without `--id`.
- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the
printed Actions URL. Large sync warnings now include top source directories
by file count and a hint to update `.crabboxignore` / `sync.exclude`; inspect
those before reaching for `--force-sync-large`. Quiet rsync watchdogs and SSH
timeouts now print `next_action=` hints; follow them, usually `--full-resync`
first and a fresh lease second.
- Cleanup uncertainty: run `crabbox list --provider aws`; for explicit
Blacksmith runs, use `blacksmith testbox list --all` (the plain list may hide
hydrating or ready boxes) and stop only boxes you created.
- Testbox queued/capacity pressure: do not retry Blacksmith repeatedly. Rerun
once without `--provider` so `.crabbox.yaml` routes to brokered AWS, or report
the Blacksmith blocker if Testbox itself is the requested proof.
If brokered AWS cannot dispatch, sync, attach, or stop, retry once with
`--debug` and `--timing-json`:
```sh
pnpm crabbox:run -- --debug --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed
```
Full suite:
```sh
pnpm crabbox:run -- --debug --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test
```
Auth fallback, only when `blacksmith` says auth is missing:
```sh
blacksmith auth login --non-interactive --organization openclaw
```
Raw Blacksmith footguns:
- Run from repo root. The CLI syncs the current directory.
- Save the returned `tbx_...` id in the session.
- Reuse that id for focused reruns; stop it before handoff.
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable
queue.
Use Blacksmith only when the task is specifically about Testbox, brokered AWS
is unavailable, or an explicit comparison is needed. If Blacksmith is down or
quota-limited, do not keep probing it; stay on brokered AWS and note the
delegated-provider outage.
## Blacksmith Backend Notes
Crabbox Blacksmith backend delegates setup to:
- org: `openclaw`
- workflow: `.github/workflows/ci-check-testbox.yml`
- job: `check`
- ref: `main` unless testing a branch/tag intentionally
The hydration workflow owns checkout, Node/pnpm setup, dependency install,
secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
execution, timing, logs/results, and cleanup.
Minimal Blacksmith-backed Crabbox run, from repo root:
```sh
pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed
```
Use direct Blacksmith only when Crabbox is the broken layer and you are
isolating a Crabbox bug. Use direct `blacksmith testbox list` only for cleanup
diagnostics, never as a reusable work queue.
Important Blacksmith footguns:
- Always run from repo root. The CLI syncs the current directory.
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
- If auth is missing and browser auth is acceptable:
```sh
blacksmith auth login --non-interactive --organization openclaw
```
## Brokered AWS
Use AWS for normal OpenClaw remote proof. The repo `.crabbox.yaml` already
selects brokered AWS, so omit `--provider` unless you are testing a different
provider deliberately.
```sh
pnpm crabbox:warmup -- --class beast --market on-demand --idle-timeout 90m
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
pnpm crabbox:stop -- <cbx_id-or-slug>
```
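Since a lease should not outlive the work that needed it, one option is to wrap the remote command so the stop step runs even when the command fails. This is a sketch under assumptions: `run_then_stop` and the `STOP_CMD` override are hypothetical, and the default stop command is simply the `pnpm crabbox:stop --` invocation shown above.

```sh
# Hypothetical wrapper: run a command, then always stop the lease,
# and return the command's own exit status.
# Arguments: lease id or slug, followed by the command to run.
run_then_stop() {
  rts_lease="$1"
  shift
  rts_stop="${STOP_CMD:-pnpm crabbox:stop --}"
  rts_status=0
  "$@" || rts_status=$?
  # Word splitting of $rts_stop is intentional so the default expands to its words.
  $rts_stop "$rts_lease" || echo "warning: stop failed for $rts_lease" >&2
  return "$rts_status"
}
```

For a dry run, set `STOP_CMD="echo would-stop"` so nothing is actually stopped while you check the wiring.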
Install/auth for owned Crabbox if needed:
```sh
brew install openclaw/tap/crabbox
crabbox login --url https://crabbox.openclaw.ai --provider aws
```
New users should self-resolve broker auth before anyone asks for AWS keys:
```sh
crabbox config show
crabbox doctor
crabbox whoami
```
- If broker auth is missing, run `crabbox login --url https://crabbox.openclaw.ai --provider aws`.
- If the CLI asks for `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or AWS
profile setup during normal OpenClaw validation, assume the agent selected
the wrong path. Use brokered `crabbox login` or an existing brokered lease
before asking the user for cloud credentials.
- Ask for AWS keys only for explicit direct-provider/account administration,
not for normal brokered OpenClaw proof.
- Trusted automation may still use
`printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`.
macOS config lives at:
```text
~/Library/Application Support/crabbox/config.yaml
```
It should include `broker.url`, `broker.token`, and usually `provider: aws`
for OpenClaw lanes. Let that config drive normal validation.
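A quick pre-flight check can confirm that the config carries broker settings before anyone reaches for AWS keys. This is a minimal sketch, assuming `broker.url` and `broker.token` are stored as nested `url:` and `token:` keys under a top-level `broker:` mapping; `has_broker_config` is a hypothetical helper, and the check never prints the token.

```sh
# Hypothetical pre-flight: succeed only when the config file has non-empty
# `url:` and `token:` entries inside a top-level `broker:` block.
# The nested YAML layout is an assumption; adjust if the real file differs.
has_broker_config() {
  hb_cfg="$1"
  [ -f "$hb_cfg" ] || return 1
  # Keep only the indented lines that follow `broker:` up to the next top-level key.
  hb_block=$(awk '/^broker:/ {inb=1; next} /^[^[:space:]#]/ {inb=0} inb' "$hb_cfg")
  printf '%s\n' "$hb_block" | grep -Eq '^[[:space:]]+url:[[:space:]]*[^[:space:]]' &&
    printf '%s\n' "$hb_block" | grep -Eq '^[[:space:]]+token:[[:space:]]*[^[:space:]]'
}
```

On macOS this could be pointed at `"$HOME/Library/Application Support/crabbox/config.yaml"`; if it fails, the brokered `crabbox login --url https://crabbox.openclaw.ai --provider aws` path is the next step, not cloud credentials.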
### Interactive Desktop / WebVNC
For human desktop demos, prefer `webvnc` over native `vnc` and keep the remote
desktop visible/windowed. Do not fullscreen the remote browser or hide the XFCE
panel/window chrome unless the explicit goal is video/capture output. After
launch, verify a screenshot shows the desktop panel plus browser title bar. If
Chrome is fullscreen, toggle it back with:
```sh
crabbox run --id <lease> --shell -- 'DISPLAY=:99 xdotool search --onlyvisible --class google-chrome windowactivate key F11'
```
## Diagnostics
```sh
crabbox status --id <id-or-slug> --wait
crabbox inspect --id <id-or-slug> --json
crabbox sync-plan
crabbox history --limit 20
crabbox history --lease <id-or-slug>
crabbox attach <run_id>
crabbox events <run_id> --json
crabbox logs <run_id>
crabbox results <run_id>
crabbox cache stats --id <id-or-slug>
crabbox ssh --id <id-or-slug>
blacksmith testbox list
```
Use `--debug` on `run` when measuring sync timing.
Use `--timing-json` on warmup, hydrate, and run when comparing backends.
Use `--market spot|on-demand` only on AWS warmup/one-shot runs.
## Failure Triage
- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists
the provider selected by `.crabbox.yaml`; update Crabbox before falling back.
- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect
the hydration step.
- Sync failed: rerun with `--debug`; check changed-file count and whether the
checkout is dirty.
- Command failed: rerun only the failing shard/file first. Do not rerun a full
suite until the focused failure is understood.
- Cleanup uncertain: `crabbox list --provider aws`; for explicit Blacksmith
runs, use `blacksmith testbox list` and stop owned `tbx_...` leases you
created.
- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above,
then file/fix the Crabbox issue.
## Boundary
Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the
hydration workflow and keep Crabbox generic around lease, sync, command
execution, logs/results, timing, and cleanup.

47
.crabbox.yaml Normal file

@ -0,0 +1,47 @@
profile: mcporter-check
provider: aws
class: standard
capacity:
  market: spot
  strategy: most-available
  fallback: on-demand-after-120s
  hints: true
  regions:
    - eu-west-1
    - eu-west-2
    - eu-central-1
    - us-east-1
    - us-west-2
actions:
  workflow: .github/workflows/crabbox-hydrate.yml
  job: hydrate
  ref: main
  runnerLabels:
    - crabbox
    - openclaw
    - mcporter
  runnerVersion: latest
  ephemeral: true
aws:
  region: eu-west-1
  rootGB: 120
sync:
  delete: true
  checksum: false
  gitSeed: true
  fingerprint: true
  baseRef: main
  exclude:
    - .artifacts
    - .codex
    - .DS_Store
    - node_modules
    - data/*.db-wal
    - data/*.db-shm
env:
  allow:
    - CI
    - NODE_OPTIONS
ssh:
  user: crabbox
  port: "2222"

19
.github/CODEOWNERS vendored Normal file

@ -0,0 +1,19 @@
# Protect ownership and automation rules.
/.github/CODEOWNERS @openclaw/openclaw-secops
/.github/dependabot.yml @openclaw/openclaw-secops
/.github/workflows/ @openclaw/openclaw-secops
/.agents/skills/ @openclaw/openclaw-secops
/.crabbox.yaml @openclaw/openclaw-secops
/SECURITY.md @openclaw/openclaw-secops
/AGENTS.md @openclaw/openclaw-secops
# Package, release, and security-sensitive surfaces.
/package.json @openclaw/openclaw-secops
/pnpm-lock.yaml @openclaw/openclaw-secops
/package-lock.json @openclaw/openclaw-secops
/src/ @openclaw/openclaw-secops
/Sources/ @openclaw/openclaw-secops
/cmd/ @openclaw/openclaw-secops
/internal/ @openclaw/openclaw-secops
/scripts/ @openclaw/openclaw-secops
/docs/ @openclaw/openclaw-secops

32
.github/dependabot.yml vendored Normal file

@ -0,0 +1,32 @@
version: 2
updates:
  - package-ecosystem: npm
    directory: /
    schedule:
      interval: daily
    cooldown:
      default-days: 2
    groups:
      npm:
        patterns:
          - "*"
        update-types:
          - minor
          - patch
    open-pull-requests-limit: 5
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: daily
    cooldown:
      default-days: 2
    groups:
      actions:
        patterns:
          - "*"
        update-types:
          - minor
          - patch
    open-pull-requests-limit: 5

40
.github/workflows/codeql.yml vendored Normal file

@ -0,0 +1,40 @@
name: CodeQL
on:
  pull_request:
  push:
    branches:
      - main
  schedule:
    - cron: "49 4 * * 1"
  workflow_dispatch:
permissions:
  actions: read
  contents: read
  security-events: write
jobs:
  analyze:
    name: analyze (${{ matrix.language }})
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        language:
          - actions
          - javascript-typescript
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v4
        with:
          languages: ${{ matrix.language }}
          build-mode: none
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v4
        with:
          category: "/language:${{ matrix.language }}"

125
.github/workflows/crabbox-hydrate.yml vendored Normal file

@ -0,0 +1,125 @@
name: Crabbox Hydrate
on:
  workflow_dispatch:
    inputs:
      crabbox_id:
        description: "Crabbox lease ID"
        required: true
        type: string
      ref:
        description: "Git ref to hydrate"
        required: false
        type: string
      crabbox_runner_label:
        description: "Dynamic Crabbox runner label"
        required: true
        type: string
      crabbox_job:
        description: "Hydration job identifier expected by Crabbox"
        required: false
        default: "hydrate"
        type: string
      crabbox_keep_alive_minutes:
        description: "Minutes to keep the hydrated job alive"
        required: false
        default: "90"
        type: string
permissions:
  contents: read
env:
  NODE_VERSION: "24"
  PNPM_VERSION: "10.33.2"
jobs:
  hydrate:
    name: hydrate
    runs-on: [self-hosted, "${{ inputs.crabbox_runner_label }}"]
    timeout-minutes: 120
    steps:
      - uses: actions/checkout@v6
        with:
          ref: ${{ inputs.ref || github.ref }}
      - uses: pnpm/action-setup@v4
        with:
          version: ${{ env.PNPM_VERSION }}
      - uses: actions/setup-node@v6
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: pnpm
      - name: Prepare pnpm workspace
        shell: bash
        run: |
          set -euo pipefail
          git fetch --no-tags --depth=50 origin "+refs/heads/main:refs/remotes/origin/main"
          pnpm install --frozen-lockfile
          node --version
          pnpm --version
      - name: Mark Crabbox ready
        shell: bash
        env:
          CRABBOX_ID: ${{ inputs.crabbox_id }}
          CRABBOX_JOB: ${{ inputs.crabbox_job }}
        run: |
          set -euo pipefail
          job="${CRABBOX_JOB}"
          if [ -z "$job" ]; then job=hydrate; fi
          case "$CRABBOX_ID" in
            ''|*[!A-Za-z0-9._-]*)
              echo "Invalid crabbox_id" >&2
              exit 2
              ;;
          esac
          mkdir -p "$HOME/.crabbox/actions"
          state="$HOME/.crabbox/actions/${CRABBOX_ID}.env"
          env_file="$HOME/.crabbox/actions/${CRABBOX_ID}.env.sh"
          {
            for key in CI GITHUB_ACTIONS GITHUB_WORKSPACE GITHUB_REPOSITORY GITHUB_RUN_ID GITHUB_RUN_NUMBER GITHUB_RUN_ATTEMPT GITHUB_REF GITHUB_REF_NAME GITHUB_SHA GITHUB_EVENT_NAME GITHUB_ACTOR RUNNER_OS RUNNER_ARCH RUNNER_TEMP RUNNER_TOOL_CACHE PATH; do
              value="${!key-}"
              if [ -n "$value" ]; then
                printf 'export %s=%q\n' "$key" "$value"
              fi
            done
          } > "${env_file}.tmp"
          mv "${env_file}.tmp" "$env_file"
          tmp="${state}.tmp"
          {
            echo "WORKSPACE=${GITHUB_WORKSPACE}"
            echo "RUN_ID=${GITHUB_RUN_ID}"
            echo "JOB=${job}"
            echo "ENV_FILE=${env_file}"
            echo "READY_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
          } > "$tmp"
          mv "$tmp" "$state"
      - name: Keep Crabbox job alive
        shell: bash
        env:
          CRABBOX_ID: ${{ inputs.crabbox_id }}
          CRABBOX_KEEP_ALIVE_MINUTES: ${{ inputs.crabbox_keep_alive_minutes }}
        run: |
          set -euo pipefail
          case "$CRABBOX_ID" in
            ''|*[!A-Za-z0-9._-]*)
              echo "Invalid crabbox_id" >&2
              exit 2
              ;;
          esac
          minutes="${CRABBOX_KEEP_ALIVE_MINUTES}"
          case "$minutes" in
            ''|*[!0-9]*) minutes=90 ;;
          esac
          stop="$HOME/.crabbox/actions/${CRABBOX_ID}.stop"
          deadline=$(( $(date +%s) + minutes * 60 ))
          while [ "$(date +%s)" -lt "$deadline" ]; do
            if [ -f "$stop" ]; then
              exit 0
            fi
            sleep 15
          done

86
.github/workflows/stale.yml vendored Normal file

@ -0,0 +1,86 @@
name: Stale
on:
  schedule:
    - cron: "17 4 * * *"
  workflow_dispatch:
permissions: {}
jobs:
  stale:
    permissions:
      issues: write
      pull-requests: write
    runs-on: ubuntu-latest
    steps:
      - name: Mark stale unassigned issues and pull requests
        uses: actions/stale@v10
        with:
          days-before-issue-stale: 14
          days-before-issue-close: 7
          days-before-pr-stale: 14
          days-before-pr-close: 7
          stale-issue-label: stale
          stale-pr-label: stale
          exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
          exempt-pr-labels: maintainer,no-stale
          operations-per-run: 1000
          ascending: true
          exempt-all-assignees: true
          remove-stale-when-updated: true
          stale-issue-message: |
            This issue has been automatically marked as stale due to inactivity.
            Please add updated mcporter details or it will be closed.
          stale-pr-message: |
            This pull request has been automatically marked as stale due to inactivity.
            Please update it or it will be closed.
          close-issue-message: |
            Closing due to inactivity.
            If this still affects mcporter, open a new issue with current reproduction details.
          close-issue-reason: not_planned
          close-pr-message: |
            Closing due to inactivity.
            If this PR should be revived, reopen it with current context and validation.
      - name: Mark stale assigned issues
        uses: actions/stale@v10
        with:
          days-before-issue-stale: 30
          days-before-issue-close: 10
          days-before-pr-stale: -1
          days-before-pr-close: -1
          stale-issue-label: stale
          exempt-issue-labels: enhancement,maintainer,pinned,security,no-stale
          operations-per-run: 1000
          ascending: true
          include-only-assigned: true
          remove-stale-when-updated: true
          stale-issue-message: |
            This assigned issue has been automatically marked as stale after 30 days of inactivity.
            Please add an update or it will be closed.
          close-issue-message: |
            Closing due to inactivity.
            If this still affects mcporter, reopen or file a new issue with current evidence.
          close-issue-reason: not_planned
      - name: Mark stale assigned pull requests
        uses: actions/stale@v10
        with:
          days-before-issue-stale: -1
          days-before-issue-close: -1
          days-before-pr-stale: 27
          days-before-pr-close: 7
          stale-pr-label: stale
          exempt-pr-labels: maintainer,no-stale
          operations-per-run: 1000
          ascending: true
          include-only-assigned: true
          ignore-pr-updates: true
          remove-stale-when-updated: true
          stale-pr-message: |
            This assigned pull request has been automatically marked as stale after being open for 27 days.
            Please add an update or it will be closed.
          close-pr-message: |
            Closing due to inactivity.
            If this PR should be revived, reopen it with current context and validation.

30
SECURITY.md Normal file

@ -0,0 +1,30 @@
# Security Policy
## Reporting
Report suspected vulnerabilities privately through GitHub Security Advisories for
this repository. If GHSA is unavailable to you, email security@openclaw.ai.
Do not open public issues for vulnerabilities or include secrets, private local
data, credentials, tokens, app data, or exploit details in public reports.
## Scope
In scope:
- MCP porter CLI, schema generation, provider config, package release
- config, credential, local filesystem, package, and workflow integrity surfaces
- command output, logs, artifacts, or generated data that could disclose private data
- dependency or runtime behavior that materially affects safe execution
Out of scope:
- upstream service outages, API changes, quotas, or account enforcement decisions
- compromise of a trusted local account, shell, filesystem, or maintainer device
- scanner-only findings without a reachable exploit path in supported usage
## Expectations
We prioritize reachable issues that affect credentials, private data, package
integrity, privileged automation, or safe execution. Include the affected commit,
platform, minimal reproduction steps, and sanitized impact details.