docs: prefer screenshots or videos for proof
parent 290a3f749c
commit a7170a575a
@@ -9,7 +9,7 @@ checkpoint, and status-only commits are intentionally omitted.

 ### Added

-- Added a light privacy reminder and stronger screenshot nudge to real behavior proof review guidance.
+- Added a light privacy reminder and stronger screenshot-or-video nudge to real behavior proof review guidance.
 - Added agent-led real behavior proof judgement so ClawSweeper can inspect linked screenshots, videos, logs, and terminal output with a read-only GitHub token, explain the proof verdict in the review comment, tell contributors how to trigger a fresh review after adding proof, and sync `proof: sufficient` when the evidence is convincing.
 - Added a real behavior proof assessment to PR reviews so missing, mock-only, or insufficient contributor proof blocks pass/automerge markers and asks for screenshots, terminal output, redacted logs, recordings, linked artifacts, or copied live output instead.
 - Added `config/automation-limits.json` plus docs and a drift check so review,
@@ -86,7 +86,7 @@ likely owner.

 For PRs, include a dedicated security review pass in addition to the functional review. Inspect whether the diff could introduce a security or supply-chain regression, especially when it touches CI workflows, GitHub Action refs, dependency sources, lockfiles, install/build/release scripts, package publishing metadata, secrets handling, permissions, downloaded artifacts, generated/vendor/minified files, or other code execution paths. Check whether those changes are consistent with the PR title, body, discussion, and stated purpose before deciding. Be cautious when a small or unrelated functional change also introduces new third-party code execution, broadens secret or permission access, changes package resolution, adds lifecycle hooks, downloads and executes artifacts, or mixes infrastructure changes into otherwise cosmetic work. Do not infer malicious intent without concrete evidence. Always summarize this pass in `securityReview`; set `status: "cleared"` when the diff has no concrete security or supply-chain concern, `status: "needs_attention"` when there is a concrete concern, and `status: "not_applicable"` for non-PR items without a security-sensitive report. Put concrete security concerns in `securityReview.concerns` with file/line when possible, and also include blocking concerns in `risks` and `evidence` when they affect the merge/close decision.

-For PRs, include a dedicated `realBehaviorProof` assessment before any pass, automerge, or repair verdict. External PRs must show that the contributor ran the changed behavior after the fix in a real setup. Unit tests, mocks, snapshots, lint, typechecks, and CI are supplemental only; they are not real behavior proof by themselves. Treat screenshots, recordings, terminal screenshots, console output, copied live output, linked artifacts, and redacted runtime logs as valid proof, including for non-visual CLI, console, text, or error-message changes. Prefer asking for screenshots when they can show the behavior, including terminal screenshots for text or console changes, while keeping logs and live output acceptable. Remind contributors to redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence. A plain app screenshot is sufficient only for behavior it directly shows. Do not mark screenshot-only proof sufficient for browser runtime, CSP, CORS, `connect-src`, auth callback, network, or security changes when the proof only says no console error, warning, or violation is visible; require console output, a network trace, terminal/live output, logs, a recording with diagnostics, or a linked artifact that actually shows the runtime path. Use your tools and best judgement: inspect the PR body, comments, links, screenshots, videos, logs, terminal output, and changed behavior context; you may download/open GitHub attachment links, generate stills or contact sheets from videos, inspect terminal screenshots and logs, and compare the proof against the PR diff. Use the provided scratch directory for downloaded artifacts and keep the target checkout read-only. Use `status: "sufficient"` only when the evidence convincingly shows after-fix real behavior and an observed improved result. Use `status: "missing"` when proof is absent, `status: "mock_only"` when proof is only tests/mocks/CI, `status: "insufficient"` when the evidence is unrelated, unviewable, too weak, or does not show the changed real behavior after the fix, `status: "override"` when the PR has `proof: override`, and `status: "not_applicable"` for non-PR items or maintainer/bot PRs where the gate does not apply. When proof is missing, mock-only, or insufficient, set `needsContributorAction: true`, make the PR a human-only merge blocker, and do not request ClawSweeper repair markers because automation cannot prove the contributor's setup for them.
+For PRs, include a dedicated `realBehaviorProof` assessment before any pass, automerge, or repair verdict. External PRs must show that the contributor ran the changed behavior after the fix in a real setup. Unit tests, mocks, snapshots, lint, typechecks, and CI are supplemental only; they are not real behavior proof by themselves. Treat screenshots, recordings, terminal screenshots, console output, copied live output, linked artifacts, and redacted runtime logs as valid proof, including for non-visual CLI, console, text, or error-message changes. Prefer asking for screenshots or videos when they can show the behavior, including terminal screenshots for text or console changes, while keeping logs and live output acceptable. Remind contributors to redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence. A plain app screenshot is sufficient only for behavior it directly shows. Do not mark screenshot-only proof sufficient for browser runtime, CSP, CORS, `connect-src`, auth callback, network, or security changes when the proof only says no console error, warning, or violation is visible; require console output, a network trace, terminal/live output, logs, a recording with diagnostics, or a linked artifact that actually shows the runtime path. Use your tools and best judgement: inspect the PR body, comments, links, screenshots, videos, logs, terminal output, and changed behavior context; you may download/open GitHub attachment links, generate stills or contact sheets from videos, inspect terminal screenshots and logs, and compare the proof against the PR diff. Use the provided scratch directory for downloaded artifacts and keep the target checkout read-only. Use `status: "sufficient"` only when the evidence convincingly shows after-fix real behavior and an observed improved result. Use `status: "missing"` when proof is absent, `status: "mock_only"` when proof is only tests/mocks/CI, `status: "insufficient"` when the evidence is unrelated, unviewable, too weak, or does not show the changed real behavior after the fix, `status: "override"` when the PR has `proof: override`, and `status: "not_applicable"` for non-PR items or maintainer/bot PRs where the gate does not apply. When proof is missing, mock-only, or insufficient, set `needsContributorAction: true`, make the PR a human-only merge blocker, and do not request ClawSweeper repair markers because automation cannot prove the contributor's setup for them.

 For PRs, also emit Codex `/review`-style findings in `reviewFindings`.
 Review the diff as another engineer's proposed patch and list every discrete,
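The `realBehaviorProof` status rules in the hunk above can be read as a decision order. The following is a minimal TypeScript sketch of that reading, not the project's implementation: the `ProofSignals` shape, its field names, and both function names are hypothetical, and the precedence between branches (for example, checking `not_applicable` before `override`) is an assumption the prompt text does not spell out.

```typescript
// The six status values named in the prompt text.
type ProofStatus =
  | "sufficient"
  | "missing"
  | "mock_only"
  | "insufficient"
  | "override"
  | "not_applicable";

// Hypothetical reviewer observations; none of these names come from the source.
interface ProofSignals {
  isPullRequest: boolean;
  isMaintainerOrBot: boolean;
  hasOverrideLabel: boolean; // the PR carries `proof: override`
  hasAnyProof: boolean;
  onlyTestsMocksOrCi: boolean;
  showsAfterFixRealBehavior: boolean;
}

// Assumed precedence: gate applicability, then override, then evidence quality.
function proofStatus(s: ProofSignals): ProofStatus {
  if (!s.isPullRequest || s.isMaintainerOrBot) return "not_applicable";
  if (s.hasOverrideLabel) return "override";
  if (!s.hasAnyProof) return "missing";
  if (s.onlyTestsMocksOrCi) return "mock_only";
  if (!s.showsAfterFixRealBehavior) return "insufficient";
  return "sufficient";
}

// Per the prompt, these three statuses block the merge and need the contributor.
function needsContributorAction(status: ProofStatus): boolean {
  return status === "missing" || status === "mock_only" || status === "insufficient";
}
```

In the real flow the judgement comes from the reviewing agent inspecting the evidence, so the boolean inputs here stand in for conclusions the agent would reach on its own.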
@@ -327,8 +327,8 @@ review applies.
 Always fill `realBehaviorProof`. For external PRs, this is a merge gate, not a
 nice-to-have. Missing, mock-only, or insufficient proof should appear near the
 top of the public review as "needs real behavior proof before merge"; tell the
-contributor that screenshots are preferred when they can show the behavior;
-terminal screenshots, console output, copied live output, linked artifacts,
+contributor that screenshots or videos are preferred when they can show the
+behavior; terminal screenshots, console output, copied live output, linked artifacts,
 recordings, and redacted logs count. Remind contributors to redact private
 information like IP addresses, API keys, phone numbers, non-public endpoints,
 and other private details before posting evidence. For non-visual browser
@@ -4202,17 +4202,17 @@ function publicRealBehaviorProofLine(proof: RealBehaviorProof): string {
     case "missing":
       return `Needs real behavior proof before merge: ${realBehaviorProofBlockerSummary(
         summary,
-        "The PR must include after-fix evidence from a real setup. Screenshots are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
+        "The PR must include after-fix evidence from a real setup. Screenshots or videos are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
       )}`;
     case "mock_only":
       return `Needs real behavior proof before merge: ${realBehaviorProofBlockerSummary(
         summary,
-        "Tests, mocks, snapshots, lint, typechecks, and CI are supplemental only. Screenshots are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
+        "Tests, mocks, snapshots, lint, typechecks, and CI are supplemental only. Screenshots or videos are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
       )}`;
     case "insufficient":
       return `Needs stronger real behavior proof before merge: ${realBehaviorProofBlockerSummary(
         summary,
-        "Include after-fix evidence from a real setup. Screenshots are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
+        "Include after-fix evidence from a real setup. Screenshots or videos are preferred when they can show the behavior; terminal screenshots, console output, copied live output, linked artifacts, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
       )}`;
     case "not_applicable":
       return summary ? `Not applicable: ${summary}` : "";
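The three blocker messages in the hunk above differ only in their first sentence; the evidence guidance and the redaction reminder are repeated verbatim, which is why this wording change touches three near-identical lines. One way to keep them in sync is sketched below in TypeScript. The names `PROOF_EVIDENCE_GUIDANCE`, `PROOF_LEAD_IN`, and `blockerGuidance` are hypothetical and do not appear in the source, and the sketch leaves `realBehaviorProofBlockerSummary` alone because its second argument's role is not shown in the hunk.

```typescript
// Hypothetical shared tail; the source inlines this text in each case.
const PROOF_EVIDENCE_GUIDANCE =
  "Screenshots or videos are preferred when they can show the behavior; " +
  "terminal screenshots, console output, copied live output, linked artifacts, " +
  "and redacted logs count. Redact private information like IP addresses, " +
  "API keys, phone numbers, non-public endpoints, and other private details " +
  "before posting evidence.";

type BlockingStatus = "missing" | "mock_only" | "insufficient";

// Status-specific first sentence, copied from the three cases in the hunk.
const PROOF_LEAD_IN: Record<BlockingStatus, string> = {
  missing: "The PR must include after-fix evidence from a real setup.",
  mock_only: "Tests, mocks, snapshots, lint, typechecks, and CI are supplemental only.",
  insufficient: "Include after-fix evidence from a real setup.",
};

// Builds the same text each case currently spells out in full.
function blockerGuidance(status: BlockingStatus): string {
  return `${PROOF_LEAD_IN[status]} ${PROOF_EVIDENCE_GUIDANCE}`;
}
```

With this shape a future wording change would be a one-line edit, at the cost of making each message slightly harder to grep for as a whole string.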
@@ -4466,7 +4466,7 @@ function reportRealBehaviorProof(markdown: string): RealBehaviorProof {
     return {
       status: "missing",
       summary:
-        "No after-fix real behavior proof was recorded for this external PR; screenshots are preferred when they can show the behavior, and terminal screenshots, console output, copied live output, linked artifacts, recordings, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
+        "No after-fix real behavior proof was recorded for this external PR; screenshots or videos are preferred when they can show the behavior, and terminal screenshots, console output, copied live output, linked artifacts, recordings, and redacted logs count. Redact private information like IP addresses, API keys, phone numbers, non-public endpoints, and other private details before posting evidence.",
       evidenceKind: "none",
       needsContributorAction: true,
     };
@@ -2818,7 +2818,7 @@ test("review prompt requires real behavior proof for PR reviews", () => {
   assert.match(prompt, /download\/open GitHub attachment links/);
   assert.match(prompt, /generate stills or contact sheets from videos/);
   assert.match(prompt, /compare the proof against the PR diff/);
-  assert.match(prompt, /Prefer asking for screenshots/);
+  assert.match(prompt, /Prefer asking for screenshots or videos/);
   assert.match(prompt, /redact private information like IP addresses, API keys/);
   assert.match(prompt, /screenshot-only proof sufficient/);
   assert.match(prompt, /no visible console violation/);
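The assertion change in the test hunk above matters because the old pattern is a prefix of the new wording, so it would keep passing against the updated prompt without checking for "or videos". A small TypeScript sketch of that point follows; it uses abbreviated stand-in strings rather than the real prompt, and plain `RegExp.prototype.test` instead of the test file's `assert.match` so it runs without a test harness.

```typescript
// Abbreviated stand-ins for the prompt wording before and after the change.
const oldPrompt = "Prefer asking for screenshots when they can show the behavior";
const newPrompt = "Prefer asking for screenshots or videos when they can show the behavior";

const oldPattern = /Prefer asking for screenshots/;
const newPattern = /Prefer asking for screenshots or videos/;

const results = {
  // The old pattern cannot tell the two wordings apart.
  oldPatternMatchesOldPrompt: oldPattern.test(oldPrompt), // true
  oldPatternMatchesNewPrompt: oldPattern.test(newPrompt), // true
  // The updated pattern fails if the prompt regresses to the old wording.
  newPatternMatchesOldPrompt: newPattern.test(oldPrompt), // false
  newPatternMatchesNewPrompt: newPattern.test(newPrompt), // true
};

console.log(results);
```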