tasks: add Core v1 public task set (19 tasks)

Stages a curated 19-task subset of the internal 40-task dev pool as
the public ClawBench release. Selected via greedy task elimination
from the v2026-4-19-full sweep archive so that:

  (a) mean run_score across these 19 tasks reproduces the established
      8-model ranking with zero inversions and min adjacent-rank gap
      of 0.0049 (well above the ~0.002 seed-noise floor);
  (b) coverage is preserved across tiers 1-5 and across the tools,
      coding, repo, browser, multi_tool, and adversarial families;
  (c) tasks with broken verifiers or near-zero cross-model SNR are
      dropped (21 tasks retained as private holdout, not published).

Established ranking (v4-19-full, OpenClaw 2026.4.15-beta.1, 3 runs
per task, C+T+B+J weighted score):

  1. Claude Opus 4.6         0.8137
  2. Claude Opus 4.7         0.7824
  3. GPT 5.4                 0.7647
  4. Claude Sonnet 4.6       0.7597
  5. MiniMax M2.7            0.7475
  6. Gemini 3.1 Pro          0.7408
  7. Qwen 3.6 Plus           0.7030
  8. Kimi K2.5               0.6800

Deliverables:
  tasks-public/MANIFEST.yaml   — machine-readable task list + metadata
  tasks-public/README.md       — rationale, usage, reproducibility notes
  tasks-public/tier{1..5}/*.yaml  — 19 task definitions
  tasks-public/assets/*/       — 19 asset packs (verifiers + fixtures)

The internal dev set remains in tasks/ (gitignored) and retains 40
tasks for future expansion. Not published:
  - 9 ceiling tasks (all frontier models score >0.85)
  - 9 noise tasks (cross-model SNR < 0.5)
  - 3 ranking-breaker tasks (e.g. t2-node-search-patch,
    t5-contradictory-requirements)

Core v2 will add Tier 6 long-horizon tasks, paraphrased prompt pairs
for perturbation-sensitivity measurement, and creative-synthesis
tasks — all currently absent from Core v1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scoootscooob 2026-04-20 20:06:36 -07:00
parent b6f07d9a87
commit 50959fa670
134 changed files with 3714 additions and 0 deletions

tasks-public/MANIFEST.yaml

@ -0,0 +1,220 @@
manifest_version: 1
release: clawbench-core-v1
release_date: 2026-04-20
benchmark_version: 0.4.0.dev1
task_count: 19
source_sweep: v2026-4-19-full
openclaw_version: 2026.4.15-beta.1
description: |
  ClawBench Core v1 — a curated subset of 19 tasks from the internal
  40-task ClawBench dev pool. Selected so that:
    (a) all 8 measured frontier models produce the established ranking
        order in the v4-19-full sweep,
    (b) coverage is preserved across tiers (1-5) and task families
        (tools, coding, repo, browser, multi_tool, adversarial),
    (c) tasks with broken verifiers or near-zero cross-model SNR are
        dropped.
  Verification: mean run_score across these 19 tasks reproduces the
  reference ranking with 0 inversions and min adjacent-rank gap of
  0.0049 (well above the ~0.002 seed-noise floor).
established_ranking:
  - rank: 1
    model: anthropic/claude-opus-4-6
    display: Claude Opus 4.6
    score: 0.8137
  - rank: 2
    model: anthropic/claude-opus-4-7
    display: Claude Opus 4.7
    score: 0.7824
  - rank: 3
    model: openai/gpt-5.4
    display: GPT 5.4
    score: 0.7647
  - rank: 4
    model: anthropic/claude-sonnet-4-6
    display: Claude Sonnet 4.6
    score: 0.7597
  - rank: 5
    model: openrouter/minimax/minimax-m2.7
    display: MiniMax M2.7
    score: 0.7475
  - rank: 6
    model: google/gemini-3.1-pro-preview
    display: Gemini 3.1 Pro
    score: 0.7408
  - rank: 7
    model: openrouter/qwen/qwen3.6-plus
    display: Qwen 3.6 Plus
    score: 0.7030
  - rank: 8
    model: openrouter/moonshotai/kimi-k2.5
    display: Kimi K2.5
    score: 0.6800
coverage:
  tiers:
    tier1: 2
    tier2: 6
    tier3: 5
    tier4: 5
    tier5: 1
  families:
    tools: 8
    coding: 2
    repo: 3
    browser: 2
    multi_tool: 3
    adversarial: 1
  # Counts follow the per-task entries below (one tier and one family per task).
tasks:
  - id: t1-bugfix-discount
    tier: tier1
    family: coding
    capabilities: [bugfix]
    path: tier1/t1-bugfix-discount.yaml
    asset_pack: t1_bugfix_discount
  - id: t1-fs-quick-note
    tier: tier1
    family: tools
    capabilities: [structured_output]
    path: tier1/t1-fs-quick-note.yaml
    asset_pack: t1_fs_quick_note
  - id: t2-add-tests-normalizer
    tier: tier2
    family: coding
    capabilities: [test_authoring]
    path: tier2/t2-add-tests-normalizer.yaml
    asset_pack: t2_add_tests_normalizer
  - id: t2-browser-form-fix
    tier: tier2
    family: browser
    capabilities: [browser_debugging, bugfix]
    path: tier2/t2-browser-form-fix.yaml
    asset_pack: t2_browser_form_fix
  - id: t2-config-loader
    tier: tier2
    family: repo
    capabilities: [bugfix, multifile_reasoning]
    path: tier2/t2-config-loader.yaml
    asset_pack: t2_config_loader
  - id: t2-fs-find-that-thing
    tier: tier2
    family: tools
    capabilities: [structured_output]
    path: tier2/t2-fs-find-that-thing.yaml
    asset_pack: t2_fs_find_that_thing
  - id: t2-msg-summarize-thread
    tier: tier2
    family: tools
    capabilities: [research_synthesis, structured_output]
    path: tier2/t2-msg-summarize-thread.yaml
    asset_pack: t2_msg_summarize_thread
  - id: t2-priv-redact-doc
    tier: tier2
    family: tools
    capabilities: [structured_output, graceful_refusal]
    path: tier2/t2-priv-redact-doc.yaml
    asset_pack: t2_priv_redact_doc
  - id: t3-data-pipeline-report
    tier: tier3
    family: multi_tool
    capabilities: [structured_output, multifile_reasoning]
    path: tier3/t3-data-pipeline-report.yaml
    asset_pack: t3_data_pipeline_report
  - id: t3-data-sql-query
    tier: tier3
    family: tools
    capabilities: [structured_output]
    path: tier3/t3-data-sql-query.yaml
    asset_pack: t3_data_sql_query
  - id: t3-feature-export
    tier: tier3
    family: repo
    capabilities: [multifile_reasoning, structured_output]
    path: tier3/t3-feature-export.yaml
    asset_pack: t3_feature_export
  - id: t3-msg-inbox-triage
    tier: tier3
    family: tools
    capabilities: [structured_output, multifile_reasoning]
    path: tier3/t3-msg-inbox-triage.yaml
    asset_pack: t3_msg_inbox_triage
  - id: t3-web-research-and-cite
    tier: tier3
    family: tools
    capabilities: [research_synthesis]
    path: tier3/t3-web-research-and-cite.yaml
    asset_pack: t3_web_research_and_cite
  - id: t4-browser-research-and-code
    tier: tier4
    family: browser
    capabilities: [browser_debugging, research_synthesis]
    path: tier4/t4-browser-research-and-code.yaml
    asset_pack: t4_browser_research_and_code
  - id: t4-cross-repo-migration
    tier: tier4
    family: repo
    capabilities: [cross_repo_change, multifile_reasoning]
    path: tier4/t4-cross-repo-migration.yaml
    asset_pack: t4_cross_repo_migration
  - id: t4-delegation-repair
    tier: tier4
    family: multi_tool
    capabilities: [delegation, bugfix]
    path: tier4/t4-delegation-repair.yaml
    asset_pack: t4_delegation_repair
  - id: t4-life-trip-plan
    tier: tier4
    family: tools
    capabilities: [research_synthesis, structured_output]
    path: tier4/t4-life-trip-plan.yaml
    asset_pack: t4_life_trip_plan
  - id: t4-memory-recall-continuation
    tier: tier4
    family: multi_tool
    capabilities: [memory_continuation, multifile_reasoning]
    path: tier4/t4-memory-recall-continuation.yaml
    asset_pack: t4_memory_recall_continuation
  - id: t5-hallucination-resistant-evidence
    tier: tier5
    family: adversarial
    capabilities: [research_synthesis, tool_composition]
    path: tier5/t5-hallucination-resistant-evidence.yaml
    asset_pack: t5_hallucination_resistant_evidence
notes: |
  - The full private dev set (tasks/) contains 40 tasks. This Core-19
    subset is the signal-rich, ranking-consistent public release.
  - The remaining 21 tasks are retained as a private holdout for
    contamination-resistant measurement of future models.
  - Task families "creative" and "long-horizon (Tier 6)" are absent
    from Core v1; they are planned for a future release.
  - Known caveats: t4-memory-recall-continuation has a verifier that
    penalizes agents that respond in conversation rather than via file
    artifacts. All models face the same verifier, so the comparison is
    internally fair, but absolute scores understate capability.
  - t5-hallucination-resistant-evidence has low cross-model SNR (about
    0.25) in v4-19-full; it is included for adversarial-family coverage
    despite this. Consider upgrading its verifier in a future release.

tasks-public/README.md

@ -0,0 +1,132 @@
# ClawBench Core v1 — Public Task Set (19 tasks)

A curated 19-task subset of the full ClawBench v0.4.0.dev1 dev pool,
selected for ranking consistency and capability coverage.

## What this is

19 tasks, 3 runs each → 57 runs per model. About half the compute of
the full 40-task sweep, with no loss of discriminative power on the
measured 8-model panel.

Derived from the v2026-4-19-full sweep archive by greedy task
selection: iteratively drop tasks that either (a) introduce ranking
inversions vs the reference ordering or (b) have near-zero cross-model
SNR and add only noise.
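
The greedy pass described above can be sketched roughly as follows. This is an illustrative reconstruction, not the script used for this release: `greedy_eliminate`, `count_inversions`, `mean_scores` and the `per_task[task][model]` score layout are hypothetical, and only criterion (a), ranking inversions, is modeled. The SNR filter (b) is left out for brevity.

```python
from itertools import combinations

# Hypothetical layout: per_task[task_id][model] -> mean run_score on that task.
Scores = dict[str, dict[str, float]]


def mean_scores(per_task: Scores, tasks: set[str]) -> dict[str, float]:
    """Average each model's score over the given subset of tasks."""
    models = next(iter(per_task.values())).keys()
    return {m: sum(per_task[t][m] for t in tasks) / len(tasks) for m in models}


def count_inversions(scores: dict[str, float], reference: list[str]) -> int:
    """Count model pairs whose order disagrees with the reference ranking."""
    # combinations() keeps list order, so `hi` is ranked above `lo` in reference.
    return sum(1 for hi, lo in combinations(reference, 2) if scores[hi] <= scores[lo])


def greedy_eliminate(per_task: Scores, reference: list[str], min_tasks: int) -> set[str]:
    """Repeatedly drop the single task whose removal most reduces inversions."""
    min_tasks = max(min_tasks, 1)  # never average over an empty task set
    tasks = set(per_task)
    while len(tasks) > min_tasks:
        current = count_inversions(mean_scores(per_task, tasks), reference)
        if current == 0:
            break
        # Score every one-task removal; sorted() makes tie-breaking deterministic.
        candidates = {
            t: count_inversions(mean_scores(per_task, tasks - {t}), reference)
            for t in sorted(tasks)
        }
        best = min(candidates, key=candidates.get)
        if candidates[best] >= current:
            break  # no single removal helps, so stop instead of dropping blindly
        tasks.remove(best)
    return tasks
```

Stopping as soon as inversions reach zero keeps as many tasks as possible, which matters for coverage; a real selection pass would also need the coverage constraints that this sketch ignores.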
## Established ranking (from v4-19-full sweep)

Mean run_score across the 19 tasks:

| Rank | Model | Score |
|:---:|---|:---:|
| 1 | Claude Opus 4.6 | 0.8137 |
| 2 | Claude Opus 4.7 | 0.7824 |
| 3 | GPT 5.4 | 0.7647 |
| 4 | Claude Sonnet 4.6 | 0.7597 |
| 5 | MiniMax M2.7 | 0.7475 |
| 6 | Gemini 3.1 Pro | 0.7408 |
| 7 | Qwen 3.6 Plus | 0.7030 |
| 8 | Kimi K2.5 | 0.6800 |

- **0 ranking inversions** on the 19-task mean.
- **Min adjacent-rank gap: 0.0049** (well above the ~0.002 seed-noise
  floor estimated from inter-run variance).
- **Top-to-bottom spread: 0.134** (vs 0.097 for smaller robust sets).
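
The three headline statistics can be recomputed from the table. This is a minimal sketch: `RANKING` and `ranking_stats` are illustrative names, and the scores are the rounded 4-decimal values shown above. With rounded inputs, the minimum adjacent gap comes out as 0.0050 (GPT 5.4 vs Claude Sonnet 4.6). The reported 0.0049 presumably reflects the unrounded means.

```python
from itertools import combinations

# (display name, 19-task mean run_score) copied from the table above, best first.
RANKING = [
    ("Claude Opus 4.6", 0.8137),
    ("Claude Opus 4.7", 0.7824),
    ("GPT 5.4", 0.7647),
    ("Claude Sonnet 4.6", 0.7597),
    ("MiniMax M2.7", 0.7475),
    ("Gemini 3.1 Pro", 0.7408),
    ("Qwen 3.6 Plus", 0.7030),
    ("Kimi K2.5", 0.6800),
]


def ranking_stats(ranking: list[tuple[str, float]]) -> tuple[int, float, float]:
    """Return (inversions, min adjacent-rank gap, top-to-bottom spread)."""
    scores = [score for _, score in ranking]
    # An inversion is any pair where the higher-ranked model does not score strictly higher.
    inversions = sum(1 for hi, lo in combinations(scores, 2) if hi <= lo)
    min_gap = min(hi - lo for hi, lo in zip(scores, scores[1:]))
    spread = scores[0] - scores[-1]
    return inversions, min_gap, spread


inversions, min_gap, spread = ranking_stats(RANKING)
print(f"inversions={inversions} min_gap={min_gap:.4f} spread={spread:.4f}")
```

This prints `inversions=0 min_gap=0.0050 spread=0.1337`, and the spread rounds to the 0.134 quoted above.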
## Coverage

| Dimension | Breakdown |
|---|---|
| Tiers | T1=2, T2=6, T3=5, T4=5, T5=1 |
| Families | tools=8, coding=2, repo=3, browser=2, multi_tool=3, adversarial=1 |
| Capabilities | bugfix, test_authoring, multifile_reasoning, browser_debugging, structured_output, graceful_refusal, delegation, tool_composition, research_synthesis, cross_repo_change, memory_continuation |
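
Coverage counts can be re-derived from the manifest instead of being maintained by hand. This is a minimal sketch. The (id, tier, family) triples are transcribed from the `tasks:` entries in MANIFEST.yaml, and `TASKS` is an illustrative name. A real check would load the YAML file instead.

```python
from collections import Counter

# (task id, tier, family) transcribed from the MANIFEST.yaml task entries.
TASKS = [
    ("t1-bugfix-discount", "tier1", "coding"),
    ("t1-fs-quick-note", "tier1", "tools"),
    ("t2-add-tests-normalizer", "tier2", "coding"),
    ("t2-browser-form-fix", "tier2", "browser"),
    ("t2-config-loader", "tier2", "repo"),
    ("t2-fs-find-that-thing", "tier2", "tools"),
    ("t2-msg-summarize-thread", "tier2", "tools"),
    ("t2-priv-redact-doc", "tier2", "tools"),
    ("t3-data-pipeline-report", "tier3", "multi_tool"),
    ("t3-data-sql-query", "tier3", "tools"),
    ("t3-feature-export", "tier3", "repo"),
    ("t3-msg-inbox-triage", "tier3", "tools"),
    ("t3-web-research-and-cite", "tier3", "tools"),
    ("t4-browser-research-and-code", "tier4", "browser"),
    ("t4-cross-repo-migration", "tier4", "repo"),
    ("t4-delegation-repair", "tier4", "multi_tool"),
    ("t4-life-trip-plan", "tier4", "tools"),
    ("t4-memory-recall-continuation", "tier4", "multi_tool"),
    ("t5-hallucination-resistant-evidence", "tier5", "adversarial"),
]

# Each task contributes exactly one tier and one family.
tier_counts = Counter(tier for _, tier, _ in TASKS)
family_counts = Counter(family for _, _, family in TASKS)
print(dict(sorted(tier_counts.items())))
print(dict(sorted(family_counts.items())))
```

For the task list as published, this gives T1=2, T2=6, T3=5, T4=5, T5=1 and tools=8, coding=2, repo=3, browser=2, multi_tool=3, adversarial=1.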
## Directory layout

```
tasks-public/
├── MANIFEST.yaml   # Machine-readable task list + metadata
├── README.md       # This file
├── tier1/          # 2 task YAMLs
├── tier2/          # 6 task YAMLs
├── tier3/          # 5 task YAMLs
├── tier4/          # 5 task YAMLs
├── tier5/          # 1 task YAML
└── assets/         # 19 asset packs (verifier scripts + fixtures)
```
## How to run Core v1

Using the ClawBench harness:

```bash
# Explicit task-by-task (pass -t for each of 19 tasks):
clawbench run \
  --model anthropic/claude-opus-4-6 \
  --runs 3 \
  --concurrency 4 \
  --profile profiles/frontier_opus_4_6.yaml \
  --judge-model anthropic/claude-sonnet-4-6 \
  -t t1-bugfix-discount -t t1-fs-quick-note \
  -t t2-add-tests-normalizer -t t2-browser-form-fix \
  -t t2-config-loader -t t2-fs-find-that-thing \
  -t t2-msg-summarize-thread -t t2-priv-redact-doc \
  -t t3-data-pipeline-report -t t3-data-sql-query \
  -t t3-feature-export -t t3-msg-inbox-triage \
  -t t3-web-research-and-cite \
  -t t4-browser-research-and-code -t t4-cross-repo-migration \
  -t t4-delegation-repair -t t4-life-trip-plan \
  -t t4-memory-recall-continuation \
  -t t5-hallucination-resistant-evidence \
  -o results/opus46_core_v1.json
```

Or point the harness at this directory by setting the task root in
your ClawBench config. See MANIFEST.yaml for a programmatic list.
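
MANIFEST.yaml can also drive the command line, so the nineteen `-t` flags need not be typed by hand. This is a sketch under stated assumptions. `task_ids_from_manifest` and `build_run_argv` are hypothetical helpers, the id extraction is a line-based shortcut rather than a YAML parser, and only the flags from the example above are used.

```python
import re

# Matches list entries such as "  - id: t1-bugfix-discount". A line-based
# shortcut that avoids a YAML dependency; it does not parse YAML in general.
TASK_ID_RE = re.compile(r"^[ \t]*-[ \t]*id:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def task_ids_from_manifest(manifest_text: str) -> list[str]:
    """Return task ids in the order they appear in the manifest."""
    return TASK_ID_RE.findall(manifest_text)


def build_run_argv(
    task_ids: list[str],
    model: str,
    profile: str,
    output: str,
    runs: int = 3,
    concurrency: int = 4,
    judge_model: str = "anthropic/claude-sonnet-4-6",
) -> list[str]:
    """Assemble a `clawbench run` argv using only flags from the README example."""
    argv = [
        "clawbench", "run",
        "--model", model,
        "--runs", str(runs),
        "--concurrency", str(concurrency),
        "--profile", profile,
        "--judge-model", judge_model,
    ]
    for task_id in task_ids:
        argv += ["-t", task_id]  # one -t per task, as in the explicit command
    argv += ["-o", output]
    return argv
```

Pass the contents of `tasks-public/MANIFEST.yaml` to `task_ids_from_manifest`, then hand the resulting argv to `subprocess.run`, or render it with `shlex.join` for copy-pasting.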
## Reproducibility caveats

- **Exact score reproduction is not guaranteed.** Even with the same
  OpenClaw version, re-runs exhibit seed noise (~0.02 stddev per task,
  per model). Rankings are stable; absolute scores drift within that
  envelope.
- **OpenRouter-routed models** (`openrouter/*`) can shift in score if
  OpenRouter repoints a model slug to a different underlying provider.
  We observed this with GLM 5.1 between 2026-04-20 14:00 and 17:00 PDT.
  Pin to canonical model versions (e.g. `z-ai/glm-5-turbo-20260315`)
  for stable measurement.
- **OpenClaw platform version matters.** Upgrading from 4.9 → 4.15-beta.1
  shifted scores by +0.13 to +0.29 across models. Pin via Docker tag.
- **Judge scores** come from Claude Sonnet 4.6 via the direct Anthropic
  API (with a fallback from the gateway judge). Scores assume the
  judge is working correctly; broken runs may need to be re-judged
  (see `scripts/rejudge_all.py` in the main repo).
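
The ~0.02 per-task seed noise and the ~0.002 seed-noise floor quoted for the 19-task mean are linked by simple averaging. This back-of-envelope sketch makes two assumptions: that ~0.02 is the per-run standard deviation on every task, and that runs are independent. Neither is a quantity the harness is documented to report. `mean_score_noise` is an illustrative name.

```python
import math


def mean_score_noise(per_run_sd: float, n_tasks: int, runs_per_task: int) -> float:
    """Standard error of a model's overall mean score, assuming independent
    runs with the same per-run standard deviation on every task."""
    # The overall score weights all n_tasks * runs_per_task runs equally,
    # so its variance is per_run_sd**2 / (n_tasks * runs_per_task).
    return per_run_sd / math.sqrt(n_tasks * runs_per_task)


se = mean_score_noise(0.02, n_tasks=19, runs_per_task=3)
print(f"{se:.4f}")
```

This prints `0.0026`, the same order of magnitude as the ~0.002 floor estimated from inter-run variance.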
## What's NOT in Core v1

21 tasks from the full dev pool are held back:

- **9 ceiling tasks** (all frontier models score >0.85). They don't
  discriminate, and future releases may phase them out.
- **9 noise tasks** (cross-model SNR < 0.5): either broken verifiers
  or genuinely ambiguous prompts. Scheduled for redesign.
- **3 ranking-breaker tasks**: tasks where the cross-model ordering
  conflicts with the reference ranking (e.g. `t2-node-search-patch`,
  `t5-contradictory-requirements`). Not broken per se; just
  inconsistent with the headline.
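
This README does not say how cross-model SNR is computed. One plausible definition is sketched below purely as an assumption, for readers who want to screen their own tasks. It takes the spread of per-model mean scores on a task and divides it by the typical run-to-run spread within a model. `cross_model_snr` is a hypothetical helper, and its values are only comparable to the 0.5 cut-off above if the real definition matches.

```python
import math
from statistics import mean, pstdev


def cross_model_snr(runs_by_model: dict[str, list[float]]) -> float:
    """Hypothetical per-task SNR: between-model spread / within-model spread.

    runs_by_model maps a model id to its run_scores on a single task.
    """
    model_means = [mean(runs) for runs in runs_by_model.values()]
    signal = pstdev(model_means)  # how far apart the models sit on this task
    noise = mean(pstdev(runs) for runs in runs_by_model.values())  # seed noise
    if noise == 0:
        # Perfectly repeatable runs: any separation counts as infinite SNR.
        return math.inf if signal > 0 else 0.0
    return signal / noise
```

Under this definition, a task scores low SNR if every model lands on about the same score, or if a model's own runs scatter as widely as the gaps between models.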
Also missing entirely from Core v1:

- **Tier 6 long-horizon (100+ turn) tasks**: planned for v2.
- **Creative synthesis / style-matching tasks**: planned for v2.
- **Paraphrased prompt pairs** for perturbation-sensitivity
  measurement: planned for v2.

## Versioning

| Version | Tasks | Change |
|:---:|:---:|---|
| Core v1 | 19 | Initial public release (this) |
| Core v2 | ~24 | Planned: +Tier 6, +paraphrase pairs, -2 noise tasks |

Pin to `release: clawbench-core-v1` in MANIFEST.yaml for reproducible
comparison across releases.

@ -0,0 +1,6 @@
from pricing import apply_discount


def checkout_total(subtotal: int, discount_percent: int) -> int:
    return apply_discount(subtotal, discount_percent)

@ -0,0 +1,4 @@
def apply_discount(subtotal_cents: int, discount_percent: int) -> int:
    # BUG: this subtracts the raw percent value instead of a percentage of the subtotal.
    return subtotal_cents - discount_percent

@ -0,0 +1,10 @@
from cart import checkout_total


def test_percentage_discount_applies_to_full_subtotal():
    assert checkout_total(2_000, 10) == 1_800


def test_zero_discount_keeps_subtotal():
    assert checkout_total(1_250, 0) == 1_250

@ -0,0 +1,57 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


import re

LIST_PATTERNS = [
    re.compile(r"^\s*[-*+]\s+"),
    re.compile(r"^\s*\d+[.)]\s+"),
    re.compile(r"^\s*\[[ x]\]\s+"),
]


def main() -> int:
    for path, text in iter_workspace_text_files():
        if any(t in text.lower() for t in ("dry clean", "sam", "babysit", "60")):
            list_lines = sum(1 for line in text.splitlines() if any(p.match(line) for p in LIST_PATTERNS))
            if list_lines >= 3:
                print(f"PASS: list-formatted note found at {path} ({list_lines} list lines)")
                return 0
    print("FAIL: no list-structured note found anywhere in workspace")
    return 1


if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,56 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    needed = ['dry clean', 'sam', 'babysit']
    if not all(s in blob for s in needed):
        missing = [s for s in needed if s not in blob]
        print(f"FAIL: workspace missing required content: {missing}")
        return 1
    needed = ['60']
    if not all(s in blob for s in needed):
        missing = [s for s in needed if s not in blob]
        print(f"FAIL: workspace missing required content: {missing}")
        return 1
    print("PASS: t1_fs_quick_note/verify_three_items.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,14 @@
import re

EMOJI_RE = re.compile(r"[\U0001F300-\U0001FAFF]")


def normalize_title(text: str) -> str:
    cleaned = " ".join(text.split())
    cleaned = EMOJI_RE.sub("", cleaned)
    return cleaned.strip().title()


def normalize_tags(raw: str) -> list[str]:
    return [part.strip().lower() for part in raw.split(",") if part.strip()]

@ -0,0 +1,74 @@
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

BUGGY_EMOJI = """import re

EMOJI_RE = re.compile(r"[\\U0001F300-\\U0001FAFF]")


def normalize_title(text: str) -> str:
    cleaned = " ".join(text.split())
    return cleaned.strip().title()


def normalize_tags(raw: str) -> list[str]:
    return [part.strip().lower() for part in raw.split(",") if part.strip()]
"""

BUGGY_TAGS = """import re

EMOJI_RE = re.compile(r"[\\U0001F300-\\U0001FAFF]")


def normalize_title(text: str) -> str:
    cleaned = " ".join(text.split())
    cleaned = EMOJI_RE.sub("", cleaned)
    return cleaned.strip().title()


def normalize_tags(raw: str) -> list[str]:
    return [part.strip().lower() for part in raw.split(",")]
"""


def _run_pytest(*args: str) -> subprocess.CompletedProcess[str]:
    return subprocess.run(
        [sys.executable, "-m", "pytest", "-q", *args],
        check=False,
        capture_output=True,
        text=True,
    )


def _expect_mutant_failure(normalizer_path: Path, mutant_source: str, label: str) -> None:
    backup = normalizer_path.read_text(encoding="utf-8")
    normalizer_path.write_text(mutant_source, encoding="utf-8")
    try:
        result = _run_pytest("tests/test_normalizer.py")
        assert result.returncode != 0, f"student tests did not catch mutant: {label}"
    finally:
        normalizer_path.write_text(backup, encoding="utf-8")


def main() -> None:
    test_path = Path("tests/test_normalizer.py")
    assert test_path.exists(), "tests/test_normalizer.py is missing"
    baseline = _run_pytest()
    assert baseline.returncode == 0, baseline.stdout + baseline.stderr
    normalizer_path = Path("normalizer.py")
    _expect_mutant_failure(normalizer_path, BUGGY_EMOJI, "emoji stripping")
    _expect_mutant_failure(normalizer_path, BUGGY_TAGS, "blank tag handling")
    source = test_path.read_text(encoding="utf-8").lower()
    assert "normalize_title" in source
    assert "normalize_tags" in source


if __name__ == "__main__":
    main()

@ -0,0 +1,16 @@
const form = document.getElementById("contact-formm");
const emailInput = document.getElementById("email");
const statusNode = document.getElementById("status");

if (form) {
  form.addEventListener("submit", (event) => {
    event.preventDefault();
    const email = emailInput.value.trim();
    if (!email.includes("@")) {
      statusNode.textContent = "Enter a valid email.";
      return;
    }
    statusNode.textContent = `Saved ${email}`;
  });
}

@ -0,0 +1,20 @@
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Newsletter Signup</title>
    <script defer src="app.js"></script>
  </head>
  <body>
    <main>
      <h1>Join the Newsletter</h1>
      <form id="contact-form">
        <label for="email">Email</label>
        <input id="email" name="email" type="email" />
        <button id="submit-button" type="submit">Sign up</button>
      </form>
      <p id="status" aria-live="polite"></p>
    </main>
  </body>
</html>

@ -0,0 +1,21 @@
from __future__ import annotations

import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class Handler(SimpleHTTPRequestHandler):
    def do_GET(self) -> None:  # noqa: N802
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
            return
        return super().do_GET()


if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8123"))
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    server.serve_forever()

@ -0,0 +1,23 @@
const { chromium } = require("playwright");

async function main() {
  const url = process.argv[2];
  const browser = await chromium.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: "networkidle" });
  await page.fill("#email", "reader@example.com");
  await page.click("#submit-button");
  await page.waitForFunction(() => document.querySelector("#status").textContent.includes("Saved"), null, {
    timeout: 3000,
  });
  const status = await page.textContent("#status");
  await browser.close();
  if (status.trim() !== "Saved reader@example.com") {
    throw new Error(`Unexpected status: ${status}`);
  }
}

main().catch((error) => {
  console.error(error.message || String(error));
  process.exit(1);
});

@ -0,0 +1,6 @@
DEFAULTS = {
    "host": "127.0.0.1",
    "port": 8080,
    "debug": False,
}

@ -0,0 +1,20 @@
from __future__ import annotations

import json
import os
from pathlib import Path

from app_config import DEFAULTS


def load_config(path: str | None = None) -> dict[str, object]:
    config = dict(DEFAULTS)
    if path:
        config.update(json.loads(Path(path).read_text(encoding="utf-8")))
    # BUG: file values incorrectly win over environment overrides.
    if "APP_PORT" in os.environ and path:
        config["port"] = json.loads(Path(path).read_text(encoding="utf-8")).get("port", DEFAULTS["port"])
    if "APP_DEBUG" in os.environ:
        config["debug"] = os.environ["APP_DEBUG"]
    return config

@ -0,0 +1,20 @@
from __future__ import annotations

import json

from config_loader import load_config


def test_env_port_overrides_file(tmp_path, monkeypatch):
    config_path = tmp_path / "config.json"
    config_path.write_text(json.dumps({"port": 9000, "debug": False}), encoding="utf-8")
    monkeypatch.setenv("APP_PORT", "9200")
    cfg = load_config(str(config_path))
    assert cfg["port"] == 9200


def test_debug_flag_is_boolean(monkeypatch):
    monkeypatch.setenv("APP_DEBUG", "true")
    cfg = load_config(None)
    assert cfg["debug"] is True

@ -0,0 +1 @@
q3_marketing_budget_v3.xlsx

@ -0,0 +1 @@
filler 1

@ -0,0 +1 @@
filler 10

@ -0,0 +1 @@
filler 11

@ -0,0 +1 @@
filler 12

@ -0,0 +1 @@
filler 13

@ -0,0 +1 @@
filler 14

@ -0,0 +1 @@
filler 15

@ -0,0 +1 @@
filler 16

@ -0,0 +1 @@
filler 17

@ -0,0 +1 @@
filler 18

@ -0,0 +1 @@
filler 19

@ -0,0 +1 @@
filler 2

@ -0,0 +1 @@
filler 20

@ -0,0 +1 @@
filler 21

@ -0,0 +1 @@
filler 22

@ -0,0 +1 @@
filler 23

@ -0,0 +1 @@
filler 24

@ -0,0 +1 @@
filler 25

@ -0,0 +1 @@
filler 3

@ -0,0 +1 @@
filler 4

@ -0,0 +1 @@
filler 5

@ -0,0 +1 @@
filler 6

@ -0,0 +1 @@
filler 7

@ -0,0 +1 @@
filler 8

@ -0,0 +1 @@
filler 9

@ -0,0 +1,4 @@
SHEET: Q2 Marketing Budget
Region,Q2 Spend
NorthAmerica,380000
TOTAL,820000

@ -0,0 +1,8 @@
SHEET: Regional Breakdown
Q3 Marketing Budget by Region
Region,Q3 Spend,Notes
NorthAmerica,420000,Display + paid social
EMEA,310000,Conference sponsorships
APAC,180000,Influencer pilot
LATAM,90000,Brand awareness
TOTAL,1000000

@ -0,0 +1,4 @@
SHEET: Q3 Sales Numbers
Region,Q3 Revenue
NorthAmerica,2400000
TOTAL,5800000

@ -0,0 +1 @@
filler 1

@ -0,0 +1 @@
filler 10

@ -0,0 +1 @@
filler 2

@ -0,0 +1 @@
filler 3

@ -0,0 +1 @@
filler 4

@ -0,0 +1 @@
filler 5

@ -0,0 +1 @@
filler 6

@ -0,0 +1 @@
filler 7

@ -0,0 +1 @@
filler 8

@ -0,0 +1 @@
filler 9

@ -0,0 +1,76 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    # The agent must surface the Q3 marketing budget content. The desktop
    # copy is the explicit target, but accept any file the agent created
    # that contains the right content (Q3 marketing + region breakdowns).
    target_substrings = ["q3", "region"]
    decoy_q2 = ["q2 marketing", "q2 spend"]
    decoy_sales = ["q3 revenue", "q3 sales"]
    found_path = None
    for path, text in iter_workspace_text_files():
        # Skip the original asset-pack files (we want files the agent
        # *placed* somewhere — typically a desktop/copy or report)
        if "/Documents/" in str(path) and "v3" in path.name:
            continue
        text_lower = text.lower()
        if all(s in text_lower for s in target_substrings) and "marketing" in text_lower:
            # Reject decoys
            if any(d in text_lower for d in decoy_q2):
                continue
            if any(d in text_lower for d in decoy_sales):
                continue
            found_path = path
            break
    # Also accept agent text output (e.g. answer.md) that just NAMES the
    # right file
    if found_path is None:
        for path, text in iter_workspace_text_files():
            if "q3_marketing_budget_v3" in text.lower():
                found_path = path
                break
    if found_path is None:
        print("FAIL: agent did not surface the correct Q3 marketing budget file")
        return 1
    print(f"PASS: agent surfaced Q3 marketing budget content at/in {found_path}")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,29 @@
Channel: #design-redesign
Date range: 2026-04-05 to 2026-04-08
[Apr 5 09:14] Marcus: Quick proposal — for the homepage refresh, let's go with option A (single hero image, no carousel). Carousels test poorly.
[Apr 5 09:18] Priya: I'm fine with A. Anything but the auto-rotating mess we have today.
[Apr 5 09:22] Sam: Agree on A. Carousels are a UX antipattern.
[Apr 5 09:30] Marcus: Cool, let's call it. Option A it is. I'll spec it out.
[Apr 5 10:01] Priya: For typography, can we move to Inter? Easier reading and we already license it.
[Apr 5 10:15] Sam: +1 Inter
[Apr 5 11:42] Marcus: Inter approved. I'll add it to the spec.
[Apr 6 08:55] Priya: Wait, on the homepage hero — I'm second-guessing this. What if we did option B (two-column with icon row) instead? It gives more above-the-fold info.
[Apr 6 09:20] Marcus: Fair point. Let me think.
[Apr 6 10:30] Sam: I prefer B too actually. More info density.
[Apr 6 13:15] Marcus: OK I'm convinced. Switching to option B. Scratch yesterday's call. Final answer: B.
[Apr 6 14:00] Sam: Great. So B for hero, Inter for type.
[Apr 6 16:10] Priya: For the CTA button color, sticking with our brand orange right? #FF6B35.
[Apr 6 16:14] Marcus: Yes brand orange. Don't touch the brand colors.
[Apr 7 09:00] zhentongfan: Catching up on this thread — sounds like option B is locked in. I can take the spec writeup if Marcus is busy.
[Apr 7 09:05] Marcus: Thanks zhentongfan, that'd be great. I owe you one.
[Apr 7 09:30] zhentongfan: I'll have a draft by end of day Friday.
[Apr 7 11:20] Priya: Open question — what happens to the testimonial section? Option B doesn't have a slot for it.
[Apr 7 11:25] Sam: Good catch. Move it below the fold? Or kill it?
[Apr 7 11:30] Priya: I'd vote move below the fold, not kill. Sales team will riot if we kill testimonials.
[Apr 7 14:40] Marcus: Let's keep testimonials, just below the fold. Not killing them.
[Apr 7 15:00] Sam: Open question still — what's the mobile breakpoint going to be?
[Apr 7 15:30] Marcus: Open question for now. Let's defer to next sprint.
[Apr 8 10:15] Priya: One more — favicon update? The current one is from 2019.
[Apr 8 10:20] Sam: Lol yes please. Open item.
[Apr 8 11:00] Marcus: Adding favicon to the followup list. Open question: who owns the asset.

@ -0,0 +1,54 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    any_of = ['spec', 'writeup', 'write-up']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    any_of = ['friday', 'you ', 'your ']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t2_msg_summarize_thread/verify_commitments.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,50 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    any_of = ['option b', 'two-column', 'two column']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t2_msg_summarize_thread/verify_latest_decision.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@ -0,0 +1,55 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    needed = ['decision']
    if not all(s in blob for s in needed):
        missing = [s for s in needed if s not in blob]
        print(f"FAIL: workspace missing required content: {missing}")
        return 1
    any_of = ['open', 'still', 'outstanding']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t2_msg_summarize_thread/verify_summary_structure.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

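Every recursive verifier in this set shares the same file filter in `iter_workspace_text_files`: a path is skipped if its string form contains any entry of `EXCLUDE_FRAGMENTS` (plain substring match), and is otherwise kept only if its lowercased suffix is in `TEXT_SUFFIXES`. The sketch below isolates those two checks as a pure function so they can be exercised without a filesystem. `is_scanned` is a hypothetical name, it omits the `is_file()` check, and it assumes POSIX `/` separators because the fragments are written that way.

```python
from pathlib import PurePosixPath

# Copied from the verifiers: substring fragments and allowed suffixes.
EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)
TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def is_scanned(rel_path: str) -> bool:
    """Hypothetical helper mirroring the fragment and suffix checks (not is_file)."""
    path = PurePosixPath(rel_path)
    sp = str(path)
    # Plain substring match against the whole relative path string.
    if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
        return False
    # Suffix comparison is case-insensitive.
    return path.suffix.lower() in TEXT_SUFFIXES


for p in ["summary.md", "notes/REPORT.TXT", "verify_summary_structure.py",
          "AGENTS.md", "sub/.git/config.json", ".git/config.json", "image.png"]:
    print(p, is_scanned(p))
```

Under these assumptions, `sub/.git/config.json` is excluded but `.git/config.json` is not: paths yielded by `Path(".").rglob("*")` have no leading `./`, so a top-level entry never contains the `/.git/` fragment. The same holds for `/.openclaw/`.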
@@ -0,0 +1,25 @@
Service Agreement
This agreement is between Lin Park ("Client") and the Vendor.
Client contact:
Name: Lin Park
Email: lin.park@personalmail.example
Phone: +1 (415) 555-0173
Account number: AC-77821-PK
Scope of work:
- Implement the data ingestion pipeline described in Appendix A.
- Deliver weekly progress reports.
- Handover by Q3 2026.
Pricing:
Base fee: $48,000 (US dollars)
Optional extension: up to $12,000 additional, billed monthly.
Termination:
Either party may terminate with 30 days written notice.
Signed,
Lin Park (Client)
April 9, 2026

@@ -0,0 +1,68 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


PII = [
    "Lin Park", "lin.park@personalmail.example",
    "+1 (415) 555-0173", "415 555-0173", "415-555-0173",
    "AC-77821-PK",
]
PRESERVED_KEYWORDS = ["service agreement", "scope of work", "termination"]


def main() -> int:
    # Find a redacted version of the contract somewhere
    for path, text in iter_workspace_text_files():
        if "contract" not in path.name.lower() or "redact" not in path.name.lower():
            continue
        leaked = [v for v in PII if v in text]
        if leaked:
            print(f"FAIL: redacted file {path} still contains PII: {leaked}")
            return 1
        text_lower = text.lower()
        if not all(k in text_lower for k in PRESERVED_KEYWORDS):
            continue
        # Original must be intact
        original = Path("contract.txt")
        if original.exists() and "Lin Park" not in original.read_text(encoding="utf-8"):
            print("FAIL: original contract.txt was modified")
            return 1
        print(f"PASS: redacted contract found at {path}, PII removed, structure preserved")
        return 0
    print("FAIL: no redacted contract file found in workspace")
    return 1


if __name__ == "__main__":
    sys.exit(main())

@@ -0,0 +1,4 @@
East: 150
North: 50
West: 80

@@ -0,0 +1,2 @@
{"east": "East", "west": "West", "north": "North"}

@@ -0,0 +1,6 @@
region,amount
east,120
west,80
east,30
north,50

@@ -0,0 +1,29 @@
from __future__ import annotations

import csv
import json
import sys


def load_sales(path: str) -> list[dict[str, str]]:
    with open(path, encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_regions(path: str) -> dict[str, str]:
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_report(sales_rows: list[dict[str, str]], region_map: dict[str, str]) -> str:
    # TODO: aggregate all rows by region and include totals.
    first = sales_rows[0]
    region_name = region_map[first["region"]]
    return f"{region_name}: {first['amount']}"


if __name__ == "__main__":
    sales = load_sales(sys.argv[1])
    regions = load_regions(sys.argv[2])
    print(build_report(sales, regions))

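The `TODO` in `build_report` asks for all rows to be aggregated by region with totals. The four-line fixture above listing `East: 150`, `North: 50`, `West: 80` appears to be the expected output, which is consistent with summing `amount` per region code, mapping codes through the regions JSON, and sorting by display name. A minimal sketch under those assumptions (not the task's reference solution; `aggregate_report` is a hypothetical name, and integer amounts are assumed from the fixture CSV):

```python
from __future__ import annotations

from collections import defaultdict


def aggregate_report(sales_rows: list[dict[str, str]], region_map: dict[str, str]) -> str:
    """Hypothetical helper: one "Name: total" line per region, sorted by name."""
    totals: dict[str, int] = defaultdict(int)
    for row in sales_rows:
        # Amounts in the fixture CSV are whole numbers, so int() is assumed safe.
        totals[row["region"]] += int(row["amount"])
    # Translate region codes to display names, then sort by display name.
    named = sorted((region_map[code], total) for code, total in totals.items())
    return "\n".join(f"{name}: {total}" for name, total in named)


sales = [
    {"region": "east", "amount": "120"},
    {"region": "west", "amount": "80"},
    {"region": "east", "amount": "30"},
    {"region": "north", "amount": "50"},
]
regions = {"east": "East", "west": "West", "north": "North"}
print(aggregate_report(sales, regions))
```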
Binary file not shown.

@@ -0,0 +1,68 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import csv
import io
import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    # Find a CSV-shaped file with the EU 2026 active signups data
    for path, text in iter_workspace_text_files():
        if path.suffix.lower() != ".csv":
            continue
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue
        first_is_header = not any(any(c.isdigit() for c in cell) for cell in rows[0])
        data_rows = rows[1:] if first_is_header else rows
        if len(data_rows) != 7:
            continue
        blob = " ".join(c for r in data_rows for c in r).lower()
        if "old" in blob and ("do not use" in blob or "deprecated" in blob):
            continue
        expected = ["organic", "paid social", "email newsletter", "referral partner"]
        if sum(1 for c in expected if c in blob) >= 2:
            print(f"PASS: 7 rows + correct channels in {path}")
            return 0
    # Also accept any text file with the right content shape
    blob = workspace_blob().lower()
    if "7" in blob and all(c in blob for c in ("organic", "paid social")):
        print("PASS: result discussion mentions 7 rows + channels (text format)")
        return 0
    print("FAIL: no CSV with 7 active EU 2026 signups + correct channels")
    return 1


if __name__ == "__main__":
    sys.exit(main())

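The signups verifier decides whether the first CSV row is a header by checking that none of its cells contains a digit, and only then counts the remaining rows against the expected seven. A small sketch of that heuristic in isolation (`split_header` is a hypothetical name) shows how it treats a typical header and a header-less file; note that a header cell containing a digit, such as the made-up `signup_2026`, would make the whole first row count as data.

```python
from __future__ import annotations

import csv
import io


def split_header(text: str) -> tuple[list[str] | None, list[list[str]]]:
    """Hypothetical helper: a first row with no digit in any cell is treated as a header."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return None, []
    first_is_header = not any(any(c.isdigit() for c in cell) for cell in rows[0])
    if first_is_header:
        return rows[0], rows[1:]
    return None, rows


header, data = split_header("email,channel,signup_date\na@x.example,organic,2026-01-04\n")
print(header, len(data))
header, data = split_header("signup_2026,channel\nb@x.example,referral partner\n")
print(header, len(data))
```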
@@ -0,0 +1,23 @@
from __future__ import annotations

import argparse

from exporters import export_csv, export_json
from issues import ISSUES


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("command", choices=["export"])
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    args = parser.parse_args()
    if args.format == "json":
        print(export_json(ISSUES))
        return
    print(export_csv(ISSUES))


if __name__ == "__main__":
    main()

@@ -0,0 +1,4 @@
id,title,status
101,Fix login loop,open
102,Improve metrics panel,closed

@@ -0,0 +1,10 @@
import json


def export_json(issues: list[dict[str, object]]) -> str:
    return json.dumps(issues, sort_keys=True)


def export_csv(issues: list[dict[str, object]]) -> str:
    raise NotImplementedError("csv export is not implemented yet")

@@ -0,0 +1,5 @@
ISSUES = [
    {"id": 101, "title": "Fix login loop", "status": "open"},
    {"id": 102, "title": "Improve metrics panel", "status": "closed"},
]

@@ -0,0 +1,11 @@
from exporters import export_csv
from issues import ISSUES


def test_csv_export_has_header_and_rows():
    assert export_csv(ISSUES) == (
        "id,title,status\n"
        "101,Fix login loop,open\n"
        "102,Improve metrics panel,closed\n"
    )

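`export_csv` is a deliberate stub, and the test pins its exact output: a header row followed by one row per issue, each ending in `\n`. One standard-library way to produce that is `csv.DictWriter` over an `io.StringIO` with `lineterminator="\n"`; the writer's default terminator is `\r\n`, which would fail the string comparison. A sketch, with `export_csv_sketch` as a hypothetical name and the column order taken from the test's expected header:

```python
import csv
import io


def export_csv_sketch(issues: list[dict[str, object]]) -> str:
    """Hypothetical helper: header plus one row per issue, '\\n' line endings."""
    buf = io.StringIO()
    # Column order assumed from the expected header in the test.
    writer = csv.DictWriter(buf, fieldnames=["id", "title", "status"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(issues)
    return buf.getvalue()


print(export_csv_sketch([{"id": 101, "title": "Fix login loop", "status": "open"}]), end="")
```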
@@ -0,0 +1,11 @@
From: priya.shah@acme-customer.com
Subject: URGENT: Production outage on our side - need help today
Hi,
Our production environment using your service has been down since 2pm.
Error logs are attached. Can someone from your team look at this today?
This is impacting our customer-facing checkout flow.
Priya
VP Engineering, Acme Corp

@@ -0,0 +1,11 @@
From: digest@techweekly.example
Subject: TechWeekly Digest #482 — AI infrastructure roundup
Top stories this week:
- New benchmarks released for vector databases
- Hardware vendors announce next-gen accelerators
- Open source models continue rapid progress
Click to read more.
To unsubscribe, click here.

@@ -0,0 +1,14 @@
From: security-alerts@anth-rop1c.com
Subject: URGENT: Your account will be suspended
Dear user,
We have detected unusual activity on your account. To prevent suspension,
please verify your credentials by clicking the link below within 24 hours:
http://anth-rop1c.com/verify?id=00fA92
If you do not verify, your account WILL be locked and all data will be
deleted.
Anthropic Security Team

@@ -0,0 +1,13 @@
From: marcus@yourteam.example
Subject: Weekly status — Q2 planning
Hey,
Quick recap of where we are:
- Auth migration on track for May 1
- Marketing site redesign starting next Monday
- Hiring: 3 candidates in final rounds
No action needed from you, just FYI.
Marcus

@@ -0,0 +1,6 @@
From: jamie@partner.example
Subject: thoughts?
hey can you let me know what you think when you have a minute. thanks!
Jamie

@@ -0,0 +1,12 @@
From: legal@yourcompany.example
Subject: Need your sign-off on the vendor MSA by EOD Wednesday
Hi,
We're closing the Innotech vendor agreement and need your sign-off on the
MSA before Wednesday end-of-day. The redlines are minor — mostly
indemnification language.
Please reply with either "approved" or specific changes.
Legal Team

@@ -0,0 +1,9 @@
From: events@localmeetup.example
Subject: AI engineering meetup next Thursday
Hi,
Reminder that our monthly meetup is happening next Thursday at 6pm.
Topic: production LLM observability. Free pizza.
RSVP at the link below if interested.

@@ -0,0 +1,13 @@
From: recruiter@bigcorp.example
Subject: Senior Engineering opportunity at BigCorp
Hi,
I came across your profile and think you'd be a great fit for a senior
engineering role we're filling. Compensation range is competitive, and
the team works on interesting problems.
Would you be open to a brief chat?
Best,
Tony

@@ -0,0 +1,10 @@
priority_signals:
  urgent_keywords: ["urgent", "outage", "production", "EOD", "end of day", "today"]
  trusted_domains: ["acme-customer.com", "yourcompany.example", "yourteam.example"]
  newsletter_domains: ["techweekly.example", "localmeetup.example"]
  always_low_priority_subjects: ["digest", "newsletter", "weekly status"]

draft_for_urgent: true
draft_for_low_priority: false
flag_phishing: true
do_not_reply_to_recruiter_cold_outreach: true

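The rules file lists keyword and domain signals but does not say how they combine. A hypothetical sketch of one way to apply them to a single message, with the `priority_signals` block inlined as a dict so no YAML parser is needed. The precedence used here (low-priority signals first, then "urgent" only when an urgent keyword comes from a trusted domain) is an assumption for illustration, not something the rules file specifies; phishing detection is left out.

```python
from __future__ import annotations

# Inlined copy of the priority_signals block from the rules file above.
SIGNALS = {
    "urgent_keywords": ["urgent", "outage", "production", "EOD", "end of day", "today"],
    "trusted_domains": ["acme-customer.com", "yourcompany.example", "yourteam.example"],
    "newsletter_domains": ["techweekly.example", "localmeetup.example"],
    "always_low_priority_subjects": ["digest", "newsletter", "weekly status"],
}


def classify(sender: str, subject: str, signals: dict[str, list[str]] = SIGNALS) -> str:
    """Hypothetical rule order: low-priority signals first, then urgent-from-trusted."""
    domain = sender.rsplit("@", 1)[-1].lower()
    subj = subject.lower()
    if domain in signals["newsletter_domains"]:
        return "low"
    if any(s in subj for s in signals["always_low_priority_subjects"]):
        return "low"
    urgent = any(k.lower() in subj for k in signals["urgent_keywords"])
    # Urgency alone is not trusted: the lookalike-domain fixture also says "URGENT".
    if urgent and domain in signals["trusted_domains"]:
        return "urgent"
    return "normal"


print(classify("legal@yourcompany.example", "Need your sign-off on the vendor MSA by EOD Wednesday"))
```

Checking low-priority signals first keeps a trusted teammate's weekly status out of the urgent bucket, and requiring a trusted domain keeps the `anth-rop1c.com` message from being promoted on its subject line alone.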
@@ -0,0 +1,58 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    any_of = ['acme', 'outage', 'production']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    any_of = ['legal', 'msa', 'innotech', 'vendor']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    any_of = ['phishing', 'suspicious', 'anth-rop1c', 'anth-rop']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t3_msg_inbox_triage/verify_all_classified.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -0,0 +1,54 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    any_of = ['acme', 'outage', 'priya']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    any_of = ['legal', 'msa', 'innotech']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t3_msg_inbox_triage/verify_drafts_for_urgent.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -0,0 +1,50 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if not blob:
        print("FAIL: workspace contains no agent-written text files")
        return 1
    any_of = ['phishing', 'suspicious', 'scam', 'spoof', 'fake', 'do not click']
    if not any(s in blob for s in any_of):
        print(f"FAIL: workspace missing any of: {any_of}")
        return 1
    print("PASS: t3_msg_inbox_triage/verify_phishing_flagged.py")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -0,0 +1,14 @@
<!doctype html>
<html><head><title>Solar Curtailment 101 — Grid Operator Quarterly</title></head>
<body>
<article>
<h1>Solar Curtailment 101</h1>
<p>Source: Grid Operator Quarterly | Published 2025-11-14 | author: Lin Park</p>
<p>When solar output exceeds local demand, grid operators traditionally
"curtail" — instructing solar farms to reduce production. This wastes
clean energy. In 2024, California's CAISO curtailed 3.2 TWh of solar.</p>
<p>Operators are increasingly pivoting to active management: shifting
loads, charging batteries, and exporting to neighboring regions instead
of curtailing.</p>
</article>
</body></html>

@@ -0,0 +1,13 @@
<!doctype html>
<html><head><title>Battery Storage Soaks Up Excess Solar — Energy Wire</title></head>
<body>
<article>
<h1>Battery Storage Soaks Up Excess Solar</h1>
<p>Source: Energy Wire | Published 2026-02-03 | author: Maya Johansson</p>
<p>Utility-scale battery installations doubled in 2025. The
California Independent System Operator reports that storage absorbed
roughly 40 percent of would-be-curtailed midday solar in Q4 2025.</p>
<p>Texas ERCOT followed a similar trajectory, with battery storage
helping smooth duck-curve effects.</p>
</article>
</body></html>

@@ -0,0 +1,13 @@
<!doctype html>
<html><head><title>Negative Price Hours Are the New Normal — Power Markets Today</title></head>
<body>
<article>
<h1>Negative Price Hours Are the New Normal</h1>
<p>Source: Power Markets Today | Published 2026-01-22 | author: Dev Patel</p>
<p>European wholesale markets saw record numbers of negative pricing
hours in 2025. Germany alone recorded 466 hours of sub-zero spot
prices, primarily during high solar generation periods.</p>
<p>This is creating both opportunities (for flexible loads) and
challenges (for project economics).</p>
</article>
</body></html>

@@ -0,0 +1,13 @@
<!doctype html>
<html><head><title>Curtailment Is Actually a Feature, Not a Bug — Contrarian View</title></head>
<body>
<article>
<h1>Curtailment Is Actually a Feature, Not a Bug</h1>
<p>Source: Energy Contrarian | Published 2026-03-12 | author: Jordan Hayes</p>
<p>The dominant narrative says curtailment is wasted energy. Some grid
economists disagree: occasional curtailment is the price of building
ample solar capacity, and it's cheaper than over-engineering storage.</p>
<p>This view directly contradicts the storage-first orthodoxy held by
most operators today.</p>
</article>
</body></html>

@@ -0,0 +1,12 @@
<!doctype html>
<html><head><title>BREAKING: Sun Switched Off to Help Grid Operators (SATIRE)</title></head>
<body>
<article>
<h1>BREAKING: Sun Switched Off to Help Grid Operators</h1>
<p>Source: The Onion-Adjacent Energy Times | Published 2026-04-01 | tagged: SATIRE / HUMOR</p>
<p>Grid operators rejoiced today as the sun was officially placed on
standby mode by the Department of Celestial Affairs. "Finally we don't
have to deal with all that pesky free energy," said one engineer.</p>
<p>NOTE: This article is satire and should not be cited as fact.</p>
</article>
</body></html>

@@ -0,0 +1,66 @@
"""Serve the local sandboxed news site for the research-and-cite task."""
from __future__ import annotations

import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ROOT = Path(__file__).parent / "articles"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:  # noqa: N802
        path = self.path.split("?")[0]
        if path == "/health":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"ok": true}')
            return
        if path == "/" or path == "/index":
            self._index()
            return
        if path.startswith("/article/"):
            slug = path.split("/", 2)[2]
            article = ROOT / f"{slug}.html"
            if article.exists():
                self._html(article.read_bytes())
                return
        self.send_response(404)
        self.end_headers()
        self.wfile.write(b"not found")

    def _index(self) -> None:
        items = []
        for f in sorted(ROOT.glob("*.html")):
            slug = f.stem
            items.append(f'<li><a href="/article/{slug}">{slug}</a></li>')
        body = (
            "<!doctype html><html><body>"
            "<h1>Sandboxed News Index</h1><ul>"
            + "".join(items)
            + "</ul></body></html>"
        ).encode("utf-8")
        self._html(body)

    def _html(self, body: bytes) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # noqa: A002
        pass


def main() -> None:
    port = int(os.environ.get("PORT", "0"))
    server = HTTPServer(("127.0.0.1", port), Handler)
    print(f"sandbox news site listening on http://127.0.0.1:{server.server_address[1]}")
    server.serve_forever()


if __name__ == "__main__":
    main()

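The news-site server reads `PORT` from the environment and defaults to `0`, which asks the OS for any free port; the real port is only known after binding, via `server.server_address[1]`, which is why the script prints it. The standalone sketch below reproduces that pattern with a minimal `/health` handler and probes it from the same process, roughly the way a harness might check readiness. `HealthHandler` and `probe_health` are hypothetical names; the sketch does not import the task's server module.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:  # noqa: N802
        if self.path.split("?")[0] == "/health":
            body = b'{"ok": true}'
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
            return
        self.send_response(404)
        self.end_headers()

    def log_message(self, format, *args):  # noqa: A002
        pass  # keep output quiet


def probe_health() -> dict:
    # Port 0: the OS picks a free port; read it back after binding.
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=5) as resp:
            return json.loads(resp.read())
    finally:
        # shutdown() stops serve_forever(); server_close() releases the socket.
        server.shutdown()
        server.server_close()


print(probe_health())
```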
@@ -0,0 +1,71 @@
"""Recursive workspace search verifier."""
from __future__ import annotations

import sys
from pathlib import Path

EXCLUDE_FRAGMENTS = (
    "verify_", "/.git/", "/.openclaw/",
    "BOOTSTRAP.md", "IDENTITY.md", "AGENTS.md",
    "USER.md", "SOUL.md", "HEARTBEAT.md",
)

TEXT_SUFFIXES = (".md", ".txt", ".json", ".yaml", ".yml", ".csv", ".log",
                 ".jsonl", ".html", ".sh", ".py")


def iter_workspace_text_files(root: Path = Path(".")):
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        sp = str(path)
        if any(frag in sp for frag in EXCLUDE_FRAGMENTS):
            continue
        if path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        try:
            yield path, path.read_text(encoding="utf-8", errors="ignore")
        except Exception:
            continue


def workspace_blob() -> str:
    return "\n".join(text for _, text in iter_workspace_text_files())


def main() -> int:
    blob = workspace_blob().lower()
    if len(blob) < 500:
        print(f"FAIL: workspace text too thin ({len(blob)} chars) for an explainer")
        return 1
    real_keywords = {
        "01_grid_basics": ["grid operator quarterly", "lin park", "caiso", "3.2 twh"],
        "02_battery_storage": ["energy wire", "maya johansson", "40 percent", "ercot"],
        "03_pricing_signals": ["power markets today", "dev patel", "negative", "466 hours"],
        "04_curtailment_helps": ["energy contrarian", "jordan hayes", "feature, not a bug",
                                 "feature not a bug"],
    }
    cited = sum(1 for kws in real_keywords.values() if any(k in blob for k in kws))
    if cited < 2:
        print(f"FAIL: only {cited}/4 real articles cited (need >=2)")
        return 1
    if "sun switched off" in blob or "celestial affairs" in blob:
        if not any(w in blob for w in ("satire", "humor", "joke", "not real")):
            print("FAIL: cited the satirical article as fact")
            return 1
    if not any(w in blob for w in ("contradict", "disagree", "however", "on the other hand",
                                   "in contrast", "alternative view", "counterpoint",
                                   "but some", "by contrast", "nuance")):
        print("FAIL: explainer does not surface the contradiction between sources")
        return 1
    print(f"PASS: {cited}/4 real sources cited, contradiction surfaced")
    return 0


if __name__ == "__main__":
    sys.exit(main())

@@ -0,0 +1,41 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Reporting API Docs</title>
</head>
<body>
<h1>Reporting API</h1>
<h2>Versioning</h2>
<ul>
<li><code>/v1/reports</code>: <strong>deprecated</strong>, sunset on 2026-07-01.</li>
<li><code>/v2/reports</code>: <strong>current</strong> (GA since 2026.2). Use this.</li>
<li><code>/v3/reports</code>: <strong>beta</strong>, not recommended for production; interface may change.</li>
</ul>
<p>New integrations must use <code>/v2/reports</code>.</p>
<h2>Required headers (for /v2/reports)</h2>
<p>Every request to the current reporting endpoint <em>must</em> include:</p>
<ul>
<li><code>X-Workspace-Id</code> — identifies the tenant workspace.</li>
<li><code>Authorization</code>: <code>Bearer &lt;token&gt;</code>.</li>
</ul>
<h2>Optional headers</h2>
<ul>
<li><code>X-Request-Id</code> — opaque client-side correlation id for tracing.</li>
</ul>
<h2>Headers for other endpoints (do NOT send on /v2/reports)</h2>
<ul>
<li><code>X-Admin-Token</code> — required on <code>/v2/admin</code> only. Sending it on <code>/v2/reports</code> will cause a 400.</li>
</ul>
<h2>Rate limits</h2>
<p>The <code>/v2/reports</code> endpoint is rate-limited to <strong>120 requests per minute</strong> per workspace. Requests beyond this return 429.</p>
<h2>Payload size</h2>
<p>Max body size on <code>/v2/reports</code> is <strong>10 MiB</strong>. Larger payloads should use the chunked upload flow (see <code>/v2/uploads</code>, not covered here).</p>
</body>
</html>

@@ -0,0 +1,7 @@
API_PATH = "/v1/reports"
REQUIRED_HEADERS = ["Authorization"]
# Rate-limit + payload guards the agent must set to match the published
# reporting API contract. Starter values are wrong on purpose.
RATE_LIMIT_PER_MINUTE = None
MAX_PAYLOAD_BYTES = None

@@ -0,0 +1,24 @@
from __future__ import annotations

import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class Handler(SimpleHTTPRequestHandler):
    def do_GET(self) -> None:  # noqa: N802
        if self.path == "/health":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
            return
        return super().do_GET()

    def log_message(self, format: str, *args) -> None:  # noqa: A003
        return


if __name__ == "__main__":
    port = int(os.environ.get("PORT", "8125"))
    os.chdir("docs")
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    server.serve_forever()

@@ -0,0 +1,36 @@
from report_client import (
    API_PATH,
    REQUIRED_HEADERS,
    RATE_LIMIT_PER_MINUTE,
    MAX_PAYLOAD_BYTES,
)


def test_reporting_api_path_is_v2():
    # v1 is deprecated (sunset 2026-07-01), v3 is beta — current is v2.
    assert API_PATH == "/v2/reports"


def test_workspace_header_is_required():
    assert "X-Workspace-Id" in REQUIRED_HEADERS


def test_authorization_header_is_required():
    # Bearer token is required per the docs.
    assert "Authorization" in REQUIRED_HEADERS


def test_admin_token_is_not_a_required_header():
    # X-Admin-Token is only for /v2/admin — sending it on /v2/reports returns 400.
    # Distractor — the agent must correctly scope required headers.
    assert "X-Admin-Token" not in REQUIRED_HEADERS


def test_rate_limit_matches_docs():
    # 120 requests per minute per workspace.
    assert RATE_LIMIT_PER_MINUTE == 120


def test_max_payload_size_matches_docs():
    # 10 MiB = 10 * 1024 * 1024 bytes.
    assert MAX_PAYLOAD_BYTES == 10 * 1024 * 1024

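The docs state the payload limit as 10 MiB, a binary unit, so the constant the test expects is 10 * 1024 * 1024 = 10,485,760 bytes rather than the decimal 10,000,000 of "10 MB". A small sketch of a guard built on that constant; `check_payload` is hypothetical, since `report_client.py` itself only declares the constants, and treating a body of exactly 10 MiB as allowed is an assumption based on the docs calling it the maximum.

```python
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024  # 10 MiB, per the reporting docs


def check_payload(body: bytes) -> None:
    """Hypothetical guard: reject bodies larger than the documented /v2/reports maximum."""
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError(
            f"payload is {len(body)} bytes; limit is {MAX_PAYLOAD_BYTES} (10 MiB)"
        )


print(MAX_PAYLOAD_BYTES)          # 10485760
print(MAX_PAYLOAD_BYTES - 10**7)  # 485760 bytes more than a decimal 10 MB
```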
@@ -0,0 +1,5 @@
def validate_event(payload: dict[str, object]) -> dict[str, object]:
    if "customer_name" not in payload:
        raise ValueError("missing customer_name")
    return {"customer_name": payload["customer_name"], "status": payload["status"]}

@@ -0,0 +1,7 @@
from contracts.customer_event import validate_event


def test_schema_uses_account_name():
    payload = validate_event({"account_name": "Acme", "status": "active"})
    assert payload["account_name"] == "Acme"

@@ -0,0 +1,3 @@
def render_account(event: dict[str, object]) -> str:
    return f"{event['customer_name']} ({event['status']})"

Some files were not shown because too many files have changed in this diff.