-
7da58897af
ci: default crabbox owned capacity to standard (#22)
main
Vincent Koc
2026-05-07 02:47:04 -0700
-
1a9dc8d549
ci: default crabbox owned capacity to standard
codex/crabbox-skill-refresh
Vincent Koc
2026-05-07 02:44:44 -0700
-
e0a86b4232
Merge pull request #21 from sallyom/k8s-job
scoootscooob
2026-05-06 15:02:15 -0700
-
a95423b3c6
Fix Kubernetes sidecar deploy flow
scoootscooob
2026-05-06 14:51:54 -0700
-
7d75d99643
add docs, manifests for k8s
sallyom
2026-05-05 21:36:44 -0400
-
d57e4a697d
Merge pull request #19 from openclaw/codex/openclaw-websocket-run-lifecycle
scoootscooob
2026-05-04 12:25:14 -0700
-
e3ad7ac173
fix(eval): isolate lane queues and configs
codex/openclaw-websocket-run-lifecycle
scoootscooob
2026-05-04 12:19:20 -0700
-
cce89d828b
feat: add crabbox validation wiring
Vincent Koc
2026-05-02 18:34:01 -0700
-
abf3500f69
fix(harness): keep gateway RPC sockets alive
codex/stabilized-eval-suite
scoootscooob
2026-05-02 14:51:52 -0700
-
cebd1c8026
chore(repo): clean public benchmark surface
scoootscooob
2026-05-02 12:18:58 -0700
-
7eb854710f
feat(eval): stabilize full-suite adapter runs
scoootscooob
2026-05-02 10:24:03 -0700
-
5dfa4c9280
fix(eval): stabilize OpenClaw container sweeps
scoootscooob
2026-05-02 02:50:57 -0700
-
f09a9f4bf7
fix(eval): carry tool profile through harness
scoootscooob
2026-05-02 02:01:13 -0700
-
f45eb288d9
fix(eval): harden OpenClaw run lifecycle waits
scoootscooob
2026-05-02 01:38:08 -0700
-
4e6a686ae5
fix(deps): update benchmark dependency bounds
Vincent Koc
2026-04-30 15:14:54 -0700
-
01dd96c71c
fix(security): constrain research article paths
Vincent Koc
2026-04-30 02:57:52 -0700
-
e80902bafa
chore: add codeowners
Vincent Koc
2026-04-29 16:02:36 -0700
-
56531fbf43
feat: add adapter canonicalization layer
scoootscooob
2026-04-29 13:57:13 -0700
-
69a2311681
fix: harden adapter workspace checks
codex/adapter-canonicalization
Vincent Koc
2026-04-29 13:53:44 -0700
-
82eaadbc61
Merge remote-tracking branch 'origin/main' into pr17-nonrewrite
Vincent Koc
2026-04-29 13:52:41 -0700
-
dc8a1936ab
fix(worker): harden runtime result writes
Vincent Koc
2026-04-29 13:24:40 -0700
-
82bcfc1891
fix(worker): harden runtime result writes
test/e2e-runtime-contracts
Vincent Koc
2026-04-29 13:16:40 -0700
-
30334cac88
feat: add adapter canonicalization layer
scoootscooob
2026-04-29 11:15:11 -0700
-
ea17c715b3
fix(client): clean pending rpc on send failure
Vincent Koc
2026-04-29 00:09:27 -0700
-
e7bd37c1b5
fix(client): clean pending rpc on send failure
test/critical-paths
Vincent Koc
2026-04-29 00:07:40 -0700
-
88ab0f5564
test: cover environment verifier success paths
Vincent Koc
2026-04-28 23:27:38 -0700
-
314de6a8e3
test: cover environment verifier success paths
test/environment-coverage
Vincent Koc
2026-04-28 23:26:10 -0700
-
8172fad70e
test: cover judge score gate propagation
Vincent Koc
2026-04-28 23:08:58 -0700
-
07eba26f98
test: cover judge score gate propagation
test/extend-coverage
Vincent Koc
2026-04-28 23:07:27 -0700
-
fb486a1ed3
fix(scoring): gate judge-weighted scores
Vincent Koc
2026-04-28 22:52:12 -0700
-
2670dcadf0
fix(scoring): include judge gate in run cache key
fix/gate-judge-scoring
Vincent Koc
2026-04-28 22:50:06 -0700
-
fbb13ac4d9
Merge remote-tracking branch 'origin/main' into fix/gate-judge-scoring-ff
Vincent Koc
2026-04-28 22:49:01 -0700
-
ed9adf8d84
fix(runtime): harden benchmark cache and task paths
Vincent Koc
2026-04-28 22:40:46 -0700
-
3946e63c7d
fix(runtime): harden benchmark cache and task paths
fix/clawbench-review-hardening
Vincent Koc
2026-04-28 22:37:07 -0700
-
e120e86601
fix: flag credential file access in dangerous shell patterns (#6)
Aaron Zhu
2026-04-29 04:17:11 +0800
-
dddfc0a175
fix: flag git push --force variants as dangerous shell commands (#5)
Aaron Zhu
2026-04-29 04:17:01 +0800
-
c72e41687d
chore: add open-source contribution scaffolding (#3)
HeYan
2026-04-28 13:16:52 -0700
-
d21648ad3d
fix: strip quoted strings before checking for shell redirect operators (#2)
HeYan
2026-04-28 13:16:42 -0700
-
453ddc0ca5
Merge remote-tracking branch 'origin/fix/gate-judge-scoring' into fix/gate-judge-scoring
Vincent Koc
2026-04-28 11:37:48 -0700
-
d7a2e50ea3
fix(scoring): gate judge-weighted scores
Vincent Koc
2026-04-28 10:54:18 -0700
-
0625ab7159
fix(runtime): harden queue and gateway lifecycle
Vincent Koc
2026-04-28 11:34:53 -0700
-
2b9c277512
fix(scoring): gate judge-weighted scores
Vincent Koc
2026-04-28 10:54:18 -0700
-
dd92f8884c
chore(dev): add lint guardrails
Vincent Koc
2026-04-28 10:50:07 -0700
-
38a2a0ff91
perf(app): cache leaderboard loads
Vincent Koc
2026-04-28 10:49:52 -0700
-
509f21bb95
fix(cli): sync scenario filters
Vincent Koc
2026-04-28 10:49:38 -0700
-
b5538e0927
Copy all package data in HF Docker build
scoootscooob
2026-04-28 02:35:09 -0700
-
425daa4fc8
Copy partner spec in HF Docker build
scoootscooob
2026-04-28 02:31:26 -0700
-
d069bcfe3a
Fix HF Docker package build
scoootscooob
2026-04-28 02:26:30 -0700
-
4ad2f1f417
fix(ci): ensure hugging face space before sync
Vincent Koc
2026-04-28 01:50:26 -0700
-
fc86dd6155
ci: add blacksmith testbox setup
Vincent Koc
2026-04-28 01:45:35 -0700
-
f373e4a710
fix: harden packaging and submissions
Vincent Koc
2026-04-28 01:17:43 -0700
-
fb029437be
Add MIT license file
scoootscooob
2026-04-28 00:05:38 -0700
-
4b7a9ee31c
Fix public Docker task copies
scoootscooob
2026-04-27 22:57:10 -0700
-
595cdc910c
Add public domain scaffold and adapter diagnostics
scoootscooob
2026-04-23 12:40:23 -0700
-
df32a5f073
Merge pull request #7 from HaoLi111/feat/dynamics-analysis
scoootscooob
2026-04-22 13:11:32 -0700
-
11d943f21c
fix: preserve preset submission settings and lazy-load plots
codex/pr-7-merge-ready
scoootscooob
2026-04-22 09:59:52 -0700
-
c209612d46
Add archive dynamics pipeline and audience-based model presets
pllm-uci
2026-04-21 20:24:41 -0700
-
5b50814dfc
Merge pull request #8 from gchlebus/gchlebus/fix-connect-timeout
scoootscooob
2026-04-22 09:47:06 -0700
-
79b2253bfc
fix(ci): restore public task fallback
scoootscooob
2026-04-22 09:46:33 -0700
-
e4ca2bef8e
fix(client): reject invalid timeout env values
gchlebus/fix-connect-timeout
scoootscooob
2026-04-22 09:41:44 -0700
-
547ee160ad
fix(client): raise default connect_timeout to 30s and make it env-overridable
Grzegorz Chlebus
2026-04-22 10:19:20 +0200
-
8447ab1ca6
docker: revert OpenClaw base pin; remove reference scores
scoootscooob
2026-04-20 21:24:42 -0700
-
0e250e3fe1
fix(ci): tasks-public fallback + leaderboard removed from README
scoootscooob
2026-04-20 20:32:26 -0700
-
f95e838d99
docs: rewrite README around Core v1 + dynamical-systems diagnostics
scoootscooob
2026-04-20 20:15:18 -0700
-
030e9968bd
docker: pin OpenClaw base to 2026.4.15-beta.1 for Core v1 reproducibility
scoootscooob
2026-04-20 20:09:49 -0700
-
50959fa670
tasks: add Core v1 public task set (19 tasks)
scoootscooob
2026-04-20 20:06:36 -0700
-
b6f07d9a87
analysis: dynamical-systems diagnostics for agent runs
scoootscooob
2026-04-20 19:49:05 -0700
-
afb14c3982
analysis: fair-comparison audit and rejudge pipeline
scoootscooob
2026-04-20 19:48:43 -0700
-
01a31e55fb
sweep: per-container state isolation + qwen model-id fix
scoootscooob
2026-04-20 19:48:30 -0700
-
deb3d5d85d
tasks: stop tracking current task set; fix t2 integration test for emptyNote
scoootscooob
2026-04-19 12:29:52 -0700
-
95b226dfed
tasks: harden 5 ceiling-bound tasks for better model differentiation
scoootscooob
2026-04-19 12:24:25 -0700
-
cb48ca72e8
tasks: drop strict completion.files checks on 19 tasks
scoootscooob
2026-04-18 13:16:34 -0700
-
8a5be9c686
clawbench: per-sweep cache archiving + generic sweep templates
scoootscooob
2026-04-18 12:46:45 -0700
-
fe8fef7795
Merge branch 'pr-4' into codex/merge-pr4
scoootscooob
2026-04-16 19:50:11 -0700
-
ee8ff79347
docs: fix ollama profile guidance
scoootscooob
2026-04-16 19:49:04 -0700
-
9d802d6c53
fix: classify find_replace-style tools as edits
scoootscooob
2026-04-16 19:37:01 -0700
-
517f2207b0
Refine local Ollama profile documentation for clarity and usability
pllm-uci
2026-04-15 11:45:57 -0700
-
e2d82b34c3
Add local Ollama model support and configuration guidance to README and profiles
pllm-uci
2026-04-15 11:45:12 -0700
-
a2757e6bd9
fix: classify str_replace and insert tools as mutating edits
HeYan
2026-04-14 00:57:40 -0700
-
eb879adf9b
Remove reports/ reference from README repo layout
scoootscooob
2026-04-14 00:52:17 -0700
-
6ab3004d63
Remove reports and scripts from repo, add to gitignore
scoootscooob
2026-04-14 00:51:50 -0700
-
0d07aa4d08
Re-judge GPT 5.4: resolve judge auth caveat, full coverage
scoootscooob
2026-04-14 00:36:27 -0700
-
952decadcf
Rewrite README, harden worker, add benchmark reports
scoootscooob
2026-04-14 00:11:34 -0700
-
44bef14f4d
Add partner trace submission spec
scoootscooob
2026-04-11 15:36:54 -0700
-
b4620d10ca
worker: harden gateway runtime and resume behavior
scoootscooob
2026-04-11 15:27:14 -0700
-
380c6b4815
bench: audit contamination and harden HF leaderboard loading
scoootscooob
2026-04-11 07:14:32 -0700
-
99803642b0
bench: add trace ingestion and template promotion pipeline
scoootscooob
2026-04-11 06:45:27 -0700
-
02573d565d
bench: add hidden release scaffolding and CI push coverage
Codex
2026-04-11 06:28:43 -0700
-
29c1cd90e4
worker: fail-fast on hung sessions.create, retry control-plane probe
Codex
2026-04-11 05:30:49 -0700
-
ab69af31be
upload: read-then-append instead of overwriting submissions split
Codex
2026-04-11 00:22:24 -0700
-
78d844364f
ci: trigger HF Space sync (secrets added)
Codex
2026-04-11 00:17:55 -0700
-
f55b990476
docs: add .github/workflows/README for HF sync setup
Codex
2026-04-11 00:02:45 -0700
-
19e4750b69
ci: auto-mirror main to HF Space on every push
Codex
2026-04-11 00:01:20 -0700
-
07a20c3f18
HF Space: dynamic stats + fix leaderboard environment parsing
Codex
2026-04-10 23:55:37 -0700
-
c24d982110
HF Space: fix container eval — pytest in runtime deps, TASKS_DIR resolver, timeouts
Codex
2026-04-10 23:03:15 -0700
-
e9ff163217
baselines: merge provenance docs into BASELINE_SOURCES.md
Codex
2026-04-10 20:36:18 -0700
-
3cdade49ce
README: rewrite for v0.5 with architecture, numbers, and positioning
Codex
2026-04-10 19:21:48 -0700
-
4744a6ae7e
ClawBench: 7-model frontier baseline + bake-off tooling
Codex
2026-04-10 19:14:11 -0700
-
4aa017838a
ClawBench v0.5: tests + task corpus expansion
Codex
2026-04-10 19:13:37 -0700
-
cf04a17fea
ClawBench v0.5: configuration-space diagnostic framework
Codex
2026-04-10 19:13:02 -0700