Commit Graph

  • 7da58897af ci: default crabbox owned capacity to standard (#22) main Vincent Koc 2026-05-07 02:47:04 -0700
  • 1a9dc8d549 ci: default crabbox owned capacity to standard codex/crabbox-skill-refresh Vincent Koc 2026-05-07 02:44:44 -0700
  • e0a86b4232 Merge pull request #21 from sallyom/k8s-job scoootscooob 2026-05-06 15:02:15 -0700
  • a95423b3c6 Fix Kubernetes sidecar deploy flow scoootscooob 2026-05-06 14:51:54 -0700
  • 7d75d99643 add docs, manifests for k8s sallyom 2026-05-05 21:36:44 -0400
  • d57e4a697d Merge pull request #19 from openclaw/codex/openclaw-websocket-run-lifecycle scoootscooob 2026-05-04 12:25:14 -0700
  • e3ad7ac173 fix(eval): isolate lane queues and configs codex/openclaw-websocket-run-lifecycle scoootscooob 2026-05-04 12:19:20 -0700
  • cce89d828b feat: add crabbox validation wiring Vincent Koc 2026-05-02 18:34:01 -0700
  • abf3500f69 fix(harness): keep gateway RPC sockets alive codex/stabilized-eval-suite scoootscooob 2026-05-02 14:51:52 -0700
  • cebd1c8026 chore(repo): clean public benchmark surface scoootscooob 2026-05-02 12:18:58 -0700
  • 7eb854710f feat(eval): stabilize full-suite adapter runs scoootscooob 2026-05-02 10:24:03 -0700
  • 5dfa4c9280 fix(eval): stabilize OpenClaw container sweeps scoootscooob 2026-05-02 02:50:57 -0700
  • f09a9f4bf7 fix(eval): carry tool profile through harness scoootscooob 2026-05-02 02:01:13 -0700
  • f45eb288d9 fix(eval): harden OpenClaw run lifecycle waits scoootscooob 2026-05-02 01:38:08 -0700
  • 4e6a686ae5 fix(deps): update benchmark dependency bounds Vincent Koc 2026-04-30 15:14:54 -0700
  • 01dd96c71c fix(security): constrain research article paths Vincent Koc 2026-04-30 02:57:52 -0700
  • e80902bafa chore: add codeowners Vincent Koc 2026-04-29 16:02:36 -0700
  • 56531fbf43 feat: add adapter canonicalization layer scoootscooob 2026-04-29 13:57:13 -0700
  • 69a2311681 fix: harden adapter workspace checks codex/adapter-canonicalization Vincent Koc 2026-04-29 13:53:44 -0700
  • 82eaadbc61 Merge remote-tracking branch 'origin/main' into pr17-nonrewrite Vincent Koc 2026-04-29 13:52:41 -0700
  • dc8a1936ab fix(worker): harden runtime result writes Vincent Koc 2026-04-29 13:24:40 -0700
  • 82bcfc1891 fix(worker): harden runtime result writes test/e2e-runtime-contracts Vincent Koc 2026-04-29 13:16:40 -0700
  • 30334cac88 feat: add adapter canonicalization layer scoootscooob 2026-04-29 11:15:11 -0700
  • ea17c715b3 fix(client): clean pending rpc on send failure Vincent Koc 2026-04-29 00:09:27 -0700
  • e7bd37c1b5 fix(client): clean pending rpc on send failure test/critical-paths Vincent Koc 2026-04-29 00:07:40 -0700
  • 88ab0f5564 test: cover environment verifier success paths Vincent Koc 2026-04-28 23:27:38 -0700
  • 314de6a8e3 test: cover environment verifier success paths test/environment-coverage Vincent Koc 2026-04-28 23:26:10 -0700
  • 8172fad70e test: cover judge score gate propagation Vincent Koc 2026-04-28 23:08:58 -0700
  • 07eba26f98 test: cover judge score gate propagation test/extend-coverage Vincent Koc 2026-04-28 23:07:27 -0700
  • fb486a1ed3 fix(scoring): gate judge-weighted scores Vincent Koc 2026-04-28 22:52:12 -0700
  • 2670dcadf0 fix(scoring): include judge gate in run cache key fix/gate-judge-scoring Vincent Koc 2026-04-28 22:50:06 -0700
  • fbb13ac4d9 Merge remote-tracking branch 'origin/main' into fix/gate-judge-scoring-ff Vincent Koc 2026-04-28 22:49:01 -0700
  • ed9adf8d84 fix(runtime): harden benchmark cache and task paths Vincent Koc 2026-04-28 22:40:46 -0700
  • 3946e63c7d fix(runtime): harden benchmark cache and task paths fix/clawbench-review-hardening Vincent Koc 2026-04-28 22:37:07 -0700
  • e120e86601 fix: flag credential file access in dangerous shell patterns (#6) Aaron Zhu 2026-04-29 04:17:11 +0800
  • dddfc0a175 fix: flag git push --force variants as dangerous shell commands (#5) Aaron Zhu 2026-04-29 04:17:01 +0800
  • c72e41687d chore: add open-source contribution scaffolding (#3) HeYan 2026-04-28 13:16:52 -0700
  • d21648ad3d fix: strip quoted strings before checking for shell redirect operators (#2) HeYan 2026-04-28 13:16:42 -0700
  • 453ddc0ca5 Merge remote-tracking branch 'origin/fix/gate-judge-scoring' into fix/gate-judge-scoring Vincent Koc 2026-04-28 11:37:48 -0700
  • d7a2e50ea3 fix(scoring): gate judge-weighted scores Vincent Koc 2026-04-28 10:54:18 -0700
  • 0625ab7159 fix(runtime): harden queue and gateway lifecycle Vincent Koc 2026-04-28 11:34:53 -0700
  • 2b9c277512 fix(scoring): gate judge-weighted scores Vincent Koc 2026-04-28 10:54:18 -0700
  • dd92f8884c chore(dev): add lint guardrails Vincent Koc 2026-04-28 10:50:07 -0700
  • 38a2a0ff91 perf(app): cache leaderboard loads Vincent Koc 2026-04-28 10:49:52 -0700
  • 509f21bb95 fix(cli): sync scenario filters Vincent Koc 2026-04-28 10:49:38 -0700
  • b5538e0927 Copy all package data in HF Docker build scoootscooob 2026-04-28 02:35:09 -0700
  • 425daa4fc8 Copy partner spec in HF Docker build scoootscooob 2026-04-28 02:31:26 -0700
  • d069bcfe3a Fix HF Docker package build scoootscooob 2026-04-28 02:26:30 -0700
  • 4ad2f1f417 fix(ci): ensure hugging face space before sync Vincent Koc 2026-04-28 01:50:26 -0700
  • fc86dd6155 ci: add blacksmith testbox setup Vincent Koc 2026-04-28 01:45:35 -0700
  • f373e4a710 fix: harden packaging and submissions Vincent Koc 2026-04-28 01:17:43 -0700
  • fb029437be Add MIT license file scoootscooob 2026-04-28 00:05:38 -0700
  • 4b7a9ee31c Fix public Docker task copies scoootscooob 2026-04-27 22:57:10 -0700
  • 595cdc910c Add public domain scaffold and adapter diagnostics scoootscooob 2026-04-23 12:40:23 -0700
  • df32a5f073 Merge pull request #7 from HaoLi111/feat/dynamics-analysis scoootscooob 2026-04-22 13:11:32 -0700
  • 11d943f21c fix: preserve preset submission settings and lazy-load plots codex/pr-7-merge-ready scoootscooob 2026-04-22 09:59:52 -0700
  • c209612d46 Add archive dynamics pipeline and audience-based model presets pllm-uci 2026-04-21 20:24:41 -0700
  • 5b50814dfc Merge pull request #8 from gchlebus/gchlebus/fix-connect-timeout scoootscooob 2026-04-22 09:47:06 -0700
  • 79b2253bfc fix(ci): restore public task fallback scoootscooob 2026-04-22 09:46:33 -0700
  • e4ca2bef8e fix(client): reject invalid timeout env values gchlebus/fix-connect-timeout scoootscooob 2026-04-22 09:41:44 -0700
  • 547ee160ad fix(client): raise default connect_timeout to 30s and make it env-overridable Grzegorz Chlebus 2026-04-22 10:19:20 +0200
  • 8447ab1ca6 docker: revert OpenClaw base pin; remove reference scores scoootscooob 2026-04-20 21:24:42 -0700
  • 0e250e3fe1 fix(ci): tasks-public fallback + leaderboard removed from README scoootscooob 2026-04-20 20:32:26 -0700
  • f95e838d99 docs: rewrite README around Core v1 + dynamical-systems diagnostics scoootscooob 2026-04-20 20:15:18 -0700
  • 030e9968bd docker: pin OpenClaw base to 2026.4.15-beta.1 for Core v1 reproducibility scoootscooob 2026-04-20 20:09:49 -0700
  • 50959fa670 tasks: add Core v1 public task set (19 tasks) scoootscooob 2026-04-20 20:06:36 -0700
  • b6f07d9a87 analysis: dynamical-systems diagnostics for agent runs scoootscooob 2026-04-20 19:49:05 -0700
  • afb14c3982 analysis: fair-comparison audit and rejudge pipeline scoootscooob 2026-04-20 19:48:43 -0700
  • 01a31e55fb sweep: per-container state isolation + qwen model-id fix scoootscooob 2026-04-20 19:48:30 -0700
  • deb3d5d85d tasks: stop tracking current task set; fix t2 integration test for emptyNote scoootscooob 2026-04-19 12:29:52 -0700
  • 95b226dfed tasks: harden 5 ceiling-bound tasks for better model differentiation scoootscooob 2026-04-19 12:24:25 -0700
  • cb48ca72e8 tasks: drop strict completion.files checks on 19 tasks scoootscooob 2026-04-18 13:16:34 -0700
  • 8a5be9c686 clawbench: per-sweep cache archiving + generic sweep templates scoootscooob 2026-04-18 12:46:45 -0700
  • fe8fef7795 Merge branch 'pr-4' into codex/merge-pr4 scoootscooob 2026-04-16 19:50:11 -0700
  • ee8ff79347 docs: fix ollama profile guidance scoootscooob 2026-04-16 19:49:04 -0700
  • 9d802d6c53 fix: classify find_replace-style tools as edits scoootscooob 2026-04-16 19:37:01 -0700
  • 517f2207b0 Refine local Ollama profile documentation for clarity and usability pllm-uci 2026-04-15 11:45:57 -0700
  • e2d82b34c3 Add local Ollama model support and configuration guidance to README and profiles pllm-uci 2026-04-15 11:45:12 -0700
  • a2757e6bd9 fix: classify str_replace and insert tools as mutating edits HeYan 2026-04-14 00:57:40 -0700
  • eb879adf9b Remove reports/ reference from README repo layout scoootscooob 2026-04-14 00:52:17 -0700
  • 6ab3004d63 Remove reports and scripts from repo, add to gitignore scoootscooob 2026-04-14 00:51:50 -0700
  • 0d07aa4d08 Re-judge GPT 5.4: resolve judge auth caveat, full coverage scoootscooob 2026-04-14 00:36:27 -0700
  • 952decadcf Rewrite README, harden worker, add benchmark reports scoootscooob 2026-04-14 00:11:34 -0700
  • 44bef14f4d Add partner trace submission spec scoootscooob 2026-04-11 15:36:54 -0700
  • b4620d10ca worker: harden gateway runtime and resume behavior scoootscooob 2026-04-11 15:27:14 -0700
  • 380c6b4815 bench: audit contamination and harden HF leaderboard loading scoootscooob 2026-04-11 07:14:32 -0700
  • 99803642b0 bench: add trace ingestion and template promotion pipeline scoootscooob 2026-04-11 06:45:27 -0700
  • 02573d565d bench: add hidden release scaffolding and CI push coverage Codex 2026-04-11 06:28:43 -0700
  • 29c1cd90e4 worker: fail-fast on hung sessions.create, retry control-plane probe Codex 2026-04-11 05:30:49 -0700
  • ab69af31be upload: read-then-append instead of overwriting submissions split Codex 2026-04-11 00:22:24 -0700
  • 78d844364f ci: trigger HF Space sync (secrets added) Codex 2026-04-11 00:17:55 -0700
  • f55b990476 docs: add .github/workflows/README for HF sync setup Codex 2026-04-11 00:02:45 -0700
  • 19e4750b69 ci: auto-mirror main to HF Space on every push Codex 2026-04-11 00:01:20 -0700
  • 07a20c3f18 HF Space: dynamic stats + fix leaderboard environment parsing Codex 2026-04-10 23:55:37 -0700
  • c24d982110 HF Space: fix container eval — pytest in runtime deps, TASKS_DIR resolver, timeouts Codex 2026-04-10 23:03:15 -0700
  • e9ff163217 baselines: merge provenance docs into BASELINE_SOURCES.md Codex 2026-04-10 20:36:18 -0700
  • 3cdade49ce README: rewrite for v0.5 with architecture, numbers, and positioning Codex 2026-04-10 19:21:48 -0700
  • 4744a6ae7e ClawBench: 7-model frontier baseline + bake-off tooling Codex 2026-04-10 19:14:11 -0700
  • 4aa017838a ClawBench v0.5: tests + task corpus expansion Codex 2026-04-10 19:13:37 -0700
  • cf04a17fea ClawBench v0.5: configuration-space diagnostic framework Codex 2026-04-10 19:13:02 -0700