clawbench/tests
scoootscooob d57e4a697d
Merge pull request #19 from openclaw/codex/openclaw-websocket-run-lifecycle
fix(eval): harden OpenClaw run lifecycle waits
2026-05-04 12:25:14 -07:00
test_ablation.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_adapter_base.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_blacksmith_setup.py feat: add crabbox validation wiring 2026-05-02 18:34:01 -07:00
test_canonical_convert.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_cli.py test: cover judge score gate propagation 2026-04-28 23:08:58 -07:00
test_client.py fix(eval): harden OpenClaw run lifecycle waits 2026-05-02 01:38:08 -07:00
test_dockerfiles.py Copy all package data in HF Docker build 2026-04-28 02:35:09 -07:00
test_dynamics_archive.py fix: preserve preset submission settings and lazy-load plots 2026-04-22 12:03:16 -07:00
test_dynamics_cli.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
test_dynamics.py chore(dev): add lint guardrails 2026-04-28 10:50:07 -07:00
test_e2e_significance.py chore(dev): add lint guardrails 2026-04-28 10:50:07 -07:00
test_environment_files.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_environment.py test: cover environment verifier success paths 2026-04-28 23:27:38 -07:00
test_harness.py fix(scoring): gate judge-weighted scores 2026-04-28 22:52:12 -07:00
test_hermes_adapter.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_hermes_xml.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_integration_checks.py tasks: stop tracking current task set; fix t2 integration test for emptyNote 2026-04-19 12:29:52 -07:00
test_judge.py fix(runtime): harden benchmark cache and task paths 2026-04-28 22:40:46 -07:00
test_openclaw_adapter.py feat: add adapter canonicalization layer 2026-04-29 13:57:13 -07:00
test_packaging.py fix(runtime): harden benchmark cache and task paths 2026-04-28 22:40:46 -07:00
test_parallel_harness.py chore(dev): add lint guardrails 2026-04-28 10:50:07 -07:00
test_queue.py fix(eval): isolate lane queues and configs 2026-05-04 12:19:20 -07:00
test_releases.py bench: add hidden release scaffolding and CI push coverage 2026-04-11 06:28:43 -07:00
test_runtime_contracts.py fix(worker): harden runtime result writes 2026-04-29 13:24:40 -07:00
test_scorer.py test: cover judge score gate propagation 2026-04-28 23:08:58 -07:00
test_services.py fix(runtime): harden benchmark cache and task paths 2026-04-28 22:40:46 -07:00
test_session_labels.py Gateway: use unique benchmark session labels 2026-04-09 18:32:41 -07:00
test_simulated_user.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_stats.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_submission_models.py fix: preserve preset submission settings and lazy-load plots 2026-04-22 12:03:16 -07:00
test_t3_web_research_serve.py fix(security): constrain research article paths 2026-04-30 02:57:52 -07:00
test_task_factory.py bench: audit contamination and harden HF leaderboard loading 2026-04-11 07:14:32 -07:00
test_tasks.py fix(ci): restore public task fallback 2026-04-22 09:46:33 -07:00
test_trajectory.py fix: flag credential file access in dangerous shell patterns (#6) 2026-04-28 13:17:11 -07:00
test_upload.py fix: harden packaging and submissions 2026-04-28 01:17:43 -07:00
test_v05_extensions.py chore(dev): add lint guardrails 2026-04-28 10:50:07 -07:00
test_v05_framework.py chore(dev): add lint guardrails 2026-04-28 10:50:07 -07:00
test_worker.py fix(eval): isolate lane queues and configs 2026-05-04 12:19:20 -07:00