clawbench

scoootscooob abf3500f69	fix(harness): keep gateway RPC sockets alive	2026-05-02 14:51:52 -07:00
test_ablation.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_adapter_base.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_blacksmith_setup.py	fix(ci): ensure hugging face space before sync	2026-04-28 01:50:26 -07:00
test_canonical_convert.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_client.py	fix(harness): keep gateway RPC sockets alive	2026-05-02 14:51:52 -07:00
test_dockerfiles.py	Copy all package data in HF Docker build	2026-04-28 02:35:09 -07:00
test_dynamics_archive.py	fix: preserve preset submission settings and lazy-load plots	2026-04-22 12:03:16 -07:00
test_dynamics_cli.py	Add archive dynamics pipeline and audience-based model presets	2026-04-22 12:03:13 -07:00
test_dynamics.py	Add archive dynamics pipeline and audience-based model presets	2026-04-22 12:03:13 -07:00
test_e2e_significance.py	ClawBench v0.5: tests + task corpus expansion	2026-04-10 19:13:37 -07:00
test_environment.py	Bench: redesign v0.4 benchmark and HF runtime	2026-04-09 11:15:30 -07:00
test_harness.py	chore(repo): clean public benchmark surface	2026-05-02 12:18:58 -07:00
test_hermes_adapter.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_hermes_xml.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_integration_checks.py	chore(repo): clean public benchmark surface	2026-05-02 12:18:58 -07:00
test_judge.py	Bench: redesign v0.4 benchmark and HF runtime	2026-04-09 11:15:30 -07:00
test_openclaw_adapter.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00
test_packaging.py	fix: harden packaging and submissions	2026-04-28 01:17:43 -07:00
test_parallel_harness.py	ClawBench v0.5: tests + task corpus expansion	2026-04-10 19:13:37 -07:00
test_public_surface.py	chore(repo): clean public benchmark surface	2026-05-02 12:18:58 -07:00
test_queue.py	fix: harden packaging and submissions	2026-04-28 01:17:43 -07:00
test_releases.py	bench: add hidden release scaffolding and CI push coverage	2026-04-11 06:28:43 -07:00
test_scorer.py	ClawBench v0.5: tests + task corpus expansion	2026-04-10 19:13:37 -07:00
test_services.py	Bench: redesign v0.4 benchmark and HF runtime	2026-04-09 11:15:30 -07:00
test_session_labels.py	Gateway: use unique benchmark session labels	2026-04-09 18:32:41 -07:00
test_simulated_user.py	Bench: redesign v0.4 benchmark and HF runtime	2026-04-09 11:15:30 -07:00
test_stats.py	Bench: redesign v0.4 benchmark and HF runtime	2026-04-09 11:15:30 -07:00
test_submission_models.py	fix: preserve preset submission settings and lazy-load plots	2026-04-22 12:03:16 -07:00
test_task_factory.py	bench: audit contamination and harden HF leaderboard loading	2026-04-11 07:14:32 -07:00
test_tasks.py	chore(repo): clean public benchmark surface	2026-05-02 12:18:58 -07:00
test_trajectory.py	fix: classify find_replace-style tools as edits	2026-04-16 19:37:01 -07:00
test_upload.py	chore(repo): clean public benchmark surface	2026-05-02 12:18:58 -07:00
test_v05_extensions.py	Add public domain scaffold and adapter diagnostics	2026-04-23 12:40:23 -07:00
test_v05_framework.py	ClawBench v0.5: tests + task corpus expansion	2026-04-10 19:13:37 -07:00
test_worker.py	feat(eval): stabilize full-suite adapter runs	2026-05-02 10:24:03 -07:00