clawbench/tests
2026-04-27 22:57:10 -07:00
test_client.py fix(client): reject invalid timeout env values 2026-04-22 09:41:44 -07:00
test_dockerfiles.py Fix public Docker task copies 2026-04-27 22:57:10 -07:00
test_dynamics_archive.py fix: preserve preset submission settings and lazy-load plots 2026-04-22 12:03:16 -07:00
test_dynamics_cli.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
test_dynamics.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
test_e2e_significance.py ClawBench v0.5: tests + task corpus expansion 2026-04-10 19:13:37 -07:00
test_environment.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_harness.py Add public domain scaffold and adapter diagnostics 2026-04-23 12:40:23 -07:00
test_integration_checks.py tasks: stop tracking current task set; fix t2 integration test for emptyNote 2026-04-19 12:29:52 -07:00
test_judge.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_parallel_harness.py ClawBench v0.5: tests + task corpus expansion 2026-04-10 19:13:37 -07:00
test_queue.py bench: audit contamination and harden HF leaderboard loading 2026-04-11 07:14:32 -07:00
test_releases.py bench: add hidden release scaffolding and CI push coverage 2026-04-11 06:28:43 -07:00
test_scorer.py ClawBench v0.5: tests + task corpus expansion 2026-04-10 19:13:37 -07:00
test_services.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_session_labels.py Gateway: use unique benchmark session labels 2026-04-09 18:32:41 -07:00
test_simulated_user.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_stats.py Bench: redesign v0.4 benchmark and HF runtime 2026-04-09 11:15:30 -07:00
test_submission_models.py fix: preserve preset submission settings and lazy-load plots 2026-04-22 12:03:16 -07:00
test_task_factory.py bench: audit contamination and harden HF leaderboard loading 2026-04-11 07:14:32 -07:00
test_tasks.py fix(ci): restore public task fallback 2026-04-22 09:46:33 -07:00
test_trajectory.py fix: classify find_replace-style tools as edits 2026-04-16 19:37:01 -07:00
test_v05_extensions.py Add public domain scaffold and adapter diagnostics 2026-04-23 12:40:23 -07:00
test_v05_framework.py ClawBench v0.5: tests + task corpus expansion 2026-04-10 19:13:37 -07:00
test_worker.py worker: harden gateway runtime and resume behavior 2026-04-11 15:27:14 -07:00