clawbench/scripts
scoootscooob cebd1c8026
chore(repo): clean public benchmark surface (2026-05-02 12:18:58 -07:00)
_archive_cache.sh chore(repo): clean public benchmark surface 2026-05-02 12:18:58 -07:00
analyze_open_vs_closed.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
ci-hydrate-live-auth.sh ci: add blacksmith testbox setup 2026-04-28 01:45:35 -07:00
ci-hydrate-testbox-env.sh ci: add blacksmith testbox setup 2026-04-28 01:45:35 -07:00
classify_regimes.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
compute_constraint_index.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
container_adapter_eval.sh feat(eval): stabilize full-suite adapter runs 2026-05-02 10:24:03 -07:00
container_lane_eval.sh feat(eval): stabilize full-suite adapter runs 2026-05-02 10:24:03 -07:00
generate_dynamical_report.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
git_checkpoint.py clawbench: per-sweep cache archiving + generic sweep templates 2026-04-18 12:46:45 -07:00
infra_log_gate.sh feat(eval): stabilize full-suite adapter runs 2026-05-02 10:24:03 -07:00
ingest_real_run.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
inject_judge_rubrics.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
refactor_verifiers.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
run_open_vs_closed_bakeoff.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
run_posterior_dynamics_pipeline.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
scale_timeouts.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
seed_historical_db.py ClawBench: 7-model frontier baseline + bake-off tooling 2026-04-10 19:14:11 -07:00
setup_gbrain_runtime.sh feat(eval): stabilize full-suite adapter runs 2026-05-02 10:24:03 -07:00
snr_weighted_ranking.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
survival_analysis.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00
variance_decomp.py Add archive dynamics pipeline and audience-based model presets 2026-04-22 12:03:13 -07:00