# Contributing to ClawBench

Thank you for your interest in contributing. This document explains how to get
set up, what kinds of contributions are welcome, and how the review process
works.

---

## Getting started

**Requirements:** Python 3.11+, Docker (for full end-to-end runs).

```bash
git clone https://github.com/openclaw/clawbench.git
cd clawbench
python -m venv .venv && source .venv/bin/activate
python -m pip install -e ".[dev]"
```

Run the test suite to confirm everything is working:

```bash
python -m pytest -q
python -m ruff check clawbench app.py scripts tests
```

The full local suite should pass before you make any changes.

---

## What we welcome

| Type | Notes |
|------|-------|
| **Bug fixes** | Include a test that reproduces the bug before the fix |
| **New tasks** | See [Adding tasks](#adding-tasks) below |
| **Scoring improvements** | Changes to `trajectory.py`, `scorer.py`, or `judge.py` must include updated tests and a clear rationale |
| **Documentation** | Fixes to README, spec docs, or inline comments |
| **Tooling / CI** | Workflow improvements, linting, dependency updates |

We are unlikely to merge:
- Large architectural rewrites without prior discussion in an issue
- New dependencies without justification
- Changes that reduce test coverage

---

## Making a change

1. **Open an issue first** for anything non-trivial. This lets us align on
   approach before you invest time writing code.

2. **Create a branch** from `main`:
   ```bash
   git checkout -b fix/short-description
   ```
   Branch names: `fix/`, `feat/`, `docs/`, `chore/` prefixes.

3. **Write tests.** Bug fixes must include a test that fails before the fix
   and passes after. New features must include tests covering the new
   behaviour.

4. **Run the test suite:**
   ```bash
   python -m pytest -q
   ```

5. **Open a pull request** against `main`. Fill in the PR template.

---

## Adding tasks

Public tasks live in `tasks-public/tier{1-5}/` as YAML files. Domain and
partner tasks live under `tasks-domain/`. Each task needs:

- A unique `id` and descriptive `name`
- The correct `tier` (1 = simple single-tool, 5 = adversarial/multi-step)
- `completion` checks — at least one deterministic verifier (`execution_checks`,
  `file_equality`, or a gateway assertion)
- `trajectory` expectations that reflect how a competent agent should approach
  the task
- A `judge` rubric for semantic tasks

Before submitting a new task, run it against at least one agent to verify the
completion checks fire correctly.

---

## Commit style

```
type: short imperative summary (≤72 chars)

Optional longer explanation. Wrap at 72 chars. Explain *why*, not what —
the diff shows what changed.
```

Types: `fix`, `feat`, `docs`, `test`, `chore`, `refactor`.

---

## Code style

The project uses Ruff and pre-commit for local guardrails. Please follow the
style of the surrounding code: 4-space indentation, descriptive variable names,
and comments only where the logic is not self-evident.

```bash
python -m ruff check clawbench app.py scripts tests
pre-commit run --files <changed files>
```

---

## Reporting bugs

Use the [bug report template](.github/ISSUE_TEMPLATE/bug_report.md). Include:
- The command you ran
- The full error output or unexpected behaviour
- The Python version and OS

---

## Questions

Open an issue for questions that are not bug reports or feature requests.