Per request: drop the Docker-base-pinning approach and the inline
reference scores. Treat published numbers as version-, provider-, and
seed-dependent.
Dockerfile: revert FROM ghcr.io/openclaw/openclaw:2026.4.15-beta.1
back to FROM ghcr.io/openclaw/openclaw:latest, so builds track the
current OpenClaw release. The state-isolation patch and the rejudge
pipeline (the load-bearing reproducibility infra) stay in place;
only the version pin is reverted.
README.md:
- drops the "Docker base pinning" row from the "What's new" table and
  replaces it with "Reproducibility-first infrastructure" framing
- drops the "pinned" badge and adds a "Diagnostics" badge instead
- updates "Reproducibility caveats" to recommend "build both sides
  of any comparison from the same OpenClaw release" rather than
  "pin to 2026.4.15-beta.1"
- updates Quick Start to record (not assume) the OpenClaw version
  the build resolved to
- drops the pinned-base row from the comparison table and replaces
  it with "State-isolation per run" (the distinguishing infra)
- updates the version log entry for Core v1 to highlight the
  dynamical-systems diagnostics and state isolation rather than the
  pinning that is no longer there
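A minimal sketch of what "record (not assume) the OpenClaw version" could look like in practice. It parses the JSON array printed by `docker image inspect` and pulls out the image ID and repo digests. The helper name `resolved_image_info` is hypothetical and not part of ClawBench, and whether the OpenClaw image sets the standard OCI label `org.opencontainers.image.version` is an assumption; the field is simply None when the label is absent.

```python
import json


def resolved_image_info(inspect_json: str) -> dict:
    """Summarize what a mutable tag such as :latest resolved to.

    `inspect_json` is the raw stdout of `docker image inspect <image>`,
    which is a JSON array with one object per inspected image.
    Hypothetical helper for illustration; not part of ClawBench.
    """
    image = json.loads(inspect_json)[0]
    # Config or Labels may be missing or null, so fall back to {}.
    labels = (image.get("Config") or {}).get("Labels") or {}
    return {
        "id": image.get("Id"),
        "repo_digests": image.get("RepoDigests") or [],
        # Assumption: the base image may or may not carry this OCI label.
        "version_label": labels.get("org.opencontainers.image.version"),
    }
```

Storing this dictionary next to each results file would let both sides of a comparison be checked against the same resolved image, in line with the updated "Reproducibility caveats" wording.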
tasks-public/README.md:
- drops the 8-row "Established ranking" table per request
- replaces it with a "Selection criteria" section that explains how
  the 19 tasks were chosen (0 inversions, min-gap 0.0049) without
  publishing version-dependent scores
- reframes the build instructions to track :latest, with a comment
  about platform-version drift
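One plausible reading of the two selection statistics, sketched under stated assumptions: an inversion is a pair of systems whose measured scores contradict the expected ordering, and min-gap is the smallest difference between neighbouring scores once sorted. The commit does not define either term, so both function names and both definitions here are hypothetical, and the numbers in the usage note are made up for illustration rather than taken from any benchmark run.

```python
from itertools import combinations


def count_inversions(scores_in_expected_order):
    """Count pairs where a system expected to rank higher scored lower.

    The input is ordered best-expected first. Ties are not counted.
    """
    return sum(
        1
        for earlier, later in combinations(scores_in_expected_order, 2)
        if earlier < later
    )


def min_adjacent_gap(scores):
    """Smallest difference between neighbouring scores after sorting.

    Needs at least two scores; min() raises ValueError otherwise.
    """
    ordered = sorted(scores, reverse=True)
    return min(high - low for high, low in zip(ordered, ordered[1:]))
```

For example, the illustrative scores `[0.75, 0.5, 0.25]` give zero inversions and a minimum gap of 0.25, while swapping the first two entries produces one inversion.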
tasks-public/MANIFEST.yaml:
- drops `openclaw_version: 2026.4.15-beta.1` (could be misread as
  a hard requirement)
- drops the `established_ranking` block
- replaces it with `selection_basis`, which documents the methodology
  and explicitly states why scores are intentionally omitted
Test suite still green: 156 passed locally, 152 passed in the
CI-equivalent (no private tasks/) configuration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
48 lines
1.3 KiB
Docker
# ClawBench HF Docker Space
# Layer the benchmark harness on top of the official OpenClaw image.

FROM ghcr.io/openclaw/openclaw:latest

USER root

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y python3-pip python-is-python3 && \
    rm -rf /var/lib/apt/lists/*

RUN ln -s /app /openclaw

ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
RUN npx -y playwright@1.59.1 install --with-deps chromium && \
    CHROME_PATH="$(find /ms-playwright -path '*/chrome' -type f | sort | head -n 1)" && \
    test -x "$CHROME_PATH" && \
    ln -sf "$CHROME_PATH" /usr/bin/chromium

ENV HOME=/home/node PATH=/home/node/.local/bin:$PATH
WORKDIR /home/node/app

COPY --chown=node:node pyproject.toml README.md ./
COPY --chown=node:node clawbench/ clawbench/
COPY --chown=node:node tasks/ tasks/
COPY --chown=node:node baselines/ baselines/
COPY --chown=node:node app.py .

RUN python3 -m pip install --break-system-packages --no-cache-dir .

RUN mkdir -p \
    /data/results \
    /data/queue \
    /home/node/.openclaw/agents/dev \
    /home/node/.openclaw/agents/main/agent && \
    chown -R node:node /data /home/node/.openclaw && \
    chmod -R 777 /data /home/node/.openclaw

USER node

ENV GATEWAY_PORT=18789
ENV OPENCLAW_HOME=/home/node
ENV OPENCLAW_STATE_DIR=/home/node/.openclaw

EXPOSE 7860
CMD ["python", "app.py"]