build: use direct R2 S3 uploads for docs

Peter Steinberger 2026-05-07 09:31:25 +01:00
parent d7c4211a68
commit bf6c54b4cf
3 changed files with 139 additions and 64 deletions


@@ -67,7 +67,8 @@ jobs:
- name: Upload changed R2 objects
env:
CLOUDFLARE_ACCOUNT_ID: 91b59577e757131d68d55a471fe32aca
CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
CLOUDFLARE_R2_BUCKET: openclaw-docs
R2_UPLOAD_CONCURRENCY: 8
OPENCLAW_R2_ACCESS_KEY_ID: ${{ secrets.OPENCLAW_R2_ACCESS_KEY_ID }}
OPENCLAW_R2_SECRET_ACCESS_KEY: ${{ secrets.OPENCLAW_R2_SECRET_ACCESS_KEY }}
R2_UPLOAD_CONCURRENCY: 64
run: npm run docs:r2:upload


@@ -23,11 +23,11 @@ The repo-side pieces are in place:
- `/concepts/models` -> `concepts/models/index.html`
- `/concepts/models.md` -> `concepts/models.md`
`r2-upload.mjs` downloads `.openclaw-docs-r2-manifest.json` from R2, compares hashes and metadata, uploads only changed objects, and then writes the new manifest back. The first upload seeds everything; later uploads should be small.
`r2-upload.mjs` downloads `.openclaw-docs-r2-manifest.json` from R2, compares hashes and metadata, uploads only changed objects through the R2 S3 API, and then writes the new manifest back. The first upload seeds everything; later uploads should be small.
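The incremental comparison described above can be sketched roughly as follows. This is a minimal illustration, not the filter in `r2-upload.mjs`: the helper name `changedEntries` and the sample manifests are hypothetical, and it assumes each manifest entry carries `key`, `sha256`, `contentType`, and `cacheControl`.

```javascript
// Hypothetical helper: the real filter lives in r2-upload.mjs and may differ.
function changedEntries(localManifest, remoteManifest) {
  // Index the previous upload by object key. A missing remote manifest means
  // nothing has been uploaded yet, so every entry counts as changed.
  const previous = new Map(
    (remoteManifest?.entries ?? []).map((entry) => [entry.key, entry]),
  );
  return localManifest.entries.filter((entry) => {
    const old = previous.get(entry.key);
    return !old
      || old.sha256 !== entry.sha256
      || old.contentType !== entry.contentType
      || old.cacheControl !== entry.cacheControl;
  });
}

// Made-up sample data for illustration only.
const cacheControl = "public, max-age=300";
const remote = { entries: [
  { key: "index.html", sha256: "aaa", contentType: "text/html; charset=utf-8", cacheControl },
  { key: "llms-full.txt", sha256: "bbb", contentType: "text/plain; charset=utf-8", cacheControl },
] };
const local = { entries: [
  { key: "index.html", sha256: "aaa", contentType: "text/html; charset=utf-8", cacheControl },
  { key: "llms-full.txt", sha256: "ccc", contentType: "text/plain; charset=utf-8", cacheControl }, // content changed
  { key: "concepts/models.md", sha256: "ddd", contentType: "text/markdown; charset=utf-8", cacheControl }, // new object
] };

console.log(changedEntries(local, remote).map((entry) => entry.key));
// → [ 'llms-full.txt', 'concepts/models.md' ]
```

With no remote manifest (`changedEntries(local, null)`), all three entries are returned, which matches the "first upload seeds everything" behavior.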
## Current Production State
Production is still on the safe Worker Static Assets fallback until the Cloudflare account can write R2:
Production is still on the safe Worker Static Assets fallback until the R2 custom domain and cache rules are cut over:
- Worker: `openclaw-docs-router`
- Route: `documentation.openclaw.ai/*`
@@ -40,9 +40,9 @@ The fallback uses two cache mechanisms:
- `workers/docs-router.ts` sets headers for slashless docs pages and `Accept: text/markdown` responses because those paths run through Worker code.
- `scripts/docs-site/cloudflare-prune.mjs` writes `dist/docs-site/_headers` so direct asset-first paths like `/assets/docs-site.css`, `/concepts/models.md`, and `/llms-full.txt` get the same cache policy without forcing all traffic through Worker code.
The fallback exists because the Services@openclaw.org Cloudflare token currently cannot access R2. Local verification against account `91b59577e757131d68d55a471fe32aca` fails before bucket operations with Cloudflare API auth error `10000`.
The R2 bucket is already seeded and verified. Do not remove the Worker route or switch `.github/workflows/pages.yml` to R2-only until the R2 custom domain, root rewrite, cache rules, and live smoke have completed successfully.
Do not remove the Worker route or switch `.github/workflows/pages.yml` to R2-only until R2 access is fixed and the R2 workflow has completed successfully.
The fallback remains the rollback path.
## Required Cloudflare Access
@@ -52,7 +52,7 @@ Cloudflare account:
- account id: `91b59577e757131d68d55a471fe32aca`
- zone: `openclaw.ai`
Required token scopes:
Required Cloudflare API token scopes for bucket/domain/DNS setup:
- `Account: R2 Storage: Edit`
- `Zone: DNS: Edit`
@@ -62,6 +62,20 @@ Required token scopes:
R2 must be enabled for the account before bucket creation works.
Required R2 S3 upload credentials:
- `OPENCLAW_R2_ACCESS_KEY_ID`
- `OPENCLAW_R2_SECRET_ACCESS_KEY`
For Cloudflare R2 API tokens, the access key id is the account-token id returned by:
```sh
curl -H "Authorization: Bearer $OPENCLAW_CLOUDFLARE_API_TOKEN" \
"https://api.cloudflare.com/client/v4/accounts/$OPENCLAW_CLOUDFLARE_ACCOUNT_ID/tokens/verify"
```
The secret access key is the SHA-256 hex digest of the R2 token value. These are stored locally in `~/.profile` and should be added to GitHub Actions secrets before enabling the R2 workflow in CI.
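As a rough sketch of that derivation, the secret access key can be computed from the token value with Node's built-in `crypto` module. The helper name `r2SecretAccessKey` is hypothetical, and `"abc"` is a placeholder input, not a real token.

```javascript
// CommonJS require keeps this snippet runnable as a plain script;
// r2-upload.mjs itself uses ESM imports.
const { createHash } = require("node:crypto");

// Hypothetical helper: the input is the raw R2 API token value.
function r2SecretAccessKey(tokenValue) {
  return createHash("sha256").update(tokenValue, "utf8").digest("hex");
}

// Placeholder value for illustration only; never hard-code a real token.
const example = r2SecretAccessKey("abc");
console.log(example.length); // → 64 (hex characters)
```

In practice the token value would come from an environment variable such as `OPENCLAW_CLOUDFLARE_API_TOKEN`, and the result would be stored as `OPENCLAW_R2_SECRET_ACCESS_KEY` rather than printed.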
## Deploy Flow
The production fallback workflow remains:
@@ -72,7 +86,7 @@ The production fallback workflow remains:
4. `npx wrangler@4.88.0 deploy --config wrangler.toml`
5. `docs-live-smoke.yml`
The R2 target workflow is manual until access is fixed:
The R2 target workflow is manual until production cutover:
1. `.github/workflows/r2-pages.yml`
2. `npm run docs:build:r2`
@@ -85,13 +99,15 @@ Local R2 build:
npm run docs:build:r2
```
Local R2 upload after access is fixed:
Local R2 upload:
```sh
source ~/.profile
CLOUDFLARE_ACCOUNT_ID=91b59577e757131d68d55a471fe32aca \
CLOUDFLARE_R2_BUCKET=openclaw-docs \
CLOUDFLARE_API_TOKEN="$CRABBOX_CLOUDFLARE_API_TOKEN" \
OPENCLAW_R2_ACCESS_KEY_ID="$OPENCLAW_R2_ACCESS_KEY_ID" \
OPENCLAW_R2_SECRET_ACCESS_KEY="$OPENCLAW_R2_SECRET_ACCESS_KEY" \
R2_UPLOAD_CONCURRENCY=64 \
npm run docs:r2:upload
```
@@ -127,15 +143,17 @@ After cutover, verify repeated requests show `cf-cache-status: MISS` then `HIT`.
## Cutover Checklist
1. Enable R2 on the Services@openclaw.org account.
2. Fix the GitHub `CLOUDFLARE_API_TOKEN` scopes listed above.
3. Create the bucket:
1. Confirm R2 is enabled on the Services@openclaw.org account.
2. Confirm the GitHub R2 upload secrets are present:
- `OPENCLAW_R2_ACCESS_KEY_ID`
- `OPENCLAW_R2_SECRET_ACCESS_KEY`
3. Confirm the bucket exists:
```sh
source ~/.profile
CLOUDFLARE_ACCOUNT_ID=91b59577e757131d68d55a471fe32aca \
CLOUDFLARE_API_TOKEN="$CRABBOX_CLOUDFLARE_API_TOKEN" \
npx wrangler@4.88.0 r2 bucket create openclaw-docs
CLOUDFLARE_API_TOKEN="$OPENCLAW_CLOUDFLARE_API_TOKEN" \
npx wrangler@4.88.0 r2 bucket list
```
4. Run the manual `R2 Pages` workflow, or run the local upload command above.


@@ -1,7 +1,6 @@
#!/usr/bin/env node
import { spawn } from "node:child_process";
import crypto from "node:crypto";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
const root = process.cwd();
@@ -9,11 +8,19 @@ const bucket = process.env.CLOUDFLARE_R2_BUCKET || "openclaw-docs";
const manifestPath = path.join(root, "dist", "docs-r2-manifest.json");
const remoteManifestKey = ".openclaw-docs-r2-manifest.json";
const concurrency = Number.parseInt(process.env.R2_UPLOAD_CONCURRENCY || "8", 10);
const accountId = process.env.CLOUDFLARE_ACCOUNT_ID || process.env.OPENCLAW_CLOUDFLARE_ACCOUNT_ID || process.env.OPENCLAW_R2_ACCOUNT_ID;
const endpoint = process.env.OPENCLAW_R2_S3_ENDPOINT || (accountId ? `https://${accountId}.r2.cloudflarestorage.com` : "");
const accessKeyId = process.env.OPENCLAW_R2_ACCESS_KEY_ID || process.env.AWS_ACCESS_KEY_ID;
const secretAccessKey = process.env.OPENCLAW_R2_SECRET_ACCESS_KEY || process.env.AWS_SECRET_ACCESS_KEY;
const region = process.env.OPENCLAW_R2_REGION || "auto";
const service = "s3";
const retryAttempts = Number.parseInt(process.env.R2_UPLOAD_RETRIES || "5", 10);
if (!Number.isFinite(concurrency) || concurrency < 1) throw new Error("R2_UPLOAD_CONCURRENCY must be a positive integer");
if (!fs.existsSync(manifestPath)) throw new Error("dist/docs-r2-manifest.json does not exist; run docs:build:r2 first");
if (!process.env.CLOUDFLARE_API_TOKEN) throw new Error("CLOUDFLARE_API_TOKEN is required");
if (!process.env.CLOUDFLARE_ACCOUNT_ID) throw new Error("CLOUDFLARE_ACCOUNT_ID is required");
if (!endpoint) throw new Error("OPENCLAW_R2_S3_ENDPOINT or CLOUDFLARE_ACCOUNT_ID is required");
if (!accessKeyId) throw new Error("OPENCLAW_R2_ACCESS_KEY_ID or AWS_ACCESS_KEY_ID is required");
if (!secretAccessKey) throw new Error("OPENCLAW_R2_SECRET_ACCESS_KEY or AWS_SECRET_ACCESS_KEY is required");
const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
const remoteManifest = await getRemoteManifest();
@@ -28,80 +35,129 @@ const changed = manifest.entries.filter((entry) => {
console.log(`r2 upload plan: ${changed.length}/${manifest.entries.length} changed objects for ${bucket}`);
await uploadEntries(changed);
const manifestSha256 = sha256Hex(fs.readFileSync(manifestPath));
await putObject({
key: remoteManifestKey,
file: manifestPath,
sha256: manifestSha256,
contentType: "application/json; charset=utf-8",
cacheControl: "private, max-age=0, no-store",
});
console.log(`r2 upload ok: ${changed.length} changed objects plus ${remoteManifestKey}`);
async function getRemoteManifest() {
const tempFile = path.join(os.tmpdir(), `openclaw-docs-r2-manifest-${process.pid}.json`);
try {
const result = await runWrangler([
"r2",
"object",
"get",
`${bucket}/${remoteManifestKey}`,
"--file",
tempFile,
"--remote",
], { quiet: true, allowFailure: true });
if (result.code !== 0 || !fs.existsSync(tempFile)) return null;
return JSON.parse(fs.readFileSync(tempFile, "utf8"));
const response = await signedFetchWithRetry("GET", remoteManifestKey);
if (response.status === 404) return null;
if (!response.ok) throw new Error(`GET manifest failed: ${response.status} ${await response.text()}`);
return await response.json();
} catch {
return null;
} finally {
fs.rmSync(tempFile, { force: true });
}
}
async function uploadEntries(entries) {
let next = 0;
let done = 0;
const workers = Array.from({ length: Math.min(concurrency, entries.length) }, async () => {
while (next < entries.length) {
const entry = entries[next++];
await putObject(entry);
done++;
if (done % 500 === 0 || done === entries.length) console.log(`r2 upload progress: ${done}/${entries.length}`);
}
});
await Promise.all(workers);
}
async function putObject(entry) {
const args = [
"r2",
"object",
"put",
`${bucket}/${entry.key}`,
"--file",
path.isAbsolute(entry.file) ? entry.file : path.join(root, entry.file),
"--content-type",
entry.contentType,
"--cache-control",
entry.cacheControl,
"--remote",
"--force",
];
const result = await runWrangler(args);
if (result.code !== 0) throw new Error(`wrangler failed uploading ${entry.key}`);
const file = path.isAbsolute(entry.file) ? entry.file : path.join(root, entry.file);
const body = fs.readFileSync(file);
const response = await signedFetchWithRetry("PUT", entry.key, body, {
"cache-control": entry.cacheControl,
"content-length": String(body.length),
"content-type": entry.contentType,
"x-amz-content-sha256": entry.sha256 || sha256Hex(body),
});
if (!response.ok) throw new Error(`R2 upload failed for ${entry.key}: ${response.status} ${await response.text()}`);
}
function runWrangler(args, options = {}) {
return new Promise((resolve) => {
const child = spawn("npx", ["wrangler@4.88.0", ...args], {
cwd: root,
env: process.env,
stdio: options.quiet ? ["ignore", "pipe", "pipe"] : "inherit",
});
let output = "";
if (options.quiet) {
child.stdout.on("data", (chunk) => { output += chunk; });
child.stderr.on("data", (chunk) => { output += chunk; });
async function signedFetchWithRetry(method, key, body, headers = {}) {
let lastError;
for (let attempt = 0; attempt <= retryAttempts; attempt++) {
try {
const response = await signedFetch(method, key, body, headers);
if (!isRetryableStatus(response.status) || attempt === retryAttempts) return response;
lastError = new Error(`HTTP ${response.status}`);
await response.arrayBuffer().catch(() => {});
} catch (error) {
lastError = error;
if (attempt === retryAttempts) throw error;
}
child.on("close", (code) => {
if (code !== 0 && !options.allowFailure && options.quiet) process.stderr.write(output);
resolve({ code, output });
});
await new Promise((resolve) => setTimeout(resolve, retryDelay(attempt)));
}
throw lastError;
}
async function signedFetch(method, key, body, headers = {}) {
const url = new URL(`${endpoint.replace(/\/$/, "")}/${bucket}/${encodeS3Key(key)}`);
const now = new Date();
const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, "");
const date = amzDate.slice(0, 8);
const normalizedHeaders = {
host: url.host,
"x-amz-content-sha256": headers["x-amz-content-sha256"] || "UNSIGNED-PAYLOAD",
"x-amz-date": amzDate,
...Object.fromEntries(Object.entries(headers).map(([name, value]) => [name.toLowerCase(), String(value)])),
};
const signedHeaders = Object.keys(normalizedHeaders).sort();
const canonicalHeaders = signedHeaders.map((name) => `${name}:${normalizeHeader(normalizedHeaders[name])}\n`).join("");
const canonicalRequest = [
method,
url.pathname,
"",
canonicalHeaders,
signedHeaders.join(";"),
normalizedHeaders["x-amz-content-sha256"],
].join("\n");
const scope = `${date}/${region}/${service}/aws4_request`;
const stringToSign = [
"AWS4-HMAC-SHA256",
amzDate,
scope,
sha256Hex(canonicalRequest),
].join("\n");
const signingKey = hmac(hmac(hmac(hmac(`AWS4${secretAccessKey}`, date), region), service), "aws4_request");
const signature = hmac(signingKey, stringToSign, "hex");
const authorization = `AWS4-HMAC-SHA256 Credential=${accessKeyId}/${scope}, SignedHeaders=${signedHeaders.join(";")}, Signature=${signature}`;
return fetch(url, {
body,
headers: { ...normalizedHeaders, authorization },
method,
});
}
function isRetryableStatus(status) {
return status === 408 || status === 409 || status === 425 || status === 429 || status >= 500;
}
function retryDelay(attempt) {
return Math.min(10_000, 500 * 2 ** attempt) + Math.floor(Math.random() * 250);
}
function encodeS3Key(key) {
return key.split("/").map((segment) => encodeURIComponent(segment)).join("/");
}
function hmac(key, value, encoding) {
return crypto.createHmac("sha256", key).update(value).digest(encoding);
}
function normalizeHeader(value) {
return String(value).trim().replace(/\s+/g, " ");
}
function sha256Hex(value) {
return crypto.createHash("sha256").update(value).digest("hex");
}
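
To make the retry behavior of `signedFetchWithRetry` concrete, the sketch below tabulates the backoff schedule. It reuses the base formula from `retryDelay` without the random jitter and assumes `R2_UPLOAD_RETRIES` is unset, so `retryAttempts` is 5. The name `baseRetryDelay` is introduced here for illustration only.

```javascript
// Same base formula as retryDelay(), minus the random 0-249 ms jitter.
function baseRetryDelay(attempt) {
  return Math.min(10_000, 500 * 2 ** attempt);
}

// Assumed default: R2_UPLOAD_RETRIES unset, so retryAttempts is 5.
const retryAttempts = 5;

// The loop sleeps after attempts 0..retryAttempts-1; the final attempt
// either returns the response or throws, so it is never followed by a sleep.
const schedule = Array.from({ length: retryAttempts }, (_, attempt) => baseRetryDelay(attempt));
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);

console.log(schedule); // → [ 500, 1000, 2000, 4000, 8000 ]
console.log(totalMs);  // → 15500
```

Under these assumptions a persistently failing object is requested six times and waits about 15.5 seconds plus jitter before the error surfaces. The 10-second cap only applies from attempt 5 onward, so it matters only when `R2_UPLOAD_RETRIES` is raised above the default.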