diff --git a/.github/workflows/r2-pages.yml b/.github/workflows/r2-pages.yml
index 43f684d85..1f1ed79e1 100644
--- a/.github/workflows/r2-pages.yml
+++ b/.github/workflows/r2-pages.yml
@@ -67,7 +67,8 @@ jobs:
       - name: Upload changed R2 objects
         env:
           CLOUDFLARE_ACCOUNT_ID: 91b59577e757131d68d55a471fe32aca
-          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
           CLOUDFLARE_R2_BUCKET: openclaw-docs
-          R2_UPLOAD_CONCURRENCY: 8
+          OPENCLAW_R2_ACCESS_KEY_ID: ${{ secrets.OPENCLAW_R2_ACCESS_KEY_ID }}
+          OPENCLAW_R2_SECRET_ACCESS_KEY: ${{ secrets.OPENCLAW_R2_SECRET_ACCESS_KEY }}
+          R2_UPLOAD_CONCURRENCY: 64
         run: npm run docs:r2:upload
diff --git a/CLOUDFLARE.md b/CLOUDFLARE.md
index cc2c53c10..5fb8c63b5 100644
--- a/CLOUDFLARE.md
+++ b/CLOUDFLARE.md
@@ -23,11 +23,11 @@ The repo-side pieces are in place:
 - `/concepts/models` -> `concepts/models/index.html`
 - `/concepts/models.md` -> `concepts/models.md`

-`r2-upload.mjs` downloads `.openclaw-docs-r2-manifest.json` from R2, compares hashes and metadata, uploads only changed objects, and then writes the new manifest back. The first upload seeds everything; later uploads should be small.
+`r2-upload.mjs` downloads `.openclaw-docs-r2-manifest.json` from R2, compares hashes and metadata, uploads only changed objects through the R2 S3 API, and then writes the new manifest back. The first upload seeds everything; later uploads should be small.

 ## Current Production State

-Production is still on the safe Worker Static Assets fallback until the Cloudflare account can write R2:
+Production is still on the safe Worker Static Assets fallback until the R2 custom domain and cache rules are cut over:

 - Worker: `openclaw-docs-router`
 - Route: `documentation.openclaw.ai/*`
@@ -40,9 +40,9 @@ The fallback uses two cache mechanisms:
 - `workers/docs-router.ts` sets headers for slashless docs pages and `Accept: text/markdown` responses because those paths run through Worker code.
 - `scripts/docs-site/cloudflare-prune.mjs` writes `dist/docs-site/_headers` so direct asset-first paths like `/assets/docs-site.css`, `/concepts/models.md`, and `/llms-full.txt` get the same cache policy without forcing all traffic through Worker code.

-The fallback exists because the Services@openclaw.org Cloudflare token currently cannot access R2. Local verification against account `91b59577e757131d68d55a471fe32aca` fails before bucket operations with Cloudflare API auth error `10000`.
+The R2 bucket is already seeded and verified. Do not remove the Worker route or switch `.github/workflows/pages.yml` to R2-only until the R2 custom domain, root rewrite, cache rules, and live smoke have completed successfully.

-Do not remove the Worker route or switch `.github/workflows/pages.yml` to R2-only until R2 access is fixed and the R2 workflow has completed successfully.
+The fallback remains the rollback path.

 ## Required Cloudflare Access

@@ -52,7 +52,7 @@ Cloudflare account:
 - account id: `91b59577e757131d68d55a471fe32aca`
 - zone: `openclaw.ai`

-Required token scopes:
+Required Cloudflare API token scopes for bucket/domain/DNS setup:

 - `Account: R2 Storage: Edit`
 - `Zone: DNS: Edit`
@@ -62,6 +62,20 @@ Required token scopes:

 R2 must be enabled for the account before bucket creation works.

+Required R2 S3 upload credentials:
+
+- `OPENCLAW_R2_ACCESS_KEY_ID`
+- `OPENCLAW_R2_SECRET_ACCESS_KEY`
+
+For Cloudflare R2 API tokens, the access key id is the account-token id returned by:
+
+```sh
+curl -H "Authorization: Bearer $OPENCLAW_CLOUDFLARE_API_TOKEN" \
+  "https://api.cloudflare.com/client/v4/accounts/$OPENCLAW_CLOUDFLARE_ACCOUNT_ID/tokens/verify"
+```
+
+The secret access key is the SHA-256 hex digest of the R2 token value. These are stored locally in `~/.profile` and should be added to GitHub Actions secrets before enabling the R2 workflow in CI.
+
 ## Deploy Flow

 The production fallback workflow remains:
@@ -72,7 +86,7 @@ The production fallback workflow remains:
 4. `npx wrangler@4.88.0 deploy --config wrangler.toml`
 5. `docs-live-smoke.yml`

-The R2 target workflow is manual until access is fixed:
+The R2 target workflow is manual until production cutover:

 1. `.github/workflows/r2-pages.yml`
 2. `npm run docs:build:r2`
@@ -85,13 +99,15 @@ Local R2 build:
 npm run docs:build:r2
 ```

-Local R2 upload after access is fixed:
+Local R2 upload:

 ```sh
 source ~/.profile
 CLOUDFLARE_ACCOUNT_ID=91b59577e757131d68d55a471fe32aca \
 CLOUDFLARE_R2_BUCKET=openclaw-docs \
-CLOUDFLARE_API_TOKEN="$CRABBOX_CLOUDFLARE_API_TOKEN" \
+OPENCLAW_R2_ACCESS_KEY_ID="$OPENCLAW_R2_ACCESS_KEY_ID" \
+OPENCLAW_R2_SECRET_ACCESS_KEY="$OPENCLAW_R2_SECRET_ACCESS_KEY" \
+R2_UPLOAD_CONCURRENCY=64 \
 npm run docs:r2:upload
 ```

@@ -127,15 +143,17 @@ After cutover, verify repeated requests show `cf-cache-status: MISS` then `HIT`.

 ## Cutover Checklist

-1. Enable R2 on the Services@openclaw.org account.
-2. Fix the GitHub `CLOUDFLARE_API_TOKEN` scopes listed above.
-3. Create the bucket:
+1. Confirm R2 is enabled on the Services@openclaw.org account.
+2. Confirm the GitHub R2 upload secrets are present:
+   - `OPENCLAW_R2_ACCESS_KEY_ID`
+   - `OPENCLAW_R2_SECRET_ACCESS_KEY`
+3. Confirm the bucket exists:

    ```sh
    source ~/.profile
    CLOUDFLARE_ACCOUNT_ID=91b59577e757131d68d55a471fe32aca \
-   CLOUDFLARE_API_TOKEN="$CRABBOX_CLOUDFLARE_API_TOKEN" \
-   npx wrangler@4.88.0 r2 bucket create openclaw-docs
+   CLOUDFLARE_API_TOKEN="$OPENCLAW_CLOUDFLARE_API_TOKEN" \
+   npx wrangler@4.88.0 r2 bucket list
    ```

 4. Run the manual `R2 Pages` workflow, or run the local upload command above.
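Review note on the credential rule documented in `CLOUDFLARE.md`: the secret access key is described there as the SHA-256 hex digest of the R2 token value. A minimal sketch of that derivation in Node follows. The helper name `deriveR2SecretAccessKey` is hypothetical and not part of this change, and CommonJS `require` is used only so the snippet runs as a standalone script (the module in this patch uses ESM `import`).

```javascript
// Sketch only: derive the S3-style secret access key from an R2 API token value,
// following the rule stated in CLOUDFLARE.md (SHA-256 hex digest of the token).
const crypto = require("node:crypto");

// Hypothetical helper; not part of r2-upload.mjs.
function deriveR2SecretAccessKey(tokenValue) {
  if (typeof tokenValue !== "string" || tokenValue.length === 0) {
    throw new Error("token value must be a non-empty string");
  }
  // Hash the raw token bytes and return lowercase hex (64 characters).
  return crypto.createHash("sha256").update(tokenValue, "utf8").digest("hex");
}
```

The result would be the value stored as `OPENCLAW_R2_SECRET_ACCESS_KEY`; the access key id still comes from the `tokens/verify` call shown in the documentation.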
diff --git a/scripts/docs-site/r2-upload.mjs b/scripts/docs-site/r2-upload.mjs
index 7b2fde6b9..11bb0eb5b 100644
--- a/scripts/docs-site/r2-upload.mjs
+++ b/scripts/docs-site/r2-upload.mjs
@@ -1,7 +1,6 @@
 #!/usr/bin/env node
-import { spawn } from "node:child_process";
+import crypto from "node:crypto";
 import fs from "node:fs";
-import os from "node:os";
 import path from "node:path";

 const root = process.cwd();
@@ -9,11 +8,19 @@ const bucket = process.env.CLOUDFLARE_R2_BUCKET || "openclaw-docs";
 const manifestPath = path.join(root, "dist", "docs-r2-manifest.json");
 const remoteManifestKey = ".openclaw-docs-r2-manifest.json";
 const concurrency = Number.parseInt(process.env.R2_UPLOAD_CONCURRENCY || "8", 10);
+const accountId = process.env.CLOUDFLARE_ACCOUNT_ID || process.env.OPENCLAW_CLOUDFLARE_ACCOUNT_ID || process.env.OPENCLAW_R2_ACCOUNT_ID;
+const endpoint = process.env.OPENCLAW_R2_S3_ENDPOINT || (accountId ? `https://${accountId}.r2.cloudflarestorage.com` : "");
+const accessKeyId = process.env.OPENCLAW_R2_ACCESS_KEY_ID || process.env.AWS_ACCESS_KEY_ID;
+const secretAccessKey = process.env.OPENCLAW_R2_SECRET_ACCESS_KEY || process.env.AWS_SECRET_ACCESS_KEY;
+const region = process.env.OPENCLAW_R2_REGION || "auto";
+const service = "s3";
+const retryAttempts = Number.parseInt(process.env.R2_UPLOAD_RETRIES || "5", 10);

 if (!Number.isFinite(concurrency) || concurrency < 1) throw new Error("R2_UPLOAD_CONCURRENCY must be a positive integer");
 if (!fs.existsSync(manifestPath)) throw new Error("dist/docs-r2-manifest.json does not exist; run docs:build:r2 first");
-if (!process.env.CLOUDFLARE_API_TOKEN) throw new Error("CLOUDFLARE_API_TOKEN is required");
-if (!process.env.CLOUDFLARE_ACCOUNT_ID) throw new Error("CLOUDFLARE_ACCOUNT_ID is required");
+if (!endpoint) throw new Error("OPENCLAW_R2_S3_ENDPOINT or CLOUDFLARE_ACCOUNT_ID is required");
+if (!accessKeyId) throw new Error("OPENCLAW_R2_ACCESS_KEY_ID or AWS_ACCESS_KEY_ID is required");
+if (!secretAccessKey) throw new Error("OPENCLAW_R2_SECRET_ACCESS_KEY or AWS_SECRET_ACCESS_KEY is required");

 const manifest = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
 const remoteManifest = await getRemoteManifest();
@@ -28,80 +35,129 @@ const changed = manifest.entries.filter((entry) => {

 console.log(`r2 upload plan: ${changed.length}/${manifest.entries.length} changed objects for ${bucket}`);
 await uploadEntries(changed);
+const manifestSha256 = sha256Hex(fs.readFileSync(manifestPath));
 await putObject({
   key: remoteManifestKey,
   file: manifestPath,
+  sha256: manifestSha256,
   contentType: "application/json; charset=utf-8",
   cacheControl: "private, max-age=0, no-store",
 });
 console.log(`r2 upload ok: ${changed.length} changed objects plus ${remoteManifestKey}`);

 async function getRemoteManifest() {
-  const tempFile = path.join(os.tmpdir(), `openclaw-docs-r2-manifest-${process.pid}.json`);
   try {
-    const result = await runWrangler([
-      "r2",
-      "object",
-      "get",
-      `${bucket}/${remoteManifestKey}`,
-      "--file",
-      tempFile,
-      "--remote",
-    ], { quiet: true, allowFailure: true });
-    if (result.code !== 0 || !fs.existsSync(tempFile)) return null;
-    return JSON.parse(fs.readFileSync(tempFile, "utf8"));
+    const response = await signedFetchWithRetry("GET", remoteManifestKey);
+    if (response.status === 404) return null;
+    if (!response.ok) throw new Error(`GET manifest failed: ${response.status} ${await response.text()}`);
+    return await response.json();
   } catch {
     return null;
-  } finally {
-    fs.rmSync(tempFile, { force: true });
   }
 }

 async function uploadEntries(entries) {
   let next = 0;
+  let done = 0;
   const workers = Array.from({ length: Math.min(concurrency, entries.length) }, async () => {
     while (next < entries.length) {
       const entry = entries[next++];
       await putObject(entry);
+      done++;
+      if (done % 500 === 0 || done === entries.length) console.log(`r2 upload progress: ${done}/${entries.length}`);
     }
   });
   await Promise.all(workers);
 }

 async function putObject(entry) {
-  const args = [
-    "r2",
-    "object",
-    "put",
-    `${bucket}/${entry.key}`,
-    "--file",
-    path.isAbsolute(entry.file) ? entry.file : path.join(root, entry.file),
-    "--content-type",
-    entry.contentType,
-    "--cache-control",
-    entry.cacheControl,
-    "--remote",
-    "--force",
-  ];
-  const result = await runWrangler(args);
-  if (result.code !== 0) throw new Error(`wrangler failed uploading ${entry.key}`);
+  const file = path.isAbsolute(entry.file) ? entry.file : path.join(root, entry.file);
+  const body = fs.readFileSync(file);
+  const response = await signedFetchWithRetry("PUT", entry.key, body, {
+    "cache-control": entry.cacheControl,
+    "content-length": String(body.length),
+    "content-type": entry.contentType,
+    "x-amz-content-sha256": entry.sha256 || sha256Hex(body),
+  });
+  if (!response.ok) throw new Error(`R2 upload failed for ${entry.key}: ${response.status} ${await response.text()}`);
 }

-function runWrangler(args, options = {}) {
-  return new Promise((resolve) => {
-    const child = spawn("npx", ["wrangler@4.88.0", ...args], {
-      cwd: root,
-      env: process.env,
-      stdio: options.quiet ? ["ignore", "pipe", "pipe"] : "inherit",
-    });
-    let output = "";
-    if (options.quiet) {
-      child.stdout.on("data", (chunk) => { output += chunk; });
-      child.stderr.on("data", (chunk) => { output += chunk; });
+async function signedFetchWithRetry(method, key, body, headers = {}) {
+  let lastError;
+  for (let attempt = 0; attempt <= retryAttempts; attempt++) {
+    try {
+      const response = await signedFetch(method, key, body, headers);
+      if (!isRetryableStatus(response.status) || attempt === retryAttempts) return response;
+      lastError = new Error(`HTTP ${response.status}`);
+      await response.arrayBuffer().catch(() => {});
+    } catch (error) {
+      lastError = error;
+      if (attempt === retryAttempts) throw error;
     }
-    child.on("close", (code) => {
-      if (code !== 0 && !options.allowFailure && options.quiet) process.stderr.write(output);
-      resolve({ code, output });
-    });
+    await new Promise((resolve) => setTimeout(resolve, retryDelay(attempt)));
+  }
+  throw lastError;
+}
+
+async function signedFetch(method, key, body, headers = {}) {
+  const url = new URL(`${endpoint.replace(/\/$/, "")}/${bucket}/${encodeS3Key(key)}`);
+  const now = new Date();
+  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, "");
+  const date = amzDate.slice(0, 8);
+  const normalizedHeaders = {
+    host: url.host,
+    "x-amz-content-sha256": headers["x-amz-content-sha256"] || "UNSIGNED-PAYLOAD",
+    "x-amz-date": amzDate,
+    ...Object.fromEntries(Object.entries(headers).map(([name, value]) => [name.toLowerCase(), String(value)])),
+  };
+  const signedHeaders = Object.keys(normalizedHeaders).sort();
+  const canonicalHeaders = signedHeaders.map((name) => `${name}:${normalizeHeader(normalizedHeaders[name])}\n`).join("");
+  const canonicalRequest = [
+    method,
+    url.pathname,
+    "",
+    canonicalHeaders,
+    signedHeaders.join(";"),
+    normalizedHeaders["x-amz-content-sha256"],
+  ].join("\n");
+  const scope = `${date}/${region}/${service}/aws4_request`;
+  const stringToSign = [
+    "AWS4-HMAC-SHA256",
+    amzDate,
+    scope,
+    sha256Hex(canonicalRequest),
+  ].join("\n");
+  const signingKey = hmac(hmac(hmac(hmac(`AWS4${secretAccessKey}`, date), region), service), "aws4_request");
+  const signature = hmac(signingKey, stringToSign, "hex");
+  const authorization = `AWS4-HMAC-SHA256 Credential=${accessKeyId}/${scope}, SignedHeaders=${signedHeaders.join(";")}, Signature=${signature}`;
+
+  return fetch(url, {
+    body,
+    headers: { ...normalizedHeaders, authorization },
+    method,
   });
 }
+
+function isRetryableStatus(status) {
+  return status === 408 || status === 409 || status === 425 || status === 429 || status >= 500;
+}
+
+function retryDelay(attempt) {
+  return Math.min(10_000, 500 * 2 ** attempt) + Math.floor(Math.random() * 250);
+}
+
+function encodeS3Key(key) {
+  return key.split("/").map((segment) => encodeURIComponent(segment)).join("/");
+}
+
+function hmac(key, value, encoding) {
+  return crypto.createHmac("sha256", key).update(value).digest(encoding);
+}
+
+function normalizeHeader(value) {
+  return String(value).trim().replace(/\s+/g, " ");
+}
+
+function sha256Hex(value) {
+  return crypto.createHash("sha256").update(value).digest("hex");
+}
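Review note on the retry loop in `signedFetchWithRetry`: the loop runs `attempt` from 0 through `retryAttempts` inclusive, so the default `R2_UPLOAD_RETRIES=5` appears to allow up to six requests with five sleeps between them. The sketch below reproduces the delay formula from `retryDelay` with the random jitter left out so the schedule is deterministic. The names `baseDelayMs` and `backoffSchedule` are hypothetical and used only for illustration.

```javascript
// Sketch only: mirrors the backoff formula in retryDelay() without the random jitter.
// baseDelayMs and backoffSchedule are hypothetical helpers, not part of r2-upload.mjs.
function baseDelayMs(attempt) {
  // Exponential growth from 500 ms, capped at 10 seconds.
  return Math.min(10_000, 500 * 2 ** attempt);
}

function backoffSchedule(retryAttempts) {
  // A sleep follows every attempt except the last one, so there are
  // retryAttempts sleeps for retryAttempts + 1 requests.
  return Array.from({ length: retryAttempts }, (_, attempt) => baseDelayMs(attempt));
}

const schedule = backoffSchedule(5);
const totalMs = schedule.reduce((sum, ms) => sum + ms, 0);
console.log(schedule, totalMs);
```

With the defaults, an object that keeps returning a retryable status would wait at least 15.5 seconds in total, plus up to 249 ms of jitter per sleep, before the final response is handed back to `putObject`. At `R2_UPLOAD_CONCURRENCY=64` that worst case applies per worker, which may be worth keeping in mind when reading CI durations.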