[BREAKGLASS] OpenClaw runtime validation lab
Go to file
2026-05-01 11:41:26 +01:00
.github/workflows feat: add release tooling 2026-04-29 12:45:35 +01:00
artifacts feat: add kova runtime CLI scaffold 2026-04-29 10:44:11 +01:00
bin feat: add kova runtime CLI scaffold 2026-04-29 10:44:11 +01:00
docs fix: harden cleanup and clone safety 2026-05-01 11:09:03 +01:00
fixtures feat: complete live provider auth attribution 2026-05-01 08:18:50 +01:00
metrics feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
process-roles feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
profiles feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
reports feat: add kova runtime CLI scaffold 2026-04-29 10:44:11 +01:00
scenarios feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
scripts feat: collect openclaw diagnostics timeline 2026-04-29 17:46:41 +01:00
src feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
states feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
support feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
surfaces feat: add mcp runtime validation scenario 2026-05-01 11:41:26 +01:00
.gitignore feat: add kova release packaging 2026-04-29 12:23:41 +01:00
AGENTS.md docs: document kova agent workflow 2026-04-29 10:45:01 +01:00
install.sh feat: add kova version command 2026-04-29 12:30:52 +01:00
LICENSE Initial commit 2026-04-29 10:41:49 +01:00
package.json feat: add release tooling 2026-04-29 12:45:35 +01:00
README.md feat: use OpenClaw onboard for live auth 2026-05-01 10:21:41 +01:00

Kova

Kova is the OpenClaw runtime validation lab.

Kova runs real OpenClaw release, upgrade, plugin, gateway, and performance scenarios. It uses OCM as the lab control plane for isolated envs and runtimes, but Kova reports on OpenClaw behavior.

What Kova Tests

  • fresh OpenClaw installs
  • existing-user upgrades
  • gateway startup and readiness
  • bundled plugin/runtime dependency behavior
  • plugin lifecycle paths
  • model/provider discovery paths
  • dashboard, TUI, and API responsiveness
  • memory, CPU, latency, and startup regressions

Kova is not a unit test runner. It should exercise OpenClaw the way users and release builds actually run it.

Kova is designed for agents and humans:

  • agents consume JSON plans and JSON reports
  • humans read concise Markdown reports
  • successful command output stays out of Markdown noise
  • real execution is explicit and cleanup-aware

When Codex or another agent has access to the ocm-operator skill, it should load that skill before executing Kova scenarios. The skill gives the agent the OCM operating knowledge needed for safe env cloning, runtime builds, upgrades, service inspection, logs, and cleanup. Kova still reports OpenClaw behavior.

Install the skill when it is missing:

codex skills install https://github.com/shakkernerd/ocm/tree/main/skills/ocm-operator

Commands

node bin/kova.mjs version
node bin/kova.mjs setup
node bin/kova.mjs setup --non-interactive --auth env-only --provider openai --env-var OPENAI_API_KEY
node bin/kova.mjs setup --non-interactive --auth env-only --provider openai --env-var OPENAI_API_KEY --fallback-policy external-cli
node bin/kova.mjs setup --ci --json
node bin/kova.mjs self-check
node bin/kova.mjs plan
node bin/kova.mjs plan --json
node bin/kova.mjs matrix plan --profile smoke --target runtime:stable --json
node bin/kova.mjs matrix run --profile smoke --target runtime:stable --json
node bin/kova.mjs matrix run --profile release --target channel:beta --include tag:plugins --parallel 2 --json
node bin/kova.mjs matrix run --profile release --target local-build:/path/to/openclaw --execute --gate --json
node bin/kova.mjs matrix run --profile smoke --target runtime:stable --repeat 3 --execute --save-baseline --reviewed-good --json
node bin/kova.mjs report compare reports/baseline.json reports/current.json --json
node bin/kova.mjs plan --scenario fresh-install
node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install
node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install --state missing-plugin-index --json
node bin/kova.mjs cleanup envs

Kova runtime data lives outside the repo by default:

~/.kova/
  credentials/
  reports/
  artifacts/
  baselines/

Set KOVA_HOME to use a different data home.

Interactive setup asks for provider first, then auth method. Provider and auth answers accept either the displayed number or the name, for example 2 or anthropic, 3 or api-key. External CLI auth is strict: Kova verifies the selected CLI binary and local auth evidence before setup can pass. openai + external-cli uses Codex CLI; anthropic + external-cli uses Claude CLI. custom-openai should use API-key or env-only auth. External CLI fallback is not automatic; set --fallback-policy external-cli when a live API-key/env-only run may use the selected local CLI if the live env var is missing.

run is dry-run by default. It writes Markdown and JSON reports showing the planned OpenClaw scenario.

Every Kova-created disposable OpenClaw env receives deliberate model auth unless the scenario/state explicitly tests missing or broken auth. --auth mock is the default and uses Kova's deterministic local OpenAI-compatible provider. --auth live requires credentials configured through kova setup; live results are marked environment-dependent and should be compared separately from mock baselines. For supported API-key/env-only providers, Kova configures live auth through OpenClaw's own non-interactive onboard path with env-backed SecretRefs. Live paths without a stable OpenClaw command path are labeled fixture setup and must not be cited as proof that OpenClaw onboarding/auth UX passed.

plan --json is coverage-aware: scenarios map to declared OpenClaw surfaces, surfaces declare process roles and required metrics, and profile coverage gaps are visible before a run starts.

States are validated contracts too. A profile cannot pair a scenario with a state that is incompatible with the scenario's surface.

Real execution is explicit:

node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install --execute
node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install --state stale-runtime-deps --execute
node bin/kova.mjs run --target npm:2026.4.27 --scenario gateway-performance --execute --node-profile
node bin/kova.mjs run --target local-build:/path/to/openclaw --scenario release-runtime-startup --execute
node bin/kova.mjs run --target npm:2026.4.27 --scenario plugin-external-install --execute
node bin/kova.mjs run --target npm:2026.4.27 --scenario agent-cold-warm-message --auth live --execute
node bin/kova.mjs matrix run --profile smoke --target npm:2026.4.27 --execute
node bin/kova.mjs matrix run --profile release --target npm:2026.4.27 --include tag:plugins --exclude state:broken-plugin-deps --parallel 2 --execute

Matrix filters accept scenario:<id>, state:<id>, tag:<tag>, or a bare scenario/state/tag value. Matrix runs bundle their report automatically.

Release gate mode uses the existing matrix runner:

node bin/kova.mjs matrix run --profile release --target local-build:/path/to/openclaw --execute --gate --json

--gate evaluates the selected profile against its gate policy and adds a ship/no-ship verdict to the report. The verdict is SHIP, DO_NOT_SHIP, PARTIAL, or BLOCKED. Non-ship verdicts exit non-zero after writing the Markdown/JSON report and artifact bundle. Non-ship gates also retain a durable copy under artifacts/release-gates/<runId>/.

Filtered gate slices are reject-only. If a selected blocking scenario fails, the verdict is DO_NOT_SHIP; if the selected slice passes but required gate coverage is missing, the verdict is PARTIAL rather than SHIP.

Release gates check required surface/scenario/state/platform coverage, not only command exit status. report paste and report summarize --json include a concise failure brief with exact evidence, subsystem grouping, and fixer-ready prompts.

Gateway readiness is classified. Kova polls TCP listening and /health until a hard deadline, while separately enforcing the scenario readiness threshold. Reports distinguish hard failures, unhealthy gateways, slow startup, and ready gateways, with time-to-listening and time-to-health-ready evidence.

Kova destroys temporary envs by default after execution. Keep an env for debugging only when needed:

node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install --execute --keep-env
node bin/kova.mjs run --target npm:2026.4.27 --scenario fresh-install --execute --retain-on-failure

Target Selectors

npm:<version>              published OpenClaw release
channel:<name>             published channel such as stable or beta
runtime:<name>             existing OCM runtime name
local-build:<repo-path>    OpenClaw checkout built as a release-shaped runtime

Examples:

node bin/kova.mjs run --target npm:2026.4.26 --scenario fresh-install --execute
node bin/kova.mjs run --target channel:beta --scenario gateway-performance --execute
node bin/kova.mjs run --target runtime:test-build-1 --scenario plugin-lifecycle --execute
node bin/kova.mjs run --target local-build:/path/to/openclaw --scenario fresh-install --execute

Existing User Upgrade

Existing-user scenarios must clone a source env. Do not run upgrade scenarios directly against durable user envs.

node bin/kova.mjs run \
  --scenario upgrade-existing-user \
  --source-env Violet \
  --from npm:2026.4.20 \
  --target npm:2026.4.27 \
  --execute

Executed scenarios refuse to mutate non-kova- env targets. A durable env such as Violet can be used only as clone source state; Kova mutates the generated disposable clone.

Reports

Reports are written to reports/:

  • Markdown for humans
  • JSON for agents, CI, and regression comparison

Reports should answer:

  • what OpenClaw runtime was tested
  • what scenario ran
  • what passed, failed, or blocked
  • what command failed
  • what evidence was captured
  • what OpenClaw area likely owns the issue
  • whether temporary envs were cleaned up

Agents should use node bin/kova.mjs plan --json to choose scenarios and then read the generated JSON report after run. Markdown is intentionally compact. Use run --json when an agent needs stable report paths without parsing text.

Summarize generated reports:

node bin/kova.mjs report summarize reports/<run>.json
node bin/kova.mjs report summarize reports/<run>.json --json
node bin/kova.mjs report paste reports/<run>.json
node bin/kova.mjs report compare reports/<baseline>.json reports/<current>.json
node bin/kova.mjs report bundle reports/<run>.json

report paste produces a short handoff summary for another agent or fixer. report compare flags status and metric regressions between two Kova JSON reports. report bundle packages the JSON report, Markdown report, paste summary, and run artifacts for handoff.

Current Status

The repo has the first production skeleton:

  • scenario matrix
  • OCM-backed command execution
  • timeout handling
  • stdout/stderr capture
  • gateway service snapshots
  • gateway health snapshots
  • gateway health latency samples
  • readiness classification for hard failure, unhealthy, slow startup, and ready
  • gateway log diagnostic counts
  • gateway PID/RSS/CPU metrics on executed scenarios
  • continuous resource sampling during commands
  • optional Node CPU, heap, and trace profile artifacts with --node-profile
  • --deep-profile for CPU/heap/trace profiling, diagnostic reports, heap snapshots, OpenClaw timeline envs, and denser resource sampling
  • optional OpenClaw diagnostics timeline ingestion
  • diagnostic correlation summaries that connect resource peaks, top profiler functions, OpenClaw spans, event-loop delay, runtime deps, and provider/model timing when available
  • threshold evaluation for command latency, peak RSS, missing dependency errors, and final gateway state
  • Markdown and JSON reports
  • release gate verdicts and failure cards through matrix run --gate
  • explicit execution mode
  • default cleanup of temporary envs

Next OpenClaw-side work should expand diagnostics timeline emission so Kova can attribute every slow startup phase to concrete OpenClaw spans rather than only external process/profile evidence.