275 Commits

Author SHA1 Message Date
joshp123
280744ce0c infra: slim clawdinators aws footprint
What:
- bound CLAWDINATOR image artifact retention with S3 lifecycle, AMI pruning, and import provenance tags
- reduce the AWS fleet to Babelfish-only and make GitHub credentials opt-in per host
- disable the AMI build, nix-openclaw bump, and release workflows by moving them out of .github/workflows/
- update operator docs for the new explicit build and deploy model

Why:
- stop unbounded S3 and snapshot growth from image builds
- remove unattended resurrection paths and shut down the unused t3.large instances
- keep the remaining Babelfish host running without GitHub App credentials or sync timers

Tests:
- `nix shell nixpkgs#shellcheck nixpkgs#shfmt -c bash scripts/lint-shell.sh` (pass)
- `nix build .#nixosConfigurations.clawdinator-babelfish.config.system.build.toplevel .#nixosConfigurations.clawdinator-1.config.system.build.toplevel .#nixosConfigurations.clawdinator-2.config.system.build.toplevel` (pass)
- `AWS_PROFILE=homelab-admin TF_VAR_aws_region=eu-central-1 TF_VAR_ami_id=ami-0a9abe17feeee0079 TF_VAR_ssh_public_key="$(cat ~/.ssh/id_ed25519.pub)" nix shell nixpkgs#opentofu -c sh -lc 'tofu fmt -check && tofu validate'` (pass)
- live AWS apply: destroyed `clawdinator-1` and `clawdinator-2`, replaced Babelfish, and verified only `Fleet Deploy` remains active in GitHub Actions
2026-04-03 15:38:57 +02:00
joshp123
4a40ae24e2 🤖 config: restrict main clawdinator discord scope to clawdinators-test
What:
- remove #clawdributors-test and #clawdributors channel IDs from `nix/hosts/clawdinator-common.nix`
- keep only channel `1458426982579830908` (#clawdinators-test) in the main Discord allowlist
- simplify now-unused sendPolicy deny rules tied to removed channels
- align docs/memory/workspace references to #clawdinators-test only

Why:
- enforce single-channel listening surface for main clawdinator instances
- eliminate stale channel references that could cause operator confusion
- keep runtime config and docs aligned

Tests:
- nix shell nixpkgs#shellcheck nixpkgs#shfmt -c bash scripts/lint-shell.sh (pass)
- nix eval --raw .#nixosConfigurations.clawdinator-1.config.system.build.toplevel.drvPath --accept-flake-config >/dev/null (pass)
- nix eval --raw .#nixosConfigurations.clawdinator-2.config.system.build.toplevel.drvPath --accept-flake-config >/dev/null (pass)
2026-02-23 17:20:38 +01:00
joshp123
33755bec7a 🤖 fix: remove inline remote deploy logic from fleet switch
What:
- move host-side nixos switch + revision verification into scripts/remote-fleet-switch-host.sh
- update scripts/fleet-switch-nixos.sh to fetch and execute the committed remote script at the target git rev
- keep canary host loop behavior unchanged while eliminating inline remote bash payload logic

Why:
- prevent local shell interpolation bugs in deploy assertions
- align deploy flow with repo rule: put logic in script files and call them
- make host-side deploy verification easier to audit and reason about

Tests:
- nix shell nixpkgs#shellcheck nixpkgs#shfmt -c sh -c "find scripts -type f -name '*.sh' -print0 | xargs -0 shellcheck -S warning && find scripts -type f -name '*.sh' -print0 | xargs -0 shfmt -i 2 -ci -sr -d"
2026-02-16 08:59:22 -08:00
joshp123
5446b35ffe 🤖 chore: bump nix-openclaw to openai reasoning replay fix
What:
- update flake.lock nix-openclaw input from 2a9a3be to 8d7489b
- pull in openclaw pin bump that includes PR #17792 reasoning replay follower-id fix

Why:
- propagate the merged openclaw replay fix through nix-openclaw into clawdinators
- keep deployment source-of-truth aligned across the repo chain

Tests:
- nix flake lock --update-input nix-openclaw
- nix eval .#nixosConfigurations.clawdinator-1.config.system.configurationRevision --raw
- nix build .#openclaw-gateway (fails on darwin: attribute not provided for current system)
2026-02-15 23:15:45 -08:00
joshp123
9d1ee1023e Disable control API in fleet deploy workflow
CI user cannot create IAM roles/users required for the control API. Keep TF_VAR_control_api_enabled=false so fleet AMI redeploys are reliable.
2026-02-15 18:29:39 -08:00
joshp123
0f7e6570eb Fix CI S3 permissions for pr-intent bucket reads
Terraform refresh calls GetAccelerateConfiguration (and other non-GetBucket* APIs). Grant s3:Get*/s3:Put* on the pr-intent bucket ARN so fleet deploy tofu apply can refresh bucket config.
2026-02-15 18:08:05 -08:00
joshp123
ce846a36dc Switch CI IAM policy to managed policy
Inline IAM user policies hit the 2048 byte size limit. Replace aws_iam_user_policy with an aws_iam_policy + aws_iam_user_policy_attachment for the CI user.
2026-02-15 17:55:28 -08:00
joshp123
233d0d6da8 Fix pr-intent bucket name default
Default pr_intent_bucket_name to openclaw-pr-intent to match the live bucket and avoid accidental bucket replacement.
2026-02-15 17:53:45 -08:00
joshp123
7dbedacdff Fix CI tofu permissions for pr-intent public bucket
Grant CI user bucket-management read/write actions (GetBucket*/PutBucket*) on the public PR-intent bucket so fleet deploy can run tofu apply without AccessDenied.
2026-02-15 17:52:21 -08:00
joshp123
833264bbe3 Make seed-workspace resilient to permission drift
Retry rsync without --delete on exit 23 so the gateway does not crash-loop if workspace contains root-owned files.
2026-02-15 17:15:12 -08:00
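
The retry described in 833264bbe3 can be sketched roughly as follows. This is an illustration in Python, not the repository's seed-workspace script: the function name `seed_workspace`, the `-a` flag, and the injectable `run` parameter are assumptions made for the example. Exit code 23 is rsync's "partial transfer due to error" status.

```python
import subprocess
from typing import Callable, List

RSYNC_PARTIAL_TRANSFER = 23  # rsync exit status: partial transfer due to error


def seed_workspace(src: str, dest: str,
                   run: Callable[[List[str]], int] = subprocess.call) -> int:
    """Sync src into dest; on exit 23, retry once without --delete.

    Hypothetical sketch: the name, the -a flag, and the `run` hook are
    assumptions, not taken from the repository.
    """
    with_delete = ["rsync", "-a", "--delete", f"{src}/", f"{dest}/"]
    rc = run(with_delete)
    if rc == RSYNC_PARTIAL_TRANSFER:
        # Drop --delete so files rsync cannot remove no longer fail the run.
        without_delete = [arg for arg in with_delete if arg != "--delete"]
        rc = run(without_delete)
    return rc
```

Any other non-zero exit code is returned unchanged, so unrelated failures still surface to the caller.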
joshp123
6cd6b7fada Fix jq precedence in fleet-status
Wrap defaulting expression in parentheses so jq parses correctly.
2026-02-15 17:13:09 -08:00
joshp123
028880fef3 Inline shell lint into release/image workflows
Drop dedicated shell-lint workflow; run scripts/lint-shell.sh as an early step in release.yml and image-build.yml.
2026-02-15 15:56:37 -08:00
joshp123
52f5168cd2 Add shellcheck + shfmt linting for scripts
Add CI workflow to run shellcheck + shfmt, plus a scripts/lint-shell.sh helper.

Also apply shfmt formatting and fix initial shellcheck warnings.
2026-02-15 15:51:40 -08:00
joshp123
c44d54319e Stamp deploy time and enrich version output
After nixos-rebuild switch, write /var/lib/clawd/deploy/last-switch.{time,rev}.

clawdinator-version now optionally fetches OpenClaw commit date via GitHub API when gh is authenticated.
2026-02-15 15:47:39 -08:00
joshp123
55788b92ff Teach workspace how to report running versions
Switch startup checklist to use clawdinator-version and drop the deprecated self-update unit check.
2026-02-15 15:46:14 -08:00
joshp123
eb3c79c5f5 Add version introspection tool + build info
Expose pinned component revs via /etc/clawdinator/build-info.json and ship a clawdinator-version helper script (logic lives in scripts/, not inline in Nix).

This supports fleet consistency checks and maintainer introspection.
2026-02-15 15:45:00 -08:00
joshp123
c3fd19af9f Disable host-local flake self-update
Rely on CI release pipeline for pinned, consistent versions across the fleet.
2026-02-15 15:33:13 -08:00
joshp123
3d64364853 Enable amazon-ssm-agent on hosts
Needed for SSM-based fleet deploy workflow.
2026-02-15 15:32:09 -08:00
joshp123
e126e33d54 Stamp deployed revision and verify after switch
Set system.configurationRevision from flake rev and have fleet switch verify it matches the deployed git SHA.
2026-02-15 15:31:39 -08:00
joshp123
e549dca9fd Fix SSM send-command quoting
Pass commands via JSON to avoid AWS CLI argument parsing issues.
2026-02-15 15:30:01 -08:00
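
As a rough illustration of the quoting fix in e549dca9fd, the sketch below builds the `--parameters` value with a JSON encoder instead of hand-assembled shorthand. It assumes the `AWS-RunShellScript` document and only constructs the argument list; the helper name `build_send_command_args` is hypothetical and nothing here is taken from the repository's scripts.

```python
import json
from typing import List, Sequence


def build_send_command_args(instance_id: str,
                            commands: Sequence[str]) -> List[str]:
    """Return an argv list for `aws ssm send-command`.

    Hypothetical helper: encoding the commands as JSON keeps quotes,
    commas, and spaces inside each command from being reinterpreted
    by the AWS CLI's shorthand parameter syntax.
    """
    parameters = json.dumps({"commands": list(commands)})
    return [
        "aws", "ssm", "send-command",
        "--instance-ids", instance_id,
        "--document-name", "AWS-RunShellScript",
        "--parameters", parameters,
    ]
```

Passing the result to a process launcher as a list (rather than as one shell string) avoids a second layer of local shell quoting.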
joshp123
9245311395 Add fast release pipeline (bootstrap + SSM nixos-rebuild)
- Add release.yml: eval -> upload bootstrap -> deploy via SSM (canary order)
- Make image-build manual/weekly (base AMI lane)
- Add SSM permissions to CI IAM policy (requires tofu apply)
- Add scripts for SSM-based nixos-rebuild and docs for the two-lane model
2026-02-15 15:22:27 -08:00
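
The "canary order" deploy in 9245311395 might look roughly like the sketch below. It assumes that hosts are switched one at a time and that the rollout stops at the first failure; the commit message does not state the stop condition, so that behavior, the function name `deploy_in_canary_order`, and the `deploy` callback are all assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def deploy_in_canary_order(
    hosts: Sequence[str],
    deploy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy to hosts in the given order, stopping at the first failure.

    Hypothetical sketch. Returns (hosts that succeeded, failed host or None).
    Hosts after a failure are never touched.
    """
    succeeded: List[str] = []
    for host in hosts:
        if not deploy(host):
            return succeeded, host
        succeeded.append(host)
    return succeeded, None
```

Putting the canary first in `hosts` means a bad revision is caught before it reaches the rest of the fleet.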
joshp123
d7df4f0e13 Fix openclaw-gateway unit override merge
Remove conflicting systemd unit description override; keep only after/wants deps.
2026-02-15 15:03:56 -08:00
joshp123
fda12f98cb Use nix-openclaw NixOS module for gateway service
- Import nix-openclaw nixosModules.openclaw-gateway
- Replace custom systemd gateway service with upstream module
- Let upstream module own /etc/clawd/openclaw.json generation

This reduces duplication between clawdinators and nix-openclaw and aligns config merge semantics.
2026-02-15 14:56:00 -08:00
joshp123
c0794f84e2 Deep-merge OpenClaw config to avoid per-host clobber
Replace configFragments with a deep-merged config option type so host overrides (e.g. disabling telegram) don't drop sibling keys like channels.discord.

clawdinator-2: disable telegram; keep discord.
2026-02-15 14:24:54 -08:00
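
The clobbering problem that c0794f84e2 addresses can be shown with a small sketch. It is written in Python rather than Nix and is not the module's option type; the `deep_merge` helper and the `enabled` keys are assumptions chosen for illustration.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge override into base without mutating either input."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical shared config and a per-host override that disables telegram.
common = {"channels": {"discord": {"enabled": True},
                       "telegram": {"enabled": True}}}
host = {"channels": {"telegram": {"enabled": False}}}

shallow = {**common, **host}       # top-level merge: channels.discord is lost
merged = deep_merge(common, host)  # deep merge: channels.discord is kept
```

With the shallow merge, the host's `channels` value replaces the shared one wholesale, which is the sibling-key loss the commit describes.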
joshp123
e5e959f90a CI concurrency + deep-merge config fragments; fix clawdinator-2 channels
- Cancel in-progress image builds on new pushes (concurrency)
- Add services.clawdinator.configFragments for deep-merge tweaks
- Use configFragments in clawdinator-2 to disable telegram without clobbering discord

No host changes; intended to ship via next AMI build.
2026-02-15 13:33:38 -08:00
joshp123
5e1977a078 Un-deprecate landpr; hide distill skills from slash; seed ClawKeeper repo
- Revert accidental /landpr deprecation language and restore model invocation
- Make distill-pr-intent skills non-user-invocable (still available to the model)
- Add openclaw/ClawKeeper to repo seeds list for AMI snapshots
2026-02-15 13:20:52 -08:00
joshp123
c3a4b7dbf1 Deprecate landpr from model prompt
Keep /landpr slash command available but prevent autonomous model invocation; steer default PR workflows to PR intent distillation.
2026-02-15 13:13:58 -08:00
joshp123
4bd99e8821 Tighten PR canned-response workflow guardrails
Make explicit-approval requirement unambiguous in workspace prompt.
2026-02-15 13:13:04 -08:00
joshp123
ac61f9551d Add PR intent distillation skills + prompt wiring
- Bundle distill-pr-intent + orchestrator as workspace skills
- Update CLAWDINATOR workspace AGENTS.md to reference dataset paths and skills
2026-02-15 13:11:11 -08:00
joshp123
5f99924bd1 Fix public S3 publisher service PATH
Include bash in systemd unit PATH so sync script runs on NixOS.
2026-02-15 12:45:24 -08:00
joshp123
ffb27ab614 Public PR intent S3 bucket + publisher timer
- Provision public S3 bucket (anonymous list/get) for PR intent artifacts
- Grant instance role PutObject and add NixOS systemd timer to publish /memory/pr-intent
- Default agent thinking level to high for GPT-5.2/Codex
- Make OpenTofu instance management explicit (manage_instances) to prevent accidental fleet destroy

Tests: not run (infra/Nix changes)
2026-02-15 12:44:11 -08:00
Josh Palmer
63fa64a0b1 tools: bump pi-coding-agent to 0.52.6
Why: keep fleet pi in sync with upstream fixes and model registry updates.

Notes:
- package-lock pins internal @mariozechner deps to 0.52.6 for determinism.

Tests:
- nix build pi-coding-agent derivation (darwin)
2026-02-06 09:59:21 -08:00
Josh Palmer
e1d3009c30 tools: bump pi-coding-agent to 0.52.0
Update the pinned pi CLI used on hosts so it matches local (0.52.x) and includes the openai-codex provider/model registry.
2026-02-05 18:45:02 -08:00
Josh Palmer
deebc3431f clawdinator: use claude-opus-4-6
Update default Anthropic fallback model to claude-opus-4-6 (available per Anthropic models API).
2026-02-05 17:50:11 -08:00
Josh Palmer
3bac990eca clawdinator: bump Opus model pin
Use the latest claude-opus-4-5 snapshot model ID in the default agent fallback list.
2026-02-05 16:47:48 -08:00
Josh Palmer
52198c23cb flake.lock: bump nix-openclaw
Pick up nix-openclaw updates (openclaw→pi pin bump).
2026-02-05 13:11:19 -08:00
Josh Palmer
aeeb41632e mem: name babelfish tech-中国 channel
Tests: not run (doc update)
2026-02-05 07:59:15 -08:00
Josh Palmer
1a64384a48 babelfish: allow second discord channel
Adds channel 1468983176620675132 to the babelfish allowlist.

Tests: not run (config/docs change)
2026-02-05 07:12:21 -08:00
Josh Palmer
7233368238 clawdinator-2: disable telegram
Prevent c2 from polling Telegram; c1 retains the token.

Tests: not run (config change)
2026-02-04 16:29:35 -08:00
Josh Palmer
dbda75c1df ops: log babelfish thread-starter redeploy
- record new babelfish instance after includeThreadStarter rollout

Tests: not run (doc update)
2026-02-04 14:30:49 -08:00
Josh Palmer
4884b6b65f clawdinator-babelfish: disable thread starter context
- bump nix-openclaw input for includeThreadStarter support
- disable thread starter injection for babelfish forum channel

Tests: not run (config change)
2026-02-04 14:20:19 -08:00
Josh Palmer
76e10eb42c ops: log babelfish forum-trim redeploy
Record new babelfish instance after context-trimming prompt.
2026-02-04 13:06:51 -08:00
Josh Palmer
debd806389 nix: trim babelfish context
Ignore thread starter metadata and disable envelope timestamps for forum threads.
2026-02-04 12:57:38 -08:00
Josh Palmer
509ee48696 ops: log babelfish sandbox redeploy
Record new babelfish instance after docker sandbox fix.
2026-02-04 12:48:54 -08:00
Josh Palmer
65cc44486f nix: disable babelfish sandbox
Avoid docker sandbox spawn so babelfish can reply.
2026-02-04 12:39:59 -08:00
Josh Palmer
d846155ad0 ops: log babelfish redeploy
Record new babelfish instance after config fix.
2026-02-04 12:31:51 -08:00
Josh Palmer
9bdf5a611a nix: fix babelfish config schema
- move tools to top-level, drop unsupported systemPrompt
- set groupChat historyLimit to 1
2026-02-04 12:25:33 -08:00
Josh Palmer
7a3ae4423a docs: capture discord channel ids
- add guild + babelfish channel ids
2026-02-04 12:20:26 -08:00
Josh Palmer
59bef8196b ops: record babelfish deploy
- add AMI + instance/IP for clawdinator-babelfish
2026-02-04 12:19:51 -08:00
Josh Palmer
446bad9107 nix: enable github app for babelfish
- satisfy clawdinator token assertion while keeping tools disabled
2026-02-04 12:08:41 -08:00