Switch pipeline to AWS AMI and remove Hetzner path

Josh Palmer 2026-01-07 21:51:04 +01:00
parent de46f25dbc
commit 486e91508e
18 changed files with 208 additions and 197 deletions

View File

@@ -24,9 +24,7 @@ jobs:
        run: |
          nix profile install \
            nixpkgs#nixos-generators \
            nixpkgs#zstd \
            nixpkgs#awscli2 \
            nixpkgs#hcloud
            nixpkgs#awscli2
      - name: Build image
        run: scripts/build-image.sh
@@ -38,15 +36,19 @@ jobs:
          AWS_REGION: ${{ secrets.AWS_REGION }}
          S3_BUCKET: ${{ secrets.S3_BUCKET }}
          S3_PREFIX: ${{ secrets.S3_PREFIX }}
          S3_PUBLIC_URL: ${{ secrets.S3_PUBLIC_URL }}
        run: |
          url="$(scripts/upload-image.sh)"
          echo "IMAGE_URL=${url}" >> "${GITHUB_ENV}"
          key="$(bash scripts/upload-image.sh)"
          echo "S3_KEY=${key}" >> "${GITHUB_ENV}"
      - name: Import image into Hetzner
      - name: Import image as AMI
        env:
          HCLOUD_TOKEN: ${{ secrets.HCLOUD_TOKEN }}
          HCLOUD_LOCATION: nbg1
          IMAGE_DESCRIPTION: clawdinator-nixos
          IMAGE_LABELS: clawdinator=true
        run: scripts/import-image.sh "${IMAGE_URL}"
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: ${{ secrets.AWS_REGION }}
          S3_BUCKET: ${{ secrets.S3_BUCKET }}
          S3_KEY: ${{ env.S3_KEY }}
          VMIMPORT_ROLE: ${{ secrets.VMIMPORT_ROLE }}
          AMI_DESCRIPTION: clawdinator-nixos
        run: |
          ami_id="$(bash scripts/import-image.sh)"
          echo "AMI_ID=${ami_id}" >> "${GITHUB_ENV}"
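The import step now consumes repository secrets plus `S3_KEY`, which the upload step writes to `GITHUB_ENV`. The scripts stop at the first missing value via `${VAR:?}`. A minimal bash sketch, assuming it runs before `scripts/import-image.sh` in the same step, reports every missing variable at once; `missing_env` and `preflight_import` are hypothetical helper names, not part of this commit.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the names of all unset or empty variables among
# "$@" on stderr and return 1; return 0 when every variable has a value.
missing_env() {
  local name missing=""
  for name in "$@"; do
    # ${!name:-} expands the variable whose *name* is stored in $name;
    # the ":-" keeps set -u from aborting when it is unset.
    if [ -z "${!name:-}" ]; then
      missing="${missing:+${missing} }${name}"
    fi
  done
  if [ -n "${missing}" ]; then
    echo "missing: ${missing}" >&2
    return 1
  fi
}

# Hypothetical preflight for the "Import image as AMI" step: check the
# required inputs, then print the role name import-image.sh would use.
preflight_import() {
  missing_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION S3_BUCKET S3_KEY || return 1
  # VMIMPORT_ROLE is optional; mirror the script's fallback.
  echo "${VMIMPORT_ROLE:-vmimport}"
}
```

The optional `VMIMPORT_ROLE` falls back to `vmimport` the same way `scripts/import-image.sh` does, so an unset secret is not treated as an error.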

View File

@@ -37,13 +37,12 @@ The Zen of ~~Python~~ Clawdbot, ~~by~~ shamelessly stolen from Tim Peters:
- Namespaces are one honking great idea -- let's do more of those!
Deploy flow (automation-first):
- Use `devenv.nix` for tooling (hcloud, nixos-generators, zstd).
- Build a bootstrap NixOS image with nixos-generators (raw-efi), compress it, and upload to a public URL.
- Use `devenv.nix` for tooling (nixos-generators, awscli2).
- Build a bootstrap NixOS image with nixos-generators (raw-efi) and upload it to S3.
- Use `nix/hosts/clawdinator-1-image.nix` for image builds.
- CI is preferred: `.github/workflows/image-build.yml` runs build → S3 upload → Hetzner import (via `hcloud-upload-image`).
- Bootstrap S3 bucket + scoped IAM user with `infra/opentofu/aws` (use homelab-admin creds).
- Import the image into Hetzner with `hcloud image create`.
- Provision host with OpenTofu (`infra/opentofu`; set `HCLOUD_TOKEN`, no tfvars with secrets).
- CI is preferred: `.github/workflows/image-build.yml` runs build → S3 upload → AMI import.
- Bootstrap S3 bucket + scoped IAM user + VM Import role with `infra/opentofu/aws` (use homelab-admin creds).
- Import the image into AWS as an AMI (`aws ec2 import-image`).
- Grab the host SSH key and add it to `../nix/nix-secrets/secrets.nix`; rekey secrets with agenix.
- Ensure required secrets exist: `clawdinator-github-app.pem`, `clawdinator-discord-token`, `anthropic-api-key`.
- Update `nix/hosts/<host>.nix` (Discord allowlist, GitHub App installationId, identity name).
@@ -54,3 +53,4 @@ Deploy flow (automation-first):
Key principle: mental notes don't survive restarts — write it to a file.
Cattle vs pets: hosts are disposable. Prefer re-provisioning from OpenTofu + NixOS configs over in-place manual fixes.
One way only: AWS AMI pipeline via S3 + VM Import. No Hetzner, no rescue-mode hacks, no legacy paths.
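The deploy flow asks for the new host's SSH key to be added to `../nix/nix-secrets/secrets.nix`. A small sketch, assuming the key is collected with OpenSSH's `ssh-keyscan` (output lines of the form `<host> <key-type> <key>`), extracts the `<key-type> <key>` string that agenix recipient lists use. `host_key_from_keyscan` is a hypothetical name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Read ssh-keyscan output on stdin ("<host> <key-type> <base64-key>" lines,
# possibly mixed with "# ..." banner lines) and print "<key-type> <base64-key>"
# from the first key line, the form agenix recipients in secrets.nix use.
host_key_from_keyscan() {
  awk '!/^#/ && NF >= 3 { print $2 " " $3; exit }'
}

# Against a live host (needs network access; not run here):
#   ssh-keyscan -t ed25519 "${host}" 2>/dev/null | host_key_from_keyscan
```

Add the printed key to the host's entry in `secrets.nix`, then re-encrypt with `agenix -r` (rekey) so the new host can decrypt its secrets.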

View File

@@ -9,7 +9,7 @@ Principles:
- Latest upstream nixclawdbot by default; breaking changes are acceptable.
Stack:
- Hetzner hosts provisioned with OpenTofu.
- AWS AMIs built in CI (nixos-generators + import-image).
- NixOS modules configure Clawdbot and CLAWDINATOR runtime.
- Shared hivemind memory stored on a mounted host volume.
@@ -33,14 +33,13 @@ Secrets (required):
- GitHub App private key (for short-lived installation tokens).
- Discord bot token (per instance).
- Anthropic API key (Claude models).
- Hetzner API token (OpenTofu).
- AWS credentials (image pipeline + infra).
Secrets are stored in `../nix/nix-secrets` using agenix and decrypted to `/run/agenix/*`
on hosts. See `docs/SECRETS.md`.
Deploy (automation-first):
- Prefer image-based provisioning for speed and repeatability.
- `infra/opentofu` provisions Hetzner hosts from a custom image.
- Host config lives in `nix/hosts/*` and is exposed in `flake.nix`.
- Ensure `/var/lib/clawd/repo` contains this repo (needed for self-update).
- Configure Discord guild/channel allowlist and GitHub App installation ID.
@@ -48,21 +47,18 @@ Deploy (automation-first):
Image-based deploy (Option A, recommended):
1) Build a bootstrap image with nixos-generators:
- `nix run github:nix-community/nixos-generators -- -f raw-efi -c nix/hosts/clawdinator-1-image.nix -o dist`
2) Compress the image:
- `zstd dist/nixos.img -o dist/nixos.img.zst`
3) Upload the image to S3 (private object; use a presigned URL for import).
4) Import into Hetzner:
- Use `hcloud-upload-image` (creates a snapshot image via a temporary server).
5) Point OpenTofu at the image name or id and provision.
6) Re-key agenix secrets to the new host SSH key and sync secrets to `/var/lib/clawd/nix-secrets`.
7) Run `nixos-rebuild switch --flake /var/lib/clawd/repo#clawdinator-1`.
2) Upload the raw image to S3 (private object).
3) Import into AWS as an AMI (`aws ec2 import-image`).
4) Launch hosts from the AMI.
5) Re-key agenix secrets to the new host SSH key and sync secrets to `/var/lib/clawd/nix-secrets`.
6) Run `nixos-rebuild switch --flake /var/lib/clawd/repo#clawdinator-1`.
CI (recommended):
- GitHub Actions builds the image, uploads to S3, and imports into Hetzner.
- GitHub Actions builds the image, uploads to S3, and imports an AMI.
- See `.github/workflows/image-build.yml` and `scripts/*.sh`.
AWS bucket bootstrap:
- `infra/opentofu/aws` provisions a private S3 bucket + scoped IAM user for CI uploads.
- `infra/opentofu/aws` provisions a private S3 bucket + scoped IAM user + VM Import role.
Docs:
- `docs/PHILOSOPHY.md`
@@ -73,7 +69,7 @@ Docs:
- `docs/SKILLS_AUDIT.md`
Repo layout:
- `infra/opentofu` — Hetzner provisioning
- `infra/opentofu/aws` — S3 bucket + IAM + VM import role
- `nix/modules/clawdinator.nix` — NixOS module
- `nix/hosts/` — host configs
- `nix/examples/` — example host + flake wiring
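The README says to launch hosts from the AMI, but this commit contains no launch automation. A hedged sketch of that step with the AWS CLI, assuming the `clawdinator=true` tag that `scripts/import-image.sh` applies and a hypothetical `INSTANCE_TYPE` input (the `t3.medium` default is an arbitrary placeholder). Networking, security groups and the EBS data volume are left out. With `DRY_RUN` set, commands are printed instead of executed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Newest AMI owned by this account that carries the clawdinator=true tag.
latest_clawdinator_ami() {
  run aws ec2 describe-images \
    --region "${AWS_REGION:?AWS_REGION required}" \
    --owners self \
    --filters "Name=tag:clawdinator,Values=true" \
    --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
    --output text
}

# Launch one host from an AMI and print its instance id.
# INSTANCE_TYPE is a hypothetical input; t3.medium is only a placeholder.
launch_host() {
  local ami_id="$1" name="$2"
  run aws ec2 run-instances \
    --region "${AWS_REGION:?AWS_REGION required}" \
    --image-id "${ami_id}" \
    --instance-type "${INSTANCE_TYPE:-t3.medium}" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=${name}}]" \
    --query 'Instances[0].InstanceId' \
    --output text
}
```

Chained for real: `launch_host "$(latest_clawdinator_ami)" clawdinator-1`. The full config is still applied afterwards with `nixos-rebuild switch --flake /var/lib/clawd/repo#clawdinator-1`.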

View File

@@ -1,10 +1,8 @@
{ pkgs, ... }:
{
  packages = [
    pkgs.hcloud
    pkgs.nixos-generators
    pkgs.zstd
    pkgs.curl
    pkgs.awscli2
    pkgs.curl
  ];
}

View File

@@ -1,13 +1,13 @@
# Architecture (Draft)
Goal: declaratively spawn CLAWDINATOR instances on Hetzner using OpenTofu + NixOS.
Goal: declaratively spawn CLAWDINATOR instances on AWS using OpenTofu + NixOS.
Operating mode:
- declarative-first, no manual setup
- machines are created by automation (another CLAWDINATOR)
Core pieces:
- OpenTofu provisions Hetzner hosts from a prebuilt NixOS image.
- AWS AMIs are built from a prebuilt NixOS image (nixos-generators + import-image).
- NixOS modules configure clawdbot + CLAWDINATOR runtime on each host.
- Shared memory is mounted at a consistent path on all hosts.
@@ -18,8 +18,7 @@ Runtime layout (planned):
- /var/lib/clawd/repo (this repo for self-update)
Storage:
- POC uses one Hetzner volume per host, mounted at /var/lib/clawd.
- Volume device path follows the host name (e.g. /dev/disk/by-id/scsi-0HC_Volume_clawdinator-1).
- POC uses one host volume per instance (e.g., EBS), mounted at /var/lib/clawd.
- In multi-host mode, add a shared filesystem or object-sync layer and keep canonical memory files authoritative.
Instance naming:

View File

@@ -1,7 +1,7 @@
# POC: CLAWDINATOR-1
Acceptance criteria:
- One Hetzner host provisioned via OpenTofu using a custom image.
- One AWS host provisioned from an AMI built from this repo.
- NixOS config applied via Nix (module or flake).
- CLAWDINATOR-1 connects to Discord #clawdributors-test.
- GitHub integration is read-only.
@@ -12,21 +12,21 @@ Secrets needed (initially):
- Discord bot token (per instance).
- GitHub token (PAT or App installation token).
- Anthropic API key.
- Hetzner API token.
- AWS credentials (image pipeline + infra).
Secrets wiring:
- Infra: HCLOUD_TOKEN env var for OpenTofu and hcloud CLI.
- Infra: AWS credentials for OpenTofu and CI.
Image pipeline:
- Build a bootstrap image with nixos-generators (raw-efi) from `nix/hosts/clawdinator-1-image.nix`, compress, upload, import into Hetzner using `hcloud-upload-image`.
- OpenTofu provisions instances from the imported custom image, then nixos-rebuild applies full config.
- Build a bootstrap image with nixos-generators (raw-efi) from `nix/hosts/clawdinator-1-image.nix`, upload to S3, import as an AMI via `aws ec2 import-image`.
- Launch instances from the AMI, then nixos-rebuild applies full config.
- Runtime: explicit token files via agenix (standard).
- GitHub token is required. Prefer GitHub App (`services.clawdinator.githubApp.*`) to mint short-lived tokens.
- Store PEM and tokens in the local secrets repo (see docs/SECRETS.md) and decrypt to `/run/agenix/*`.
- Discord token is required: set `services.clawdinator.discordTokenFile` to `/run/agenix/clawdinator-discord-token`.
Deliverables:
- Infra code in infra/opentofu.
- Infra code in infra/opentofu/aws.
- Nix module in nix/.
- CLAWDINATOR config in clawdinator/.

View File

@@ -3,14 +3,13 @@
Principle: secrets never land in git. One secret per file, decrypted at runtime.
Infrastructure (OpenTofu):
- `HCLOUD_TOKEN` via environment variable (required).
- AWS credentials via environment variable (required for `infra/opentofu/aws`).
- Do NOT commit `*.tfvars` with secrets.
Image pipeline (CI):
- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` / `AWS_REGION` / `S3_BUCKET` (required).
- `S3_PREFIX` (optional).
- `S3_PUBLIC_URL` (optional; if unset, use a presigned URL for a private object).
- `HCLOUD_TOKEN` (required for `hcloud image create`).
- `VMIMPORT_ROLE` (optional; defaults to `vmimport`).
Local storage:
- Keep AWS keys encrypted in `../nix/nix-secrets` for local runs if needed.
@@ -36,7 +35,7 @@ Agenix (local secrets repo):
- Sync encrypted secrets to the host at `/var/lib/clawd/nix-secrets`.
- Decrypt on host with agenix; point NixOS options at `/run/agenix/*`.
- Required files (minimum): `clawdinator-github-app.pem.age`, `clawdinator-discord-token.age`, `clawdis-anthropic-api-key.age`.
- CI image pipeline (stored locally, not on hosts): `clawdinator-image-uploader-access-key-id.age`, `clawdinator-image-uploader-secret-access-key.age`, `clawdinator-image-bucket-name.age`, `clawdinator-image-bucket-region.age`.
- CI image pipeline (stored locally, not on hosts): `clawdinator-ami-importer-access-key-id.age`, `clawdinator-ami-importer-secret-access-key.age`, `clawdinator-image-bucket-name.age`, `clawdinator-image-bucket-region.age`.
Example NixOS wiring (agenix):
```

View File

@@ -1,27 +0,0 @@
# OpenTofu (Hetzner)
This is the minimal, proven bootstrap: create a Hetzner host from a custom NixOS image.
No extra volumes, no fancy wiring.
Prereqs:
- OpenTofu >= 1.6
- Hetzner API token (use `HCLOUD_TOKEN` env var)
- Existing SSH key in Hetzner (set `ssh_key_name` to match)
Usage:
- export HCLOUD_TOKEN=...
- tofu init
- tofu apply
Inputs (defaults are sane for local dev):
- `name` (default: `clawdinator-1`)
- `server_type` (default: `cpx22`)
- `location` (default: `nbg1`)
- `ssh_key_name` (default: `clawdinator-deploy`)
- `image` (default: `clawdinator-nixos`)
Outputs:
- `server_name`
- `server_ipv4`
After apply, the machine boots directly into NixOS from the custom image. Then use the repo flake to configure it.

View File

@@ -1,6 +1,6 @@
# OpenTofu (AWS S3 Image Bucket)
Goal: create a private S3 bucket for CLAWDINATOR images and a scoped IAM user for CI uploads.
Goal: create a private S3 bucket for CLAWDINATOR images, a scoped IAM user for CI, and the VM Import role.
Prereqs:
- AWS credentials with permissions to create S3 + IAM (use your homelab-admin key locally).
@@ -17,6 +17,7 @@ Outputs:
- `aws_region`
- `access_key_id`
- `secret_access_key`
- `vmimport_role`
CI wiring:
- Set GitHub Actions secrets:
@@ -24,3 +25,4 @@ CI wiring:
- `AWS_SECRET_ACCESS_KEY` = output `secret_access_key`
- `AWS_REGION` = output `aws_region`
- `S3_BUCKET` = output `bucket_name`
- `VMIMPORT_ROLE` = output `vmimport_role`
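The CI wiring above is a fixed mapping from OpenTofu outputs to GitHub Actions secrets. A sketch that applies it, assuming the OpenTofu CLI (`tofu output -raw`) and the GitHub CLI (`gh secret set`, which reads the value from stdin when no `--body` is given) are installed and authenticated, and that it runs from `infra/opentofu/aws` after `tofu apply`. `SECRET_MAP` and `sync_ci_secrets` are hypothetical names. `-raw` prints sensitive outputs in plain text, so run it only on a trusted machine.

```bash
#!/usr/bin/env bash
set -euo pipefail

# "<tofu output>:<GitHub Actions secret>" pairs from the CI wiring list.
SECRET_MAP=(
  "access_key_id:AWS_ACCESS_KEY_ID"
  "secret_access_key:AWS_SECRET_ACCESS_KEY"
  "aws_region:AWS_REGION"
  "bucket_name:S3_BUCKET"
  "vmimport_role:VMIMPORT_ROLE"
)

# Copy each output into the matching repository secret.
# With DRY_RUN set, only print the mapping.
sync_ci_secrets() {
  local pair output secret
  for pair in "${SECRET_MAP[@]}"; do
    output="${pair%%:*}"
    secret="${pair#*:}"
    if [ -n "${DRY_RUN:-}" ]; then
      echo "${output} -> ${secret}"
    else
      # -raw prints the bare string value; gh reads the secret body from stdin.
      tofu output -raw "${output}" | gh secret set "${secret}"
    fi
  done
}
```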

View File

@@ -42,16 +42,62 @@ resource "aws_s3_bucket_versioning" "image_bucket" {
  }
}
resource "aws_iam_user" "image_uploader" {
  name = "clawdinator-image-uploader"
data "aws_iam_policy_document" "vmimport_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type = "Service"
      identifiers = ["vmie.amazonaws.com"]
    }
  }
}
resource "aws_iam_role" "vmimport" {
  name = "vmimport"
  assume_role_policy = data.aws_iam_policy_document.vmimport_assume.json
  tags = local.tags
}
data "aws_iam_policy_document" "vmimport" {
  statement {
    actions = [
      "s3:GetBucketLocation",
      "s3:GetObject",
      "s3:ListBucket"
    ]
    resources = [
      aws_s3_bucket.image_bucket.arn,
      "${aws_s3_bucket.image_bucket.arn}/*"
    ]
  }
  statement {
    actions = [
      "ec2:ModifySnapshotAttribute",
      "ec2:CopySnapshot",
      "ec2:RegisterImage",
      "ec2:Describe*"
    ]
    resources = ["*"]
  }
}
resource "aws_iam_role_policy" "vmimport" {
  name = "clawdinator-vmimport"
  role = aws_iam_role.vmimport.id
  policy = data.aws_iam_policy_document.vmimport.json
}
resource "aws_iam_user" "ami_importer" {
  name = "clawdinator-ami-importer"
  tags = local.tags
}
resource "aws_iam_access_key" "image_uploader" {
  user = aws_iam_user.image_uploader.name
resource "aws_iam_access_key" "ami_importer" {
  user = aws_iam_user.ami_importer.name
}
data "aws_iam_policy_document" "image_bucket_rw" {
data "aws_iam_policy_document" "ami_importer" {
  statement {
    sid = "ListBucket"
    actions = [
@@ -72,10 +118,27 @@ data "aws_iam_policy_document" "image_bucket_rw" {
    ]
    resources = ["${aws_s3_bucket.image_bucket.arn}/*"]
  }
  statement {
    sid = "ImportImage"
    actions = [
      "ec2:ImportImage",
      "ec2:DescribeImportImageTasks",
      "ec2:DescribeImages",
      "ec2:CreateTags"
    ]
    resources = ["*"]
  }
  statement {
    sid = "PassVmImportRole"
    actions = ["iam:PassRole"]
    resources = [aws_iam_role.vmimport.arn]
  }
}
resource "aws_iam_user_policy" "image_uploader" {
  name = "clawdinator-image-bucket-rw"
  user = aws_iam_user.image_uploader.name
  policy = data.aws_iam_policy_document.image_bucket_rw.json
resource "aws_iam_user_policy" "ami_importer" {
  name = "clawdinator-ami-importer"
  user = aws_iam_user.ami_importer.name
  policy = data.aws_iam_policy_document.ami_importer.json
}

View File

@@ -7,13 +7,18 @@ output "aws_region" {
}
output "access_key_id" {
  value = aws_iam_access_key.image_uploader.id
  value = aws_iam_access_key.ami_importer.id
  sensitive = true
  description = "Use in CI as AWS_ACCESS_KEY_ID."
}
output "secret_access_key" {
  value = aws_iam_access_key.image_uploader.secret
  value = aws_iam_access_key.ami_importer.secret
  sensitive = true
  description = "Use in CI as AWS_SECRET_ACCESS_KEY."
}
output "vmimport_role" {
  value = aws_iam_role.vmimport.name
  description = "Use in CI as VMIMPORT_ROLE."
}

View File

@@ -1,15 +0,0 @@
provider "hcloud" {
  token = var.hcloud_token
}
data "hcloud_ssh_key" "deploy" {
  name = var.ssh_key_name
}
resource "hcloud_server" "clawdinator" {
  name = var.name
  server_type = var.server_type
  image = var.image
  location = var.location
  ssh_keys = [data.hcloud_ssh_key.deploy.id]
}

View File

@@ -1,7 +0,0 @@
output "server_name" {
  value = hcloud_server.clawdinator.name
}
output "server_ipv4" {
  value = hcloud_server.clawdinator.ipv4_address
}

View File

@@ -1,36 +0,0 @@
variable "hcloud_token" {
  description = "Hetzner API token. Prefer setting HCLOUD_TOKEN env var instead of tfvars."
  type = string
  sensitive = true
  default = null
}
variable "ssh_key_name" {
  description = "Name of the existing Hetzner SSH key to attach."
  type = string
  default = "clawdinator-deploy"
}
variable "name" {
  description = "Server name."
  type = string
  default = "clawdinator-1"
}
variable "server_type" {
  description = "Hetzner server type."
  type = string
  default = "cpx22"
}
variable "image" {
  description = "Custom Hetzner image name or id (imported via hcloud)."
  type = string
  default = "clawdinator-nixos"
}
variable "location" {
  description = "Hetzner location (e.g., fsn1, nbg1, hel1)."
  type = string
  default = "nbg1"
}

View File

@@ -1,10 +0,0 @@
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
      version = "~> 1.48"
    }
  }
}

View File

@@ -2,7 +2,7 @@
Canonical architecture decisions and invariants for CLAWDINATOR.
- Infra: OpenTofu + Hetzner for host provisioning.
- Infra: OpenTofu + AWS AMI pipeline for host provisioning.
- Config: NixOS modules/flake, tracking latest nix-clawdbot.
- Runtime: Clawdbot gateway + CLAWDINATOR service.
- Memory: shared filesystem under /var/lib/clawd/memory.

83
scripts/import-image.sh Executable file → Normal file
View File

@@ -1,23 +1,74 @@
#!/usr/bin/env bash
set -euo pipefail
image_url="${1:-}"
if [ -z "${image_url}" ]; then
  echo "Usage: import-image.sh <image-url>" >&2
bucket="${S3_BUCKET:?S3_BUCKET required}"
key="${S3_KEY:?S3_KEY required}"
region="${AWS_REGION:?AWS_REGION required}"
role_name="${VMIMPORT_ROLE:-vmimport}"
boot_mode="${AMI_BOOT_MODE:-uefi}"
arch="${AMI_ARCH:-x86_64}"
timestamp="$(date -u +%Y%m%d%H%M%S)"
ami_name="${AMI_NAME:-clawdinator-nixos-${timestamp}}"
ami_description="${AMI_DESCRIPTION:-clawdinator-nixos}"
task_id="$(
  aws ec2 import-image \
    --region "${region}" \
    --description "${ami_description}" \
    --boot-mode "${boot_mode}" \
    --architecture "${arch}" \
    --role-name "${role_name}" \
    --disk-containers "Format=raw,UserBucket={S3Bucket=${bucket},S3Key=${key}}" \
    --query 'ImportTaskId' \
    --output text
)"
if [ -z "${task_id}" ] || [ "${task_id}" = "None" ]; then
  echo "Failed to start import-image task." >&2
  exit 1
fi
location="${HCLOUD_LOCATION:-nbg1}"
description="${IMAGE_DESCRIPTION:-clawdinator-nixos}"
labels="${IMAGE_LABELS:-clawdinator=true}"
for _ in {1..120}; do
  status="$(aws ec2 describe-import-image-tasks \
    --region "${region}" \
    --import-task-ids "${task_id}" \
    --query 'ImportImageTasks[0].Status' \
    --output text)"
docker run --rm \
  -e HCLOUD_TOKEN="${HCLOUD_TOKEN:?HCLOUD_TOKEN required}" \
  ghcr.io/apricote/hcloud-upload-image:latest \
  upload \
  --image-url "${image_url}" \
  --architecture x86 \
  --compression zstd \
  --location "${location}" \
  --description "${description}" \
  --labels "${labels}"
  case "${status}" in
    completed)
      image_id="$(aws ec2 describe-import-image-tasks \
        --region "${region}" \
        --import-task-ids "${task_id}" \
        --query 'ImportImageTasks[0].ImageId' \
        --output text)"
      if [ -n "${image_id}" ] && [ "${image_id}" != "None" ]; then
        aws ec2 create-tags \
          --region "${region}" \
          --resources "${image_id}" \
          --tags "Key=Name,Value=${ami_name}" "Key=clawdinator,Value=true"
        echo "${image_id}"
        exit 0
      fi
      echo "Import completed but ImageId is missing." >&2
      exit 1
      ;;
    deleted|deleting|error)
      message="$(aws ec2 describe-import-image-tasks \
        --region "${region}" \
        --import-task-ids "${task_id}" \
        --query 'ImportImageTasks[0].StatusMessage' \
        --output text)"
      echo "Import failed: ${status} - ${message}" >&2
      exit 1
      ;;
    *)
      sleep 30
      ;;
  esac
done
echo "Timed out waiting for AMI import to complete (task ${task_id})." >&2
exit 1
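`scripts/import-image.sh` polls `describe-import-image-tasks` up to 120 times at 30-second intervals (about an hour) and branches on the task status. A sketch that pulls the status decision and the loop into functions so they can be tested without AWS; the status command is passed in by name. It matches the same status strings the script does and treats anything else (such as `active`) as still in progress. `import_action` and `wait_for_import` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map an import task status to the next action, using the same cases
# as the `case "${status}"` block in scripts/import-image.sh.
import_action() {
  case "$1" in
    completed) echo ok ;;
    deleted | deleting | error) echo fail ;;
    *) echo pending ;; # anything else, e.g. "active", means keep polling
  esac
}

# Poll until the task settles. $1 names a command that prints the current
# status, $2 is the maximum number of attempts, $3 the sleep between them.
# Returns 0 on completion, 1 on failure, 2 on timeout.
wait_for_import() {
  local status_cmd="$1" attempts="$2" interval="$3" i status
  for ((i = 1; i <= attempts; i++)); do
    status="$("${status_cmd}")"
    case "$(import_action "${status}")" in
      ok) return 0 ;;
      fail)
        echo "Import failed: ${status}" >&2
        return 1
        ;;
    esac
    sleep "${interval}"
  done
  echo "Timed out after ${attempts} attempts." >&2
  return 2
}
```

Against AWS, the status command would be a small (hypothetical) function wrapping the script's existing `aws ec2 describe-import-image-tasks ... --query 'ImportImageTasks[0].Status' --output text` call, invoked as `wait_for_import import_status 120 30` to keep the current one-hour budget.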

39
scripts/upload-image.sh Executable file → Normal file
View File

@@ -1,36 +1,27 @@
#!/usr/bin/env bash
set -euo pipefail
bucket="${S3_BUCKET:-}"
region="${AWS_REGION:-}"
prefix="${S3_PREFIX:-clawdinator-images}"
out_dir="${OUT_DIR:-dist}"
image_path="${out_dir}/nixos.img"
if [ -z "${bucket}" ] || [ -z "${region}" ]; then
  echo "S3_BUCKET and AWS_REGION are required." >&2
if [ ! -f "${image_path}" ]; then
  echo "Expected image at ${image_path}" >&2
  exit 1
fi
img_path="${out_dir}/nixos.img"
tmp_dir="$(mktemp -d)"
zst_path="${tmp_dir}/nixos.img.zst"
bucket="${S3_BUCKET:?S3_BUCKET required}"
region="${AWS_REGION:?AWS_REGION required}"
prefix="${S3_PREFIX:-}"
if [ ! -f "${img_path}" ]; then
  echo "Missing ${img_path}. Run build-image.sh first." >&2
  exit 1
timestamp="$(date -u +%Y%m%d%H%M%S)"
key_prefix="${prefix%/}"
if [ -n "${key_prefix}" ]; then
  key_prefix="${key_prefix}/"
fi
key="${key_prefix}clawdinator-nixos-${timestamp}.img"
zstd -c "${img_path}" > "${zst_path}"
aws s3 cp "${image_path}" "s3://${bucket}/${key}" \
  --region "${region}" \
  --only-show-errors
timestamp="$(date -u +%Y%m%d-%H%M%S)"
object_key="${prefix}/nixos-${timestamp}.img.zst"
aws s3 cp "${zst_path}" "s3://${bucket}/${object_key}" --region "${region}" --only-show-errors 1>&2
rm -rf "${tmp_dir}"
if [ -n "${S3_PUBLIC_URL:-}" ]; then
  echo "${S3_PUBLIC_URL}"
  exit 0
fi
aws s3 presign "s3://${bucket}/${object_key}" --region "${region}" --expires-in 3600
echo "${key}"
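The new `scripts/upload-image.sh` is meant to print only the S3 object key on stdout; CI stores it as `S3_KEY` and `scripts/import-image.sh` reads it back. The key layout, factored into a hypothetical `s3_image_key` function for illustration: an optional prefix (trailing `/` normalized) followed by `clawdinator-nixos-<UTC timestamp>.img`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Build the object key the same way upload-image.sh does:
# "<prefix>/clawdinator-nixos-<timestamp>.img", or no prefix at all when empty.
s3_image_key() {
  local prefix="${1%/}" timestamp="$2" # strip one trailing "/" from the prefix
  if [ -n "${prefix}" ]; then
    prefix="${prefix}/"
  fi
  echo "${prefix}clawdinator-nixos-${timestamp}.img"
}
```

For example, `s3_image_key "${S3_PREFIX:-clawdinator-images}" "$(date -u +%Y%m%d%H%M%S)"` reproduces the script's default layout.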