commit f862776e2a4bc32a1037eb0cf6be3d303dfd11b5
Author: mineracks <134782215+mineracks@users.noreply.github.com>
Date:   Mon Mar 30 21:16:24 2026 +1000

    Breakglass FOSS Git Mirror v2 — append-only, tamper-resistant

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8d1ba6b
--- /dev/null
+++ b/README.md
@@ -0,0 +1,171 @@
+# Breakglass FOSS Git Mirror v2
+
+Append-only, tamper-resistant mirroring of GitHub repositories to a self-hosted Gitea instance. Designed to survive malicious upstream destruction.
+
+## Threat model
+
+This tool is built for the scenario where upstream repos you depend on are deliberately destroyed, whether by a compromised maintainer, a platform takedown, account suspension, or coerced force-push. Specifically:
+
+- **Upstream force-pushes empty history** → Your copy keeps all previous commits, branches, and tags via timestamped backup refs. The wipe is detected and blocked.
+- **Upstream deletes branches or tags** → Your copy retains them. The sync script never deletes local refs.
+- **Upstream repo is deleted entirely** → Fetch fails gracefully; your existing local copy and Gitea copy are untouched.
+- **GitHub account is banned/suspended** → Same as deletion: your copies persist.
+- **DMCA takedown** → Your pre-takedown copy is preserved.
+- **Subtle history rewrite** (less than 50% of refs removed) → Still captured in backup refs, and the live refs are updated so you can diff the before and after.
+
+## How it works
+
+The system maintains bare git clones on a dedicated Ubuntu VM, sitting between GitHub (upstream) and your Gitea (your archive).
+
+```
+GitHub ──fetch──► Ubuntu VM (bare clones) ──push──► Your Gitea
+                       │
+                       ├─ refs/heads/*               (live branches)
+                       ├─ refs/tags/*                (live tags)
+                       ├─ refs/backup/<timestamp>/*  (timestamped snapshots)
+                       └─ audit logs                 (tamper-evident)
+```
+
+### The append-only guarantee
+
+Before every fetch from GitHub, the script snapshots all current refs into `refs/backup/<timestamp>/`. These backup refs are pushed to Gitea alongside the live refs.
+
+Upstream refs are fetched into a temporary staging namespace (`refs/upstream-staging/`) and compared against the existing state. If the upstream has lost 50% or more of its refs (configurable), the update is **blocked**, a notification is sent, and the previous state is preserved. This is the wipe detection.
+
+Local refs that upstream has deleted are **never removed locally**. The sync is additive only.
+
+### Wipe detection
+
+If a repo goes from 40 branches and 100 tags to 1 branch and 0 tags, that's a 99% loss (139 of 140 refs), so the script refuses to update live refs and alerts you. The threshold is configurable (`WIPE_THRESHOLD` in `mirror.env`, default 50%).
+
+Even below the threshold, every change is logged in the audit trail with before/after SHA hashes.
+
+### Tamper-evident audit
+
+Every sync writes a structured audit log recording exactly what happened: which refs were added, updated, or (on the upstream side) disappeared. Each audit file gets a SHA256 checksum appended to a checksums log. The health check verifies these haven't been tampered with.
+
+The audit directory is set with the `+a` (append-only) filesystem attribute during install, so the breakglass service user can create new audit files there but cannot delete or rename existing ones. The attribute does not protect the contents of an existing file; changes to those are what the checksum verification is meant to catch.
+
+## Quick start
+
+### Prerequisites
+
+- Fresh Ubuntu 22.04+ VM (2 GB RAM, 20+ GB disk)
+- Your Gitea instance accessible via HTTPS
+- Gitea personal access token (repo read/write scope)
+- Optional: GitHub token for higher API rate limits
+
+### Install
+
+```bash
+git clone
+cd foss-breakglass-mirror-v2
+sudo bash install.sh
+```
+
+The installer handles everything interactively: packages, user creation, config, systemd timers.
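As a rough illustration of the wipe-detection rule described in this README, the sketch below computes the percentage of refs lost and decides whether an update would be blocked. The function names `loss_pct`, `should_block`, and `report` are hypothetical and do not exist in the repository; the sketch assumes integer arithmetic, a block when the loss reaches the threshold, and an unconditional block when upstream reports zero refs.

```bash
# Hypothetical helpers (not part of the repo) illustrating the wipe rule.
# Written in portable shell so they run under bash or sh.

# Print the integer percentage of refs lost; 0 when nothing was lost.
# Args: before after
loss_pct() {
  if [ "$1" -le 0 ] || [ "$2" -ge "$1" ]; then
    echo 0
  else
    echo $(( ($1 - $2) * 100 / $1 ))
  fi
}

# Exit status 0 means "block the update". Args: before after [threshold]
should_block() {
  [ "$1" -gt 0 ] || return 1          # first sync: nothing to compare
  [ "$2" -ne 0 ] || return 0          # upstream has zero refs: always block
  [ "${3:-50}" -gt 0 ] || return 1    # threshold 0 disables the percentage check
  [ "$(loss_pct "$1" "$2")" -ge "${3:-50}" ]
}

# Print a one-line verdict for a before/after pair at the default threshold.
report() {
  if should_block "$1" "$2" 50; then verdict=BLOCK; else verdict=allow; fi
  echo "$1 -> $2 refs: $(loss_pct "$1" "$2")% lost, $verdict"
}

report 200 20
report 100 60
report 10 0
```

Because the division truncates, a loss of 49.9% counts as 49% and would pass a threshold of 50 in this sketch.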
+
+### What it creates
+
+```
+/opt/breakglass/scripts/       # sync and healthcheck scripts
+/etc/breakglass/mirror.env     # tokens, URLs, settings (mode 600)
+/etc/breakglass/sources.yml    # GitHub owners to mirror
+/var/lib/breakglass/repos/     # bare git clones
+/var/lib/breakglass/audit/     # tamper-evident audit logs (+a attr)
+/var/log/breakglass/           # sync logs (90-day rotation)
+```
+
+Systemd timers:
+- `breakglass-sync.timer`: daily at 04:00 UTC (with a 30-minute random jitter)
+- `breakglass-healthcheck.timer`: daily at 08:00 UTC
+
+## Configuration
+
+### sources.yml
+
+```yaml
+owners:
+  - github: bitcoin
+  - github: sparrowwallet
+  - github: seedsigner
+  - github: seedhammer
+
+  # With filters:
+  - github: some-large-org
+    include:
+      - "important-repo"
+    exclude:
+      - "test-*"
+```
+
+### mirror.env
+
+Key settings:
+
+| Variable | Purpose | Default |
+|----------|---------|---------|
+| `GITEA_URL` | Your Gitea instance URL | (unset) |
+| `GITEA_TOKEN` | Gitea API token | (unset) |
+| `GITHUB_TOKEN` | GitHub token (optional) | (unset) |
+| `WIPE_THRESHOLD` | Block sync if upstream loses N% or more of its refs | 50 |
+| `NOTIFY_METHOD` | `ntfy`, `email`, `telegram`, or `none` | none |
+| `STALE_DAYS` | Alert if a repo hasn't synced in N days | 7 |
+
+## Day-to-day commands
+
+```bash
+# Check timer status
+sudo systemctl status breakglass-sync.timer
+
+# Trigger immediate sync
+sudo systemctl start breakglass-sync.service
+
+# Watch sync live
+sudo journalctl -u breakglass-sync.service -f
+
+# Run health check
+sudo systemctl start breakglass-healthcheck.service
+
+# View audit trail
+ls -lt /var/lib/breakglass/audit/
+
+# View recent sync logs
+ls -lt /var/log/breakglass/ | head
+
+# Add a new GitHub org
+sudo nano /etc/breakglass/sources.yml
+```
+
+## Health checks
+
+The healthcheck script (runs daily) verifies:
+
+1. Gitea is reachable
+2. Sync timer is active
+3. Recent sync logs exist
+4. No repos have gone stale
+5. Backup refs exist in all repos (append-only is working)
+6. Audit log checksums haven't been tampered with
+7. Local ref counts haven't decreased (local deletion detection)
+
+## What this does NOT protect against
+
+To be transparent about limitations:
+
+- **VM compromise**: If an attacker gets root on your mirror VM, they can delete everything. Mitigate with VM-level snapshots, ZFS snapshots, or offsite backups of `/var/lib/breakglass/repos/`.
+- **Gitea compromise**: If someone gets admin on your Gitea, they could delete repos there. The bare clones on the VM are the primary archive; Gitea is a secondary copy and convenient browsing interface.
+- **Disk failure**: Standard hardware risk. Use RAID or VM-level redundancy.
+- **Repos you don't know about yet**: This only mirrors repos from the owners you've configured. If a new critical repo appears, you need to add the owner to `sources.yml`.
+
+For maximum paranoia, consider also running periodic `tar` backups of `/var/lib/breakglass/repos/` to an offsite location (S3, another server, external drive).
+
+## Differences from v1 (Umbrel version)
+
+The original ran inside Umbrel's managed Docker environment. Umbrel silently recycled containers and broke the automation after a few weeks. This version runs on a plain Ubuntu VM where nothing can interfere with the systemd timers or filesystem.
+
+Key improvements: wipe detection, audit trail, filesystem-level append-only on audit dir, staging namespace for safe fetch, health monitoring with notifications, and no `--mirror` flag (which force-overwrites every local ref on fetch and deletes remote refs on push).
+
+## License
+
+MIT
diff --git a/config/mirror.env.example b/config/mirror.env.example
new file mode 100644
index 0000000..094ab41
--- /dev/null
+++ b/config/mirror.env.example
@@ -0,0 +1,50 @@
+# ─────────────────────────────────────────────────────────
+# mirror.env — Breakglass append-only mirror configuration
+# ─────────────────────────────────────────────────────────
+# Copy to /etc/breakglass/mirror.env and edit.
+# Permissions should be 600 (readable only by breakglass user). +# ───────────────────────────────────────────────────────── + +# ── Gitea ──────────────────────────────────────────────── +GITEA_URL="https://git.mineracks.com" +GITEA_TOKEN="your-gitea-personal-access-token" +GITEA_USER="mineracks" + +# ── GitHub ─────────────────────────────────────────────── +# Optional — raises API rate limit from 60 → 5000 req/h +GITHUB_TOKEN="" + +# ── Paths ──────────────────────────────────────────────── +MIRROR_ROOT="/var/lib/breakglass/repos" +SOURCES_FILE="/etc/breakglass/sources.yml" +LOG_DIR="/var/log/breakglass" +AUDIT_DIR="/var/lib/breakglass/audit" + +# ── Wipe Detection ────────────────────────────────────── +# If upstream loses more than this % of refs in a single +# sync, block the update and alert. This catches malicious +# force-pushes and repo guttings. +# Set to 0 to disable (not recommended). +WIPE_THRESHOLD=50 + +# ── Notifications ──────────────────────────────────────── +# Supported: "ntfy", "email", "telegram", "none" +NOTIFY_METHOD="none" + +# ntfy (https://ntfy.sh) +NTFY_TOPIC="" +NTFY_SERVER="https://ntfy.sh" + +# Email (uses system sendmail/msmtp) +NOTIFY_EMAIL="" + +# Telegram +TELEGRAM_BOT_TOKEN="" +TELEGRAM_CHAT_ID="" + +# ── Tuning ─────────────────────────────────────────────── +# Stale-repo alert threshold in days +STALE_DAYS=7 + +# Force HTTP/1.1 (helps with Cloudflare Tunnel) +FORCE_HTTP11="true" diff --git a/config/sources.yml b/config/sources.yml new file mode 100644 index 0000000..9e302cc --- /dev/null +++ b/config/sources.yml @@ -0,0 +1,33 @@ +# ───────────────────────────────────────────────────────── +# sources.yml — GitHub owners/orgs to mirror +# ───────────────────────────────────────────────────────── +# +# Each entry creates a matching Gitea org (if it doesn't +# exist) and mirrors every public repo from that GitHub +# owner into it. 
+# +# Optional per-owner settings: +# include: only mirror repos matching these patterns +# exclude: skip repos matching these patterns +# (patterns are shell globs evaluated with bash [[ ]]) +# +# If neither include nor exclude is set, all repos are mirrored. +# ───────────────────────────────────────────────────────── + +owners: + - github: bitcoin + # gitea_org defaults to the github name if omitted + # gitea_org: bitcoin + + - github: sparrowwallet + + - github: seedsigner + + - github: seedhammer + + # ── Example with filters ────────────────────────────── + # - github: torvalds + # include: + # - "linux" + # exclude: + # - "test-*" diff --git a/install.sh b/install.sh new file mode 100644 index 0000000..0fa2a37 --- /dev/null +++ b/install.sh @@ -0,0 +1,194 @@ +#!/usr/bin/env bash +# ═══════════════════════════════════════════════════════════ +# install.sh — bootstrap Breakglass Mirror on a fresh Ubuntu +# ═══════════════════════════════════════════════════════════ +# Run as root (or with sudo) on a fresh Ubuntu 22.04+ VM. +# +# What this does: +# 1. Installs git, git-lfs, curl, jq +# 2. Creates a dedicated 'breakglass' system user +# 3. Copies scripts to /opt/breakglass/ +# 4. Sets up config directory at /etc/breakglass/ +# 5. Creates data and log directories +# 6. Installs and enables systemd timers +# 7. 
Prompts for Gitea token and writes initial config +# +# Usage: +# sudo bash install.sh +# ═══════════════════════════════════════════════════════════ +set -euo pipefail + +# ── Colour helpers ─────────────────────────────────────── +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +CYAN='\033[0;36m' +NC='\033[0m' + +info() { printf "${CYAN}▸${NC} %s\n" "$*"; } +ok() { printf "${GREEN}✓${NC} %s\n" "$*"; } +warn() { printf "${YELLOW}⚠${NC} %s\n" "$*"; } +err() { printf "${RED}✗${NC} %s\n" "$*" >&2; } + +# ── Preflight ──────────────────────────────────────────── +if [[ $EUID -ne 0 ]]; then + err "This script must be run as root (use sudo)" + exit 1 +fi + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +info "Breakglass Mirror installer" +echo "" + +# ── 1. System packages ────────────────────────────────── +info "Installing system packages …" +apt-get update -qq +apt-get install -y -qq git git-lfs curl jq msmtp msmtp-mta >/dev/null 2>&1 || \ + apt-get install -y -qq git git-lfs curl jq >/dev/null 2>&1 +git lfs install --system >/dev/null 2>&1 +ok "Packages installed" + +# ── 2. System user ────────────────────────────────────── +info "Creating breakglass system user …" +if id breakglass &>/dev/null; then + ok "User already exists" +else + useradd --system --shell /usr/sbin/nologin --home-dir /var/lib/breakglass --create-home breakglass + ok "User created" +fi + +# ── 3. 
Directory structure ────────────────────────────── +info "Creating directories …" +mkdir -p /opt/breakglass/scripts +mkdir -p /etc/breakglass +mkdir -p /var/lib/breakglass/repos +mkdir -p /var/lib/breakglass/audit +mkdir -p /var/log/breakglass + +chown -R breakglass:breakglass /var/lib/breakglass /var/log/breakglass + +# Make audit directory append-only at the filesystem level +# (root can still override, but it prevents accidental deletion) +chattr +a /var/lib/breakglass/audit 2>/dev/null || \ + warn "Could not set append-only attribute on audit dir (needs ext4/xfs)" +ok "Directories ready" + +# ── 4. Copy scripts ───────────────────────────────────── +info "Installing scripts …" +cp "$SCRIPT_DIR/scripts/breakglass-sync.sh" /opt/breakglass/scripts/ +cp "$SCRIPT_DIR/scripts/breakglass-healthcheck.sh" /opt/breakglass/scripts/ +chmod +x /opt/breakglass/scripts/*.sh +ok "Scripts installed to /opt/breakglass/scripts/" + +# ── 5. Copy config templates ──────────────────────────── +info "Setting up configuration …" +cp "$SCRIPT_DIR/config/sources.yml" /etc/breakglass/sources.yml +ok "sources.yml → /etc/breakglass/sources.yml" + +# ── 6. Interactive config ─────────────────────────────── +echo "" +info "Let's configure your mirror. You can edit /etc/breakglass/mirror.env later." 
+echo "" + +read -rp " Gitea URL [https://git.mineracks.com]: " INPUT_GITEA_URL +GITEA_URL="${INPUT_GITEA_URL:-https://git.mineracks.com}" + +read -rp " Gitea username [mineracks]: " INPUT_GITEA_USER +GITEA_USER="${INPUT_GITEA_USER:-mineracks}" + +read -rp " Gitea personal access token: " INPUT_GITEA_TOKEN +if [[ -z "$INPUT_GITEA_TOKEN" ]]; then + warn "No token entered — you'll need to edit /etc/breakglass/mirror.env before first run" +fi + +read -rp " GitHub token (optional, for rate limits): " INPUT_GH_TOKEN + +read -rp " Notification method [none/ntfy/email/telegram]: " INPUT_NOTIFY +NOTIFY="${INPUT_NOTIFY:-none}" + +NTFY_TOPIC="" NTFY_SERVER="https://ntfy.sh" +NOTIFY_EMAIL="" +TG_TOKEN="" TG_CHAT="" + +case "$NOTIFY" in + ntfy) + read -rp " ntfy topic: " NTFY_TOPIC + read -rp " ntfy server [https://ntfy.sh]: " INPUT_NTFY_SERVER + NTFY_SERVER="${INPUT_NTFY_SERVER:-https://ntfy.sh}" + ;; + email) + read -rp " Notification email address: " NOTIFY_EMAIL + ;; + telegram) + read -rp " Telegram bot token: " TG_TOKEN + read -rp " Telegram chat ID: " TG_CHAT + ;; +esac + +cat > /etc/breakglass/mirror.env <&2; exit 1; } +# shellcheck source=/dev/null +source "$ENV_FILE" + +MIRROR_ROOT="${MIRROR_ROOT:-/var/lib/breakglass/repos}" +LOG_DIR="${LOG_DIR:-/var/log/breakglass}" +AUDIT_DIR="${AUDIT_DIR:-/var/lib/breakglass/audit}" +STALE_DAYS="${STALE_DAYS:-7}" +NOTIFY_METHOD="${NOTIFY_METHOD:-none}" + +PROBLEMS=() +WARNINGS=() + +log() { printf '%s %s\n' "$(date -u +%H:%M:%S)" "$*"; } + +notify() { + local message="$1" + case "${NOTIFY_METHOD}" in + ntfy) + curl -sfS -d "$message" "${NTFY_SERVER:-https://ntfy.sh}/${NTFY_TOPIC:-}" &>/dev/null || true + ;; + email) + echo "$message" | mail -s "Breakglass Health Alert" "${NOTIFY_EMAIL:-}" 2>/dev/null || true + ;; + telegram) + curl -sfS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN:-}/sendMessage" \ + -d "chat_id=${TELEGRAM_CHAT_ID:-}" -d "text=${message}" &>/dev/null || true + ;; + *) ;; + esac +} + +# ── Check 1: 
Gitea reachable ────────────────────────────
+log "Check 1: Gitea connectivity …"
+if ! curl -sfS --connect-timeout 10 "${GITEA_URL}/api/v1/version" &>/dev/null; then
+    PROBLEMS+=("Gitea at ${GITEA_URL} is unreachable")
+    log " FAIL"
+else
+    log " OK"
+fi
+
+# ── Check 2: Sync timer active ──────────────────────────
+log "Check 2: Sync timer …"
+if systemctl is-active --quiet breakglass-sync.timer 2>/dev/null; then
+    log " OK"
+else
+    PROBLEMS+=("breakglass-sync.timer is not active")
+    log " FAIL"
+fi
+
+# ── Check 3: Recent sync log ────────────────────────────
+log "Check 3: Recent sync logs …"
+LATEST_LOG=$(find "$LOG_DIR" -name 'sync-*.log' -printf '%T@ %p\n' 2>/dev/null | sort -rn | head -1 | awk '{print $2}')
+if [[ -z "$LATEST_LOG" ]]; then
+    PROBLEMS+=("No sync logs found in $LOG_DIR")
+    log " FAIL"
+else
+    LOG_AGE_DAYS=$(( ( $(date +%s) - $(stat -c %Y "$LATEST_LOG") ) / 86400 ))
+    if (( LOG_AGE_DAYS > STALE_DAYS )); then
+        PROBLEMS+=("Last sync log is ${LOG_AGE_DAYS} days old")
+        log " WARN: ${LOG_AGE_DAYS}d old"
+    else
+        log " OK: ${LOG_AGE_DAYS}d old"
+    fi
+fi
+
+# ── Check 4: Repo freshness ─────────────────────────────
+log "Check 4: Repo freshness (threshold: ${STALE_DAYS}d) …"
+STALE_THRESHOLD=$(( $(date +%s) - STALE_DAYS * 86400 ))
+REPO_COUNT=0
+STALE_COUNT=0
+
+if [[ -d "$MIRROR_ROOT" ]]; then
+    while IFS= read -r bare_dir; do
+        [[ -d "$bare_dir" ]] || continue
+        REPO_COUNT=$(( REPO_COUNT + 1 ))  # (( REPO_COUNT++ )) returns status 1 at 0 and would abort under 'set -e'
+
+        repo_name="${bare_dir#${MIRROR_ROOT}/}"
+        repo_name="${repo_name%.git}"
+
+        fetch_head="$bare_dir/FETCH_HEAD"
+        if [[ -f "$fetch_head" ]]; then
+            last_fetch=$(stat -c %Y "$fetch_head")
+            if (( last_fetch < STALE_THRESHOLD )); then
+                days_stale=$(( ( $(date +%s) - last_fetch ) / 86400 ))
+                WARNINGS+=("${repo_name}: last fetched ${days_stale}d ago")
+                STALE_COUNT=$(( STALE_COUNT + 1 ))
+                log " STALE: $repo_name (${days_stale}d)"
+            fi
+        else
+            WARNINGS+=("${repo_name}: no FETCH_HEAD")
+            log " WARN: $repo_name has no FETCH_HEAD"
+        fi
+    done < <(find "$MIRROR_ROOT" -maxdepth 3 -name 'HEAD' -execdir pwd \;)
+fi
+log " $REPO_COUNT repos checked, $STALE_COUNT stale"
+
+# ── Check 5: Backup refs exist (append-only proof) ──────
+log "Check 5: Backup ref integrity …"
+REPOS_WITHOUT_BACKUPS=0
+
+if [[ -d "$MIRROR_ROOT" ]]; then
+    while IFS= read -r bare_dir; do
+        [[ -d "$bare_dir" ]] || continue
+        repo_name="${bare_dir#${MIRROR_ROOT}/}"
+        repo_name="${repo_name%.git}"
+
+        backup_count=$(git -C "$bare_dir" for-each-ref --format='x' refs/backup/ 2>/dev/null | wc -l)
+        if (( backup_count == 0 )); then
+            WARNINGS+=("${repo_name}: no backup refs — append-only not working?")
+            REPOS_WITHOUT_BACKUPS=$(( REPOS_WITHOUT_BACKUPS + 1 ))
+            log " WARN: $repo_name has no backup refs"
+        fi
+    done < <(find "$MIRROR_ROOT" -maxdepth 3 -name 'HEAD' -execdir pwd \;)
+fi
+if (( REPOS_WITHOUT_BACKUPS > 0 )); then
+    log " $REPOS_WITHOUT_BACKUPS repos missing backup refs"
+else
+    log " OK: all repos have backup refs"
+fi
+
+# ── Check 6: Audit log checksums ────────────────────────
+log "Check 6: Audit log integrity …"
+CHECKSUM_FILE="$AUDIT_DIR/checksums.log"
+if [[ -f "$CHECKSUM_FILE" ]]; then
+    TAMPERED=0
+    while read -r expected_hash filepath; do
+        if [[ -f "$filepath" ]]; then
+            actual_hash=$(sha256sum "$filepath" | awk '{print $1}')
+            if [[ "$actual_hash" != "$expected_hash" ]]; then
+                PROBLEMS+=("TAMPERED audit log: $filepath")
+                TAMPERED=$(( TAMPERED + 1 ))
+            fi
+        fi
+    done < "$CHECKSUM_FILE"
+    if (( TAMPERED > 0 )); then
+        log " FAIL: $TAMPERED tampered audit files"
+    else
+        log " OK"
+    fi
+else
+    log " SKIP: no checksums yet (first run?)"
+fi
+
+# ── Check 7: Ref count file — detect local ref deletion ─
+log "Check 7: Ref count stability …"
+REF_COUNT_FILE="$AUDIT_DIR/ref-counts.dat"
+
+if [[ -d "$MIRROR_ROOT" ]]; then
+    CURRENT_COUNTS=$(mktemp)
+    while IFS= read -r bare_dir; do
+        [[ -d "$bare_dir" ]] || continue
+        repo_name="${bare_dir#${MIRROR_ROOT}/}"
+        repo_name="${repo_name%.git}"
+        count=$(git -C "$bare_dir" for-each-ref --format='x' refs/heads refs/tags refs/notes refs/backup 2>/dev/null | wc
-l) + echo "$repo_name $count" >> "$CURRENT_COUNTS" + done < <(find "$MIRROR_ROOT" -maxdepth 3 -name 'HEAD' -execdir pwd \;) + + if [[ -f "$REF_COUNT_FILE" ]]; then + while read -r repo prev_count; do + curr_count=$(grep "^${repo} " "$CURRENT_COUNTS" 2>/dev/null | awk '{print $2}') + if [[ -n "$curr_count" ]] && (( curr_count < prev_count )); then + PROBLEMS+=("${repo}: ref count DECREASED ($prev_count → $curr_count) — possible local tampering") + log " ALERT: $repo refs decreased $prev_count → $curr_count" + fi + done < "$REF_COUNT_FILE" + fi + + # Save current counts for next run + cp "$CURRENT_COUNTS" "$REF_COUNT_FILE" + rm -f "$CURRENT_COUNTS" + log " OK" +fi + +# ── Report ─────────────────────────────────────────────── +log "" +TOTAL_ISSUES=$(( ${#PROBLEMS[@]} + ${#WARNINGS[@]} )) + +if [[ ${#PROBLEMS[@]} -eq 0 && ${#WARNINGS[@]} -eq 0 ]]; then + log "═══ Health check PASSED — all mirrors healthy ═══" + exit 0 +fi + +if [[ ${#PROBLEMS[@]} -gt 0 ]]; then + log "═══ Health check FAILED — ${#PROBLEMS[@]} critical issue(s): ═══" + REPORT="BREAKGLASS HEALTH ALERT: ${#PROBLEMS[@]} critical, ${#WARNINGS[@]} warnings\n\nCRITICAL:\n" + for p in "${PROBLEMS[@]}"; do + log " ✗ $p" + REPORT+="✗ ${p}\n" + done +fi + +if [[ ${#WARNINGS[@]} -gt 0 ]]; then + log " ${#WARNINGS[@]} warning(s):" + REPORT="${REPORT:-}WARNINGS:\n" + for w in "${WARNINGS[@]}"; do + log " ⚠ $w" + REPORT+="⚠ ${w}\n" + done +fi + +notify "$(echo -e "${REPORT:-Health check completed with issues}")" +exit $(( ${#PROBLEMS[@]} > 0 ? 1 : 0 )) diff --git a/scripts/breakglass-sync.sh b/scripts/breakglass-sync.sh new file mode 100644 index 0000000..a4611c8 --- /dev/null +++ b/scripts/breakglass-sync.sh @@ -0,0 +1,483 @@ +#!/usr/bin/env bash +# ═══════════════════════════════════════════════════════════ +# breakglass-sync.sh — APPEND-ONLY GitHub → Gitea mirror +# ═══════════════════════════════════════════════════════════ +# +# DESIGN PRINCIPLE: This script ONLY ADDS data. 
It never +# deletes refs, never force-pushes, never prunes. If upstream +# is maliciously wiped, the worst that happens is the empty +# state gets added alongside all previous history — nothing +# is lost. +# +# Threat model: +# - Upstream maintainer force-pushes empty history +# - Upstream repo is deleted entirely +# - Upstream tags/branches are removed +# - GitHub account is suspended/banned +# - DMCA takedown removes repo +# +# In ALL these cases, previously-synced data is preserved. +# +# ═══════════════════════════════════════════════════════════ +set -euo pipefail + +# ── Load config ────────────────────────────────────────── +ENV_FILE="${BREAKGLASS_ENV:-/etc/breakglass/mirror.env}" +if [[ ! -f "$ENV_FILE" ]]; then + echo "FATAL: config not found at $ENV_FILE" >&2 + exit 1 +fi +# shellcheck source=/dev/null +source "$ENV_FILE" + +# ── Defaults ───────────────────────────────────────────── +MIRROR_ROOT="${MIRROR_ROOT:-/var/lib/breakglass/repos}" +LOG_DIR="${LOG_DIR:-/var/log/breakglass}" +AUDIT_DIR="${AUDIT_DIR:-/var/lib/breakglass/audit}" +FORCE_HTTP11="${FORCE_HTTP11:-true}" +NOTIFY_METHOD="${NOTIFY_METHOD:-none}" +# If upstream loses more than this % of refs, abort the push +# as a likely malicious wipe. 0 = disabled. 
+WIPE_THRESHOLD="${WIPE_THRESHOLD:-50}" + +mkdir -p "$MIRROR_ROOT" "$LOG_DIR" "$AUDIT_DIR" + +TIMESTAMP=$(date -u +%Y%m%dT%H%M%SZ) +LOG_FILE="$LOG_DIR/sync-${TIMESTAMP}.log" +AUDIT_FILE="$AUDIT_DIR/audit-${TIMESTAMP}.log" +ERRORS=0 +SYNCED=0 +SKIPPED=0 +PROTECTED=0 + +# ── Helpers ────────────────────────────────────────────── + +log() { printf '%s %s\n' "$(date -u +%H:%M:%S)" "$*" | tee -a "$LOG_FILE"; } +warn() { log "WARN: $*"; } +die() { log "FATAL: $*"; notify "Breakglass sync FAILED: $*"; exit 1; } + +audit() { + # Append-only audit trail — one line per event + printf '%s %s %s\n' "$TIMESTAMP" "$(date -u +%H:%M:%S)" "$*" >> "$AUDIT_FILE" +} + +retry() { + local max_attempts=$1; shift + local delay=2 + for (( attempt=1; attempt<=max_attempts; attempt++ )); do + if "$@"; then return 0; fi + if (( attempt < max_attempts )); then + log " retry $attempt/$max_attempts — sleeping ${delay}s …" + sleep "$delay" + (( delay = delay * 2 > 30 ? 30 : delay * 2 )) + fi + done + return 1 +} + +# ── HTTP helpers ───────────────────────────────────────── + +curl_opts=(-sfS --connect-timeout 15 --max-time 120) +[[ "$FORCE_HTTP11" == "true" ]] && curl_opts+=(--http1.1) + +gh_api() { + local path="$1"; shift + local -a headers=(-H "Accept: application/vnd.github+json") + [[ -n "${GITHUB_TOKEN:-}" ]] && headers+=(-H "Authorization: Bearer $GITHUB_TOKEN") + curl "${curl_opts[@]}" "${headers[@]}" "https://api.github.com${path}" "$@" +} + +gitea_api() { + local method="$1" path="$2"; shift 2 + local -a args=(-X "$method" -H "Content-Type: application/json" + -H "Authorization: token $GITEA_TOKEN") + curl "${curl_opts[@]}" "${args[@]}" "${GITEA_URL}/api/v1${path}" "$@" +} + +# ── YAML-lite parser ───────────────────────────────────── + +parse_sources() { + local gh="" org="" inc="" exc="" in_include="" in_exclude="" + while IFS= read -r line; do + line="${line%%#*}" + [[ -z "${line// /}" ]] && continue + + if [[ "$line" =~ ^[[:space:]]*-[[:space:]]*github:[[:space:]]*(.+) ]]; then + 
[[ -n "$gh" ]] && echo "OWNER $gh ${org:-$gh} ${inc:-*} ${exc:-}" + gh="${BASH_REMATCH[1]// /}" + org="" inc="" exc="" in_include="" in_exclude="" + elif [[ "$line" =~ ^[[:space:]]*gitea_org:[[:space:]]*(.+) ]]; then + org="${BASH_REMATCH[1]// /}" + elif [[ "$line" =~ ^[[:space:]]*-[[:space:]]*\"(.+)\" ]] && [[ -n "$in_include" ]]; then + inc="${inc:+$inc|}${BASH_REMATCH[1]}" + elif [[ "$line" =~ ^[[:space:]]*-[[:space:]]*\"(.+)\" ]] && [[ -n "$in_exclude" ]]; then + exc="${exc:+$exc|}${BASH_REMATCH[1]}" + fi + if [[ "$line" =~ ^[[:space:]]*include: ]]; then in_include=1; in_exclude=""; fi + if [[ "$line" =~ ^[[:space:]]*exclude: ]]; then in_exclude=1; in_include=""; fi + if [[ ! "$line" =~ ^[[:space:]]*- ]] && [[ ! "$line" =~ ^[[:space:]]*(include|exclude): ]]; then + in_include="" in_exclude="" + fi + done < "$SOURCES_FILE" + [[ -n "$gh" ]] && echo "OWNER $gh ${org:-$gh} ${inc:-*} ${exc:-}" +} + +# ── GitHub pagination ──────────────────────────────────── + +gh_list_repos() { + local owner="$1" + local page=1 per_page=100 + while true; do + local url="/orgs/${owner}/repos?per_page=${per_page}&page=${page}&type=public" + local body + if ! 
body=$(gh_api "$url" 2>/dev/null); then
+            url="/users/${owner}/repos?per_page=${per_page}&page=${page}&type=public"
+            body=$(gh_api "$url") || { warn "cannot list repos for $owner"; return 1; }
+        fi
+        local names
+        names=$(echo "$body" | grep -o '"full_name"[[:space:]]*:[[:space:]]*"[^"]*"' \
+            | sed 's/.*"full_name"[[:space:]]*:[[:space:]]*"//;s/"//' \
+            | awk -F/ '{print $2}')
+        [[ -z "$names" ]] && break
+        echo "$names"
+        (( page++ ))
+        local count
+        count=$(echo "$names" | wc -l)
+        (( count < per_page )) && break
+    done
+}
+
+# ── Gitea org/repo ensure ────────────────────────────────
+
+ensure_gitea_org() {
+    local org="$1"
+    if gitea_api GET "/orgs/${org}" &>/dev/null; then return 0; fi
+    log " creating Gitea org: $org"
+    gitea_api POST "/orgs" -d "{\"username\":\"${org}\",\"visibility\":\"public\"}" &>/dev/null \
+        || warn "could not create org $org — will push under $GITEA_USER"
+}
+
+ensure_gitea_repo() {
+    local org="$1" repo="$2"
+    if gitea_api GET "/repos/${org}/${repo}" &>/dev/null; then return 0; fi
+    log " creating Gitea repo: ${org}/${repo}"
+    gitea_api POST "/orgs/${org}/repos" \
+        -d "{\"name\":\"${repo}\",\"private\":false,\"description\":\"[BREAKGLASS] Append-only mirror of github.com/${org}/${repo}\"}" &>/dev/null \
+        || { warn "could not create repo ${org}/${repo}"; return 1; }
+}
+
+# ═══════════════════════════════════════════════════════════
+# CORE: Per-repo append-only sync
+# ═══════════════════════════════════════════════════════════
+
+count_refs() {
+    # Count refs in a bare repo (heads + tags + notes)
+    local dir="$1"
+    git -C "$dir" for-each-ref --format='x' refs/heads refs/tags refs/notes 2>/dev/null | wc -l
+}
+
+snapshot_refs() {
+    # Save every current ref into refs/backup/<timestamp>/
+    # This is the append-only guarantee: old state is always preserved
+    local dir="$1" ts="$2"
+    local count=0 refname sha backup_ref
+    # Process substitution keeps the loop in this shell, so $count survives it
+    while read -r refname sha; do
+        backup_ref="refs/backup/${ts}/${refname#refs/}"
+        git -C "$dir" update-ref "$backup_ref" "$sha" 2>/dev/null || true
+        count=$(( count + 1 ))
+    done < <(git -C "$dir" for-each-ref --format='%(refname) %(objectname)' \
+        refs/heads refs/tags refs/notes 2>/dev/null)
+    echo "$count"
+}
+
+detect_wipe() {
+    # Compare ref count before and after fetch.
+    # If upstream lost a large proportion of refs, this is suspicious.
+    local before="$1" after="$2" repo_name="$3"
+
+    if (( before == 0 )); then
+        # First sync — nothing to compare
+        return 0
+    fi
+
+    if (( after == 0 )); then
+        log " !! WIPE DETECTED: upstream has ZERO refs for $repo_name"
+        audit "WIPE_DETECTED repo=$repo_name before=$before after=0"
+        return 1
+    fi
+
+    if (( WIPE_THRESHOLD > 0 )); then
+        local lost=$(( before - after ))
+        if (( lost > 0 )); then
+            local pct=$(( lost * 100 / before ))
+            if (( pct >= WIPE_THRESHOLD )); then
+                log " !! SUSPICIOUS: upstream lost ${pct}% of refs ($before → $after) for $repo_name"
+                audit "WIPE_SUSPECTED repo=$repo_name before=$before after=$after lost_pct=$pct"
+                return 1
+            fi
+        fi
+    fi
+
+    return 0
+}
+
+sync_repo() {
+    local gh_owner="$1" repo="$2" gitea_org="$3"
+    local bare_dir="${MIRROR_ROOT}/${gh_owner}/${repo}.git"
+    local gh_url="https://github.com/${gh_owner}/${repo}.git"
+    local gitea_url="${GITEA_URL}/${gitea_org}/${repo}.git"
+
+    log " syncing ${gh_owner}/${repo} → ${gitea_org}/${repo}"
+    audit "SYNC_START repo=${gh_owner}/${repo}"
+
+    # ── Ensure local bare clone exists ───────────────────
+    if [[ ! -d "$bare_dir" ]]; then
+        log " initial clone …"
+        if ! retry 3 git clone --bare "$gh_url" "$bare_dir" 2>>"$LOG_FILE"; then
+            warn "clone failed for ${gh_owner}/${repo}"
+            audit "CLONE_FAILED repo=${gh_owner}/${repo}"
+            return 1
+        fi
+        # Do NOT use --mirror flag: it enables pruning on fetch.
+        # We configure fetch refspecs manually below.
+ audit "CLONED repo=${gh_owner}/${repo}" + fi + + cd "$bare_dir" || return 1 + + # ── Configure remotes (no --mirror, no --prune) ────── + git remote set-url origin "$gh_url" 2>/dev/null || git remote add origin "$gh_url" + + # CRITICAL: Remove any prune or mirror config that might exist + git config --unset remote.origin.mirror 2>/dev/null || true + git config --unset remote.origin.prune 2>/dev/null || true + git config remote.origin.prune false + git config remote.origin.tagOpt --no-tags # we fetch tags explicitly + + # Set up gitea remote with embedded auth + local authed_url="${GITEA_URL/https:\/\//https:\/\/${GITEA_USER}:${GITEA_TOKEN}@}/${gitea_org}/${repo}.git" + if git remote get-url gitea &>/dev/null; then + git remote set-url gitea "$authed_url" + else + git remote add gitea "$authed_url" + fi + git config remote.gitea.mirror false 2>/dev/null || true + git config remote.gitea.prune false + + [[ "$FORCE_HTTP11" == "true" ]] && git config http.version HTTP/1.1 + + # ── Count refs BEFORE fetch ────────────────────────── + local refs_before + refs_before=$(count_refs "$bare_dir") + + # ── Snapshot current state (THE SAFETY NET) ────────── + log " snapshotting refs → refs/backup/${TIMESTAMP}/" + snapshot_refs "$bare_dir" "$TIMESTAMP" + audit "SNAPSHOT repo=${gh_owner}/${repo} refs_before=$refs_before" + + # ── Fetch from GitHub (ADDITIVE ONLY) ──────────────── + # We do NOT use '+' force-update prefix on heads. + # Instead we fetch into a staging namespace first, then + # safely merge forward. + log " fetching from GitHub …" + + # Fetch into staging area — does not touch our refs/heads + if ! retry 3 git fetch origin \ + '+refs/heads/*:refs/upstream-staging/heads/*' \ + '+refs/tags/*:refs/upstream-staging/tags/*' \ + '+refs/notes/*:refs/upstream-staging/notes/*' 2>>"$LOG_FILE"; then + warn "fetch failed for ${gh_owner}/${repo} — upstream may be down" + audit "FETCH_FAILED repo=${gh_owner}/${repo}" + # This is OK — upstream might be deleted. 
Our local copy is safe.
+        # Still push what we have to Gitea.
+        push_to_gitea "$bare_dir" "$gitea_org" "$repo"
+        return 0
+    fi
+
+    # ── Count upstream refs and check for wipe ───────────
+    # Count the same namespaces as count_refs (heads + tags + notes) so the comparison is like for like
+    local refs_upstream
+    refs_upstream=$(git for-each-ref --format='x' refs/upstream-staging/heads refs/upstream-staging/tags refs/upstream-staging/notes 2>/dev/null | wc -l)
+
+    if ! detect_wipe "$refs_before" "$refs_upstream" "${gh_owner}/${repo}"; then
+        log " !! PROTECTION ACTIVATED: refusing to update local refs"
+        log " !! Previous state preserved in refs/backup/${TIMESTAMP}/"
+        log " !! Upstream staging refs kept for manual inspection"
+        audit "WIPE_BLOCKED repo=${gh_owner}/${repo} upstream_refs=$refs_upstream"
+        notify "BREAKGLASS ALERT: Possible wipe detected for ${gh_owner}/${repo} — upstream went from $refs_before to $refs_upstream refs. Sync blocked, previous state preserved."
+        PROTECTED=$(( PROTECTED + 1 ))  # (( PROTECTED++ )) returns status 1 at 0 and would abort under 'set -e'
+        # Still push existing state to Gitea. push_to_gitea does not push
+        # refs/upstream-staging/*, so inspect the suspicious refs in the local bare clone.
+        push_to_gitea "$bare_dir" "$gitea_org" "$repo"
+        return 0
+    fi
+
+    # ── Safe-merge: update local refs from staging ───────
+    # For each upstream ref, fast-forward our local ref if possible.
+    # If upstream has force-pushed (not fast-forward), we keep BOTH:
+    # the old ref is already in refs/backup/, and we update the
+    # live ref to match upstream so the mirror stays current.
+    log " merging upstream state …"
+    git for-each-ref --format='%(refname) %(objectname)' refs/upstream-staging/ 2>/dev/null | \
+    while read -r staging_ref sha; do
+        # refs/upstream-staging/heads/main → refs/heads/main
+        local target_ref="${staging_ref/refs\/upstream-staging\//refs\/}"
+        local old_sha
+        # --verify --quiet prints nothing for a missing ref; plain rev-parse
+        # would echo the ref name back and make a new ref look like an update.
+        old_sha=$(git rev-parse --verify --quiet "$target_ref" 2>/dev/null || echo "")
+
+        if [[ "$old_sha" == "$sha" ]]; then
+            continue  # no change
+        fi
+
+        if [[ -z "$old_sha" ]]; then
+            # New ref — just create it
+            git update-ref "$target_ref" "$sha"
+            audit "REF_ADDED repo=${gh_owner}/${repo} ref=$target_ref sha=$sha"
+        else
+            # Existing ref changed — update it (old state is in backup)
+            git update-ref "$target_ref" "$sha"
+            audit "REF_UPDATED repo=${gh_owner}/${repo} ref=$target_ref old=$old_sha new=$sha"
+        fi
+    done
+
+    # ── NOTE: We NEVER delete local refs that upstream removed ──
+    # If upstream deleted a branch, our copy keeps it. That's the point.
+
+    # ── LFS objects ──────────────────────────────────────
+    if command -v git-lfs &>/dev/null && git config --get-regexp 'lfs\.' &>/dev/null; then
+        log " fetching LFS objects …"
+        git lfs fetch origin --all 2>>"$LOG_FILE" || warn "LFS fetch incomplete for ${gh_owner}/${repo}"
+    fi
+
+    # ── Push to Gitea ───────────────────────────────────
+    push_to_gitea "$bare_dir" "$gitea_org" "$repo"
+
+    local refs_after
+    refs_after=$(count_refs "$bare_dir")
+    log " ✓ done (refs: $refs_before → $refs_after)"
+    audit "SYNC_OK repo=${gh_owner}/${repo} refs_before=$refs_before refs_after=$refs_after"
+    return 0
+}
+
+push_to_gitea() {
+    local bare_dir="$1" gitea_org="$2" repo="$3"
+
+    ensure_gitea_repo "$gitea_org" "$repo" || return 1
+
+    log " pushing to Gitea …"
+
+    # Push all ref namespaces. We use '+' here because Gitea is OUR
+    # server — we trust ourselves. The append-only guarantee is in
+    # the local bare repo and the backup refs.
+    if ! retry 3 git -C "$bare_dir" push gitea \
+        '+refs/heads/*:refs/heads/*' \
+        '+refs/tags/*:refs/tags/*' \
+        '+refs/notes/*:refs/notes/*' \
+        '+refs/backup/*:refs/backup/*' 2>>"$LOG_FILE"; then
+        warn "push to Gitea failed for ${gitea_org}/${repo}"
+        audit "PUSH_FAILED repo=${gitea_org}/${repo}"
+        return 1
+    fi
+
+    # Push LFS to Gitea (read the LFS config from the bare repo, not the cwd)
+    if command -v git-lfs &>/dev/null && git -C "$bare_dir" config --get-regexp 'lfs\.' &>/dev/null; then
+        git -C "$bare_dir" lfs push gitea --all 2>>"$LOG_FILE" || \
+            warn "LFS push incomplete for ${gitea_org}/${repo}"
+    fi
+
+    audit "PUSHED repo=${gitea_org}/${repo}"
+}
+
+# ── Notifications ────────────────────────────────────────
+
+notify() {
+    local message="$1"
+    case "${NOTIFY_METHOD}" in
+        ntfy)
+            curl -sfS -d "$message" "${NTFY_SERVER:-https://ntfy.sh}/${NTFY_TOPIC:-}" &>/dev/null || true
+            ;;
+        email)
+            echo "$message" | mail -s "Breakglass Mirror Alert" "${NOTIFY_EMAIL:-}" 2>/dev/null || true
+            ;;
+        telegram)
+            # URL-encode the text so '&', '+' and non-ASCII characters survive the form post
+            curl -sfS -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN:-}/sendMessage" \
+                -d "chat_id=${TELEGRAM_CHAT_ID:-}" --data-urlencode "text=${message}" &>/dev/null || true
+            ;;
+        *) ;;
+    esac
+}
+
+# ── Glob matching ────────────────────────────────────────
+
+matches_glob() {
+    local name="$1" pattern="$2"
+    [[ "$pattern" == "*" ]] && return 0
+    local IFS='|'
+    for glob in $pattern; do
+        # shellcheck disable=SC2053
+        [[ "$name" == $glob ]] && return 0
+    done
+    return 1
+}
+
+# ═══════════════════════════════════════════════════════════
+# MAIN
+# ═══════════════════════════════════════════════════════════
+
+log "═══ Breakglass APPEND-ONLY sync started at $TIMESTAMP ═══"
+audit "SESSION_START timestamp=$TIMESTAMP config=$ENV_FILE"
+log "Config: $ENV_FILE"
+log "Sources: $SOURCES_FILE"
+log ""
+
+[[ -f "$SOURCES_FILE" ]] || die "sources file not found: $SOURCES_FILE"
+
+declare -a OWNERS
+mapfile -t OWNERS < <(parse_sources)
+[[ ${#OWNERS[@]} -eq 0 ]] && die "no owners found in $SOURCES_FILE"
+
+for entry in "${OWNERS[@]}"; do
+    read -r \
+        _ gh_owner gitea_org include_glob exclude_glob <<< "$entry"
+    log "── Owner: $gh_owner → gitea:$gitea_org ──"
+    ensure_gitea_org "$gitea_org"
+
+    repos=$(gh_list_repos "$gh_owner") || { (( ERRORS++ )); continue; }
+
+    while IFS= read -r repo; do
+        [[ -z "$repo" ]] && continue
+
+        if ! matches_glob "$repo" "${include_glob:-*}"; then
+            (( SKIPPED++ )); continue
+        fi
+        if [[ -n "$exclude_glob" ]] && matches_glob "$repo" "$exclude_glob"; then
+            (( SKIPPED++ )); continue
+        fi
+
+        if sync_repo "$gh_owner" "$repo" "$gitea_org"; then
+            (( SYNCED++ ))
+        else
+            (( ERRORS++ ))
+        fi
+    done <<< "$repos"
+done
+
+# ── Summary ──────────────────────────────────────────────
+
+SUMMARY="Breakglass sync: ${SYNCED} synced, ${SKIPPED} skipped, ${PROTECTED} wipe-protected, ${ERRORS} errors"
+log ""
+log "═══ $SUMMARY ═══"
+audit "SESSION_END $SUMMARY"
+
+if (( ERRORS > 0 || PROTECTED > 0 )); then
+    notify "$SUMMARY — check $LOG_FILE"
+fi
+
+# ── Log rotation: keep 90 days (audit logs kept forever) ─
+find "$LOG_DIR" -name 'sync-*.log' -mtime +90 -delete 2>/dev/null || true
+
+# ── Generate SHA256 of audit file for tamper evidence ────
+if [[ -f "$AUDIT_FILE" ]]; then
+    sha256sum "$AUDIT_FILE" >> "$AUDIT_DIR/checksums.log"
+fi
+
+exit $(( ERRORS > 0 ? 1 : 0 ))
diff --git a/systemd/breakglass-healthcheck.service b/systemd/breakglass-healthcheck.service
new file mode 100644
index 0000000..80c644c
--- /dev/null
+++ b/systemd/breakglass-healthcheck.service
@@ -0,0 +1,19 @@
+[Unit]
+Description=Breakglass FOSS Git Mirror — stale repo health check
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+User=breakglass
+Group=breakglass
+EnvironmentFile=/etc/breakglass/mirror.env
+ExecStart=/opt/breakglass/scripts/breakglass-healthcheck.sh
+
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/var/log/breakglass
+PrivateTmp=true
+
+TimeoutStartSec=300
diff --git a/systemd/breakglass-healthcheck.timer b/systemd/breakglass-healthcheck.timer
new file mode 100644
index 0000000..ebe532a
--- /dev/null
+++ b/systemd/breakglass-healthcheck.timer
@@ -0,0 +1,12 @@
+[Unit]
+Description=Breakglass FOSS Git Mirror — daily health check
+Documentation=https://git.mineracks.com/mineracks/foss_breakglass_git_mirror
+
+[Timer]
+# Run daily at 08:00 UTC — a few hours after sync, so stale = genuinely broken
+OnCalendar=*-*-* 08:00:00
+RandomizedDelaySec=600
+Persistent=true
+
+[Install]
+WantedBy=timers.target
diff --git a/systemd/breakglass-sync.service b/systemd/breakglass-sync.service
new file mode 100644
index 0000000..9f08306
--- /dev/null
+++ b/systemd/breakglass-sync.service
@@ -0,0 +1,32 @@
+[Unit]
+Description=Breakglass FOSS Git Mirror — sync run
+Documentation=https://git.mineracks.com/mineracks/foss_breakglass_git_mirror
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+User=breakglass
+Group=breakglass
+EnvironmentFile=/etc/breakglass/mirror.env
+ExecStart=/opt/breakglass/scripts/breakglass-sync.sh
+
+# Hardening
+NoNewPrivileges=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/var/lib/breakglass /var/log/breakglass
+# IMPORTANT: The service cannot delete from /var/lib/breakglass/audit
+# due to the append-only filesystem attribute set during install.
+PrivateTmp=true
+PrivateDevices=true
+
+# Resource limits — be a good neighbour
+MemoryMax=1G
+CPUQuota=80%
+
+# Allow long runs (large repos can take a while)
+TimeoutStartSec=3600
+
+[Install]
+WantedBy=multi-user.target
diff --git a/systemd/breakglass-sync.timer b/systemd/breakglass-sync.timer
new file mode 100644
index 0000000..580eb20
--- /dev/null
+++ b/systemd/breakglass-sync.timer
@@ -0,0 +1,14 @@
+[Unit]
+Description=Breakglass FOSS Git Mirror — daily sync
+Documentation=https://git.mineracks.com/mineracks/foss_breakglass_git_mirror
+
+[Timer]
+# Run daily at 04:00 UTC (avoids peak GitHub traffic)
+OnCalendar=*-*-* 04:00:00
+# Randomise start by up to 30 min to avoid thundering herd
+RandomizedDelaySec=1800
+# If the machine was off at trigger time, run on next boot
+Persistent=true
+
+[Install]
+WantedBy=timers.target