feat: add repo metadata sync, wiki/release mirroring, disk alerts, and updated README
parent f6b3bde95d
commit 8f1add8346
129
README.md
@@ -6,12 +6,12 @@ Append-only, tamper-resistant mirroring of GitHub repositories to a self-hosted

This tool is built for the scenario where upstream repos you depend on are deliberately destroyed — whether by a compromised maintainer, a platform takedown, account suspension, or coerced force-push. Specifically:

- **Upstream force-pushes empty history** → Your copy keeps all previous commits, branches, and tags via timestamped backup refs. The wipe is detected and blocked.
- **Upstream deletes branches or tags** → Your copy retains them. The sync script never deletes local refs.
- **Upstream repo is deleted entirely** → Fetch fails gracefully; your existing local copy and Gitea copy are untouched.
- **GitHub account is banned/suspended** → Same as deletion — your copies persist.
- **DMCA takedown** → Your pre-takedown copy is preserved.
- **Subtle history rewrite** (less than 50% of refs removed) → Still captured in backup refs, and the live refs are updated so you can diff the before and after.
- **Upstream force-pushes empty history** — Your copy keeps all previous commits, branches, and tags via timestamped backup refs. The wipe is detected and blocked.
- **Upstream deletes branches or tags** — Your copy retains them. The sync script never deletes local refs.
- **Upstream repo is deleted entirely** — Fetch fails gracefully; your existing local copy and Gitea copy are untouched.
- **GitHub account is banned/suspended** — Same as deletion — your copies persist.
- **DMCA takedown** — Your pre-takedown copy is preserved.
- **Subtle history rewrite** (less than 50% of refs removed) — Still captured in backup refs, and the live refs are updated so you can diff the before and after.
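The 50% boundary between a "wipe" and a "subtle rewrite" can be pictured as a percentage comparison of ref counts. The following is a minimal sketch, not the sync script's actual code: the function name `wipe_blocked` and its arguments are hypothetical, and it assumes the threshold means "block when upstream loses more than N% of refs".

```shell
# Hypothetical helper: exit status 0 means "block this sync".
#   $1 = ref count in the local copy before the fetch
#   $2 = ref count upstream reports now
#   $3 = threshold in percent (assumed meaning: block if loss exceeds it)
wipe_blocked() {
    local before="$1" after="$2" threshold="${3:-50}"
    # Nothing to lose, or no refs lost: never block.
    if [ "$before" -le 0 ] || [ "$after" -ge "$before" ]; then
        return 1
    fi
    # Integer percentage of refs that disappeared upstream.
    local lost_pct=$(( ($before - $after) * 100 / $before ))
    [ "$lost_pct" -gt "$threshold" ]
}

if wipe_blocked 200 10 50; then
    echo "sync blocked: upstream lost more than 50% of its refs"
fi
```

Under these assumptions a loss of exactly 50% is not blocked; it falls into the "subtle rewrite" case, where backup refs still capture the previous state.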

## How it works

@@ -46,14 +46,51 @@ Every sync writes a structured audit log recording exactly what happened: which

The audit directory is set with the `+a` (append-only) filesystem attribute during install, so even the breakglass service user can't delete or modify previous audit entries.

## Features

### Organisation and repo metadata sync

On first sync, the script automatically creates Gitea organisations matching each GitHub owner and syncs their avatars. Repository metadata — including default branch, description, and homepage URL — is read from the GitHub API and applied to the Gitea repo. Descriptions are prefixed with `[BREAKGLASS]` so it's always clear which repos are mirrors.

This runs once per repo (tracked by marker files) and prevents Gitea from showing 500 errors due to default branch mismatches.

### Wiki mirroring

When `SYNC_WIKIS=true` (the default), the script checks whether each GitHub repo has an associated wiki. If one exists, it clones the wiki as a separate bare repo and pushes it to the matching Gitea repo's wiki. This preserves project documentation alongside the code.

Wiki repos use the standard `.wiki.git` suffix and are pushed via HTTP with credential-store authentication.
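To illustrate the `.wiki.git` convention, here is a small sketch of how the source and destination wiki URLs could be derived. The helper names `wiki_remote` and `wiki_exists` are hypothetical and not taken from the sync script; the existence probe assumes that `git ls-remote` failing means there is no wiki to mirror.

```shell
# Hypothetical helper: build the wiki remote for a repo on any forge
# that follows the "<repo>.wiki.git" convention (GitHub and Gitea both do).
#   $1 = base URL, $2 = owner, $3 = repo name (with or without ".git")
wiki_remote() {
    local base="${1%/}" owner="$2" repo="${3%.git}"
    printf '%s/%s/%s.wiki.git\n' "$base" "$owner" "$repo"
}

# Hypothetical probe: succeeds only if the wiki remote answers with refs.
# GIT_TERMINAL_PROMPT=0 stops git from asking for credentials interactively.
wiki_exists() {
    GIT_TERMINAL_PROMPT=0 git ls-remote --exit-code "$1" >/dev/null 2>&1
}

wiki_remote "https://github.com" seedsigner seedsigner
```

A caller would typically run `wiki_exists "$(wiki_remote ...)"` first and skip the clone when it fails, so repos without a wiki do not count as sync errors.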

### Release asset downloads

When `SYNC_RELEASES=true`, the script downloads release assets (binaries, source archives, installers) for the latest N releases per repo (configured by `RELEASE_KEEP`, default 3). Assets are stored locally under `RELEASE_ROOT` and uploaded to Gitea as proper releases via the API, preserving the tag name, release title, and body text.

This ensures that even if GitHub removes download links, you have local copies of the actual release binaries people need to verify and install software.

### LFS support with timeouts

Repos using Git LFS are handled automatically. LFS objects are fetched and pushed alongside regular git objects. To prevent massive LFS repos (like seedsigner/buildroot) from blocking the entire sync indefinitely, each LFS operation is wrapped in a configurable timeout (`LFS_TIMEOUT`, default 600 seconds). If a timeout is hit, the sync continues with remaining repos rather than stalling.
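The timeout wrapping can be sketched with the coreutils `timeout` command. This is an illustration under assumptions, not the script's own code: the wrapper name `run_with_timeout` is hypothetical, and treating `LFS_TIMEOUT=0` as "no limit" follows the description in the configuration table.

```shell
# Hypothetical wrapper: run a command under LFS_TIMEOUT seconds.
# A value of 0 is treated as "no limit" (assumption based on the README table).
run_with_timeout() {
    local limit="${LFS_TIMEOUT:-600}"
    if [ "$limit" -gt 0 ]; then
        # GNU timeout exits with status 124 when the limit is hit.
        timeout "$limit" "$@"
    else
        "$@"
    fi
}

# Demonstration with a stand-in for a slow LFS transfer.
LFS_TIMEOUT=1
if ! run_with_timeout sleep 3; then
    echo "LFS step timed out or failed; continuing with the next repo"
fi
```

In a real sync loop the wrapped command would be something like `git lfs fetch --all origin`, and a non-zero status would be logged as a warning rather than aborting the run.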

### Push notifications

The sync script and health check both send push notifications for significant events. Supported backends are ntfy (recommended — free, no server needed, push to phone), email, and Telegram. Notifications include priority levels and tags:

- **Urgent** — wipe detection triggered, sync blocked
- **High** — errors during sync, healthcheck failures
- **Default** — sync completed successfully, new repos mirrored
- **Low** — routine status updates
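As a rough illustration of the priority levels above, the sketch below maps event names to ntfy priorities and publishes with `curl`. The event names and the functions `priority_for` and `ntfy_send` are hypothetical; only the `Title` and `Priority` headers and the `NTFY_SERVER` / `NTFY_TOPIC` settings are taken from ntfy and this README.

```shell
# Hypothetical mapping from an event name to an ntfy priority.
priority_for() {
    case "$1" in
        wipe_detected)                   echo urgent ;;
        sync_error|healthcheck_failed)   echo high ;;
        sync_ok|new_repo)                echo default ;;
        *)                               echo low ;;
    esac
}

# Hypothetical sender: publishes a message to the configured ntfy topic.
#   $1 = event name, $2 = title, $3 = message body
ntfy_send() {
    local event="$1" title="$2" message="$3"
    curl -fsS \
        -H "Title: ${title}" \
        -H "Priority: $(priority_for "$event")" \
        -d "${message}" \
        "${NTFY_SERVER:-https://ntfy.sh}/${NTFY_TOPIC:-breakglass}" >/dev/null
}

# Example call (commented out so the sketch does not touch the network):
# ntfy_send wipe_detected "Breakglass" "Wipe detected, sync blocked"
priority_for wipe_detected
```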

### Disk space monitoring

The health check includes disk usage monitoring. It warns at 80% usage and sends a critical alert at 90%, giving you time to expand storage or prune release assets before the mirror runs out of space.
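The two thresholds can be sketched as a simple classification of the used-space percentage. This is not the health check's actual code: the function names are hypothetical, and whether the real check fires at exactly 80% and 90% or only above them is an assumption.

```shell
# Hypothetical helper: print the used-space percentage for the filesystem
# holding the given path, using POSIX-format df output (5th column).
disk_used_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical classifier: map a percentage to ok / warning / critical.
#   $1 = used percent, $2 = warning threshold, $3 = critical threshold
disk_level() {
    local used="$1" warn_at="${2:-80}" crit_at="${3:-90}"
    if [ "$used" -ge "$crit_at" ]; then
        echo critical
    elif [ "$used" -ge "$warn_at" ]; then
        echo warning
    else
        echo ok
    fi
}

disk_level "$(disk_used_pct /)"
```

Pointing `disk_used_pct` at `/var/lib/breakglass` instead of `/` would measure the filesystem that actually holds the mirrors, which matters when it lives on a separate volume.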

## Quick start

### Prerequisites

- Fresh Ubuntu 22.04+ VM (2 GB RAM, 20+ GB disk)
- Your Gitea instance accessible via HTTPS
- Your Gitea instance accessible via HTTP/HTTPS
- Gitea personal access token (repo read/write scope)
- Optional: GitHub token for higher API rate limits
- Optional: GitHub token for higher API rate limits (60 → 5000 req/h)

### Install

@@ -72,18 +109,24 @@ The installer handles everything interactively: packages, user creation, config,
/etc/breakglass/mirror.env # tokens, URLs, settings (mode 600)
/etc/breakglass/sources.yml # GitHub owners to mirror
/var/lib/breakglass/repos/ # bare git clones
/var/lib/breakglass/releases/ # downloaded release assets
/var/lib/breakglass/audit/ # tamper-evident audit logs (+a attr)
/var/log/breakglass/ # sync logs (90-day rotation)
```

Systemd timers:
- `breakglass-sync.timer` — daily at 04:00 UTC (with 30min random jitter)
- `breakglass-healthcheck.timer` — daily at 08:00 UTC

- `breakglass-sync.timer` — daily at 02:00 local time, with `Persistent=true` so missed runs fire on next boot
- `breakglass-healthcheck.timer` — daily at 08:00 local time

Both services use `Restart=on-failure` with a 5-minute backoff and an 8-hour timeout to handle large initial syncs.

## Configuration

### sources.yml

Define which GitHub owners to mirror. You can mirror entire organisations or filter to specific repos:

```yaml
owners:
  - github: bitcoin
@@ -91,26 +134,37 @@ owners:
  - github: seedsigner
  - github: seedhammer

  # With filters:
  - github: some-large-org
  # Mirror only specific repos from an org:
  - github: cmyk
    include:
      - "important-repo"
      - "seedetcher"

  # Exclude repos by pattern:
  - github: some-large-org
    exclude:
      - "test-*"
      - "deprecated-*"
```
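The `include` and `exclude` entries can be read as shell-style patterns matched against repo names. The sketch below shows one way such matching could work; the function name `repo_matches_any` is hypothetical, and glob semantics for the patterns are an assumption based on the `test-*` example rather than something confirmed by the script.

```shell
# Hypothetical matcher: succeed if the repo name matches any given pattern.
#   $1 = repo name, remaining args = shell glob patterns (quote them!)
repo_matches_any() {
    local name="$1" pat
    shift
    for pat in "$@"; do
        # An unquoted pattern inside `case` is matched as a glob,
        # without being expanded against files in the current directory.
        case "$name" in
            $pat) return 0 ;;
        esac
    done
    return 1
}

for repo in core test-vectors deprecated-api; do
    if repo_matches_any "$repo" "test-*" "deprecated-*"; then
        echo "skip   $repo"
    else
        echo "mirror $repo"
    fi
done
```

The same function covers an `include` list: a repo is mirrored only when `repo_matches_any "$repo" "seedetcher"` succeeds.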

### mirror.env

Key settings:

| Variable | Purpose | Default |
|----------|---------|---------|
| `GITEA_URL` | Your Gitea instance URL | — |
| `GITEA_TOKEN` | Gitea API token | — |
| `GITHUB_TOKEN` | GitHub token (optional) | — |
| `WIPE_THRESHOLD` | Block sync if upstream loses >N% of refs | 50 |
| `NOTIFY_METHOD` | `ntfy`, `email`, `telegram`, or `none` | none |
| `STALE_DAYS` | Alert if a repo hasn't synced in N days | 7 |
| `GITEA_USER` | Gitea username for push auth | — |
| `GITHUB_TOKEN` | GitHub token (optional, raises rate limit) | — |
| `WIPE_THRESHOLD` | Block sync if upstream loses >N% of refs | `50` |
| `NOTIFY_METHOD` | `ntfy`, `email`, `telegram`, or `none` | `none` |
| `NTFY_TOPIC` | ntfy topic name (make it unguessable) | `breakglass` |
| `NTFY_SERVER` | ntfy server URL | `https://ntfy.sh` |
| `STALE_DAYS` | Alert if a repo hasn't synced in N days | `7` |
| `LFS_TIMEOUT` | Max seconds per LFS fetch/push (0 = no limit) | `600` |
| `SYNC_WIKIS` | Mirror GitHub wikis to Gitea | `true` |
| `SYNC_RELEASES` | Download and mirror release assets | `true` |
| `RELEASE_KEEP` | How many releases to keep per repo | `3` |
| `RELEASE_ROOT` | Where to store downloaded release assets | `/var/lib/breakglass/releases` |
| `FORCE_HTTP11` | Force HTTP/1.1 (helps with Cloudflare Tunnel) | `true` |

## Day-to-day commands

@@ -121,8 +175,14 @@ sudo systemctl status breakglass-sync.timer
# Trigger immediate sync
sudo systemctl start breakglass-sync.service

# Watch sync live
sudo journalctl -u breakglass-sync.service -f
# Run sync in foreground (useful for debugging)
sudo -u breakglass /opt/breakglass/scripts/breakglass-sync.sh

# Run sync detached from your SSH session (won't die if you disconnect)
sudo -u breakglass nohup /opt/breakglass/scripts/breakglass-sync.sh &

# Watch sync logs live
tail -f /var/log/breakglass/sync-$(date +%Y%m%d)*.log

# Run health check
sudo systemctl start breakglass-healthcheck.service
@@ -133,30 +193,41 @@ ls -lt /var/lib/breakglass/audit/
# View recent sync logs
ls -lt /var/log/breakglass/ | head

# Check disk usage
du -sh /var/lib/breakglass/repos/ /var/lib/breakglass/releases/

# Re-sync metadata for all repos (e.g., after fixing a bug)
sudo rm -f /var/lib/breakglass/repos/.avatars/*.meta.synced
sudo systemctl start breakglass-sync.service

# Add a new GitHub org
sudo nano /etc/breakglass/sources.yml
```

## Health checks

The healthcheck script (runs daily) verifies:
The healthcheck script (runs daily at 08:00) verifies:

1. Gitea is reachable
2. Sync timer is active
2. Sync timer is active and enabled
3. Recent sync logs exist
4. No repos have gone stale
4. No repos have gone stale (configurable threshold)
5. Backup refs exist in all repos (append-only is working)
6. Audit log checksums haven't been tampered with
7. Local ref counts haven't decreased (local deletion detection)
8. Disk usage is below warning (80%) and critical (90%) thresholds

Results are sent as a push notification with appropriate priority levels.

## What this does NOT protect against

To be transparent about limitations:

- **VM compromise**: If an attacker gets root on your mirror VM, they can delete everything. Mitigate with VM-level snapshots, ZFS snapshots, or offsite backups of `/var/lib/breakglass/repos/`.
- **Gitea compromise**: If someone gets admin on your Gitea, they could delete repos there. The bare clones on the VM are the primary archive; Gitea is a secondary copy and convenient browsing interface.
- **Disk failure**: Standard hardware risk. Use RAID or VM-level redundancy.
- **Repos you don't know about yet**: This only mirrors repos from the owners you've configured. If a new critical repo appears, you need to add the owner to sources.yml.
- **VM compromise** — If an attacker gets root on your mirror VM, they can delete everything. Mitigate with VM-level snapshots, ZFS snapshots, or offsite backups of `/var/lib/breakglass/repos/`.
- **Gitea compromise** — If someone gets admin on your Gitea, they could delete repos there. The bare clones on the VM are the primary archive; Gitea is a secondary copy and convenient browsing interface.
- **Disk failure** — Standard hardware risk. Use RAID or VM-level redundancy.
- **Repos you don't know about yet** — This only mirrors repos from the owners you've configured. If a new critical repo appears, you need to add the owner to `sources.yml`.
- **GitHub API rate limits** — Without a `GITHUB_TOKEN`, you're limited to 60 requests/hour. Large orgs with many repos will hit this. A token raises the limit to 5000/hour.

For maximum paranoia, consider also running periodic `tar` backups of `/var/lib/breakglass/repos/` to an offsite location (S3, another server, external drive).

@@ -164,7 +235,7 @@ For maximum paranoia, consider also running periodic `tar` backups of `/var/lib/

The original ran inside Umbrel's managed Docker environment. Umbrel silently recycled containers and broke the automation after a few weeks. This version runs on a plain Ubuntu VM where nothing can interfere with the systemd timers or filesystem.

Key improvements: wipe detection, audit trail, filesystem-level append-only on audit dir, staging namespace for safe fetch, health monitoring with notifications, and no `--mirror` flag (which enables destructive pruning).
Key improvements over v1: wipe detection with configurable threshold, tamper-evident audit trail with checksums, filesystem-level append-only on audit directory, staging namespace for safe fetch, wiki and release mirroring, LFS support with timeouts, org/repo avatar and metadata sync, 8-point health monitoring with disk space alerts, push notifications via ntfy/email/Telegram, systemd timers with persistence and failure restart, and no `--mirror` flag (which enables destructive pruning).

## License

@@ -745,9 +745,62 @@ push_to_gitea() {
        warn "LFS push skipped/incomplete for ${gitea_org}/${repo} (may have timed out)"
    fi

    # ── Sync default branch and description from GitHub ───
    sync_repo_metadata "$gitea_org" "$repo"

    audit "PUSHED repo=${gitea_org}/${repo}"
}

# ═══════════════════════════════════════════════════════════
# METADATA: Sync default branch, description, website from GitHub
# ═══════════════════════════════════════════════════════════

sync_repo_metadata() {
    local gitea_org="$1" repo="$2"
    local marker="${MIRROR_ROOT}/.avatars/${gitea_org}_${repo}.meta.synced"

    # Only sync metadata once per repo (delete marker to force re-sync)
    [[ -f "$marker" ]] && return 0

    # Get GitHub repo info
    local gh_data
    gh_data=$(gh_api "/repos/${gitea_org}/${repo}" 2>/dev/null) || return 0

    # Extract default branch
    local default_branch
    default_branch=$(echo "$gh_data" | grep -o '"default_branch"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/"default_branch"[[:space:]]*:[[:space:]]*"//;s/"//')

    # Extract description
    local description
    description=$(echo "$gh_data" | grep -o '"description"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 \
        | sed 's/"description"[[:space:]]*:[[:space:]]*"//;s/"//')

    # Extract homepage
    local homepage
    homepage=$(echo "$gh_data" | grep -o '"homepage"[[:space:]]*:[[:space:]]*"[^"]*"' \
        | sed 's/"homepage"[[:space:]]*:[[:space:]]*"//;s/"//')

    if [[ -n "$default_branch" ]]; then
        # Build the update payload
        local payload="{\"default_branch\":\"${default_branch}\""
        if [[ -n "$description" ]]; then
            # Escape special JSON chars in description
            description=$(echo "$description" | sed 's/\\/\\\\/g;s/"/\\"/g')
            payload+=",\"description\":\"[BREAKGLASS] ${description}\""
        fi
        if [[ -n "$homepage" ]]; then
            payload+=",\"website\":\"${homepage}\""
        fi
        payload+="}"

        gitea_api PATCH "/repos/${gitea_org}/${repo}" -d "$payload" &>/dev/null || true
        log " metadata synced (default_branch: $default_branch)"
    fi

    touch "$marker"
}

# ── Notifications ────────────────────────────────────────

notify() {