diff --git a/.github/workflows/publish-discord-backup.yml b/.github/workflows/publish-discord-backup.yml
index 449be92..8f355b0 100644
--- a/.github/workflows/publish-discord-backup.yml
+++ b/.github/workflows/publish-discord-backup.yml
@@ -76,7 +76,12 @@ jobs:
 git clone "$BACKUP_REMOTE" "$BACKUP_REPO"
 go run ./cmd/discrawl --config "$CONFIG" init --db "$DB" --guild "$DISCRAWL_GUILD_ID"
 if [ -f "$BACKUP_REPO/manifest.json" ]; then
-  go run ./cmd/discrawl --config "$CONFIG" update --repo "$BACKUP_REPO" --remote "$BACKUP_REMOTE"
+  if [ -s "$DB" ]; then
+    echo "Restored Discord DB cache at $DB; skipping pre-sync snapshot import."
+  else
+    echo "Discord DB cache missing; importing latest published snapshot before latest-only sync."
+    go run ./cmd/discrawl --config "$CONFIG" update --repo "$BACKUP_REPO" --remote "$BACKUP_REMOTE"
+  fi
 fi
 go run ./cmd/discrawl --config "$CONFIG" sync --guild "$DISCRAWL_GUILD_ID" --skip-members --latest-only
 git -C "$BACKUP_REPO" pull --ff-only origin main
diff --git a/CHANGELOG.md b/CHANGELOG.md
index a294b37..45c607b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,7 @@ All notable changes to `discrawl` will be documented in this file.
 
 ### Fixes
 
+- Scheduled Discord backup publishing now skips redundant pre-sync snapshot imports when the workflow DB cache is warm, keeping fresh Git snapshots from getting delayed by a full archive reimport.
 - `discrawl sync` now keeps Git snapshot refreshes explicit by default; use `--update=auto` or `--update=force` when you want a sync run to pull/import the shared snapshot before live Discord or desktop-cache deltas.
 - Snapshot imports now emit phase/table/file progress and keep the sync lock file updated with the active phase, making long update/import runs diagnosable instead of looking hung.
 - Recent-message scans are backed by a plain `messages(created_at, id)` index so archive freshness and short-window analysis queries avoid full-table scans.
diff --git a/README.md b/README.md
index 657ab8d..58bc1b8 100644
--- a/README.md
+++ b/README.md
@@ -493,7 +493,7 @@ discrawl report --readme path/to/discord-backup/README.md
 
 Every scheduled snapshot publish updates deterministic README stats: latest update time, latest archived message, archive totals, and day/week/month activity.
 
-The backup workflows restore and save `.discrawl-ci/discrawl.db` with `actions/cache`. On a warm runner cache, `discrawl update` compares the cached DB's last imported snapshot timestamp with `manifest.json` and skips the full sharded import when they match. Cache misses and newer backup manifests still take the normal pull/import path.
+The backup workflows restore and save `.discrawl-ci/discrawl.db` with `actions/cache`. On a warm runner cache, scheduled publishers skip the pre-sync snapshot import and go straight to the live latest-message delta before publishing. Cache misses still import the latest published snapshot first so `--latest-only` has channel cursors to resume from.
 
 ### `digest`
 
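
Review note: the branch selection added in the workflow hunk can be checked locally without running `discrawl`. The sketch below is only an illustration; `decide_presync` is a hypothetical helper that is not part of the workflow or the CLI, and it prints which path would be taken instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): mirrors the workflow's
# branch selection and prints the chosen path instead of running discrawl.
# Usage: decide_presync <backup_repo_dir> <db_path>
decide_presync() {
  backup_repo=$1
  db=$2
  if [ -f "$backup_repo/manifest.json" ]; then
    if [ -s "$db" ]; then
      # DB file exists and is non-empty: treated as a warm cache.
      echo "skip-import"
    else
      # DB file is missing or zero bytes: import the published snapshot first.
      echo "import-snapshot"
    fi
  else
    # No manifest in the backup repo: the update step is never reached.
    echo "no-manifest"
  fi
}
```

Because `[ -s ]` tests only that the file exists and is non-empty, the warm-cache branch relies on no earlier command in the step having created `$DB` on a cold cache. That is worth confirming for the `init --db "$DB"` call that runs just before the check.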