ci: skip warm discord backup imports

Peter Steinberger 2026-05-03 19:41:40 +01:00
parent 86502b251c
commit 68b49c90a5
3 changed files with 8 additions and 2 deletions

@@ -76,7 +76,12 @@ jobs:
 git clone "$BACKUP_REMOTE" "$BACKUP_REPO"
 go run ./cmd/discrawl --config "$CONFIG" init --db "$DB" --guild "$DISCRAWL_GUILD_ID"
 if [ -f "$BACKUP_REPO/manifest.json" ]; then
-go run ./cmd/discrawl --config "$CONFIG" update --repo "$BACKUP_REPO" --remote "$BACKUP_REMOTE"
+if [ -s "$DB" ]; then
+echo "Restored Discord DB cache at $DB; skipping pre-sync snapshot import."
+else
+echo "Discord DB cache missing; importing latest published snapshot before latest-only sync."
+go run ./cmd/discrawl --config "$CONFIG" update --repo "$BACKUP_REPO" --remote "$BACKUP_REMOTE"
+fi
 fi
 go run ./cmd/discrawl --config "$CONFIG" sync --guild "$DISCRAWL_GUILD_ID" --skip-members --latest-only
 git -C "$BACKUP_REPO" pull --ff-only origin main

@@ -6,6 +6,7 @@ All notable changes to `discrawl` will be documented in this file.
 ### Fixes
+- Scheduled Discord backup publishing now skips redundant pre-sync snapshot imports when the workflow DB cache is warm, keeping fresh Git snapshots from getting delayed by a full archive reimport.
 - `discrawl sync` now keeps Git snapshot refreshes explicit by default; use `--update=auto` or `--update=force` when you want a sync run to pull/import the shared snapshot before live Discord or desktop-cache deltas.
 - Snapshot imports now emit phase/table/file progress and keep the sync lock file updated with the active phase, making long update/import runs diagnosable instead of looking hung.
 - Recent-message scans are backed by a plain `messages(created_at, id)` index so archive freshness and short-window analysis queries avoid full-table scans.

@@ -493,7 +493,7 @@ discrawl report --readme path/to/discord-backup/README.md
 Every scheduled snapshot publish updates deterministic README stats: latest update time, latest archived message, archive totals, and day/week/month activity.
-The backup workflows restore and save `.discrawl-ci/discrawl.db` with `actions/cache`. On a warm runner cache, `discrawl update` compares the cached DB's last imported snapshot timestamp with `manifest.json` and skips the full sharded import when they match. Cache misses and newer backup manifests still take the normal pull/import path.
+The backup workflows restore and save `.discrawl-ci/discrawl.db` with `actions/cache`. On a warm runner cache, scheduled publishers skip the pre-sync snapshot import and go straight to the live latest-message delta before publishing. Cache misses still import the latest published snapshot first so `--latest-only` has channel cursors to resume from.
 ### `digest`
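The warm-cache rule described in the README change, and implemented by the workflow hunk, can be isolated into a small shell function to make the branching explicit. This is a minimal sketch, not the workflow's code: `plan_pre_sync` is a hypothetical helper that prints the steps it would take instead of invoking `discrawl`, and the demo paths are throwaway placeholders. It relies on `test -s`, which is true only for a file that exists and is larger than zero bytes, so an empty DB file counts as a cold cache exactly like a missing one.

```sh
#!/usr/bin/env bash
# Minimal sketch of the scheduled workflow's pre-sync decision.
# plan_pre_sync is a hypothetical helper: it prints the steps the
# workflow would take instead of running discrawl.
set -eu

# Usage: plan_pre_sync DB_PATH MANIFEST_PATH
plan_pre_sync() {
  local db="$1" manifest="$2"
  # A snapshot can only be imported when a published manifest exists.
  if [ -f "$manifest" ]; then
    if [ -s "$db" ]; then
      # -s is true only for an existing file larger than zero bytes: warm cache.
      echo "skip-import"
    else
      # Missing or empty DB: cold cache. Import the latest snapshot first so
      # --latest-only has channel cursors to resume from.
      echo "update"
    fi
  fi
  # The live latest-message delta runs in every case.
  echo "sync --latest-only"
}

# Demo against a throwaway directory (paths are placeholders).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
touch "$demo/manifest.json"
: > "$demo/discrawl.db"              # empty file: treated as a cold cache
plan_pre_sync "$demo/discrawl.db" "$demo/manifest.json"
# prints "update", then "sync --latest-only"
echo "rows" > "$demo/discrawl.db"    # non-empty file: warm cache
plan_pre_sync "$demo/discrawl.db" "$demo/manifest.json"
# prints "skip-import", then "sync --latest-only"
```

Keeping the manifest check as the outer condition mirrors the workflow: with no published snapshot there is nothing to import, so only the latest-only sync runs.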