telecrawl/README.md
2026-05-08 17:13:41 +01:00

4.1 KiB

telecrawl

Telegram Desktop archive CLI.

telecrawl reads your local Telegram Desktop tdata through opentele2 and Telethon, stores a searchable SQLite archive in ~/.telecrawl/telecrawl.db, and can back it up to GitHub as encrypted age shards.

It is local-first:

  • Normal archive/search commands do not upload data.
  • backup push uploads only age-encrypted shards when you run it explicitly.
  • Telegram message text, chat names, sender names, and media metadata stay inside encrypted backup payloads.

Install

brew tap steipete/tap
brew install telecrawl

Or install with Go:

go install github.com/openclaw/telecrawl/cmd/telecrawl@latest

Setup

Install the Python bridge used for Telegram Desktop tdata imports:

telecrawl deps install

This creates ~/.telecrawl/venv and installs opentele2 plus Telethon.

Import

telecrawl doctor
telecrawl import
telecrawl status

Import defaults to:

  • latest 200 dialogs
  • latest 500 messages per dialog

Use 0 for no limit:

telecrawl import --dialogs-limit 0 --messages-limit 0

Useful reads:

telecrawl chats --limit 20
telecrawl chats --unread
telecrawl messages --limit 20
telecrawl messages --chat CHAT_ID --after 2026-01-01
telecrawl search "query"
telecrawl search "query" --chat CHAT_ID

Add --json before the command for machine-readable output:

telecrawl --json status
telecrawl --json search "invoice"

Data Paths

Defaults:

  • Telegram Desktop source: ~/Library/Application Support/Telegram Desktop/tdata
  • archive DB: ~/.telecrawl/telecrawl.db
  • Python bridge venv: ~/.telecrawl/venv
  • Telethon sessions: ~/.telecrawl/sessions/
  • backup config: ~/.telecrawl/backup.json
  • age identity: ~/.telecrawl/age.key
  • backup checkout: ~/Projects/backup-telecrawl

Override the archive DB:

telecrawl --db /tmp/telecrawl.db status

Override the Telegram Desktop source:

telecrawl --source "/path/to/tdata" doctor
telecrawl --source "/path/to/tdata" import

Backup

Create https://github.com/steipete/backup-telecrawl first, then initialize:

telecrawl backup init
telecrawl backup push

The default backup config points at:

{
  "repo": "~/Projects/backup-telecrawl",
  "remote": "https://github.com/steipete/backup-telecrawl.git",
  "identity": "~/.telecrawl/age.key"
}

Use a different repository or config path:

telecrawl backup init \
  --config ~/.telecrawl/backup.json \
  --repo ~/Projects/backup-telecrawl \
  --remote https://github.com/steipete/backup-telecrawl.git

Inspect backup metadata:

telecrawl backup status

Restore into the current archive DB:

telecrawl backup pull
telecrawl status

Restore into a throwaway DB for validation:

telecrawl --db /tmp/telecrawl-restore-test.db backup pull
telecrawl --db /tmp/telecrawl-restore-test.db status

Backup Security Model

Backup shards are JSONL, gzip-compressed with deterministic gzip metadata, and encrypted with age before Git sees them.

Git can still see cleartext metadata:

  • export time
  • public age recipients
  • table names
  • row counts
  • shard paths
  • encrypted byte sizes
  • plaintext shard hashes
  • backup cadence and which encrypted shards changed

Git cannot read message text, chat names, sender names, or media metadata without an age identity.

Keep ~/.telecrawl/age.key private. If you lose it and no other recipient can decrypt the backup, the encrypted backup cannot be restored.

Multi-Machine Backups

On another machine:

telecrawl backup init --no-push
cat ~/.telecrawl/backup.json

Copy that machine's public recipient into the first machine's ~/.telecrawl/backup.json, then re-encrypt current shards:

telecrawl backup push

The private AGE-SECRET-KEY-... identity must not be committed or shared.

Reset

Remove local state:

rm -rf ~/.telecrawl

Remove only the archive:

rm -f ~/.telecrawl/telecrawl.db ~/.telecrawl/telecrawl.db-*

Do not delete ~/.telecrawl/age.key unless you have another working backup recipient or you no longer need to restore existing encrypted backups.