[BREAKGLASS] Append-only mirror of github.com/openclaw/notcrawl


📰 notcrawl

notcrawl mirrors Notion workspace data into local SQLite and normalized Markdown so you can search, query, diff, and share your Notion memory without depending on the Notion UI.

It has two ingestion paths:

  • desktop: read-only snapshots of the local Notion desktop cache
  • api: official Notion API sync with rate-limit-aware crawling

SQLite is the canonical archive. Markdown is the durable human/agent surface. Git share mode publishes normalized snapshots that other machines can subscribe to without holding Notion credentials.

Current Scope

  • local SQLite storage with FTS5
  • read-only ingestion of the local Notion desktop app cache on macOS
  • official API page/block/user/comment ingestion
  • normalized Markdown export organized by space and page path
  • compressed JSONL git-share snapshots plus import/update workflows
  • read-only SQL access for ad hoc inspection

Quick Start

go build -o bin/notcrawl ./cmd/notcrawl
bin/notcrawl init
bin/notcrawl doctor
bin/notcrawl sync --source desktop
bin/notcrawl export-md
bin/notcrawl search "launch plan"

For API sync:

export NOTION_TOKEN="secret_..."
bin/notcrawl sync --source api

Default paths:

  • config: ~/.notcrawl/config.toml
  • database: ~/.notcrawl/notcrawl.db
  • cache: ~/.notcrawl/cache
  • Markdown archive: ~/.notcrawl/pages
  • git share repo: ~/.notcrawl/share

Commands

  • init writes a starter config
  • doctor checks config, SQLite, desktop cache, and token presence
  • sync ingests from desktop, api, or all
  • export-md renders normalized Markdown files from SQLite
  • search searches page and comment text through FTS5
  • sql runs read-only SQL against the archive
  • publish exports SQLite tables and Markdown into a git share repo
  • subscribe clones a share repo and imports the latest snapshot
  • update pulls and imports a subscribed share repo

Distribution

Release packaging is managed with GoReleaser. Tagged releases build tarballs, checksums, .deb and .rpm packages, GitHub release notes, and a Homebrew tap update.

See docs/distribution.md for release operations.

Safety Model

Desktop mode is read-only. It snapshots Notion's local SQLite database before reading it and never writes to Notion application storage.

API mode uses the official Notion API. It stores raw API payloads alongside normalized rows so renderers can improve without recrawling.
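
Keeping the raw payload next to the normalized columns means a later, better normalizer can be rerun over stored bytes with no network call. The sketch below illustrates that idea only. The `payload` shape is deliberately simplified and is not the real Notion page object, and `PageRecord`, `Normalize`, and `Renormalize` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a simplified, hypothetical API response shape.
type payload struct {
	ID       string `json:"id"`
	Title    string `json:"title"`
	Archived bool   `json:"archived"`
}

// PageRecord holds normalized columns plus the untouched original payload.
type PageRecord struct {
	ID       string
	Title    string
	Archived bool
	Raw      json.RawMessage
}

// Normalize extracts columns from raw and keeps a private copy of the bytes.
func Normalize(raw []byte) (PageRecord, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return PageRecord{}, err
	}
	return PageRecord{
		ID:       p.ID,
		Title:    p.Title,
		Archived: p.Archived,
		Raw:      append(json.RawMessage(nil), raw...),
	}, nil
}

// Renormalize rebuilds the normalized columns from the stored payload alone.
func Renormalize(rec PageRecord) (PageRecord, error) {
	return Normalize(rec.Raw)
}

func main() {
	rec, err := Normalize([]byte(`{"id":"p1","title":"Launch plan","archived":false}`))
	if err != nil {
		panic(err)
	}
	// Simulate a row written by an older normalizer that never filled Title.
	stale := PageRecord{ID: rec.ID, Raw: rec.Raw}
	fresh, err := Renormalize(stale)
	if err != nil {
		panic(err)
	}
	fmt.Println(fresh.ID, fresh.Title)
}
```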

Secrets are never exported into Markdown or git-share snapshots.