[BREAKGLASS] Append-only mirror of github.com/openclaw/crawlkit
.github Merge pull request #5 from openclaw/ci-security-baseline 2026-05-06 01:55:23 -07:00
cache refactor: rename public package nouns 2026-05-01 12:30:13 -07:00
config refactor: rename public package nouns 2026-05-01 12:30:13 -07:00
control feat(control): add app metadata and status contracts 2026-05-01 15:23:03 -07:00
docs feat(snapshot): add incremental shard import planning 2026-05-08 07:10:12 +01:00
mirror feat(mirror): add safer share repo helpers 2026-05-05 17:18:53 -07:00
output refactor: rename public package nouns 2026-05-01 12:30:13 -07:00
progress feat(progress): add ci-safe tracker 2026-05-02 19:19:53 -07:00
snapshot feat(snapshot): add incremental shard import planning 2026-05-08 07:10:12 +01:00
state feat(state): add legacy sync adapters 2026-05-05 17:21:30 -07:00
store refactor: rename public package nouns 2026-05-01 12:30:13 -07:00
tui fix(tui): open archive groups by recency 2026-05-04 01:28:54 -07:00
.editorconfig chore: bootstrap crawlkit module 2026-05-01 08:34:01 -07:00
.gitattributes chore: bootstrap crawlkit module 2026-05-01 08:34:01 -07:00
.gitignore chore: bootstrap crawlkit module 2026-05-01 08:34:01 -07:00
AGENTS.md feat(snapshot): add incremental shard import planning 2026-05-08 07:10:12 +01:00
CHANGELOG.md docs: prepare v0.4.2 release notes 2026-05-08 07:58:41 +01:00
CONTRIBUTING.md chore: bootstrap crawlkit module 2026-05-01 08:34:01 -07:00
go.mod feat(snapshot): add incremental shard import planning 2026-05-08 07:10:12 +01:00
go.sum build(deps): bump github.com/mattn/go-isatty from 0.0.20 to 0.0.22 (#4) 2026-05-07 02:45:10 -07:00
LICENSE chore: bootstrap crawlkit module 2026-05-01 08:34:01 -07:00
Makefile ci: add validation and publishing metadata 2026-05-01 08:43:29 -07:00
README.md feat(snapshot): add incremental shard import planning 2026-05-08 07:10:12 +01:00

crawlkit

Shared Go infrastructure for local-first crawler archives.

crawlkit is not a universal Slack, Discord, Notion, or GitHub crawler. It is the reusable foundation beneath those tools: SQLite hygiene, TOML config defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state, CLI output helpers, control/status metadata, a shared terminal explorer, and safe desktop-cache snapshot utilities.

Install

go get github.com/openclaw/crawlkit@latest

Go packages are published by tagging this repository. There is no separate package registry step. See docs/publishing.md for the release commands. See docs/boundary.md for the crawlkit-versus-app ownership boundary.

Packages

  • config: standard TOML config paths, runtime dirs, and token diagnostics.
  • store: SQLite open/read-only/transaction/query helpers.
  • snapshot: manifest.json plus JSONL/Gzip table snapshot export, file fingerprints, full import, and incremental shard import planning.
  • mirror: clone/init/pull/commit/push helpers for private snapshot repos.
  • state: generic crawler cursor and freshness records.
  • output: text/json/log output helpers.
  • control: crawl app metadata, command manifests, status payloads, and database inventory for launchers and automation.
  • tui: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
  • cache: safe read-only local cache snapshot helpers.
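
To make the snapshot bullet concrete, here is a minimal sketch of JSONL/Gzip packing with a file fingerprint, written against the Go standard library only. It does not use or reproduce the crawlkit snapshot API: Row, packJSONL, and unpackJSONL are hypothetical names, and SHA-256 is an assumed choice of fingerprint.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"log"
)

// Row is a hypothetical table row; crawlkit's real snapshot types may differ.
type Row struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// packJSONL writes one JSON object per line into a gzip stream and returns
// the compressed bytes plus a SHA-256 hex fingerprint of those bytes.
func packJSONL(rows []Row) ([]byte, string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	enc := json.NewEncoder(zw) // Encode appends a newline after each value.
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return nil, "", err
		}
	}
	// Close flushes the gzip trailer; the fingerprint must be taken after it.
	if err := zw.Close(); err != nil {
		return nil, "", err
	}
	sum := sha256.Sum256(buf.Bytes())
	return buf.Bytes(), hex.EncodeToString(sum[:]), nil
}

// unpackJSONL reverses packJSONL: gunzip, then decode one row per line.
func unpackJSONL(data []byte) ([]Row, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var rows []Row
	sc := bufio.NewScanner(zr)
	for sc.Scan() {
		var r Row
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		rows = append(rows, r)
	}
	return rows, sc.Err()
}

func main() {
	packed, fp, err := packJSONL([]Row{{ID: 1, Name: "alpha"}, {ID: 2, Name: "beta"}})
	if err != nil {
		log.Fatal(err)
	}
	rows, err := unpackJSONL(packed)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("fingerprint=%s rows=%d\n", fp, len(rows))
}
```

A manifest could then record the fingerprint next to each shard's file name so an importer can skip shards it has already seen. That pairing is an illustration of the idea, not a description of the actual manifest.json layout.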

Downstream apps

  • gitcrawl and discrawl consume crawlkit on main.
  • slacrawl and notcrawl consume crawlkit on their feat/use-crawlkit integration branches until those app rewires are merged.
  • The apps keep provider schemas, auth, desktop/API parsing, privacy filters, and user-facing CLI contracts. crawlkit owns only the reusable mechanics.

Safety

Library tests use temporary directories. They do not touch app runtime stores such as ~/.config/gitcrawl, ~/.slacrawl, ~/.discrawl, or ~/.notcrawl.