From 57630c2f95cee4ff8d71dfe0fb35ac707d33873d Mon Sep 17 00:00:00 2001
From: Vincent Koc
Date: Tue, 5 May 2026 19:16:51 -0700
Subject: [PATCH] docs: document crawlkit adoption

---
 CHANGELOG.md       |  4 ++++
 README.md          | 14 ++++++++++++--
 docs/boundary.md   |  9 +++++++++
 docs/publishing.md | 12 +++++++-----
 4 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 21e6173..40d8912 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,10 @@
 ## Unreleased
 
 - Initial `crawlkit` module scaffold.
+- Document downstream adoption status for `gitcrawl`, `discrawl`, `slacrawl`,
+  and `notcrawl`, including the app-owned provider/auth/privacy boundary.
+- Add the `control` package to the public package inventory for app metadata,
+  command manifests, status payloads, and database inventory.
 - Document the `crawlkit` versus crawl-app boundary for embeddings, search,
   inference, sync state, snapshots, SQLite, and git mirrors.
 - Add safer `mirror` helpers for origin updates, existing-origin pulls,
diff --git a/README.md b/README.md
index b321b1f..7f8436e 100644
--- a/README.md
+++ b/README.md
@@ -5,8 +5,8 @@ Shared Go infrastructure for local-first crawler archives.
 `crawlkit` is not a universal Slack, Discord, Notion, or GitHub crawler. It is
 the reusable foundation beneath those tools: SQLite hygiene, TOML config
 defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state,
-CLI output helpers, a shared terminal explorer, and safe desktop-cache snapshot
-utilities.
+CLI output helpers, control/status metadata, a shared terminal explorer, and
+safe desktop-cache snapshot utilities.
 
 ## Install
 
@@ -26,9 +26,19 @@ See `docs/boundary.md` for the crawlkit-versus-app ownership boundary.
 - `mirror`: clone/init/pull/commit/push helpers for private snapshot repos.
 - `state`: generic crawler cursor and freshness records.
 - `output`: text/json/log output helpers.
+- `control`: crawl app metadata, command manifests, status payloads, and
+  database inventory for launchers and automation.
 - `tui`: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
 - `cache`: safe read-only local cache snapshot helpers.
 
+## Downstream apps
+
+- `gitcrawl` and `discrawl` consume `crawlkit` on `main`.
+- `slacrawl` and `notcrawl` consume `crawlkit` on their `feat/use-crawlkit`
+  integration branches until those app rewires are merged.
+- The apps keep provider schemas, auth, desktop/API parsing, privacy filters,
+  and user-facing CLI contracts. `crawlkit` owns only the reusable mechanics.
+
 ## Safety
 
 Library tests use temporary directories. They do not touch app runtime stores
diff --git a/docs/boundary.md b/docs/boundary.md
index bbcfb9f..e43039c 100644
--- a/docs/boundary.md
+++ b/docs/boundary.md
@@ -9,6 +9,15 @@ neutral, reusable by at least two apps, and can preserve the app's existing
 database and CLI contracts. Keep provider schemas, auth, API clients, cache
 parsers, and product-specific ranking in the apps.
 
+## adoption status
+
+| app | branch | crawlkit usage | still app-owned |
+| --- | --- | --- | --- |
+| `gitcrawl` | `main` | config paths, SQLite openers, command/control metadata, status inventory, and the reference TUI/control contract | GitHub API sync, `gh` shim behavior, embeddings, clustering, inference, portable-store schema pruning, and the richer cluster TUI |
+| `discrawl` | `main` | config/status/control, snapshot packing/import, git mirror mechanics, sync-state adapters, output helpers, and shared chat TUI | Discord bot API, desktop wiretap parsing, DM privacy filters, Discord schema, FTS/ranking, embeddings, and analytics |
+| `slacrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, state helpers, output helpers, and shared chat TUI | Slack API/Desktop parsing, token scopes, Slack schema, Slack text normalization, channel/thread semantics, and analytics |
+| `notcrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, output helpers, and shared document TUI | Notion API/Desktop parsing, Markdown rendering, page/comment/database schema, Notion FTS body construction, and data-source compatibility |
+
 ## owns
 
 `crawlkit` should own these surfaces:
diff --git a/docs/publishing.md b/docs/publishing.md
index 05430ec..7b27e8b 100644
--- a/docs/publishing.md
+++ b/docs/publishing.md
@@ -14,9 +14,11 @@ go vet ./...
 go test ./...
 ```
 
-3. Test downstream apps against the local checkout through a temporary Go workspace.
-4. Merge `crawlkit` to `main`.
-5. Tag the next semver release from `main`:
+3. Update docs and changelogs in `crawlkit` plus every downstream app branch
+   that consumes the release.
+4. Test downstream apps against the local checkout through a temporary Go workspace.
+5. Merge `crawlkit` to `main`.
+6. Tag the next semver release from `main`:
 
 ```bash
 git tag -s v0.4.0
@@ -24,14 +26,14 @@ git push origin main
 git push origin v0.4.0
 ```
 
-6. Prime and verify module proxy visibility:
+7. Prime and verify module proxy visibility:
 
 ```bash
 GOPROXY=https://proxy.golang.org go list -m github.com/vincentkoc/crawlkit@v0.4.0
 go list -m github.com/vincentkoc/crawlkit@v0.4.0
 ```
 
-7. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:
+8. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:
 
 ```bash
 go get github.com/vincentkoc/crawlkit@v0.4.0