docs: document crawlkit adoption

Vincent Koc 2026-05-05 19:16:51 -07:00
parent bb51c9ea12
commit 57630c2f95
4 changed files with 32 additions and 7 deletions


@@ -3,6 +3,10 @@
## Unreleased
- Initial `crawlkit` module scaffold.
+- Document downstream adoption status for `gitcrawl`, `discrawl`, `slacrawl`,
+and `notcrawl`, including the app-owned provider/auth/privacy boundary.
+- Add the `control` package to the public package inventory for app metadata,
+command manifests, status payloads, and database inventory.
- Document the `crawlkit` versus crawl-app boundary for embeddings, search,
inference, sync state, snapshots, SQLite, and git mirrors.
- Add safer `mirror` helpers for origin updates, existing-origin pulls,


@@ -5,8 +5,8 @@ Shared Go infrastructure for local-first crawler archives.
`crawlkit` is not a universal Slack, Discord, Notion, or GitHub crawler. It is
the reusable foundation beneath those tools: SQLite hygiene, TOML config
defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state,
-CLI output helpers, a shared terminal explorer, and safe desktop-cache snapshot
-utilities.
+CLI output helpers, control/status metadata, a shared terminal explorer, and
+safe desktop-cache snapshot utilities.
## Install
@@ -26,9 +26,19 @@ See `docs/boundary.md` for the crawlkit-versus-app ownership boundary.
- `mirror`: clone/init/pull/commit/push helpers for private snapshot repos.
- `state`: generic crawler cursor and freshness records.
- `output`: text/json/log output helpers.
+- `control`: crawl app metadata, command manifests, status payloads, and
+database inventory for launchers and automation.
- `tui`: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
- `cache`: safe read-only local cache snapshot helpers.
+## Downstream apps
+- `gitcrawl` and `discrawl` consume `crawlkit` on `main`.
+- `slacrawl` and `notcrawl` consume `crawlkit` on their `feat/use-crawlkit`
+integration branches until those app rewires are merged.
+- The apps keep provider schemas, auth, desktop/API parsing, privacy filters,
+and user-facing CLI contracts. `crawlkit` owns only the reusable mechanics.
## Safety
Library tests use temporary directories. They do not touch app runtime stores


@@ -9,6 +9,15 @@ neutral, reusable by at least two apps, and can preserve the app's existing
database and CLI contracts. Keep provider schemas, auth, API clients, cache
parsers, and product-specific ranking in the apps.
+## adoption status
+| app | branch | crawlkit usage | still app-owned |
+| --- | --- | --- | --- |
+| `gitcrawl` | `main` | config paths, SQLite openers, command/control metadata, status inventory, and the reference TUI/control contract | GitHub API sync, `gh` shim behavior, embeddings, clustering, inference, portable-store schema pruning, and the richer cluster TUI |
+| `discrawl` | `main` | config/status/control, snapshot packing/import, git mirror mechanics, sync-state adapters, output helpers, and shared chat TUI | Discord bot API, desktop wiretap parsing, DM privacy filters, Discord schema, FTS/ranking, embeddings, and analytics |
+| `slacrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, state helpers, output helpers, and shared chat TUI | Slack API/Desktop parsing, token scopes, Slack schema, Slack text normalization, channel/thread semantics, and analytics |
+| `notcrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, output helpers, and shared document TUI | Notion API/Desktop parsing, Markdown rendering, page/comment/database schema, Notion FTS body construction, and data-source compatibility |
## owns
`crawlkit` should own these surfaces:


@@ -14,9 +14,11 @@ go vet ./...
go test ./...
```
-3. Test downstream apps against the local checkout through a temporary Go workspace.
-4. Merge `crawlkit` to `main`.
-5. Tag the next semver release from `main`:
+3. Update docs and changelogs in `crawlkit` plus every downstream app branch
+that consumes the release.
+4. Test downstream apps against the local checkout through a temporary Go workspace.
+5. Merge `crawlkit` to `main`.
+6. Tag the next semver release from `main`:
```bash
git tag -s v0.4.0
@@ -24,14 +26,14 @@ git push origin main
git push origin v0.4.0
```
-6. Prime and verify module proxy visibility:
+7. Prime and verify module proxy visibility:
```bash
GOPROXY=https://proxy.golang.org go list -m github.com/vincentkoc/crawlkit@v0.4.0
go list -m github.com/vincentkoc/crawlkit@v0.4.0
```
-7. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:
+8. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:
```bash
go get github.com/vincentkoc/crawlkit@v0.4.0