docs: document crawlkit adoption
parent bb51c9ea12
commit 57630c2f95
@@ -3,6 +3,10 @@
 ## Unreleased

 - Initial `crawlkit` module scaffold.
+- Document downstream adoption status for `gitcrawl`, `discrawl`, `slacrawl`,
+and `notcrawl`, including the app-owned provider/auth/privacy boundary.
+- Add the `control` package to the public package inventory for app metadata,
+command manifests, status payloads, and database inventory.
 - Document the `crawlkit` versus crawl-app boundary for embeddings, search,
 inference, sync state, snapshots, SQLite, and git mirrors.
 - Add safer `mirror` helpers for origin updates, existing-origin pulls,

README.md (14 lines changed)
@@ -5,8 +5,8 @@ Shared Go infrastructure for local-first crawler archives.
 `crawlkit` is not a universal Slack, Discord, Notion, or GitHub crawler. It is
 the reusable foundation beneath those tools: SQLite hygiene, TOML config
 defaults, portable JSONL/Gzip packing, git-backed snapshot sharing, sync state,
-CLI output helpers, a shared terminal explorer, and safe desktop-cache snapshot
-utilities.
+CLI output helpers, control/status metadata, a shared terminal explorer, and
+safe desktop-cache snapshot utilities.

 ## Install

@@ -26,9 +26,19 @@ See `docs/boundary.md` for the crawlkit-versus-app ownership boundary.
 - `mirror`: clone/init/pull/commit/push helpers for private snapshot repos.
 - `state`: generic crawler cursor and freshness records.
 - `output`: text/json/log output helpers.
+- `control`: crawl app metadata, command manifests, status payloads, and
+database inventory for launchers and automation.
 - `tui`: shared terminal archive explorer with gitcrawl-style responsive panes, entity/member/detail lanes, compact sortable headers, mouse selection, floating right-click actions, sorting/filtering, and local/remote source status.
 - `cache`: safe read-only local cache snapshot helpers.

+## Downstream apps
+
+- `gitcrawl` and `discrawl` consume `crawlkit` on `main`.
+- `slacrawl` and `notcrawl` consume `crawlkit` on their `feat/use-crawlkit`
+integration branches until those app rewires are merged.
+- The apps keep provider schemas, auth, desktop/API parsing, privacy filters,
+and user-facing CLI contracts. `crawlkit` owns only the reusable mechanics.
+
 ## Safety

 Library tests use temporary directories. They do not touch app runtime stores

@@ -9,6 +9,15 @@ neutral, reusable by at least two apps, and can preserve the app's existing
 database and CLI contracts. Keep provider schemas, auth, API clients, cache
 parsers, and product-specific ranking in the apps.

+## adoption status
+
+| app | branch | crawlkit usage | still app-owned |
+| --- | --- | --- | --- |
+| `gitcrawl` | `main` | config paths, SQLite openers, command/control metadata, status inventory, and the reference TUI/control contract | GitHub API sync, `gh` shim behavior, embeddings, clustering, inference, portable-store schema pruning, and the richer cluster TUI |
+| `discrawl` | `main` | config/status/control, snapshot packing/import, git mirror mechanics, sync-state adapters, output helpers, and shared chat TUI | Discord bot API, desktop wiretap parsing, DM privacy filters, Discord schema, FTS/ranking, embeddings, and analytics |
+| `slacrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, state helpers, output helpers, and shared chat TUI | Slack API/Desktop parsing, token scopes, Slack schema, Slack text normalization, channel/thread semantics, and analytics |
+| `notcrawl` | `feat/use-crawlkit` | config/status/control, snapshot packing/import, git mirror mechanics, output helpers, and shared document TUI | Notion API/Desktop parsing, Markdown rendering, page/comment/database schema, Notion FTS body construction, and data-source compatibility |
+
 ## owns

 `crawlkit` should own these surfaces:

@@ -14,9 +14,11 @@ go vet ./...
 go test ./...
 ```

-3. Test downstream apps against the local checkout through a temporary Go workspace.
-4. Merge `crawlkit` to `main`.
-5. Tag the next semver release from `main`:
+3. Update docs and changelogs in `crawlkit` plus every downstream app branch
+that consumes the release.
+4. Test downstream apps against the local checkout through a temporary Go workspace.
+5. Merge `crawlkit` to `main`.
+6. Tag the next semver release from `main`:

 ```bash
 git tag -s v0.4.0
@@ -24,14 +26,14 @@ git push origin main
 git push origin v0.4.0
 ```

-6. Prime and verify module proxy visibility:
+7. Prime and verify module proxy visibility:

 ```bash
 GOPROXY=https://proxy.golang.org go list -m github.com/vincentkoc/crawlkit@v0.4.0
 go list -m github.com/vincentkoc/crawlkit@v0.4.0
 ```

-7. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:
+8. Bump downstream apps to the new tag and commit their `go.mod`/`go.sum` updates:

 ```bash
 go get github.com/vincentkoc/crawlkit@v0.4.0
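
Note on step 4 of the updated release checklist ("Test downstream apps against the local checkout through a temporary Go workspace"): the checklist gives no commands for it. The following is a minimal sketch of one way to do it, not the project's documented procedure. The function name `test_with_local_crawlkit` and both path arguments are hypothetical. It assumes Go 1.18 or newer, which provides `go work` and the `GOWORK` environment variable, and expects absolute paths.

```shell
# Sketch only: the function name and its arguments are illustrative assumptions.
# Runs a downstream app's tests against a local crawlkit checkout through a
# throwaway go.work file, so neither repository's go.mod is edited.
test_with_local_crawlkit() {
  crawlkit_dir=$1   # absolute path to the local crawlkit checkout
  app_dir=$2        # absolute path to a downstream app checkout
  work_dir=$(mktemp -d) || return 1

  # The workspace lists both modules, so the app resolves crawlkit from disk.
  (cd "$work_dir" && go work init "$app_dir" "$crawlkit_dir") || return 1

  # GOWORK points the go command at the temporary workspace file.
  status=0
  (cd "$app_dir" && GOWORK="$work_dir/go.work" go test ./...) || status=$?

  # Remove the temporary workspace whether or not the tests passed.
  rm -rf "$work_dir"
  return "$status"
}
```

A call might look like `test_with_local_crawlkit "$HOME/src/crawlkit" "$HOME/src/gitcrawl"`, where both paths are placeholders. Keeping the `go.work` file in a temporary directory means nothing workspace-related can be committed to either repository by accident.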