docs: document crawlkit control surface
This commit is contained in:
parent
ec7a91465c
commit
bed1da5471
10
CHANGELOG.md
10
CHANGELOG.md
@ -1,5 +1,15 @@
|
||||
# Changelog
|
||||
|
||||
## Unreleased
|
||||
|
||||
- Document the crawlkit control surface now available on `main`, including
|
||||
`metadata --json`, `status --json`, and `doctor --json` for local launchers
|
||||
and CI.
|
||||
- Clarify that `gitcrawl tui` remains the reference terminal browser for the
|
||||
crawl app family while shared `crawlkit/tui` converges on the same panes,
|
||||
sorting, action menus, and status chrome.
|
||||
- Add command-reference coverage for the read-only metadata/status commands.
|
||||
|
||||
## 0.2.1 - 2026-05-05
|
||||
|
||||
- Improve `gh` shim cache coordination and observability with stale-while-revalidate reads, finer Actions/API TTLs, recent-window stats, top miss keys, and `xcache snapshot`.
|
||||
|
||||
@ -15,6 +15,8 @@ Early bootstrap. The implementation is being built in small commits.
|
||||
```bash
|
||||
gitcrawl init
|
||||
gitcrawl doctor
|
||||
gitcrawl metadata --json
|
||||
gitcrawl status --json
|
||||
gitcrawl sync owner/repo
|
||||
gitcrawl sync owner/repo --state open
|
||||
gitcrawl sync owner/repo --numbers 123,456 --include-comments
|
||||
@ -46,6 +48,7 @@ gitcrawl tui owner/repo
|
||||
```
|
||||
|
||||
`gitcrawl clusters` and `gitcrawl tui` match ghcrawl's display view: latest raw run clusters first, closed durable rows merged as historical context, sorted by size by default. Pass `--hide-closed` to focus only currently open clusters. `gitcrawl durable-clusters` stays on governed durable rows and needs `--include-closed` for inactive rows.
|
||||
`gitcrawl metadata --json`, `gitcrawl status --json`, and `gitcrawl doctor --json` are crawlkit control surfaces for launchers, local automation, and CI checks. They are read-only and do not mutate archive data.
|
||||
`gitcrawl cluster` and `gitcrawl refresh` build ghcrawl-shaped durable clusters by default (`--threshold 0.80`, `--min-size 1`, `--max-cluster-size 40`, `--k 16`, `--cross-kind-threshold 0.93`): every active vector-backed thread is represented, singleton rows use `singleton_orphan`, multi-member rows use `duplicate_candidate`, and stable IDs are derived from the representative thread. They also add deterministic GitHub reference evidence for direct issue/PR links such as `#123`, `issues/123`, and `pull/123`. Weak embedding edges need concrete title-token overlap unless their similarity is already high, which keeps generic low-confidence bridges from forming unrelated clusters.
|
||||
`gitcrawl tui` infers the most recently updated local repository when `owner/repo` is omitted. `serve` is intentionally not part of `gitcrawl`.
|
||||
`gitcrawl sync` fetches open issues and pull requests by default. Pass `--state all` or `--state closed` for explicit backfill workflows; incremental open syncs with `--since` also sweep recently closed items so local open state does not rot.
|
||||
@ -54,6 +57,7 @@ Pass `--with pr-details` or `--include-pr-details` to hydrate pull request files
|
||||
`gitcrawl search issues|prs` accepts the common `gh search` shape (`<query> -R owner/repo --state open --json fields --limit N`) and answers from the local SQLite cache. It is intended for discovery without spending GitHub REST search quota; use `gh` for final live verification and GitHub write actions. Pass `--sync-if-stale 5m` to perform one metadata sync before the cached search when the local repository mirror is older than that duration.
|
||||
`gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, completed run/job reads are kept much longer than active CI status, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Cache keys canonicalize common flags such as `-R`/`--repo` and sorted `--json` fields so equivalent agent commands coalesce. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; if GitHub rate-limits a refresh and an expired successful entry exists, the shim serves the stale response with a warning instead of failing the read. When another process is refreshing an expired successful entry, peers may serve stale inside a short grace window instead of joining the backend stampede. Set `GITCRAWL_GH_STALE_GRACE=0` to disable stale-while-revalidate, or `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and invalidate matching cache tags instead of flushing unrelated entries. `gh xcache stats|keys|gc|flush|reset|snapshot` inspects, garbage-collects, clears, resets, or snapshots fallthrough-cache counters, including hit rate plus per-command, per-route, per-key, and `--since` recent-window miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly.
|
||||
The TUI starts at `--min-size 5` and `--sort size`, like ghcrawl's saved default, so the first screen is the useful cluster workload instead of singleton noise. Pass `--min-size 1` when you intentionally want singleton clusters. Mouse support is built in: click rows, wheel panes, and right-click for copy, sort, filter, jump, link, neighbor, local close/reopen, and member triage actions. Press `a` to open the same action menu from the keyboard, `#` to jump directly to an issue or PR number, `p` to switch between repositories already present in the local store, or `n` to load neighbors for the selected issue or PR. Enter from the members pane also loads neighbors before opening detail. The TUI quietly refreshes from the local store every 15 seconds.
|
||||
`gitcrawl tui` remains the reference terminal interaction model for the crawl app family: pane focus, sortable headers, mouse/right-click actions, detail rendering, and status chrome are the behavior the shared `crawlkit/tui` browser is converging on for Slack, Discord, and Notion archives.
|
||||
|
||||
## Local Defaults
|
||||
|
||||
|
||||
@ -32,6 +32,8 @@ These work on every command.
|
||||
| --- | --- | --- |
|
||||
| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](/installation/), [Portable stores](/portable-stores/) |
|
||||
| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](/configuration/#gitcrawl-doctor) |
|
||||
| `gitcrawl metadata [--json]` | Print the crawlkit command/control manifest for launchers and automation | — |
|
||||
| `gitcrawl status [--json]` | Print read-only archive status, database inventory, and control state | — |
|
||||
| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](/configuration/#gitcrawl-configure) |
|
||||
| `gitcrawl version` | Print version | — |
|
||||
|
||||
|
||||
@ -10,6 +10,11 @@ permalink: /tui/
|
||||
`gitcrawl tui` is the interactive cluster browser. Keyboard-first, mouse-friendly, refreshes from local SQLite every 15 seconds.
|
||||
{: .fs-6 .fw-300 }
|
||||
|
||||
It is also the reference terminal interaction model for the crawl app family.
|
||||
The shared `crawlkit/tui` browser used by Slack, Discord, and Notion archives
|
||||
is expected to match its pane focus, sortable headers, mouse/right-click
|
||||
actions, detail rendering, and status chrome wherever the data model allows it.
|
||||
|
||||
1. TOC
|
||||
{:toc}
|
||||
|
||||
|
||||
Loading…
Reference in New Issue
Block a user