From bed1da54710a9a8855e79911c1b2387e3096276c Mon Sep 17 00:00:00 2001 From: Vincent Koc Date: Tue, 5 May 2026 19:16:51 -0700 Subject: [PATCH] docs: document crawlkit control surface --- CHANGELOG.md | 10 ++++++++++ README.md | 4 ++++ docs/commands.md | 2 ++ docs/tui.md | 5 +++++ 4 files changed, 21 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 26288d8..0a73fa2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,15 @@ # Changelog +## Unreleased + +- Document the crawlkit control surface now available on `main`, including + `metadata --json`, `status --json`, and `doctor --json` for local launchers + and CI. +- Clarify that `gitcrawl tui` remains the reference terminal browser for the + crawl app family while shared `crawlkit/tui` converges on the same panes, + sorting, action menus, and status chrome. +- Add command-reference coverage for the read-only metadata/status commands. + ## 0.2.1 - 2026-05-05 - Improve `gh` shim cache coordination and observability with stale-while-revalidate reads, finer Actions/API TTLs, recent-window stats, top miss keys, and `xcache snapshot`. diff --git a/README.md b/README.md index 48cecad..0c774a0 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,8 @@ Early bootstrap. The implementation is being built in small commits. ```bash gitcrawl init gitcrawl doctor +gitcrawl metadata --json +gitcrawl status --json gitcrawl sync owner/repo gitcrawl sync owner/repo --state open gitcrawl sync owner/repo --numbers 123,456 --include-comments @@ -46,6 +48,7 @@ gitcrawl tui owner/repo ``` `gitcrawl clusters` and `gitcrawl tui` match ghcrawl's display view: latest raw run clusters first, closed durable rows merged as historical context, sorted by size by default. Pass `--hide-closed` to focus only currently open clusters. `gitcrawl durable-clusters` stays on governed durable rows and needs `--include-closed` for inactive rows. +`gitcrawl metadata --json`, `gitcrawl status --json`, and `gitcrawl doctor --json` are crawlkit control surfaces for launchers, local automation, and CI checks. They are read-only and do not mutate archive data. `gitcrawl cluster` and `gitcrawl refresh` build ghcrawl-shaped durable clusters by default (`--threshold 0.80`, `--min-size 1`, `--max-cluster-size 40`, `--k 16`, `--cross-kind-threshold 0.93`): every active vector-backed thread is represented, singleton rows use `singleton_orphan`, multi-member rows use `duplicate_candidate`, and stable IDs are derived from the representative thread. They also add deterministic GitHub reference evidence for direct issue/PR links such as `#123`, `issues/123`, and `pull/123`. Weak embedding edges need concrete title-token overlap unless their similarity is already high, which keeps generic low-confidence bridges from forming unrelated clusters. `gitcrawl tui` infers the most recently updated local repository when `owner/repo` is omitted. `serve` is intentionally not part of `gitcrawl`. `gitcrawl sync` fetches open issues and pull requests by default. Pass `--state all` or `--state closed` for explicit backfill workflows; incremental open syncs with `--since` also sweep recently closed items so local open state does not rot. @@ -54,6 +57,7 @@ Pass `--with pr-details` or `--include-pr-details` to hydrate pull request files `gitcrawl search issues|prs` accepts the common `gh search` shape (` -R owner/repo --state open --json fields --limit N`) and answers from the local SQLite cache. It is intended for discovery without spending GitHub REST search quota; use `gh` for final live verification and GitHub write actions. Pass `--sync-if-stale 5m` to perform one metadata sync before the cached search when the local repository mirror is older than that duration. `gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, completed run/job reads are kept much longer than active CI status, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Cache keys canonicalize common flags such as `-R`/`--repo` and sorted `--json` fields so equivalent agent commands coalesce. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; if GitHub rate-limits a refresh and an expired successful entry exists, the shim serves the stale response with a warning instead of failing the read. When another process is refreshing an expired successful entry, peers may serve stale inside a short grace window instead of joining the backend stampede. Set `GITCRAWL_GH_STALE_GRACE=0` to disable stale-while-revalidate, or `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and invalidate matching cache tags instead of flushing unrelated entries. `gh xcache stats|keys|gc|flush|reset|snapshot` inspects, garbage-collects, clears, resets, or snapshots fallthrough-cache counters, including hit rate plus per-command, per-route, per-key, and `--since` recent-window miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly. The TUI starts at `--min-size 5` and `--sort size`, like ghcrawl's saved default, so the first screen is the useful cluster workload instead of singleton noise. Pass `--min-size 1` when you intentionally want singleton clusters. Mouse support is built in: click rows, wheel panes, and right-click for copy, sort, filter, jump, link, neighbor, local close/reopen, and member triage actions. Press `a` to open the same action menu from the keyboard, `#` to jump directly to an issue or PR number, `p` to switch between repositories already present in the local store, or `n` to load neighbors for the selected issue or PR. Enter from the members pane also loads neighbors before opening detail. The TUI quietly refreshes from the local store every 15 seconds. +`gitcrawl tui` remains the reference terminal interaction model for the crawl app family: pane focus, sortable headers, mouse/right-click actions, detail rendering, and status chrome are the behavior the shared `crawlkit/tui` browser is converging on for Slack, Discord, and Notion archives. ## Local Defaults diff --git a/docs/commands.md b/docs/commands.md index f5221df..74ea18a 100644 --- a/docs/commands.md +++ b/docs/commands.md @@ -32,6 +32,8 @@ These work on every command. | --- | --- | --- | | `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](/installation/), [Portable stores](/portable-stores/) | | `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](/configuration/#gitcrawl-doctor) | +| `gitcrawl metadata [--json]` | Print the crawlkit command/control manifest for launchers and automation | — | +| `gitcrawl status [--json]` | Print read-only archive status, database inventory, and control state | — | | `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](/configuration/#gitcrawl-configure) | | `gitcrawl version` | Print version | — | diff --git a/docs/tui.md b/docs/tui.md index ff5895b..7af4495 100644 --- a/docs/tui.md +++ b/docs/tui.md @@ -10,6 +10,11 @@ permalink: /tui/ `gitcrawl tui` is the interactive cluster browser. Keyboard-first, mouse-friendly, refreshes from local SQLite every 15 seconds. {: .fs-6 .fw-300 } +It is also the reference terminal interaction model for the crawl app family. +The shared `crawlkit/tui` browser used by Slack, Discord, and Notion archives +is expected to match its pane focus, sortable headers, mouse/right-click +actions, detail rendering, and status chrome wherever the data model allows it. + 1. TOC {:toc}