From a94a53217daca373f4724f7983bd29dd650e4628 Mon Sep 17 00:00:00 2001 From: Peter Steinberger Date: Fri, 8 May 2026 09:50:20 +0100 Subject: [PATCH] docs: update gitcrawl changelog and command docs --- CHANGELOG.md | 24 +++++++++--------------- README.md | 2 +- docs/clustering.md | 2 +- docs/commands.md | 2 +- docs/gh-shim.md | 2 ++ docs/portable-stores.md | 1 + 6 files changed, 15 insertions(+), 18 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 6b7ebc1..15c6451 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,25 +2,19 @@ ## Unreleased +- Fix gh-shim portable-store auto-hydration so exact issue/PR refreshes write to the runtime mirror instead of dirtying the Git checkout, clear stale portable refresh locks, and make empty open issue discovery fall through when only targeted sync history exists. +- Keep `cluster-detail` aligned with the default cluster list by showing closed historical members unless `--hide-closed` is passed, and fail fast when `GITCRAWL_GH_PATH` points back at the `gitcrawl` shim. + ## 0.3.0 - 2026-05-08 - Bump routine release workflow dependencies. -- Add a repo-local `gitcrawl` agent skill for local archive, freshness, - gh-shim, cluster, and verification workflows. -- Accept full GitHub issue and pull request URLs anywhere `gitcrawl` expects a - thread number, including sync filters, gh-shim views/diffs, governance - commands, neighbor lookup, embedding, and TUI jumps. -- Document read-only SQLite query examples in the repo-local agent skill so - agents can do exact local archive counts without mutating state. -- Document the crawlkit control surface now available on `main`, including - `metadata --json`, `status --json`, and `doctor --json` for local launchers - and CI. -- Clarify that `gitcrawl tui` remains the reference terminal browser for the - crawl app family while shared `crawlkit/tui` converges on the same panes, - sorting, action menus, and status chrome. +- Add a repo-local `gitcrawl` agent skill for local archive, freshness, gh-shim, cluster, and verification workflows. +- Accept full GitHub issue and pull request URLs anywhere `gitcrawl` expects a thread number, including sync filters, gh-shim views/diffs, governance commands, neighbor lookup, embedding, and TUI jumps. +- Document read-only SQLite query examples in the repo-local agent skill so agents can do exact local archive counts without mutating state. +- Document the crawlkit control surface now available on `main`, including `metadata --json`, `status --json`, and `doctor --json` for local launchers and CI. +- Clarify that `gitcrawl tui` remains the reference terminal browser for the crawl app family while shared `crawlkit/tui` converges on the same panes, sorting, action menus, and status chrome. - Add command-reference coverage for the read-only metadata/status commands. -- Add broader CLI, gh-shim, TUI, and store regression coverage for the verified - release surface. +- Add broader CLI, gh-shim, TUI, and store regression coverage for the verified release surface. ## 0.2.1 - 2026-05-05 diff --git a/README.md b/README.md index ed35d2d..e67050e 100644 --- a/README.md +++ b/README.md @@ -61,7 +61,7 @@ Pass `--numbers` to refresh exact issue or pull request rows without relying on Thread-reference inputs accept bare numbers, `#123`, `issues/123`, `pull/123`, `owner/repo#123`, and full GitHub issue/PR URLs. This applies to sync filters, `--number` flags, governance member commands, neighbor/embed lookups, gh-shim `view`/`checks`/`diff`, and TUI jump input. For gh-shim view/checks/diff, a full GitHub URL also supplies the repository, so `-R owner/repo` can be omitted. Pass `--with pr-details` or `--include-pr-details` to hydrate pull request files, commits, checks, and workflow runs for local review. The `gh` shim can also auto-hydrate one exact PR on a PR-detail miss, then retry locally. `gitcrawl search issues|prs` accepts the common `gh search` shape (` -R owner/repo --state open --json fields --limit N`) and answers from the local SQLite cache. It is intended for discovery without spending GitHub REST search quota; use `gh` for final live verification and GitHub write actions. Pass `--sync-if-stale 5m` to perform one metadata sync before the cached search when the local repository mirror is older than that duration. -`gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, completed run/job reads are kept much longer than active CI status, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Cache keys canonicalize common flags such as `-R`/`--repo` and sorted `--json` fields so equivalent agent commands coalesce. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; if GitHub rate-limits a refresh and an expired successful entry exists, the shim serves the stale response with a warning instead of failing the read. When another process is refreshing an expired successful entry, peers may serve stale inside a short grace window instead of joining the backend stampede. Set `GITCRAWL_GH_STALE_GRACE=0` to disable stale-while-revalidate, or `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and invalidate matching cache tags instead of flushing unrelated entries. `gh xcache stats|keys|gc|flush|reset|snapshot` inspects, garbage-collects, clears, resets, or snapshots fallthrough-cache counters, including hit rate plus per-command, per-route, per-key, and `--since` recent-window miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly. +`gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`; empty open issue discovery falls through when the local repo only has targeted sync history. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, completed run/job reads are kept much longer than active CI status, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Cache keys canonicalize common flags such as `-R`/`--repo` and sorted `--json` fields so equivalent agent commands coalesce. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; if GitHub rate-limits a refresh and an expired successful entry exists, the shim serves the stale response with a warning instead of failing the read. When another process is refreshing an expired successful entry, peers may serve stale inside a short grace window instead of joining the backend stampede. Set `GITCRAWL_GH_STALE_GRACE=0` to disable stale-while-revalidate, or `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and invalidate matching cache tags instead of flushing unrelated entries. `gh xcache stats|keys|gc|flush|reset|snapshot` inspects, garbage-collects, clears, resets, or snapshots fallthrough-cache counters, including hit rate plus per-command, per-route, per-key, and `--since` recent-window miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly. The TUI starts at `--min-size 5` and `--sort size`, like ghcrawl's saved default, so the first screen is the useful cluster workload instead of singleton noise. Pass `--min-size 1` when you intentionally want singleton clusters. Mouse support is built in: click rows, wheel panes, and right-click for copy, sort, filter, jump, link, neighbor, local close/reopen, and member triage actions. Press `a` to open the same action menu from the keyboard, `#` to jump directly to an issue or PR number, `p` to switch between repositories already present in the local store, or `n` to load neighbors for the selected issue or PR. Enter from the members pane also loads neighbors before opening detail. The TUI quietly refreshes from the local store every 15 seconds. `gitcrawl tui` remains the reference terminal interaction model for the crawl app family: pane focus, sortable headers, mouse/right-click actions, detail rendering, and status chrome are the behavior the shared `crawlkit/tui` browser is converging on for Slack, Discord, and Notion archives. diff --git a/docs/clustering.md b/docs/clustering.md index 552a49f..121ea6b 100644 --- a/docs/clustering.md +++ b/docs/clustering.md @@ -88,7 +88,7 @@ gitcrawl cluster-explain owner/repo --id 123 # alias | `--id ` | _(required)_ | Cluster ID | | `--member-limit ` | _(no limit)_ | Maximum members to return | | `--body-chars ` | `280` | Body snippet length per member | -| `--include-closed` | _(off)_ | Include closed members | +| `--hide-closed` | _(off)_ | Hide locally closed members | `cluster-explain` is the same command — it exists so the verb reads naturally in agent prompts ("explain why these things ended up together"). diff --git a/docs/commands.md b/docs/commands.md index 3fb84bd..3c59422 100644 --- a/docs/commands.md +++ b/docs/commands.md @@ -78,7 +78,7 @@ This applies to `sync --numbers`, `threads --numbers`, `embed --number`, | `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](/clustering/#generate-clusters) | | `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](/clustering/#list-clusters) | | `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](/clustering/#list-clusters) | -| `gitcrawl cluster-detail owner/repo --id [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](/clustering/#inspect-a-cluster) | +| `gitcrawl cluster-detail owner/repo --id [--member-limit --body-chars --hide-closed --json]` | Cluster + members detail | [Clustering](/clustering/#inspect-a-cluster) | | `gitcrawl cluster-explain owner/repo --id [...]` | Alias for `cluster-detail` | [Clustering](/clustering/#inspect-a-cluster) | ## Governance diff --git a/docs/gh-shim.md b/docs/gh-shim.md index c777478..e6c6608 100644 --- a/docs/gh-shim.md +++ b/docs/gh-shim.md @@ -125,6 +125,8 @@ When a local issue or PR read misses the cache, the shim can auto-hydrate exactl This keeps `gh issue view`, `gh pr view`, `gh pr checks`, and `gh run` reads cheap and fresh without manual sync orchestration. Disable with `GITCRAWL_GH_AUTO_HYDRATE=0` if you want the shim to be strictly cache-or-fallthrough. +When the configured database comes from a portable store, auto-hydration writes to the local runtime mirror, not the Git checkout. Broad empty open-issue discovery is also guarded: if `gh issue list` or empty-query `gh search issues --state open` would return no rows but the repo only has targeted sync history, the shim falls through to the real `gh` instead of treating that incomplete local snapshot as authoritative. + ## Cache inspection: `xcache` ```bash diff --git a/docs/portable-stores.md b/docs/portable-stores.md index 2a22ea0..245bc29 100644 --- a/docs/portable-stores.md +++ b/docs/portable-stores.md @@ -56,6 +56,7 @@ Write commands (`embed`, `refresh`, `cluster`, neighbor generation) need to pers This separation means: - You can `gitcrawl embed` against a portable store without dirtying the Git checkout +- gh-shim exact-thread auto-hydration writes into the same runtime mirror - Local cluster overrides (`close-cluster`, exclusions, canonicals) live in the runtime mirror - Only the publishing workflow writes back into the portable checkout