docs: document gh cache improvements
parent c341231048
commit 17c09e1580
@ -52,7 +52,7 @@ gitcrawl tui owner/repo
|
||||
Pass `--numbers` to refresh exact issue or pull request rows without relying on list ordering or updated-time windows.
|
||||
Pass `--with pr-details` or `--include-pr-details` to hydrate pull request files, commits, checks, and workflow runs for local review. The `gh` shim can also auto-hydrate one exact PR on a PR-detail miss, then retry locally.
|
||||
`gitcrawl search issues|prs` accepts the common `gh search` shape (`<query> -R owner/repo --state open --json fields --limit N`) and answers from the local SQLite cache. It is intended for discovery without spending GitHub REST search quota; use `gh` for final live verification and GitHub write actions. Pass `--sync-if-stale 5m` to perform one metadata sync before the cached search when the local repository mirror is older than that duration.
|
||||
`gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, CI status reads stay short, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; set `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and clear that cache. `gh xcache stats|keys|gc|flush` inspects, garbage-collects, or clears the fallthrough cache, including per-command and per-route backend miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly.
|
||||
`gitcrawl gh` is a gh-compatible shim for agent workflows. It answers broad `gh search issues|prs`, `gh issue/pr list`, supported `gh issue/pr view --json` fields, hydrated `gh pr checks`, and hydrated `gh run list/view` from local SQLite, then falls through to the real GitHub CLI for unsupported commands. Local `gh issue/pr list` supports common filters such as `--author`, `--assignee`, and repeated `--label`. Read-only fallthroughs such as `gh pr diff`, `gh repo view/list`, `gh release list/view`, `gh workflow list/view`, `gh secret list`, `gh variable get/list`, `gh label list`, read-only `gh search` kinds, GET-only REST `gh api` calls, and read-only `gh api graphql` queries use a command-aware persistent cache under `cache/gh-shim`; Actions run/job logs get longer TTLs, completed run views are kept much longer than active CI status, user profile reads get a 7-day TTL, read-only GraphQL gets a 6-hour TTL, and `gh pr diff` entries are keyed by the cached PR head SHA when available. Explicit API paths and explicit repositories share cache entries across sibling checkouts even when agents set different `GH_REPO` values; implicit repo reads stay isolated by `GH_REPO` or current working directory. Cache keys canonicalize common flags such as `-R`/`--repo` and sorted `--json` fields so equivalent agent commands coalesce. Repeat read failures are cached by default so agents do not rediscover the same missing release or workflow, but rate-limit error entries expire quickly; if GitHub rate-limits a refresh and an expired successful entry exists, the shim serves the stale response with a warning instead of failing the read. Set `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands pass through, increment write counters, and invalidate matching cache tags instead of flushing unrelated entries. 
`gh xcache stats|keys|gc|flush|reset` inspects, garbage-collects, or clears the fallthrough cache, or resets its counters; the stats include hit rate plus per-command and per-route backend miss counters. Set `GITCRAWL_GH_PATH` to choose the backend `gh`, and symlink or install the binary as `gh`/`gitcrawl-gh` to run the shim directly.
|
||||
The TUI starts at `--min-size 5` and `--sort size`, like ghcrawl's saved default, so the first screen is the useful cluster workload instead of singleton noise. Pass `--min-size 1` when you intentionally want singleton clusters. Mouse support is built in: click rows, wheel panes, and right-click for copy, sort, filter, jump, link, neighbor, local close/reopen, and member triage actions. Press `a` to open the same action menu from the keyboard, `#` to jump directly to an issue or PR number, `p` to switch between repositories already present in the local store, or `n` to load neighbors for the selected issue or PR. Enter from the members pane also loads neighbors before opening detail. The TUI quietly refreshes from the local store every 15 seconds.
|
||||
|
||||
## Local Defaults
|
||||
|
||||
5
SPEC.md
@ -113,17 +113,18 @@ gitcrawl gh issue list -R owner/repo --state open --search "hot loop" --json num
|
||||
gitcrawl gh pr list -R owner/repo --state open --search "manifest cache" --json number,title,url
|
||||
```
|
||||
|
||||
Unsupported commands fall through to the real GitHub CLI. Read-only fallthroughs use a command-aware persistent cache in `cache/gh-shim` for repeated agent calls (`run list/view`, `pr diff/checks/list/status/view`, `issue list/status/view`, `repo view/list`, `release list/view`, `workflow list/view`, `secret list`, `variable get/list`, `project` list/view reads, `ruleset` reads, `gist` reads, `org list`, `label list`, read-only `search` kinds, and GET-only `api`). Actions run/job logs are cached much longer than CI status reads, and `xcache stats` records backend misses by command and normalized route so remaining GitHub-heavy patterns are visible. Repeat read failures are cached by default so many agents do not rediscover the same missing release, workflow, or field, with short caps for error entries and rate-limit responses; set `GITCRAWL_GH_CACHE_ERRORS=0` to disable that behavior. Mutating commands are never cached and clear the fallthrough cache on success. The shim does not add GitHub write-back behavior of its own; writes remain delegated to `gh`.
|
||||
Unsupported commands fall through to the real GitHub CLI. Read-only fallthroughs use a command-aware persistent cache in `cache/gh-shim` for repeated agent calls (`run list/view`, `pr diff/checks/list/status/view`, `issue list/status/view`, `repo view/list`, `release list/view`, `workflow list/view`, `secret list`, `variable get/list`, `project` list/view reads, `ruleset` reads, `gist` reads, `org list`, `label list`, read-only `search` kinds, and GET-only `api`). Actions run/job logs are cached much longer than CI status reads, completed run reads receive longer TTLs, and `xcache stats` records hit rate plus backend misses by command and normalized route so remaining GitHub-heavy patterns are visible. Repeat read failures are cached by default so many agents do not rediscover the same missing release, workflow, or field, with short caps for error entries and rate-limit responses; if GitHub rate-limits a refresh and a stale successful entry exists, the stale entry is served with a warning. Set `GITCRAWL_GH_CACHE_ERRORS=0` to disable error caching. Mutating commands are never cached and invalidate matching cache-tag entries on success. Unknown mutation scope falls back to clearing the fallthrough cache. The shim does not add GitHub write-back behavior of its own; writes remain delegated to `gh`.
|
||||
|
||||
Cache inspection commands:
|
||||
|
||||
```text
|
||||
gitcrawl gh xcache stats
|
||||
gitcrawl gh xcache keys
|
||||
gitcrawl gh xcache reset
|
||||
gitcrawl gh xcache flush
|
||||
```
|
||||
|
||||
The cache key includes the resolved gitcrawl config path, current working directory, `GH_HOST`, `GH_REPO`, and exact `gh` arguments. This keeps sibling checkouts and portable stores isolated while still coalescing repeated calls from the same agent workspace. Concurrent cache misses use a lock file so one process populates the entry while peers wait for the result.
|
||||
The cache key includes the resolved gitcrawl config path, current working directory, `GH_HOST`, `GH_REPO`, stable PR-diff identity when available, and canonicalized `gh` arguments. This keeps sibling checkouts and portable stores isolated while still coalescing equivalent agent calls such as reordered flags or sorted `--json` fields. Concurrent cache misses use a lock file so one process populates the entry while peers wait for the result.
|
||||
|
||||
## Config
|
||||
|
||||
|
||||
@ -101,7 +101,7 @@ These work on every command.
|
||||
| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
|
||||
| `gitcrawl gh api <GET path>` | Falls through; cached briefly (GET-only REST) | [gh shim](/gh-shim/) |
|
||||
| `gitcrawl gh api graphql -f query=...` | Falls through; read-only queries are cached | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
|
||||
| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](/gh-shim/#cache-inspection-xcache) |
|
||||
| `gitcrawl gh xcache stats\|keys\|gc\|flush\|reset [--json]` | Cache inspection / housekeeping | [gh shim](/gh-shim/#cache-inspection-xcache) |
|
||||
| _Anything else_ | Falls through to real `gh` | [gh shim](/gh-shim/) |
|
||||
|
||||
The shim binary can be installed standalone by symlinking the `gitcrawl` binary as `gh` or `gitcrawl-gh`.
|
||||
|
||||
@ -101,20 +101,20 @@ These commands always run real `gh` but the response body is cached for the next
|
||||
|
||||
Common Actions REST reads such as run status, job lists, and logs get Actions-aware TTLs.
|
||||
|
||||
Default cache TTLs are command-aware: `gh run list` and run-status reads use `2m`; workflow, job detail, and Actions job-list reads use `5m`; search reads use `15m`; release metadata uses `30m`; GitHub user profile reads use `7d`; read-only GraphQL queries use `6h`; completed-style run/job log reads use `12h`; `gh pr diff` uses `5m` without a stable SHA and `7d` with one. Most other read-only fallthroughs use `5m` to `10m`. Override with `GITCRAWL_GH_CACHE_TTL=5m` or similar.
|
||||
Default cache TTLs are command-aware: active `gh run list` and run-status reads use `2m`; completed run views are kept for `12h`; completed run lists are kept for `30m`; workflow, job detail, and Actions job-list reads use `5m`; search reads use `15m`; release metadata uses `30m`; GitHub user profile reads use `7d`; read-only GraphQL queries use `6h`; completed-style run/job log reads use `12h`; `gh pr diff` uses `5m` without a stable SHA and `7d` with one. Most other read-only fallthroughs use `5m` to `10m`. Override with `GITCRAWL_GH_CACHE_TTL=5m` or similar.
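
The TTL defaults above can be pictured as a lookup table. The following is only an illustrative sketch: the category names, `parse_duration`, and `ttl_for` are hypothetical, the fallback uses the low end of the `5m` to `10m` range, and treating `GITCRAWL_GH_CACHE_TTL` as a single global override is an assumption rather than confirmed behavior.

```python
from datetime import timedelta

# Durations quoted from the paragraph above; the keys are hypothetical labels.
DEFAULT_TTLS = {
    "run_active": timedelta(minutes=2),
    "run_view_completed": timedelta(hours=12),
    "run_list_completed": timedelta(minutes=30),
    "workflow_or_job": timedelta(minutes=5),
    "search": timedelta(minutes=15),
    "release": timedelta(minutes=30),
    "user_profile": timedelta(days=7),
    "graphql_read": timedelta(hours=6),
    "run_log": timedelta(hours=12),
    "pr_diff_without_sha": timedelta(minutes=5),
    "pr_diff_with_sha": timedelta(days=7),
}

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text):
    """Parse a simple duration such as '90s', '5m', '6h', or '7d'."""
    return timedelta(**{_UNITS[text[-1]]: int(text[:-1])})


def ttl_for(kind, env):
    """Pick a TTL for a read kind; an env override wins (assumed to be global)."""
    override = env.get("GITCRAWL_GH_CACHE_TTL")
    if override:
        return parse_duration(override)
    # Assumption: unknown kinds get the low end of the documented 5m-10m range.
    return DEFAULT_TTLS.get(kind, timedelta(minutes=5))
```

For example, `ttl_for("search", {})` yields 15 minutes, while passing `{"GITCRAWL_GH_CACHE_TTL": "5m"}` shortens it to 5 minutes.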
|
||||
|
||||
Repeat read failures are cached by default too. That avoids a fleet of agents all rediscovering the same missing release, workflow, secret, or unsupported field. Error entries are capped to shorter lifetimes, and rate-limit errors are capped at `2m` so a reset is not masked all day. Set `GITCRAWL_GH_CACHE_ERRORS=0` to cache successful reads only.
|
||||
Repeat read failures are cached by default too. That avoids a fleet of agents all rediscovering the same missing release, workflow, secret, or unsupported field. Error entries are capped to shorter lifetimes, and rate-limit errors are capped at `2m` so a reset is not masked all day. If GitHub returns a rate-limit error while refreshing an expired successful entry, the shim serves that stale success with a warning instead of failing the read. Set `GITCRAWL_GH_CACHE_ERRORS=0` to cache successful reads only.
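
The stale-on-rate-limit behavior can be sketched roughly as follows. This is a simplified illustration, not the shim's implementation: `Entry`, `RateLimited`, and `cached_read` are hypothetical names, the cache is a plain dict, and error caching is left out so that every stored entry is a successful read.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    body: str          # cached successful response
    expires_at: float  # absolute expiry time in seconds


class RateLimited(Exception):
    """Hypothetical signal that the backend returned a rate-limit error."""


def cached_read(cache, key, fetch, ttl, now=None, warn=print):
    """Return a fresh entry, refresh it, or serve stale data when rate-limited."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and entry.expires_at > now:
        return entry.body  # fresh hit, no backend call
    try:
        body = fetch()
    except RateLimited:
        if entry is not None:
            # Expired success exists: serve it with a warning instead of failing.
            warn("rate limited; serving stale cached response")
            return entry.body
        raise  # nothing cached to fall back on
    cache[key] = Entry(body, now + ttl)
    return body
```

With no previously cached success, the rate-limit error still propagates to the caller.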
|
||||
|
||||
## Auto-hydration
|
||||
|
||||
When a local PR-detail read misses the cache, the shim can auto-hydrate exactly one PR before falling back:
|
||||
When a local issue or PR read misses the cache, the shim can auto-hydrate exactly one thread before falling back:
|
||||
|
||||
1. Shim detects missing or stale PR detail (older than 90s, or head SHA mismatch)
|
||||
2. If `GITCRAWL_GH_AUTO_HYDRATE != 0` (the default), runs `gitcrawl sync --numbers <n> --with pr-details`
|
||||
1. Shim detects a missing issue/PR row or stale PR detail (older than 90s, or head SHA mismatch)
|
||||
2. If `GITCRAWL_GH_AUTO_HYDRATE != 0` (the default), runs `gitcrawl sync --numbers <n>` and adds `--with pr-details` for PR detail reads
|
||||
3. Retries the local query against the freshly populated cache
|
||||
4. Falls through to the real `gh` if hydration failed
|
||||
|
||||
This keeps `gh pr view`, `gh pr checks`, and `gh run` reads cheap and fresh without manual sync orchestration. Disable with `GITCRAWL_GH_AUTO_HYDRATE=0` if you want the shim to be strictly cache-or-fallthrough.
|
||||
This keeps `gh issue view`, `gh pr view`, `gh pr checks`, and `gh run` reads cheap and fresh without manual sync orchestration. Disable with `GITCRAWL_GH_AUTO_HYDRATE=0` if you want the shim to be strictly cache-or-fallthrough.
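
The four steps above can be sketched as a small control flow. This is a hedged illustration only: `read_with_auto_hydrate` and its injected callables (`local_query`, `run_sync`, `run_real_gh`) are hypothetical, and the sync command line is copied from the step text, without any repository argument the real command may need.

```python
def read_with_auto_hydrate(number, is_pr_detail, local_query, run_sync, run_real_gh, env):
    """Try the local cache, hydrate one thread on a miss, retry, then fall through.

    local_query(number) returns None for a missing or stale row.
    run_sync(args) returns True when the sync succeeded.
    """
    result = local_query(number)
    if result is not None:
        return result

    # Auto-hydration is on unless the variable is explicitly set to 0.
    if env.get("GITCRAWL_GH_AUTO_HYDRATE", "1") != "0":
        args = ["gitcrawl", "sync", "--numbers", str(number)]
        if is_pr_detail:
            args += ["--with", "pr-details"]
        if run_sync(args):
            result = local_query(number)  # retry against the refreshed store
            if result is not None:
                return result

    # Hydration disabled or failed: delegate to the real gh.
    return run_real_gh()
```

Injecting the callables keeps the sketch independent of how the shim actually runs subprocesses or queries SQLite.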
|
||||
|
||||
## Cache inspection: `xcache`
|
||||
|
||||
@ -123,6 +123,7 @@ gitcrawl gh xcache stats # summary
|
||||
gitcrawl gh xcache keys # per-entry detail
|
||||
gitcrawl gh xcache gc # remove expired entries + stale lock files
|
||||
gitcrawl gh xcache flush # clear everything
|
||||
gitcrawl gh xcache reset # reset counters without deleting entries
|
||||
```
|
||||
|
||||
All accept `--json` for scripting.
|
||||
@ -136,9 +137,13 @@ All accept `--json` for scripting.
|
||||
"expired": 6,
|
||||
"locks": 0,
|
||||
"bytes": 1841234,
|
||||
"cache_hits": 629,
|
||||
"total_reads": 641,
|
||||
"hit_rate_percent": 98.1,
|
||||
"counters": {
|
||||
"local_hits": 540,
|
||||
"fallback_hits": 88,
|
||||
"stale_hits": 1,
|
||||
"backend_misses": 12,
|
||||
"pass_through_writes": 4,
|
||||
"backend_misses_by_command": {
|
||||
@ -156,26 +161,26 @@ All accept `--json` for scripting.
|
||||
}
|
||||
```
|
||||
|
||||
`local_hits` are answered from SQLite; `fallback_hits` are answered from the fallthrough cache; `backend_misses` actually hit GitHub. The per-command and per-route miss maps show which shapes still escape the cache, which is usually the fastest way to find the next optimization.
|
||||
`local_hits` are answered from SQLite; `fallback_hits` are answered from the fallthrough cache; `stale_hits` are expired successful cache entries served after a backend rate-limit response; `backend_misses` actually hit GitHub. The per-command and per-route miss maps show which shapes still escape the cache, which is usually the fastest way to find the next optimization.
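
As a worked check, the sample figures are consistent with counting local, fallback, and stale hits as cache hits and adding backend misses to get total reads. That formula is inferred from the sample numbers rather than stated by the docs, and `hit_rate_percent` below is a hypothetical helper:

```python
def hit_rate_percent(counters):
    """Hit rate as a percentage, assuming hits = local + fallback + stale."""
    hits = counters["local_hits"] + counters["fallback_hits"] + counters["stale_hits"]
    total = hits + counters["backend_misses"]
    return round(100 * hits / total, 1) if total else 0.0


sample = {"local_hits": 540, "fallback_hits": 88, "stale_hits": 1, "backend_misses": 12}
print(hit_rate_percent(sample))  # 629 hits out of 641 reads -> 98.1
```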
|
||||
|
||||
## Cache key composition
|
||||
|
||||
Cache keys are deterministic SHA-256 hashes of:
|
||||
|
||||
- A version tag (`v3`)
|
||||
- A version tag (`v4`)
|
||||
- The resolved gitcrawl config path
|
||||
- The current working directory when the command depends on implicit repo resolution
|
||||
- The `GH_HOST` env var
|
||||
- The `GH_REPO` env var when the command relies on it for implicit repo resolution
|
||||
- An explicit-scope marker for commands that include their own API path or repository
|
||||
- For `gh pr diff`: the stable identity `pr-diff:owner/repo:number:head-sha` (when available)
|
||||
- The full command argument vector, null-separated
|
||||
- A canonicalized command argument vector, null-separated. Common equivalent forms such as `-R` vs. `--repo`, flag ordering, and `--json a,b` vs. `--json b,a` share the same cache key.
|
||||
|
||||
This isolates implicit repo reads in sibling checkouts while still coalescing explicit reads such as `gh api users/octocat`, `gh api repos/openclaw/openclaw/...`, and `gh repo view openclaw/gitcrawl` across those checkouts. Explicit reads ignore unrelated `GH_REPO` values so agents with different ambient repo settings still share cache entries when the command itself names the target. Concurrent cache misses use a lock file so one process populates the entry while peers wait for the result, instead of all of them firing at GitHub.
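
A minimal sketch of how such a key could be derived is shown below. It is not the shim's code: `canonicalize`, `cache_key`, the `explicit`/`implicit` markers, and the field order are assumptions for illustration, and the flag parser naively treats any non-flag token after a flag as that flag's value.

```python
import hashlib

ALIASES = {"-R": "--repo"}  # short flags mapped to their long forms


def canonicalize(args):
    """Normalize flag aliases, flag order, and --json field order."""
    positionals, flags = [], []
    i = 0
    while i < len(args):
        token = args[i]
        if token.startswith("-"):
            name, value = ALIASES.get(token, token), None
            # Simplification: the next non-flag token is taken as the value.
            if i + 1 < len(args) and not args[i + 1].startswith("-"):
                value = args[i + 1]
                i += 1
            if name == "--json" and value is not None:
                value = ",".join(sorted(value.split(",")))
            flags.append((name, value))
        else:
            positionals.append(token)
        i += 1
    out = list(positionals)
    for name, value in sorted(flags, key=lambda f: (f[0], f[1] or "")):
        out.append(name)
        if value is not None:
            out.append(value)
    return out


def cache_key(args, config_path, gh_host="", cwd="", gh_repo="",
              explicit_scope=False, pr_diff_identity=""):
    """SHA-256 over null-separated key parts; explicit reads skip cwd and GH_REPO."""
    scope = ["explicit"] if explicit_scope else ["implicit", cwd, gh_repo]
    parts = ["v4", config_path, gh_host, *scope, pr_diff_identity, *canonicalize(args)]
    return hashlib.sha256("\0".join(parts).encode("utf-8")).hexdigest()
```

Under these assumptions, `pr list -R owner/repo --json number,title` and `pr list --json title,number --repo owner/repo` hash to the same key, and an explicit read produces the same key regardless of the ambient `GH_REPO`.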
|
||||
|
||||
## What does not flow through the shim
|
||||
|
||||
- **Mutating commands** — `gh issue close`, `gh pr merge`, `gh pr comment`, `gh api -X POST`, etc. These pass straight through, increment `pass_through_writes`, and clear the relevant cache entries on success.
|
||||
- **Mutating commands** — `gh issue close`, `gh pr merge`, `gh pr comment`, `gh api -X POST`, etc. These pass straight through, increment `pass_through_writes`, and invalidate matching cache tags on success. Unknown mutation scope falls back to clearing all entries.
|
||||
- **Auth flows** — `gh auth login`, `gh auth refresh`, etc. Always real `gh`.
|
||||
- **Anything the shim does not recognize** — falls through unmodified.
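
Tag-based invalidation after a mutating command might look roughly like the sketch below. The tag format, the `invalidate` helper, and the two-dict layout are hypothetical; the only behavior taken from the text is that matching tags are invalidated and an unknown scope clears everything.

```python
def invalidate(cache, tags_by_key, mutation_tags):
    """Drop entries sharing a tag with the mutation; clear all if scope is unknown.

    cache maps key -> cached body; tags_by_key maps key -> set of tag strings.
    Returns the number of entries removed.
    """
    if not mutation_tags:
        # Unknown mutation scope: fall back to clearing every entry.
        removed = len(cache)
        cache.clear()
        tags_by_key.clear()
        return removed
    doomed = [key for key, tags in tags_by_key.items() if tags & mutation_tags]
    for key in doomed:
        cache.pop(key, None)
        del tags_by_key[key]
    return len(doomed)


cache = {"k1": "pr 12 view", "k2": "release list"}
tags = {"k1": {"pr:owner/repo#12"}, "k2": {"release:owner/repo"}}
invalidate(cache, tags, {"pr:owner/repo#12"})  # removes only k1
```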
|
||||
|
||||
|
||||