docs: improve site install and links

Peter Steinberger 2026-05-05 04:38:46 +01:00
parent fc12f81b6a
commit 1a2f5ba6e0
16 changed files with 107 additions and 96 deletions

@ -23,7 +23,7 @@ gitcrawl clusters owner/repo --json --sort size --min-size 5 \
| jq '.clusters[] | {id, members: .member_count, latest: .latest_thread_number}'
```
For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](./commands).
For the full per-command JSON shapes, see the individual feature pages and the [Commands reference](/commands/).
## Exit codes
@ -51,7 +51,7 @@ Best for ad-hoc agent tools that should bound staleness but minimize sync calls.
### Auto-hydration via the gh shim
Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](./gh-shim#auto-hydration).
Symlink the gitcrawl binary as `gh` (or `gitcrawl-gh`) and let the shim pull a single PR's detail when an agent calls `gh pr view` or `gh pr checks` against an unhydrated PR. See [gh shim → auto-hydration](/gh-shim/#auto-hydration).
This is the lowest-overhead pattern for fleets of agents — no scheduling required.
@ -61,7 +61,7 @@ Run `gitcrawl refresh owner/repo` on a cron, systemd timer, or `launchd` agent e
```cron
# Every 5 minutes, refresh the active repos.
*/5 * * * * /usr/local/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
*/5 * * * * $HOME/bin/gitcrawl refresh openclaw/gitcrawl --json > /tmp/gitcrawl.openclaw.json 2>&1
```
For multiple repos, loop in a small shell script — gitcrawl is happy to run sequentially against a shared SQLite file.
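A minimal sketch of such a loop follows. The `refresh_all` helper and the `GITCRAWL_BIN` variable are hypothetical names used for illustration; only `gitcrawl refresh owner/repo --json` comes from the docs above.

```shell
#!/bin/sh
# Assumption: the binary lives in ~/bin; override GITCRAWL_BIN if it is elsewhere.
GITCRAWL_BIN="${GITCRAWL_BIN:-$HOME/bin/gitcrawl}"

# refresh_all REPO...: refresh each repo in turn against the shared SQLite file.
# A failing repo is reported on stderr and the loop continues with the next one.
refresh_all() {
  failed=0
  for repo in "$@"; do
    if ! "$GITCRAWL_BIN" refresh "$repo" --json; then
      echo "refresh failed for $repo" >&2
      failed=1
    fi
  done
  # Non-zero when at least one refresh failed, so cron can surface the error.
  return "$failed"
}

# Example call, commented out so that sourcing this file has no side effects:
# refresh_all openclaw/gitcrawl owner/repo
```

Running the repos one after another, rather than in parallel, matches the sequential use of the shared database described above.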

@ -142,7 +142,7 @@ Or slice it manually:
gitcrawl exclude-cluster-member owner/repo --id 12 --number 456 --reason "different repro"
```
See [Governance](./governance) for the full override workflow.
See [Governance](/governance/) for the full override workflow.
## Re-clustering and stable IDs
@ -155,6 +155,6 @@ Cluster runs are recorded in `run_records` and visible via `gitcrawl runs --kind
## See also
- [Governance](./governance) — close clusters, exclude members, set canonical
- [TUI](./tui) — the interactive cluster browser
- [Concepts](./concepts#cluster) — durable clusters and cluster kinds
- [Governance](/governance/) — close clusters, exclude members, set canonical
- [TUI](/tui/) — the interactive cluster browser
- [Concepts](/concepts/#cluster) — durable clusters and cluster kinds

@ -30,78 +30,78 @@ These work on every command.
| Command | Purpose | Detailed docs |
| --- | --- | --- |
| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](./installation), [Portable stores](./portable-stores) |
| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](./configuration#gitcrawl-doctor) |
| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](./configuration#gitcrawl-configure) |
| `gitcrawl init [--db --portable-store --portable-db --store-dir --json]` | Create config, database, runtime directories; optionally clone a portable store | [Installation](/installation/), [Portable stores](/portable-stores/) |
| `gitcrawl doctor [--json]` | Health check for config, database, credentials, model selection, repo/thread counts | [Configuration](/configuration/#gitcrawl-doctor) |
| `gitcrawl configure [--summary-model --embed-model --embedding-basis --json]` | Update model fields in `config.toml` | [Configuration](/configuration/#gitcrawl-configure) |
| `gitcrawl version` | Print version | — |
## Sync
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](./sync) |
| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](./refresh-and-embed) |
| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](./refresh-and-embed#embed) |
| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](./refresh-and-embed#runs) |
| `gitcrawl sync owner/repo [--state --since --numbers --limit --include-comments --include-pr-details --with pr-details --json]` | Sync issues and PRs from GitHub into local SQLite | [Sync](/sync/) |
| `gitcrawl refresh owner/repo [--no-sync --no-embed --no-cluster ...]` | Wrapper that runs sync → embed → cluster | [Refresh and embed](/refresh-and-embed/) |
| `gitcrawl embed owner/repo [--number --limit --force --include-closed --json]` | Generate OpenAI embeddings for thread documents | [Refresh and embed](/refresh-and-embed/#embed) |
| `gitcrawl runs owner/repo [--kind sync\|embedding\|cluster --limit --json]` | List recorded run history | [Refresh and embed](/refresh-and-embed/#runs) |
## Inspect
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl threads owner/repo [--include-closed --numbers --limit --json]` | List threads from local cache | — |
| `gitcrawl search owner/repo --query <text> [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](./search) |
| `gitcrawl search issues\|prs <query> -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](./search#gh-search-compatibility-mode) |
| `gitcrawl neighbors owner/repo --number <n> [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](./clustering#find-similar-threads-neighbors) |
| `gitcrawl search owner/repo --query <text> [--mode keyword\|semantic\|hybrid --limit --json]` | Local search (direct mode) | [Search](/search/) |
| `gitcrawl search issues\|prs <query> -R owner/repo [--state --json --limit --sync-if-stale]` | Local search (`gh search` shape) | [Search](/search/#gh-search-compatibility-mode) |
| `gitcrawl neighbors owner/repo --number <n> [--limit --threshold --json]` | Vector-similar threads to a specific issue/PR | [Clustering](/clustering/#find-similar-threads-neighbors) |
## Cluster
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](./clustering#generate-clusters) |
| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](./clustering#list-clusters) |
| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](./clustering#list-clusters) |
| `gitcrawl cluster-detail owner/repo --id <n> [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](./clustering#inspect-a-cluster) |
| `gitcrawl cluster-explain owner/repo --id <n> [...]` | Alias for `cluster-detail` | [Clustering](./clustering#inspect-a-cluster) |
| `gitcrawl cluster owner/repo [--threshold --min-size --max-cluster-size --k --cross-kind-threshold --limit --model --basis --include-closed --json]` | Build durable clusters from vectors | [Clustering](/clustering/#generate-clusters) |
| `gitcrawl clusters owner/repo [--sort size\|recent\|oldest --min-size --limit --hide-closed --json]` | Latest-run cluster summary, merged with closed durable rows | [Clustering](/clustering/#list-clusters) |
| `gitcrawl durable-clusters owner/repo [--include-closed --sort --min-size --limit --json]` | Strict durable-cluster audit view | [Clustering](/clustering/#list-clusters) |
| `gitcrawl cluster-detail owner/repo --id <n> [--member-limit --body-chars --include-closed --json]` | Cluster + members detail | [Clustering](/clustering/#inspect-a-cluster) |
| `gitcrawl cluster-explain owner/repo --id <n> [...]` | Alias for `cluster-detail` | [Clustering](/clustering/#inspect-a-cluster) |
## Governance
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl close-thread owner/repo --number <n> [--reason --json]` | Local close on a thread | [Governance](./governance#local-close) |
| `gitcrawl close-thread owner/repo --number <n> [--reason --json]` | Local close on a thread | [Governance](/governance/#local-close) |
| `gitcrawl reopen-thread owner/repo --number <n> [--json]` | Inverse | — |
| `gitcrawl close-cluster owner/repo --id <n> [--reason --json]` | Local close on a cluster | [Governance](./governance#local-close) |
| `gitcrawl close-cluster owner/repo --id <n> [--reason --json]` | Local close on a cluster | [Governance](/governance/#local-close) |
| `gitcrawl reopen-cluster owner/repo --id <n> [--json]` | Inverse | — |
| `gitcrawl exclude-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Pull a thread out of a cluster | [Governance](./governance#member-exclusion) |
| `gitcrawl exclude-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Pull a thread out of a cluster | [Governance](/governance/#member-exclusion) |
| `gitcrawl include-cluster-member owner/repo --id <n> --number <m> [--reason --json]` | Inverse | — |
| `gitcrawl set-cluster-canonical owner/repo --id <n> --number <m> [--reason --json]` | Pin canonical thread for a cluster | [Governance](./governance#canonical-member) |
| `gitcrawl set-cluster-canonical owner/repo --id <n> --number <m> [--reason --json]` | Pin canonical thread for a cluster | [Governance](/governance/#canonical-member) |
## TUI
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](./tui) |
| `gitcrawl tui [owner/repo] [--min-size --sort --limit --hide-closed --json]` | Interactive cluster browser; `--json` emits a snapshot instead of launching the UI | [TUI](/tui/) |
## gh shim
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl gh search issues\|prs <query> -R owner/repo [...]` | Local-first `gh search` | [gh shim](./gh-shim) |
| `gitcrawl gh issue view <n> -R owner/repo --json <fields>` | Local-first thread view | [gh shim](./gh-shim) |
| `gitcrawl gh pr view <n> -R owner/repo --json <fields>` | Same, for PRs (with auto-hydration) | [gh shim](./gh-shim) |
| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](./gh-shim) |
| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](./gh-shim) |
| `gitcrawl gh pr checks <n> -R owner/repo --json <fields>` | Cached PR checks (auto-hydrates if stale) | [gh shim](./gh-shim) |
| `gitcrawl gh pr diff <n> -R owner/repo` | Falls through; cached by head SHA | [gh shim](./gh-shim) |
| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](./gh-shim) |
| `gitcrawl gh run view <run-id> -R owner/repo [--json]` | Same, single run | [gh shim](./gh-shim) |
| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](./gh-shim#read-only-fallthroughs-cached) |
| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](./gh-shim) |
| `gitcrawl gh api <GET path>` | Falls through; cached briefly (GET-only) | [gh shim](./gh-shim) |
| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](./gh-shim#cache-inspection-xcache) |
| _Anything else_ | Falls through to real `gh` | [gh shim](./gh-shim) |
| `gitcrawl gh search issues\|prs <query> -R owner/repo [...]` | Local-first `gh search` | [gh shim](/gh-shim/) |
| `gitcrawl gh issue view <n> -R owner/repo --json <fields>` | Local-first thread view | [gh shim](/gh-shim/) |
| `gitcrawl gh pr view <n> -R owner/repo --json <fields>` | Same, for PRs (with auto-hydration) | [gh shim](/gh-shim/) |
| `gitcrawl gh issue list -R owner/repo [--state --search --author --assignee --label --json]` | Local-first list | [gh shim](/gh-shim/) |
| `gitcrawl gh pr list -R owner/repo [...]` | Same, for PRs | [gh shim](/gh-shim/) |
| `gitcrawl gh pr checks <n> -R owner/repo --json <fields>` | Cached PR checks (auto-hydrates if stale) | [gh shim](/gh-shim/) |
| `gitcrawl gh pr diff <n> -R owner/repo` | Falls through; cached by head SHA | [gh shim](/gh-shim/) |
| `gitcrawl gh run list -R owner/repo [--branch --commit --json]` | Cached workflow runs | [gh shim](/gh-shim/) |
| `gitcrawl gh run view <run-id> -R owner/repo [--json]` | Same, single run | [gh shim](/gh-shim/) |
| `gitcrawl gh repo view\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
| `gitcrawl gh release list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
| `gitcrawl gh workflow list\|view ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
| `gitcrawl gh secret list ...` / `variable get\|list ...` | Falls through; cached briefly | [gh shim](/gh-shim/#read-only-fallthroughs-cached) |
| `gitcrawl gh label list ...` | Falls through; cached briefly | [gh shim](/gh-shim/) |
| `gitcrawl gh api <GET path>` | Falls through; cached briefly (GET-only) | [gh shim](/gh-shim/) |
| `gitcrawl gh xcache stats\|keys\|gc\|flush [--json]` | Cache inspection / housekeeping | [gh shim](/gh-shim/#cache-inspection-xcache) |
| _Anything else_ | Falls through to real `gh` | [gh shim](/gh-shim/) |
The shim binary can be installed standalone by symlinking the `gitcrawl` binary as `gh` or `gitcrawl-gh`.
@ -109,7 +109,7 @@ The shim binary can be installed standalone by symlinking the `gitcrawl` binary
| Command | Purpose | Docs |
| --- | --- | --- |
| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](./portable-stores#publishing-gitcrawl-portable-prune) |
| `gitcrawl portable prune [--body-chars --no-vacuum --json]` | Truncate thread bodies and (optionally) `VACUUM` for a small publishable database | [Portable stores](/portable-stores/#publishing-gitcrawl-portable-prune) |
## Not yet implemented

@ -17,7 +17,7 @@ The handful of nouns gitcrawl uses, and how they connect.
A **repository** is the `owner/repo` you sync. Every gitcrawl command takes one, and most state in SQLite is keyed by it. You can mirror as many repos as you like into a single `gitcrawl.db`; commands always scope to the one you name.
The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](./sync)).
The mirror is metadata-first: titles, bodies, authors, labels, state, timestamps, and IDs land in SQLite immediately. Comments, reviews, review comments, and full PR detail (files, commits, checks, workflow runs) are opt-in on a per-sync basis (see [Sync](/sync/)).
## Thread
@ -78,7 +78,7 @@ Per-cluster maintainer overrides let you correct what the algorithm produced wit
- **Member exclusion** (`exclude-cluster-member`/`include-cluster-member`) — pulls a specific thread out of a cluster and remembers why.
- **Canonical member** (`set-cluster-canonical`) — pins which thread represents the cluster.
See [Governance](./governance) for the full workflow.
See [Governance](/governance/) for the full workflow.
## Run
@ -88,7 +88,7 @@ Every sync, embed, and cluster operation records a **run** in `run_records` with
A **portable store** is a Git-backed publish target for a `gitcrawl.db` plus its derived bodies, designed for sharing a local cache across agents or machines without a hosted service.
`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](./portable-stores).
`gitcrawl init --portable-store https://github.com/org/repo` clones a portable store into `~/.config/gitcrawl/portable/`, points the runtime at it, and `gitcrawl portable prune --body-chars 256` keeps the published payload small. Read-only commands run against portable stores refresh the checkout before reading. See [Portable stores](/portable-stores/).
## Cache
@ -97,4 +97,4 @@ The `cache/` directory under `~/.config/gitcrawl/` holds:
- `cache/gh-shim/` — the short-lived fallthrough cache for the `gh` shim, keyed by config path, CWD, `GH_HOST`, `GH_REPO`, and command args. Inspect or clean it with `gitcrawl gh xcache stats|keys|gc|flush`.
- `cache/pr/` — hydrated PR detail blobs used to answer `gh pr view`, `gh pr checks`, and `gh run` reads from local SQLite.
See [gh shim](./gh-shim) for the cache key composition and TTL behavior.
See [gh shim](/gh-shim/) for the cache key composition and TTL behavior.

@ -47,8 +47,8 @@ embed_dimensions = 1024
embedding_basis = "title_original"
[env]
GITHUB_TOKEN = "ghp_xxx"
OPENAI_API_KEY = "sk-xxx"
GITHUB_TOKEN = "<github-token>"
OPENAI_API_KEY = "<openai-api-key>"
[portable_store]
url = "https://github.com/org/portable-store.git"
@ -102,6 +102,7 @@ checkout_dir = "/Users/me/.config/gitcrawl/portable"
| `GITCRAWL_GH_PATH` | Path to the real `gh` binary used for fallthrough |
| `GITCRAWL_GH_AUTO_HYDRATE` | Set to `0` to disable PR auto-hydration on cache miss |
| `GITCRAWL_GH_CACHE_TTL` | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
| `GITCRAWL_GH_CACHE_ERRORS` | Set to `0` to avoid caching non-zero read-only fallthroughs |
If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew install paths and then your `PATH`. Set it explicitly when you symlink the gitcrawl binary as `gh` (otherwise the shim will recurse into itself).
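As an illustration, the shim variables from the table could be set in a shell profile like this. The values are examples only: `/opt/homebrew/bin/gh` is just one place the real `gh` might live, and `5m` is one of the TTL forms mentioned above.

```shell
# Example values only; adjust each one to your own setup.
export GITCRAWL_GH_PATH=/opt/homebrew/bin/gh  # assumed location of the real gh
export GITCRAWL_GH_AUTO_HYDRATE=0             # disable PR auto-hydration on cache miss
export GITCRAWL_GH_CACHE_TTL=5m               # override the fallthrough cache TTL
export GITCRAWL_GH_CACHE_ERRORS=0             # do not cache non-zero read-only fallthroughs
```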

@ -23,14 +23,16 @@ The shim never adds GitHub write behavior. Mutating commands (`gh issue close`,
```bash
# Side-by-side: agents opt in by calling `gitcrawl-gh`.
ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
mkdir -p "$HOME/bin"
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
# Or replace the global `gh` so every caller picks up the cache automatically.
ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
export GITCRAWL_GH_PATH=/opt/homebrew/bin/gh # tell the shim where the real gh is
REAL_GH="$(command -v gh)" # capture this before shadowing gh
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
export GITCRAWL_GH_PATH="$REAL_GH" # tell the shim where the real gh is
```
If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.
Make sure `~/bin` is on `PATH` before the original `gh` location if you want the shim to be picked up as `gh`. If `GITCRAWL_GH_PATH` is unset, the shim probes common Homebrew paths and then `PATH`. Set it explicitly when you replace the global `gh` so the shim does not recurse into itself.
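One way to check the ordering is to walk `PATH` and see which `gh` wins. The `first_on_path` helper below is a hypothetical sketch, not part of gitcrawl; it only mirrors the usual left-to-right `PATH` lookup.

```shell
# first_on_path NAME [PATH_VALUE]: print the first executable NAME found when
# searching PATH_VALUE (defaults to $PATH) from left to right.
first_on_path() {
  name="$1"
  search="${2:-$PATH}"
  old_ifs="$IFS"
  IFS=:
  for dir in $search; do
    # An empty PATH entry means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}

# Example call, commented out so that sourcing this file has no side effects:
# first_on_path gh
```

With the shim in place, `first_on_path gh` would be expected to print the `~/bin/gh` symlink, while `GITCRAWL_GH_PATH` should name a different file.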
## Supported local reads
@ -138,8 +140,8 @@ All accept `--json` for scripting.
"pass_through_writes": 4
},
"commands": {
"gh pr view": { "entries": 30, "bytes": 184320 },
"gh search issues": { "entries": 14, "bytes": 18230 }
"pr diff": { "entries": 30, "bytes": 184320 },
"release view": { "entries": 14, "bytes": 18230 }
}
}
```
@ -172,4 +174,4 @@ Pattern: replace `gh` with `gitcrawl-gh` (or symlink to `gh`) for every agent in
For best results, schedule a periodic `gitcrawl refresh owner/repo` (every few minutes per repo, depending on activity) so the local mirror stays warm. The shim's `--sync-if-stale` (via `gitcrawl search`) and auto-hydration handle the rest.
See [Automation](./automation) for full agent recipes and JSON contracts.
See [Automation](/automation/) for full agent recipes and JSON contracts.

@ -127,4 +127,4 @@ The thread stays open on GitHub; only your local triage view hides it.
- It does not edit, label, comment on, or close GitHub issues. Use `gh` for that.
- It does not retrain embeddings or reshape the underlying graph — it overlays decisions on top of the algorithm output.
- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](./portable-stores).
- It does not propagate to other gitcrawl installations unless you publish your database via a [portable store](/portable-stores/).

@ -12,7 +12,7 @@ permalink: /
A local-first GitHub issue and pull request crawler for maintainer triage. Sync, search, cluster, and review related threads from a SQLite cache that lives entirely on your machine.
{: .fs-6 .fw-300 }
[Quickstart](./quickstart){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
[Quickstart](/quickstart/){: .btn .btn-primary .fs-5 .mb-4 .mb-md-0 .mr-2 }
[View on GitHub](https://github.com/openclaw/gitcrawl){: .btn .fs-5 .mb-4 .mb-md-0 }
---
@ -34,16 +34,16 @@ A local-first GitHub issue and pull request crawler for maintainer triage. Sync,
<div class="code-example" markdown="1">
### I want to try it
[Quickstart](./quickstart) walks you from `git clone` to a populated cluster view in five minutes.
[Quickstart](/quickstart/) walks you from `git clone` to a populated cluster view in five minutes.
### I want to wire up an agent
The [`gh` shim](./gh-shim) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.
The [`gh` shim](/gh-shim/) is the fastest way to cut GitHub API load — point your agent at `gitcrawl-gh`, keep the agent's `gh` calls intact.
### I want to triage a busy repo
Read [Sync](./sync) to bring data local, then [Clustering](./clustering) and the [TUI](./tui) for the maintainer workflow.
Read [Sync](/sync/) to bring data local, then [Clustering](/clustering/) and the [TUI](/tui/) for the maintainer workflow.
### I want the full reference
[Commands](./commands) lists every flag and JSON field. [Configuration](./configuration) covers env vars and paths.
[Commands](/commands/) lists every flag and JSON field. [Configuration](/configuration/) covers env vars and paths.
</div>

@ -26,13 +26,16 @@ Each tagged release publishes archives for `darwin_amd64`, `darwin_arm64`, `linu
```bash
# Replace VERSION and PLATFORM with the values you want.
curl -L "https://github.com/openclaw/gitcrawl/releases/download/v0.1.2/gitcrawl_0.1.2_darwin_arm64.tar.gz" \
| tar -xz -C /usr/local/bin gitcrawl
VERSION=v0.1.2
PLATFORM=darwin_arm64
mkdir -p "$HOME/bin"
curl -L "https://github.com/openclaw/gitcrawl/releases/download/${VERSION}/gitcrawl_${VERSION#v}_${PLATFORM}.tar.gz" \
| tar -xz -C "$HOME/bin" gitcrawl
gitcrawl --version
```
Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list.
Browse the [releases page](https://github.com/openclaw/gitcrawl/releases) for the latest tag and the full asset list. Use a directory that is already on your `PATH`; `~/bin` and `~/.local/bin` avoid needing elevated permissions.
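To see how the download URL is put together, the sketch below expands the same variables without downloading anything. `ASSET` and `URL` are names introduced here for illustration.

```shell
VERSION=v0.1.2
PLATFORM=darwin_arm64

# ${VERSION#v} removes one leading "v": the tag keeps it, the file name drops it.
ASSET="gitcrawl_${VERSION#v}_${PLATFORM}.tar.gz"
URL="https://github.com/openclaw/gitcrawl/releases/download/${VERSION}/${ASSET}"

printf '%s\n' "$URL"
# prints https://github.com/openclaw/gitcrawl/releases/download/v0.1.2/gitcrawl_0.1.2_darwin_arm64.tar.gz
```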
## Install from source
@ -54,14 +57,16 @@ The shim is the same binary. Symlink it as `gh` (replacing the real CLI) or as `
```bash
# Side-by-side install — agents can opt in by calling `gitcrawl-gh`.
ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
mkdir -p "$HOME/bin"
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
# Or replace the global `gh` so every agent picks up the cache automatically.
ln -s "$(command -v gitcrawl)" /usr/local/bin/gh
export GITCRAWL_GH_PATH="$(command -v /opt/homebrew/bin/gh)" # point shim at the real gh
REAL_GH="$(command -v gh)" # capture this before shadowing gh
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gh"
export GITCRAWL_GH_PATH="$REAL_GH" # point shim at the real gh
```
When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](./gh-shim) for details.
When invoked as `gh` or `gitcrawl-gh`, the binary auto-detects shim mode. See [the gh shim guide](/gh-shim/) for details.
## Verify the install

@ -108,5 +108,5 @@ Other agents and machines pull the new commit on their next read-only command.
## See also
- [Sync](./sync) — what gets written into the database that ends up in the portable store
- [gh shim](./gh-shim) — agents reading a shared portable store benefit doubly from the shim's local-first answers
- [Sync](/sync/) — what gets written into the database that ends up in the portable store
- [gh shim](/gh-shim/) — agents reading a shared portable store benefit doubly from the shim's local-first answers

@ -19,7 +19,8 @@ Five minutes from clean machine to a populated cluster view.
# Build (or download a release archive — see Installation).
git clone https://github.com/openclaw/gitcrawl.git
cd gitcrawl
go build -o /usr/local/bin/gitcrawl ./cmd/gitcrawl
mkdir -p "$HOME/bin"
go build -o "$HOME/bin/gitcrawl" ./cmd/gitcrawl
# Create config + database under ~/.config/gitcrawl.
gitcrawl init
@ -36,16 +37,16 @@ Defaults written:
## 2. Set credentials
```bash
export GITHUB_TOKEN=ghp_xxx # required for sync
export OPENAI_API_KEY=sk-xxx # required for embeddings
export GITHUB_TOKEN="<github-token>" # required for sync
export OPENAI_API_KEY="<openai-api-key>" # required for embeddings
```
Either set them in your shell profile or store them in `~/.config/gitcrawl/config.toml`:
```toml
[env]
GITHUB_TOKEN = "ghp_xxx"
OPENAI_API_KEY = "sk-xxx"
GITHUB_TOKEN = "<github-token>"
OPENAI_API_KEY = "<openai-api-key>"
```
`gitcrawl doctor` confirms the credentials are visible and reports their source.
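For scripts that rely on the shell environment, a small preflight check can fail fast before any sync starts. This is a sketch with a hypothetical `require_env` helper; it only inspects the environment, so it will not see credentials stored in `config.toml` (use `gitcrawl doctor` for that).

```shell
# require_env NAME...: report every named variable that is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # eval reads the variable whose name is held in $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call, commented out so that sourcing this file has no side effects:
# require_env GITHUB_TOKEN OPENAI_API_KEY || exit 1
```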
@ -72,7 +73,7 @@ The `refresh` command runs sync → embed → cluster end to end:
gitcrawl refresh openclaw/gitcrawl
```
You can run the stages individually if you want finer control — see [Refresh and embed](./refresh-and-embed) and [Clustering](./clustering).
You can run the stages individually if you want finer control — see [Refresh and embed](/refresh-and-embed/) and [Clustering](/clustering/).
## 5. Browse clusters
@ -116,17 +117,18 @@ Add `--sync-if-stale 5m` to refresh the local mirror first when it is older than
## 7. Wire up the `gh` shim (optional)
```bash
ln -s "$(command -v gitcrawl)" /usr/local/bin/gitcrawl-gh
mkdir -p "$HOME/bin"
ln -sf "$(command -v gitcrawl)" "$HOME/bin/gitcrawl-gh"
gitcrawl-gh search issues "download stalls" -R openclaw/gitcrawl --json number,title,url
gitcrawl-gh pr view 123 -R openclaw/gitcrawl --json number,title,state,url
gitcrawl-gh xcache stats
```
Most read-only `gh` calls answer locally; mutating commands pass straight through to the real `gh`. See [gh shim](./gh-shim) for the full surface.
Most read-only `gh` calls answer locally; mutating commands pass straight through to the real `gh`. See [gh shim](/gh-shim/) for the full surface.
## Where to next
- [Concepts](./concepts) — what threads, durable clusters, and embeddings actually mean
- [Sync](./sync) — every flag for hydrating the local store
- [Clustering](./clustering) — tuning the cluster graph for a specific repo
- [Automation](./automation) — JSON contracts for agents and scripts
- [Concepts](/concepts/) — what threads, durable clusters, and embeddings actually mean
- [Sync](/sync/) — every flag for hydrating the local store
- [Clustering](/clustering/) — tuning the cluster graph for a specific repo
- [Automation](/automation/) — JSON contracts for agents and scripts

@ -62,6 +62,7 @@ Override the config root with `--config <path>` or `GITCRAWL_CONFIG`.
| `GITCRAWL_GH_PATH` | _(probed)_ | Path to the real `gh` binary |
| `GITCRAWL_GH_AUTO_HYDRATE` | _(on)_ | Set `0` to disable PR auto-hydration on cache miss |
| `GITCRAWL_GH_CACHE_TTL` | `30s` for most commands | Override fallthrough cache TTL (e.g., `5m`, `1h`) |
| `GITCRAWL_GH_CACHE_ERRORS` | _(on)_ | Set `0` to avoid caching non-zero read-only fallthroughs |
## Configuration defaults
@ -158,7 +159,7 @@ stderr always carries error messages. stdout is reserved for command output.
## See also
- [Configuration](./configuration) — narrative version of this reference
- [Commands](./commands) — every command and flag, in one table
- [Configuration](/configuration/) — narrative version of this reference
- [Commands](/commands/) — every command and flag, in one table
- [SPEC.md](https://github.com/openclaw/gitcrawl/blob/main/SPEC.md) — product contract
- [CHANGELOG.md](https://github.com/openclaw/gitcrawl/blob/main/CHANGELOG.md) — what shipped recently

@ -21,7 +21,7 @@ gitcrawl refresh owner/repo
By default this performs:
1. **Sync** — open + recently closed issues and PRs (see [Sync](./sync))
1. **Sync** — open + recently closed issues and PRs (see [Sync](/sync/))
2. **Embed** — fill `thread_vectors` for any thread whose document changed
3. **Cluster** — rebuild durable clusters with the standard thresholds
@ -127,4 +127,4 @@ Each row carries `started_at`, `finished_at`, `status`, and `stats_json` — use
- **GitHub.** Sync uses standard REST endpoints; the API quota is the dominant cost on busy repos. Use `--include-comments` and `--with pr-details` selectively.
- **OpenAI.** `text-embedding-3-small` is inexpensive but not free. `embed` is bounded by `--limit` if you want to stay under a budget on initial backfills.
- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](./portable-stores).
- **Disk.** Vectors and PR detail blobs grow with the repo. The portable-store flow includes `gitcrawl portable prune` to keep published payloads small — see [Portable stores](/portable-stores/).

@ -100,7 +100,7 @@ There are two ways to run cached searches:
| `gitcrawl search issues\|prs ...` | Human use; mixes naturally with the rest of the gitcrawl CLI |
| `gitcrawl gh search issues\|prs ...` | Agents and scripts that call `gh` directly — symlinked as `gh` or `gitcrawl-gh` it is invisible to callers |
Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](./gh-shim).
Both paths share the same local cache and produce gh-shaped JSON. The shim adds the additional `gh issue/pr view`, `gh issue/pr list`, `gh pr checks`, `gh run`, and `xcache` surface — see [gh shim](/gh-shim/).
## Combining with sync

@ -58,7 +58,7 @@ gitcrawl sync owner/repo --numbers 123,456 --include-comments
`--numbers` is the safest way to refresh specific issues or PRs — it bypasses list ordering and the updated-time window, fetching exactly the rows you ask for. Pair it with `--include-comments` and/or `--include-pr-details` to hydrate the conversation and PR-only data at the same time.
This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#auto-hydration).
This is also what the `gh` shim uses internally for [auto-hydration](/gh-shim/#auto-hydration).
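When the numbers come from another tool one per line, they need to be joined with commas before being passed to `--numbers`. A minimal sketch, assuming a hypothetical `join_numbers` helper built on the standard `paste` utility:

```shell
# join_numbers: read one number per line on stdin, print them comma-separated.
join_numbers() {
  paste -sd, -
}

NUMS="$(printf '%s\n' 123 456 | join_numbers)"
printf '%s\n' "$NUMS"
# prints 123,456

# The result can then be passed to sync (commented out; needs a configured gitcrawl):
# gitcrawl sync owner/repo --numbers "$NUMS" --include-comments
```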
## Hydration depth
@ -68,7 +68,7 @@ This is also what the `gh` shim uses internally for [auto-hydration](./gh-shim#a
| `--include-pr-details` | PR files, commits, status checks, workflow runs |
| `--with pr-details` | Same as `--include-pr-details` (gh-style flag) |
PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](./gh-shim).
PR details land in `pr_files`, `pr_commits`, `pr_checks`, and `pr_runs` tables and back the `gh pr view`, `gh pr checks`, and `gh run list/view` shim paths. See [gh shim](/gh-shim/).
`--include-code` is accepted for compatibility but is currently a no-op.
@ -150,6 +150,6 @@ gitcrawl sync owner/repo --numbers "$NUMS" --with pr-details
## See also
- [Refresh and embed](./refresh-and-embed) — the wrapper that runs sync, embed, and cluster end to end
- [gh shim](./gh-shim) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
- [Portable stores](./portable-stores) — sharing the synced cache across machines
- [Refresh and embed](/refresh-and-embed/) — the wrapper that runs sync, embed, and cluster end to end
- [gh shim](/gh-shim/) — how synced PR details power `gh pr view` / `gh pr checks` / `gh run` from local cache
- [Portable stores](/portable-stores/) — sharing the synced cache across machines

@ -91,7 +91,7 @@ Member actions:
- Local close / reopen this thread
- Exclude from cluster
These map directly onto the [governance](./governance) commands. Anything you can do interactively, you can also script.
These map directly onto the [governance](/governance/) commands. Anything you can do interactively, you can also script.
## Display rules